The Long Game in Data Engineering

From No Experience to Startup Opportunity: How One International Student Used Projects, GRA, and Wellfound to Create His Own Breakthrough

Sachin Chandrashekhar — Fri, 08 May 2026 23:20:39 GMT

Sometimes…

It begins with clarity.

Recently, I spoke to one of my students, Nischal.

He came to the U.S. as a master’s student.

No major corporate experience.
No deep understanding of Data Engineering initially.
No perfectly mapped career blueprint.

Like many students…

He simply wanted an opportunity.

But what stood out was not luck.

It was how he gradually positioned himself.

And his story carries an important lesson for many aspiring Data Engineers, especially international students trying to break into the U.S. market.

The Reality Many Students Face

For many international students, the challenge is rarely just “learning technology.”

The bigger challenge is:

How do I build enough credibility for someone to take a chance on me?

Because often:

You may not have prior U.S. work experience
You may not yet know how the corporate ecosystem works
You may be competing against experienced professionals
Visa sponsorship may heavily influence career decisions

This is where many people feel stuck.

They keep learning…

But struggle to convert learning into opportunity.

What Changed for Nischal?

Nischal’s transformation did not happen because he randomly applied to hundreds of jobs.

It also did not happen because he consumed random tutorials without structure.

A major turning point in his journey was gaining roadmap clarity through structured learning.

Before that…

Like many aspiring professionals…

He knew technology terms existed.

Lambda.
AWS.
Cloud.

But he did not yet fully understand:

Which service solves what problem
When to use specific tools
How real-world architecture decisions are made
How Data Engineering actually functions as an industry role

This is an important distinction.

Because many learners today are exposed to information…

But exposure is not the same as clarity.

Through the course, one of the biggest advantages he repeatedly highlighted was not just “content.”

It was:

Structured understanding.

He specifically spoke about how the program helped him:

Understand AWS from a Data Engineering lens
Connect services to practical use cases
Think the way industry practitioners do
Build confidence in discussing architecture
Strengthen fundamentals before deeper specialization

In simple words:

The course helped convert scattered awareness into usable direction.

And once that direction became clearer…

He was able to build evidence.

1. He Built Real Projects

This was one of the biggest turning points.

He built:

AWS-based cloud projects
GRA (Graduate Research Assistantship) projects
Databricks projects
Real demos with GitHub-backed implementation

During interviews…

They did not just ask him generic questions.

They specifically asked about his projects.

He was able to explain:

Why he chose certain AWS services
How data flowed through his architecture
How his solutions worked end-to-end
What decisions he made

That matters.

Because projects move you from:

“I learned this”

To:

“I built this.”

And that difference is massive.

The Hidden Advantage of GRA Roles

One major insight from his journey:

His Graduate Research Assistantship became more than just campus work.

It became resume credibility.

He worked on:

Real AWS architecture
Lambda
S3
ETL-like systems
Dashboarding
Data workflows

This gave him something many students lack:

Experience that can be explained.

Not theoretical awareness.

Practical, discussable implementation.

For students…

This is powerful.

Sometimes your first big opportunity may not come directly from a dream company.

It may come from:

University labs
Research roles
Startups
Smaller but meaningful builds

And these can become stepping stones.

The Wellfound Insight: A Startup Marketplace More People Should Know About

One of the most practical parts of Nischal’s story was how he discovered his startup opportunity.

He used Wellfound (formerly AngelList Talent).

What is Wellfound?

Wellfound is essentially a startup-focused job marketplace where:

Startups post opportunities
Job seekers create detailed profiles
Founders and hiring teams evaluate fit faster
Applications can feel more direct than traditional platforms

Unlike the black-hole feeling many experience on large job boards…

Nischal found that Wellfound gave quicker visibility into:

Whether he was moving forward
Whether he was rejected
Which startups aligned with his profile

Why this matters:

Startups often care deeply about:

Execution ability
Ownership mindset
Adaptability
Problem-solving

Sometimes…

A strong project portfolio can stand out more here than polished corporate history.

That does not mean it is easier.

But it can mean:

Different pathways exist.

A Powerful Career Lesson: Sometimes the “Perfect” Role Is Not the First Door

Interestingly…

Nischal initially interviewed for something closer to product QA.

But during the process…

He communicated his deeper interest in Data Engineering and specific workflows.

That matters.

Because many professionals underestimate this:

Positioning matters.

Sometimes opportunities evolve when you demonstrate:

Technical interest
Project relevance
Curiosity
Initiative

He did not passively accept a label.

He communicated where he could create value.

The Visa Sponsorship Reality

This was another deeply practical part of our conversation.

He received:

Option A:

Startup opportunity with equity now + potential visa sponsorship after funding

Option B:

Paid AI internship, but sponsorship uncertain

This is where career strategy becomes personal.

For international students…

Short-term salary is important.

But immigration stability may sometimes matter more.

This does not mean everyone should choose unpaid roles.

But it does highlight:

Career decisions are not always about immediate income alone.

Sometimes:

Sponsorship pathway
Domain alignment
Long-term positioning
Industry entry point

…may carry larger strategic weight.

What His Story Reinforced

If I had to simplify his journey:

Fundamentals:

SQL
Python
Cloud (AWS)

Execution:

Projects
GRA
Databricks
GitHub

Visibility:

Wellfound
Intentional profile building
Applying strategically

Outcome:

Startup role
Equity pathway
Potential sponsorship
Industry credibility

A Bigger Lesson About Learning: Information Alone Is Not Enough

One of the strongest takeaways from this conversation was something many learners need to hear:

Random information is everywhere.

But roadmap clarity is rare.

Nischal openly reflected that before structured guidance, he did not fully understand how the ecosystem connected.

This is where many people lose time.

They may learn:

A little AWS
A little Python
A little SQL
A few tutorials

…but still struggle to answer:

“How does this all fit together in the real world?”

This is why structured, mentor-led, roadmap-driven learning can accelerate growth.

Not because information is hidden.

But because sequence matters.

When someone helps you understand:

Fundamentals first
Why architecture choices matter
How projects build credibility
How to speak industry language

…your confidence compounds differently.

In many ways…

That was one of the hidden advantages in his journey.

Not just learning more.

Learning in the right order.

My Bigger Reflection for Aspiring Data Engineers

We live in a time where many people are overwhelmed by:

AI
Agentic systems
Cloud
Spark
Databricks
Streaming

And yes… these matter.

But often the real question is simpler:

“Can you build enough proof that someone trusts you?”

Because careers are often built in this sequence:

Learn → Build → Explain → Position → Opportunity

Not:

Learn endlessly → Hope

Final Thought

Nischal’s journey is not just about securing an opportunity.

It is about something deeper:

Clarity creates momentum.

He did not begin with perfect knowledge.

He began by:

Learning fundamentals
Building projects
Gaining practical exposure
Using platforms like Wellfound strategically
Making long-term decisions

For many of you…

Especially students, career switchers, or those feeling behind…

This matters.

You may not need the perfect start.

But you do need:

Direction + Execution + Visibility

And sometimes…

That combination can create opportunities you may not have imagined when you first began.

If you are currently trying to break into Data Engineering:

Do not just ask:

“What should I learn next?”

Also ask:

“What can I build that proves I belong?”

That question can change everything.

A Final Note From Me

Nischal’s journey did not happen overnight.

He played the long game.

With:

Grit
Patience
Structured learning
Real projects
Strategic positioning

And over time…

That compounded.

If you are serious about building toward opportunities like this…

Not just learning randomly…

But understanding roadmap, projects, cloud, and real-world Data Engineering from a deeper lens…

Join my webinar:

https://aws.sachin.cloud

Because sometimes…

The right direction, combined with consistent execution, can change far more than you currently realize.

Depth Before Breadth: Is Data Engineering Alone Still Enough?

Sachin Chandrashekhar — Tue, 05 May 2026 09:45:16 GMT

A question came in recently from a learner that I believe many professionals are quietly thinking about right now:

“Is strong Data Engineering alone still enough… or are we moving toward a world where engineers need to also understand AI, APIs, architecture, orchestration, and broader technical ecosystems?”

It is a powerful question.

Because if you look around today, the industry can feel noisy.

AI is everywhere.
Agents are everywhere.
Full-stack expectations are growing.
Companies are experimenting fast.

And for many professionals, this creates an uncomfortable fear:

“Am I falling behind if I focus deeply on Data Engineering?”

My honest answer?

Not necessarily.
But how you approach your growth matters more than ever.

The Real Problem: Many Professionals Are Confusing Expansion With Progress

One of the biggest mistakes I see today is that people often assume career growth means learning everything at once.

So they jump from:

Data Engineering → AI → Full Stack → Frontend → APIs → Agentic AI → Architecture → DevOps

…and before long, they are overwhelmed.

This creates a dangerous pattern:

Breadth without depth.

They know “about” many things…

…but are not truly strong enough in one domain to create credibility.

And in real-world projects, credibility matters.

What I Am Actually Seeing in the Industry

From what I observe:

Yes — companies are increasingly asking Data Engineers to participate in:

AI POCs
Data foundations for LLM systems
API integrations
Cross-functional collaboration
Automation discussions
Tool ecosystem decisions

But this does not automatically mean every Data Engineer must immediately become:

AI Engineer
Architect
Full Stack Developer
ML Engineer
Agent Orchestrator

The market is still evolving.

Many of these roles are still being shaped.

Which means this is not the time for panic.

This is the time for strategic positioning.

My Personal Approach: Depth First

I made a conscious decision myself.

Even after exploring AI Engineering to prepare for industry conversations…

I chose to go deeper into Data Engineering first.

Why?

Because Data Engineering itself is already a massive domain:

Cloud
SQL
Python
Spark
Distributed systems
Databricks
Snowflake
Architecture
Scalability
Lakehouse patterns
Performance

And AI?

That is another massive domain on its own.

Trying to deeply master both too early can dilute focus.

So I would rather build:

Depth first → Then breadth

Instead of:

Shallow exposure to everything → Mastery in nothing

Why Depth Still Matters More Than Most People Realize

Depth gives you something critical:

Judgment.

This matters even more in the AI era.

Because yes…

AI can absolutely help you write code faster.
AI can help you build faster.
AI can help you explore adjacent technologies faster.

But AI cannot replace your ability to ask:

“Is this architecture scalable?”
“Is this pipeline reliable?”
“Is this design production-worthy?”
“Is this actually the right solution?”

Without foundational depth…

You risk accepting AI output at face value.

And that can become dangerous.

So the future is likely not:

AI replaces engineers

It is more likely:

Engineers with depth + AI fluency outperform engineers without either

A More Strategic Career Framework

Here is the path I currently believe makes the most sense for many professionals:

Near Term:

Build undeniable depth

Focus on:

SQL
Python
Cloud
Spark
Real-world pipelines
Architecture thinking
Scalability
Databricks / Snowflake ecosystem understanding

Mid Term:

Add strategic breadth

Expand into:

AI fluency
Prompting
AI validation
APIs
Automation
Agent orchestration
Broader system integration

Long Term:

Move toward technical leadership

This may include:

Architecture ownership
Platform decisions
Cross-functional oversight
Team leadership
AI strategy

The New Career Advantage

I believe the strongest professionals going forward will not necessarily be the people who chase every trend first…

They will likely be the people who combine:

Strong foundational depth

Clear thinking

Adaptability

Strategic expansion

So… Is Data Engineering Alone Enough?

Here is my honest take:

Yes — if you are building real depth.

But…

Pure tool-only Data Engineering without broader awareness may eventually become limiting.

In simple terms:

Depth gives you credibility
Breadth gives you adaptability

And right now?

For most people…

Depth should probably come first.

Final Thought

You do not need to become everything overnight.

You do not need to panic every time the industry shifts.

You do need to ask:

“What core capability, if mastered deeply, will give me leverage… and make future expansion easier?”

For many professionals today…

That answer may still very well be:

Data Engineering

Not as the final destination…

…but as the foundation.

And foundations matter.

What do you think?
Are you currently prioritizing depth… or trying to balance breadth too early?

Want Clarity On What To Learn First?

If you are trying to figure out:

Should you focus on SQL first? AWS first? Databricks? Snowflake? AI?

…and want a practical roadmap instead of random overwhelm…

I cover this in my live masterclass:

Register here: https://aws.sachin.cloud

I break down:

What to prioritize first
How to avoid learning backwards
How to build real-world depth
How to position yourself for high-paying Cloud & AI-powered Data Engineering roles

Because sometimes…

The biggest career advantage is not learning faster.

It is learning in the right order.

Why AWS Glue, EMR Serverless, and EMR Are Not the Same Thing (And Why It Matters Right Now)

Sachin Chandrashekhar — Sun, 03 May 2026 17:36:14 GMT

A company I know let go of several engineers last week. Not junior folks. A senior leader too.

This is not a scare post. But it is a reality check.

The engineers who are safe are the ones who keep adding skills — consistently, one layer at a time. The ones who understand not just what a tool is, but when to use it, why companies choose it, and what it costs.

Today I want to break down one of the most misunderstood areas in AWS data engineering: the difference between AWS Glue, EMR Serverless, and EMR on EC2. I covered this in a live session recently, and the questions from the room told me this needs a proper written explanation.

Let’s go from scratch.

What Are We Actually Trying to Do?

Data engineering is fundamentally simple at its core:

You have data sitting somewhere (a source)
You want to process it — clean it, transform it, aggregate it
You load the result somewhere (a target)

In AWS, your source and target are often S3, which you can think of as a Google Drive for your data files. The interesting question is what sits in the middle: the processing layer.

For small data — say 1 or 2 GB — Python on a single machine is fine. You read the file, do your aggregations, write the output. Done.

But what happens when your data is in the terabytes? Spread across multiple files? Growing every day?

A single machine can’t handle it. You need multiple machines working together, splitting the data, processing in parallel, and combining results into one output. That’s where Apache Spark comes in — it’s the framework built exactly for this. And PySpark is just Python code that uses Spark under the hood.
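To make the split, process, and combine idea concrete, here is a toy sketch in plain Python. It is not Spark and it runs on one machine with threads. The function names are made up for illustration, and the three "workers" simply stand in for executors.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one 'executor': aggregate only its own slice of the data."""
    return sum(chunk)


def parallel_total(values, n_workers=3):
    """Split values into n_workers chunks, aggregate each in parallel, then combine."""
    # Round-robin split: worker i gets items i, i + n_workers, i + 2 * n_workers, ...
    chunks = [values[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Combining the partial results plays the role of the final output step.
    return sum(partials)


total = parallel_total(list(range(1, 101)))
```

Spark applies the same shape of work across many machines and also handles the parts this toy ignores, such as moving data between machines and retrying failed tasks.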

Now, the question becomes: where do you run Spark on AWS?

You have three main options.

Option 1: AWS Glue (Spark)

Glue is a fully managed ETL service. When you use it with Spark, here’s what you do:

Write your PySpark code
Choose a worker type (e.g., G.1X = 4 vCPUs, 16 GB RAM)
Choose a number of workers (e.g., 4 workers)
Run the job

AWS handles everything else. It provisions the machines, installs Spark, runs your code, and tears it all down. You don’t touch a single server.

One of those workers is always the driver — it coordinates the work. The rest are executors — they do the actual processing in parallel.

If you choose 4 workers with G.1X, one of them is the driver, so you’re getting 3 executors × 4 vCPUs = 12 vCPUs and 3 × 16 GB = 48 GB RAM total working on your data simultaneously.
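The arithmetic above can be written as a small helper. This is a minimal sketch: the `WorkerType` class and `executor_capacity` function are hypothetical names, and the numbers are raw worker totals as used in the text, not what Spark actually hands to each executor after overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerType:
    name: str
    vcpus: int
    memory_gb: int


# Worker size quoted in the text: 4 vCPUs and 16 GB per worker.
G_1X = WorkerType("G.1X", vcpus=4, memory_gb=16)


def executor_capacity(worker, num_workers):
    """Return the capacity left for processing once one worker acts as the driver."""
    if num_workers < 2:
        raise ValueError("need at least 2 workers: one driver plus one executor")
    executors = num_workers - 1  # one worker is reserved for the driver
    return {
        "executors": executors,
        "vcpus": executors * worker.vcpus,
        "memory_gb": executors * worker.memory_gb,
    }


capacity = executor_capacity(G_1X, num_workers=4)
```

With 4 workers this returns 3 executors, 12 vCPUs, and 48 GB, matching the figures above.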

You can also enable auto-scaling in Glue. Your worker count then acts as a ceiling: Glue adds and removes workers during the run based on what the job actually needs, rather than holding all 4 for the whole job.
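As a rough sketch, these choices map onto the settings you would pass to `boto3`'s Glue `create_job` call. The job name, role ARN, script path, and Glue version below are placeholders I made up for illustration, and the helper only builds the settings so it can be read without an AWS account.

```python
def glue_job_settings(name, role_arn, script_s3_path, max_workers=4, auto_scaling=True):
    """Build keyword arguments for boto3's glue.create_job (nothing is created here)."""
    default_args = {}
    if auto_scaling:
        # Job parameter that turns on Glue auto scaling.
        default_args["--enable-auto-scaling"] = "true"
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # assumption: pick the version your account uses
        "WorkerType": "G.1X",
        "NumberOfWorkers": max_workers,  # acts as the maximum when auto scaling is on
        "DefaultArguments": default_args,
    }


settings = glue_job_settings(
    name="orders-daily",  # hypothetical job name
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # placeholder role
    script_s3_path="s3://example-bucket/scripts/orders_job.py",  # placeholder path
)
# To create the job for real: boto3.client("glue").create_job(**settings)
```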

The upside: Simple, fast to get started, fully abstracted. You’re not managing infrastructure at all.

The downside: It’s the most expensive option. You’re paying a premium for that simplicity.

Option 2: EMR on EC2

EMR stands for Elastic MapReduce. This is where things get more powerful — and more complex.

With EMR on EC2, you are creating an actual cluster of Linux machines (EC2 instances) on AWS. You choose:

The type of EC2 instance (how many cores, how much RAM)
Primary nodes, core nodes, task nodes
The number of each

You have far more flexibility than Glue. You can tune every aspect of your infrastructure to match your exact workload.

But here’s the catch: when you turn that cluster on, it stays on. Whether your job is running or not, those EC2 machines are running — and you are paying for them. You’re essentially renting Linux machines by the hour.

The analogy I use: EMR on EC2 is like renting a car for 3 days. You pay for the car whether you drive it or not.

To avoid unnecessary costs, companies often build automation around it — scripts that spin up the cluster when a job needs to run, and tear it down when it’s done. That adds maintenance overhead.

Who uses EMR on EC2? Teams that already have deep big data expertise: people who’ve been running on-premise Hadoop or Spark clusters for years and know exactly what they need. When they move to AWS, they replicate that setup in the cloud. It can work out 40–70% cheaper than Glue, but it is only worth it if you have the expertise to manage it.

Option 3: EMR Serverless

EMR Serverless sits between Glue and EMR on EC2. It gives you the cost efficiency of EMR without the cluster management overhead.

Here’s the key difference: instead of choosing workers, you configure executors and drivers directly, and set a maximum CPU and memory limit.

Pre-initialized capacity:
  Driver:    1 × (2 vCPU, 4 GB)
  Executors: 3 × (4 vCPU, 16 GB)

Maximum capacity:
  CPU:    40 vCPU
  Memory: 160 GB

When your job runs, EMR Serverless automatically scales up to whatever resources it needs — up to your defined maximum. Behind the scenes, AWS is spinning up machines and running executors on them. But you never see those machines, you never configure them, and you don’t pay for them when nothing is running.
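To see what the maximum capacity means in practice, here is a back-of-the-envelope sketch using the sizes from the configuration above. The function name is hypothetical, and it assumes a single driver and ignores any per-worker overhead the service may add.

```python
def max_executors(max_vcpu, max_memory_gb, driver=(2, 4), executor=(4, 16)):
    """Estimate how many executors fit under the capacity limit alongside one driver.

    driver and executor are (vCPU, memory in GB) pairs.
    """
    cpu_left = max_vcpu - driver[0]
    mem_left = max_memory_gb - driver[1]
    if cpu_left < 0 or mem_left < 0:
        return 0  # the limit cannot even fit the driver
    # Whichever resource runs out first decides the count.
    return min(cpu_left // executor[0], mem_left // executor[1])


limit = max_executors(max_vcpu=40, max_memory_gb=160)
```

With a 40 vCPU and 160 GB ceiling, the driver takes 2 vCPU and 4 GB, which leaves room for 9 executors of this size. The job can scale from the 3 pre-initialized executors up to roughly that many.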

The Uber analogy: Glue and EMR Serverless are like calling an Uber. AWS brings the car from its fleet, you pay for the ride, and when you’re done it’s gone. EMR on EC2 is like renting a car — you have it parked in your driveway all day whether you use it or not.
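The rental-versus-ride tradeoff comes down to simple arithmetic. The hourly rates below are invented purely for illustration and are not AWS prices. The point is the shape of the comparison, not the numbers.

```python
def always_on_cost(hourly_rate, hours_provisioned):
    """Rental-car model: you pay for every hour the cluster exists."""
    return hourly_rate * hours_provisioned


def pay_per_use_cost(hourly_rate, job_hours):
    """Uber model: you pay only for the hours a job is actually running."""
    return hourly_rate * job_hours


HOURS_PER_DAY = 24
CLUSTER_RATE = 2.0   # hypothetical cost per hour for an always-on cluster
PER_USE_RATE = 3.0   # hypothetical (higher) cost per hour for pay-per-use compute

cluster = always_on_cost(CLUSTER_RATE, HOURS_PER_DAY)
on_demand = pay_per_use_cost(PER_USE_RATE, job_hours=2)

# Daily job hours at which the two models cost the same.
break_even_hours = cluster / PER_USE_RATE
```

With these made-up rates, a cluster left on all day costs 48.0 while two hours of pay-per-use costs 6.0. The always-on cluster only wins once jobs run more than 16 hours a day, which is why teams on EMR on EC2 either keep clusters busy or automate shutting them down.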

Who uses EMR Serverless? Companies that started with Glue, found it getting expensive as their data grew, and wanted a more cost-efficient path without the full complexity of managing EC2 clusters.

How Companies Actually Evolve

This is the pattern I see in the real world:

Start with Glue — Fast to set up, easy to iterate, perfect when you’re building out your first pipelines
Move to EMR Serverless — As data volume grows and the Glue bill becomes noticeable
Move to EMR on EC2 — When the team has deep Spark expertise and wants maximum control and cost efficiency

Most companies hiring AWS data engineers today are at step 1 or 2. Knowing all three, and being able to reason about the tradeoffs, is what separates a strong candidate from an average one.

One More Thing: Glue Bookmark

While we’re here: one of the most common Glue interview questions is about job bookmarks, often just called Glue Bookmark.

When you’re processing files that land in S3 daily, you don’t want to reprocess yesterday’s files. A job bookmark tracks which files have already been processed, so when your job runs tomorrow, it only picks up the new ones. It’s Glue’s built-in mechanism for incremental data processing. Simple concept, but knowing it exists and how to enable it puts you ahead in interviews.
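Here is a toy model of the idea in plain Python. It is not how Glue stores its state internally, and the function and file names are hypothetical. In a real Glue job you turn the feature on with the job parameter `--job-bookmark-option` set to `job-bookmark-enable`, and Glue keeps the state for you.

```python
def run_incremental(available_keys, bookmark):
    """Process only keys not seen before and return (processed, updated bookmark)."""
    new_keys = sorted(k for k in available_keys if k not in bookmark)
    # ... a real job would read and transform the files behind new_keys here ...
    return new_keys, bookmark | set(new_keys)


bookmark = set()  # nothing processed yet

# Day 1: one file has landed.
day1, bookmark = run_incremental(["orders/2026-05-01.csv"], bookmark)

# Day 2: yesterday's file is still in the bucket, plus one new file.
day2, bookmark = run_incremental(
    ["orders/2026-05-01.csv", "orders/2026-05-02.csv"], bookmark
)
```

On day 2 only `orders/2026-05-02.csv` is processed, even though both files are visible in the bucket.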

What I Heard From Two Students This Week

After the hands-on session where we ran both Glue and EMR Serverless jobs live, two students from the community — Shilpa and Hema — shared something that stayed with me.

Shilpa said:

“Whoever is new to these technologies — the skill booster courses are enough to get you started. It’s not worth spending your first month just learning Python. And the hands-on labs are addictive — make sure you have enough time when you start, because you won’t want to stop midway.”

Hema added:

“What Sachin explained today about workers and executors — I could relate to it immediately because I’d just covered it in the performance tuning section two days ago. It all connected.”

That’s the thing about learning this way — the theory lands differently once you’ve actually run the job yourself and seen the job fail because you were in the wrong AWS region, or because your account wasn’t upgraded. That friction is the learning.

Want to See This Live?

If you want to go from zero to running real AWS data engineering pipelines — with Glue, EMR Serverless, PySpark, and production-grade project work — I run a live webinar where I walk through the RADE program (Real-world AWS Data Engineering) end to end.

In our Live Classes, it is NOT slides-only theory. We go into the console. We run jobs. We break things and fix them.

Register for the next free webinar: https://aws.sachin.cloud

If you’re an IT professional working in legacy tools like Informatica, SSIS, or DataStage and you know it’s time to modernize — this is where to start.

Questions or thoughts on Glue vs EMR? Drop them in the comments. I read every one.

From 3 Years Experience to Multiple Offers — What Actually Made the Difference

Sachin Chandrashekhar — Sat, 02 May 2026 01:10:23 GMT

I had a session with one of my students, Bhumika, who 10Xed her salary.

She shared her journey — openly — with the entire community.

Not just the outcome.

But the process behind it.

The Outcome (Context, Not the Story)

She had:

~3–3.5 years of experience
A data engineering background
Exposure to AWS and pipelines

And in her recent job search:

She got 5+ product-based company calls
Cleared multiple interviews
Converted 3 offers

That’s the visible part.

But that’s not the interesting part.

What Most People Don’t See

When people hear this, they assume:

“She must be very talented”
“She must have been naturally good”
“She got lucky”

But when you listen carefully, a very different picture emerges.

1. This Was Planned — Not Random

She didn’t start applying casually.

She worked backwards.

Set a 4–6 month timeline
Built a structured plan
Aligned her daily schedule

2–4 hours of focused effort. Every single day.

No bursts. No shortcuts.

Just consistency.

2. Preparation Was Layered

She didn’t treat preparation as:

“Learn everything once”

She broke it down into layers:

Coding (Python, SQL)
Data engineering concepts
AWS architecture
Project storytelling

And importantly:

She practiced each of these separately

3. She Didn’t Wait to Feel Ready

One thing she said stood out:

“We never feel that we are fully prepared.”

And because of that:

She didn’t wait
She started giving interviews early

Even for companies she didn’t intend to join.

4. Interviews Became the Learning Loop

This is where most people get it wrong.

They treat interviews as:

A final test

She treated them as:

A feedback system

Gave interviews
Noted every question she couldn’t answer
Filled those gaps
Repeated the cycle

“Each interview was a stepping stone.”

5. Storytelling Was a Differentiator

She didn’t just prepare answers.

She prepared:

How to explain her work

Projects
Use cases
Trade-offs
Real-world decisions

Because at a certain level:

Clarity of explanation matters as much as knowledge

6. She Optimized for the Market

A few practical things she did:

Tailored her resume to each job description
Ensured strong keyword match (ATS)
Used multiple channels:
- Job portals
- Direct applications
- Referrals
Stayed active on LinkedIn

None of this is “secret.”

But very few people do all of it consistently.

7. She Understood Timing

She mentioned something important:

Good companies take time
Hiring processes stretch over weeks
Calls don’t come immediately

So she:

Started early and stayed patient

The Real Takeaway

This was not about:

One course
One resource
One lucky interview

This was about:

Structured effort, over time

That’s what compounds.

The Part Most People Skip

Everyone is willing to:

Watch videos
Take notes
“Learn”

Very few are willing to:

Practice consistently
Face interviews early
Iterate based on feedback

That’s the difference.

The Long Game Perspective

There is nothing extreme here.

No hacks.

No shortcuts.

Just:

Clarity
Consistency
Execution

Repeated long enough.

If You’re in This Phase

If you have:

2–4 years of experience
Some exposure to data engineering
And are trying to move into stronger roles

Then your focus should not be:

“What else should I learn?”

It should be:

“How do I execute better?”

If You Want to Understand This in a Structured Way

I break this down in detail in a live masterclass.

What actually matters in interviews
How to structure your preparation
How to avoid common mistakes

You can register here:

https://aws.sachin.cloud

If you attend, you’ll also get access to my
Agentic AI–powered Data Engineering course (worth $100 / ₹8000)

Join only if you’re serious about building this the right way.

Schema Evolution in Databricks Delta: What No One Tells You About Bronze, Silver, and Gold

Sachin Chandrashekhar — Tue, 28 Apr 2026 16:58:11 GMT

A practical guide for data engineers who want resilient medallion pipelines — without the silent disasters

Here’s a scenario I see constantly with engineers transitioning into modern data engineering:

A source system drops a column. Or adds one. Maybe renames it. The ingestion job either crashes immediately or — worse — silently swallows the change. Two weeks later, your Gold dashboards are showing wrong numbers, your data science team is asking questions you can’t answer, and you’re reverse-engineering what broke and when.

This is not a Databricks problem. It’s a schema governance problem. And Delta Lake gives you the tools to solve it — if you understand where the boundaries actually are.

Let me break this down layer by layer.

The Core Mental Model: One Rule to Anchor Everything

Here’s the practical rule that should govern your entire medallion architecture:

Let Bronze absorb additive changes safely. Make Silver and Gold evolve intentionally.

That’s it. Everything else flows from this.

Schema evolution in Delta Lake is a resilience feature for ingestion — not a substitute for data contracts and downstream engineering discipline. The moment you treat mergeSchema as a reason to stop thinking about what changes mean, you’ve taken on invisible debt that will surface at the worst time.

Why Source Schemas Change (And Why You Can’t Ignore It)

New columns get added. Old columns disappear. Names get changed in ways that look harmless but break your joins. Types shift from string to integer because the upstream team “cleaned up” their model.

Delta Lake addresses this with two fundamental features:

Schema enforcement — rejects writes that don’t conform to the existing table schema
Schema evolution — allows the schema to update when you explicitly opt in

This combination means you’re not stuck choosing between “fail on every upstream change” and “let anything through.” You get to be deliberate about it — by layer.
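The interplay between the two features can be modelled in a few lines of plain Python. This is a simplified sketch that looks at column names only (no types) and is not Delta Lake's implementation. `plan_append` is a made-up name.

```python
def plan_append(table_columns, incoming_columns, merge_schema=False):
    """Return the table's columns after an append, or raise if the write is rejected.

    Columns missing from the incoming data are allowed (they land as NULL).
    Unexpected extra columns are rejected unless merge_schema is True.
    """
    extra = [c for c in incoming_columns if c not in table_columns]
    if extra and not merge_schema:
        # Schema enforcement: the write does not match the table, so reject it.
        raise ValueError(f"schema mismatch, unexpected columns: {extra}")
    # Schema evolution: new columns are appended to the end of the table schema.
    return list(table_columns) + extra


table = ["order_id", "customer_id", "status"]
incoming = ["order_id", "customer_id", "status", "order_region"]

evolved = plan_append(table, incoming, merge_schema=True)

try:
    plan_append(table, incoming)  # no opt-in, so enforcement applies
    rejected = False
except ValueError:
    rejected = True
```

The same incoming data is rejected by default and accepted once you opt in, which is the per-layer choice the rest of this post builds on.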

Medallion Layer Responsibilities

Think of each layer as having a different job, and therefore a different schema policy:

Bronze: Built to Absorb

Bronze should tolerate new columns automatically. This is where mergeSchema earns its keep.

PySpark — append with schema evolution:

bronze_df.write \
  .format("delta") \
  .mode("append") \
  .option("mergeSchema", "true") \
  .saveAsTable("main.bronze.orders")

SQL — merge with schema evolution:

MERGE WITH SCHEMA EVOLUTION INTO main.bronze.orders AS t
USING staging_orders AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

Both patterns let a new source column land without dropping and recreating the target table. Table history is preserved. You don’t need an emergency deployment just because an upstream team added order_region to their API response.

That’s the Bronze promise: keep the historical record intact, don’t fail on harmless additions.

Silver: The Governed Transformation Contract

Silver is where the thinking happens.

Bronze can absorb additive changes from upstream, but Silver is where column naming, typing, nullability expectations, and business semantics get curated. A new Bronze column doesn’t automatically deserve a spot in Silver. Someone has to decide:

Does this column belong in the cleaned data model?
Does it need transformation before it’s usable?
Are downstream consumers ready for it?

PySpark — explicit column projection:

silver_df = (
    spark.table("main.bronze.orders")
    .select(
        "order_id",
        "customer_id",
        "order_ts",
        "status",
        "new_source_column"  # added deliberately after review
    )
)

silver_df.write \
  .format("delta") \
  .mode("append") \
  .option("mergeSchema", "true") \
  .saveAsTable("main.silver.orders")

SQL — explicit contract merge:

MERGE WITH SCHEMA EVOLUTION INTO main.silver.orders AS t
USING (
  SELECT
    order_id,
    customer_id,
    order_ts,
    status,
    new_source_column
  FROM main.bronze.orders
) AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

Notice what both of these do: they name what’s going in. The contract is visible in the code. If someone reviews this PR, they can see that new_source_column was a deliberate choice, not an accident.
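One way to keep that review step from being forgotten is a small report of Bronze columns that have not yet been promoted. This is a sketch under assumptions: the contract list and function name are hypothetical, and in a Databricks notebook the Bronze column list would typically come from `spark.table("main.bronze.orders").columns`.

```python
# Hypothetical Silver contract: the columns Silver has agreed to carry.
SILVER_CONTRACT = ["order_id", "customer_id", "order_ts", "status"]


def columns_awaiting_review(bronze_columns, silver_contract):
    """List Bronze columns that are not part of the Silver contract yet."""
    return [c for c in bronze_columns if c not in silver_contract]


# Stand-in for the live Bronze schema after upstream added order_region.
bronze_columns = ["order_id", "customer_id", "order_ts", "status", "order_region"]

pending = columns_awaiting_review(bronze_columns, SILVER_CONTRACT)
```

Here `pending` contains `order_region`, which is a prompt for someone to decide whether it belongs in Silver, not a trigger to add it automatically.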

Gold: The Stable Product Layer

Gold should be the least surprising layer for anyone consuming your data.

New columns in Gold should only appear when there’s an actual reporting, analytics, ML, or product use case that requires them. If the new column matters for that use case — add it. If it doesn’t — leave Gold unchanged. Let Bronze and Silver hold it until there’s a reason to promote it.

This prevents source-system churn from leaking into dashboards and business-facing datasets. Your BI team should never open a report and find a column they didn’t know about.

The More Dangerous Problem: Dropped Columns

Everyone asks about adding columns. The dropped column scenario is where teams actually get hurt.

Delta schema evolution helps you absorb additive drift. A removed column is a different situation.

If a source stops sending a column, Bronze doesn’t necessarily need to be rebuilt. New rows can still land with NULL in that position. Old rows retain their previous values. The table can keep running.

The real danger is downstream logic. If Silver or Gold notebooks contain:

SELECT dropped_col ...
WHERE dropped_col = 'X' ...

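Those jobs will break. If they read a source that no longer carries the column, they fail outright. If they read from Bronze, where the column still exists but new rows hold NULL, they keep running and quietly return NULLs or empty filter results, which is often worse. Either way, the transformation code still references a column the source no longer supplies.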
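One way to catch this before a job runs is a small pre-flight check that compares the columns a transformation needs with the columns actually available. This is a minimal sketch in plain Python; the function names and the sample column lists are my own illustration, not part of any Delta or Spark API.

```python
def missing_columns(required, available):
    """Return the required column names that are absent, in the order given."""
    available_set = set(available)
    return [col for col in required if col not in available_set]


def assert_columns_present(required, available, table_name):
    """Raise a readable error before the transformation starts."""
    missing = missing_columns(required, available)
    if missing:
        raise ValueError(
            f"{table_name} is missing required columns: {', '.join(missing)}"
        )


# Hypothetical example: the incoming batch no longer carries dropped_col.
incoming_columns = ["order_id", "customer_id", "order_ts", "status"]
silver_requires = ["order_id", "status", "dropped_col"]

print(missing_columns(silver_requires, incoming_columns))  # ['dropped_col']
```

In a PySpark job, the `available` list could come from `df.columns` on the incoming batch. Note that this only catches a column that is physically absent; a column that still exists in Bronze but now arrives as NULL needs a data-quality check instead.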

Safe response to a dropped column:

Confirm it’s truly deprecated at the source — not a temporary outage or deployment issue
Update Bronze ingestion expectations if the column was treated as mandatory
Remove or replace references in Silver transformations, tests, and expectations
Remove or replace references in Gold aggregates, dashboards, and semantic models
Optionally clean up the metadata with ALTER TABLE ... DROP COLUMN once everything downstream is updated

ALTER TABLE main.silver.orders DROP COLUMN dropped_col;

Dropping a column requires Delta column mapping to be enabled on the table. With it, the operation is metadata-only: fast and non-disruptive, with no full table recreation needed.

The `SELECT *` Question

I get asked this constantly. The short answer: use it in Bronze, avoid it in Silver and Gold.

In curated layers, explicit column lists are the better practice. They make the data contract visible, reviewable, and testable. When you use SELECT * in Silver or Gold, new Bronze columns can flow downstream unexpectedly — which sounds convenient until it quietly breaks a BI semantic layer or an ML feature pipeline.

Explicit projections in PySpark work well here:

expected_cols = ["order_id", "customer_id", "order_ts", "status"]
silver_df = spark.table("main.bronze.orders").select(*expected_cols)

If Bronze evolves, Silver doesn’t unless you decide it should.

The Governance Model That Actually Works

Here’s a practical framework that balances flexibility and control:

And the operating model that makes this real:

Upstream teams announce additive and breaking changes before deployment — this is a lightweight data contract, not just a courtesy message
Data engineering validates in a lower environment before production rollout
Bronze accepts new columns safely when appropriate
Silver and Gold evolve only after a conscious design decision
Monitoring and tests detect unexpected schema drift early
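The last point, detecting drift early, can start as something very small. Here is a rough sketch, assuming you can get each schema as a plain mapping of column name to type name; the function names and sample schemas are hypothetical.

```python
def diff_schemas(expected, actual):
    """Compare two {column_name: type_name} mappings."""
    added = sorted(set(actual) - set(expected))
    removed = sorted(set(expected) - set(actual))
    retyped = sorted(
        col for col in set(expected) & set(actual)
        if expected[col] != actual[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


def is_breaking(diff):
    """Additive columns are tolerable; removals and type changes need a human."""
    return bool(diff["removed"] or diff["retyped"])


# Hypothetical schemas: what the contract says vs. what arrived today.
expected = {"order_id": "bigint", "status": "string", "amount": "int"}
actual = {"order_id": "bigint", "amount": "string", "new_source_column": "string"}

drift = diff_schemas(expected, actual)
print(drift)
print("breaking:", is_breaking(drift))
```

With PySpark, `dict(df.dtypes)` gives a mapping in this shape. The split between additive and breaking drift mirrors the layering above: Bronze can tolerate the first kind, while the second should stop the pipeline and start a conversation.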

Two Misconceptions Worth Clearing Up

Misconception 1: “Delta schema evolution means no engineering work after source changes.”

It means fewer ingestion failures. It does not mean curated transformations and downstream contracts maintain themselves. Those still require human decisions.

Misconception 2: “You have to drop and recreate tables after schema changes.”

Almost never. Delta tables can evolve in place, preserving history and avoiding disruptive rebuilds. Even when a column disappears from the source, table recreation is usually unnecessary — what matters is removing invalid downstream references and deciding later whether to drop the column from metadata.

The Final Position

Delta Lake schema evolution is best understood as a controlled flexibility mechanism.

It’s excellent for keeping Bronze ingestion robust when source systems add columns. It simplifies the physical handling of approved schema changes in Silver and Gold. But stable medallion pipelines still depend on clear ownership, explicit downstream contracts, and coordination between source teams and data engineers.

The tools work. The question is always whether the process around them is designed with enough intentionality.

Get that right, and schema changes stop being emergencies. They become routine — handled calmly, layer by layer, with history intact and consumers informed.

If you’re building on Databricks and want to go deeper on medallion architecture design, Unity Catalog governance, or PySpark transformation patterns — this is exactly the kind of topic we teach in RADE (Real-world AWS Data Engineering). Check out https://dataengineeringhub.in and https://sachinchandrashekhar.com for more.

Was this useful? Share it with a data engineer who’s dealt with a schema surprise at 2am. They’ll appreciate it.

40% of a Data Engineering Team Got Cut. Here's What They Didn't Know.

Sachin Chandrashekhar — Mon, 27 Apr 2026 00:17:24 GMT

A manager I know told me something that stopped me mid-conversation.

His friend — a manager at Wells Fargo in Bangalore — watched 40% of his data engineering team get let go.

Not because they were bad engineers. Not because the business was struggling. But because after introducing Amazon Q and an AI co-pilot, the work that once needed 10 people could now be done by 6. The math was simple. The human cost wasn’t.

I’ve seen a version of this at my own company too — our projects now go through peer code reviews, and where that process used to take significant back-and-forth, I’m able to implement the suggested changes a lot faster using Amazon Q. What used to be hours of rework is now handled in a fraction of the time.

This isn’t a story about fear. It’s a story about what skills protect you — and what skills leave you exposed.

And it starts with understanding the ground beneath your feet.

What Data Engineering Actually Is (Before We Talk About the Fancy Stuff)

Strip away the buzzwords and data engineering is just this:

You have data sitting somewhere. You move it, clean it, transform it, and land it somewhere useful. That’s it.

The “somewhere” on both ends? That’s where it gets interesting — and where most people get lost.

Your source data could be in a transactional database (Oracle, MySQL), in files dropped by an external vendor, or already sitting in a data lake your upstream team owns. The destination could be a data warehouse, another data lake, or something newer that I’ll get to in a moment.

Your job as a data engineer is to build reliable pipelines between those two points. In bulk. At scale. Without losing data. Without it breaking at 2am.

That’s the job. The tools change. The fundamentals don’t.

The Three Eras — And Why They All Still Exist

Era 1: The Data Warehouse

Before data lakes were a thing, companies loaded everything into data warehouses. Teradata. Netezza. Oracle. These were powerful machines, purpose-built for analytical queries.

They ran on something called MPP — Massively Parallel Processing — where data was distributed across nodes and queried in parallel. Fast. Reliable. Battle-tested.

Also: expensive. We’re talking $30–40 million a year for large enterprises. And rigid. Every table had to be modeled upfront. ETL developers would spend weeks just designing the schema before a single row of data moved.

You had to know what your data looked like before you stored it. That was the deal.

Era 2: The Data Lake

Then came the data lake.

The idea was elegant: stop modeling everything upfront. Just dump your data — structured, semi-structured, raw files, all of it — into cheap storage. On AWS, that’s S3. Pennies per gigabyte. No schema required. You figure out what you need later.

This was the birth of ELT (Extract, Load, Transform) — load first, transform second. You land the file, throw an AWS Glue Crawler at it to infer the schema, register it in the Glue Catalog, and suddenly you’ve got a queryable table via Athena. No server. No pre-built schema. Run SQL straight on top of your S3 files.

You can even build dashboards on this — Amazon QuickSight can connect directly to Athena. Great for quick POCs. Great for proving the value of new datasets before investing in heavy infrastructure.

But here’s the problem nobody told you about:

You can’t update a file.

A CSV sitting in S3 is a file. If record ID 3 changes its address from “Mumbai” to “Bangalore,” you can’t just UPDATE that file. There’s no insert/update/delete in a data lake. No ACID properties. No transactions.
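To make that concrete, here is a small sketch of what "updating" one record in a CSV really involves: every row is read and written out again as a new file. The sample data and the function name are made up for illustration, and an in-memory string stands in for the S3 object.

```python
import csv
import io


def update_csv_record(csv_text, key_column, key_value, changes):
    """Return a new CSV string with one record changed.

    Every row is read and written again, even though only one differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    rows_rewritten = 0
    for row in reader:
        if row[key_column] == key_value:
            row.update(changes)
        writer.writerow(row)
        rows_rewritten += 1
    return out.getvalue(), rows_rewritten


original = "id,city\n1,Delhi\n2,Chennai\n3,Mumbai\n"
updated, rewritten = update_csv_record(original, "id", "3", {"city": "Bangalore"})

print(updated)
print("rows rewritten:", rewritten)  # rows rewritten: 3
```

One changed address cost a full rewrite of the file. With three rows that is nothing. With a multi-gigabyte daily extract, and with no transaction protecting readers while the replacement is being written, it becomes a real problem.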

And then there’s schema drift. Your source team sends you a file every day with 20 columns. One morning you wake up and it has 25 columns — or a data type changed from integer to string — and your pipeline breaks, silently or loudly. There’s no built-in protection for that.

Data lakes were cheap and flexible. They were also fragile.

Era 3: The Data Lakehouse

This is where we are now.

The data lakehouse takes the cheap, scalable storage of a data lake and adds the transactional capabilities of a data warehouse — on top of the same S3 layer you already have.

How? Through open table formats. The three most important ones:

Delta Lake (popularized by Databricks)
Apache Iceberg (increasingly the industry default, especially on AWS)
Apache Hudi (strong at CDC and streaming use cases)

These aren’t just file formats. They’re metadata layers that sit on top of your S3 files and give you things you never had in a plain data lake:

ACID transactions. You can now insert, update, and delete individual records. That record ID 3 with the changed address? Just run MERGE INTO and it’s handled.

Schema evolution. When the source team adds columns, the table doesn’t break. The format adjusts. You can even configure it to reject schema changes until you’ve explicitly validated them — which is the right call for production pipelines.

Time travel. Every write to a lakehouse table creates a new version. Want to know what your table looked like 3 days ago? Query it. Want to roll back to a previous state because bad data got loaded? Do it. This is version control for your data, built in by default.
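If it helps to see the idea in miniature, here is a toy in-memory model of merge plus time travel. It is only a sketch of the concept: real table formats track file-level changes in a transaction log rather than copying whole snapshots, and the class and method names here are invented for illustration.

```python
class ToyVersionedTable:
    """In-memory stand-in for a table format: every write commits a new snapshot."""

    def __init__(self, key):
        self.key = key
        self._versions = [{}]  # version 0 is the empty table

    def merge(self, rows):
        """Upsert rows by key and commit the result as a new version."""
        snapshot = dict(self._versions[-1])  # copy, so earlier versions stay untouched
        for row in rows:
            snapshot[row[self.key]] = dict(row)
        self._versions.append(snapshot)
        return len(self._versions) - 1

    def read(self, version=None):
        """Read the latest version, or an earlier one (time travel)."""
        if version is None:
            version = len(self._versions) - 1
        return sorted(self._versions[version].values(), key=lambda r: r[self.key])

    def restore(self, version):
        """Roll back by committing an old snapshot as the newest version."""
        self._versions.append(dict(self._versions[version]))
        return len(self._versions) - 1


table = ToyVersionedTable(key="id")
v1 = table.merge([{"id": 3, "city": "Mumbai"}])
v2 = table.merge([{"id": 3, "city": "Bangalore"}, {"id": 4, "city": "Pune"}])

print(table.read())            # latest: record 3 now says Bangalore
print(table.read(version=v1))  # time travel: record 3 still says Mumbai
```

Notice that the rollback is itself a new version. Nothing is erased, which is what makes the history auditable.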

The result: you keep the economics of a data lake (S3 storage, serverless compute) and gain the reliability of a data warehouse. That’s why lakehouses have become the default architecture at companies serious about their data platform.

The AWS Toolkit, Honestly Explained

If you’re working on AWS, here’s how the pieces connect — without the marketing fluff.

AWS Glue is your primary ETL service. It’s serverless, meaning you don’t manage servers. You write PySpark code, point it at your data, and Glue provisions the cluster, runs the job, and tears it down. You pay only for what you use. For most companies starting out, Glue is the right choice.

The tradeoff: it’s a bit more expensive per compute hour than managing your own cluster. That’s the price of convenience.

Amazon EMR (Elastic MapReduce) is what companies graduate to when Glue starts getting expensive. You configure your own Spark cluster — choose the instance types, the number of nodes, the memory. More control, more complexity, lower cost at scale. EMR also has a serverless flavor if you want a middle ground.

Both Glue and EMR run Apache Spark underneath, and you program both with PySpark. Same language. Different operating model.

Amazon Athena is your serverless query engine. It reads directly from S3, uses the Glue Catalog for table metadata, and lets you run SQL on raw files without moving data anywhere. Perfect for exploration, quick POCs, and lightweight transformations before you need something more robust.

Databricks sits across all of this as an alternative path — an end-to-end platform that bundles compute, storage, Delta Lake, ML tooling, and orchestration into one console. If AWS feels like assembling a puzzle from individual pieces, Databricks hands you a mostly-assembled one. The tradeoff is vendor lock-in and cost. Both paths lead to PySpark.

And that’s the point: PySpark is the common thread. Whether you’re on Glue, EMR, or Databricks, PySpark is what you’ll use to process data at scale. Get that foundation right and the rest becomes a matter of configuration.

The Question Nobody Is Asking

Here’s what I see every week in my community:

Smart, experienced IT professionals — people who’ve spent 8, 10, 15 years on Informatica, SSIS, Teradata — who know their current tools deeply but feel increasingly like those tools are aging out beneath them.

They’re right to feel that.

But the answer isn’t panic. It’s not “learn everything in 3 months.” The answer is a structured, prioritized skill stack — built on fundamentals that don’t change even when the tools do.

The data engineering lifecycle is: ingest → store → process → serve → monitor. That hasn’t changed since the mainframe days. What’s changed is the technology at each layer, and how quickly AI tools can help you work within those layers.

The engineers who got cut at Wells Fargo weren’t doing the wrong things. They were doing things that AI could replicate faster. The engineers who didn’t get cut were the ones who understood why the systems were designed the way they were: people who could prompt an AI tool intelligently, review its output critically, and architect solutions that neither humans nor machines could produce alone.

That’s the skill you’re building toward. Not memorizing API documentation. Not blindly running tutorials. Understanding the landscape well enough to make real decisions.

The lake is no longer enough. The warehouse is too expensive to be the whole story. The lakehouse is where production data engineering lives in 2026.

Now you know why.

If you’re an IT professional looking to transition into cloud data engineering — AWS, PySpark, lakehouse architecture, real production projects — I run a program called RADE (Real-world AWS Data Engineering).

You can register for my free MasterClass, where I talk about the roadmap to high-paying jobs, here:

https://aws.sachin.cloud

If you attend, you’ll also get access to my Agentic AI–powered Data Engineering course (worth $100 / ₹8000), where I show how to use AI tools to accelerate real engineering work.

Join only if you’re serious about getting into high-paying AWS roles.

My Student Told Me He Might Get Laid Off. This Is What I Told Him.

Sachin Chandrashekhar — Thu, 16 Apr 2026 23:35:31 GMT

My student reached out to me recently for a 1–1 call.

He wasn’t a beginner.

In fact, he had done what many people struggle to do.

He had:

Moved from a non-tech background into analytics
Built dashboards and pipelines
Even worked on a data warehouse and AWS systems

On paper, he was doing well.

But there was a problem.

He said:

“I might get laid off soon. I want to switch to data engineering… but I’m not sure how to position myself.”

This is where most people get stuck.

Not because they lack skills.

But because they lack clarity.

The Real Problem Was Not Skills

As we spoke, one thing became clear.

He wasn’t confused about technology.

He was confused about identity.

His resume reflected:

Data Analyst
Analytics Engineer
Some Data Engineering work

Everything mixed together.

And that is exactly what hurts you in the market.

Because hiring managers don’t hire “mixed profiles.”

They hire clarity.

The Market Does Not Reward “Jack of All Trades”

At one point, I told him something very simple:

“Your resume should reflect the role you want — not just the work you’ve done.”

This is where most professionals go wrong.

They try to be honest.

They try to show everything.

They try to say:

“I’ve done analytics…”
“I’ve also worked on pipelines…”
“I’ve also touched AWS…”

But what the hiring manager sees is:

“This person is not a specialist.”

And in today’s market, that is a problem.

Because teams are not looking for generalists.

They are looking for:

Someone who can own a problem
Someone who has gone deep
Someone who can contribute from day one

Depth Changes Everything

During the conversation, the student himself said something important.

When he worked with data engineers, he noticed:

Their systems were scalable
Their code was structured
Everything was plug-and-play

That was his turning point.

He realized:

Analytics often solves immediate problems.
Engineering solves problems that scale.

And that is the difference.

The Shift You Need to Make

If you are trying to move into data engineering, understand this:

It is not just about learning PySpark.

It is not just about doing a course.

It is about realignment.

You need to align:

Your resume
Your LinkedIn
Your projects
Your narrative

All towards one thing:

“I am a Data Engineer.”

Not “trying to become one.”

Not “part-time.”

Not “50-50.”

Clear.

Focused.

Intentional.

A Practical Strategy I Gave Him

I didn’t give him a motivational speech.

I gave him a strategy:

1. Refactor your resume completely
   - Remove confusion
   - Focus only on data engineering
2. A/B test your positioning
   - Apply for 1–2 weeks
   - If calls are low → double down on DE positioning
3. Focus on depth, not breadth
   - PySpark
   - Real-world issues (memory, performance)
   - Systems thinking
4. Ignore unnecessary distractions
   - Not every company needs DSA
   - Not every tool matters

Focus on what moves the needle.

The Hard Truth

Most people delay this decision.

They keep one foot in analytics.

One foot in engineering.

And they stay stuck for years.

Because they never commit.

The Long Game

Careers are not built on reacting to layoffs.

They are built on:

deliberate decisions
consistent depth
and clear positioning

If you do that:

The market starts recognizing you differently.

If You’re Serious About This

I write about these real scenarios — not theory.

Not generic advice.

But what actually happens when professionals try to transition into data engineering.

If this is something you’re working through:

You can read more here →

https://dataengineeringhub.substack.com

If You’re Trying to Transition to Higher-Paying AWS Roles

If you’re currently:

Moving from analytics to data engineering
Sitting on partial exposure but lacking depth
Or trying to reposition yourself in the market

Then the next step is not more random tutorials.

It’s understanding:

What actually matters in real-world data engineering
What skills move the needle in interviews
And how to build depth — not just surface-level knowledge

I break this down in a live masterclass.

You can register here:

https://aws.sachin.cloud

If you attend, you’ll also get access to my
Agentic AI–powered Data Engineering course (worth $100 / ₹8000) —
where I show how to use AI tools to accelerate real engineering work.

Join only if you’re serious about getting into high-paying AWS roles.

The Market Isn’t Bad. It’s Selective. (Insights From a Community Discussion This Saturday)

Sachin Chandrashekhar — Sat, 21 Feb 2026 18:11:09 GMT

India vs US.
Interview patterns.
What companies are actually testing.
Why some candidates struggle.
And why others move ahead quietly.

Here’s what became very clear.

The market is not dead.

It is selective.

And selectivity exposes shallow preparation.

What Interviews Actually Look Like Now

Across geographies, members shared a similar pattern.

Companies are no longer impressed by:

“I completed 5 courses.”
“I know Spark, Kafka, Airflow, Snowflake.”
“Here are 200 LeetCode solutions.”

Instead, interviews are evolving into conversations like:

Why did you choose this architecture?
What trade-offs did you consider?
How would you reduce cost in this pipeline?
What happens if this job fails at 2 AM?
How would you design this for scale?

This is engineering thinking.

Not tutorial repetition.

India vs US: Structural Differences

During our discussion, some differences became obvious.

🇮🇳 India Market

Often more tool-specific questioning.
More implementation detail focus.
Sometimes more interview rounds.
Higher competition at mid-level roles.

🇺🇸 US Market

Strong emphasis on system design.
Ownership mindset.
Trade-off discussions.
Architectural clarity.

But here’s the common denominator:

Both markets reward depth.

Neither rewards superficial exposure.

Why Many Candidates Struggle

One thing that stood out in the discussion:

Many candidates prepare reactively.

They:

Watch content.
Memorize answers.
Practice common questions.
Hope the interviewer sticks to script.

But modern interviews are adaptive.

If you cannot reason through ambiguity,
you struggle.

If you cannot explain decisions,
you stall.

If you cannot connect business needs to technical design,
you plateau.

The market isn’t punishing you.

It’s measuring you.

What Actually Wins (As Observed in Real Conversations)

From both experience and community insights, five patterns stand out:

1️⃣ Deep understanding of fundamentals
2️⃣ Real-world project exposure
3️⃣ Ability to articulate trade-offs
4️⃣ Structured practice
5️⃣ Consistency over bursts

You don’t need 20 tools.

You need mastery of fewer tools at greater depth.

The Calm Advantage

The engineers who are moving ahead right now are not panicking.

They are:

Studying deliberately.
Practicing system design.
Building projects with cost and scale in mind.
Discussing architectures with peers.
Thinking long term.

They are playing the long game.

Final Thought

If you feel overwhelmed by the market,
don’t ask:

“Is the market bad?”

Ask:

“Am I prepared at the level the market expects?”

That question changes everything.

Because in 2026,
the winners won’t be the fastest learners.

They’ll be the deepest.

Why Working Harder Stopped Working for Me

Sachin Chandrashekhar — Sun, 08 Feb 2026 03:12:56 GMT

Sometimes, we get stuck with the past—especially when that past brought us success.

In the summer of 2003, when I was in 12th grade at a private class called Expert Coaching Classes in Mangalore, my maths teacher told us something that stayed with me:

“You shouldn’t be wasting a single second of your time this year.”

I took that advice extremely seriously.

So seriously that I remember turning down a movie plan with my sister and cousin.
Kal Ho Na Ho was huge back then—but I said no. I chose to study instead.

I sacrificed a lot that year.

And when the results came out, I scored 96 in Chemistry, 98 in Maths, and 100 in Physics.

For more than two decades after that, I was stuck with this formula for success:

Never waste time.
Use every possible hour.

It worked—because I was 17.

But I carried the same mindset into adulthood.
Into my career.
Into relationships.
Into health.
Into everything.

And that’s where the problems began.

I struggled every single year to bring balance into my life.

Career.
Physical health.
Mental health.
Relationships.

I would fixate on one thing at a time, obsess over it, and neglect everything else—exactly how I did in 12th grade.

Things got even harder after I started teaching AWS in 2024.

Suddenly, I wasn’t just balancing a career.
I was also building a coaching business.

Something had to change.

So in 2025, I started searching seriously for answers.

I read a lot.
And a few books fundamentally changed how I think about productivity:

Buy Back Your Time
Deep Work
Atomic Habits
Digital Minimalism
Indistractable

Here’s what finally clicked for me:

Productivity is not about cramming more hours.
It’s about creating a schedule—and respecting it.

It’s about intentionally allocating your waking hours so that every important area of your life gets attention.

And this isn’t a one-time fix.

Old habits don’t die easily.
This kind of optimization takes months—sometimes longer.

I’m still working on it as I write this.

One powerful lesson I learned from Dan Martell was this:

Learn to say NO.

But here’s the catch:

You can only confidently say NO when you are crystal clear about what you’ll do with the time you save.

In the end, everything boils down to clarity.

Clarity in thought.
Clarity in action.

Most of us don’t lack solutions.
We’re overwhelmed by too many of them—and that’s what makes prioritization so hard.

So how do you fix that?

By being persistent.
By being relentless about your goals.

A ship without a captain will drift in the ocean.

When I look back, I can clearly see how often I drifted—until recently.

One last thing.

For many years, I ignored one of my mentor’s simplest pieces of advice:

Read books.

If there’s one thing I’d strongly recommend—it’s this:

Read.

Whatever area of life you want to improve, books are a treasure trove of wisdom.

Go to Perplexity.ai.
Ask for the best books on that topic.
Pick one.
Start reading.

You won’t regret it. I promise.

When 12 Years of Experience Suddenly Isn’t Enough Anymore

Sachin Chandrashekhar — Tue, 03 Feb 2026 01:45:31 GMT

12+ years of experience.
IBM → Teradata → Big Data → Azure → Databricks → Snowflake.
Worked in India. Then moved to the US on H1B.
Multiple enterprise clients. Big brands. Real production work.

On paper, this is a strong profile.

And yet, he was rejected in the interview.

The feedback was blunt:

“Your projects all look the same.”

That sentence stayed with me long after the call ended.

The uncomfortable truth about senior data engineering careers

This wasn’t a skill issue.

He knows Spark.
He knows Databricks.
He knows Azure deeply.
He understands data pipelines end to end.

The real issue was something more subtle — and far more common than people realize.

Repetition.

Every project on his resume followed the same pattern:

Ingest data into a raw layer
Transform with Spark
Publish to curated / gold
Different company, different timeline… same architecture

This was cutting-edge in 2018.
In 2026, it’s expected.

What once made you valuable can quietly become the reason you’re filtered out.

Why “more experience” isn’t the answer anymore

A lot of senior engineers respond to this situation by thinking:

“I just need one more big project.”
“I need deeper Spark knowledge.”
“I need another certification.”

But that’s not the real gap.

The gap is how your experience is framed — and how the market now evaluates senior talent.

At 10–15 years of experience, companies aren’t just hiring:

builders
implementers
task executors

They’re hiring people who can:

make architectural decisions
justify trade-offs
think in cost, scale, and governance
explain why one approach beats another

That’s a different job — even if the title still says Data Engineer.

The “Azure → AWS” illusion many engineers fall into

One part of our conversation really stood out.

He told me:

“I actually got more interview calls for AWS roles than I expected.”

And in those interviews, he did well — almost well enough.

He explained confidently:

Databricks → EMR
ADF → Glue
Synapse → Redshift

The interviewer agreed.
He liked the profile.

But the final response was:

“I like your understanding… but you don’t have real AWS experience.”

This is where many experienced engineers get stuck.

Conceptual mapping is not the same as architectural confidence.

AWS interviews don’t just test:

Can you build this?

They test:

Why would you choose this service?
What happens when data volume grows 10x?
How much does this cost at scale?
What breaks first?
How would you secure it?

That’s not about tools.
That’s about decision-making.

The quiet shift from “developer” to “architect”

Here’s the part most people don’t talk about openly:

You don’t become an architect by getting promoted.
You become one by changing how you think and speak.

Developers focus on:

implementation
correctness
delivery

Architects focus on:

trade-offs
patterns
constraints
cost
governance
stakeholder expectations

Same data.
Same cloud.
Completely different mental model.

And here’s the key mistake many people make:

They wait for the architect title before acting like one.

In reality, it works the other way around.

Why AWS exposes career ceilings faster

In Azure + Databricks ecosystems, many teams converge on a single “default” approach:

ADF + Databricks + ADLS
Spark everywhere
Medallion architecture everywhere

AWS is different.

For the same problem, you might reasonably choose:

Lambda
Glue
Athena
EMR
Redshift
ECS/Fargate

Each choice has cost, scale, and operational implications.

That’s why AWS interviews feel harder — not because the tech is harder, but because you’re forced to justify your thinking.

And that’s exactly why AWS becomes a growth catalyst for senior engineers who feel stuck.

Architecture isn’t about knowing everything

Another misconception I see a lot:

“I need to know everything before I move to an architect role.”

No architect knows everything.

What they do know is:

how to evaluate options
how to say no to bad ideas
how to explain trade-offs to non-technical stakeholders
how to design systems that won’t collapse under scale or cost

They also understand adjacent areas:

data governance
access control
security boundaries
data modeling
batch vs streaming trade-offs

Not because they implement all of it — but because they lead conversations about it.

The part nobody wants to hear (but needs to)

Career pivots don’t happen over a weekend.

They don’t happen in 30-day challenges.
They don’t happen after one course.

Real clarity takes:

months of consistent learning
exposure to multiple architectures
reflection
rewriting your resume
rewriting your narrative

For most experienced engineers, this is a 6–8 month journey.

And that’s okay.

If you’re 10–15 years into your career, this isn’t a failure point.
It’s an inflection point.

A message I want senior engineers to hear clearly

If you’re feeling:

bored
repetitive
boxed into the same architecture
worried about long-term growth

You’re not behind.

You’re just being asked to evolve.

The engineers who thrive long-term aren’t the ones who chase every new tool.
They’re the ones who learn how to think in systems, decisions, and outcomes.

That shift is uncomfortable.
But it’s also where careers open up again.

I see this pattern very often.
And I’ve seen enough people on the other side of it to say this confidently:

👉 You’re not stuck.
👉 You’re just at the point where you need to add skills that matter.

And that’s not a bad place to be.

You can get to your dream role if you consistently add deep skills!