Databricks Certified Data Engineer Associate
Databricks' entry-level certification for data engineers who build ELT pipelines on the lakehouse with Spark SQL, Python, and Delta Lake. It is increasingly requested for data engineering roles in Databricks shops.
What's on the exam
Databricks Certified Data Engineer Associate Exam Guide
Databricks Lakehouse Platform
24%: Lakehouse architecture vs data warehouse and data lake · Workspace, notebooks, and Repos · Clusters and compute management · Delta Lake fundamentals (ACID, time travel) · Medallion architecture
ELT with Spark SQL and Python
29%: Relational entities (databases, tables, views) · Creating and writing to tables (CTAS, INSERT, MERGE) · Data cleaning and transformation · SQL UDFs and higher-order functions · Python-SQL interoperability in notebooks
Incremental Data Processing
22%: Structured Streaming basics · Auto Loader · Multi-hop (medallion) pipelines · Delta Live Tables · Change data capture
Production Pipelines
16%: Databricks Jobs and multi-task workflows · Scheduling and orchestration · Error handling, repairs, and retries · Databricks SQL dashboards and alerts
Data Governance
9%: Unity Catalog concepts · Entity permissions and grants · Securables and access patterns · Governance best practices
Frequently asked questions
How much does the Databricks Data Engineer Associate cost?
The Databricks Data Engineer Associate costs $200 per attempt; retakes are charged at full price.
How long is the Databricks Data Engineer Associate and how many questions does it have?
The exam has 45 scored questions and a 90-minute time limit.
What do you need to pass the Databricks Data Engineer Associate?
The exam is graded pass/fail; the passing threshold is not published.
Can you retake the Databricks Data Engineer Associate?
Yes. There is a 14-day waiting period between attempts, and each attempt must be paid for separately.
What is the best way to study for the Databricks Data Engineer Associate?
Study the official blueprint, not random material: the exam is weighted by domain (Databricks Lakehouse Platform 24%, ELT with Spark SQL and Python 29%, Incremental Data Processing 22%, Production Pipelines 16%, Data Governance 9%). Spaced-repetition flashcards built domain-by-domain against that blueprint are the most time-efficient way to cover everything the exam tests.
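To make the weighting concrete, the short Python sketch below turns the domain percentages into an approximate question count per domain and a proportional split of study time. It assumes the 45 scored questions follow the published weights exactly, which the exam does not guarantee, and the 40-hour study budget is a purely hypothetical figure chosen for illustration.

```python
# Domain weights (percent of the exam) as listed in the blueprint.
DOMAIN_WEIGHTS = {
    "Databricks Lakehouse Platform": 24,
    "ELT with Spark SQL and Python": 29,
    "Incremental Data Processing": 22,
    "Production Pipelines": 16,
    "Data Governance": 9,
}

SCORED_QUESTIONS = 45   # number of scored questions on the exam
STUDY_HOURS = 40        # hypothetical study budget, not an official figure


def allocate(weights, total):
    """Split `total` across domains in proportion to their percentage weights."""
    return {name: total * pct / 100 for name, pct in weights.items()}


# Approximate questions per domain, assuming the weights hold exactly.
questions = allocate(DOMAIN_WEIGHTS, SCORED_QUESTIONS)

# Study hours per domain under the assumed 40-hour budget.
hours = allocate(DOMAIN_WEIGHTS, STUDY_HOURS)

for name in DOMAIN_WEIGHTS:
    print(f"{name}: ~{questions[name]:.1f} questions, ~{hours[name]:.1f} study hours")
```

Under these assumptions, ELT with Spark SQL and Python accounts for roughly 13 of the 45 questions, while Data Governance accounts for roughly 4, so study time split the same way leans heavily toward the first three domains.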
Program in development
We're building a blueprint-complete program for this exam. Meanwhile, explore live programs across 7 exams.