Instructed by Derar Alhussein
By the end of this course, you should be able to:
- Model data management solutions, including:
  - Lakehouse (bronze/silver/gold architecture, tables, views, and the physical layout)
  - General data modeling concepts (constraints, lookup tables, slowly changing dimensions)
- Build data processing pipelines using the Spark and Delta Lake APIs, including:
  - Building batch-processed ETL pipelines
  - Building incrementally processed ETL pipelines
  - Deduplicating data
  - Using Change Data Capture (CDC) to propagate changes
  - Optimizing workloads
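In Delta Lake, CDC propagation is typically expressed with a `MERGE INTO` statement that upserts and deletes target rows from a change feed. As a minimal plain-Python sketch of those merge semantics (the event fields `id`, `op`, and `row` here are illustrative, not a Delta Lake API):

```python
def apply_cdc(target, changes):
    """Apply a CDC feed (a list of change events) to a keyed target table.

    Each event carries a primary key 'id', an 'op' of
    'insert'/'update'/'delete', and the new row under 'row'.
    """
    for event in changes:
        key = event["id"]
        if event["op"] == "delete":
            target.pop(key, None)   # remove the row if it exists
        else:
            target[key] = event["row"]  # insert or update: upsert the row
    return target

# Example: one update, one insert, one delete
table = {1: {"name": "a"}, 2: {"name": "b"}}
feed = [
    {"id": 2, "op": "update", "row": {"name": "b2"}},
    {"id": 3, "op": "insert", "row": {"name": "c"}},
    {"id": 1, "op": "delete", "row": None},
]
apply_cdc(table, feed)
# table is now {2: {"name": "b2"}, 3: {"name": "c"}}
```

A Delta `MERGE` does the same work declaratively and at scale, matching on the key and branching on the change operation.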
- Understand how to use the Databricks platform and its tools, and the benefits of doing so, including:
  - Databricks CLI (deploying notebook-based workflows)
  - Databricks REST API (configuring and triggering production pipelines)
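Triggering a production job through the REST API is a single authenticated POST to the Jobs API `run-now` endpoint. A minimal sketch that builds (but does not send) such a request using only the standard library; the host, token, and job ID are placeholders you would substitute for your workspace:

```python
import json
import urllib.request

# Placeholders: substitute your workspace URL, access token, and job ID.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapiXXXXXXXX"
JOB_ID = 123

def build_run_now_request(host, token, job_id):
    """Build a Jobs API 2.1 run-now request (not sent here)."""
    payload = json.dumps({"job_id": job_id}).encode()
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/run-now",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_run_now_request(HOST, TOKEN, JOB_ID)
# urllib.request.urlopen(req) would trigger the job; the response
# body contains the run_id of the new run.
```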
- Build production pipelines following security and governance best practices, including:
  - Managing cluster and job permissions with ACLs
  - Creating row- and column-oriented dynamic views to control user/group access
  - Securely deleting data on request, in line with GDPR and CCPA
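Column-oriented dynamic views in Databricks are written in SQL, typically with a `CASE WHEN is_account_group_member(...)` expression that masks a column for non-privileged callers. A plain-Python sketch of that redaction logic (the column and group names are illustrative):

```python
def redact_rows(rows, user_groups, privileged_group="admins"):
    """Column-level redaction: mask 'email' unless the caller is in
    the privileged group, mirroring the CASE WHEN
    is_account_group_member(...) pattern of a dynamic view."""
    privileged = privileged_group in user_groups
    out = []
    for row in rows:
        row = dict(row)  # copy so the source table is untouched
        if not privileged:
            row["email"] = "REDACTED"
        out.append(row)
    return out

rows = [{"name": "a", "email": "a@example.com"}]
redact_rows(rows, {"analysts"})  # email masked
redact_rows(rows, {"admins"})    # email visible
```

Row-oriented views follow the same pattern, filtering rows out entirely instead of masking a column.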
- Configure alerting and storage to monitor and log production jobs, including:
  - Recording logged metrics
  - Debugging errors
- Follow best practices for managing, testing, and deploying code, including:
  - Relative imports
  - Scheduling jobs
  - Orchestrating jobs
Who this course is for:
- Anyone aiming to pass the Databricks Data Engineer Professional certification exam
- Junior Data Engineers on Databricks wanting to gain the skills of Professional Data Engineers