Azure Databricks & Spark For Data Engineers (PySpark / SQL)
Coupon Verified on January 10th, 2025
Course Name : Azure Databricks & Spark For Data Engineers (PySpark / SQL)
Students : 109,402
Duration : 20 hrs & 21 downloadable resources
Avg Rating : 4.7 (18,255 ratings)
Original Price : $119.99
Discount Price : $13
Instructor / Provider : Ramesh Retnasamy, Udemy
Course Type : Self-Paced Online Course. Lifetime Access
Coupon : Click on ENROLL NOW to apply discount code
What You’ll Learn:
Build Real-World Data Engineering Projects
- Develop a complete, end-to-end data engineering project using Azure Databricks and Spark Core with real-world datasets.
Professional Data Engineering Skills
- Master tools like Azure Databricks, Delta Lake, Spark Core, Azure Data Lake Gen2, and Azure Data Factory (ADF) to tackle enterprise-level projects.
Azure Databricks Mastery
- Create and manage notebooks, dashboards, clusters, cluster pools, and jobs in Azure Databricks.
- Learn to ingest and transform data using PySpark and analyze it using Spark SQL.
- Implement Lakehouse architecture using Delta Lake and visualize outputs with dashboards.
- Connect Azure Databricks to Power BI for report generation.
Data Lake and Lakehouse Architectures
- Understand the principles of Data Lake and Lakehouse Architecture.
- Implement Lakehouse solutions using Delta Lake for efficient data storage and processing.
Azure Data Factory Expertise
- Build and schedule pipelines to execute Databricks notebooks.
- Create robust workflows to handle missing files and other unexpected scenarios.
- Monitor and troubleshoot pipelines and triggers to ensure seamless operations.
Unity Catalog and Data Governance
- Gain a comprehensive understanding of Unity Catalog for managing data governance.
- Implement data governance solutions, including data discovery, data lineage, auditing, and access control using Unity Catalog.
Course Modules:
Azure Databricks
- Understand Databricks architecture within Azure.
- Work with Databricks notebooks, utilities, and magic commands.
- Pass parameters between notebooks and create workflows.
- Configure clusters, cluster pools, and jobs for efficient processing.
- Use Azure Key Vault to securely mount Azure Storage in Databricks (see the sketch after this list).
- Work with Databricks Tables, DBFS, and Delta Lake for data storage and transformation.
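To make the secure-mount bullet concrete, here is a minimal PySpark sketch of the pattern, assuming a hypothetical Key Vault-backed secret scope ("demo-scope"), secret names, and storage account. It illustrates the technique, not code taken from the course; `dbutils` is predefined inside a Databricks notebook.

```python
# Minimal sketch: mount an ADLS Gen2 container using service-principal
# credentials pulled from a Key Vault-backed secret scope. Scope, secret,
# storage-account, and container names below are hypothetical.
def mount_adls_container(storage_account: str, container: str) -> None:
    # dbutils is available by default inside a Databricks notebook.
    client_id = dbutils.secrets.get(scope="demo-scope", key="sp-client-id")
    tenant_id = dbutils.secrets.get(scope="demo-scope", key="sp-tenant-id")
    client_secret = dbutils.secrets.get(scope="demo-scope", key="sp-client-secret")

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

    dbutils.fs.mount(
        source=f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
        mount_point=f"/mnt/{storage_account}/{container}",
        extra_configs=configs,
    )

mount_adls_container("demodatalake", "raw")
```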
Spark Core (PySpark & SQL)
- Understand Spark architecture, Data Sources API, and DataFrame API.
- Ingest CSV and JSON data with PySpark and write it to the data lake as Parquet files or tables (see the sketch after this list).
- Execute transformations like Filter, Join, Aggregations, GroupBy, and Window functions.
- Create local and global temporary views and use Spark SQL for complex queries.
- Implement incremental load patterns and full refresh strategies using partitions.
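A minimal sketch of the ingest-transform-write flow this module covers, runnable in a Databricks notebook where `spark` is predefined. The mounted path, file, and column names are hypothetical, not from the course materials.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Define an explicit schema rather than relying on schema inference.
circuits_schema = StructType([
    StructField("circuit_id", IntegerType(), False),
    StructField("name", StringType(), True),
    StructField("country", StringType(), True),
])

# Ingest CSV from the raw layer of the data lake...
circuits_df = (spark.read
               .option("header", True)
               .schema(circuits_schema)
               .csv("/mnt/demodatalake/raw/circuits.csv"))

# ...apply transformations (filter, groupBy, aggregation)...
counts_df = (circuits_df
             .filter(F.col("country").isNotNull())
             .groupBy("country")
             .agg(F.count("circuit_id").alias("circuit_count")))

# ...and write the result to the processed layer as Parquet.
counts_df.write.mode("overwrite").parquet("/mnt/demodatalake/processed/circuit_counts")

# Register a local temporary view so the same data can be queried with Spark SQL.
circuits_df.createOrReplaceTempView("v_circuits")
spark.sql("SELECT country, COUNT(*) AS n FROM v_circuits GROUP BY country").show()
```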
Delta Lake
- Learn about Data Lakehouse architecture and Delta Lake’s role.
- Perform advanced operations like Read, Write, Update, Delete, and Merge using PySpark and SQL.
- Use features like Time Travel and Vacuum, and convert Parquet files to Delta format (see the sketch after this list).
- Implement incremental load patterns for efficient data handling.
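The Delta operations above in a minimal sketch, again assuming hypothetical table paths and a join key, and a notebook where `spark` is predefined; it shows the upsert-style incremental load (MERGE), time travel, VACUUM, and Parquet-to-Delta conversion in PySpark and SQL.

```python
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "/mnt/demodatalake/presentation/results")
updates_df = spark.read.parquet("/mnt/demodatalake/processed/results_incremental")

# Incremental load pattern: update matching rows, insert new ones (upsert).
(target.alias("tgt")
 .merge(updates_df.alias("src"), "tgt.result_id = src.result_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# Time travel: read the table as it was at an earlier version.
previous_df = (spark.read
               .format("delta")
               .option("versionAsOf", 0)
               .load("/mnt/demodatalake/presentation/results"))

# Vacuum removes data files older than the retention period (default 7 days).
spark.sql("VACUUM delta.`/mnt/demodatalake/presentation/results`")

# Convert an existing Parquet dataset to Delta format in place.
spark.sql("CONVERT TO DELTA parquet.`/mnt/demodatalake/processed/circuit_counts`")
```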
Unity Catalog
- Understand the basics of Data Governance and Unity Catalog’s features.
- Set up the Unity Catalog metastore and enable Databricks workspaces for Unity Catalog.
- Create and manage Unity Catalog objects using the 3-level namespace (see the sketch after this list).
- Access and configure external data lakes via Unity Catalog.
- Develop a mini project showcasing key governance capabilities like Data Audit, Lineage, and Access Control.
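A rough illustration of the 3-level namespace (catalog.schema.table) and a GRANT-based access control statement, issued as Spark SQL from a notebook. The catalog, schema, table, and group names are hypothetical, and this assumes the workspace is already attached to a Unity Catalog metastore with the needed privileges.

```python
# Create objects at each level of the 3-level namespace.
spark.sql("CREATE CATALOG IF NOT EXISTS demo_catalog")
spark.sql("CREATE SCHEMA IF NOT EXISTS demo_catalog.bronze")
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_catalog.bronze.drivers (
        driver_id INT,
        name      STRING
    )
""")

# Access control: grant a group read access to the table.
spark.sql("GRANT SELECT ON TABLE demo_catalog.bronze.drivers TO `data-analysts`")

# Query the table by its fully qualified 3-level name.
spark.sql("SELECT * FROM demo_catalog.bronze.drivers").show()
```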
Azure Data Factory
- Create pipelines to execute Databricks notebooks (see the sketch after this list).
- Design robust workflows with activity dependencies and triggers.
- Schedule pipelines for regular execution and monitor for errors and outputs.
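The course builds these pipelines in the ADF user interface; purely as an illustration of the same run-and-monitor loop in code, here is a sketch using the azure-mgmt-datafactory SDK. The subscription ID, resource group, factory, pipeline, and parameter names are all hypothetical placeholders.

```python
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Kick off the pipeline that wraps the Databricks notebook activity.
run = adf_client.pipelines.create_run(
    resource_group_name="rg-data-eng",
    factory_name="adf-demo",
    pipeline_name="pl_ingest_circuits",
    parameters={"p_file_date": "2025-01-10"},
)

# Poll the run until it leaves the in-progress states, then report the outcome.
while True:
    status = adf_client.pipeline_runs.get(
        "rg-data-eng", "adf-demo", run.run_id
    ).status
    if status not in ("InProgress", "Queued"):
        break
    time.sleep(30)

print(f"Pipeline finished with status: {status}")
```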