What you’ll learn
- Setting up a self-supported lab with Hadoop (HDFS and YARN), Hive, Spark, and Kafka
- Overview of Kafka for building streaming pipelines
- Data ingestion into Kafka topics using Kafka Connect with a file source
- Data ingestion into HDFS using Kafka Connect with the HDFS 3 Sink Connector plugin
- Overview of Spark Structured Streaming for processing data as part of streaming pipelines
- Incremental data processing using Spark Structured Streaming with a file source and file target
- Integration of Kafka and Spark Structured Streaming – reading data from Kafka topics
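To give a flavor of the Kafka Connect file-source ingestion covered above, here is a minimal standalone connector configuration sketch. The connector class is Kafka's built-in `FileStreamSourceConnector`; the connector name, log file path, and topic name are placeholders, not taken from the course itself.

```properties
# Hypothetical standalone Kafka Connect source config:
# tails a web server log file and publishes each line to a Kafka topic.
name=web-logs-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
# Placeholder path to the log file being ingested
file=/opt/gen_logs/logs/access.log
# Placeholder target topic
topic=web_server_logs
```

A config like this would typically be passed to `connect-standalone.sh` along with a worker properties file.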
Here is a brief outline of the course. You can choose either Cloud9 or GCP to provision a server to set up the environment.
- Setting up Environment using AWS Cloud9 or GCP
- Set up a Single Node Hadoop Cluster
- Set up Hive and Spark on top of the Single Node Hadoop Cluster
- Set up a Single Node Kafka Cluster on top of the Single Node Hadoop Cluster
- Getting Started with Kafka
- Data Ingestion using Kafka Connect – web server log files as a source to a Kafka topic
- Data Ingestion using Kafka Connect – Kafka topic to HDFS as a sink
- Overview of Spark Structured Streaming
- Kafka and Spark Structured Streaming Integration
- Incremental Loads using Spark Structured Streaming
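The Kafka and Spark Structured Streaming integration in the outline above boils down to reading a topic as a streaming DataFrame. Below is a minimal PySpark sketch under stated assumptions: a Kafka broker on `localhost:9092` and a topic named `web_server_logs` are placeholders, and the `spark-sql-kafka` package must be on the classpath (e.g. via `spark-submit --packages`), so this is illustrative rather than directly runnable without that environment.

```python
# Sketch: reading a Kafka topic with Spark Structured Streaming.
# Assumes a broker at localhost:9092 and a topic "web_server_logs"
# (both hypothetical), plus the spark-sql-kafka connector on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("KafkaStructuredStreamingSketch")
    .getOrCreate()
)

# Subscribe to the topic; Kafka records arrive with binary key/value columns
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "web_server_logs")
    .load()
)

# Cast the binary value column to a string for downstream processing
lines = df.select(col("value").cast("string").alias("line"))

# Print each micro-batch to the console; awaitTermination blocks until stopped
query = (
    lines.writeStream
    .format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

The same `writeStream` pattern with a file sink and a checkpoint location is what the incremental-loads topic builds on.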
Who this course is for:
- Experienced ETL Developers who want to learn Kafka and Spark to build streaming pipelines
- Experienced PL/SQL Developers who want to learn Kafka and Spark to build streaming pipelines
- Beginner or Experienced Data Engineers who want to learn Kafka and Spark to build streaming pipelines