What you’ll learn
- Set up a single-node Hadoop and Spark cluster using Docker, either locally or on AWS Cloud9
- Review ITVersity Labs (exclusively for ITVersity Lab customers)
- All the HDFS commands relevant to validating files and folders in HDFS
- A quick recap of the Python needed to learn Spark
- Ability to use Spark SQL to solve problems using SQL-style syntax
- PySpark DataFrame APIs to solve problems using DataFrame-style APIs
- Relevance of the Spark Metastore for converting DataFrames into temporary views, so that data in DataFrames can be processed using Spark SQL
- Apache Spark application development life cycle
- Apache Spark application execution life cycle and the Spark UI
- Setting up an SSH proxy to access Spark application logs
- Deployment modes of Spark applications (cluster and client)
- Passing application properties files and external dependencies while running Spark applications
Who this course is for:
- Any IT aspirant/professional willing to learn Data Engineering using Apache Spark
- Python developers who want to learn Spark as a key skill for becoming a Data Engineer
- Scala-based Data Engineers who would like to learn Spark using Python as the programming language