Apache Spark with Python – Big Data with PySpark and Spark

Get Started with Apache Spark :
Course Overview
Introduction to Spark
Install Java and Git
Set up Spark
Winutils URL
Run our first Spark job

RDD :
RDD Basics
Create RDDs
Spark Data Sources
Map and Filter Transformation
Solution to Airports by Latitude Problem
FlatMap Transformation
Set Operations
Sampling with Replacement and Sampling without Replacement
Solution for the Same Hosts Problem
Solution to Sum of Numbers Problem
Important Aspects about RDD
Summary of RDD Operations
Caching and Persistence

Spark Architecture and Components :
Spark Architecture
Spark Components

Pair RDD :
Introduction to Pair RDD
Create Pair RDDs
Filter and MapValues Transformations on Pair RDD
Reduce By Key Aggregation
Solution for the Average House Problem
Group By Key Transformation
Sort By Key Transformation
Solution for the Sorted Word Count Problem
Data Partitioning
Join Operations
Extra Learning Material: How are Big Companies using Apache Spark

Advanced Spark Topics :
Solution to StackOverflow Survey Follow-up Problem
Broadcast Variables

Spark SQL :
Introduction to Spark SQL
Spark SQL in Action
Spark SQL practice: House Price Problem
Spark SQL Joins
DataFrame or RDD
DataFrame and RDD Conversion
Performance Tuning of Spark SQL

Running Spark in a Cluster :
Introduction to Running Spark in a Cluster
Run a Spark Application on an Amazon EMR (Elastic MapReduce) Cluster
Extra Learning Material: Avoid These Mistakes While Writing Apache Spark Programs

