The Ultimate Hands-on Hadoop

Video Introducing this tutorial

Learn all the buzzwords! And install Hadoop :
[Activity] Introduction, and install Hadoop on your desktop!
Hadoop Overview and History
Overview of Hadoop Ecosystem
Tips for Using This Course

Using Hadoop's Core: HDFS and MapReduce :
HDFS: What it is, and how it works
[Activity] Install the MovieLens dataset into HDFS using the Ambari UI
[Activity] Install the MovieLens dataset into HDFS using the command line
MapReduce: What it is, and how it works
How MapReduce distributes processing
MapReduce example: Break down movie ratings by rating score
[Activity] Installing Python, MRJob, and nano
[Activity] Code up the ratings histogram MapReduce job and run it
[Exercise] Rank Movies by their popularity
[Activity] Check your results against mine!

Programming Hadoop with Pig :
Introducing Ambari
Introducing Pig
Example: Find the oldest movie with 5-star rating using Pig
[Activity] Find old 5-star movies with Pig
More Pig Latin
[Exercise] Find the most-rated one-star movie
Pig Challenge: Compare Your Results to Mine!

Programming Hadoop with Spark :
Why Spark?
Resilient Distributed Datasets (RDD)
[Activity] Find the movie with the lowest average rating - with RDDs
Datasets and Spark 2.0
[Activity] Find the movie with the lowest average rating - with DataFrames
[Activity] Movie recommendations with MLlib
[Exercise] Filter the lowest-rated movies by number of ratings
[Activity] Check your results against mine!

Using relational data stores with Hadoop :
What is Hive?
[Activity] Use Hive to find the most popular movie
How Hive works
[Exercise] Use Hive to find the movie with the highest average rating
Compare your solution to mine
Integrating MySQL with Hadoop
[Activity] Install MySQL and import our movie data
[Activity] Use Sqoop to import data from MySQL to HDFS/Hive
[Activity] Use Sqoop to export data from Hadoop to MySQL

Using non-relational data stores with Hadoop :
Why NoSQL?
What is HBase?
[Activity] Import movie ratings into HBase
[Activity] Use HBase with Pig to import data at scale
Cassandra Overview
[Activity] Installing Cassandra
[Activity] Write Spark output into Cassandra
MongoDB overview
[Activity] Install MongoDB, and integrate Spark with MongoDB
[Activity] Using the MongoDB shell
Choosing a database technology
[Exercise] Choose a database for a given problem

Querying Your Data Interactively :
Overview of Drill
[Activity] Setting up Drill
[Activity] Querying across multiple databases with Drill
Overview of Phoenix
[Activity] Install Phoenix and query HBase with it
[Activity] Integrate Phoenix with Pig
Overview of Presto
[Activity] Install Presto, and query Hive with it
[Activity] Query both Cassandra and Hive using Presto

Managing your Cluster :
YARN Explained
Tez explained
[Activity] Use Hive on Tez and measure the performance benefit
Mesos explained
ZooKeeper explained
[Activity] Simulating a failing master with ZooKeeper
Oozie explained
[Activity] Set up a simple Oozie workflow
Zeppelin overview
[Activity] Use Zeppelin to analyze movie ratings, part 1
[Activity] Use Zeppelin to analyze movie ratings, part 2
Hue Overview
Other technologies worth mentioning

Feeding Data to your Cluster :
Kafka explained
[Activity] Setting up Kafka, and publishing some data
[Activity] Publishing web logs with Kafka
Flume explained
[Activity] Set up Flume and publish logs with it
[Activity] Set up Flume to monitor a directory and store its data in HDFS

Analyzing Streams of Data :
Spark Streaming: Introduction
[Activity] Analyze web logs published with Flume using Spark Streaming
[Exercise] Monitor Flume-published logs for errors in real time
Exercise solution: Aggregating HTTP access codes with Spark Streaming
Apache Storm: Introduction
[Activity] Count words with Storm
Flink: An Overview
[Activity] Counting words with Flink

Designing Real-World Systems :
The Best of the Rest
Review: How the pieces fit together
Understanding your requirements
Sample application: consume web server logs and keep track of top sellers
Sample application: serving movie recommendations to a website
[Exercise] Design a system to report web sessions per day
Exercise solution: Design a system to count daily sessions

Learning More :
Books and online resources
Bonus lecture: Discounts on my other big data / data science courses!