Taming Big Data using Spark & Scala


Introduction :
Introduction
Practice Test Added for CCA 175 Certification

Big Data Platform Setup :
Different forms of Big Data Platforms
Installation on Windows or Cloudera
Browse through Shared Course content
Course - Additional Section Info

Use Windows/Cloudera VM provided in the course :
Setup VM
Setup IntelliJ on VM
Windows HDFS Error & Fix

Simply set up IntelliJ and Spark and practice only these two :
Setup MySQL & Basics
Setup Spark
Setup IntelliJ - Part 1
Setup IntelliJ - Part 2
Possible Issue in IntelliJ
SBT Setup for Scala CLI/REPL
Winutils Setup in Windows for a Hadoop-like Implementation

Learning Hadoop - Architecture, Concepts & Implementation :
Hadoop Architecture - Part 1 - Basics of Hadoop
Hadoop Architecture - Part 2 - Understanding NameNode and DataNode
Hadoop Architecture - Part 3 - Understanding Job Tracker & Task Tracker
Hadoop Refresh & File Systems
Hadoop Terminologies & Configurations in XML Files
Hadoop Commands on Windows or Windows VM - Part 1
Hadoop Commands on Windows or Windows VM - Part 2
Hadoop Commands on Cloudera Quick Start VM

Learning Sqoop - Architecture, Concepts & Implementation :
Sqoop Architecture
Sqoop Eval on Windows/Windows VM
Sqoop Eval on Windows - Using -e & --query options
Sqoop List Database and List Tables - Used for creating Generic Code
Sqoop Import Command - Understanding and Analysing the Map-Reduce Functionality
Sqoop Import - Append Mode of Execution
Sqoop Import - Overwrite option & Different File Formats supported
Sqoop Import - Using Where & Columns Options to filter the data import
Sqoop Import - Executing a User-Specific Query with Where Clause
Sqoop Import - Incremental Load Execution
Sqoop Jobs - Create, List & Execute Sqoop Jobs
Sqoop Import All Option to Import All Tables from MySQL to HDFS
Sqoop Import - Import from MySQL To Hive - Basic Import
Sqoop Import - Import from MySQL To Hive - More Options
Sqoop Import All - Import from MySQL to Hive using Import All
Sqoop Import from Mainframe - A Basic Know-How
Sqoop Export - Bring Data from HDFS to MySQL
Sqoop Assignment for Practice

Learning Hive - Architecture, Concepts & Implementation :
Hive - Introduction & Features
Hive - Architecture & Map-Reduce Execution
Hive Tables
Hive Partitioning & Bucketing - Concepts and Difference
Hive Query Language - Overview and Syntax
Hive QL - Practicals - Create Database & Tables & load sample data
Hive QL - Practicals - Load Huge Data to Managed Tables
Hive QL - Practicals - Creating and Loading Managed & External Tables
Hive QL - Practicals - Partitioning in Hive
Hive QL - Practicals - Bucketing in Hive
Hive User Defined Functions
Hive Performance Tuning Methods

Learning Flume - Architecture, Concepts & Implementation :
Flume - Concepts, Usage, Features & Advantages
Flume Architecture
Flume Data Flows, Contextual Routing & Other Concepts
Basics of Flume Configurations
Setup of Telnet in Windows
Flume Practicals - Simple Flume Job using NetCat
Flume Practicals - Flume Job using EXEC
Flume Practicals - Flume Job using Sequence Generator
Flume Practicals - Flume Job using Sequence Generator on HDFS
Flume Practicals - Flume Job using Twitter on Windows
Flume Practicals - Flume Job using Twitter on Cloudera
Flume Practicals - Flume Job using Twitter on File Channel
Flume Practicals - Flume Job using Twitter to Hive Sink
Flume Multiplexing - One Source, One Channel & Two Sinks - Logger and HDFS Sinks
Industry Usage of Flume

Learning Kafka - Architecture, Concepts & Implementation :
Kafka Concepts and Architecture 1
Kafka Concepts and Architecture 2
Kafka Concepts and Architecture 3
Kafka Sample Execution on Cloudera
Flume and Kafka Together

Learning Scala in Command Line Interface (REPL) & IntelliJ :
Scala CLI/REPL on Windows & Cloudera with Mutable and Immutable Variables
Scala - Session 2 - Data Types Used & Applicable Functions
Scala - Session 3 - Range
Scala - Session 4 - For Loops
Scala While loops
Functions in Scala
Functions in Scala 2
Functions and Function Overloading in Scala
Object Oriented Programming in Scala using Classes & Objects
Scala Collections
Scala Input Output Files

Learning Spark - Architecture & Concepts :
Spark Architecture
Spark Components, Lazy Execution, DAG, Spark SQL, Performance Tuning etc.
Spark - Shuffles, Coalesce, Repartition & Shared Variables

Spark RDD - Implementations :
Spark-shell execution Mode & RDD creation from HDFS & Local Files
Spark RDD Transformations - Filter, Sample, Union, Intersection, Distinct
Spark RDD Transformations - Map, FlatMap & Reduce
Spark RDD - Joining RDD
Spark RDD - Foreach & Splitting RDD String to Columns
Spark RDD - Removing Header from RDD, CountByKey, ReduceByKey, GroupByKey etc
Spark RDD - SortByKey, Coalesce, Repartition & Shared Variables
Spark RDD - Write the RDD to HDFS
Spark RDD in IntelliJ

Spark SQL, DataFrames & DataSets :
Spark SQL - Executing SQL & storing in Dataframes
Spark SQL - Functions & Executions
DataFrames - Read Files in DataFrames & Implement Different DataFrame Functions - Part 1
DataFrames - Read Files in DataFrames & Implement Different DataFrame Functions - Part 2
DataFrames - Read from File, Write to File and Convert to Spark SQL Format
DataFrames - DataFrame Column Type Conversion
Datasets - Convert/Create Datasets from DataFrames
Spark - Writing & Executing RDD, DataFrames & Datasets in IntelliJ

IntelliJ & Spark-Submit :
IntelliJ & Spark-Submit
Execute Spark-Submit through a Parameterized Script
Spark-submit Config Options

Learning Spark Streaming - Concepts & Implementation :
Spark Streaming Concepts & DStream
Spark Streaming - Word Count Example on Telnet
Spark Streaming - Twitter Word Count
Spark Streaming - Flume with Spark Streaming - Read Files from HDFS and Word Count
Spark Streaming - Flume and Spark Together - Pull-Based Module

Additional Information :
Spark - RDD vs DataFrame vs Dataset
Spark - Catalyst Optimizer and Tungsten Engine
Spark - WebUI
Spark - Read JSON Files
Spark - Read & Write to Parquet & ORC Files

Project Scenarios :
Overall Big Data Project Structure
Project Scenario 1 - Bring Data from BI Database to Data Lake in Layer 1
Project Scenario 2
Project Solution - Scenario 1 & 2
Project Scenario 3 - Bring Files from Local File System to HDFS in Data Lake
Project Scenario 4 - Create Generic Jobs to Read Data from Data Lake to Layer 2
Project Scenario 5 - Use Spark SQL to Read Data from Layer 2 and Write to Layer 3
Project Scenario 5 - Solution
Project Scenario 6 - Merge Multiple Files
Project Scenario 6 - Solution
Project Scenario 7 - Compare two Dataframes Col by Col - Scenario & Solutions

CCA 175 Practice Questions :
Practice Test 1 - CCA 175 Spark & Hadoop Developer Exam
7 questions
Practice Test 2 - CCA 175 Spark & Hadoop Certification