
Cloudera Developer Training for Spark and Hadoop

In this course, Scala and Python developers who are new to Hadoop gain the key concepts and expertise needed to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques, including Apache Spark, Impala, Hive, Flume, and Sqoop.


Overview


This four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark (including Spark Streaming and Spark SQL), Flume, Kafka, and Sqoop, this training course is the best preparation for the real-world challenges faced by Hadoop developers.

Hands-On Hadoop


Through instructor-led discussion and interactive, hands-on exercises, participants will learn Apache Spark and how it integrates with the entire Hadoop ecosystem, including how to (brief illustrative sketches for several of these topics follow the list):


    Distribute, store, and process data in a Hadoop cluster

    Write, configure, and deploy Apache Spark applications on a Hadoop cluster

    Use the Spark shell for interactive data analysis

    Process and query structured data using Spark SQL

    Use Spark Streaming to process a live data stream

    Use Flume and Kafka to ingest data for Spark Streaming
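
As a taste of what writing, configuring, and deploying a Spark application involves, the following is a minimal sketch of a standalone PySpark application. The application name, input path, and submission options are illustrative assumptions, not course materials.

    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        # Configure the application; most settings are usually supplied at
        # submission time rather than hard-coded.
        spark = SparkSession.builder.appName("CountFirstField").getOrCreate()

        # Read a (hypothetical) text dataset from HDFS and count lines by
        # their first whitespace-separated field.
        lines = spark.sparkContext.textFile("hdfs:///user/training/weblogs/*")
        counts = (lines
                  .map(lambda line: (line.split(" ")[0], 1))
                  .reduceByKey(lambda a, b: a + b))

        for key, count in counts.take(10):
            print(key, count)

        spark.stop()

    # A typical submission to a YARN-managed Hadoop cluster might look like:
    #   spark-submit --master yarn --deploy-mode cluster count_first_field.py
    # Cluster-specific options (executor memory, cores, etc.) vary by site.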
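
Interactive analysis in the Spark shell and querying structured data with Spark SQL are closely related. The sketch below creates its own SparkSession so it also runs as a script (in the pyspark shell, a session named spark is predefined); the table, column names, and rows are made up for illustration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ShellAndSQLSketch").getOrCreate()

    # Illustrative structured data; in practice this might come from
    # Parquet files, Hive tables, or JSON in HDFS.
    accounts = spark.createDataFrame(
        [(1, "alice", "CA"), (2, "bob", "OR"), (3, "carol", "CA")],
        ["id", "name", "state"])

    # Interactive exploration with the DataFrame API...
    accounts.printSchema()
    accounts.groupBy("state").count().show()

    # ...and the same data queried with Spark SQL via a temporary view.
    accounts.createOrReplaceTempView("accounts")
    spark.sql("SELECT name FROM accounts WHERE state = 'CA'").show()

    spark.stop()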
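
Spark Streaming processes a live data stream as a sequence of small batches. The sketch below uses the DStream API with a plain network socket as the source, a simpler stand-in for the Flume- and Kafka-based ingestion covered in the course; the host, port, and batch interval are arbitrary choices.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # Process a live text stream in 5-second micro-batches.
    sc = SparkContext(appName="StreamingSketch")
    ssc = StreamingContext(sc, batchDuration=5)

    # A socket source (e.g. fed by `nc -lk 9999`) stands in for a real feed.
    lines = ssc.socketTextStream("localhost", 9999)

    # Count words in each batch and print a sample to the console.
    counts = (lines
              .flatMap(lambda line: line.split(" "))
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()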
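
For Kafka ingestion specifically, one widely used option is Spark's Kafka source for Structured Streaming, sketched below. Note that this is a different API from the DStream-based Spark Streaming approach named above, and the broker address, topic name, and connector packaging are assumptions about the environment.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("KafkaIngestSketch").getOrCreate()

    # Subscribe to a (hypothetical) Kafka topic of web log lines; requires
    # the spark-sql-kafka connector package on the classpath.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "weblogs")
              .load())

    # Kafka records arrive as binary key/value columns; cast value to text.
    lines = events.selectExpr("CAST(value AS STRING) AS line")

    # Write each micro-batch to the console for inspection.
    query = (lines.writeStream
             .format("console")
             .outputMode("append")
             .start())
    query.awaitTermination()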


Audience & Prerequisites


This course is designed for developers and engineers who have programming experience. Apache Spark examples and hands-on exercises are presented in Scala and Python, so the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful; prior knowledge of Hadoop is not required.



Course Application