Hadoop 101
This beginner Apache Hadoop course introduces you to Big Data concepts and teaches you how to perform distributed processing of large data sets with Hadoop.
About This Course
Learn the basics of Apache Hadoop, a free, open-source, Java-based programming framework, and why it was invented.
Learn about Hadoop’s architecture and core components, such as MapReduce and the Hadoop Distributed File System (HDFS).
Learn how to add and remove nodes from Hadoop clusters, how to check available disk space on each node, and how to modify configuration parameters.
Learn about other Apache projects that are part of the Hadoop ecosystem, including Pig, Hive, HBase, ZooKeeper, Oozie, Sqoop, and Flume, among others. BDU provides separate courses on these projects, but we recommend you start here.
Course Syllabus
Module 1 – Introduction to Hadoop
Understand what Hadoop is
Understand what Big Data is
Learn about other open source software related to Hadoop
Understand how Big Data solutions can work on the Cloud
Module 2 – Hadoop Architecture
Understand the main Hadoop components
Learn how HDFS works
List data access patterns for which HDFS is designed
Describe how data is stored in an HDFS cluster (see the code sketch after this list)
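The following is a minimal sketch, not part of the course materials, of how Module 2's storage objective looks in practice: it uses Hadoop's Java FileSystem API to print the blocks of one HDFS file and the DataNodes holding each replica. The class name and the path /user/student/sample.txt are placeholders, and it assumes a client configured with your cluster's core-site.xml and hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
    public static void main(String[] args) throws Exception {
        // Connects to whichever cluster core-site.xml / hdfs-site.xml point at
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path; use any file that exists in your HDFS
        Path file = new Path("/user/student/sample.txt");
        FileStatus status = fs.getFileStatus(file);

        // HDFS splits a file into fixed-size blocks and replicates each block;
        // getFileBlockLocations reports which DataNodes hold each one
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```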
Module 3 – Hadoop Administration
Add and remove nodes from a cluster
Verify the health of a cluster
Start and stop a cluster's components
Modify Hadoop configuration parameters (see the code sketch after this list)
Set up a rack topology
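Below is a small sketch, offered as an illustration rather than course code, of the kind of checks Module 3 covers: it reads two standard configuration parameters and prints cluster-wide capacity and usage, roughly what the hdfs dfsadmin -report command reports. The class name is a placeholder, and it assumes a Hadoop client whose classpath includes your cluster's configuration files.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class ClusterCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();

        // Two commonly tuned parameters; the second argument is the default
        // returned when the property is not set in the configuration files
        System.out.println("fs.defaultFS    = " + conf.get("fs.defaultFS", "file:///"));
        System.out.println("dfs.replication = " + conf.getInt("dfs.replication", 3));

        // Aggregate capacity and usage across the cluster
        FileSystem fs = FileSystem.get(conf);
        FsStatus status = fs.getStatus();
        System.out.printf("capacity=%d used=%d remaining=%d bytes%n",
                status.getCapacity(), status.getUsed(), status.getRemaining());
        fs.close();
    }
}
```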
Module 4 – Hadoop Components
Describe the MapReduce philosophy (see the WordCount sketch after this list)
Explain how Pig and Hive can be used in a Hadoop environment
Describe how Flume and Sqoop can be used to move data into Hadoop
Describe how Oozie is used to schedule and control Hadoop job execution
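The MapReduce philosophy in Module 4 is easiest to see in the classic word-count program, essentially the example from the Apache Hadoop MapReduce tutorial: mappers emit a (word, 1) pair for every word they read, and reducers sum the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: split each input line into words and emit (word, 1)
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: all counts for the same word arrive together and are summed
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, it would run with something like hadoop jar wordcount.jar WordCount /input /output, where the jar name and HDFS paths are placeholders.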