Hadoop Course Content

  1. Overview

    What is Hadoop?
    Hadoop is a free, open-source, Java-based programming framework that supports the processing of large data sets in a parallel, distributed computing environment. It is a top-level project of the Apache Software Foundation. Hadoop makes it possible to run applications on clusters of thousands of nodes handling thousands of terabytes of data, a scale that is not feasible with traditional systems.

    Why Hadoop?
    Today we live in a DATA world. Almost everything we do on the internet has become a source of business information for organizations across the globe. Data volumes have grown exponentially over the last decade or so, and even faster over the last three years. As a result, the industry has been looking for ways to handle this data and extract business value from it through data analytics. One such breakthrough is Hadoop. Yes, Hadoop is here to stay, and it is leading the industry by giving businesses numerous ways to store, retrieve and analyze data.

  2. Hadoop Content Outline

    INTRODUCTION
    • Big Data
    • The 3 Vs (Volume, Velocity, Variety)
    • Role of Hadoop in Big data
    • Hadoop and its ecosystem
    • Overview of other Big Data Systems
    • Requirements in Hadoop
    • Use Cases of Hadoop

    HDFS
    • Design
    • Architecture
    • Data Flow
    • CLI Commands
    • Java API
    • Data Flow Archives
    • Data Integrity
    • WebHDFS
    • Compression

    MAPREDUCE
    • Theory
    • Data Flow (Map – Shuffle – Reduce)
    • Programming [Mapper, Reducer, Combiner, Partitioner]
    • Writables
    • InputFormat
    • OutputFormat
    • Streaming API
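The Map, Shuffle and Reduce data flow listed above can be pictured with the classic word-count job. The sketch below is a plain-Java, in-memory simulation under stated assumptions: it uses no Hadoop classes (a real job would use Hadoop's Mapper and Reducer APIs and run across a cluster), and the class and method names (MapShuffleReduce, map, shuffle, reduce, wordCount) are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the Map -> Shuffle -> Reduce data flow.
// No Hadoop classes are used; the three phases are only mimicked in memory.
public class MapShuffleReduce {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) { // split can yield "" for leading whitespace
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key; TreeMap keeps keys sorted,
    // loosely mirroring the sort that happens before values reach a reducer.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run map over every line, shuffle the mapped pairs, then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data needs hadoop", "hadoop stores big data");
        // prints {big=2, data=2, hadoop=2, needs=1, stores=1}
        System.out.println(wordCount(input));
    }
}
```

In an actual Hadoop job the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; the single-process version above only shows what each phase computes.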

    ADVANCED MAPREDUCE PROGRAMMING
    • Counters
    • Custom InputFormat
    • Distributed Cache
    • Side Data Distribution
    • Joins
    • Sorting
    • ToolRunner
    • Debugging
    • Performance Fine-Tuning

    ADMINISTRATION – Information Required at the Developer Level
    • Hardware Considerations – Tips and Tricks
    • Schedulers
    • Balancers
    • NameNode Failure and Recovery

    HBase
    • NoSQL vs SQL
    • CAP Theorem
    • Architecture
    • Configuration
    • Role of ZooKeeper
    • Java-Based APIs
    • MapReduce Integration
    • Performance Tuning

    HIVE
    • Architecture
    • Tables
    • DDL – DML – UDF – UDAF
    • Partitioning
    • Bucketing
    • Hive-HBase Integration
    • Hive Web Interface
    • Hive Server

    OTHER HADOOP ECOSYSTEMS
    • Pig (Pig Latin, Programming)
    • Sqoop (Need, Architecture, Examples)
    • Introduction to Components (Flume, Oozie, Ambari)