Hadoop

What is Hadoop?

Hadoop is an open-source framework for storing, processing and analysing huge volumes of data, both structured and unstructured. It runs on clusters of commodity computers that perform distributed, parallel computation. Its distributed file system (HDFS) offers high data-transfer rates, enabling faster and more efficient processing, and its built-in replication protects against node failure and data loss. Hadoop was created by Doug Cutting and Mike Cafarella, inspired by Google's MapReduce framework, and became a top-level project of the Apache Software Foundation in 2008. In recent years, Hadoop has emerged as one of the most important foundations for Big Data analytics, which is rapidly becoming the next big thing in the software field.

Why enrol in Hadoop training at Trishana Technologies, Bangalore?

Hadoop training at Trishana Technologies offers something beyond better facilities and the best faculty: vision. Our syllabus includes detailed classes on Big Data, HDFS and YARN, which are useful in other domains as well. We also teach other Apache components such as Pig, Sqoop and Flume, which will serve you throughout your Hadoop career. Our staff are experienced and approachable. Our infrastructure is top-notch and gives you the latest versions of Hadoop and its tools for practice, and our support team will install Hadoop on your personal laptop if you want to practise at home. We also arrange video conferences and webinars with industry leaders to maximise your exposure to the software industry. We provide unmatched placement support with 100% placement assistance, and we guide our students in clearing the Cloudera Developer Certification exam.

Our Hadoop training syllabus:

SECTION 1
INTRODUCTION

  • Big Data
  • 3Vs
  • Role of Hadoop in Big data
  • Hadoop and its ecosystem
  • Overview of other Big Data Systems
  • Requirements in Hadoop
  • Use cases of Hadoop

SECTION 2
HDFS

  • Design
  • Architecture
  • Data Flow
  • CLI Commands
  • Java API
  • Data Flow Archives
  • Data Integrity
  • WebHDFS
  • Compression
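
To give a feel for two of the HDFS topics above (block-based storage and data integrity), here is a minimal plain-Java sketch with no Hadoop dependencies. The class, method names and 8-byte block size are purely illustrative; real HDFS uses much larger blocks (128 MB by default) and keeps checksums per small chunk of each block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Illustrative sketch: split data into fixed-size "blocks" and checksum each
// one, mirroring how HDFS detects corrupted replicas. Not real HDFS code.
public class BlockChecksumSketch {

    // Split a byte payload into fixed-size blocks, as HDFS does with files.
    static List<byte[]> split(byte[] data, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            byte[] block = new byte[len];
            System.arraycopy(data, off, block, 0, len);
            blocks.add(block);
        }
        return blocks;
    }

    // Compute a CRC32 checksum for a block; a reader recomputes and compares
    // this value to detect corruption.
    static long checksum(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "hadoop distributed file system".getBytes();
        for (byte[] block : split(data, 8)) {
            System.out.println(new String(block) + " -> " + checksum(block));
        }
    }
}
```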

SECTION 3
MAPREDUCE

  • Theory
  • Data Flow (Map – Shuffle – Reduce)
  • Programming [Mapper, Reducer, Combiner, Partitioner]
  • Writables
  • Input format
  • Output format
  • Streaming API
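
The Map – Shuffle – Reduce data flow covered in this section can be sketched in plain Java with the classic word-count example. This runs without Hadoop; the class and method names are illustrative only, and the three methods stand in for the mapper, the framework's shuffle, and the reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the Map -> Shuffle -> Reduce word-count data flow.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key (the word).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped values to get a count per word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big ideas", "big clusters");
        System.out.println(reduce(shuffle(map(input))));
        // {big=3, clusters=1, data=1, ideas=1}
    }
}
```

In real Hadoop the Mapper and Reducer are classes you subclass, and the shuffle is performed by the framework between the two; a Combiner would apply the reduce logic early on each mapper's local output.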

SECTION 4
ADVANCED MAPREDUCE PROGRAMMING

  • Counters
  • Custom Input Format
  • Distributed Cache
  • Side Data Distribution
  • Joins
  • Sorting
  • Tool Runner
  • Debugging
  • Performance Fine-Tuning
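
As a taste of the "Joins" topic above, here is a plain-Java sketch of a reduce-side join, where records from two inputs are tagged with their source, grouped on the join key (as the shuffle would do), and combined in the reduce step. All names and the "U:"/"O:" tags are hypothetical, and this runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of a reduce-side join between users and orders keyed by id.
public class ReduceSideJoinSketch {

    // Map + shuffle: tag each record with its source table and group on the
    // join key, the way tagged map outputs arrive at a reducer.
    static Map<Integer, List<String>> mapAndShuffle(Map<Integer, String> users,
                                                    Map<Integer, String> orders) {
        Map<Integer, List<String>> grouped = new TreeMap<>();
        users.forEach((id, name) ->
                grouped.computeIfAbsent(id, k -> new ArrayList<>()).add("U:" + name));
        orders.forEach((id, product) ->
                grouped.computeIfAbsent(id, k -> new ArrayList<>()).add("O:" + product));
        return grouped;
    }

    // Reduce: for each key, pair every user record with every order record.
    static List<String> reduce(Map<Integer, List<String>> grouped) {
        List<String> joined = new ArrayList<>();
        grouped.forEach((id, values) -> {
            for (String u : values) {
                if (!u.startsWith("U:")) continue;
                for (String o : values) {
                    if (o.startsWith("O:")) {
                        joined.add(id + "," + u.substring(2) + "," + o.substring(2));
                    }
                }
            }
        });
        return joined;
    }

    public static void main(String[] args) {
        Map<Integer, String> users = Map.of(1, "asha", 2, "ravi");
        Map<Integer, String> orders = Map.of(1, "laptop", 2, "phone");
        System.out.println(reduce(mapAndShuffle(users, orders)));
        // [1,asha,laptop, 2,ravi,phone]
    }
}
```

When one input is small enough to fit in memory, a map-side join using the Distributed Cache (also listed above) avoids the shuffle entirely.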

SECTION 5
ADMINISTRATION – Information required at the developer level

  • Hardware Considerations – Tips and Tricks
  • Schedulers
  • Balancers
  • NameNode Failure and Recovery

SECTION 6
HBase

  • NoSQL vs SQL
  • CAP Theorem
  • Architecture
  • Configuration
  • Role of Zookeeper
  • Java Based APIs
  • MapReduce Integration
  • Performance Tuning

SECTION 7
HIVE

  • Architecture
  • Tables
  • DDL – DML – UDF – UDAF
  • Partitioning
  • Bucketing
  • Hive-HBase Integration
  • Hive Web Interface
  • Hive Server

SECTION 8
OTHER HADOOP ECOSYSTEM COMPONENTS

  • Pig (Pig Latin, Programming)
  • Sqoop (Need – Architecture, Examples)
  • Introduction to Components (Flume, Oozie, Ambari)

Career opportunities in Hadoop

  • Hadoop Developer
  • Big Data Architect – Hadoop
  • Hadoop Admin
  • Data Engineer – Hadoop
  • Data Warehousing Developer
Batches available on weekends and weekdays.
Register for a demo class.