About Big Data

Hadoop is an Apache project (i.e. open-source software) for storing and processing Big Data. Hadoop stores Big Data in a distributed, fault-tolerant manner on commodity hardware. Hadoop tools then perform parallel data processing over HDFS (Hadoop Distributed File System).
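As an illustration of the processing model behind Hadoop, here is a minimal word-count sketch of the map, shuffle, and reduce phases in plain Python. This runs without Hadoop; the function names are illustrative, not Hadoop APIs.

```python
from collections import defaultdict

# Map phase: emit (word, 1) pairs from each line of input.
def map_phase(lines):
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

# Shuffle phase: group values by key, as the framework does
# between the map and reduce stages.
def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: sum the counts for each word.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big ideas", "hadoop stores big data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["big"])  # "big" occurs three times across both lines
```

In real Hadoop, the map and reduce functions run in parallel on different nodes, and the shuffle moves data between them over the network; the logic per phase is the same.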

Why this course is required

Big Data is one of the fastest-growing and most promising fields in the IT market today. To take advantage of these opportunities, you need structured training with a curriculum that reflects current industry requirements and best practices. Besides a strong theoretical understanding, you need to work on real-world Big Data projects using various Big Data and Hadoop tools as part of a solution strategy.

Prerequisites for Big Data

Course Contents

    • Master the concepts of HDFS (Hadoop Distributed File System)
    • Understand YARN (Yet Another Resource Negotiator)
    • Understand MapReduce Framework
    • Implement complex business solutions using MapReduce
    • Learn data ingestion techniques using Sqoop and Flume
    • Perform ETL operations & data analytics using Pig and Hive
    • Implement partitioning, bucketing and indexing in Hive
    • Understand HBase, a NoSQL database in Hadoop, along with its architecture and mechanisms
    • Integrate HBase with Hive
    • Schedule jobs using Oozie
    • Implement best practices for Hadoop development
    • Understand Apache Spark and its Ecosystem
    • Learn how to work with RDD in Apache Spark
    • Work on real world Big Data Analytics Project
    • Work on a real-time Hadoop cluster
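Several of the topics above (MapReduce, Spark RDDs) share a functional style of chained transformations. As a rough sketch, assuming no Spark installation is available, the RDD pattern of map/filter/reduce can be imitated in plain Python; the class and method names mirror the PySpark RDD API for familiarity, but this is not Spark itself and runs on a single machine.

```python
from functools import reduce

# Minimal stand-in for a Spark RDD: transformations (map, filter)
# return a new collection; actions (reduce, collect) return results
# to the caller, playing the role of the Spark driver.
class MiniRDD:
    def __init__(self, data):
        self.data = list(data)

    def map(self, fn):
        return MiniRDD(fn(x) for x in self.data)

    def filter(self, pred):
        return MiniRDD(x for x in self.data if pred(x))

    def reduce(self, fn):
        return reduce(fn, self.data)

    def collect(self):
        return self.data

# Chained transformations, in the style of rdd.map(...).filter(...).reduce(...)
numbers = MiniRDD(range(1, 11))
total = (numbers
         .map(lambda x: x * x)          # square each element
         .filter(lambda x: x % 2 == 0)  # keep even squares
         .reduce(lambda a, b: a + b))   # sum them up
print(total)  # 4 + 16 + 36 + 64 + 100 = 220
```

Unlike this sketch, real Spark evaluates transformations lazily and distributes the data across a cluster, which is what the course's Spark module covers.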