Hadoop Operations and Cluster Management

Data has become an integral part of every organization, small or large, and keeping it in a usable form has become difficult. Hadoop is an open-source framework for distributed storage and processing of large data sets. The Hadoop platform is used to structure data and resolve formatting problems ahead of subsequent analysis. Hadoop administration is a specialization within the Hadoop framework that covers installing, configuring, securing, designing, testing and building Hadoop environments. This course is aimed at IT professionals who are new to Hadoop and want to acquire skills in administering a Hadoop cluster. The course focuses on Hadoop basics, installation, configuration, monitoring, backups and other cluster administration tasks. Experience with Linux/Unix administration is required.

Audience:

  • System administrators who have no experience with Hadoop and want to
    learn new skills
  • IT professionals who have some experience writing applications for Hadoop
    and want to dig deeper into the administration side

Prerequisites:

  • General troubleshooting knowledge
  • Solid Linux/Unix command line knowledge

Course Objectives:

Participants will learn:

  • How Hadoop solves Big Data problems
  • Hadoop cluster architecture and core components
  • Cluster management options
  • How to deploy and manage a cluster using Apache Ambari
  • Cluster sizing, hardware, software and network considerations
  • How to install, configure and manage components through hands-on labs

Course content:

Introduction

  • Big Data and Hadoop introduction
  • Hadoop architecture
  • Ecosystem and components

Hadoop Distributed File System

  • HDFS architecture, NameNode and DataNode
  • Manage HDFS storage
  • Configure HDFS storage
  • HDFS quotas and snapshots
  • WebHDFS
  • High Availability

YARN

  • Architecture, NodeManager and ResourceManager
  • Managing YARN using Ambari
  • Running YARN applications
  • Capacity Scheduler
  • Queues
  • High Availability

Ambari

  • Working with Ambari Web UI
  • Adding and removing worker nodes
  • Monitoring
  • Handling Alerts
  • Rack Awareness

Installation

  • Plan, configure and install a Hadoop cluster in the cloud
  • Best practices

Materials:

  • The course is 30% to 40% practical; virtual environments are provided for
    all participants
  • Printed learning materials
  • Bring your own laptop for the practical exercises
