Building Batch Data Analytics Solutions on AWS

In this course, you will learn how to build batch data analytics solutions using Amazon EMR, an enterprise-grade managed service for Apache Spark and Apache Hadoop. The course focuses on designing, implementing, securing, and operating batch analytics pipelines on AWS.

The training covers data collection, ingestion, cataloging, storage, and processing in the context of Spark and Hadoop workloads. You work with Amazon EMR and learn how it integrates with open-source projects such as Apache Hive, Hue, and HBase, as well as AWS services including AWS Glue and AWS Lake Formation. The course also covers the use of EMR Notebooks for analytics and machine learning workloads, along with best practices for security, performance, and cost management.

Course objectives

In this course, you will learn to:

  • Compare the features and benefits of data warehouses, data lakes, and modern data architectures
  • Design and implement a batch data analytics solution
  • Optimize data storage using appropriate techniques such as compression
  • Select and deploy suitable options for ingesting, transforming, and storing data
  • Choose appropriate instance and node types, clusters, auto scaling, and network topology for business use cases
  • Understand how storage and processing choices affect analysis and visualization
  • Secure data at rest and in transit
  • Monitor analytics workloads and remediate issues
  • Apply cost management best practices

Prerequisites

We recommend that attendees of this course have:

  • Completed either AWS Technical Essentials or Architecting on AWS
  • Completed either Building Data Lakes on AWS or Getting Started with AWS Glue

Students with at least one year of experience managing open-source data frameworks such as Apache Spark or Apache Hadoop will benefit from this course.

Target audience

This course is intended for:

  • Data platform engineers
  • Architects and operators who build and manage data analytics pipelines

Overview of data analytics and the data pipeline

You are introduced to common analytics use cases and learn how data pipelines support batch analytics workloads.

Introduction to Amazon EMR

This section covers Amazon EMR cluster architecture and cost management strategies, and includes an interactive demonstration of launching an EMR cluster.

Data analytics pipeline using Amazon EMR

You learn techniques for ingesting and storing data, including storage optimization strategies for batch analytics workloads.

High-performance batch analytics with Apache Spark

This section covers Apache Spark concepts and use cases on Amazon EMR, shows how to work with Spark shells and notebooks, and includes hands-on labs for low-latency analytics.

Batch processing with Apache Hive

You process and analyze batch data using Apache Hive on Amazon EMR and are introduced to Apache HBase through practical labs.

Serverless data processing

This section explores serverless processing and orchestration using AWS Glue and AWS Step Functions with EMR workloads.

Security and monitoring

You learn how to secure EMR clusters, monitor workloads, and troubleshoot performance and operational issues.

Designing batch data analytics solutions

The course concludes with designing batch analytics workflows and exploring modern data architectures on AWS.

Practical information

Duration: 1 day
Price: 9 900 NOK
Course level: Intermediate

FAQ

Is this a certification course?
No, this is a training course and does not lead to a formal certification.

Is the course hands-on?
Yes, the course includes presentations, interactive demos, practical labs, discussions, and class exercises.

Which AWS services do you work with in the course?
The course covers Amazon EMR, AWS Glue, AWS Lake Formation, and AWS Step Functions, among others.

Is the course suitable for participants without Spark or Hadoop experience?
Some experience with Apache Spark or Hadoop is recommended to get the full benefit of the course.

Does the course cover modern data architectures?
Yes, the course places batch analytics in the context of modern data architectures on AWS.
