In this course, you will learn how to build batch data analytics solutions using Amazon EMR, an enterprise-grade managed service for Apache Spark and Apache Hadoop. The course focuses on designing, implementing, securing, and operating batch analytics pipelines on AWS.
The training covers data collection, ingestion, cataloging, storage, and processing in the context of Spark and Hadoop workloads. You work with Amazon EMR and learn how it integrates with open-source projects such as Apache Hive, Hue, and HBase, as well as AWS services including AWS Glue and AWS Lake Formation. The course also covers the use of EMR Notebooks for analytics and machine learning workloads, along with best practices for security, performance, and cost management.
Course objectives
In this course, you will learn to:
Prerequisites
We recommend that attendees of this course have:
Students with at least one year of experience managing open-source data frameworks such as Apache Spark or Apache Hadoop will benefit from this course.
Target audience
This course is intended for:

You are introduced to common analytics use cases and how data pipelines are used to support batch analytics workloads.
This section covers Amazon EMR cluster architecture and cost management strategies, and includes an interactive demonstration of launching an EMR cluster.
You learn techniques for ingesting and storing data, including storage optimization strategies for batch analytics workloads.
The course covers Apache Spark concepts and use cases on Amazon EMR, working with Spark shells and notebooks, and includes hands-on labs for low-latency analytics.
You process and analyse batch data using Apache Hive on Amazon EMR and are introduced to Apache HBase through practical labs.
This section explores serverless processing and orchestration using AWS Glue and AWS Step Functions with EMR workloads.
You learn how to secure EMR clusters, monitor workloads, and troubleshoot performance and operational issues.
The course concludes with designing batch analytics workflows and exploring modern data architectures on AWS.

Duration: 1 day
Price: 9 900 NOK
Course level: Intermediate
Is this a certification course?
No, this is a training course and does not lead to a formal certification.
Is the course hands-on?
Yes, the course includes presentations, interactive demos, hands-on labs, discussions, and class exercises.
Which AWS services are used in the course?
The course covers, among others, Amazon EMR, AWS Glue, AWS Lake Formation, and AWS Step Functions.
Is the course suitable for participants with no Spark or Hadoop experience?
Some experience with Apache Spark or Hadoop is recommended to get the full benefit of the course.
Does the course cover modern data architectures?
Yes, the course places batch analytics in the context of modern data architectures on AWS.
