DP-100: Designing and Implementing a Data Science Solution on Azure

Learn how to operate machine learning solutions at cloud scale using Azure Machine Learning. This course teaches you to leverage your existing knowledge of Python and machine learning to manage data ingestion and preparation, model training and deployment, and machine learning solution monitoring in Microsoft Azure. The course has been updated with new modules on how to use Azure Machine Learning with open-source models from OpenAI, Hugging Face, Meta and others. In these new modules we will explore topics like transfer learning, fine-tuning, and prompt engineering for these pre-trained models, as well as how to deploy them. We will also explore how to perform MLOps and govern the models in production.

IMPORTANT NOTICE!

This course is focused on Azure and does not teach the basics of data science. However, the new modules on pre-trained models from companies like Hugging Face and Meta do explore the building blocks, with a focus on Large Language Models and Transformers, including technologies for NLP, LLMs, transfer learning and prompt engineering.

Audience

This course is designed for data scientists and data engineers with existing knowledge of Python and a basic understanding of machine learning frameworks like scikit-learn, who want to build and operate machine learning solutions in the cloud.

Prerequisites

Before attending this course, students must have fundamental knowledge of cloud computing concepts, and some experience in general data science and machine learning tools and techniques.

Specifically:

  • Creating cloud resources in Microsoft Azure
  • Using Python to explore and visualize data
  • A basic understanding of how to train and evaluate machine learning models using common frameworks like scikit-learn

If you are completely new to data science and machine learning, please complete Microsoft Azure AI Fundamentals first.

Course content

Module 1: Getting Started with Azure Machine Learning

In this module, you will learn how to provision an Azure Machine Learning workspace and use it to manage machine learning assets such as data, compute, model training code, logged metrics, and trained models. You will learn how to use the web-based Azure Machine Learning studio interface as well as the Azure Machine Learning SDK and developer tools like Visual Studio Code and Jupyter Notebooks to work with the assets in your workspace.

Lessons:

  • Introduction to Azure Machine Learning
  • Working with Azure Machine Learning

Module 2: No-Code Machine Learning

This module introduces the Automated Machine Learning and Designer visual tools, which you can use to train, evaluate, and deploy machine learning models without writing any code.

Lessons:

  • Automated Machine Learning
  • Azure Machine Learning Designer

Module 3: Running Experiments and Training Models

In this module, you will get started with experiments that encapsulate data processing and model training code and use them to train machine learning models.

Lessons:

  • Introduction to Experiments
  • Training and Registering Models

Module 4: Working with Data

Data is a fundamental element in any machine learning workload, so in this module, you will learn how to create and manage datastores and datasets in an Azure Machine Learning workspace, and how to use them in model training experiments.

Lessons:

  • Working with Datastores
  • Working with Datasets

Module 5: Working with Compute

One of the key benefits of the cloud is the ability to leverage compute resources on demand and use them to scale machine learning processes to an extent that would be infeasible on your own hardware. In this module, you'll learn how to manage experiment environments that ensure runtime consistency for experiments, and how to create and use compute targets for experiment runs.

Lessons:

  • Working with Environments
  • Working with Compute Targets

Module 6: Orchestrating Operations with Pipelines

Now that you understand the basics of running workloads as experiments that leverage data assets and compute resources, it's time to learn how to orchestrate these workloads as pipelines of connected steps. Pipelines are key to implementing an effective Machine Learning Operationalization (MLOps) solution in Azure, so you'll explore how to define and run them in this module.

Lessons:

  • Introduction to Pipelines

Module 7: Deploying and Consuming Models

Models are designed to help decision making through predictions, so they're only useful when deployed and available for an application to consume. In this module, you will learn how to deploy models for real-time inferencing and for batch inferencing.

Lessons:

  • Real-time Inferencing
  • Batch Inferencing
  • Continuous Integration and Delivery

Module 8: Training Optimal Models

By this stage of the course, you've learned the end-to-end process for training, deploying, and consuming machine learning models; but how do you ensure your model produces the best predictive outputs for your data? In this module, you'll explore how you can use hyperparameter tuning and automated machine learning to take advantage of cloud-scale compute and find the best model for your data.

Lessons:

  • Hyperparameter Tuning
  • Automated Machine Learning
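
As a rough illustration of the idea behind hyperparameter tuning, the sketch below runs a plain random search in Python. It is not the Azure Machine Learning API: the function validation_score is a hypothetical stand-in for "train a model and return a validation metric", and the parameter names and ranges are assumptions chosen only for the example.

```python
import random


def validation_score(learning_rate, regularization):
    # Hypothetical stand-in for training a model and measuring it on
    # validation data. The score is highest at learning_rate=0.1 and
    # regularization=0.01.
    return 1.0 - (learning_rate - 0.1) ** 2 - (regularization - 0.01) ** 2


def random_search(n_trials, seed=0):
    # Sample hyperparameter combinations at random and keep the best one.
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.001, 1.0),
            "regularization": rng.uniform(0.0, 0.1),
        }
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(n_trials=50)
print(best_params, round(best_score, 4))
```

In a cloud setting each trial would typically run as its own job on a compute target, so many combinations can be evaluated in parallel.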

Module 9: Responsible Machine Learning

Data scientists have a duty to ensure they analyse data and train machine learning models responsibly, respecting individual privacy, mitigating bias, and ensuring transparency. This module explores some considerations and techniques for applying responsible machine learning principles.

Lessons:

  • Differential Privacy
  • Model Interpretability
  • Fairness
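
To make two of these topics more concrete, here is a minimal Python sketch under stated assumptions. The first function adds Laplace noise to a count, which is a standard way to illustrate differential privacy. The second computes the difference in selection rates between groups, one common fairness measure. The function names and the toy data are hypothetical and are not taken from any Azure library.

```python
import random


def private_count(values, predicate, epsilon, rng):
    # Laplace mechanism for a counting query. Adding or removing one
    # record changes a count by at most 1, so the noise scale is 1/epsilon.
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # The difference of two exponential samples follows a Laplace distribution.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


def demographic_parity_difference(predictions, groups):
    # Largest gap in the rate of positive predictions between any two groups.
    rates = {}
    for group in set(groups):
        selected = [p for p, g in zip(predictions, groups) if g == group]
        rates[group] = sum(selected) / len(selected)
    return max(rates.values()) - min(rates.values())


ages = [23, 35, 41, 52, 29, 64, 38]
print(private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=random.Random(0)))
print(demographic_parity_difference([1, 0, 1, 1], ["a", "a", "b", "b"]))
```

A smaller epsilon gives stronger privacy but a noisier count, and a parity difference of zero means every group receives positive predictions at the same rate.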

Module 10: Monitoring Models

After a model has been deployed, it's important to understand how the model is being used in production, and to detect any degradation in its effectiveness due to data drift. This module describes techniques for monitoring models and their data.

Lessons:

  • Monitoring Models
  • Monitoring Data Drift
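
As an illustration of what detecting data drift can mean in practice, the sketch below compares a baseline sample of one feature with a more recent sample using the population stability index. This is one common drift measure, picked here as an assumption for the example. It is not necessarily the metric Azure Machine Learning uses, and the bin count and sample data are hypothetical.

```python
import math


def population_stability_index(baseline, current, n_bins=5):
    # Bin edges come from the baseline so both samples share the same bins.
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # A small floor avoids taking the logarithm of zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


baseline = list(range(100))
shifted = [v + 50 for v in baseline]
print(population_stability_index(baseline, baseline))
print(population_stability_index(baseline, shifted))
```

The index is zero when the two samples fall into the bins in identical proportions and grows as the distributions move apart.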

Module 11: Pre-trained models and Transfer Learning

Transfer learning is a technique that enables models to leverage pre-existing knowledge to solve new problems more efficiently. In the realm of natural language processing (NLP), transfer learning has become increasingly popular in recent years due to the advent of large language models (LLMs), such as GPT and BERT.

LLMs have revolutionized the field of NLP by learning from massive amounts of data to generate coherent and human-like text. However, training these models can be computationally expensive and time-consuming. Transfer learning provides a solution to this problem by enabling models to leverage pre-existing knowledge to learn new tasks more efficiently.

Lessons:

  • An understanding of transformer models
  • What is transfer learning
  • Transfer learning in Azure
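
The core idea of transfer learning can be sketched in a few lines of plain Python. In this toy example, pretrained_features is a hypothetical stand-in for a frozen pre-trained model, and only a small new "head" is trained on top of it for the new task. The data, learning rate and number of epochs are assumptions chosen for illustration. Real LLM fine-tuning uses far larger models and dedicated frameworks.

```python
import math


def pretrained_features(x):
    # Hypothetical frozen "pre-trained" layer. It is reused as-is and is
    # never updated during training below.
    return [x, x * x]


def train_head(inputs, labels, epochs=200, lr=0.1):
    # Train only a small logistic-regression head on the frozen features.
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, labels):
            features = pretrained_features(x)
            z = sum(w * f for w, f in zip(weights, features)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - y
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    features = pretrained_features(x)
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if z > 0 else 0


# New task: label is 1 when x is far from zero, 0 when it is close to zero.
inputs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [1, 1, 0, 0, 0, 1, 1]
weights, bias = train_head(inputs, labels)
print([predict(x, weights, bias) for x in (-3.0, 0.0, 3.0)])
```

Because the feature layer is reused unchanged, only three numbers are learned for the new task, which is why this approach needs much less data and compute than training from scratch.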

Module 12: Exploring pre-trained models from companies like Hugging Face in Azure

In this module we will explore different types of pre-trained models from companies like OpenAI, Hugging Face and Meta in Azure, and we will experiment with different ways of doing transfer learning with these models.

Lessons:

  • Explore different kinds of pre-trained models in Azure
  • Fine-tuning on a collection of pre-trained models in Azure
  • Deploying and monitoring fine-tuned models in Azure

Module 13: Prompt engineering

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs).

In this module you will learn different techniques for prompt engineering, and how to use prompt flow with pre-trained models like GPT.

We will explore prompt-based models like GPT-3.5 and GPT-4 from OpenAI. With prompt-based models, the user interacts with the model by entering a text prompt, to which the model responds with a text completion. This completion is the model’s continuation of the input text.

While these models are extremely powerful, their behaviour is also very sensitive to the prompt. This makes prompt construction an important skill to develop.
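
As a small illustration of prompt construction, the sketch below assembles a few-shot prompt in Python. It only builds the text and does not call any model or Azure service. The function name, the "Review:" and "Sentiment:" labels and the example reviews are all hypothetical choices made for this example.

```python
def build_few_shot_prompt(instruction, examples, query):
    # Start with the instruction, then show worked examples, and finish
    # with the new input and an open label for the model to complete.
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    instruction="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("The course was clear and well paced.", "positive"),
        ("The labs kept failing and nobody helped.", "negative"),
    ],
    query="Great instructor, I learned a lot.",
)
print(prompt)
```

Because the prompt ends with an open "Sentiment:" label, the natural continuation of the text is the label for the new review, which is what the completion is expected to supply.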

We will explore Azure Machine Learning prompt flow. Azure Machine Learning prompt flow is a development tool designed to streamline the entire development cycle of AI applications powered by Large Language Models (LLMs). As the momentum for LLM-based AI applications continues to grow across the globe, Azure Machine Learning prompt flow provides a comprehensive solution that simplifies the process of prototyping, experimenting, iterating, and deploying your AI applications.

Module 14: Fine-tuning and working with your own data

In this module you will learn different techniques for customizing pre-trained models like GPT: fine-tuning, embeddings, working with your own data, and using content filters to detect and prevent the output of harmful content.

You will learn how to customize the pre-trained models through REST APIs, the Python SDK, or the web-based interface in Azure OpenAI Studio.

We will also explore Azure OpenAI on your data, a feature that enables you to run supported chat models such as GPT-35-Turbo and GPT-4 on your data without needing to train or fine-tune models. Running models on your data enables you to chat on top of and analyse your data with greater accuracy and speed. By doing so, you can unlock valuable insights that can help you make better business decisions, identify trends and patterns, and optimize your operations. One of the key benefits of Azure OpenAI on your data is its ability to tailor the content of conversational AI.

Certification

This course is recommended as preparation for exam DP-100, which leads to the Azure Data Scientist Associate certification.