Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight

Module
11 Units

Intermediate

Data Engineer

Data Scientist

Azure HDInsight

In this module, you learn how to create real-time streaming data analytics pipelines and applications in the cloud by using Azure HDInsight with Apache Kafka and Apache Spark.

Learning objectives

By the end of this module, you will understand:

When to use Apache Spark and Kafka with HDInsight.
Spark Structured Streaming.
The architecture of a Kafka and Spark solution.
How to provision HDInsight, create a Kafka producer, and stream Kafka data to a Jupyter notebook.
How to replicate data to a secondary cluster.

Prerequisites

Before starting this module, you should be able to:

Successfully log in to the Azure portal.
Understand the Azure storage options.
Understand the Azure compute options.
Create and configure an HDInsight cluster in the Azure portal.

Introduction
Use HDInsight Spark and Kafka
Stream data with Apache Kafka
Describe Spark Structured Streaming
Create a Kafka and Spark architecture
Exercise - Provision HDInsight to perform advanced streaming data transformations
Exercise - Create the Kafka producer
Exercise - Stream Kafka data to a Jupyter notebook and window the data
Replicate data to a secondary cluster
Knowledge check
Summary