Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight
In this module, you learn how to create real-time streaming data analytics pipelines and applications in the cloud by using Azure HDInsight with Apache Kafka and Apache Spark.
Learning objectives
By the end of this module, you will understand:
- When to use Apache Spark and Kafka with HDInsight.
- Spark Structured Streaming.
- The architecture of a Kafka and Spark solution.
- How to provision HDInsight, create a Kafka producer, and stream Kafka data to a Jupyter notebook.
- How to replicate data to a secondary cluster.
Prerequisites
Before starting this module, you should be able to:
- Log in to the Azure portal.
- Understand the Azure storage options.
- Understand the Azure compute options.
- Create and configure an HDInsight cluster in the Azure portal.