Data streaming with AKS


Solution ideas

This article describes a solution idea. Your cloud architect can use this guidance to help visualize the major components for a typical implementation of this architecture. Use this article as a starting point to design a well-architected solution that aligns with your workload's specific requirements.

This article presents a solution for using Azure Kubernetes Service (AKS) to quickly process and analyze a large volume of streaming data from devices.

*Apache®, Apache Kafka, and Apache Spark are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks. Splunk is a registered trademark of Cisco.*

Architecture

Architecture diagram that shows how streaming data from devices is ingested, processed, and analyzed.


Dataflow

  1. Sensors generate data and stream it to Azure API Management.
  2. An AKS cluster runs microservices that are deployed as containers behind a service mesh. The containers are built by using a DevOps process. The container images are stored in Azure Container Registry.
  3. An ingest service in AKS stores data in Azure Cosmos DB.
  4. Asynchronously, an analysis service in AKS receives the data and streams it to Apache Kafka on Azure HDInsight.
  5. Data scientists use machine learning models on Azure HDInsight and the Splunk platform to analyze the data.
  6. A processing service in AKS processes the data and stores the results in Azure Database for PostgreSQL. The service also caches the data in Azure Cache for Redis.
  7. A web app that runs in Azure App Service creates visualizations of the results.
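The dataflow can be sketched as a small, self-contained simulation. This is a minimal illustration under stated assumptions, not the reference implementation: in-memory dictionaries stand in for Azure Cosmos DB, Azure Database for PostgreSQL, and Azure Cache for Redis, and an `asyncio.Queue` stands in for the asynchronous hand-off through Apache Kafka. The function names (`ingest`, `process`, `read_for_dashboard`) and the per-device average are hypothetical, and the HDInsight and Splunk analysis step is omitted.

```python
import asyncio
from statistics import mean
from typing import Optional

# Hypothetical in-memory stand-ins for the managed services in the dataflow.
raw_store: dict[str, list[float]] = {}  # stands in for Azure Cosmos DB
results_db: dict[str, float] = {}       # stands in for Azure Database for PostgreSQL
cache: dict[str, float] = {}            # stands in for Azure Cache for Redis


async def ingest(reading: dict, stream: asyncio.Queue) -> None:
    """Ingest service: persist the raw reading, then hand it off without waiting for processing."""
    raw_store.setdefault(reading["device_id"], []).append(reading["value"])
    await stream.put(reading)  # stands in for publishing to Apache Kafka


async def process(stream: asyncio.Queue) -> None:
    """Processing service: consume readings, store per-device averages, and cache them."""
    while True:
        reading = await stream.get()
        device = reading["device_id"]
        avg = mean(raw_store[device])
        results_db[device] = avg  # durable result store
        cache[device] = avg       # fast read path for the web app
        stream.task_done()


def read_for_dashboard(device_id: str) -> Optional[float]:
    """Web app read path: try the cache first, then fall back to the results database."""
    if device_id in cache:
        return cache[device_id]
    return results_db.get(device_id)


async def main() -> None:
    stream: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(process(stream))
    for device_id, value in [("sensor-1", 20.0), ("sensor-1", 22.0), ("sensor-2", 5.0)]:
        await ingest({"device_id": device_id, "value": value}, stream)
    await stream.join()  # wait until every queued reading has been processed
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass


asyncio.run(main())
print(read_for_dashboard("sensor-1"))  # 21.0
```

The point of the sketch is the decoupling: `ingest` returns as soon as the raw reading is stored and queued, so a slow consumer doesn't block ingestion, and the dashboard read path checks the cache before it queries the database.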

Components

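The solution uses the following key technologies:

  • Azure API Management
  • Azure Kubernetes Service (AKS)
  • Azure Container Registry
  • Azure Cosmos DB
  • Apache Kafka on Azure HDInsight
  • Splunk
  • Azure Database for PostgreSQL
  • Azure Cache for Redis
  • Azure App Service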

Scenario details

This solution is a good fit for scenarios that involve millions of data points from sources such as Internet of Things (IoT) devices, sensors, and vehicles. In such scenarios, processing the large volume of data is one challenge. Analyzing it quickly enough for organizations to gain timely insight into complex scenarios is another.

Containerized microservices in AKS form a key part of the solution. These self-contained services ingest and process the real-time data stream, and they scale as needed. Because containers are portable, the services can run in different environments and process data from multiple sources. The microservices are developed and deployed by using DevOps practices and continuous integration/continuous delivery (CI/CD), which shorten the development cycle.
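The article doesn't say how the microservices scale. On AKS, one common mechanism is the Kubernetes Horizontal Pod Autoscaler (HPA), which computes a desired replica count as `ceil(currentReplicas * currentMetricValue / desiredMetricValue)` and skips scaling while that ratio stays within a tolerance (0.1 by default in upstream Kubernetes). The following sketch reproduces only that core calculation. The function name and the min/max bounds are hypothetical, and the real controller also applies stabilization windows and scaling policies.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # hypothetical lower bound
    max_replicas: int = 10,  # hypothetical upper bound
    tolerance: float = 0.1,  # upstream Kubernetes default tolerance
) -> int:
    """Core HPA calculation: scale replicas in proportion to metric / target."""
    ratio = current_metric / target_metric
    # Inside the tolerance band, keep the current replica count to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# Three replicas averaging 90% CPU against a 60% target scale out to five.
print(desired_replicas(3, current_metric=90.0, target_metric=60.0))  # 5
```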

To store the ingested data, the solution uses Azure Cosmos DB. This database elastically scales throughput and storage, which makes it a good choice for large volumes of data.
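How the ingested data is partitioned largely determines whether Azure Cosmos DB can spread throughput and storage evenly. The article doesn't specify a partition key. One pattern that the Cosmos DB documentation describes is a synthetic partition key built by concatenating properties. The sketch below assumes a hypothetical item schema and a `/pk` partition key path, and combines the device ID with the UTC day so that each logical partition holds at most one day of readings from one device.

```python
import uuid
from datetime import datetime, timezone


def to_cosmos_item(device_id: str, value: float, ts: datetime) -> dict:
    """Build a JSON-serializable item with a hypothetical synthetic partition key."""
    utc = ts.astimezone(timezone.utc)
    return {
        "id": str(uuid.uuid4()),               # unique item ID
        "pk": f"{device_id}_{utc:%Y-%m-%d}",   # synthetic key: device ID + UTC day
        "deviceId": device_id,
        "value": value,
        "timestamp": utc.isoformat(),
    }


item = to_cosmos_item("sensor-42", 21.5, datetime(2024, 5, 1, 23, 30, tzinfo=timezone.utc))
print(item["pk"])  # sensor-42_2024-05-01
```

With the `azure-cosmos` Python SDK, an item like this could be written by calling `ContainerProxy.upsert_item` on a container whose partition key path is `/pk`. Queries that filter on one device and one day then target a single logical partition.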

The solution also uses Apache Kafka. This low-latency streaming platform handles high-throughput, real-time data feeds.
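Kafka achieves high throughput partly by splitting a topic into partitions that are written and read in parallel, and it guarantees ordering only within a partition. Keying each record by device ID keeps all readings from one device in the same partition, in order. The following sketch illustrates that routing. It uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses murmur2 for keyed records, so real partition assignments differ), and the partition count and event shape are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for the topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition. Stand-in hash: Kafka itself uses murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events):
    """Append (device_id, sequence) events to per-partition logs in arrival order."""
    partitions = defaultdict(list)
    for device_id, seq in events:
        partitions[partition_for(device_id)].append((device_id, seq))
    return partitions


events = [("truck-7", 1), ("car-3", 1), ("truck-7", 2), ("car-3", 2), ("truck-7", 3)]
parts = route(events)
for partition, log in sorted(parts.items()):
    print(partition, log)
```

Because the hash depends only on the key, every `truck-7` event lands in the same partition log in the order it arrived, while different devices can be spread across partitions and consumed in parallel.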

Another key component is Azure HDInsight, a managed cloud service for efficiently processing large amounts of data with popular open-source frameworks. HDInsight simplifies running big data frameworks such as Apache Spark at high volume and velocity in Azure. Splunk also supports the analysis: it creates visualizations from real-time data and provides business intelligence.
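A typical analysis of streaming device data aggregates readings over fixed time windows. In Spark Structured Streaming, that's usually expressed by grouping on `pyspark.sql.functions.window` and averaging. The plain-Python sketch below shows the same tumbling-window logic without Spark. The 60-second window and the `(device_id, epoch_seconds, value)` tuple shape are assumptions for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical window size


def tumbling_window_avg(readings, window_seconds=WINDOW_SECONDS):
    """Group (device_id, epoch_seconds, value) readings into fixed, non-overlapping
    windows and return the average value per (device_id, window_start)."""
    sums = defaultdict(lambda: [0.0, 0])  # (device_id, window_start) -> [total, count]
    for device_id, ts, value in readings:
        window_start = ts - ts % window_seconds  # align the timestamp to its window
        acc = sums[(device_id, window_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


readings = [
    ("sensor-1", 0, 10.0),
    ("sensor-1", 30, 20.0),
    ("sensor-1", 61, 40.0),
    ("sensor-2", 59, 7.0),
]
print(tumbling_window_avg(readings))
# {('sensor-1', 0): 15.0, ('sensor-1', 60): 40.0, ('sensor-2', 0): 7.0}
```

Unlike this batch sketch, a streaming engine also has to decide when a window is complete, for example by using watermarks to bound how late data can arrive.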

Potential use cases

This solution benefits the following areas:

  • Vehicle safety, especially in the automotive industry
  • Customer service in retail and other industries
  • Healthcare cloud solutions
  • Financial technology solutions in the finance industry
