Use Qlik to replicate mainframe and midrange data to Azure
This solution uses an on-premises instance of Qlik to replicate on-premises data sources to Azure in real time.
Note
Pronounce "Qlik" like "click."
Apache® and Apache Kafka® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Architecture
Download a Visio file of this architecture.
Workflow
Host agent: The host agent on the on-premises system captures change log information from Db2, Information Management System (IMS), and Virtual Storage Access Method (VSAM) data stores and passes it to the Qlik replication server.
Replication server: The Qlik replication server software passes the change log information to Kafka and Azure Event Hubs. In this example, Qlik is on-premises, but you can deploy it on a virtual machine in Azure.
Stream ingestion: Kafka and Event Hubs provide message brokers to receive and store change log information.
Kafka Connect: The Kafka Connect API receives data from Kafka to update Azure data stores like Azure Data Lake Storage, Azure Databricks, and Azure Synapse Analytics.
Data Lake Storage: Data Lake Storage is a staging area for the change log data.
Azure Databricks: Azure Databricks processes the change log data and updates the corresponding files on Azure.
Azure data services: Azure provides the following efficient data storage services.
Relational database services:
- SQL Server on Azure Virtual Machines
- Azure SQL Database
- Azure SQL Managed Instance
- Azure Database for PostgreSQL
- Azure Database for MySQL
- Azure Cosmos DB
There are many factors to consider when you choose a data storage service. Consider the type of workload, cross-database queries, two-phase commit requirements, the ability to access the file system, amount of data, required throughput, and latency.
Azure Cosmos DB: Azure Cosmos DB is a NoSQL database that provides quick response, automatic scalability, and guaranteed speed at any scale.
Azure Synapse Analytics: Azure Synapse Analytics is an analytics service that combines data integration, enterprise data warehousing, and big data analytics. Use it to query data by using either serverless or dedicated resources at scale.
Microsoft Fabric: Microsoft Fabric is an all-in-one analytics solution for enterprises. It covers everything from data movement to data science, real-time analytics, and business intelligence. It provides a comprehensive suite of services, including data lake, data engineering, and data integration.
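To illustrate the Azure Databricks step in the preceding workflow, the following minimal Python sketch applies an ordered set of change records to a keyed snapshot. It is plain Python, not Qlik or Azure Databricks code. The `ChangeRecord` fields (`sequence`, `operation`, `key`, `row`) are hypothetical names chosen for illustration, and the actual record layout that Qlik emits might differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One captured change. All field names here are hypothetical."""
    sequence: int                          # position in the source change log
    operation: str                         # "insert", "update", or "delete"
    key: str                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # changed column values; None for deletes


def apply_changes(target: Dict[str, Dict[str, Any]],
                  changes: Iterable[ChangeRecord]) -> Dict[str, Dict[str, Any]]:
    """Return a copy of the snapshot with the changes applied in log order."""
    result = {key: dict(row) for key, row in target.items()}
    for change in sorted(changes, key=lambda c: c.sequence):
        if change.operation in ("insert", "update"):
            # Upsert: merge the changed columns into the current row.
            current = result.get(change.key, {})
            result[change.key] = {**current, **(change.row or {})}
        elif change.operation == "delete":
            result.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.operation}")
    return result


snapshot = {"1001": {"name": "Ada", "balance": 50}}
changes = [
    ChangeRecord(2, "update", "1001", {"balance": 75}),
    ChangeRecord(1, "insert", "1002", {"name": "Lin", "balance": 10}),
    ChangeRecord(3, "delete", "1002"),
]
updated = apply_changes(snapshot, changes)
```

Sorting by the log sequence number before applying the records is one way to keep the target consistent when messages arrive out of order. At production scale, a merge operation in the processing engine would typically take the place of this in-memory loop.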
Components
This architecture consists of several Azure cloud services and is divided into four categories of resources: networking and identity, application, storage, and monitoring. The following sections describe the services for each resource and their roles.
Networking and identity
When you design application architecture, it's crucial to prioritize networking and identity components to help ensure security, performance, and manageability during interactions over the public internet or private connections.
Azure ExpressRoute extends your on-premises networks into cloud services provided by Microsoft over a private connection from a connectivity provider. Use ExpressRoute to establish connections to cloud services such as Azure and Microsoft 365.
Azure VPN Gateway is a specific type of virtual network gateway that sends encrypted traffic between an Azure virtual network and an on-premises location over the public internet.
Microsoft Entra ID is an identity and access management service that can synchronize with an on-premises Active Directory.
Application
Azure provides managed services that support more secure, scalable, and efficient application deployment. This architecture uses application tier services that can help you optimize your application architecture.
Event Hubs is a big data streaming platform and event ingestion service that can store Db2, IMS, and VSAM change data messages. It can receive and process millions of messages per second. You can transform and store event hub data by using a real-time analytics provider or a custom adapter.
Apache Kafka is an open-source distributed event streaming platform that's used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. It integrates with Qlik Data Integration to store Db2 change data.
Data Lake Storage provides a data lake for storing the processed on-premises change log data.
Azure Databricks is a cloud-based data engineering tool built on Apache Spark. It can process and transform massive quantities of data. You can explore the data by using machine learning models. Jobs can be written in R, Python, Java, Scala, and Spark SQL.
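Event Hubs also exposes an Apache Kafka-compatible endpoint, so a Kafka client can read or write change messages without a separate Kafka cluster. The following sketch builds the client settings that such a connection typically needs, using librdkafka-style property names. The namespace `contoso-cdc` and the connection string placeholder are assumptions for illustration. The sketch shows a generic Kafka client configuration, not the Qlik endpoint setup.

```python
from typing import Dict


def event_hubs_kafka_config(namespace: str, connection_string: str) -> Dict[str, str]:
    """Build Kafka client settings for the Event Hubs Kafka endpoint."""
    return {
        # Event Hubs serves the Kafka protocol on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The user name is the literal text "$ConnectionString".
        "sasl.username": "$ConnectionString",
        # The password is the namespace connection string itself.
        "sasl.password": connection_string,
    }


# Hypothetical namespace and placeholder secret, for illustration only.
config = event_hubs_kafka_config("contoso-cdc", "<event-hubs-connection-string>")
```

In practice, load the connection string from a secret store instead of hardcoding it in source code.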
Storage and databases
This architecture addresses scalable and more secure cloud storage as well as managed databases for flexible and intelligent data management.
Azure Storage is a set of massively scalable and more secure cloud services for data, apps, and workloads. It includes Azure Files, Azure Table Storage, and Azure Queue Storage. Azure Files is an effective tool for migrating mainframe workloads.
Azure SQL is a family of SQL cloud databases that provides flexible options for application migration, modernization, and development. This family includes SQL Server on Azure Virtual Machines, Azure SQL Managed Instance, and Azure SQL Database.
Azure Cosmos DB is a fully managed NoSQL database service that has open-source APIs for MongoDB and Cassandra. You can use it to migrate mainframe nontabular data to Azure.
Azure Database for PostgreSQL is a fully managed, intelligent, and scalable PostgreSQL that has native connectivity with Azure services.
Azure Database for MySQL is a fully managed, scalable MySQL database.
Monitoring
Monitoring tools provide comprehensive data analysis and valuable insights into application performance.
Azure Monitor is a comprehensive solution for collecting, analyzing, and acting on telemetry from cloud and on-premises environments. It includes:
Application Insights, for analyzing and presenting telemetry.
Azure Monitor Logs, which collects and organizes log and performance data from monitored resources. You can combine data from sources like Azure platform logs, virtual machine agents, and application performance into one workspace for analysis. The query language enables analysis of your records.
Log Analytics, which can query Azure Monitor Logs. A powerful query language lets you join data from multiple tables, aggregate large sets of data, and perform complex operations with minimal code.
Alternatives
The preceding diagram shows Qlik installed on-premises. This approach follows the recommended practice of keeping Qlik close to the on-premises data sources. An alternative is to install Qlik in the cloud on an Azure virtual machine.
Qlik Data Integration can deliver data directly to Azure Databricks without going through Kafka or an event hub.
Qlik Data Integration can't replicate data directly to Azure Cosmos DB, but you can integrate Azure Cosmos DB with an event hub by using event-sourcing architecture.
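As a rough illustration of the event-sourcing alternative, the following sketch turns one change message read from an event hub into an append-only event document that could be stored in Azure Cosmos DB. The message layout (`table`, `key`, `sequence`, `operation`, `row`) and the `entityId` partition key are hypothetical. The only service requirement that the sketch relies on is that each item carries a string `id`.

```python
import json
from typing import Any, Dict


def to_event_document(event_body: bytes) -> Dict[str, Any]:
    """Turn one change message into an append-only event document."""
    change = json.loads(event_body)
    return {
        # Each event gets its own id, so no event overwrites an earlier one.
        "id": f'{change["table"]}:{change["key"]}:{change["sequence"]}',
        # Hypothetical partition key: all events for one source row share it.
        "entityId": f'{change["table"]}:{change["key"]}',
        "operation": change["operation"],
        "data": change.get("row"),
    }


# A hypothetical change message, encoded the way an event body might arrive.
message = json.dumps({
    "table": "ACCOUNTS", "key": "1001", "sequence": 42,
    "operation": "update", "row": {"balance": 75},
}).encode("utf-8")
document = to_event_document(message)
```

Because every change is stored as its own document, the current state of a row can be rebuilt by reading the events for its `entityId` in sequence order.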
Scenario details
Many organizations use mainframe and midrange systems to run demanding and critical workloads. Most applications use shared databases, often across multiple systems. In this environment, modernizing to the cloud means that on-premises data must be provided to cloud-based applications. Therefore, data replication becomes an important modernization tactic.
The Qlik Data Integration platform includes Qlik Replicate, which performs data replication. It uses change data capture to replicate on-premises data stores in real time to Azure. The change data can come from Db2, IMS, and VSAM change logs. This replication technique eliminates inconvenient batch bulk loads.
Potential use cases
This solution might be appropriate for:
Hybrid environments that require replication of data changes from a mainframe or midrange system to Azure databases.
Online database migration from Db2 to an Azure SQL database with little downtime.
Data replication from various on-premises data stores to Azure for consolidation and analysis.
Considerations
These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Well-Architected Framework.
Reliability
Reliability helps ensure that your application can meet the commitments that you make to your customers. For more information, see Design review checklist for Reliability.
Qlik Data Integration can be configured in a high-availability cluster.
The Azure database services support zone redundancy and can be designed to fail over to a secondary node during a maintenance window or if an outage occurs.
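A database failover can briefly interrupt connections, so a client that writes replicated data usually benefits from retry logic. The following sketch shows one common pattern, retry with exponential backoff. The function name, the default values, and the use of `ConnectionError` to stand in for a transient fault are assumptions for illustration, not part of any Azure or Qlik API.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retry(operation: Callable[[], T], attempts: int = 4,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry an operation, doubling the wait after each failed attempt."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the failure
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Simulated write that fails twice, as it might while a node fails over.
calls = {"count": 0}
delays: List[float] = []


def flaky_write() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("node is failing over")
    return "committed"


# Record the delays instead of sleeping so that the example runs instantly.
outcome = call_with_retry(flaky_write, sleep=delays.append)
```

Many database drivers and SDKs include their own retry policies. Prefer those built-in policies over a hand-written loop when they are available.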
Security
Security provides assurances against deliberate attacks and the misuse of your valuable data and systems. For more information, see Design review checklist for Security.
ExpressRoute provides a private and efficient connection to Azure from on-premises, but you can use a site-to-site VPN instead.
Azure resources can be authenticated by using Microsoft Entra ID, and permissions are managed through role-based access control.
Azure database services support various security options, such as:
Data encryption at rest.
Dynamic data masking.
Always-encrypted databases.
For more information, see Azure security documentation.
Cost Optimization
Cost Optimization focuses on ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Design review checklist for Cost Optimization.
Use the Azure pricing calculator to estimate costs for your implementation.
Operational Excellence
Operational Excellence covers the operations processes that deploy an application and keep it running in production. For more information, see Design review checklist for Operational Excellence.
You can combine Application Insights and Log Analytics features to monitor the health of Azure resources. You can set alerts so that you can manage problems proactively.
Performance Efficiency
Performance Efficiency refers to your workload's ability to scale to meet user demands efficiently. For more information, see Design review checklist for Performance Efficiency.
Azure Databricks, Data Lake Storage, and the Azure database services have autoscaling capabilities. For more information, see Autoscaling.
Contributors
Microsoft maintains this article. The following contributors wrote this article.
Principal authors:
- Nithish Aruldoss | Engineering Architect
- Ashish Khandelwal | Principal Engineering Architecture Manager
To see nonpublic LinkedIn profiles, sign in to LinkedIn.
Next steps
- Qlik Data Integration platform
- Unleash new Azure analytics initiatives (PDF data sheet)
- What is ExpressRoute?
- Event Hubs: A real-time data streaming platform with native Apache Kafka support
- Introduction to Storage
- What is Azure SQL Database?
- Azure Cosmos DB
- Introduction to Application Insights with OpenTelemetry
- Azure Monitor Logs overview
- Log queries in Azure Monitor
- Contact us (select to create email)