Not able to process the blob data

Mani 45 Reputation points
2025-11-26T21:27:32.3633333+00:00

Hi,

We are extracting data from an on-premises DB2 LUW database and loading it into Azure Database for PostgreSQL. The source tables contain BLOB columns used to store PDF files, and the copy process gets stuck on tables with BLOB attributes. We are using the 'Copy command' write method in the sink, with both the write batch size and the degree of copy parallelism set to 1, but the load is still not progressing. We connect to the source DB2 database and the target Azure PostgreSQL database through a self-hosted integration runtime.

Regards,

Vijay

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. VRISHABHANATH PATIL 1,725 Reputation points Microsoft External Staff Moderator
    2025-11-28T08:15:17.8666667+00:00

    Hi @Mani,

    Thank you for reaching out on Microsoft Q&A and for the detailed explanation of the issue. I've reviewed your setup and want to clarify what is causing this behavior, along with a solution that should give you a stable and predictable loading experience going forward.

    Why the load is getting stuck

    Your configuration is correct — the issue isn’t related to batch size, parallelism, or the Self-Hosted Integration Runtime. The challenge comes from how the DB2 connector handles large BLOB (PDF) columns. When ADF pulls BLOB data from DB2 LUW, it cannot stream these large objects efficiently. Instead, the connector internally buffers them, which often results in the pipeline appearing “stuck,” especially when the PostgreSQL sink uses the COPY write method.

    So, the behavior you're seeing is due to a connector limitation, not a misconfiguration on your side.

    Recommended approach

    To ensure the data loads consistently and without hanging, the most reliable pattern is:

    -- Extract the BLOBs (PDFs) from DB2 into files on the source side. Use DB2 utilities or a simple script/application to export each BLOB into a file. This avoids the connector's streaming limitation.
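
    As a rough illustration of that export step, here is a minimal Python sketch, assuming the ibm_db_dbi driver is installed on the SHIR machine and using a hypothetical MYSCHEMA.DOCUMENTS table with DOC_ID and PDF_BLOB columns (adjust the connection string, table, and output folder to your environment):

    ```python
    # Export each DB2 BLOB to its own PDF file on the self-hosted IR machine.
    # Connection details, table, and column names below are placeholders.
    import os
    import ibm_db_dbi

    conn = ibm_db_dbi.connect(
        "DATABASE=MYDB;HOSTNAME=db2host;PORT=50000;PROTOCOL=TCPIP;UID=user;PWD=secret;"
    )
    cur = conn.cursor()
    cur.execute("SELECT DOC_ID, PDF_BLOB FROM MYSCHEMA.DOCUMENTS")

    out_dir = r"D:\blob_export"
    os.makedirs(out_dir, exist_ok=True)

    row = cur.fetchone()
    while row is not None:
        doc_id, pdf_data = row
        # Some drivers hand back bytearray/memoryview; normalize to bytes before writing.
        with open(os.path.join(out_dir, f"{doc_id}.pdf"), "wb") as f:
            f.write(bytes(pdf_data))
        row = cur.fetchone()

    cur.close()
    conn.close()
    ```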

    -- Upload those extracted PDF files to Azure Blob Storage. This can be done directly via ADF using the same Self-Hosted IR.
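
    If you prefer a script over an ADF binary copy for this step, here is a minimal sketch using the azure-storage-blob Python SDK; the container name pdf-archive, the connection-string environment variable, and the export folder are assumptions:

    ```python
    # Upload every exported PDF into an Azure Blob Storage container.
    # Connection string, container name, and folder path are placeholders.
    import os
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    container = service.get_container_client("pdf-archive")

    export_dir = r"D:\blob_export"
    for name in os.listdir(export_dir):
        if not name.lower().endswith(".pdf"):
            continue
        with open(os.path.join(export_dir, name), "rb") as data:
            # overwrite=True keeps the upload idempotent if the step is rerun.
            container.upload_blob(name=name, data=data, overwrite=True)
    ```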

    -- Load only the metadata into Azure Database for PostgreSQL. Run your ADF copy as usual for the remaining table columns, excluding the BLOB column, and store a file path or reference instead of the binary content.
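
    For illustration, a minimal sketch of the target-side shape using psycopg2, where the documents table, its columns, and the connection details are hypothetical and blob_url holds the reference to the file uploaded in the previous step:

    ```python
    # Store a reference (URL or path) to the PDF in Blob Storage instead of the binary itself.
    # Table name, columns, and connection details are placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="myserver.postgres.database.azure.com",
        dbname="appdb",
        user="adfuser",
        password="secret",
        sslmode="require",
    )
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS documents (
                doc_id   bigint PRIMARY KEY,
                title    text,
                blob_url text NOT NULL  -- e.g. https://<account>.blob.core.windows.net/pdf-archive/123.pdf
            )
        """)
        cur.execute(
            "INSERT INTO documents (doc_id, title, blob_url) VALUES (%s, %s, %s) "
            "ON CONFLICT (doc_id) DO UPDATE SET blob_url = EXCLUDED.blob_url",
            (123, "Invoice 123", "https://mystorage.blob.core.windows.net/pdf-archive/123.pdf"),
        )
    conn.close()
    ```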

    This pattern offloads the large files into Blob Storage, which handles binary content very efficiently, while keeping PostgreSQL clean and performant.

    If storing BLOBs directly in PostgreSQL is mandatory

    You may switch the sink write method to Auto instead of COPY, although performance may vary with very large PDFs. The file-based approach above remains the most stable and scalable option.

    In summary

    Nothing is misconfigured in your pipeline — the behavior is due to a known limitation in how the DB2 connector handles large BLOBs. The recommended extract-to-file approach removes this bottleneck completely and ensures smooth, reliable data movement.

