Azure Data Factory (ADF): the Data Flow activity shows "Successful" after about 52 minutes writing to the sink, but the pipeline trigger run still shows "In Progress" until the duration reaches 10 hours

Khang Nguyen Ba 0 Reputation points
2024-11-11T14:48:26.8166667+00:00

I'm having an issue when using an ADF pipeline to execute an ADF Data Flow.

  • Dataflow name: df_attachment
  • Pipeline name: pl_ingest_attachment.

I'm using pl_ingest_attachment to trigger df_attachment.

But I see the following issue:

  • Inside the Data Flow, the write-to-sink status is successful, taking about 52 minutes to write to the sink (image 1).
  • Outside, the ADF pipeline trigger run still shows "In Progress" until the duration reaches 10 hours, and then it times out. Can you help explain why?
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

2 answers

  1. Smaran Thoomu 16,965 Reputation points Microsoft Vendor
    2024-11-11T17:33:23.9133333+00:00

    Hi @Khang Nguyen Ba
    Welcome to Microsoft Q&A platform and thanks for posting your query here.

    It looks like you are facing an issue where your Data Flow (df_attachment) successfully writes to the sink in 52 minutes, but the pipeline (pl_ingest_attachment) that triggers it still shows as "In Progress" for up to 10 hours before timing out.

    Potential Causes and Solutions:

    1. If there are other activities in your pipeline after the Data Flow, they may be causing the pipeline to remain "In Progress." Check if any subsequent tasks are running long or waiting for a condition.
    2. Sometimes, the Integration Runtime (IR) used for executing your Data Flow can experience delays or communication issues, causing the pipeline trigger to not recognize the Data Flow completion promptly.
    3. If the pipeline or its activities have long timeout settings, a run that never registers completion can stay "In Progress" until the timeout expires, even after the Data Flow has finished writing.
    4. Azure resources or connectivity issues could also lead to slower completion signals from the Data Flow to the main pipeline.
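
As a rough way to check the first cause, you could export the activity runs for the pipeline run (from the Monitor view or the activity-run query API) and list the ones that have not reached a terminal status. The sketch below is only an illustration: the keys `activityName` and `status`, the set of terminal statuses, and the sample records are assumptions, not values taken from this thread.

```python
from typing import Iterable, List

# Assumed set of statuses after which a run no longer changes.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled", "TimedOut"}


def unfinished_activities(activity_runs: Iterable[dict]) -> List[str]:
    """Return the names of activities whose status is not terminal."""
    return [
        run["activityName"]
        for run in activity_runs
        if run.get("status") not in TERMINAL_STATUSES
    ]


# Hypothetical records for illustration only.
sample_runs = [
    {"activityName": "df_attachment", "status": "InProgress"},
]

print(unfinished_activities(sample_runs))
```

If the only name returned is the Data Flow activity itself, then later activities are not what is keeping the pipeline open, and the other causes are worth a closer look.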

    In general, the lifecycle of an ADF pipeline that calls a Data Flow activity involves several stages, including initialization, execution, and completion. During initialization, the pipeline and its associated activities are validated and prepared for execution. During execution, the Data Flow activity is executed and data is processed according to the transformations and settings specified in the activity. Finally, during completion, the pipeline and its activities are cleaned up and any output or error messages are logged.

    To better understand the issue, could you please provide more information about your pipeline trigger and the settings you have configured? Additionally, have you checked the logs for any error messages or warnings that may provide more insight into the issue?

    I hope this helps. Please let us know if you have any further questions.


  2. Smaran Thoomu 16,965 Reputation points Microsoft Vendor
    2024-11-12T10:21:23.9166667+00:00

    @Khang Nguyen Ba, we appreciate your feedback and apologize for not providing a satisfactory answer to your query earlier.

    To better understand the issue, could you please provide more information about your pipeline and the settings you have configured? Additionally, have you checked the logs for any error messages or warnings that may provide more insight into the issue?

    Regarding the potential causes and solutions I mentioned earlier, since you only have one activity in your pipeline, we can eliminate the first possibility. However, it is possible that there are delays or communication issues between the Integration Runtime (IR) used for executing your Data Flow and the pipeline trigger. This can happen due to various reasons such as network latency, resource contention, or IR performance issues. To troubleshoot this, you can try the following steps:

    • Check the IR performance metrics and logs to see if there are any errors or warnings that may indicate issues with the IR.
    • Check the resource utilization of the IR and the pipeline trigger. If either of them is under heavy load, it can cause delays in the completion signals.
    • If feasible, run the Data Flow with a smaller dataset to see if the issue persists. This can help determine if the problem is related to data volume or processing time.
    • Try using a different IR to execute your Data Flow and see if the issue persists.
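
When comparing runs across datasets or IRs as suggested above, it can help to put a number on the delay: how long the activity stayed open after the sink write finished. The following is a minimal sketch; the function name and all timestamps are hypothetical, chosen only to mirror the 52-minute and 10-hour figures from the question.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def lag_after_sink(
    sink_finished: datetime,
    activity_ended: Optional[datetime],
    now: datetime,
) -> timedelta:
    """Time the activity stayed open after the sink write finished.

    If the activity has not ended yet, the lag is measured up to `now`.
    """
    end = activity_ended if activity_ended is not None else now
    return end - sink_finished


# Hypothetical timeline: sink done 52 minutes in, activity still open at 10 hours.
start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
sink_done = start + timedelta(minutes=52)
checked_at = start + timedelta(hours=10)

print(lag_after_sink(sink_done, None, checked_at))
```

A lag that stays large with a smaller dataset or a different IR would suggest the delay is not tied to data volume or to one particular runtime.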

    Regarding the timeout settings, since you have set the timeout to 12 hours, it is possible that the pipeline trigger is still showing as "In Progress" even after the Data Flow completes. You can try reducing the timeout settings to a more reasonable value and see if it helps.
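
For reference, the activity timeout is written as a D.HH:MM:SS string, so 12 hours is `0.12:00:00`. The sketch below parses that format and derives a tighter value from an observed run time. The helper names and the headroom factor of 2 are assumptions for illustration, not an official recommendation.

```python
import re
from datetime import timedelta

# D.HH:MM:SS, for example "0.12:00:00" for 12 hours.
_TIMEOUT_RE = re.compile(r"^(\d+)\.(\d{2}):(\d{2}):(\d{2})$")


def parse_timeout(value: str) -> timedelta:
    """Convert a D.HH:MM:SS timeout string into a timedelta."""
    match = _TIMEOUT_RE.match(value)
    if match is None:
        raise ValueError(f"not a D.HH:MM:SS timeout: {value!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def format_timeout(delta: timedelta) -> str:
    """Convert a timedelta back into a D.HH:MM:SS timeout string."""
    total = int(delta.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}.{hours:02d}:{minutes:02d}:{seconds:02d}"


def suggest_timeout(observed: timedelta, headroom: float = 2.0) -> str:
    """Scale an observed duration by a headroom factor (hypothetical rule)."""
    return format_timeout(observed * headroom)


print(parse_timeout("0.12:00:00"))
print(suggest_timeout(timedelta(minutes=52)))
```

With a shorter timeout, a run that never reports completion would fail sooner instead of staying "In Progress" for many hours, which makes the underlying problem easier to notice and retry.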

    If the issue continues, consider reaching out to MS Support. They can provide deeper insights into the ADF service and help diagnose any underlying issues.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further queries, do let us know.

