gRPC connection errors in copy activity

Question

Hamza Outa 61

I have a copy data activity in ADF that copies from Google BigQuery to Blob storage. Occasionally I get one of these errors:

"grpc_message":"End of TCP stream"
"grpc_message":"keepalive watchdog timeout"

We run multiple copy activities in parallel through a ForEach loop, and sometimes this happens: of the 5 parallel copies, only 1 fails every so often.

We checked on BigQuery and don't see any issues.

Why is this happening and how can it be fixed?

Here's the full error:

ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The file operation is failed, upload file failed at path: 'CSV_NAME_00060.csv'.,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.DI.Connector.GoogleBigQuery.ExceptionUtils.GoogleBigQueryConnectorException,Message=Error occurred when reading, ErrorCode: UnknownError,Source=Microsoft.DI.Connector.GoogleBigQuery,''Type=Grpc.Core.RpcException,Message=Status(StatusCode="Unavailable", Detail="End of TCP stream", DebugException="Grpc.Core.Internal.CoreErrorDetailException: {"created":"@1726772029.383000000","description":"Error received from peer ipv4:<IP:PORT>","file":"..\..\..\src\core\lib\surface\call.cc","file_line":953,"grpc_message":"End of TCP stream","grpc_status":14}"),Source=mscorlib,''Type=Grpc.Core.Internal.CoreErrorDetailException,Message={"created":"@1726772029.383000000","description":"Error received from peer ipv4:<same_IP:same_PORT>","file":"..\..\..\src\core\lib\surface\call.cc","file_line":953,"grpc_message":"End of TCP stream","grpc_status":14},Source=,'

phemanth 15,755 Reputation points Microsoft External Staff Moderator

2024-09-23T11:23:13.7433333+00:00

@Hamza Outa We haven't heard back from you on the last response and are checking in to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.

phemanth 15,755 Reputation points Microsoft External Staff Moderator

2024-09-24T13:43:58.09+00:00

@Hamza Outa Just checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.

Answer 1

@Hamza Outa

Thanks for reaching out to Microsoft Q&A.

It sounds like you’re encountering intermittent network issues when copying data from Google BigQuery to Blob storage. The errors “End of TCP stream” and “keepalive watchdog timeout” suggest that the gRPC connection between your ADF pipeline and BigQuery is being interrupted.

Possible Causes

Network Instability: Temporary network issues can cause the gRPC connection to drop.

BigQuery Connection Limits: BigQuery might be closing idle connections or hitting connection limits.

Resource Contention: Running multiple parallel copy activities might be causing resource contention, leading to timeouts.

Solutions

Retry Logic: Implement retry logic in your ADF pipeline to handle transient errors. This can help mitigate the impact of temporary network issues.
Increase Timeout Settings: Adjust the timeout settings for your gRPC connections to allow more time for the operations to complete.
Optimize Parallelism: Reduce the number of parallel copy activities to see if it alleviates the issue. You can gradually increase the parallelism to find a balance.
Monitor Network Health: Use network monitoring tools to check for any instability or issues in your network that might be causing the interruptions.
BigQuery API Usage: Ensure you are using the most efficient API for your use case. The BigQuery Storage Read API might offer better performance and reliability for large data transfers.

Example Retry Logic in ADF

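{
    "name": "CopyData",
    "type": "Copy",
    "policy": {
        "retry": 3,
        "retryIntervalInSeconds": 30
    },
    "typeProperties": {
        "source": {
            "type": "GoogleBigQuerySource",
            "query": "SELECT * FROM your_table"
        },
        "sink": {
            "type": "BlobSink"
        }
    }
}

This configuration retries the copy operation up to three times, waiting 30 seconds between attempts. The snippet is abbreviated: the inputs and outputs dataset references are omitted, and the blob path is set on the sink dataset rather than on the sink itself. The source type shown is the one used by the legacy Google BigQuery connector; if your linked service uses the V2 connector, the source type is GoogleBigQueryV2Source instead.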
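
Example Parallelism Limit in ADF

To try the parallelism suggestion above, you could cap how many iterations the ForEach activity runs at once with its batchCount property. The following is only a sketch: the activity name ForEachTable, the pipeline parameter tables, and the value 2 are assumptions for illustration, and the inner copy activity is abbreviated to its name and type.

```json
{
    "name": "ForEachTable",
    "type": "ForEach",
    "typeProperties": {
        "items": {
            "value": "@pipeline().parameters.tables",
            "type": "Expression"
        },
        "isSequential": false,
        "batchCount": 2,
        "activities": [
            {
                "name": "CopyData",
                "type": "Copy"
            }
        ]
    }
}
```

With isSequential set to false, batchCount is the upper limit on concurrent iterations. If the errors stop at a lower value, you can raise it step by step to find a balance between throughput and stability.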

Hope this helps. Do let us know if you have any further queries.
