Failing to copy a large dataset from Salesforce using the new Salesforce V2 connector in Azure Data Factory
I am trying to migrate the Salesforce legacy linked service to the new Salesforce V2 linked service in Azure Data Factory. The test connection is successful and I am able to copy smaller datasets using the new Salesforce V2 linked service. But when I try to copy larger datasets (around 50 GB in my scenario) from Salesforce using the new linked service connector, the copy activity fails with the following exception.
Failure happened on 'Source' side. ErrorCode=SalesforceAPITaskCancelException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Getting an unexpected TaskCanceledException when sending request to Salesforce API even after multiple retries!,Source=Microsoft.Connectors.Salesforce,''Type=System.Threading.Tasks.TaskCanceledException,Message=A task was canceled.,Source=mscorlib
Azure Data Factory
-
phemanth 11,280 Reputation points • Microsoft Vendor
2024-09-18T10:27:46.1266667+00:00 Thanks for posting your question in the Microsoft Q&A forum.
The error message indicates that the copy activity is failing with a TaskCanceledException when trying to copy larger datasets (around 50 GB) from Salesforce using the new linked service connector. This suggests that the issue is related to timeouts or throttling of the Salesforce API.
- Increase Timeout Values: Adjust the timeout settings in your Azure Data Factory copy activity to allow more time for the data transfer (see the sketch after this list). This can help prevent the task from being canceled due to timeouts.
- Check Salesforce Bulk API Limits: Ensure that you are not exceeding the Salesforce Bulk API limits. Salesforce Bulk API 2.0 has specific limits on the number of batches you can submit per 24-hour period. If you exceed these limits, you may encounter failures.
- Optimize Data Transfer: Consider breaking down the large dataset into smaller chunks and transferring them sequentially. This can help manage the load and reduce the risk of timeouts.
- Increase Node Concurrency: If you are using a self-hosted integration runtime (IR), increase the node concurrency settings to allow more parallel data transfers.
- Monitor API Usage: Keep an eye on your Salesforce API usage and ensure that you have sufficient API calls available for the data transfer process.
- Retry Logic: Implement retry logic in your pipeline to handle transient errors automatically.
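As a minimal sketch of where those knobs live, assuming a copy activity authored in pipeline JSON: the timeout and retry settings sit in the activity's policy block. The activity name, sink type, and values below are placeholders to adjust for your own pipeline, not a verified fix.

```json
{
    "name": "CopySalesforceToBlob",
    "type": "Copy",
    "policy": {
        "timeout": "1.00:00:00",
        "retry": 3,
        "retryIntervalInSeconds": 60
    },
    "typeProperties": {
        "source": { "type": "SalesforceV2Source" },
        "sink": { "type": "DelimitedTextSink" }
    }
}
```

Raising retry and retryIntervalInSeconds gives transient Salesforce API errors room to recover before the activity gives up.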
Refer to:
https://learn.microsoft.com/en-us/answers/questions/1669230/salesforce-v2-link-service-doenst-work-error-messa?comment=question#newest-question-comment
https://learn.microsoft.com/en-us/azure/data-factory/connector-salesforce?tabs=data-factory
Let me know if you need further help or have any other questions!
-
LY 10 Reputation points
2024-09-18T21:14:37.6933333+00:00 Thanks for the reply. I am not trying to copy data into Salesforce; instead I am trying to pull data from Salesforce into a blob storage. I have checked the timeout settings for my ADF pipeline and I can see they are set as below. Also, the exception says it has retried multiple times. I would like to check on the Bulk API limits for retrieving data from Salesforce when using ADF. Is there any documentation on how to implement the Bulk API in ADF for Salesforce data retrieval?
-
phemanth 11,280 Reputation points • Microsoft Vendor
2024-09-19T11:32:05.5066667+00:00 Thanks for clarifying! Since you’re pulling data from Salesforce into Blob Storage, it’s important to understand the Bulk API limits and how to implement it in Azure Data Factory (ADF).
Salesforce Bulk API Limits
- Batch Limits: You can submit up to 15,000 batches per rolling 24-hour period.
- Record Limits: Each batch can contain up to 10,000 records, and you can upload up to 150 million records per 24-hour period.
- Data Size: A bulk query can retrieve up to 15 GB of data, divided into 15 files of 1 GB each.
Implementing Bulk API in ADF
To use the Bulk API for data retrieval in ADF, follow these steps:
Configure the Linked Service:
Ensure that the Salesforce linked service in ADF is configured to use Bulk API 2.0. You can set this in the linked service properties by specifying the
apiVersion
property.Set Up the Copy Activity:
In your ADF pipeline, configure the copy activity to use the Salesforce linked service. Ensure that the
source
settings are optimized for bulk data retrieval.Adjust Timeout and Retry Settings:
Since your timeout is already set to 7 days, ensure that your retry settings are configured to handle transient errors effectively.
Monitor and Optimize:
Keep an eye on your API usage and batch limits. You can monitor this through Salesforce’s API usage dashboard.
For detailed guidance, you can refer to the official documentation on using the Salesforce connector in ADF.
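If it helps, here is a rough sketch of a Salesforce V2 linked service definition with the apiVersion property set. The placeholder values in angle brackets are yours to fill in, and the property names are based on my reading of the public connector documentation, so please verify them against your own Data Factory:

```json
{
    "name": "SalesforceV2LinkedService",
    "properties": {
        "type": "SalesforceV2",
        "typeProperties": {
            "environmentUrl": "https://<your-domain>.my.salesforce.com",
            "authenticationType": "OAuth2ClientCredentials",
            "clientId": "<connected-app-client-id>",
            "clientSecret": {
                "type": "SecureString",
                "value": "<connected-app-client-secret>"
            },
            "apiVersion": "60.0"
        }
    }
}
```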
-
LY 10 Reputation points
2024-09-19T19:37:22.85+00:00 Thanks for the reply. Regarding the data size limitations, when you mention that a bulk query can retrieve up to 15 GB of data, is this limit per query or per 24-hour period?
My Salesforce linked service API version is configured as below.
Does this indicate it is using the Bulk API? If not, where can I provide the apiVersion property and what value should it be given? Also, you have mentioned that, in the copy activity, we need to ensure that the source settings are optimized for bulk data retrieval. May I know how to check this?
-
phemanth 11,280 Reputation points • Microsoft Vendor
2024-09-20T11:48:15.0233333+00:00 The 15 GB data size limit for a bulk query in Salesforce is per query, not per 24-hour period. This means each bulk query can retrieve up to 15 GB of data, divided into 15 files of 1 GB each.
Regarding your Salesforce linked service configuration, the API version set to “60.0” does not necessarily indicate that the Bulk API is being used. The Bulk API is a specific feature that needs to be enabled in the settings.
To ensure you are using the Bulk API and to optimize your source settings for bulk data retrieval in Azure Data Factory (ADF), follow these steps:
Enable Bulk API in Linked Service:
- Go to your Salesforce linked service in ADF.
- Check for an option to enable the Bulk API. This might be a checkbox or a setting within the linked service configuration.
Set the apiVersion Property:
- If there is no direct option to enable the Bulk API, you might need to set the apiVersion property manually.
- In the linked service JSON configuration, add or update the apiVersion property to the desired version (e.g., "60.0").
Optimize Copy Activity Settings:
- In your ADF pipeline, open the copy activity that pulls data from Salesforce.
- Go to the Source tab and look for settings related to the Bulk API. Ensure that the Bulk API option is enabled.
- Adjust the Batch Size and Timeout settings to handle large data volumes efficiently.
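For orientation only, here is a rough sketch of what the source block of a copy activity using the V2 connector can look like. The query and includeDeletedObjects property names are assumptions drawn from the connector documentation rather than something confirmed in your factory, so double-check them:

```json
"source": {
    "type": "SalesforceV2Source",
    "query": "SELECT Id, Name, CreatedDate FROM Account",
    "includeDeletedObjects": false
}
```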
-
phemanth 11,280 Reputation points • Microsoft Vendor
2024-09-23T11:34:20.29+00:00 @LY We haven't heard from you since the last response and were just checking back to see if you have found a resolution yet. If you have, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.
-
LY 10 Reputation points
2024-09-23T17:09:37.05+00:00 I am still facing the issue and looking for more help on this. My Salesforce linked service JSON already has the setting below.
"apiVersion": "60.0"
Also, as you have mentioned, I do not see any other options/checkboxes in the linked service or in the copy activity to enable the Bulk API. If you are aware of a Bulk API option in the ADF copy activity, please share screenshots so I can follow the same. Thanks.
-
DF 0 Reputation points
2024-09-24T08:14:33.4066667+00:00 Hi @phemanth , thanks for your advice on this issue so far.
We are in the same situation as the original poster. We were using the legacy connector. We have upgraded because the legacy connector:
- uses Bulk API 1.0, which fails to query tables with 40 million rows (retries up to 30 times)
- is no longer supported after October 2024
- uses a discouraged auth flow (username/password)
The legacy connector was able to handle most of the tables because it could leverage Primary Key Chunking. We really don't want to downgrade to an unsupported connector, but the new one doesn't seem fit for purpose.
Please note that there is not a 15GB limit here. The Bulk API Limits Cheatsheet you shared earlier says there is a 15 file limit on Bulk API 1.0, but there is no limit on Bulk API 2.0 used by SalesforceV2Source.
The query would succeed on the Salesforce side, but the ADF connector stops listening for the response prematurely. We can run the same queries against our production org using Bulk API 2.0 directly in Postman. The jobs complete successfully after ~1 hour and we can fetch the results.
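For anyone who wants to reproduce that Postman check, the Bulk API 2.0 query flow is roughly: POST a job to /services/data/v60.0/jobs/query with a body like the one below (the SOQL and API version are just examples), poll /services/data/v60.0/jobs/query/<jobId> until its state is JobComplete, then GET /services/data/v60.0/jobs/query/<jobId>/results to download the CSV.

```json
{
    "operation": "query",
    "query": "SELECT Id, Name FROM Account"
}
```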
-
Jovan Pajic 0 Reputation points
2024-09-24T11:55:25.5133333+00:00 Thanks for this post. Facing the same problem.
-
phemanth 11,280 Reputation points • Microsoft Vendor
2024-09-25T11:25:13.4333333+00:00 We are reaching out to the internal team to get more information related to your query and will get back to you as soon as we have an update.
-
phemanth 11,280 Reputation points • Microsoft Vendor
2024-10-14T03:48:02.26+00:00 @LY We haven't heard from you since the last response and were just checking back to see if you have found a resolution yet. If you have, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.
-
Kartik Parashar 0 Reputation points
2024-10-24T06:36:41.0066667+00:00 @phemanth
Hi
In the Salesforce Bulk API limits below, we have the information that about 150 million records can be uploaded per 24-hour period. But is the limit the same for the number of records that can be fetched using Bulk API 2.0 with Salesforce as a source?
- Batch Limits: You can submit up to 15,000 batches per rolling 24-hour period.
- Record Limits: Each batch can contain up to 10,000 records, and you can upload up to 150 million records per 24-hour period.
- Data Size: A bulk query can retrieve up to 15 GB of data, divided into 15 files of 1 GB each.
-
Binway 696 Reputation points
2024-11-10T23:16:20.3133333+00:00 I am getting exactly the same error message. With the version 1 connector I am pulling approximately 470,000 records from 130 fields. When I try this with the version 2 connector I get the same error. Reading the comments here, I am not seeing a fix. Is there one, or do we have to log a support call?
-
Binway 696 Reputation points
2024-11-11T02:21:13.64+00:00 My current configuration, which at least got the preview data working, is set out below, although this has blown the execution time out from 15 minutes to still running after an hour.
First I confirmed our Salesforce API Version with the SF Admin
Then I changed the Data set so it was looking at a specific table
Then I changed the pipeline source so it is using the object API and not the SOQL query that I used originally.
As stated - this at least got the preview data working but has not yet completed when I run the pipeline in debug mode.
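For anyone following along, the shape of that change is roughly the following: the dataset points at a specific object, and the copy activity source then omits the SOQL query. The type and property names here (objectApiName in particular) are my recollection of the V2 connector docs, so treat this as a sketch to verify rather than a confirmed configuration:

```json
{
    "name": "SalesforceAccountDataset",
    "properties": {
        "type": "SalesforceV2Object",
        "linkedServiceName": {
            "referenceName": "SalesforceV2LinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "objectApiName": "Account"
        }
    }
}
```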