

out of memory in ADF

Question

Tuesday, October 1, 2019 10:52 AM

Hi,

I am creating a pipeline with a copy activity. My source is a CSV file in Azure Blob storage and the sink is a Parquet file, also in Azure Blob storage. The file size is 800 MB. While running the pipeline I receive the error below.

Activity Copy data1 failed: Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path parquet\Consumer_Complaints.parquet.,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.OutOfMemoryError:Direct buffer memory

I am not sure why memory is a problem when all the operations are happening in the cloud.
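For context, the copy is roughly equivalent to the sketch below, written against the azure-mgmt-datafactory Python SDK. The real pipeline was built in the ADF portal, so the dataset names and the commented client call are placeholders, not the actual definition.

    # Rough sketch only - the real pipeline is authored in the ADF portal.
    # Dataset names ("CsvOnBlob", "ParquetOnBlob") and the client call are placeholders.
    from azure.mgmt.datafactory.models import (
        CopyActivity,
        DatasetReference,
        DelimitedTextSource,
        ParquetSink,
        PipelineResource,
    )

    copy_activity = CopyActivity(
        name="Copy data1",
        inputs=[DatasetReference(reference_name="CsvOnBlob")],       # ~800 MB CSV in Azure Blob
        outputs=[DatasetReference(reference_name="ParquetOnBlob")],  # Parquet sink on Azure Blob
        source=DelimitedTextSource(),
        sink=ParquetSink(),
    )

    pipeline = PipelineResource(activities=[copy_activity])
    # adf_client.pipelines.create_or_update(resource_group, factory_name, "CsvToParquet", pipeline)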

Appreciate your help in identifying the issue.

TIA 

Rahul

All replies (6)

Thursday, October 3, 2019 2:09 AM ✅ Answered

Use Data Flow for performance and scale when handling Parquet data.
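A minimal sketch of driving that from a pipeline, assuming the azure-mgmt-datafactory Python models ExecuteDataFlowActivity and DataFlowReference; the mapping data flow itself ("CsvToParquetFlow" is a placeholder name) would be authored in the ADF UI:

    # Hypothetical sketch: run an existing mapping data flow from a pipeline.
    # "CsvToParquetFlow" is a placeholder for a data flow authored in the ADF UI.
    from azure.mgmt.datafactory.models import (
        DataFlowReference,
        ExecuteDataFlowActivity,
        PipelineResource,
    )

    run_flow = ExecuteDataFlowActivity(
        name="RunCsvToParquet",
        data_flow=DataFlowReference(reference_name="CsvToParquetFlow"),
    )

    pipeline = PipelineResource(activities=[run_flow])
    # adf_client.pipelines.create_or_update(resource_group, factory_name, "CsvToParquetViaDataFlow", pipeline)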


Wednesday, October 2, 2019 1:04 AM

Hello Rahul,

First thing, since it is a copy activity, please add a retry and see if that solves the issue on the next attempt.
Just curious: are you using a self-hosted IR? We have seen this issue when users are on a SHIR.

 

Thanks Himanshu


Thursday, October 3, 2019 1:51 AM

Hello Rahul,

If you are using a self-hosted integration runtime:

Please follow the steps below to increase the heap memory on the self-hosted IR machine, and restart the IR to apply the changes.

For the self-hosted IR:

  • First check the total RAM of the IR host machine. Please make sure the total RAM is >= 8 GB.
  • Add the following environment variable on the machine that hosts the self-hosted IR (a scripted sketch follows this list):
    _JAVA_OPTIONS "-Xms256m -Xmx16g" (Note: this is only a sample value; you can determine the min/max heap size yourself).

  • Then restart the IR. 
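If you prefer to script those last two steps, here is a minimal sketch, assuming the SHIR host is Windows, the script runs from an elevated prompt, and the IR Windows service uses the usual name (adjust if yours differs):

    # Minimal sketch (assumptions: Windows SHIR host, run as administrator,
    # IR service named "DIAHostService" - adjust if your installation differs).
    import subprocess

    # Set _JAVA_OPTIONS machine-wide; "-Xms256m -Xmx16g" is only a sample value.
    subprocess.run(["setx", "/M", "_JAVA_OPTIONS", "-Xms256m -Xmx16g"], check=True)

    # Restart the self-hosted integration runtime so it picks up the new variable.
    subprocess.run(["net", "stop", "DIAHostService"], check=True)
    subprocess.run(["net", "start", "DIAHostService"], check=True)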

Also, please do not set too many nodes on this machine if the Parquet/ORC workload is heavy.
You can change this value in the ADF portal -> Connections -> Integration Runtimes -> Edit -> Nodes.

Thanks Himanshu


Friday, October 4, 2019 9:18 AM

Unfortunately, I am using the default IR in my linked services and it is failing with the above error. Data Factory, being a cloud solution, should be scalable, but it seems naive at data transformation.

I used Data Flow and it is working fine. My only concern is that it is in preview mode, so I can't use it in a production environment.

TIA

Rahul

Thanks


Friday, October 4, 2019 4:08 PM

Good to know that you have got a workaround. Regarding Data Flow being in preview mode, if all goes as planned it will be GA very soon.

Thanks Himanshu


Monday, October 7, 2019 6:49 PM

"I used Data Flow and it is working fine. My only concern is that it is in preview mode, so I can't use it in a production environment."

The Data Flow feature in ADF is now generally available.

See:

https://azure.microsoft.com/en-us/updates/azure-data-factory-mapping-data-flows-ga/

Regards,
Vaibhav