Dataflow UTF8 withoutBOM is wrong

Question


Li, Xingjian

According to the article https://learn.microsoft.com/en-us/azure/data-factory/format-delimited-text, data flow doesn't support UTF-8 encoding with a Byte Order Mark (BOM) but does support UTF-8 without BOM. I ran a test, and the result was different.

I created a dataset like this and used it in a data flow, and it worked. It also worked when I used a file that is UTF-8 with BOM. Meanwhile, when the dataset was configured like the picture below, the data flow failed.

The result is different from what the article says, so I am asking this question. Thank you!

1 answer


Answer 1

Hey Li, I ran the same experiment you did and here’s what I found:

The “UTF-8 without BOM” option in a Delimited Text dataset is a Copy Activity feature; Mapping Data Flows currently don’t honor that explicit setting.
Under the covers, Data Flows only support the Default (UTF-8) encoding (which behaves as no-BOM UTF-8) and the UTF-8 with BOM choice. If you explicitly switch your dataset to “UTF-8 without BOM,” the data flow engine will reject it with the error you saw.
In practice, leaving your dataset encoding on Default (UTF-8) lets the data flow read both BOM-less and BOM-prefixed files without complaint.
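
If you want to confirm which kind of file you are actually feeding into the test, you can check its first bytes outside of Data Factory. Here is a minimal Python sketch, assuming you have a local copy of the file; the helper names `has_utf8_bom` and `file_has_utf8_bom` are hypothetical and not part of any Azure SDK:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of a local file and check for the BOM."""
    with open(path, "rb") as f:
        return has_utf8_bom(f.read(len(codecs.BOM_UTF8)))
```

Running `file_has_utf8_bom` on each of your test files before uploading them should tell you whether a given run really used a BOM-prefixed file or a BOM-less one.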

Workarounds:

Keep your Data Flow source dataset set to Default (UTF-8).
If you really need to strip or add a BOM, add a small preprocessing step ahead of your main business flow, such as a Copy Activity or a Data Flow Derived Column, that removes or inserts the three-byte BOM (EF BB BF).
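
For the second workaround, if it is easier to fix the file before it reaches Data Factory at all, stripping or adding the BOM is a three-byte operation. This is only a rough sketch in Python, assuming you can run a script over the file contents beforehand; the function names `strip_utf8_bom` and `add_utf8_bom` are made up for illustration and this is not a Data Factory feature:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless the data already starts with one."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data
```

Both helpers are safe to call twice on the same content, so you can apply them without first checking whether a file has a BOM.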

Reference list:

Format delimited text dataset (encoding options) https://learn.microsoft.com/azure/data-factory/format-delimited-text
Mapping Data Flow datasets overview https://learn.microsoft.com/azure/data-factory/concepts-data-flow-datasets
Mapping Data Flow source transformation https://learn.microsoft.com/azure/data-factory/data-flow-source

Note: This content was drafted with the help of an AI system. Please verify the information before relying on it for decision-making.
