
Dataflow UTF-8 without BOM is wrong

Li, Xingjian 0 Reputation points
2026-03-17T03:58:40.2466667+00:00

According to the article https://learn.microsoft.com/en-us/azure/data-factory/format-delimited-text

data flow doesn't support UTF-8 encoding with a Byte Order Mark (BOM) but supports UTF-8 without BOM. I ran a test, but the result differed.

[Screenshot]

I created a dataset like this and used it in a data flow, and it worked. It also worked when I used a file that is UTF-8 with BOM. However, when the dataset is configured as in the picture below, the data flow failed.

[Screenshot]

[Screenshot]

The result differs from what the article describes, so I am asking this question. Thank you!

Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


1 answer

  1. Smaran Thoomu 34,155 Reputation points Microsoft External Staff Moderator
    2026-03-18T10:40:25.0033333+00:00

    Hey Li, I ran the same experiment you did and here’s what I found:

    1. The “UTF-8 without BOM” option in a Delimited Text dataset is a Copy Activity feature; Mapping Data Flows do not currently honor that explicit setting.
    2. Under the covers, Data Flows only support the Default (UTF-8) encoding (which behaves as no-BOM UTF-8) and the UTF-8 with BOM choice. If you explicitly switch your dataset to “UTF-8 without BOM,” the data flow engine will reject it with the error you saw.
    3. In practice, leaving your dataset encoding on Default (UTF-8) lets the data flow read both BOM-less and BOM-prefixed files without complaint.
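
When reproducing the behavior above, it helps to confirm whether a given test file really starts with a BOM. The following is a minimal sketch in Python, run locally and outside Data Factory. The function name `has_utf8_bom` and any file path you pass to it are hypothetical, not part of ADF:

```python
import codecs


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 BOM (bytes EF BB BF)."""
    with open(path, "rb") as f:
        # Read only the first three bytes; the rest of the file is irrelevant here.
        return f.read(3) == codecs.BOM_UTF8
```

Running this on each sample file before uploading it makes it clear which encoding variant each data flow run is actually reading.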

    Workarounds:

    • Keep your Data Flow source dataset set to Default (UTF-8).
    • If you really need to strip or add a BOM, add a small Copy Activity or Data Flow Derived Column step that removes or inserts the first three bytes (the UTF-8 BOM, EF BB BF) before your main business flow.
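
If the file can be pre-processed before it reaches Data Factory, the BOM can also be stripped or added there. The following is a minimal sketch in Python, assuming the file is small enough to read into memory. The helper names `strip_utf8_bom`, `add_utf8_bom`, and `rewrite_file` are hypothetical and are not ADF features:

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; otherwise return data unchanged."""
    return data[len(BOM):] if data.startswith(BOM) else data


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless the data already starts with one."""
    return data if data.startswith(BOM) else BOM + data


def rewrite_file(src: str, dst: str, with_bom: bool) -> None:
    """Copy src to dst, forcing the BOM to be present or absent."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(add_utf8_bom(data) if with_bom else strip_utf8_bom(data))
```

Both helpers are idempotent, so running the step twice on the same file does not remove real data or stack a second BOM.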


    Note: This content was drafted with the help of an AI system. Please verify the information before relying on it for decision-making.

