
Intermittent filesystem corruption and I/O errors in Azure Container Instances (UK South)

Scott Gray 0 Reputation points
2026-04-09T10:30:20.56+00:00

Service: Azure Container Instances

Region: UK South

Image: debian:bullseye-slim (No recent changes)

We are experiencing what appears to be intermittent filesystem corruption and Input/Output error failures when running Azure DevOps self-hosted agents inside Azure Container Instances.

The issue mainly manifests during Git operations and normal file reads/writes. Files such as /etc/resolv.conf become unreadable with I/O errors, which in turn leads to networking failures.

Error from Git:

fatal: fsync error on './git/objects/pack/tmp_idx_*': Input/Output error

fatal: index-pack failed

A similar error occurs when reading /etc/resolv.conf:

cat /etc/resolv.conf: Input/output error

The file's metadata is still accessible through the stat command; only reading the content fails.
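
The "stat works, read fails" symptom can be checked on several files at once with a small probe. This is a minimal sketch, not something from our agents: the function name `probe_file` and the list of paths are assumptions for illustration. It records whether the metadata call and the content read each succeed, and the errno name (for example EIO) when one fails.

```python
import errno
import os


def probe_file(path, chunk_size=4096):
    """Report whether metadata and content of `path` can each be read."""
    result = {"path": path, "stat_ok": False, "read_ok": False, "errno": None}
    try:
        os.stat(path)  # metadata only, does not touch file content
        result["stat_ok"] = True
    except OSError as exc:
        result["errno"] = errno.errorcode.get(exc.errno, str(exc.errno))
        return result
    try:
        with open(path, "rb") as handle:
            # Read the whole file in chunks so a failing block is hit.
            while handle.read(chunk_size):
                pass
        result["read_ok"] = True
    except OSError as exc:
        result["errno"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return result


if __name__ == "__main__":
    # Paths chosen for illustration only.
    for candidate in ("/etc/resolv.conf", "/etc/hosts"):
        print(probe_file(candidate))
```

A result with `stat_ok` true, `read_ok` false and `errno` set to `EIO` would match what we see with cat.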

These operations run on the container's overlay storage.

All investigation so far points to an underlying storage layer / host issue in UK South.

As a side note, the same workload works fine in UK West.

Are Microsoft aware of any issues that would cause this sporadic behaviour in UK South?

Azure Container Instances

2 answers

  1. Alex Burlachenko 21,805 Reputation points MVP Volunteer Moderator
    2026-04-14T07:27:05.9166667+00:00

    Scott Gray hi,

    This looks like an ACI host/storage layer issue in that region, especially since the same workload works in UK West and you are seeing low-level I/O errors even on /etc/resolv.conf. That's not app level, that's the underlying overlay filesystem breaking.

    First, stop relying on that region for this workload and move to UK West or another region as primary. That's the fastest fix.

    Second, avoid heavy git/pack operations on ACI ephemeral storage. Use Azure Files or a mounted volume for the workspace instead of the container overlay fs.

    Third, reduce fsync pressure if possible (git can hammer the disk hard), but if the host is flaky this won't fully save you.

    Fourth, add retry logic on container runs (ACI sometimes lands you on a bad host, and the next run may be fine).

    Fifth, capture logs and open an Azure support ticket with region + timestamps. This is a backend issue they need to investigate.

    Optional: if this is a critical workload, consider moving to AKS or VM-based agents. ACI is not great for heavy I/O workloads.

    rgds,

    Alex


  2. Ankit Yadav 14,455 Reputation points Microsoft External Staff Moderator
    2026-04-09T11:00:29.6133333+00:00

    Hello Scott,

    Thanks for the detailed description that helps narrow things down.

    From a service perspective, Azure Container Instances use ephemeral, host-backed storage for the container’s root filesystem. This includes paths such as /etc, the container image layer, and any file I/O performed directly on the container filesystem. This storage is tied to the health of the underlying host and is not designed for heavy or durable I/O operations (see: https://learn.microsoft.com/en-us/azure/reliability/reliability-container-instances).

    There is no broad service issue specific to UK South that would indicate ongoing filesystem corruption in Azure Container Instances. Differences in behavior between regions (for example, UK South vs UK West) can occur due to capacity placement or individual host health, but these are not surfaced as public incidents unless there is widespread impact.

    The symptoms you’re seeing (intermittent fsync failures, unreadable files such as /etc/resolv.conf, and transient I/O errors during Git operations) are consistent with ephemeral local storage encountering a transient host-level failure. For this reason, we recommend that workloads running on ACI:

    • Treat container filesystem storage as temporary and failure-prone
    • Implement retry logic for I/O-heavy operations (especially Git)
    • Avoid performing critical build or workspace operations directly on the container root filesystem
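
One way to verify the last point before a build starts is to compare device IDs. The following is a minimal sketch under assumptions: the helper names and the `/mnt/workspace` mount path are hypothetical, and the check relies on a mounted volume (for example an Azure Files share) reporting a different `st_dev` than the root overlay filesystem, which is typical but not guaranteed for every mount type.

```python
import os


def on_same_filesystem(path_a, path_b):
    """Return True when both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def workspace_is_off_root(workspace, root="/"):
    """Return True when `workspace` is backed by a different device than `root`."""
    return not on_same_filesystem(workspace, root)


if __name__ == "__main__":
    workspace = "/mnt/workspace"  # hypothetical mount path, adjust to your volume
    if os.path.isdir(workspace):
        print("workspace off root filesystem:", workspace_is_off_root(workspace))
    else:
        print("workspace path not found:", workspace)
```

A pipeline could fail fast when the check returns False, instead of cloning onto the root filesystem.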

    To mitigate this:

    • Move your Git workspace off the container filesystem
    • Add resilience to the pipeline
      • Transient faults are expected in ACI. Git operations should include retries and exponential backoff.
    • Collect diagnostics
      • Capture container events and logs using az container logs and az container show so we can evaluate host placement and restart history.
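
The retry-with-backoff recommendation could look roughly like the sketch below. It is an illustration rather than an official pattern: `run_with_backoff` and its default values (5 attempts, 2 second base delay, 60 second cap) are assumptions, and the `runner` and `sleep` parameters exist only so the behaviour can be exercised without real delays.

```python
import random
import subprocess
import time


def run_with_backoff(command, attempts=5, base_delay=2.0, max_delay=60.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run `command`, retrying on a non-zero exit code with exponential backoff."""
    last = None
    for attempt in range(attempts):
        last = runner(command, capture_output=True, text=True)
        if last.returncode == 0:
            return last
        if attempt < attempts - 1:
            # Delay doubles each attempt (2s, 4s, 8s, ...) up to max_delay,
            # plus random jitter so parallel agents do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError(
        f"{command!r} failed after {attempts} attempts: {last.stderr.strip()}"
    )


if __name__ == "__main__":
    # Example: retry a fetch in an existing clone.
    done = run_with_backoff(["git", "fetch", "--prune", "origin"])
    print(done.stdout)
```

A fetch into an existing repository is safe to repeat. A failed clone may leave a partial directory behind, so that case likely needs a cleanup step before the next attempt.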

    If the issue continues after moving I/O off the container filesystem, please share:

    • The container group definition (CPU, memory, volume mounts)
    • Frequency and duration of failures
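
To gather this consistently after each failure, the two CLI commands mentioned above can be wrapped in a short script. This is a sketch under assumptions: the function names and the `my-rg` / `my-agent` values are placeholders, and it assumes the Azure CLI is installed and already logged in.

```python
import subprocess


def diagnostic_commands(resource_group, name):
    """Build the az CLI invocations for one container group."""
    target = ["--resource-group", resource_group, "--name", name]
    return {
        "logs": ["az", "container", "logs", *target],
        "show": ["az", "container", "show", *target, "--output", "json"],
    }


def collect(resource_group, name, runner=subprocess.run):
    """Run each command and return stdout (or stderr on failure) keyed by label."""
    outputs = {}
    for label, command in diagnostic_commands(resource_group, name).items():
        done = runner(command, capture_output=True, text=True)
        outputs[label] = done.stdout if done.returncode == 0 else done.stderr
    return outputs


if __name__ == "__main__":
    # Placeholder names, replace with your resource group and container group.
    for label, text in collect("my-rg", "my-agent").items():
        print(f"--- {label} ---")
        print(text)
```

Saving the output together with the failure timestamp makes it easier to report frequency and duration later.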

    Hope this answers your concerns with the intermittent failures!!
