
AI foundry evaluation is stuck at starting

Ipsita Dutta 0 Reputation points
2026-03-26T11:09:07.2433333+00:00

My AI Foundry evaluation is stuck at Starting and is not moving ahead. I have checked that the project and the resource are in the same region, and I am getting answers from the model in the playground chat.

Foundry Tools

Formerly known as Azure AI Services or Azure Cognitive Services, Foundry Tools is a unified collection of prebuilt AI capabilities within the Microsoft Foundry platform.


1 answer

  1. Q&A Assist
    2026-03-26T11:09:23.4133333+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist. Learn more

    Use the following checks and actions when an Azure AI Foundry evaluation stays stuck in the Starting state (that is, it never moves to Running or Completed), even though chat in the playground works:

    1. Verify the evaluation job status and cancel if it is stuck
      • If using the SDK, check the run status.
      • If the run has been in Running/Starting for a long time with no progress, cancel it:
             client.evals.runs.cancel(run_id, eval_id=eval_id)
        
      • After canceling, create a new evaluation run.
    2. Check Azure OpenAI model capacity and quota
      • A common cause of long-running or stuck evaluation jobs is insufficient capacity on the Azure OpenAI deployment used by the evaluation.
      • In the Azure portal, open the Azure OpenAI (or Foundry model deployment) resource used for the evaluation and:
        • Verify the deployment is healthy.
        • Increase the model capacity / tokens-per-minute (TPM) quota if it is low or saturated.
      • After increasing capacity, rerun the evaluation.
    3. Confirm authentication and permissions
      • If the evaluation is created via SDK and authentication is misconfigured, the job may never progress.
      • Ensure DefaultAzureCredential is correctly set up (run az login if using Azure CLI) and that the identity used has the Azure AI User role on the Foundry project.
      • Verify the project endpoint URL is correct and includes both account and project names.
    4. Validate evaluation dataset and mapping
      • If the evaluation uses a dataset (CSV/JSONL), schema or mapping issues can cause failures:
        • Ensure JSONL has one valid JSON object per line.
        • Confirm data_mapping field names exactly match the dataset fields (case-sensitive).
        • Ensure item_schema properties align with the dataset fields.
      • Fix any schema/mapping issues and recreate the evaluation run.
    5. Handle rate limiting
      • If the environment is hitting rate limits, evaluation creation or execution can stall:
        • If you receive 429 Too Many Requests when creating runs, respect the retry-after header and use exponential backoff.
        • If jobs fail with 429 during execution, reduce dataset size or split into smaller batches and/or increase TPM quota for the model deployment.
    6. Re-run after adjustments
      • After adjusting capacity, fixing schema, or correcting permissions, submit a new evaluation run and monitor its status.
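Steps 1 and 6 (check the run status, cancel if it is stuck, then resubmit) can be sketched as a small polling helper. This is a minimal sketch, not a documented Azure AI Foundry API: it assumes a client exposing the evals-style `retrieve`/`cancel` calls shown in the snippet above, and the status strings (`"queued"`, `"starting"`) are placeholders for whatever your SDK actually reports.

```python
import time


def cancel_if_stuck(client, eval_id, run_id, timeout_s=900, poll_s=30):
    """Poll an evaluation run; cancel it if it stays in a pre-running
    state past timeout_s.

    `client` is assumed to expose the evals-style API from the snippet
    above (an assumption, not a documented Azure AI Foundry signature).
    Returns the last observed status, or "canceled" if we gave up.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        run = client.evals.runs.retrieve(run_id, eval_id=eval_id)
        if run.status not in ("queued", "starting"):
            # The run made progress (Running/Completed/Failed) - stop polling.
            return run.status
        time.sleep(poll_s)
    # Still stuck after the timeout: cancel so a fresh run can be created.
    client.evals.runs.cancel(run_id, eval_id=eval_id)
    return "canceled"
```

After the helper returns `"canceled"`, create a new evaluation run rather than retrying the same one.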

    If the job continues to stay in Starting after these steps, capture the run ID and any error details and open a support ticket; the fact that playground chat works already confirms the model deployment itself is healthy.
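    Step 4's dataset checks can be run locally before resubmitting the evaluation. A minimal sketch using only the standard library; the field names passed in are whatever your `data_mapping` references (the names in the usage example below are illustrative, not prescribed):

```python
import json


def validate_jsonl(lines, required_fields):
    """Check that every non-empty line is a standalone JSON object
    containing the fields the evaluation's data_mapping references
    (case-sensitive). Returns a list of (line_number, problem) pairs;
    an empty list means the dataset passed these checks.
    """
    problems = []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are skipped, not flagged
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            problems.append((i, f"invalid JSON: {e.msg}"))
            continue
        if not isinstance(obj, dict):
            problems.append((i, "line is not a JSON object"))
            continue
        missing = [f for f in required_fields if f not in obj]
        if missing:
            problems.append((i, f"missing fields: {missing}"))
    return problems
```

    For example, `validate_jsonl(open("data.jsonl"), ["query", "response"])` flags lines that are not valid JSON objects or that lack either field, so schema and mapping mismatches surface before the run is created.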



