Fine-tuning job stuck in "Training" status for over 24 hours - Microsoft Learning Exercise

Question

James 40

Hi there! I need some help with a fine-tuning job I started for a Microsoft Learning module. I'm working through the exercise at https://microsoftlearning.github.io/mslearn-ai-studio/Instructions/05-Finetune-model.html where you fine-tune a language model.

I started a fine-tuning job yesterday evening using the small travel assistant JSONL file from the tutorial, but it's still showing as "Training started" more than 24 hours later.

I'm pretty sure something's wrong because it's a small dataset that shouldn't take this long to train. Do you know if there are any issues with the fine-tuning service right now? I canceled my first training run after it had been showing "Running" for 3 hours. Any ideas on what might be happening or how I can fix it?

Thanks for your help!

SriLakshmi C 4,395 Reputation points Microsoft External Staff

2025-04-23T17:02:32.4766667+00:00

Hello @James,

I understand that your fine-tuning job has been stuck in "Training" status for over 24 hours. There could be several reasons for this. Here are some potential causes and steps you can take:

Review the job status in the Fine-tuning section of the Azure AI Studio portal. Fine-tuning jobs are sometimes queued due to high demand or limited resources, so a job can show no visible progress for a while. Try refreshing the portal to check for any recent updates.
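If the job was created against an Azure OpenAI resource, the status and recent events can also be read programmatically instead of refreshing the portal. The following is a minimal sketch, assuming the `openai` Python package (v1 or later) and its `fine_tuning.jobs.retrieve` and `fine_tuning.jobs.list_events` calls. The environment variable names, the API version string, and the job ID are placeholders and not values from this thread.

```python
import os


def make_client():
    """Build an Azure OpenAI client.

    The environment variable names and api_version below are assumptions;
    replace them with the values for your own resource.
    """
    from openai import AzureOpenAI  # pip install openai

    return AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-10-21",
    )


def summarize_job(client, job_id, max_events=10):
    """Return the job status plus the messages of its recent events."""
    job = client.fine_tuning.jobs.retrieve(job_id)
    events = client.fine_tuning.jobs.list_events(
        fine_tuning_job_id=job_id, limit=max_events
    )
    return {"status": job.status, "events": [e.message for e in events.data]}
```

Calling `summarize_job(make_client(), "<your job ID>")` returns a small dictionary. If the event list has not changed for many hours, that suggests the job is waiting or stalled rather than actively training.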

Although you mentioned that the dataset is small, it's important to ensure that it meets the minimum requirements for fine-tuning. If the dataset is too small or not well-structured, it might lead to unexpected behavior.
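A quick local check of the JSONL file can rule out formatting problems before resubmitting. The sketch below uses only the Python standard library and assumes the simple chat format, where each line is a JSON object with a `messages` list of `role`/`content` pairs. The `validate_chat_jsonl` helper and the `min_examples=10` default are illustrative assumptions, so please confirm the current minimum and the accepted roles in the fine-tuning documentation.

```python
import json

# Roles expected in the simple chat format (assumption: no tool or function messages).
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(text, min_examples=10):
    """Return a list of human-readable problems found in chat-format JSONL text."""
    problems = []
    examples = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: invalid JSON ({err.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {number}: missing non-empty 'messages' list")
            continue
        for msg in messages:
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"line {number}: message with unexpected role")
                break
            if not isinstance(msg.get("content"), str):
                problems.append(f"line {number}: message content is not a string")
                break
        else:
            # Only reached when no message on this line was rejected.
            examples += 1
    if examples < min_examples:
        problems.append(
            f"only {examples} valid examples; expected at least {min_examples}"
        )
    return problems
```

To use it, read the training file as UTF-8 text and pass the contents to `validate_chat_jsonl`. An empty list means no problems were found by these particular checks, which does not guarantee that the service will accept the file.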

If the job continues to hang with no progress, consider canceling and resubmitting it. Since you've already canceled a previous long-running job, resubmitting with validated data and parameters might help rule out problems caused by the input data or configuration.
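For reference, canceling and resubmitting can also be scripted. This is a rough sketch rather than the exercise's own method: it assumes `client` is an `AzureOpenAI` client from the `openai` Python package (v1 or later) and uses its `fine_tuning.jobs.cancel`, `files.create`, and `fine_tuning.jobs.create` calls. The function name `cancel_and_resubmit` and all argument values are hypothetical.

```python
def cancel_and_resubmit(client, old_job_id, training_file, base_model):
    """Cancel a stalled job, upload the training data again, and start a new job.

    training_file is an open binary file object (or bytes) holding the JSONL data.
    base_model is the base model name used for fine-tuning (a placeholder here).
    Returns the ID of the newly created job.
    """
    # Stop the job that is not making progress.
    client.fine_tuning.jobs.cancel(old_job_id)

    # Upload the (validated) training data with the fine-tuning purpose.
    uploaded = client.files.create(file=training_file, purpose="fine-tune")

    # Note: the uploaded file may need some time to finish processing before
    # the service accepts it for a new job (assumption; retry if this fails).
    new_job = client.fine_tuning.jobs.create(
        training_file=uploaded.id, model=base_model
    )
    return new_job.id
```

Keeping the returned job ID makes it easier to follow the new run and to compare how it behaves against the canceled one.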

Azure’s compute resources for fine-tuning are shared across tenants and may not be immediately available. Check the Azure Status Page to rule out regional delays or service incidents that might be impacting availability.

You can also refer to "Check the status of your custom model" and "Troubleshooting for Azure OpenAI fine-tuning".

I hope this helps. Do let me know if you have any further questions.

Thank you!
SriLakshmi C 4,395 Reputation points Microsoft External Staff

2025-04-24T09:22:37.2+00:00

Hi @James,

Following up to see if the above suggestions were helpful. If you have any further questions, do let us know.

Thank you!
