To build a multimodal RAG (Retrieval-Augmented Generation) pipeline using multimodal embeddings, ensure that the correct models are deployed and accessible in your Azure environment. Here are some steps to consider:
- Deployment of Cohere Models: Make sure the Cohere embedding models are provisioned as serverless API deployments; an ARM/Bicep template can automate this. If you cannot find the serverless option, double-check that you are looking in the correct sections of the Azure AI Foundry portal or your project settings.
- Regional Availability: Confirm that both your Azure OpenAI resource and Azure AI Search service are created in the same region. While integrated vectorization does not strictly require this, having them in the same region can improve performance and reduce latency.
- Model Selection: When adding knowledge to your agent workflow in the Foundry portal, ensure you are selecting embedding models that are supported for your project. The supported models include text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large, among others.
- Public Access: Ensure that all resources have public access enabled so that the Azure portal can reach them. This is crucial for the wizard to run successfully.
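For the first step, a serverless endpoint deployment can be sketched in Bicep. The workspace name, API version, and model ID below are illustrative placeholders and should be verified against the current Azure model catalog and resource reference before use:

```bicep
// Illustrative sketch only: confirm the API version, SKU, and model ID
// against the current Azure documentation before deploying.
resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-04-01' existing = {
  name: 'my-foundry-project' // hypothetical project/workspace name
}

resource cohereEmbed 'Microsoft.MachineLearningServices/workspaces/serverlessEndpoints@2024-04-01' = {
  parent: workspace
  name: 'cohere-embed-v3'
  location: resourceGroup().location
  properties: {
    modelSettings: {
      // Model ID from the Azure model catalog; confirm the exact registry path.
      modelId: 'azureml://registries/azureml-cohere/models/Cohere-embed-v3-multilingual'
    }
    authMode: 'Key'
  }
  sku: {
    name: 'Consumption'
  }
}
```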
If you have followed these steps and are still encountering issues, consider reaching out to Azure support for further assistance.
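Once the deployments above are in place, the core retrieval step of a RAG pipeline reduces to nearest-neighbor search over embedding vectors. The following minimal, self-contained sketch uses toy three-dimensional vectors in place of real model output; in practice, the vectors would come from your deployed embedding model and the search would be handled by Azure AI Search:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, corpus, top_k=2):
    """Rank (doc_id, embedding) pairs by similarity to the query."""
    scored = [(doc_id, cosine_similarity(query_embedding, emb))
              for doc_id, emb in corpus]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

# Toy 3-dimensional embeddings stand in for real multimodal model output.
corpus = [
    ("chart.png", [0.9, 0.1, 0.0]),
    ("report.txt", [0.1, 0.9, 0.2]),
    ("photo.jpg", [0.8, 0.2, 0.1]),
]
results = retrieve([1.0, 0.0, 0.0], corpus)
print(results[0][0])  # the document whose embedding best matches the query
```

The retrieved documents (images or text) would then be passed as grounding context to the generation model.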