Azure AI Search - Text Split Skill related

G Sanil 0 Reputation points
2026-04-16T09:03:41.0066667+00:00

I am exploring the Text Split Skill in Azure AI Search and had a question around the textSplitMode parameter behavior.

When textSplitMode = pages, size parameters such as maximumPageLength, pageOverlapLength, and maximumPagesToTake apply and control how the text is chunked.

However, I noticed that these parameters do not seem to apply when textSplitMode = sentences.

This raises a few questions:

  1. When textSplitMode = sentences, does each sentence become its own individual chunk? i.e., 1 sentence = 1 chunk?
  2. If that is the case, are parameters like maximumPageLength and pageOverlapLength essentially ignored?
  3. What would be the recommended use case for textSplitMode = sentences? I have not been able to figure one out.
Azure AI Search

An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.


2 answers

  1. Praneeth Maddali 8,445 Reputation points Microsoft External Staff Moderator
    2026-04-16T11:02:32.2833333+00:00

    Hi @G Sanil

    Thanks for your clear and detailed question about the Text Split skill in Azure AI Search — it's a great one, and you're spot on with your observations.

    Quick answers to your questions:

    1. Yes — when textSplitMode is set to sentences, the skill splits the text so that each sentence becomes its own individual chunk. It breaks strictly on sentence-ending punctuation (., ?, !, etc., respecting the language set in defaultLanguageCode).
    2. Correct — parameters like maximumPageLength, pageOverlapLength, and maximumPagesToTake are ignored in sentences mode. They only apply when textSplitMode is set to pages. The skill simply performs a clean punctuation-based split in sentences mode.
    3. Recommended use cases for sentences mode:
      • You need very fine-grained, precise chunks where preserving exact sentence boundaries is important.
      • Your content consists of short, self-contained sentences and you want natural language units rather than fixed-length chunks.
      • You're doing highly targeted semantic search or analysis where one sentence = one meaningful retrieval unit.

    That said, for most vector search / RAG scenarios, Microsoft recommends using pages mode instead. It gives you better control over chunk size (to stay within embedding model token limits) and supports overlap for improved context. Sentences mode often creates a much larger number of chunks, which can increase indexing time and storage costs.
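
To make the difference concrete, here is a minimal sketch of how the two modes might look in a skillset definition, built as plain Python dictionaries and printed as JSON. The source path `/document/content`, the target names `pages` and `sentences`, and the numeric values are illustrative assumptions rather than recommendations from this thread; check the Text Split skill reference for the limits that apply to your API version.

```python
import json

# Pages mode: the size-related parameters are set and take effect.
# The numbers below are illustrative assumptions, not tuned values.
pages_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "textSplitMode": "pages",
    "maximumPageLength": 2000,   # upper bound on chunk size
    "pageOverlapLength": 500,    # text shared between consecutive chunks
    "maximumPagesToTake": 0,     # 0 means take all pages
    "defaultLanguageCode": "en",
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "textItems", "targetName": "pages"}],
}

# Sentences mode: the page parameters are left out because,
# as described above, they only apply in pages mode.
sentences_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "textSplitMode": "sentences",
    "defaultLanguageCode": "en",
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "textItems", "targetName": "sentences"}],
}

print(json.dumps(pages_skill, indent=2))
print(json.dumps(sentences_skill, indent=2))
```

Either dictionary would go into the `skills` array of a skillset; only the pages variant carries the size and overlap settings.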

    References:

    https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-textsplit

    https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-chunk-documents

    If this answer helped, please click "Accept the answer" and "Yes", as this can be beneficial to other community members.

    If you have any other questions, let me know in the comments and I would be happy to help.

    1 person found this answer helpful.

  2. Martin Šamudovský 180 Reputation points
    2026-04-16T10:07:33.14+00:00

    Hi, G Sanil

    When using the Text Split Skill in Azure AI Search, the behavior of the textSplitMode parameter changes significantly depending on whether you choose pages or sentences. The differences you observed are expected, and the parameters you mentioned are only applicable in specific modes.

    Below is a clear breakdown of how the skill behaves and how each mode is intended to be used.

    1. Does textSplitMode = sentences create one chunk per sentence?

    Yes. When textSplitMode is set to sentences, the skill splits the input strictly along sentence boundaries. Each sentence becomes its own chunk, and the skill does not attempt to merge or size-balance them.

    This mode is designed to preserve natural linguistic boundaries rather than enforce token or character limits.

    2. Are maximumPageLength, pageOverlapLength, and similar parameters ignored in sentence mode?

    Correct. All page‑related parameters apply only when textSplitMode = pages.

    When using sentences mode:

    • maximumPageLength is ignored
    • pageOverlapLength is ignored
    • maximumPagesToTake is ignored

    The skill does not attempt to enforce chunk size or overlap when splitting by sentences.
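
The practical effect on chunk counts and sizes can be illustrated locally. The sketch below is not the service's algorithm: `split_sentences` is a naive regex approximation of sentence splitting, and `split_pages` is a hypothetical fixed-size character window that, unlike the real pages mode, makes no attempt to avoid cutting a sentence. Both function names and the sample text are assumptions made for illustration only.

```python
import re

def split_sentences(text):
    """Naive approximation: split after ., ? or ! when followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]

def split_pages(text, max_len, overlap=0):
    """Fixed-size character windows; each window repeats `overlap` chars of the previous one."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    step = max_len - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_len])
        if start + max_len >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks

sample = (
    "Azure AI Search indexes content. Does it split text? "
    "Yes! Pages mode caps chunk size."
)

sentences = split_sentences(sample)
pages = split_pages(sample, max_len=40, overlap=10)

# Sentence chunks vary in length; page chunks never exceed max_len.
print([len(s) for s in sentences])
print([len(p) for p in pages])
```

The sentence chunks follow the punctuation and can be arbitrarily short or long, while the page chunks are bounded by `max_len` and share `overlap` characters with their neighbour.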

    3. What is the recommended use case for sentences mode?

    Sentence-level splitting is useful when you need:

    • Highly granular semantic units
    • Fine-grained embeddings for downstream retrieval
    • Precise alignment between input text and model responses
    • Scenarios where each sentence may carry independent meaning (for example, FAQs, short statements, or structured content)

    This mode is typically chosen when the goal is to maximize semantic precision rather than optimize for chunk size or token efficiency.

    Summary

    pages mode is for controlled chunking with size limits and overlaps.

    sentences mode is for natural linguistic segmentation with no size‑based controls.

    The parameters you listed are intentionally ignored in sentence mode.

    If you share more about your indexing or retrieval scenario, I can help recommend which mode would be more effective.

    Hope this helps, Martin

    I look forward to your update and am happy to continue working with you until the issue is resolved. If you find the answer helpful, please click "Accept Answer" and consider upvoting it. Otherwise, please keep me posted by clicking "Add comments" below instead of selecting Yes or No.

    This response is drafted with the help of AI systems.

    1 person found this answer helpful.