Usage of Continuation Token in production CosmosDB

Divyanshu Bains 0 Reputation points
2025-04-24T09:25:06.2233333+00:00

My question is about how continuation tokens work in Cosmos DB. My use case is the following: at some time t1, I want to start fetching a subset of documents from one of the containers in my Cosmos DB account. The subset is selected by a simple boolean field, isActive, which must have a value of true. The number of documents is huge (~300M), so fetching them in one shot is not possible. I need some kind of pagination, and continuation tokens seem to provide it. But there are a few issues. First, the account is a production account, and documents are being created, updated, and deleted at all times, including while pages are being fetched. Second, my indexing policy contains "excludedPaths": [ { "path": "/*" } ] along with some explicitly listed includedPaths, such as "includedPaths": [ { "path": "/myDocId/?" } ].

  1. Because this is a production account, I'm unsure whether, and when, documents created, updated, or deleted during the fetch will appear in my paginated results, since I do not know how continuation tokens work internally. From a requirements perspective, it is acceptable to miss documents that are created, updated, or deleted after time t1 (when the fetch starts). However, if a document remains unchanged throughout the fetch, it must appear at least once in the paginated results; appearing more than once is acceptable. Do continuation tokens guarantee this? If a continuation token internally uses a count of documents to skip in order to find the next page, then deleting a document that already appeared in a previous page could cause the first document of the next page to be skipped.
  2. My indexing policy explicitly excludes all paths except the specified ones. I'm not sure about this, but it may also exclude _rid from indexing, which I believe is used in continuation tokens. If so, will these page fetches become expensive and slow? And if that is the case, can I change my query from "SELECT * FROM c WHERE c.isActive=@isActive" to "SELECT * FROM c WHERE c.isActive=@isActive ORDER BY c.myDocId", where myDocId is indexed and guaranteed to be unique (though not guaranteed to be monotonically increasing or decreasing, if that matters)?
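
To make the concern in point 1 concrete, here is a minimal in-memory sketch in Python. It does not model how Cosmos DB actually implements continuation tokens, which is exactly what I don't know; it only contrasts a hypothetical skip-count cursor with a hypothetical cursor keyed on the unique myDocId. All function names and the sample data are made up for illustration.

```python
from typing import Any, Dict, List, Optional

Doc = Dict[str, Any]


def make_docs() -> List[Doc]:
    # Six active documents with unique ids doc0..doc5 (illustrative data only).
    return [{"myDocId": f"doc{i}", "isActive": True} for i in range(6)]


def offset_page(docs: List[Doc], skip: int, size: int) -> List[Doc]:
    # Hypothetical cursor: "skip the first N matching documents".
    active = [d for d in docs if d["isActive"]]
    return active[skip:skip + size]


def keyset_page(docs: List[Doc], after_id: Optional[str], size: int) -> List[Doc]:
    # Hypothetical cursor: "resume after the last myDocId already returned".
    active = sorted((d for d in docs if d["isActive"]), key=lambda d: d["myDocId"])
    if after_id is not None:
        active = [d for d in active if d["myDocId"] > after_id]
    return active[:size]


def delete_doc(docs: List[Doc], doc_id: str) -> List[Doc]:
    # Simulates a concurrent delete happening between two page fetches.
    return [d for d in docs if d["myDocId"] != doc_id]


def scan_with_offset(page_size: int = 2) -> List[str]:
    docs, seen, skip, first = make_docs(), [], 0, True
    while True:
        page = offset_page(docs, skip, page_size)
        if not page:
            return seen
        seen += [d["myDocId"] for d in page]
        skip += len(page)
        if first:
            docs = delete_doc(docs, "doc0")  # doc0 was already returned
            first = False


def scan_with_keyset(page_size: int = 2) -> List[str]:
    docs, seen, last, first = make_docs(), [], None, True
    while True:
        page = keyset_page(docs, last, page_size)
        if not page:
            return seen
        seen += [d["myDocId"] for d in page]
        last = page[-1]["myDocId"]
        if first:
            docs = delete_doc(docs, "doc0")  # same concurrent delete
            first = False


if __name__ == "__main__":
    print("offset:", scan_with_offset())
    print("keyset:", scan_with_keyset())
```

In this toy run the skip-count scan never returns doc2, even though doc2 was unchanged throughout, while the scan keyed on myDocId returns every unchanged document. Whether real continuation tokens behave like the first or the second case is what I am trying to find out.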
Azure Cosmos DB
An Azure NoSQL database service for app development.