5. Document Scrolling
This scenario demonstrates how to efficiently retrieve and process large sets of documents using the scrolling functionality. When dealing with large result sets that exceed practical pagination limits, scrolling provides a more efficient way to iterate through all matching documents.
Key features:
Process large document collections efficiently
Maintain search context across multiple requests
Retrieve all matching documents without pagination limitations
Combine with search filters from the previous scenario for targeted results
When to Use Scrolling
Scrolling is particularly useful in the following scenarios:
Processing all documents that match specific criteria
Implementing data migration or batch operations
Generating comprehensive reports across the entire document base
Scrolling is designed for processing large datasets in the background or for data export operations. For user interfaces and interactive searches, standard pagination (as shown in the Document Search scenario) is usually more appropriate.
Step 1 - Initialize document scrolling
This request opens a scroll context that lets you retrieve large sets of documents in batches. Unlike standard pagination, scrolling maintains a consistent view of the data even as documents are added or removed.
The scroll.timeout parameter specifies how long (in seconds) the scroll context remains open between requests
The size parameter controls how many documents are returned in each batch
The sort parameter ensures consistent document ordering across batches
You can combine scrolling with any of the search filters from the previous scenario to process specific document subsets.
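The three parameters above fit together as in the following minimal sketch, which only builds the JSON body for the initial call and sends nothing. The function name buildInitialScrollBody and its extraCriteria argument are hypothetical: extraCriteria is a placeholder for whichever search filters from the previous scenario you want to merge in, and their field names are not defined here.

```javascript
// Hypothetical helper: builds the body for the initial POST /search scroll request.
// Only "sort", "size" and "scroll.timeout" come from this guide; extraCriteria is
// an assumed placeholder for search filters from the Document Search scenario.
function buildInitialScrollBody({ size = 1000, timeoutSeconds = 30, extraCriteria = {} } = {}) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  if (!(timeoutSeconds > 0)) {
    throw new RangeError("timeoutSeconds must be positive");
  }
  return {
    ...extraCriteria,
    // A fixed sort keeps document order consistent across batches
    sort: { createDate: { order: "asc" } },
    size,
    // No scroll.id yet: the server returns a scrollId in the first response
    scroll: { timeout: timeoutSeconds },
  };
}

// Usage: the same body as the request example in this step
console.log(JSON.stringify(buildInitialScrollBody({ size: 1000, timeoutSeconds: 30 })));
```

The scroll-specific keys are written after the spread of extraCriteria, so a filter object can never accidentally override sort, size, or scroll.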
Endpoint
POST /search
Request
POST /search?token=4sTB2MjaRQsnUDjmXcgGEaSV4gAU7sUOXQyFdQDYTQ5Kr9zhB60MExibSZ6rwRGv
Content-Type: application/json
{
  "sort": {
    "createDate": {
      "order": "asc"
    }
  },
  "size": 1000,
  "scroll": {
    "timeout": 30
  }
}
Response
{
  "scrollId": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
  "results": [
    {
      "document": {
        "documentId": "39728445-c0ef-482e-903a-08319eb79fcd",
        "documentTitle": "Annual Report 2023",
        "createDate": "2023-01-15T09:30:45Z",
        ...
      }
    },
    ...
  ],
  ...
}
The response contains the following important properties:
scrollId - top-level scrollId property of the response. Example value: byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd
The response contains the first batch of documents and a scrollId that must be used in subsequent requests to retrieve the next batches.
The scrollId is a unique identifier for your scroll context
The results array contains the first batch of documents
The scroll context will automatically expire after the specified timeout if not used
Store the scrollId for use in subsequent requests. Each new scroll request refreshes the timeout period.
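A sketch of reading a scroll response and keeping only the state the next call needs. readScrollResponse is a hypothetical name; it relies only on the scrollId and results properties shown above. Treating "documents but no scrollId" as an error is an assumption about how you may want to handle an unexpected response.

```javascript
// Hypothetical helper: extracts what the next scroll request needs from a parsed
// POST /search response. Only the "scrollId" and "results" properties are assumed.
function readScrollResponse(data) {
  const results = Array.isArray(data?.results) ? data.results : [];
  const scrollId = typeof data?.scrollId === "string" && data.scrollId !== "" ? data.scrollId : null;
  if (results.length > 0 && scrollId === null) {
    // Without a scrollId the next batch cannot be requested
    throw new Error("Response contains documents but no scrollId");
  }
  return {
    scrollId,                                   // store this for the next request
    documents: results.map((r) => r.document), // unwrap the "document" objects
    done: results.length === 0,                 // an empty batch means nothing is left
  };
}

// Usage with a trimmed copy of the response above
const state = readScrollResponse({
  scrollId: "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
  results: [{ document: { documentId: "39728445-c0ef-482e-903a-08319eb79fcd", documentTitle: "Annual Report 2023" } }],
});
console.log(state.documents[0].documentTitle, state.done); // Annual Report 2023 false
```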
Step 2 - Retrieve next batch of documents
This request retrieves the next batch of documents using the scrollId from the previous response. It keeps the same structure as the initial scroll request and adds the scrollId as scroll.id.
The scroll.id parameter must contain the scrollId from the previous response
The scroll.timeout parameter refreshes the scroll context expiration time
The size parameter should match the initial request for consistency
Do not include the from parameter in scroll requests. The scroll context internally tracks the current position.
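One way to follow these rules is to derive every follow-up body from the initial one, so sort and size cannot drift between requests. buildNextScrollBody is a hypothetical helper, not part of the API: it copies the initial body, sets scroll.id, re-sends scroll.timeout, and drops any from key.

```javascript
// Hypothetical helper: derives a follow-up scroll body from the initial request body.
function buildNextScrollBody(initialBody, scrollId, timeoutSeconds) {
  if (typeof scrollId !== "string" || scrollId === "") {
    throw new Error("scrollId from the previous response is required");
  }
  // Deep copy via a JSON round-trip (enough for plain JSON bodies) so the initial body is not mutated
  const body = JSON.parse(JSON.stringify(initialBody));
  delete body.from; // the scroll context tracks the position, so "from" must not be sent
  body.scroll = {
    id: scrollId,
    // Re-send the timeout; each scroll request refreshes the context's expiry
    timeout: timeoutSeconds ?? (initialBody.scroll ? initialBody.scroll.timeout : undefined),
  };
  return body;
}

// Usage: the follow-up body for this step
const initial = { sort: { createDate: { order: "asc" } }, size: 1000, scroll: { timeout: 30 } };
const next = buildNextScrollBody(initial, "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd");
console.log(JSON.stringify(next, null, 2));
```

With the values above, the printed body has the same shape as the request example in this step.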
Endpoint
POST /search
Request
POST /search?token=4sTB2MjaRQsnUDjmXcgGEaSV4gAU7sUOXQyFdQDYTQ5Kr9zhB60MExibSZ6rwRGv
Content-Type: application/json
{
  "sort": {
    "createDate": {
      "order": "asc"
    }
  },
  "size": 1000,
  "scroll": {
    "id": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
    "timeout": 30
  }
}
Response
{
  "scrollId": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
  "results": [
    {
      "document": {
        "documentId": "58b03c60-2fad-4519-836d-5375346a65da",
        "documentTitle": "Project Proposal",
        "createDate": "2023-02-22T14:15:30Z",
        ...
      }
    },
    ...
  ],
  ...
}
The response contains the following important properties:
scrollId - top-level scrollId property of the response. Example value: byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd
The response contains the next batch of documents and a scrollId for subsequent requests.
The scrollId may change between requests; always use the most recent one
Continue making scroll requests until you've processed all documents or the results array is empty
Each request refreshes the scroll context timeout
Implement error handling to manage cases where the scroll context expires. If this happens, you'll need to start a new scroll context.
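Because each request refreshes the timeout, a client can also estimate on its own whether a context has probably expired, for example when processing one batch took longer than scroll.timeout. The sketch below is a hypothetical client-side tracker (ScrollClock is not part of the API). The server remains the authority on expiry, so a failed request after a long pause should still be handled by starting a new scroll context; the 5-second safety margin is an arbitrary assumption.

```javascript
// Hypothetical client-side tracker for the scroll timeout. It only estimates expiry
// from elapsed time; the server decides when the context actually expires.
class ScrollClock {
  constructor(timeoutSeconds, { marginSeconds = 5, now = () => Date.now() } = {}) {
    this.timeoutMs = timeoutSeconds * 1000;
    this.marginMs = marginSeconds * 1000;
    this.now = now; // injectable clock, in milliseconds
    this.lastRequestAt = null;
  }

  // Call after every successful scroll request: each request refreshes the timeout
  touch() {
    this.lastRequestAt = this.now();
  }

  // True once the time since the last request is within marginSeconds of the timeout
  probablyExpired() {
    if (this.lastRequestAt === null) return false; // no context opened yet
    return this.now() - this.lastRequestAt >= this.timeoutMs - this.marginMs;
  }
}

// Usage with a fake clock (milliseconds)
let fakeNow = 0;
const clock = new ScrollClock(30, { now: () => fakeNow });
clock.touch();
fakeNow = 20_000;
console.log(clock.probablyExpired()); // false (20 s < 30 s - 5 s)
fakeNow = 26_000;
console.log(clock.probablyExpired()); // true (26 s >= 25 s)
```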
Step 3 - Continue scrolling through documents
This step continues the scrolling process. Repeat this request until you've processed all matching documents or the results array is empty.
Use the same request structure for all scroll operations
Always use the scrollId from the most recent response
Process each batch of documents as they are received
For large document collections, you may need to make many scroll requests. Consider implementing a loop that continues until an empty results array is received.
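The loop described above can be sketched as an async generator that always forwards the newest scrollId, stops on an empty results array, and closes the context in a finally block as Step 4 requires. scrollBatches is a hypothetical name, and the two functions it takes are assumptions you supply: postSearch(body) performs POST /search and resolves to the parsed JSON, and closeContext(scrollId) performs DELETE /search/scroll. Injecting them keeps the loop independent of any HTTP client and lets it run against a stub.

```javascript
// Hypothetical async generator over scroll batches (a sketch, not an official client).
// postSearch(body) must send POST /search and resolve to the parsed JSON response;
// closeContext(scrollId) must send DELETE /search/scroll. Both are supplied by the caller.
async function* scrollBatches(postSearch, closeContext, { size = 1000, timeoutSeconds = 30 } = {}) {
  const base = { sort: { createDate: { order: "asc" } }, size };
  let scrollId = null;
  try {
    let data = await postSearch({ ...base, scroll: { timeout: timeoutSeconds } });
    scrollId = data.scrollId ?? null;
    while (Array.isArray(data.results) && data.results.length > 0) {
      yield data.results;
      // Always send the scrollId from the most recent response; it may change
      data = await postSearch({ ...base, scroll: { id: scrollId, timeout: timeoutSeconds } });
      scrollId = data.scrollId ?? scrollId;
    }
  } finally {
    // Runs on normal completion, on errors, and when the consumer breaks out early
    if (scrollId) await closeContext(scrollId);
  }
}

// Usage against an in-memory stub that serves five documents
const stubDocs = [1, 2, 3, 4, 5].map((n) => ({ document: { documentId: `doc-${n}` } }));
let stubOffset = 0;
const stubPost = async (body) => {
  const page = stubDocs.slice(stubOffset, stubOffset + body.size);
  stubOffset += page.length;
  return { scrollId: `sid-${stubOffset}`, results: page };
};
const stubClose = async (id) => console.log(`closed ${id}`);

(async () => {
  for await (const batch of scrollBatches(stubPost, stubClose, { size: 2 })) {
    console.log(batch.map((r) => r.document.documentId).join(", "));
  }
})();
```

With the stub, this prints three batches (doc-1, doc-2 / doc-3, doc-4 / doc-5) and then closed sid-5. Because cleanup lives in the generator's finally block, a consumer that stops early with break still releases the context.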
Endpoint
POST /search
Request
POST /search?token=4sTB2MjaRQsnUDjmXcgGEaSV4gAU7sUOXQyFdQDYTQ5Kr9zhB60MExibSZ6rwRGv
Content-Type: application/json
{
  "sort": {
    "createDate": {
      "order": "asc"
    }
  },
  "size": 1000,
  "scroll": {
    "id": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
    "timeout": 30
  }
}
Response
{
  "scrollId": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
  "results": [
    {
      "document": {
        "documentId": "e14273e7-8984-445c-8ba3-ee495043a49e",
        "documentTitle": "Contract Agreement",
        "createDate": "2023-03-10T11:45:22Z",
        ...
      }
    },
    ...
  ],
  ...
}
The response contains the following important properties:
scrollId - top-level scrollId property of the response. Example value: byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd
The response contains another batch of documents. Continue making scroll requests until you've processed all documents or the results array is empty.
An empty results array indicates that all matching documents have been retrieved
The scrollId should be used for the next request if more documents need to be retrieved
Once all documents have been processed, the scroll context should be closed
Step 4 - Close the scroll context
Once you've finished processing all documents or no longer need the scroll context, it's important to explicitly close it to free up server resources. This endpoint terminates the scroll context identified by the scrollId.
Provide the scrollId from the most recent scroll response
Closing the scroll context frees up server resources
This step is important for system performance, especially when processing large document sets
Always implement proper cleanup in your integration code.
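A cleanup helper along these lines sends the DELETE request for this step and checks the success flag. It is a sketch under assumptions: the name closeScroll and the injected fetchFn parameter are hypothetical (pass the global fetch available in Node 18+ and browsers), and baseUrl is expected to include the API path used in the example implementation at the end of this section. It reports failure by returning false instead of throwing, because cleanup usually runs in a finally block where a second exception would hide the original error.

```javascript
// Hypothetical cleanup helper for DELETE /search/scroll.
// fetchFn is injected (e.g. globalThis.fetch) so the function can be tested offline.
async function closeScroll(fetchFn, baseUrl, token, scrollId) {
  if (!scrollId) return false; // nothing to close
  try {
    const response = await fetchFn(`${baseUrl}/search/scroll?token=${encodeURIComponent(token)}`, {
      method: "DELETE",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ scrollId }),
    });
    if (!response.ok) {
      console.error(`Closing scroll failed: HTTP ${response.status}`);
      return false;
    }
    const data = await response.json();
    return data.success === true;
  } catch (err) {
    // Never throw from cleanup: it would mask the error that triggered it
    console.error("Closing scroll failed:", err.message);
    return false;
  }
}

// Usage with a stub in place of fetch
const fakeFetch = async (url, init) => ({
  ok: true,
  status: 200,
  json: async () => ({ success: JSON.parse(init.body).scrollId === "abc" }),
});
closeScroll(fakeFetch, "https://sandbox.circularo.com/api/v1", "YOUR_AUTH_TOKEN", "abc")
  .then((closed) => console.log(closed)); // true
```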
Endpoint
DELETE /search/scroll
Request
DELETE /search/scroll?token=4sTB2MjaRQsnUDjmXcgGEaSV4gAU7sUOXQyFdQDYTQ5Kr9zhB60MExibSZ6rwRGv
Content-Type: application/json
{
  "scrollId": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd"
}
Response
{
  "success": true
}
The scroll context has been successfully closed. The server has released all resources associated with this scroll operation.
If you need to process more documents after closing a scroll context, you'll need to initialize a new scroll operation.
Document Scrolling Summary
Document scrolling provides an efficient way to process large sets of documents that would be impractical to retrieve using standard pagination.
Key Concepts
Scroll Context: A server-side cursor that maintains your position in the result set
Scroll ID: A unique identifier for your scroll context that must be used in subsequent requests
Scroll Timeout: The period of inactivity after which the scroll context expires
Batch Size: The number of documents returned in each scroll response
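Batch size determines how many search requests a full scroll takes, and every batch must be processed before the scroll timeout runs out. A back-of-the-envelope helper, assuming every batch except the last one is full and that the loop stops at the first empty results array (so one extra request is counted); estimateScrollRequests is a hypothetical name, and the final DELETE is not included.

```javascript
// Hypothetical estimate of POST /search calls needed to scroll through totalDocs,
// assuming full batches and one final request that returns an empty results array.
function estimateScrollRequests(totalDocs, batchSize) {
  if (!Number.isInteger(totalDocs) || totalDocs < 0) {
    throw new RangeError("totalDocs must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  return Math.ceil(totalDocs / batchSize) + 1;
}

console.log(estimateScrollRequests(2500, 1000)); // 4 (1000 + 1000 + 500 + empty)
console.log(estimateScrollRequests(0, 1000));    // 1 (the first response is already empty)
```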
Scrolling Process
1. Initialize a scroll context with search criteria and receive the first batch of documents
2. Process the documents in the current batch
3. Request the next batch using the scrollId from the previous response
4. Repeat steps 2-3 until all documents are processed or the results array is empty
5. Close the scroll context to free up server resources
Example Implementation
See our OpenAPI documentation to learn about the full set of API endpoints and parameters.
Please use proper exception handling and function decomposition in your own code. The code is provided for illustrative purposes only and is not intended for production use.
// Document scrolling example
// Uses top-level await: run it as an ES module (e.g. a .mjs file on Node 18+, which also provides a global fetch)
const BASE_URL = "https://sandbox.circularo.com"; // named BASE_URL to avoid shadowing the built-in URL class
const API_PATH = "/api/v1";
const TOKEN = "YOUR_AUTH_TOKEN"; // Obtained from login or API key

// Function to process each batch of documents
function processDocumentBatch(documents) {
  for (const doc of documents) {
    // Process each document as needed
    console.log(`Processing document: ${doc.document.documentId} - ${doc.document.documentTitle}`);
    // Implement your document processing logic here
  }
}

// Declared outside the try block so the finally block can still read it
let scrollId;
const batchSize = 1000;
const scrollTimeout = 30; // seconds

try {
  // Step 1: Initialize the scroll context
  const initialResponse = await fetch(`${BASE_URL}${API_PATH}/search?token=${TOKEN}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      // You can add any search filters here
      sort: {
        createDate: { order: 'asc' }
      },
      size: batchSize,
      scroll: {
        timeout: scrollTimeout
      }
    })
  });

  if (!initialResponse.ok) {
    throw new Error(`Failed to initialize scroll: ${initialResponse.status} ${initialResponse.statusText}`);
  }

  let responseData = await initialResponse.json();
  scrollId = responseData.scrollId;

  // Step 2: Process the batch and continue scrolling until all documents are processed
  while (Array.isArray(responseData.results) && responseData.results.length > 0) {
    processDocumentBatch(responseData.results);

    const scrollResponse = await fetch(`${BASE_URL}${API_PATH}/search?token=${TOKEN}`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        sort: {
          createDate: { order: 'asc' }
        },
        size: batchSize,
        scroll: {
          id: scrollId, // always the scrollId from the most recent response
          timeout: scrollTimeout
        }
      })
    });

    if (!scrollResponse.ok) {
      throw new Error(`Scroll request failed: ${scrollResponse.status} ${scrollResponse.statusText}`);
    }

    responseData = await scrollResponse.json();
    // The scrollId may change between requests; keep the previous one if none is returned
    scrollId = responseData.scrollId || scrollId;
  }

  console.log('Document processing complete');
} catch (error) {
  console.error('Error processing documents:', error.message);
} finally {
  // Step 3: Always try to close the scroll context if we have a scrollId, even if processing failed
  if (scrollId) {
    try {
      const closeResponse = await fetch(`${BASE_URL}${API_PATH}/search/scroll?token=${TOKEN}`, {
        method: 'DELETE',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          scrollId: scrollId
        })
      });
      if (!closeResponse.ok) {
        console.error(`Failed to close scroll context: ${closeResponse.status} ${closeResponse.statusText}`);
      }
    } catch (closeError) {
      console.error('Failed to close scroll context:', closeError.message);
    }
  }
}