5. Document Scrolling

This scenario demonstrates how to efficiently retrieve and process large sets of documents using the scrolling functionality. When dealing with large result sets that exceed practical pagination limits, scrolling provides a more efficient way to iterate through all matching documents.

Key features:

Process large document collections efficiently
Maintain search context across multiple requests
Retrieve all matching documents without pagination limitations
Combine with search filters from the previous scenario for targeted results

When to Use Scrolling

Scrolling is particularly useful in the following scenarios:

Processing all documents that match specific criteria
Implementing data migration or batch operations
Generating comprehensive reports across the entire document base

Scrolling is designed for processing large datasets in the background or for data export operations. For user interfaces and interactive searches, standard pagination (as shown in the Document Search scenario) is usually more appropriate.

Step 1 - Initialize document scrolling

This endpoint initiates a scrolling context that allows you to retrieve large sets of documents in batches. Unlike standard pagination, scrolling maintains a consistent view of the data even as documents are added or removed.

The scroll.timeout parameter specifies how long (in seconds) the scroll context stays open between requests
The size parameter controls how many documents are returned in each batch
The sort parameter ensures consistent document ordering across batches

You can combine scrolling with any of the search filters from the previous scenario to process specific document subsets.
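A small helper can assemble the request body for this step and keep the scrolling parameters in one place. This is a minimal sketch: the function name `buildInitialScrollRequest` and its `filters` option are hypothetical. `filters` is assumed to contain top-level search keys from the previous scenario, which are merged into the body unchanged.

```javascript
// Hypothetical helper: builds the body for the initial POST /search scroll request.
// `filters` is assumed to hold top-level search keys from the previous scenario
// (their exact shape is not shown here). sort, size, and scroll always win over
// any keys of the same name in `filters`.
function buildInitialScrollRequest({ size = 1000, timeoutSeconds = 30, filters = {} } = {}) {
    if (!Number.isInteger(size) || size <= 0) {
        throw new RangeError("size must be a positive integer");
    }
    if (!(timeoutSeconds > 0)) {
        throw new RangeError("timeoutSeconds must be positive");
    }
    return {
        ...filters,
        // A fixed sort keeps the document order stable across batches
        sort: { createDate: { order: "asc" } },
        size,
        // Timeout in seconds; the first request carries no scroll.id
        scroll: { timeout: timeoutSeconds },
    };
}

const body = buildInitialScrollRequest({ size: 1000, timeoutSeconds: 30 });
console.log(JSON.stringify(body));
// {"sort":{"createDate":{"order":"asc"}},"size":1000,"scroll":{"timeout":30}}
```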

Endpoint

CODE

POST /search

Request

JSON

POST /search?token=4sTB2MjaRQsnUDjmXcgGEaSV4gAU7sUOXQyFdQDYTQ5Kr9zhB60MExibSZ6rwRGv

Content-Type: application/json

{
  "sort": {
    "createDate": {
      "order": "asc"
    }
  },
  "size": 1000,
  "scroll": {
    "timeout": 30
  }
}

Response

JSON

{
  "scrollId": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
  "results": [
    {
      "document": {
        "documentId": "39728445-c0ef-482e-903a-08319eb79fcd",
        "documentTitle": "Annual Report 2023",
        "createDate": "2023-01-15T09:30:45Z",
        ...
      }
    },
    ...
  ],
  ...
}

The response contains the following important properties:

scrollId - The identifier of the scroll context, returned at the top level of the response. Pass it as scroll.id in the next request.
- Example value: byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd

The response contains the first batch of documents and a scrollId that must be used in subsequent requests to retrieve the next batches.

The scrollId is a unique identifier for your scroll context
The results array contains the first batch of documents
The scroll context will automatically expire after the specified timeout if not used

Store the scrollId for use in subsequent requests. Each new scroll request refreshes the timeout period.
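One way to read a scroll response defensively is sketched below. The function `parseScrollResponse` is hypothetical and relies only on the `scrollId` and `results` properties described above. Its fallback to the previous scrollId when a response omits one is an assumption; every example in this guide returns a scrollId.

```javascript
// Hypothetical helper: extracts the fields a scrolling client relies on from a
// POST /search response. If a response omits scrollId (an assumption; the
// examples in this guide always include it), the previous ID is kept so the
// context can still be continued or closed.
function parseScrollResponse(data, previousScrollId = undefined) {
    if (data === null || typeof data !== "object") {
        throw new TypeError("response body must be a JSON object");
    }
    if (!Array.isArray(data.results)) {
        throw new TypeError("response body has no results array");
    }
    const scrollId = typeof data.scrollId === "string" && data.scrollId !== ""
        ? data.scrollId // always prefer the most recent ID
        : previousScrollId;
    if (scrollId === undefined) {
        throw new Error("no scrollId available to continue or close the scroll");
    }
    // An empty results array means every matching document has been retrieved
    return { scrollId, results: data.results, done: data.results.length === 0 };
}

const first = parseScrollResponse({ scrollId: "abc", results: [{ document: { documentId: "1" } }] });
const last = parseScrollResponse({ scrollId: "def", results: [] }, first.scrollId);
console.log(first.done, last.done, last.scrollId); // false true def
```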

Step 2 - Retrieve next batch of documents

This request retrieves the next batch of documents using the scrollId from the previous response. It must keep the same structure as the initial scroll request, with the scrollId added as scroll.id.

The scroll.id parameter must contain the scrollId from the previous response
The scroll.timeout parameter refreshes the scroll context expiration time
The size parameter should match the initial request for consistency

Do not include the from parameter in scroll requests. The scroll context internally tracks the current position.
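Because a follow-up request reuses the initial body, it can be derived from that body instead of being written out again. A minimal sketch, assuming the initial body is a plain object like the Step 1 request; `buildNextScrollRequest` is a hypothetical name.

```javascript
// Hypothetical helper: derives a follow-up scroll request from the initial body.
// Sort, size, and any filters are kept, `from` is dropped (the scroll context
// tracks the position), and scroll.id is set to the most recent scrollId.
function buildNextScrollRequest(initialBody, scrollId, timeoutSeconds) {
    if (typeof scrollId !== "string" || scrollId === "") {
        throw new Error("a scrollId from the previous response is required");
    }
    // Destructuring copies the top level, so initialBody itself is not mutated
    const { from: _unusedFrom, ...rest } = initialBody;
    return {
        ...rest,
        scroll: {
            id: scrollId,
            // Reuse the initial timeout unless a new one is given
            timeout: timeoutSeconds ?? initialBody.scroll?.timeout,
        },
    };
}

const initial = { sort: { createDate: { order: "asc" } }, size: 1000, scroll: { timeout: 30 } };
const next = buildNextScrollRequest(initial, "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd");
console.log(JSON.stringify(next)); // matches the Step 2 request body above
```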

Endpoint

CODE

POST /search

Request

JSON

POST /search?token=4sTB2MjaRQsnUDjmXcgGEaSV4gAU7sUOXQyFdQDYTQ5Kr9zhB60MExibSZ6rwRGv

Content-Type: application/json

{
  "sort": {
    "createDate": {
      "order": "asc"
    }
  },
  "size": 1000,
  "scroll": {
    "id": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
    "timeout": 30
  }
}

Response

JSON

{
  "scrollId": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
  "results": [
    {
      "document": {
        "documentId": "58b03c60-2fad-4519-836d-5375346a65da",
        "documentTitle": "Project Proposal",
        "createDate": "2023-02-22T14:15:30Z",
        ...
      }
    },
    ...
  ],
  ...
}

The response contains the following important properties:

scrollId - The identifier of the scroll context, returned at the top level of the response. Pass it as scroll.id in the next request.
- Example value: byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd

The response contains the next batch of documents and a scrollId for subsequent requests.

The scrollId may change between requests; always use the most recent one
Continue making scroll requests until you've processed all documents or the results array is empty
Each request refreshes the scroll context timeout

Implement error handling to manage cases where the scroll context expires. If this happens, you'll need to start a new scroll context.
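Since the timeout measures the time between requests, a batch that takes longer to process than scroll.timeout can let the context expire. The hypothetical `ScrollDeadline` class below tracks this on the client, so you can start a new scroll context before sending a request that is likely to fail. It is only an estimate: the server decides when a context actually expires, so a failed scroll request still needs to be handled.

```javascript
// Hypothetical client-side tracker for the scroll timeout. It only estimates
// expiry; the server decides when a scroll context actually expires.
class ScrollDeadline {
    constructor(timeoutSeconds, nowMs = Date.now()) {
        this.timeoutMs = timeoutSeconds * 1000;
        this.lastUsedMs = nowMs;
    }

    // Call after each successful scroll request, since each one refreshes the timeout
    touch(nowMs = Date.now()) {
        this.lastUsedMs = nowMs;
    }

    // True when more than the timeout has passed since the last request
    isLikelyExpired(nowMs = Date.now()) {
        return nowMs - this.lastUsedMs > this.timeoutMs;
    }
}

// Example with explicit timestamps (milliseconds) and a 30-second timeout
const deadline = new ScrollDeadline(30, 0);
console.log(deadline.isLikelyExpired(29000)); // false
console.log(deadline.isLikelyExpired(31000)); // true
deadline.touch(31000);
console.log(deadline.isLikelyExpired(50000)); // false
```

In a scroll loop, check `isLikelyExpired()` before each request and start a new scroll context (Step 1) when it returns true; otherwise send the request and call `touch()` once it succeeds.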

Step 3 - Continue scrolling through documents

This step repeats the Step 2 request. Send it as many times as needed, until you've processed all matching documents or the results array is empty.

Use the same request structure for all scroll operations
Always use the scrollId from the most recent response
Process each batch of documents as they are received

For large document collections, you may need to make many scroll requests. Consider implementing a loop that continues until an empty results array is received.
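Such a loop can be packaged as an async generator that yields one batch at a time and closes the scroll context when it finishes. In this sketch, `scrollBatches`, `postSearch`, and `closeScroll` are hypothetical names: `postSearch(body)` is assumed to send POST /search and resolve to the parsed JSON response, and `closeScroll(scrollId)` is assumed to call DELETE /search/scroll. Injecting them keeps the loop independent of any particular HTTP client.

```javascript
// Hypothetical sketch: yields batches of documents until the server returns an
// empty results array, then closes the scroll context.
// postSearch(body)      -> assumed to POST /search and resolve to the parsed JSON
// closeScroll(scrollId) -> assumed to call DELETE /search/scroll
async function* scrollBatches(postSearch, closeScroll, { size = 1000, timeoutSeconds = 30 } = {}) {
    // The same sort and size on every request keep the batches consistent
    const baseBody = { sort: { createDate: { order: "asc" } }, size };
    let scrollId;
    try {
        // Open the scroll context (no scroll.id yet)
        let response = await postSearch({ ...baseBody, scroll: { timeout: timeoutSeconds } });
        scrollId = response.scrollId;

        // Keep requesting until an empty batch arrives
        while (response.results.length > 0) {
            yield response.results;
            response = await postSearch({
                ...baseBody,
                scroll: { id: scrollId, timeout: timeoutSeconds },
            });
            // The scrollId may change between requests; keep the most recent one
            scrollId = response.scrollId ?? scrollId;
        }
    } finally {
        // Runs on completion, on errors, and when the consumer stops early
        if (scrollId) {
            await closeScroll(scrollId);
        }
    }
}
```

A caller iterates with `for await (const documents of scrollBatches(postSearch, closeScroll)) { ... }`. Because the close call sits in `finally`, it runs when the results array comes back empty, when a request throws, and when the caller breaks out of the loop early.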

Endpoint

CODE

POST /search

Request

JSON

POST /search?token=4sTB2MjaRQsnUDjmXcgGEaSV4gAU7sUOXQyFdQDYTQ5Kr9zhB60MExibSZ6rwRGv

Content-Type: application/json

{
  "sort": {
    "createDate": {
      "order": "asc"
    }
  },
  "size": 1000,
  "scroll": {
    "id": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
    "timeout": 30
  }
}

Response

JSON

{
  "scrollId": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd",
  "results": [
    {
      "document": {
        "documentId": "e14273e7-8984-445c-8ba3-ee495043a49e",
        "documentTitle": "Contract Agreement",
        "createDate": "2023-03-10T11:45:22Z",
        ...
      }
    },
    ...
  ],
  ...
}

The response contains the following important properties:

scrollId - The identifier of the scroll context, returned at the top level of the response. Pass it as scroll.id in the next request.
- Example value: byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd

The response contains another batch of documents. Continue making scroll requests until you've processed all documents or the results array is empty.

An empty results array indicates that all matching documents have been retrieved
Use the scrollId from this response for the next request if more documents remain
Once all documents have been processed, the scroll context should be closed

Step 4 - Close the scroll context

Once you've finished processing all documents or no longer need the scroll context, it's important to explicitly close it to free up server resources. This endpoint terminates the scroll context identified by the scrollId.

Provide the scrollId from the most recent scroll response
Closing the scroll context frees up server resources
This step is important for system performance, especially when processing large document sets

Close the scroll context in a cleanup path of your integration code (for example, a finally block) so that it is released even if processing fails.

Endpoint

CODE

DELETE /search/scroll

Request

JSON

DELETE /search/scroll?token=4sTB2MjaRQsnUDjmXcgGEaSV4gAU7sUOXQyFdQDYTQ5Kr9zhB60MExibSZ6rwRGv

Content-Type: application/json

{
  "scrollId": "byDbPVsm8azWn8eveNDL4kvjnlW8be3txaYgpY2UeCAgySQIJJdpifhBaRvbh5yd"
}

Response

JSON

{
  "success": true
}

The scroll context has been successfully closed. The server has released all resources associated with this scroll operation.
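A helper for this step can verify both the HTTP status and the `success` flag before treating the context as closed. A minimal sketch: `closeScroll` is a hypothetical name, `baseUrl` is assumed to include the API path (for example `https://sandbox.circularo.com/api/v1`, as in the example implementation), and `fetchImpl` defaults to the global `fetch` available in browsers and Node.js 18 and later.

```javascript
// Hypothetical helper: closes a scroll context with DELETE /search/scroll and
// checks both the HTTP status and the `success` flag in the response body.
async function closeScroll(baseUrl, token, scrollId, fetchImpl = globalThis.fetch) {
    const url = `${baseUrl}/search/scroll?token=${encodeURIComponent(token)}`;
    const response = await fetchImpl(url, {
        method: "DELETE",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ scrollId }),
    });
    if (!response.ok) {
        throw new Error(`Failed to close scroll context: ${response.status} ${response.statusText}`);
    }
    const data = await response.json();
    if (data.success !== true) {
        throw new Error("Server did not confirm that the scroll context was closed");
    }
}

// Usage (requires network access and a valid token):
// await closeScroll("https://sandbox.circularo.com/api/v1", TOKEN, scrollId);
```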

If you need to process more documents after closing a scroll context, you'll need to initialize a new scroll operation.

Document Scrolling Summary

Document scrolling provides an efficient way to process large sets of documents that would be impractical to retrieve using standard pagination.

Key Concepts

Scroll Context: A server-side cursor that maintains your position in the result set
Scroll ID: A unique identifier for your scroll context that must be used in subsequent requests
Scroll Timeout: The period of inactivity after which the scroll context expires
Batch Size: The number of documents returned in each scroll response
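Batch size determines how many requests a complete scroll takes. Assuming every batch except the last is full and the loop stops at the first empty results array, N matching documents with batch size S need ceil(N / S) + 1 search requests, plus one DELETE request to close the context. The hypothetical `estimateScrollRequests` function computes this.

```javascript
// Hypothetical helper: estimates how many POST /search requests a full scroll
// makes, assuming every batch except the last is full and the loop stops on
// the first empty results array. The closing DELETE request is not counted.
function estimateScrollRequests(totalDocuments, batchSize) {
    if (!Number.isInteger(totalDocuments) || totalDocuments < 0) {
        throw new RangeError("totalDocuments must be a non-negative integer");
    }
    if (!Number.isInteger(batchSize) || batchSize <= 0) {
        throw new RangeError("batchSize must be a positive integer");
    }
    // One request per non-empty batch, plus the final request that returns []
    return Math.ceil(totalDocuments / batchSize) + 1;
}

console.log(estimateScrollRequests(25000, 1000)); // 26
console.log(estimateScrollRequests(2500, 1000)); // 4
console.log(estimateScrollRequests(0, 1000)); // 1
```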

Scrolling Process

1. Initialize a scroll context with search criteria and receive the first batch of documents
2. Process the documents in the current batch
3. Request the next batch using the scrollId from the previous response
4. Repeat steps 2-3 until all documents are processed or the results array is empty
5. Close the scroll context to free up server resources

Example Implementation

See our OpenAPI documentation to learn about the full set of API endpoints and parameters.

Please use proper exception handling and function decomposition in your own code. The code is provided for illustrative purposes only and is not intended for production use.

JAVASCRIPT

// Document scrolling example (uses top-level await, so run it as an ES module or wrap it in an async function)
const URL = "https://sandbox.circularo.com";
const API_PATH = "/api/v1";
const TOKEN = "YOUR_AUTH_TOKEN"; // Obtained from login or API key

// Function to process each batch of documents
function processDocumentBatch(documents) {
    for (const doc of documents) {
        // Process each document as needed
        console.log(`Processing document: ${doc.document.documentId} - ${doc.document.documentTitle}`);

        // Implement your document processing logic here
    }
}

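// Declared outside the try block so that the finally block can read it
// (a let declared inside try is not in scope in finally)
let scrollId;

try {
    // Step 1: Initialize scroll context
    const batchSize = 1000;
    const scrollTimeout = 30; // seconds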

    // Initial scroll request
    const initialResponse = await fetch(`${URL}${API_PATH}/search?token=${TOKEN}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            // You can add any search filters here
            sort: {
                createDate: { order: 'asc' }
            },
            size: batchSize,
            scroll: {
                timeout: scrollTimeout
            }
        })
    });
    if (!initialResponse.ok) {
        throw new Error(`Failed to initialize scroll: ${initialResponse.status} ${initialResponse.statusText}`);
    }

    let responseData = await initialResponse.json();
    scrollId = responseData.scrollId;

    // Step 2: Process the batch and continue scrolling until all documents are processed
    while (responseData.results.length > 0) {
        processDocumentBatch(responseData.results);

        const scrollResponse = await fetch(`${URL}${API_PATH}/search?token=${TOKEN}`, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                sort: {
                    createDate: { order: 'asc' }
                },
                size: batchSize,
                scroll: {
                    id: scrollId,
                    timeout: scrollTimeout
                }
            })
        });
        if (!scrollResponse.ok) {
            throw new Error(`Scroll request failed: ${scrollResponse.status} ${scrollResponse.statusText}`);
        }

        responseData = await scrollResponse.json();
        scrollId = responseData.scrollId;
    }

    console.log('Document processing complete');

} catch (error) {
    console.error('Error processing documents:', error.message);

} finally {
    // Step 4: Always try to close the scroll context if we have a scrollId, even if processing failed
    if (scrollId) {
        try {
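            const closeResponse = await fetch(`${URL}${API_PATH}/search/scroll?token=${TOKEN}`, {
                method: 'DELETE',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({
                    scrollId: scrollId
                })
            });
            if (!closeResponse.ok) {
                console.error(`Failed to close scroll context: ${closeResponse.status} ${closeResponse.statusText}`);
            }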
        } catch (closeError) {
            console.error('Failed to close scroll context:', closeError.message);
        }
    }
}