
Data Lake Delta Files

Socotra supports the replication of Data Lake tables to your own data infrastructure via a series of delta files. Appropriately permissioned API clients can list and retrieve the relevant files via a pair of endpoints optimized for programmatic consumption.

Getting Started with Data Lake Delta Files

Delta File generation is not enabled by default. Contact your Socotra representative for onboarding details.

Overview

Consuming the Socotra Data Lake Delta File API involves an iterative, two-step process:

  1. Get an index of available files for a given table.
  2. Retrieve the necessary individual files.
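The two steps can be sketched in Python as follows. This is a minimal sketch, not Socotra's sample client: fetch_index, fetch_file and apply are hypothetical callables that you would back with your own HTTP client and database driver, and pagination is ignored here.

```python
from typing import Callable


def replicate_once(
    table: str,
    fetch_index: Callable[[str], dict],
    fetch_file: Callable[[str, str], str],
    apply: Callable[[str], None],
) -> int:
    """Run one pass of the two-step process and return the number of files applied.

    fetch_index, fetch_file and apply are hypothetical callables supplied by the caller.
    """
    # Step 1: get the index of available files for the table.
    index = fetch_index(table)
    applied = 0
    # Step 2: retrieve each file and apply it, strictly in index order.
    for entry in index["deltaFiles"]:
        body = fetch_file(index["s3Bucket"], entry["fileName"])
        apply(body)
        applied += 1
    return applied
```

Keeping HTTP and database access behind callables makes the ordering logic easy to test with in-memory fakes.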

Delta files are provided in either sql or csv format: sql files contain the upsert statements, and csv files the updated records, required to replicate table records in the correct format and order.

Delta files in csv format follow RFC 4180: fields are comma-delimited, fields containing special characters are quoted using standard quote escaping, and null values are represented as empty fields.
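A minimal Python sketch of reading a csv delta file with the standard csv module, whose default dialect handles RFC 4180 quoting. The function name parse_delta_csv is hypothetical. It makes no assumption about header rows, and because the csv module cannot distinguish an unquoted empty field from a quoted empty string (""), both are mapped to None here.

```python
import csv
import io
from typing import List, Optional


def parse_delta_csv(text: str) -> List[List[Optional[str]]]:
    """Parse RFC 4180 csv text, mapping empty fields to None (null)."""
    # newline="" lets the csv module handle CRLF and quoted line breaks itself.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [[field if field != "" else None for field in row] for row in reader]
```

For example, the line `a,"b, c",` yields `['a', 'b, c', None]`.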

To ensure complete and accurate replication, consume every delta file for a table's latest schema version, in the order in which the files appear in the index.

Each delta file listed in the index array includes metadata about the generation of its contents. Do not rely on that metadata to derive the order of consumption; the system guarantees the correct order through the position of each file in the index array.
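If a consumer stops partway through an index, it can resume by position in the index array instead of sorting on generationTime. A minimal sketch, assuming the consumer stores the fileName of the last file it applied; remaining_files is a hypothetical helper that works within a single index response (across pages, the lastFile request parameter serves the same purpose).

```python
from typing import List, Optional


def remaining_files(delta_files: List[dict], last_applied: Optional[str]) -> List[dict]:
    """Return the index entries that come after last_applied, by array position.

    Only the position in the index array is used; generationTime and the
    other metadata fields are deliberately ignored.
    """
    if last_applied is None:
        return list(delta_files)  # nothing applied yet: consume everything
    names = [entry["fileName"] for entry in delta_files]
    if last_applied not in names:
        raise ValueError(f"checkpoint {last_applied!r} is not in this index page")
    return delta_files[names.index(last_applied) + 1:]
```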

The maximum size of a delta file is 1000 statements (for sql) or 1000 rows (for csv).

There may be more delta files available than can be returned in a single index response. See the section on pagination below.

Delta files are generated at most once every two hours following Data Lake updates. If an update occurs within two hours of the previous delta file generation, the system generates the next set once the interval has elapsed. The two-hour interval is configurable per environment.
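As a worked example of the interval rule, the earliest time at which files reflecting an update can appear is the later of the update time and the previous generation time plus the interval. This sketch assumes epoch milliseconds (as in the sample generationTime values) and the default two-hour interval; your environment's interval may differ, and actual generation may lag this lower bound. The function name is hypothetical.

```python
TWO_HOURS_MS = 2 * 60 * 60 * 1000  # default interval; configurable per environment


def earliest_next_generation(last_generation_ms: int, update_ms: int,
                             interval_ms: int = TWO_HOURS_MS) -> int:
    """Lower bound (epoch ms) for when files covering update_ms can be generated."""
    return max(update_ms, last_generation_ms + interval_ms)
```

For instance, with a generation at 12:00 and an update at 12:30, no files covering that update are expected before 14:00.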

Schema Versioning

Since the schema of any source Data Lake table may evolve over time, each table consumed via the Delta File API has a corresponding version number. The version number is a sequentially incrementing integer.

When a source table schema update occurs, a new schema version is automatically made available in the Delta File API, and all historical data is regenerated into the newest version. Historical versions remain available, but updated delta files are no longer generated for them.

For each table's schema version, files containing the requisite drop and create statements are also provided.

On first consumption of the Delta File API, run the create statement to build the destination table. When a new table schema version becomes available, run the drop file and then the create file, and then consume all delta files for the new version.
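The schema decision a consumer makes on each run can be sketched as below. This assumes the consumer records the schema version it has already built locally, and that the dropTableFile returned alongside the new version drops the existing destination table; schema_actions is a hypothetical helper. After a version change, the consumer must also discard its file checkpoint and replay every delta file of the new version.

```python
from typing import List, Optional


def schema_actions(local_version: Optional[int], index: dict) -> List[str]:
    """Return the schema files to run, in order, before applying delta files.

    local_version is the schema version already built in the destination,
    or None if nothing has been replicated yet.
    """
    if local_version is None:
        # First consumption: only the create statement is needed.
        return [index["createTableFile"]]
    if index["version"] != local_version:
        # New schema version: drop, then create. The caller must then replay
        # every delta file of the new version from the start of the index.
        # Assumes this dropTableFile drops the existing destination table.
        return [index["dropTableFile"], index["createTableFile"]]
    return []  # same version: apply delta files only
```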

Pagination

The number of files available for a specific table and schema version may vary based on data volume and growth rate. The Index API response is limited to 100 Delta files per request. If more than 100 files exist, the consuming client must paginate through results.

To ensure complete indexing, set the lastFile parameter in the next Delta File Index API request to the fileName of the last entry in the previous response, and repeat until no further files are returned.
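A minimal pagination sketch in Python. fetch_page is a hypothetical wrapper around the index endpoint that sends lastFile when its argument is not None. The loop stops on an empty page rather than on a short one, which costs at most one extra request but does not depend on the 100-file page size.

```python
from typing import Callable, Iterator, Optional


def iter_all_delta_files(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every DeltaFile entry across index pages, in index order."""
    last_file: Optional[str] = None
    while True:
        page = fetch_page(last_file)
        files = page["deltaFiles"]
        if not files:
            return  # no further files: the index is exhausted
        yield from files
        # Continue after the last file of this page.
        last_file = files[-1]["fileName"]
```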

Client Example

A sample client implementation, illustrating how to consume the API programmatically, is available upon request.

File Index API

Clients can retrieve an index of the available files for a particular Data Lake table using the Fetch List of Delta Files endpoint.

DeltaFilesGetRequest

Required properties

Property | Type
--- | ---
tenantLocator | uuid
transformationTable | Enum (values listed below)

Allowed transformationTable values: DataLakeAccountDataExtensions, DataLakeAccounts, DataLakeAffectedTransactions, DataLakeAuxData, DataLakeBillingHolds, DataLakeClaimDataExtensions, DataLakeClaims, DataLakeCreditDistributions, DataLakeCreditItems, DataLakeDelinquencies, DataLakeDelinquencyReferences, DataLakeDiaries, DataLakeDisbursementDataExtensions, DataLakeDisbursements, DataLakeFaTransactionAccountLines, DataLakeFaTransactions, DataLakeFnolDataExtensions, DataLakeFnols, DataLakeInstallmentItems, DataLakeInstallments, DataLakeInstallmentSettings, DataLakeInvoiceItems, DataLakeInvoices, DataLakeLedgerAccountLineItems, DataLakeLedgerAccounts, DataLakeMoratoriumElections, DataLakeMoratoriums, DataLakeMoratoriumStatuses, DataLakePaymentDataExtensions, DataLakePayments, DataLakePolicies, DataLakePolicyAutoRenewals, DataLakePolicyCoverageTerms, DataLakePolicyDataExtensions, DataLakePolicyElementCharges, DataLakePolicyElements, DataLakePolicyElementTree, DataLakePolicyElementUnderwritingFlags, DataLakePolicyPreferences, DataLakePolicySegments, DataLakePolicyStatuses, DataLakePolicyTerms, DataLakePolicyTransactionChangeInstructions, DataLakePolicyTransactions, DataLakeProducerCodeDataExtensions, DataLakeProducerCodes, DataLakeProducerDataExtensions, DataLakeProducerHierarchy, DataLakeProducers, DataLakeQuoteCoverageTerms, DataLakeQuoteDataExtensions, DataLakeQuoteElementCharges, DataLakeQuoteElements, DataLakeQuoteElementTree, DataLakeQuoteElementUnderwritingFlags, DataLakeQuotes, DataLakeTaskReferences, DataLakeTasks, DataLakeUserAssociations, DataLakeUserQualifications, DataLakeWriteOffs

Optional properties

Property | Type
--- | ---
startTime | integer?
dataProcessedThroughTime | integer?
deltaFileType | Enum? (sql, csv)
lastFile | string?
version | integer?

Sample DeltaFilesGetRequest

{
    // required
    "tenantLocator": "b6f8aa30-b978-4934-bef3-627XXXXXXXXX",
    "transformationTable": "DataLakeInvoices",

    // optional
    "deltaFileType": "csv",
    "version": 0,

    // optional, mutually exclusive
    // "startTime": 1734542240221,
    "lastFile": "DataLakeInvoices/version_0/2025/March/b6f8aa30-b978-4934-bef3-627b0e6edd88_DataLakeInvoices_1451606100_1741713134734.csv"
}

Request Property | Type | Description
--- | --- | ---
tenantLocator | uuid | Locator of the source tenant
transformationTable | string | Name of the desired Data Lake table
deltaFileType | enum (sql, csv) | Optional; defaults to sql
version | int | Optional; targets a specific schema version, defaults to the latest if omitted
startTime | Unix timestamp (UTC ms) | Optional; only files with a generationTime later than startTime are returned. Mutually exclusive with lastFile
lastFile | string | Optional; only files after this file in the index are returned. Mutually exclusive with startTime

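Transport details (HTTP method, URL, and whether these properties are sent as path or query parameters) are not covered on this page, so the following Python sketch only assembles and validates the request property set. build_index_request is a hypothetical helper; its mutual-exclusivity check mirrors the note in the sample request.

```python
from typing import Optional


def build_index_request(
    tenant_locator: str,
    table: str,
    *,
    delta_file_type: Optional[str] = None,
    version: Optional[int] = None,
    start_time: Optional[int] = None,
    last_file: Optional[str] = None,
) -> dict:
    """Assemble DeltaFilesGetRequest properties, omitting unset optional ones."""
    if start_time is not None and last_file is not None:
        raise ValueError("startTime and lastFile are mutually exclusive")
    if delta_file_type not in (None, "sql", "csv"):
        raise ValueError("deltaFileType must be 'sql' or 'csv'")
    request = {"tenantLocator": tenant_locator, "transformationTable": table}
    optional = {
        "deltaFileType": delta_file_type,
        "version": version,  # 0 is a valid version, so test against None
        "startTime": start_time,
        "lastFile": last_file,
    }
    request.update({key: value for key, value in optional.items() if value is not None})
    return request
```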
DeltaFilesGetResponse

Required properties

Property | Type
--- | ---
createTableFile | string
dataProcessedThroughTime | integer
dropTableFile | string
s3Bucket | string
version | integer
deltaFiles | DeltaFile[]

DeltaFile

Required properties

Property | Type
--- | ---
fileName | string
deltaFileType | Enum (sql, csv)
generationTime | integer
jobEndTime | integer
jobStartTime | integer

Optional properties

Property | Type
--- | ---
md5HashSum | string?
recordCount | integer?

Sample DeltaFilesGetResponse

{
    "version": 0,
    "createTableFile": "DataLakeInvoices/version_0/createTable.sql",
    "dropTableFile": "DataLakeInvoices/version_0/dropTable.sql",
    "s3Bucket": "socotra-kernel-develop-dm-delta",
    "deltaFiles": [
        {
            "deltaFileType": "csv",
            "fileName": "DataLakeInvoices/version_0/2025/March/b6f8aa30-b978-4934-bef3-627b0e6edd88_DataLakeInvoices_1451606100_1741713134934.csv",
            "jobStartTime": 1451606100,
            "jobEndTime": 1741582882,
            "generationTime": 1741713134934,
            "recordCount": 1000,
            "md5HashSum": "a1b2c3d4e5f67890abcdef1234567890"
        },
        {
            "deltaFileType": "csv",
            "fileName": "DataLakeInvoices/version_0/2025/March/b6f8aa30-b978-4934-bef3-627b0e6edd88_DataLakeInvoices_1451606100_1741713135163.csv",
            "jobStartTime": 1451606100,
            "jobEndTime": 1741582882,
            "generationTime": 1741713135163,
            "recordCount": 1000,
            "md5HashSum": "8d4a2f9c1e7b3d5a0f6c8e2b4a9d1f7c"
        },
        {
            "deltaFileType": "csv",
            "fileName": "DataLakeInvoices/version_0/2025/March/b6f8aa30-b978-4934-bef3-627b0e6edd88_DataLakeInvoices_1451606100_1741713135490.csv",
            "jobStartTime": 1451606100,
            "jobEndTime": 1741582882,
            "generationTime": 1741713135490,
            "recordCount": 198,
            "md5HashSum": "e3b0c44298fc1c149afbf4c8996fb924"
        }
    ]
}

Response Property | Type | Description
--- | --- | ---
version | int | Schema version of the returned files; the latest version unless a specific version was requested
createTableFile | string | Path and name of the file containing the SQL statement that creates the table in the destination schema
dropTableFile | string | Path and name of the file containing the SQL statement that drops the existing version of the table in the destination schema
s3Bucket | string | The source S3 bucket, required for the file retrieval request
deltaFiles | DeltaFile array | The index of individual files, in consumption order
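A minimal Python sketch of parsing a DeltaFilesGetResponse body into typed objects while keeping deltaFiles in index order. The class and function names are hypothetical. dataProcessedThroughTime is listed as required in the schema but is absent from the sample response, so the sketch treats it leniently as optional; unknown keys are ignored.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeltaFile:
    file_name: str
    delta_file_type: str  # "sql" or "csv"
    generation_time: int
    job_start_time: int
    job_end_time: int
    md5_hash_sum: Optional[str] = None
    record_count: Optional[int] = None


@dataclass
class DeltaIndex:
    version: int
    create_table_file: str
    drop_table_file: str
    s3_bucket: str
    data_processed_through_time: Optional[int] = None  # required by schema, absent from sample
    delta_files: List[DeltaFile] = field(default_factory=list)


def parse_index_response(text: str) -> DeltaIndex:
    """Parse a DeltaFilesGetResponse JSON body, preserving deltaFiles order."""
    payload = json.loads(text)
    files = [
        DeltaFile(
            file_name=f["fileName"],
            delta_file_type=f["deltaFileType"],
            generation_time=f["generationTime"],
            job_start_time=f["jobStartTime"],
            job_end_time=f["jobEndTime"],
            md5_hash_sum=f.get("md5HashSum"),
            record_count=f.get("recordCount"),
        )
        for f in payload.get("deltaFiles", [])
    ]
    return DeltaIndex(
        version=payload["version"],
        create_table_file=payload["createTableFile"],
        drop_table_file=payload["dropTableFile"],
        s3_bucket=payload["s3Bucket"],
        data_processed_through_time=payload.get("dataProcessedThroughTime"),
        delta_files=files,
    )
```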

File Retrieval API

Clients can download each individual Delta File using the Fetch Specific Delta File endpoint, passing the s3Bucket and a fileName from the index response. The response is a streamed file (StreamingResponseBody<string>).

DeltaFileDownloadRequest

Required properties

Property | Type
--- | ---
tenantLocator | uuid
fileName | string
s3Bucket | string
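Because the file is streamed, a client can compute a checksum while reading and compare it with the optional md5HashSum from the index. A minimal Python sketch, assuming md5HashSum is the hex MD5 digest of the raw file bytes (not confirmed on this page); chunks stands in for whatever byte iterator your HTTP client exposes for the streamed response, and read_and_verify is a hypothetical helper. The endpoint URL and authentication are not shown.

```python
import hashlib
from typing import Iterable, Optional


def read_and_verify(chunks: Iterable[bytes], expected_md5: Optional[str]) -> bytes:
    """Collect a streamed delta file and check it against md5HashSum, if present.

    Assumes md5HashSum is the hex MD5 digest of the raw file bytes.
    """
    digest = hashlib.md5()
    parts = []
    for chunk in chunks:
        digest.update(chunk)  # hash incrementally while streaming
        parts.append(chunk)
    actual = digest.hexdigest()
    if expected_md5 is not None and actual != expected_md5.lower():
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    return b"".join(parts)
```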
