Data Lake Delta Files
Socotra supports the replication of Data Lake tables to your own data infrastructure via a series of delta files. Appropriately permissioned API clients can list and retrieve the relevant files via a pair of endpoints optimized for programmatic consumption.
Getting Started with Data Lake Delta Files
Note
Delta File generation is not enabled by default. Contact your Socotra representative for onboarding details.
Overview
Consuming the Socotra Data Lake Delta File API is an iterative, two-step process:
1. Get an index of available files for a given table.
2. Retrieve the necessary individual files.
The delta files are provided in sql or csv format, containing all requisite upsert statements (for sql) or updated records (for csv) to replicate table records in the correct format and order.
Delta files in csv format follow RFC 4180: fields are comma-delimited, fields containing special characters use standard quote escaping, and null values are represented as empty fields. Moratorium reporting is not yet supported in csv format; support will be available in a future release.
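As a rough illustration of the csv conventions described above, the sketch below reads csv text with Python's standard csv module, whose default dialect handles comma delimiters and doubled-quote escaping. The sample columns are hypothetical, and the sketch assumes no header row. Because null values arrive as empty fields, it maps every empty field to None, so an empty string and a null are indistinguishable here.

```python
import csv
import io


def read_delta_rows(text):
    """Parse RFC 4180 style csv text into rows, mapping empty fields to None."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [[field if field != "" else None for field in row] for row in reader]


# Hypothetical sample content; real delta files have table-specific columns.
sample = 'inv-1,"Smith, Jane",,100.00\r\ninv-2,"said ""hi""",open,\r\n'
rows = read_delta_rows(sample)
```

Each row comes back as a list of strings and None values, ready to be bound to an upsert against the destination table.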
To ensure complete and accurate replication, all delta files for a given table’s latest schema version must be consumed, in the order in which they appear in the index.
While each delta file enumerated in the index array will include metadata related to the generation of that file’s contents, it is not recommended to rely on that metadata to derive the correct order of consumption. The system handles and guarantees this via the ordering of the files in the index array.
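The ordering rule can be sketched as a loop that walks the index array as returned and never re-sorts it. The callables `fetch_file` and `apply_file` are hypothetical stand-ins for the client's own retrieval and database code; they are not part of the Socotra API.

```python
def apply_in_index_order(index_response, fetch_file, apply_file):
    """Fetch and apply every delta file in the order the index lists them."""
    applied = []
    for entry in index_response["deltaFiles"]:  # do not re-sort this list
        name = entry["fileName"]
        apply_file(fetch_file(name))
        applied.append(name)
    return applied


# Hypothetical index whose metadata would suggest a different order.
index = {
    "deltaFiles": [
        {"fileName": "b.sql", "generationTime": 200},
        {"fileName": "a.sql", "generationTime": 100},
    ]
}
executed = []
order = apply_in_index_order(
    index,
    fetch_file=lambda name: f"-- contents of {name}",
    apply_file=executed.append,
)
```

The sample index deliberately lists a file with a later `generationTime` first, to show that the sketch follows array position rather than file metadata.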
The maximum size of a delta file is 1000 statements (for sql) or 1000 rows (for csv).
Note
There may be more delta files available than can be returned in a single index response. See the section on pagination below.
Delta files are generated at most once every two hours following Data Lake updates. If an update occurs within two hours of the previous delta file generation, the system will generate the next set once the interval has elapsed. This two-hour interval is configurable by environment.
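The interval rule can be worked through with a little arithmetic. The function below is only a client-side estimate under the default two-hour setting; the actual scheduling is internal to the system, and the function name and parameters are assumptions made for this sketch.

```python
TWO_HOURS_MS = 2 * 60 * 60 * 1000  # default interval; configurable by environment


def earliest_next_generation(last_generation_ms, update_ms, interval_ms=TWO_HOURS_MS):
    """Earliest time (UTC ms) at which an update could appear in new delta files."""
    return max(update_ms, last_generation_ms + interval_ms)


# An update 30 minutes after the last generation waits for the interval to elapse.
within_interval = earliest_next_generation(0, 30 * 60 * 1000)
# An update 3 hours after the last generation is not delayed by the interval.
after_interval = earliest_next_generation(0, 3 * 60 * 60 * 1000)
```

A polling client could use an estimate like this to avoid requesting the index more often than new files can appear.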
Schema Versioning
Since the schema of any source Data Lake table may evolve over time, each table consumed via the Delta File API has a corresponding version number. The version number is a sequentially incrementing integer.
When a source table schema update occurs, a new schema version is automatically made available in the Delta File API, and all historical data is regenerated into the newest version. Historical versions will remain available, but updated delta files will not be generated for them.
For each table’s schema version, files containing the requisite drop and create statements are also provided.
On first consumption of the Delta File API, the create statement will be needed. The drop and create files will be used in sequence when a new table schema version becomes available.
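One way to express the create and drop sequencing is a small helper that compares the schema version the client last replicated against the version in the index response. The helper name and the idea of storing the last replicated version locally are assumptions for illustration; the response fields are those of the File Index API.

```python
def setup_files(local_version, index_response):
    """Return the table setup files to apply, in order, before any delta files."""
    remote_version = index_response["version"]
    if local_version is None:
        # First consumption: only the create statement is needed.
        return [index_response["createTableFile"]]
    if remote_version != local_version:
        # New schema version: drop the existing table, then recreate it.
        return [index_response["dropTableFile"], index_response["createTableFile"]]
    return []


# Hypothetical index response for schema version 1.
response = {
    "version": 1,
    "createTableFile": "DataLakeInvoices/version_1/createTable.sql",
    "dropTableFile": "DataLakeInvoices/version_1/dropTable.sql",
}
first_run = setup_files(None, response)
upgrade = setup_files(0, response)
steady_state = setup_files(1, response)
```

Since historical data is regenerated into the newest version, a client that switches versions would then consume that version's delta files from the beginning of its index.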
Pagination
The number of files available for a specific table and schema version may vary based on data volume and growth rate. The Index API response is limited to 100 Delta files per request. If more than 100 files exist, the consuming client must paginate through results.
To ensure complete indexing, use the lastFile parameter in the Delta File Index API request to continue retrieving additional files beyond the initial response.
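A minimal pagination sketch follows. It assumes that the `lastFile` value is the `fileName` of the final entry on the previous page, and that a page with fewer than 100 entries means the index is exhausted; neither assumption is confirmed here. The `fetch_index` callable is a hypothetical wrapper around the Delta File Index API request.

```python
def list_all_delta_files(fetch_index, page_limit=100):
    """Collect the full index by passing the last seen fileName as lastFile."""
    files = []
    last_file = None
    while True:
        page = fetch_index(last_file)["deltaFiles"]
        files.extend(page)
        if len(page) < page_limit:
            return files
        last_file = page[-1]["fileName"]


# Stand-in for the real endpoint, serving 250 hypothetical file names.
names = [f"file_{i:03d}.sql" for i in range(250)]


def fake_fetch_index(last_file):
    start = 0 if last_file is None else names.index(last_file) + 1
    return {"deltaFiles": [{"fileName": n} for n in names[start:start + 100]]}


all_files = list_all_delta_files(fake_fetch_index)
```

Appending pages in the order received keeps the overall sequence aligned with the order the index presents.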
Client Example
A sample client implementation, illustrating how to consume the API programmatically, will be available in a future release.
File Index API
Clients can retrieve an index of the available files for a particular Data Lake table using the Fetch List of Delta Files endpoint.
required
tenantLocator uuid
transformationTable Enum DataLakeAccountDataExtensions | DataLakeAccounts | DataLakeAuxData | DataLakeBillingHolds | DataLakeClaimDataExtensions | DataLakeClaims | DataLakeCreditDistributions | DataLakeCreditItems | DataLakeDelinquencies | DataLakeDisbursementDataExtensions | DataLakeDisbursements | DataLakeFaTransactions | DataLakeFnolDataExtensions | DataLakeFnols | DataLakeInstallmentItems | DataLakeInstallments | DataLakeInvoiceItems | DataLakeInvoices | DataLakeLedgerAccounts | DataLakeMoratoriumReports | DataLakePaymentDataExtensions | DataLakePayments | DataLakePolicies | DataLakePolicyAutoRenewals | DataLakePolicyCoverageTerms | DataLakePolicyDataExtensions | DataLakePolicyElementCharges | DataLakePolicyElements | DataLakePolicyElementTree | DataLakePolicyElementUnderwritingFlags | DataLakePolicyPreferences | DataLakePolicySegments | DataLakePolicyTerms | DataLakePolicyTransactionChangeInstructions | DataLakePolicyTransactions | DataLakeQuoteCoverageTerms | DataLakeQuoteDataExtensions | DataLakeQuoteElementCharges | DataLakeQuoteElements | DataLakeQuoteElementTree | DataLakeQuoteElementUnderwritingFlags | DataLakeQuotes | DataLakeWriteOffs
optional
startTime long?
deltaFileType Enum? sql | csv
lastFile string?
version int?
Sample DeltaFilesGetRequest
{
  // required
  "tenantLocator": "b6f8aa30-b978-4934-bef3-627XXXXXXXXX",
  "transformationTable": "DataLakeInvoices",
  // optional
  "deltaFileType": "sql",
  "version": 0,
  // optional, mutually exclusive
  // "startTime": 1734542240221,
  "lastFile": "DataLakeInvoices/version_0/2025/March/b6f8aa30-b978-4934-bef3-627b0e6edd88_DataLakeInvoices_1451606100_1741713134734.sql"
}
| Request Property | Type | Description |
|---|---|---|
| tenantLocator | ULID | Locator of source tenant |
| transformationTable | string | Name of the desired Data Lake table |
| deltaFileType | enum (sql, csv) | Optional; defaults to |
| version | int | Target a specific schema version; defaults to latest if omitted |
| startTime | Unix timestamp (UTC ms) | Files in returned index will all have a |
| lastFile | string | Only files after this file in the index will be returned |
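The request properties can be assembled with a small helper like the one below. It is a sketch only: the function name is hypothetical, and whether the fields travel as a JSON body or as query parameters is not specified here, so the helper just returns a dictionary. It enforces the rule from the sample request that `startTime` and `lastFile` are mutually exclusive.

```python
def build_index_request(tenant_locator, table, *, delta_file_type=None,
                        version=None, start_time=None, last_file=None):
    """Assemble DeltaFilesGetRequest fields, omitting unset optional values."""
    if start_time is not None and last_file is not None:
        raise ValueError("startTime and lastFile are mutually exclusive")
    request = {"tenantLocator": tenant_locator, "transformationTable": table}
    optional = {
        "deltaFileType": delta_file_type,
        "version": version,
        "startTime": start_time,
        "lastFile": last_file,
    }
    # Compare against None so that a legitimate value such as version 0 is kept.
    request.update({k: v for k, v in optional.items() if v is not None})
    return request


request = build_index_request(
    "b6f8aa30-b978-4934-bef3-627XXXXXXXXX",
    "DataLakeInvoices",
    delta_file_type="sql",
    version=0,
)
```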
required
createTableFile string
dropTableFile string
s3Bucket string
version int
deltaFiles DeltaFile[]

required
fileName string
deltaFileType Enum sql | csv
generationTime long
jobEndTime long
jobStartTime long
Sample DeltaFilesGetResponse
{
"version": 0,
"createTableFile": "DataLakeInvoices/version_0/createTable.sql",
"dropTableFile": "DataLakeInvoices/version_0/dropTable.sql",
"s3Bucket": "socotra-kernel-develop-dm-delta",
"deltaFiles": [
{
"deltaFileType": "sql",
"fileName": "DataLakeInvoices/version_0/2025/March/b6f8aa30-b978-4934-bef3-627b0e6edd88_DataLakeInvoices_1451606100_1741713134934.sql",
"jobStartTime": 1451606100,
"jobEndTime": 1741582882,
"generationTime": 1741713134934
},
{
"deltaFileType": "sql",
"fileName": "DataLakeInvoices/version_0/2025/March/b6f8aa30-b978-4934-bef3-627b0e6edd88_DataLakeInvoices_1451606100_1741713135163.sql",
"jobStartTime": 1451606100,
"jobEndTime": 1741582882,
"generationTime": 1741713135163
},
{
"deltaFileType": "sql",
"fileName": "DataLakeInvoices/version_0/2025/March/b6f8aa30-b978-4934-bef3-627b0e6edd88_DataLakeInvoices_1451606100_1741713135490.sql",
"jobStartTime": 1451606100,
"jobEndTime": 1741582882,
"generationTime": 1741713135490
}
]
}
| Response Property | Type | Description |
|---|---|---|
| version | int | Target a specific schema version, defaults to latest if omitted |
| createTableFile | string | Path & name of file with necessary sql statement to create the table in the destination schema |
| dropTableFile | string | Path & name of file with necessary sql statement to drop the existing version of the table in the destination schema |
| s3Bucket | string | The source S3 bucket required for the file retrieval request |
| deltaFiles | DeltaFile array | The index of individual files |
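For clients that prefer typed access, the response shape can be modeled with dataclasses. This is a sketch under the assumption that the response body is JSON with exactly the fields listed above. The snake_case attribute names and the `parse_index_response` helper are choices made for this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaFile:
    file_name: str
    delta_file_type: str
    generation_time: int
    job_start_time: int
    job_end_time: int


@dataclass(frozen=True)
class DeltaFilesGetResponse:
    version: int
    create_table_file: str
    drop_table_file: str
    s3_bucket: str
    delta_files: tuple


def parse_index_response(body):
    """Convert a JSON index response into typed objects, keeping index order."""
    data = json.loads(body)
    files = tuple(
        DeltaFile(f["fileName"], f["deltaFileType"], f["generationTime"],
                  f["jobStartTime"], f["jobEndTime"])
        for f in data["deltaFiles"]
    )
    return DeltaFilesGetResponse(data["version"], data["createTableFile"],
                                 data["dropTableFile"], data["s3Bucket"], files)


body = """{
  "version": 0,
  "createTableFile": "DataLakeInvoices/version_0/createTable.sql",
  "dropTableFile": "DataLakeInvoices/version_0/dropTable.sql",
  "s3Bucket": "socotra-kernel-develop-dm-delta",
  "deltaFiles": [
    {"deltaFileType": "sql",
     "fileName": "DataLakeInvoices/version_0/2025/March/b6f8aa30-b978-4934-bef3-627b0e6edd88_DataLakeInvoices_1451606100_1741713134934.sql",
     "jobStartTime": 1451606100, "jobEndTime": 1741582882,
     "generationTime": 1741713134934}
  ]
}"""
parsed = parse_index_response(body)
```

The delta files are stored as a tuple so the order delivered by the index cannot be rearranged by accident.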
File Retrieval API
Clients can download each individual Delta File using the Fetch Specific Delta File endpoint. The response is a streamed file (StreamingResponseBody<string>).
required
tenantLocator uuid
fileName string
s3Bucket string
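The three required retrieval properties can all be derived from an index response plus the tenant locator. The sketch below shows that mapping; the generator name is hypothetical, and how each request is actually sent (URL, HTTP method, authentication) is left out because it is not specified here.

```python
def retrieval_requests(tenant_locator, index_response):
    """Yield one file retrieval request per indexed delta file, in index order."""
    for entry in index_response["deltaFiles"]:
        yield {
            "tenantLocator": tenant_locator,
            "fileName": entry["fileName"],
            "s3Bucket": index_response["s3Bucket"],
        }


# Hypothetical, shortened index response.
index_response = {
    "s3Bucket": "socotra-kernel-develop-dm-delta",
    "deltaFiles": [{"fileName": "first.sql"}, {"fileName": "second.sql"}],
}
requests_to_send = list(retrieval_requests("tenant-1", index_response))
```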