# Data Anonymization



<Callout type="warn">
  This feature is currently in beta and may be subject to change. Before using it in production, please contact your Socotra representative.
</Callout>

Data anonymization is the process of permanently removing personally identifiable information (PII) from a software system to protect an individual's identity. This process is often necessary to comply with regulations such as the General Data Protection Regulation (GDPR) and the California Delete Act. Currently, only [extension](/configuration/data-extensions/overview) `data` fields can be anonymized.

Anonymization differs from [data masking](/features/security/data-masking), which hides PII from users but retains the underlying data.

<span id="anonymization-rules" />

## Rules [#rules]

Socotra enforces a set of rules when processing data anonymization requests to maintain operational integrity. A request may be partially processed: entities that violate a rule are not anonymized, while all other entities in the request are anonymized.

Anonymization rules are evaluated using an entity hierarchy. [Accounts](/features/accounts) are top-level parent entities. Accounts can have one or more child entities, such as policies, and child entities can have their own child entities, forming a [tree structure](https://en.wikipedia.org/wiki/Tree_%28abstract_data_type%29).

When an anonymization request is processed, the system will attempt to anonymize data in the specified entities in addition to all of their child entities and their descendants.

Accounts can only be anonymized if all of their child entities and descendants have already been successfully anonymized. Accounts can't be anonymized if they have at least one policy in the `onRisk` state.

Policies in the `onRisk` state can't be anonymized. Only policies in the `expired` or `cancelled` state can be anonymized.

Quotes in the `issued` state can't be anonymized if the resulting policy is still in the `onRisk` state. Quotes in the `accepted` state can be anonymized by default. The `includeAcceptedQuotes` flag can be set to `false` in the [anonymization request](#submitting-anonymization-request) to prevent the anonymization of quotes in the `accepted` state. Quotes that are specified explicitly in the anonymization request bypass all anonymization rules.

If a parent entity can't be anonymized, then its child entities and their descendants also can't be anonymized.
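The per-entity checks above can be sketched in Python as follows. This is a simplified illustration, not Socotra's implementation: the function names are hypothetical, only policies are treated as children of an account, the parent-to-child propagation rule is not modeled, and the behavior for quote states other than `issued` and `accepted` is an assumption.

```python
from typing import List, Optional

# Policy states named in the rules above as eligible for anonymization.
ANONYMIZABLE_POLICY_STATES = {"expired", "cancelled"}


def policy_can_be_anonymized(state: str) -> bool:
    """Only expired or cancelled policies are eligible; onRisk policies are not."""
    return state in ANONYMIZABLE_POLICY_STATES


def account_can_be_anonymized(policy_states: List[str]) -> bool:
    """Simplification: an account is eligible only if every policy is eligible."""
    return all(policy_can_be_anonymized(s) for s in policy_states)


def quote_can_be_anonymized(
    state: str,
    resulting_policy_state: Optional[str] = None,
    include_accepted_quotes: bool = True,
    explicitly_referenced: bool = False,
) -> bool:
    # Quotes listed explicitly in the request bypass all rules.
    if explicitly_referenced:
        return True
    # Accepted quotes are eligible unless the request opts out.
    if state == "accepted":
        return include_accepted_quotes
    # Issued quotes are blocked while the resulting policy is onRisk.
    if state == "issued":
        return resulting_policy_state != "onRisk"
    # Assumption: the rules above do not restrict other quote states.
    return True
```

For instance, under these assumptions `account_can_be_anonymized(["expired", "onRisk"])` returns `False`, because one policy is still on risk.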

## Configuration [#configuration]

Data anonymization can be enabled at the tenant level by setting the `enableEntityAnonymization` flag to `true` at the top level of the tenant [configuration](/configuration/general-topics/deployment). Anonymization [requests](#submitting-anonymization-request) will fail if the `enableEntityAnonymization` flag is set to `false` or if this configuration is not provided.

For example:

```json
{
	"enableEntityAnonymization": true
}
```

Data anonymization can be enabled for each [extension](/configuration/data-extensions/overview) `data` field in the tenant configuration by setting the `anonymizable` flag to `true` in the <ApiLink name="RestrictedDataRef" /> configuration object for the target `data` field.

For example:

```json
{
	"data": {
		"ssn": {
			"type": "string?",
			"restrictedData": {
				"anonymizable": true
			}
		}
	}
}
```
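Before deploying, it can help to confirm which fields are actually marked as anonymizable. The following Python sketch assumes the configuration fragment has been loaded into a dictionary with the same shape as the example above; the helper name and the `nickname` field are hypothetical.

```python
from typing import Dict, List


def anonymizable_fields(config: Dict) -> List[str]:
    """Return names of `data` fields whose restrictedData.anonymizable flag is true."""
    fields = config.get("data", {})
    return sorted(
        name
        for name, spec in fields.items()
        if spec.get("restrictedData", {}).get("anonymizable") is True
    )


# Sample fragment mirroring the JSON example; "nickname" is a made-up field.
config = {
    "data": {
        "ssn": {"type": "string?", "restrictedData": {"anonymizable": True}},
        "nickname": {"type": "string?"},
    }
}

print(anonymizable_fields(config))  # ['ssn']
```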

<Callout type="warn">
  Anonymization only affects policies and accounts created after an anonymization configuration has been deployed. Contact your Socotra representative if pre-existing entities need to be anonymized.
</Callout>

## Formatting Anonymized Values [#formatting-anonymized-values]

By default, the appearance of anonymized values depends on the field type:

```
string -> *****
guid -> *****
int -> -2147483648
long -> -9223372036854775808
date -> -999999999-01-01T00:00:00
datetime -> -999999999-01-01T00:00:00+18:00
```

Anonymized values of the same type are always displayed the same way, regardless of the original value. For instance, two different anonymized integers, `1` and `1000`, will both be displayed as `-2147483648` by default.

The `value` field in the <ApiLink name="RestrictedDataRef" /> object can be used to override the default appearance of anonymized values. This configuration overrides the appearance of both anonymized values and [masked values](/features/security/data-masking).

For example:

```json
{
	"data": {
		"ssn": {
			"type": "string?",
			"restrictedData": {
				"anonymizable": true,
				"value": {
					"string": "***-**-****"
				}
			}
		}
	}
}
```
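The interaction between the defaults and a `value` override can be sketched as a simple lookup. This is an illustration only: the function name is hypothetical, and stripping a trailing `?` from the type to find its base type is an assumption about how optional types map to placeholders.

```python
from typing import Any, Dict, Optional

# Default placeholders per field type, as listed above.
DEFAULT_ANONYMIZED: Dict[str, Any] = {
    "string": "*****",
    "guid": "*****",
    "int": -2147483648,             # minimum 32-bit signed integer
    "long": -9223372036854775808,   # minimum 64-bit signed integer
    "date": "-999999999-01-01T00:00:00",
    "datetime": "-999999999-01-01T00:00:00+18:00",
}


def anonymized_display(field_type: str, value_override: Optional[Dict[str, Any]] = None) -> Any:
    """Return the placeholder shown for an anonymized field of the given type."""
    base_type = field_type.rstrip("?")  # assumption: "string?" uses the "string" placeholder
    if value_override and base_type in value_override:
        return value_override[base_type]
    return DEFAULT_ANONYMIZED[base_type]


print(anonymized_display("int"))                                 # -2147483648
print(anonymized_display("string?", {"string": "***-**-****"}))  # ***-**-****
```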

<span id="submitting-anonymization-request" />

## Submitting an Anonymization Request [#submitting-an-anonymization-request]

Submit an anonymization request using the <ApiLink name="anonymizeData" /> API endpoint. Entity data is anonymized according to the anonymization [rules](#anonymization-rules) and the currently [deployed configuration](/api/configuration-and-development/deployments) for the tenant associated with the target entities. Only [admins](/features/security/roles-and-permissions#special_roles) can submit an anonymization request.

Make sure to deploy any necessary configuration changes before submitting an anonymization request.

<Callout type="warn">
  Data anonymization is an irreversible process. Please exercise caution before anonymizing data.
</Callout>

The `references` field can be used to specify one or more entity locators to anonymize. The `includeAcceptedQuotes` flag can be set to `false` to prevent the anonymization of quotes in the `accepted` state. The `policyStatuses` field can be used to restrict the anonymization of policies to policies with one of the specified statuses.

<ApiSchema name="AnonymizationRequest" />

For example:

```json
{
	"references": {
		"policy": ["1E9MFx5h9DGw1H"]
	},
	"includeAcceptedQuotes": true,
	"policyStatuses": ["expired", "cancelled"]
}
```
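A request body like the one above can be assembled programmatically. The Python sketch below only builds and prints the JSON payload; it does not call the API, and the helper name and its empty-`references` check are assumptions based on the `AnonymizationRequest` schema, in which `references` is the only required field.

```python
import json
from typing import Dict, List, Optional


def build_anonymization_request(
    references: Dict[str, List[str]],
    include_accepted_quotes: bool = True,
    policy_statuses: Optional[List[str]] = None,
) -> dict:
    """Build an AnonymizationRequest body from entity locators grouped by entity type."""
    if not references:
        # Assumption: an empty map is treated the same as a missing required field.
        raise ValueError("references is required")
    body: dict = {
        "references": {entity: list(locators) for entity, locators in references.items()},
        "includeAcceptedQuotes": include_accepted_quotes,
    }
    # policyStatuses is optional, so it is only included when provided.
    if policy_statuses is not None:
        body["policyStatuses"] = list(policy_statuses)
    return body


payload = build_anonymization_request(
    {"policy": ["1E9MFx5h9DGw1H"]},
    policy_statuses=["expired", "cancelled"],
)
print(json.dumps(payload, indent=2))
```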

Preview the effects of your anonymization request through the <ApiLink name="previewAnonymization" /> API endpoint.

<Callout>
  There may be a delay before the [Anonymization API](/api/configuration-and-development/anonymization) recognizes newly created entities.
</Callout>

## Data Lake [#data-lake]

Anonymized data will appear in anonymized form in [Data Lake](/features/reporting/datalake). The time of anonymization will be recorded in the `anonymized_time_utc` column for each applicable record in the corresponding entity and parent entity tables.

## Next Steps [#next-steps]

* [Audit Logging](/features/security/audit-logging)

## See Also [#see-also]

* [Anonymization API](/api/configuration-and-development/anonymization)
* [Data Access Controls](/features/security/data-access-controls)
* [Data Masking](/features/security/data-masking)
* [Configuration Deployment](/configuration/general-topics/deployment)
* [Configuration Deployments API](/api/configuration-and-development/deployments)


## API Reference

AnonymizationRequest
Properties:
  references (map<string, ulid[]>, required)
  includeAcceptedQuotes (boolean)
  policyStatuses (Enum[])