Better understand the retention policy options for Datameer artifacts.
The retention policy in Datameer lets you choose one of the following options:
- Keep the last N results (regardless of their age).
- Purge results older than N days.
- Purge results older than N days, but keep the last N results.
- Never delete historical data.
- ExportOnly (for Workbooks).
The corresponding configuration is stored in the dap_job_configuration table, in the min_keep_count and expire_time_days columns.
The possible combinations of values in these columns (respectively) are:
- N / NULL: Keep the last N results (Purge results older than N days is empty).
- NULL / N: Purge results older than N days (Keep the last N results is empty).
- N / N: Purge results older than N days, but keep the last N results (the two values are set independently).
- NULL / NULL: Never delete historical data.
- 0 / 1: ExportOnly.
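The mapping above can be sketched as a small helper that turns one pair of column values into a readable label. This is only an illustration: the function name `describe_retention` and the label wording are hypothetical and not part of Datameer, and `None` stands in for SQL NULL. It assumes the first value is min_keep_count and the second is expire_time_days, as in the query below.

```python
from typing import Optional


def describe_retention(min_keep_count: Optional[int],
                       expire_time_days: Optional[int]) -> str:
    """Translate one (min_keep_count, expire_time_days) pair into a label.

    None stands for SQL NULL. The function name and label wording are
    illustrative only; they are not part of Datameer.
    """
    # 0 / 1 is the special ExportOnly marker, so test it before the
    # generic "both values set" case, which it would otherwise match.
    if min_keep_count == 0 and expire_time_days == 1:
        return "Export only"
    if min_keep_count is None and expire_time_days is None:
        return "Never delete historical data"
    if expire_time_days is None:
        return f"Keep the last {min_keep_count} results"
    if min_keep_count is None:
        return f"Purge results older than {expire_time_days} days"
    return (f"Purge results older than {expire_time_days} days, "
            f"but keep the last {min_keep_count} results")


if __name__ == "__main__":
    for pair in [(5, None), (None, 30), (5, 30), (None, None), (0, 1)]:
        print(pair, "->", describe_retention(*pair))
```

The ExportOnly check comes first because 0 / 1 would otherwise be read as an ordinary "keep 0 results, purge after 1 day" combination.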
The following query lists all artifacts together with their retention policy. Add a WHERE clause to filter the results as needed.
SELECT
    dap_job_configuration.id AS ConfID,
    dap_file.name AS Name,
    CASE dap_file.extension
        WHEN 'IMPORT_LINK_JOB_EXTENSION' THEN 'Data Link'
        WHEN 'IMPORT_JOB_EXTENSION' THEN 'Import Job'
        WHEN 'WORKBOOK_EXTENSION' THEN 'Workbook'
        WHEN 'EXPORT_JOB_EXTENSION' THEN 'Export Job'
    END AS Type,
    permission.owner AS Owner,
    dap_file.creation_date AS CreationTime,
    dap_job_configuration.min_keep_count AS KeptResults,
    dap_job_configuration.expire_time_days AS PurgeResultsAfter,
    dap_file.id AS FileID
FROM dap_job_configuration
JOIN dap_file ON dap_job_configuration.dap_file__id = dap_file.id
JOIN permission ON dap_file.permission_fk = permission.id;
With the Append with sliding time window option, you can set the following retention policies:
- Use only the Expire after field. Datameer keeps records ingested by an import job during the last N days/weeks/months. Older records are removed. For example, if you set 5 days and run an import job daily that ingests 10 records each day, Datameer keeps only the 50 most recent records each time. At the moment, this option is impacted by a bug reported under DAP-36437 (the workaround is to set the Keep last N results parameter together with Expire after).
- Use only the Keep last N results field. Datameer keeps records ingested by the import job during the last N executions, regardless of time. For example, if you set Keep last N results to 5, Datameer keeps data imported during the previous 5 job executions, irrespective of whether the job ran 5 times in 1 hour or in 1 week. Please note that executions that don't import any records are also counted. If no new data is added to the source table and the import job is executed 5 times, no records will be stored at that point.
- Use both the Expire after and Keep last N results parameters at the same time. This gives additional flexibility and ensures that N results are still stored even if some of them have expired, e.g., if you pause the ingestion but still want to use previously ingested data.
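The three sliding-window policies can be illustrated with a rough simulation. This is a sketch under stated assumptions, not Datameer's implementation: the helper `retained_records` is hypothetical, an execution counts as inside the window when it is fewer than the configured number of days old, and when both settings are used an execution is kept if either rule keeps it.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Execution = Tuple[date, int]  # (run date, number of records ingested)


def retained_records(executions: List[Execution],
                     today: date,
                     expire_after_days: Optional[int] = None,
                     keep_last_n: Optional[int] = None) -> int:
    """Count records that survive the retention policy.

    Assumptions: executions are sorted oldest first; an execution is
    "within the window" when it is less than expire_after_days old; with
    both settings, an execution survives if either rule keeps it.
    """
    if expire_after_days is None and keep_last_n is None:
        return sum(count for _, count in executions)  # nothing is purged

    kept = set()
    if expire_after_days is not None:
        for index, (run_date, _) in enumerate(executions):
            if (today - run_date).days < expire_after_days:
                kept.add(index)
    if keep_last_n is not None:
        # The last N executions count even if they imported nothing.
        first_kept = max(0, len(executions) - keep_last_n)
        kept.update(range(first_kept, len(executions)))
    return sum(executions[i][1] for i in kept)


if __name__ == "__main__":
    today = date(2024, 1, 30)
    # 30 daily runs, oldest first, 10 records each.
    daily = [(today - timedelta(days=d), 10) for d in range(29, -1, -1)]
    print(retained_records(daily, today, expire_after_days=5))  # 50

    # Five further runs that import nothing push all data out of "last 5".
    empty_runs = daily + [(today, 0)] * 5
    print(retained_records(empty_runs, today, keep_last_n=5))  # 0

    # Ingestion paused 10 days ago: everything is expired by age,
    # but keeping the last 2 results still preserves 20 records.
    paused = [(today - timedelta(days=d), 10) for d in range(14, 9, -1)]
    print(retained_records(paused, today, expire_after_days=5))  # 0
    print(retained_records(paused, today, expire_after_days=5,
                           keep_last_n=2))  # 20
```

The first case reproduces the 5 days / 10 records per day example, the second shows why empty executions matter for Keep last N results, and the last pair shows how combining both settings protects data during a pause in ingestion.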
When a Workbook is configured with the ExportOnly retention policy, the data object it creates immediately gets status 1 (marked_for_deletion) in the data table. It therefore becomes subject to the Housekeeping service right away and is removed during the next Housekeeping round after the subsequent ExportJob completes.