Data synced in SPICE memory using greatest function though incremental refresh duplicates old data instead of overriding it for latest updates

If the behaviour of your datasets is similar to my explanation below , then as @DeepakS said, a periodic full refresh is required to avoid duplication.

Example on incremental refresh

Incremental refresh process creates a checkpoint on mutable and immutable records.

SPICE Dataset on 26/07/2022 [Consider incremental refresh has run on 26/07/2022, lookback window of 2 days on column Last Update Date]

ID Create Date Last Update Date Mutable/Immutable
1 7/21/2022 7/21/2022 Immutable
2 7/25/2022 7/25/2022 Mutable
3 7/25/2022 7/25/2022 Mutable
4 7/26/2022 7/26/2022 Mutable

Source Table 27/07/2022 [Lets say Redshift]
Record ID 5 is a new record inserted
Record ID 1 is an old record updated

ID Create Date Last Update Date
1 7/21/2022 7/27/2022
2 7/25/2022 7/25/2022
3 7/25/2022 7/25/2022
4 7/26/2022 7/26/2022
5 7/27/2022 7/27/2022

The checkpoint meaning when the previous incremental refresh executed for QuickSight is ( 26/07/2022 )
When an incremental refresh call is executed on 27/07/2022 , it will bring all records > 24rd July 2022
26-07-2022 - 2 days = 24-07-2022

SPICE Dataset on 27/07/2022

ID Create Date Last Update Date Mutable/Immutable
1 7/21/2022 7/21/2022 Immutable
2 7/25/2022 7/25/2022 Mutable
3 7/25/2022 7/25/2022 Mutable
4 7/26/2022 7/26/2022 Mutable
1 7/21/2022 7/27/2022 Mutable
5 7/27/2022 7/27/2022 Mutable

Note : Duplication for Record ID 1 exists because the lookback window is for 2 days. The Record ID 1 falls outside of the lookback window , hence the updated record id in source system is appended as a new record in SPICE ( the existing record is not updated ) .

The new checkpoint for QuickSight is (27/07/2022)

What is not happening ( Key matching ) :
Example Source Table 27/07/2022: Record ID 1 which was created on 21st July 2022 and had an update on 27th july
A typical database merge query would be
Select * from source_table where last_update_date >= 2 days ( dateadd ( day , -2 , getdate () )

when matched then update
when not matched then insert
when not matched by source then delete

What is actually happening for SPICE ( No key matching ) :
Get all records within the lookback window
Delete all mutable records from 24-07-2022 and after , append newly queried data.
Here Record ID 1 with last update date (27/07/2022) is a new row . The existing Record ID 1 with last update date ( 21/07/2022 ) is outside of the lookback window.

Regards,
Koushik

2 Likes