If the behaviour of your datasets is similar to my explanation below , then as @DeepakS said, a periodic full refresh is required to avoid duplication.
Example on incremental refresh
Incremental refresh process creates a checkpoint on mutable and immutable records.
SPICE Dataset on 26/07/2022 [Consider incremental refresh has run on 26/07/2022, lookback window of 2 days on column Last Update Date]
ID |
Create Date |
Last Update Date |
Mutable/Immutable |
1 |
7/21/2022 |
7/21/2022 |
Immutable |
2 |
7/25/2022 |
7/25/2022 |
Mutable |
3 |
7/25/2022 |
7/25/2022 |
Mutable |
4 |
7/26/2022 |
7/26/2022 |
Mutable |
Source Table 27/07/2022 [Lets say Redshift]
Record ID 5 is a new record inserted
Record ID 1 is an old record updated
ID |
Create Date |
Last Update Date |
1 |
7/21/2022 |
7/27/2022 |
2 |
7/25/2022 |
7/25/2022 |
3 |
7/25/2022 |
7/25/2022 |
4 |
7/26/2022 |
7/26/2022 |
5 |
7/27/2022 |
7/27/2022 |
The checkpoint meaning when the previous incremental refresh executed for QuickSight is ( 26/07/2022 )
When an incremental refresh call is executed on 27/07/2022 , it will bring all records > 24rd July 2022
26-07-2022 - 2 days = 24-07-2022
SPICE Dataset on 27/07/2022
ID |
Create Date |
Last Update Date |
Mutable/Immutable |
1 |
7/21/2022 |
7/21/2022 |
Immutable |
2 |
7/25/2022 |
7/25/2022 |
Mutable |
3 |
7/25/2022 |
7/25/2022 |
Mutable |
4 |
7/26/2022 |
7/26/2022 |
Mutable |
1 |
7/21/2022 |
7/27/2022 |
Mutable |
5 |
7/27/2022 |
7/27/2022 |
Mutable |
Note : Duplication for Record ID 1 exists because the lookback window is for 2 days. The Record ID 1 falls outside of the lookback window , hence the updated record id in source system is appended as a new record in SPICE ( the existing record is not updated ) .
The new checkpoint for QuickSight is (27/07/2022)
What is not happening ( Key matching ) :
Example Source Table 27/07/2022: Record ID 1 which was created on 21st July 2022 and had an update on 27th july
A typical database merge query would be
Select * from source_table where last_update_date >= 2 days ( dateadd ( day , -2 , getdate () )
when matched then update
when not matched then insert
when not matched by source then delete
What is actually happening for SPICE ( No key matching ) :
Get all records within the lookback window
Delete all mutable records from 24-07-2022 and after , append newly queried data.
Here Record ID 1 with last update date (27/07/2022) is a new row . The existing Record ID 1 with last update date ( 21/07/2022 ) is outside of the lookback window.
Regards,
Koushik