I have been using incremental loading for a heavy dataset in SPICE, loading it based on a date-time stamp with a refresh interval of 30 minutes and a 1-day window, as described below.
Incremental refresh configuration
Date column: CreatedAt
Window size: 1 day
Now, with every load, duplicate rows are being added to the dataset. This is becoming a major issue as the number of loads increases.
Incremental refresh works by identifying two types of rows. During the initial full refresh after the configuration is set up, rows with a CreatedAt before the lookback window boundary are identified as immutable rows, and rows after it as mutable rows. As part of the full refresh we store the lookback window (current time minus 1 day, in this case).
With every subsequent incremental ingestion, we query all rows identified since the last lookback window and compare each row's timestamp with the current lookback window. Rows with a CreatedAt before the current lookback window are marked as immutable, and rows with a CreatedAt after it are marked as mutable.
If rows that fall before the calculated lookback window have been updated, those updates will be appended (they are treated as immutable rows). Rows that fall after the lookback window will be replaced.
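To make the mechanics above concrete, here is a minimal Python sketch of the delete-and-replace behavior described (the function name and row shape are my own illustration, not QuickSight internals): SPICE rows outside the window are kept untouched, and everything inside the window is re-ingested from the source.

```python
from datetime import datetime, timedelta

def incremental_refresh(spice_rows, source_rows, now, window=timedelta(days=1)):
    """Sketch of one incremental refresh cycle on the CreatedAt column."""
    cutoff = now - window
    # Rows before the lookback window are immutable: keep them as-is.
    kept = [r for r in spice_rows if r["CreatedAt"] < cutoff]
    # Rows inside the window are mutable: drop them and re-ingest from source.
    fresh = [r for r in source_rows if r["CreatedAt"] >= cutoff]
    return kept + fresh
```

Any source row whose CreatedAt lands inside the window is pulled in, whether it is an insert or a modification; rows older than the cutoff are never touched.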
So you may want to set the right lookback window based on how far back your rows get updated. Try setting it to 2 days and see if that fixes your issue.
Hi! I have the same issue. But the problem is that it works exactly as QuickSight describes it in their docs, and the way it works looks a bit stupid to me (maybe I am the stupid one?): "An incremental refresh queries only data defined by the dataset within a specified look-back window. It transfers all insertions, deletions, and modifications to the dataset, within that window’s timeframe, from its source to the dataset. The data currently in SPICE that’s within that window is deleted and replaced with the updates."
The problem is that when I update existing rows in my original table and the updated_at timestamp changes, the incremental refresh ADDs those rows to the destination SPICE dataset, since it doesn't have any key column (like an ID) to use for an update. And it can't delete those rows as "data currently in SPICE that's within that window", because in SPICE they still have the OLD timestamp and so are NOT within that window. I personally don't understand this design; it makes incremental refresh NOT work for me. I would be glad if somebody could help me make use of it.
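The failure mode described above can be reproduced with the same delete-and-replace sketch (again, my own simplified model of the documented behavior, not actual QuickSight code): the old SPICE copy sits outside the window so it survives, while the bumped-timestamp source copy is appended.

```python
from datetime import datetime, timedelta

def incremental_refresh(spice_rows, source_rows, now, window=timedelta(days=1)):
    cutoff = now - window
    kept = [r for r in spice_rows if r["ts"] < cutoff]      # old rows survive
    fresh = [r for r in source_rows if r["ts"] >= cutoff]   # windowed rows re-ingested
    return kept + fresh

now = datetime(2024, 1, 10)
# SPICE still holds row id=1 with its OLD updated_at, outside the window.
spice = [{"id": 1, "ts": datetime(2024, 1, 5)}]
# In the source the same row was just modified, bumping updated_at into the window.
source = [{"id": 1, "ts": datetime(2024, 1, 9, 12)}]
result = incremental_refresh(spice, source, now)
# The old copy is kept AND the new copy is appended: id=1 now appears twice.
```

There is no key-based merge anywhere in this model, which is exactly why the duplicate appears.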
Not sure if you were able to resolve this, but if the date column is also being updated as part of your row updates, you will see duplicates in SPICE. This is because the timestamp QuickSight stored during the previous incremental refresh, which it uses for lookup later, changes with the update. The suggestion is to use a column that is never updated (e.g. createdAtDate instead of lastUpdatedDate).
Hi @adi1994, in case you need them, here are the steps to open a support case. If your company has someone who manages your AWS account, you might not have direct access to AWS Support and will need to raise an internal ticket with your IT team or whoever manages your AWS account. They should be able to open an AWS Support case on your behalf. Hope this helps!