Duplicate rows are being created via incremental loading on a dataset

Hello Guys,

I have been using incremental loading for a heavy dataset on SPICE. I am loading it with respect to a date-time stamp, on a 30-minute refresh schedule with a 1-day window, as described below.

Incremental refresh configuration
Date column: CreatedAt
Window size: 1 day

Now, on every load, duplicate rows are getting added to the dataset. This is becoming a major issue as the number of loads increases.

Is there any way to work around this issue?

Thanks in Advance :slight_smile:

Hi,

You can try the hours option and set the window to 24 hours.


Thanks for the response, I will surely check and update :wink:

Hi Naveed, that is not working! Duplicates are still observed.

Hi All,

Can anyone please check my question above and comment if you know the answer?

Thanks in advance :slight_smile:

Hi,
Can you please verify that your source data has no duplicates?

Hi,

Incremental refresh works by classifying rows into two types. During the initial full refresh after setting up the configuration, rows whose CreatedAt falls before the lookback window boundary are identified as immutable rows, and rows after it as mutable rows. As part of the full refresh we store the lookback window boundary (current time - 1 day, in this case).
With every subsequent incremental ingestion, we query all rows identified since the last lookback window boundary. We then compare each row's timestamp with the current lookback window: if it falls before the current lookback window, the row is marked as immutable; rows whose CreatedAt falls after the lookback window are marked as mutable.

If there are updates to rows that fall before the calculated lookback window boundary, those rows will be appended (immutable rows). Rows that fall after the lookback window boundary will be replaced.

So you may want to set the right lookback window based on how far back your rows get updated. Try setting it to 2 days and see if that fixes your issue.
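
To make the mechanics above concrete, here is a minimal Python sketch of that lookback-window behavior as I understand it (a toy model for illustration, not QuickSight's actual implementation):

```python
from datetime import datetime, timedelta

# Toy model: rows are (id, timestamp) tuples; the timestamp is the
# incremental-refresh date column (CreatedAt in this thread).
def incremental_refresh(spice_rows, source_rows, now, lookback=timedelta(days=1)):
    boundary = now - lookback
    # Rows before the boundary are immutable: keep them as-is.
    kept = [r for r in spice_rows if r[1] < boundary]
    # Rows at/after the boundary are mutable: drop them and reload from source.
    reloaded = [r for r in source_rows if r[1] >= boundary]
    return kept + reloaded

now = datetime(2023, 6, 10, 12, 0)
spice = [(1, datetime(2023, 6, 8)), (2, datetime(2023, 6, 10, 11, 0))]
source = [(1, datetime(2023, 6, 8)), (2, datetime(2023, 6, 10, 11, 30))]
print(incremental_refresh(spice, source, now))
# Row 2 is dropped and reloaded with its new timestamp; row 1 is untouched.
```

In this model, a 2-day window only helps if a row's previous timestamp is still inside the window when the update lands; updates to rows older than the window get appended, which is where duplicates come from.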


Hi! I have the same issue. But the problem is that it works exactly the way QuickSight describes it in their docs, and the way it works looks a bit stupid to me (maybe I am stupid?): "An incremental refresh queries only data defined by the dataset within a specified look-back window. It transfers all insertions, deletions, and modifications to the dataset, within that window’s timeframe, from its source to the dataset. The data currently in SPICE that’s within that window is deleted and replaced with the updates."
The problem is that when I update data in existing rows in my original table and change the updated_at timestamp, the incremental refresh ADDs those rows to the destination SPICE dataset, since it doesn't have any key column (like ID) to update against, and it can't delete those rows as "data currently in SPICE that’s within that window", because in SPICE they still carry the OLD timestamp and so are NOT within that window. I personally don't understand this design; it makes incremental refresh NOT work for me. I would be glad if somebody could help me get it working.
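
For what it's worth, the failure mode you describe can be reproduced with a small Python sketch of the documented behavior (assumed semantics based on the docs quote above, not QuickSight internals):

```python
from datetime import datetime, timedelta

# Toy model of "delete what's in the window, re-ingest from source".
def incremental_refresh(spice_rows, source_rows, now, lookback=timedelta(days=1)):
    boundary = now - lookback
    kept = [r for r in spice_rows if r[1] < boundary]        # outside window: never deleted
    reloaded = [r for r in source_rows if r[1] >= boundary]  # inside window: re-ingested
    return kept + reloaded

now = datetime(2023, 6, 10)
spice = [(1, datetime(2023, 6, 1))]          # old copy with the OLD updated_at
source = [(1, datetime(2023, 6, 9, 18, 0))]  # same row after the update
print(incremental_refresh(spice, source, now))
# -> two rows with id 1: the old copy is outside the window (not deleted),
#    the updated copy is inside the window (re-ingested), hence the duplicate
```

There is no key-based matching anywhere in this model, which matches what you observed: the delete step is purely window-based.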


Not sure if you were able to resolve this, but if the date column is also being updated as part of your row updates, you will see duplicates in SPICE. This is because the timestamp QuickSight stored during the previous incremental refresh, to look the row up later, has changed with the update. The suggestion is to use a column that is not updated (e.g., createdAtDate instead of lastUpdatedDate).
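
If you manage the dataset through the API, a sketch like the following (untested, and the account/dataset IDs are placeholders) points the lookback window at a stable column and widens it to 2 days, which is roughly what was suggested earlier in this thread:

```python
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

# Placeholder IDs; replace with your own account and dataset.
quicksight.put_data_set_refresh_properties(
    AwsAccountId="111122223333",
    DataSetId="my-dataset-id",
    DataSetRefreshProperties={
        "RefreshConfiguration": {
            "IncrementalRefresh": {
                "LookbackWindow": {
                    "ColumnName": "createdAtDate",  # stable column, set once on insert
                    "Size": 2,
                    "SizeUnit": "DAY",
                }
            }
        }
    },
)
```

This only helps if createdAtDate really never changes after the row is first written; if your ETL rewrites it, you are back in the duplicate scenario described above.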

I am using a static timestamp column, and I still see the same problem.

I guess this needs a fix from the QuickSight technical team's end. How can we raise this issue to the relevant team? Any idea?

Moreover, because of this it is very hard to use this kind of loading.

Hi @adi1994! In case you need them, here are the steps to open a support case. If your company has someone who manages your AWS account, you might not have direct access to AWS Support and will need to raise an internal ticket with your IT team or whoever manages your AWS account. They should be able to open an AWS Support case on your behalf. Hope this helps!