Duplicate rows are being created via incremental loading on a dataset

Hello Guys,

I have been using incremental loading for a heavy dataset on SPICE. I am loading it with respect to a date-time stamp, on a 30-minute refresh schedule with a 1-day window, as described below.

Incremental refresh configuration
Date column: CreatedAt
Window size: 1 day

Now, on every load, duplicate rows are getting added to the dataset. This is becoming a major issue as the number of loads increases.

Is there any way to work around this issue?

Thanks in Advance :slight_smile:

Hi,

You can try the hours option and set the window to 24 hours.


Thanks for the response, I will surely check and update :wink:

Hi Naveed, that is not working! Duplicates are still observed.

Hi All,

Can anyone please check my question above and comment if you know the answer?

Thanks in advance :slight_smile:

Hi,
Can you please verify that your source data has no duplicates?

Hi,

Incremental refresh works by classifying rows into two types. During the initial full refresh after setting up the configuration, rows whose CreatedAt falls before the lookback window boundary are identified as immutable rows, and rows after it as mutable rows. As part of the full refresh we store the lookback window boundary (current time - 1 day, in this case).
With every subsequent incremental ingestion, we query all rows identified since the last lookback window boundary. We then compare each row's timestamp with the current lookback window: if it falls before the current lookback window, the row is marked as immutable; rows whose CreatedAt falls after the lookback window are marked as mutable.

If there are updates to rows that fall before the calculated lookback window boundary, those rows will be appended (immutable rows). Rows that fall after the lookback window boundary will be replaced.

So you may want to set the right lookback window based on how far back your rows get updated. Try setting it to 2 days and see if that fixes your issue.
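
To make the mechanics above concrete, here is a minimal Python sketch of that lookback-window behavior as I understand it (a toy model for illustration, not QuickSight's actual implementation):

```python
from datetime import datetime, timedelta

# Toy model: rows are (id, timestamp) tuples; the timestamp is the
# incremental-refresh date column (CreatedAt in this thread).
def incremental_refresh(spice_rows, source_rows, now, lookback=timedelta(days=1)):
    boundary = now - lookback
    # Rows before the boundary are immutable: keep them as-is.
    kept = [r for r in spice_rows if r[1] < boundary]
    # Rows at/after the boundary are mutable: drop them and reload from source.
    reloaded = [r for r in source_rows if r[1] >= boundary]
    return kept + reloaded

now = datetime(2023, 6, 10, 12, 0)
spice = [(1, datetime(2023, 6, 8)), (2, datetime(2023, 6, 10, 11, 0))]
source = [(1, datetime(2023, 6, 8)), (2, datetime(2023, 6, 10, 11, 30))]
print(incremental_refresh(spice, source, now))
# Row 2 is dropped and reloaded with its new timestamp; row 1 is untouched.
```

In this model, a 2-day window only helps if a row's previous timestamp is still inside the window when the update lands; updates to rows older than the window get appended, which is where duplicates come from.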


Hi! I have the same issue. But the problem is that it works exactly the way QuickSight describes it in their docs, and the way it works looks a bit stupid to me (maybe I am stupid?): "An incremental refresh queries only data defined by the dataset within a specified look-back window. It transfers all insertions, deletions, and modifications to the dataset, within that window’s timeframe, from its source to the dataset. The data currently in SPICE that’s within that window is deleted and replaced with the updates."
The problem is that when I update data in existing rows in my original table and change the updated_at timestamp, the incremental refresh ADDs those rows to the destination SPICE dataset, since it doesn't have any key column (like ID) to update against, and it can't delete those rows as "data currently in SPICE that’s within that window", because in SPICE they still carry the OLD timestamp and so are NOT within that window. I personally don't understand this design; it makes incremental refresh NOT work for me. I would be glad if somebody could help me get it working.
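
For what it's worth, the failure mode you describe can be reproduced with a small Python sketch of the documented behavior (assumed semantics based on the docs quote above, not QuickSight internals):

```python
from datetime import datetime, timedelta

# Toy model of "delete what's in the window, re-ingest from source".
def incremental_refresh(spice_rows, source_rows, now, lookback=timedelta(days=1)):
    boundary = now - lookback
    kept = [r for r in spice_rows if r[1] < boundary]        # outside window: never deleted
    reloaded = [r for r in source_rows if r[1] >= boundary]  # inside window: re-ingested
    return kept + reloaded

now = datetime(2023, 6, 10)
spice = [(1, datetime(2023, 6, 1))]          # old copy with the OLD updated_at
source = [(1, datetime(2023, 6, 9, 18, 0))]  # same row after the update
print(incremental_refresh(spice, source, now))
# -> two rows with id 1: the old copy is outside the window (not deleted),
#    the updated copy is inside the window (re-ingested), hence the duplicate
```

There is no key-based matching anywhere in this model, which matches what you observed: the delete step is purely window-based.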


Not sure if you were able to resolve this, but if the date column is also being updated as part of your row updates, you will see duplicates in SPICE. This is because the timestamp QuickSight stored during the previous incremental refresh, to look the row up later, has changed with the update. The suggestion is to use a column that is not updated (e.g., createdAtDate instead of lastUpdatedDate).
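
If you manage the dataset through the API, a sketch like the following (untested, and the account/dataset IDs are placeholders) points the lookback window at a stable column and widens it to 2 days, which is roughly what was suggested earlier in this thread:

```python
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

# Placeholder IDs; replace with your own account and dataset.
quicksight.put_data_set_refresh_properties(
    AwsAccountId="111122223333",
    DataSetId="my-dataset-id",
    DataSetRefreshProperties={
        "RefreshConfiguration": {
            "IncrementalRefresh": {
                "LookbackWindow": {
                    "ColumnName": "createdAtDate",  # stable column, set once on insert
                    "Size": 2,
                    "SizeUnit": "DAY",
                }
            }
        }
    },
)
```

This only helps if createdAtDate really never changes after the row is first written; if your ETL rewrites it, you are back in the duplicate scenario described above.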

I am using a static timestamp column, and I still see the same problem.

I guess this needs a fix from the QuickSight technical team's end. How can we raise this issue to the relevant team? Any idea?

Moreover, because of this it is very hard to use this kind of loading.

Hi @adi1994! In case you need them, here are the steps to open a support case. If your company has someone who manages your AWS account, you might not have direct access to AWS Support and will need to raise an internal ticket with your IT team or whoever manages your AWS account. They should be able to open an AWS Support case on your behalf. Hope this helps!