Duplicate rows are being created by incremental refresh on a dataset

Hello Guys,

I have been using incremental refresh for a heavy dataset in SPICE. The refresh is based on a datetime column and runs every 30 minutes with a 1-day window, as described below.

Incremental refresh configuration
Date column: CreatedAt
Window size: 1 day

Now, with every refresh, some duplicate rows are added to the same dataset. This becomes a bigger problem as the number of refreshes grows.

Is there any way to avoid this issue?

Thanks in Advance :slight_smile:

Hi,

You can try the hours option, with the window set to 24 hours.


Thanks for the response, I will check and post an update :wink:

Hi Naveed, that did not work. Duplicates are still observed.

Hi All,

Could you please take a look at my question and comment if anyone knows the answer?

Thanks in advance :slight_smile:

Hi,
Can you please verify that your source data has no duplicates?
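
One quick way to check is to count how often each key appears in an export of the source rows. This is only a sketch: the `id` key column and the sample rows are hypothetical, so swap in whatever uniquely identifies a row in your data.

```python
from collections import Counter

# Hypothetical sample of source rows; "id" is an assumed unique key column.
rows = [
    {"id": 101, "CreatedAt": "2024-01-10 09:00:00"},
    {"id": 102, "CreatedAt": "2024-01-10 09:05:00"},
    {"id": 101, "CreatedAt": "2024-01-10 09:00:00"},
]

def find_duplicate_keys(rows, key_columns):
    """Return each key that occurs more than once, with its count."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

duplicates = find_duplicate_keys(rows, ["id"])
print(duplicates)
```

If this returns an empty dict for your source, the duplicates are being introduced during the refresh rather than coming from the data itself.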

Hi,

Incremental refresh works by classifying rows into two types. During the initial full refresh after the configuration is set up, rows whose CreatedAt is before the lookback window are identified as immutable rows, and rows after it are identified as mutable rows. As part of the full refresh we store the lookback window (current time - 1 day, in this case).
With every subsequent incremental ingestion, we query all the rows identified since the last lookback window. We then compare each row's timestamp with the current lookback window. If it is before the current lookback window, the row is marked as immutable; rows that have a CreatedAt after the lookback window are marked as mutable.

If there are updates to rows that fall before the calculated lookback window (immutable rows), those updates will be appended rather than replacing the existing rows, which is how duplicates appear. Rows that fall after the lookback window will be replaced.

So you may want to set the lookback window based on how far back your row updates go. Try setting it to 2 days and see if that fixes your issue.
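
To make the effect of the window size concrete, here is a toy simulation of the append/replace behaviour described in this reply. It is a simplified sketch, not the actual SPICE implementation. The function name, the row layout, and the assumption that an update moves the row's CreatedAt value into the window are all hypothetical.

```python
from datetime import datetime, timedelta

def incremental_refresh(stored, source, now, window):
    """Toy model: stored rows older than the lookback cutoff are kept as-is
    (immutable); everything newer is replaced by what the source returns."""
    cutoff = now - window
    immutable = [r for r in stored if r["CreatedAt"] < cutoff]
    mutable = [r for r in source if r["CreatedAt"] >= cutoff]
    return immutable + mutable

now = datetime(2024, 1, 10, 12, 0)

# Row 1 was loaded earlier with a timestamp 36 hours before "now".
stored = [{"id": 1, "CreatedAt": now - timedelta(hours=36), "value": "old"}]

# Assumption: the source row was later updated and its timestamp moved to 1 hour ago.
source = [{"id": 1, "CreatedAt": now - timedelta(hours=1), "value": "new"}]

one_day = incremental_refresh(stored, source, now, timedelta(days=1))
two_days = incremental_refresh(stored, source, now, timedelta(days=2))

print(len(one_day))   # old copy kept and new copy appended
print(len(two_days))  # old copy falls inside the window and is replaced
```

In this model, the 1-day window leaves the old copy of row 1 in place and appends the new one, while the 2-day window is wide enough to cover the old copy, so it gets replaced.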
