I have been using incremental refresh for a large dataset in SPICE. It refreshes every 30 minutes based on a datetime column, with a 1-day window, as described below.
Incremental refresh configuration
Date column: CreatedAt
Window size: 1 day
Now, with every refresh, some duplicate rows are added to the same dataset. This is becoming a major issue as the number of refreshes increases.
Incremental refresh works by identifying two types of rows. During the initial full refresh after you set up the configuration, rows whose CreatedAt is before the lookback window are identified as immutable rows, and rows after it are identified as mutable rows. As part of the full refresh, we store the lookback window (current time - 1 day, in this case).
With every subsequent incremental ingestion, we query all the rows since the last stored lookback window. We then compare each row's timestamp with the current lookback window. Rows with a CreatedAt before the current lookback window are marked as immutable, and rows with a CreatedAt after it are marked as mutable.
If rows that fall before the calculated lookback window are updated, the updated versions are appended (immutable rows), so the old copy stays in the dataset and you see a duplicate. Rows that fall after the lookback window are replaced.
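To make the append-versus-replace behaviour concrete, here is a toy model in Python. It is only a sketch of the rules described above, not the actual SPICE implementation: the function name `apply_refresh`, the `id` and `status` fields, and matching rows by `id` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def apply_refresh(stored, changed, now, window):
    """Toy model only: append changes older than the lookback window, replace newer ones."""
    cutoff = now - window  # start of the lookback window
    result = list(stored)
    for row in changed:
        if row["CreatedAt"] < cutoff:
            # Before the window: treated as immutable, so the new version is appended.
            result.append(row)
        else:
            # Inside the window: treated as mutable, so the old version is replaced.
            # Matching by "id" is a simplification assumed for this sketch.
            result = [r for r in result if r["id"] != row["id"]]
            result.append(row)
    return result


now = datetime(2024, 1, 10, 12, 0)
created = now - timedelta(hours=36)  # row created 36 hours ago
stored = [{"id": 1, "CreatedAt": created, "status": "open"}]
changed = [{"id": 1, "CreatedAt": created, "status": "closed"}]  # same row, updated later

one_day = apply_refresh(stored, changed, now, timedelta(days=1))
two_days = apply_refresh(stored, changed, now, timedelta(days=2))
print(len(one_day), len(two_days))  # prints: 2 1
```

With a 1-day window the 36-hour-old row is outside the window, so its update is appended and the dataset holds two copies. With a 2-day window the same row is still mutable, so the update replaces the old copy.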
So you may want to set the lookback window based on how long after creation your rows can still be updated. Try setting it to 2 days and see if that fixes your issue.
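If your source table also records when a row was last changed, you can estimate a suitable window from the data instead of guessing. The sketch below assumes a hypothetical `UpdatedAt` column next to `CreatedAt`; the function name `suggest_window_days` and the sample rows are made up for illustration.

```python
import math
from datetime import datetime, timedelta


def suggest_window_days(rows):
    """Smallest whole number of days covering the longest CreatedAt -> UpdatedAt gap."""
    # "UpdatedAt" is a hypothetical column; use whatever your source records for last change.
    max_lag = max(
        (r["UpdatedAt"] - r["CreatedAt"] for r in rows),
        default=timedelta(0),
    )
    # Round up to whole days and never suggest less than 1 day.
    return max(1, math.ceil(max_lag / timedelta(days=1)))


sample = [
    {"CreatedAt": datetime(2024, 1, 8, 9, 0), "UpdatedAt": datetime(2024, 1, 8, 11, 0)},
    {"CreatedAt": datetime(2024, 1, 8, 9, 0), "UpdatedAt": datetime(2024, 1, 9, 21, 0)},
]
print(suggest_window_days(sample))  # prints: 2
```

Here the longest gap between creation and update is 36 hours, so a 2-day window would keep that row mutable at the time it changes.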