Context:
I am using incremental refresh for a dataset of about 25M rows.
Because of the incremental refresh, the dataset grows to ~65-70M rows, which causes a data size limit failure, as I have a limited budget.
What I need:
I just want to delete the appended rows automatically.
I currently use a max(rank) function to filter the data to the latest modified date, but that is not enough. I need to delete the duplicate rows that are created.
You are loading from RDS to SPICE, right?
Would it be possible to reload the full data without the last rows? For example, with custom SQL or a filter within the dataset (latest modified date > xy)?
I’m not aware of a function to “unload” data out of SPICE.
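To illustrate the custom SQL idea, here is a minimal sketch of a query that keeps only the most recently modified row per key, so a full reload would bring in one row per record. The table name `orders` and the columns `id`, `status`, and `modified_at` are hypothetical, and the query is run against an in-memory SQLite database only to make the sketch runnable; the actual RDS engine and schema may need a different form.

```python
import sqlite3

# Hypothetical table and column names; replace with the real RDS schema.
DEDUP_SQL = """
SELECT o.id, o.status, o.modified_at
FROM orders AS o
JOIN (
    SELECT id, MAX(modified_at) AS latest
    FROM orders
    GROUP BY id
) AS m ON m.id = o.id AND m.latest = o.modified_at
ORDER BY o.id
"""


def latest_rows(conn):
    """Return one row per id, keeping the most recently modified version."""
    return conn.execute(DEDUP_SQL).fetchall()


def demo():
    """Build a tiny in-memory table with a duplicated id and deduplicate it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, modified_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "new", "2023-01-01"),
            (1, "shipped", "2023-01-05"),  # newer version of id 1
            (2, "new", "2023-01-02"),
        ],
    )
    return latest_rows(conn)
```

If a query like this were used as the dataset's custom SQL, the deduplication would happen at the source on every load, rather than as a filter applied after the rows are already in SPICE.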
@ajinkya_ghodake I understand you are trying to clean up data already ingested into the SPICE dataset. I don’t think there is a direct way to achieve that.
If your SPICE dataset grew from 25M rows to ~65-70M, it could be because the source dataset has a retention period set. In order to sync the SPICE dataset with its source, you would want to do a full refresh as required to reset the baseline. This would take care of the duplicate scenario as well. Happy to help if you have follow-up questions.
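If you want to trigger the full refresh from a script instead of the console, a minimal sketch using the boto3 QuickSight `create_ingestion` call with `IngestionType="FULL_REFRESH"` might look like this. The helper name `start_full_refresh` and the account and dataset IDs are placeholders, not values from this thread, and the caller is assumed to have permission to create ingestions on the dataset.

```python
import uuid


def start_full_refresh(client, account_id, dataset_id):
    """Start a full SPICE refresh and return (ingestion_id, initial_status).

    `client` is expected to be a boto3 QuickSight client, e.g.
    boto3.client("quicksight"); it is passed in so the helper is easy to test.
    """
    # Each ingestion needs its own ID; a UUID-based one is an arbitrary choice.
    ingestion_id = f"full-refresh-{uuid.uuid4()}"
    response = client.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
        IngestionType="FULL_REFRESH",  # reload everything instead of appending
    )
    return ingestion_id, response.get("IngestionStatus")


# Example usage (placeholder IDs, requires AWS credentials):
#   import boto3
#   client = boto3.client("quicksight")
#   start_full_refresh(client, "123456789012", "my-dataset-id")
```

A full refresh reloads the whole dataset, so it only helps with the size limit if the source query itself returns the deduplicated ~25M rows.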