Is it possible to set a data retention period on a SQL dataset with incremental refresh enabled? Or even manually delete old data from it? I’m trying to prevent my dataset from growing too large and really only need it to retain the previous 2 weeks of data at most… It adds about 3.5M rows of data daily so the size will grow out of hand quickly if I can’t control the retention period.
Hi @Colin! You’re able to define the time range when configuring the incremental refresh. I’ve attached the documentation and added an excerpt just below:
- For Window size, enter a number for size, and then choose an amount of time that you want to look back for changes.You can choose to refresh changes to the data that occurred within a specified number of hours, days, or weeks from now. For example, you can choose to refresh changes to the data that occurred within two weeks of the current date.
I don’t think we have a way to set a data retention period in SPICE yet.
If your dataset uses Custom SQL, could you specify a date criteria in the WHERE OR HAVING clause that only pulls data for the last 2 weeks?
If its not a custom SQL data set, maybe include a join to another DATE dataset that always has the last 2 weeks of dates, and with an inner join, your main dataset will now have data for the dates from DATE.
You would want to do a full refresh periodically to baseline your dataset back to 2 weeks of data.
(There is the option to set a filter on your dataset via the SQL editor, but it would use a fixed date and not a rolling date. )
Does this help?
I also shared the same concerns regarding the use of incremental refresh, if we keep adding new data then the data set size keep growing. Configure a window size only make sure that the data within this window size is updated but did not mean that data outside of this window size will be deleted. Would an additional full refresh will solve this issue and are we able to schedule at the same time an incremental & a full refresh for the same data set?
This answer does not answer the question. Changing the window size does not delete old data but ingests new data.
I suspect QuickSight/AWS has not added this feature on purpose so that people don’t notice the data piling up along with the SPICE bill.
Full refresh wipes all the data so limits the size of the data set to your original query. I set all my datasets up to full refresh monthly.
resurface this question, appreciate any clarification from QuickSight team…
Currently there are no data retention possibilities when using SPICE.
Using Custom sql.
An example : My query always gets data from the last month
Daily loading process to get updated data every 30 minutes, I will setup an incremental refresh .
Additionally I will add another schedule which is a full refresh . In my scenario I am running a full refresh every day , I could also schedule it to run once a month .
SPICE dataset entering a new month, my full refresh will only bring in last month data+current month data. The incremental refresh will update + bring new records. Incremental refresh window size is set to 1 day.
The datasource is Redshift , the queries look as follows
You could also use filters during dataset preparation. No rolling date functionality, so you would have to specify a date.
Redshift SQL for incremental refresh
Please create a new post and link to this post if there are additional questions.