I am having difficulties with incremental refresh on spice dataset. To the point where I am not sure if its the right tool for what I am trying to achieve.
What I am trying to do:
a) I have data sitting on my redshift, that gets refreshed 1x per day and it always displays only current data available for that day. So on refresh in Redshift, it overwrites previous entries. It doesn’t have any snapshot built in.
b) My plan is to utilize incremental refresh to create SPICE dataset from above table that will store snapshotted data from said days.
So when I query data today, it will get data with timestamp of 6-10-2022. When I refresh it tomorrow, I want to have 6-10-2022 there, but add 7-10-2022. Again, when I refresh on 8-10-2022, I want to see 6,7,8th Octobers data. Gradually I want to build this dataset to be able to do historical analysis on it.
I tried several lookback window configurations, but it always seem to overwrite the data with current days data. I used current_date as snapshot_date,
in my DDL.
So question is: How should I set up my look back window to achieve this?
Thanks,
Daniel
Incremental refresh in QuickSight works by a look back window on a timestamp column in the database, last_updated_timestamp in a data warehouse (DW) table in many implementations. During incremental refresh QuickSight looks back the window configured during setup, deletes the rows in SPICE dataset during the look back window, add rows in SPICE from the database rows during the look back window.
If I understand it right you want to store the historical snapshot in SPICE to save space in the Redshift DW. Please note that SPICE capacity is limited 500 GB compressed in Enterprise edition at the moment and not a replacement of DW storage.
However there are ways to model the data in the DW to avoid full refresh in SPICE while updating with the latest data in DW. There are few sophisticated DW models you can review, especially transaction fact and accumulating fact. Periodic snapshot model has the space constraint irrespective of where being implemented. Here are some reads from dimensional modeling guru Ralph Kimball Dimensional Modeling Techniques - Kimball Group
Thanks for reply.
My concern is not redshift space, nor SPICE space.
I don’t control the redshift cluster the data is on, so I’d need to talk to Data Engineering to set up a job for me to snapshot data each day to new table.
My rationale is that I want to do this on my own, with as little outside help as possible.
Can you elaborate more on why you want to keep the snapshot days in the SPICE? A sample structure of the data might be helpful to understand the use case what you are trying to achieve.
So my dataset captures current status of Financial Data on certain partner accounts. Its my custom SQL, but main data source is table that refreshes every day.
Thanks Daniel for your detail explanation with clear examples!
it is a typical incremental refresh use case, you may setup incremental refresh according to below screenshot. You need to define a date column to identify data update date(e.g. sys_create_date in your case). In “Window size”, you need to define how many days/week/hour data need to be refreshed. For instance, last 40 days in my example, QuickSight will remove last 40days data in SPICE and ingest again from data source
Hi @Daniel_Dubravec. Did Roy’s solution help? I am marking his reply as, “Solution,” but let us know if this is not resolved. And thanks for posting your questions on the community Q&A Forum!
I’ve tried to implement it, but each time QS pulls data in, it will just replace the yesterdays data with todays and erases it. I suspect I will need to add one more column to indicate change, like day count, or something similar.
When you configure incremental refresh, there is a window size(in screenshot), QuickSight removes old data according to this window size. If you want to keep old dat in SPICE, you may reduce the size of this window size(for instance, down to 1 hour). In this case, QuickSight keeps historical data in SPICE(Oct7 data in your example) and ingests new data(Oct8).
However, SPICE is not a managed database, it is a fast storage designed for QuickSight visualisation. If historical data is only stored in QuickSight, there is a risk that historical data will lost if you need to do a Full data refresh in QuickSight.
Therefore, suggest to store the historical data in other storage such as database, S3 rather than only in QuickSight SPICE