How to configure 1h incremental refreshes for a dataset being overwritten hourly

sgabarro · September 9, 2024, 8:42am

Cheers!
I have a Redshift-query data source that offers a live view into an hourly KPI in a format similar to this:

|Site|KPI1|KPI2|Root_Cause|Refresh|
|Place1|0.5|78|Management|28/09/2024 14:50:00|
|Place2|0.75|90|Weather|28/09/2024 14:50:00|

It overwrites itself every time its ran, as it is constantly pulling the latest available information, so at any given point we only have access to the data for Refresh Time + 1h.

Im trying to create a greater dataset using incremental refreshes, based on 1h intervals. That would mean running the query once an hour and keeping the output each time.

I haven’t found a way to do this as the incremental refreshes clear the last 1h of data which is exactly the size of my interval.

Any workarounds? Thanks.

Xclipse · September 9, 2024, 9:17am

Hi @sgabarro

Welcome to the community!

To create an incremental dataset in QuickSight from a Redshift live query while preserving previous data (without overwriting the last 1 hour of data), one solution is to store each hourly query result into an intermediate table in Redshift before QuickSight fetches the data.

Append each new set of data (from the hourly query) into a new table, e.g., historical_kpi_table.
Timestamp each record with the current refresh time to ensure historical tracking.

Example (adjust the INSERT statement as needed for your fields):

INSERT INTO historical_kpi_table (site, kpi1, kpi2, root_cause, refresh_time, dateadded)
SELECT site, kpi1, kpi2, root_cause, refresh_time, getdate()
FROM live_kpi_table;

Redshift will continuously build historical data by appending to the historical_kpi_table.

Once this is set up, connect Amazon QuickSight to the historical_kpi_table for your reporting needs. The data will reflect incremental snapshots from each hour.

Automate the hourly data append using AWS Glue or Redshift’s native query scheduling. The query will run once every hour, inserting new data into the historical_kpi_table.

Allow a small interval between the Redshift query completion and scheduling the QuickSight dataset to run hourly to refresh the data in SPICE memory.

This way, you’ll build a dataset that doesn’t lose historical data with each refresh and continues to grow incrementally.

sgabarro · September 9, 2024, 12:17pm

Hello! Unfortunately I only have read access to the database as we’re not the owners, nevertheless I’ve found a workaround. I’m closing the issue but leaving it here for future reference.

I ended up adding two UTC-based columns that took the value (Now - 2h) and (Now -3h) and based the incremental refresh based on them (setting UTC as timezone as well), this way it never deleted any previous updates.