Hi,
I’m seeing an issue where Amazon Quick Sight incremental refresh is missing data that exists in S3.
My setup
- Logs are written to CloudWatch.
- A Firehose stream delivers these logs to a raw S3 bucket in 5-minute intervals.
- A Lambda function is triggered by new objects in the raw S3 bucket. This Lambda processes the logs and writes the transformed files into a processed S3 bucket.
- The Athena table on top of the processed bucket uses partition projection (no Glue crawler).
- I created a complex Athena view with multiple CTEs that filters data within a bounded time window:
    start_ts = date_trunc('hour', current_timestamp - interval '6' hour)
    stop_ts = date_trunc('hour', current_timestamp)
  So the view only scans the last 6 hours of partitions.
- In Quick Sight, I use this view as a dataset and set up hourly incremental refresh.
  - Incremental column: activity_ts
  - Window size: 3 hours
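To make the two windows in the setup concrete, here is a minimal Python sketch of the arithmetic. The function names are hypothetical, and the refresh behavior is an assumption on my part: I am assuming the incremental refresh only re-reads rows whose activity_ts falls within the last 3 hours of the refresh time. The view bounds mirror the date_trunc expressions from the view.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

VIEW_LOOKBACK_HOURS = 6    # from the view definition
REFRESH_WINDOW_HOURS = 3   # incremental refresh window size in Quick Sight


def truncate_to_hour(ts: datetime) -> datetime:
    """Mirror date_trunc('hour', ts)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def view_bounds(now: datetime) -> Tuple[datetime, datetime]:
    """Return (start_ts, stop_ts) as the view computes them at query time."""
    start_ts = truncate_to_hour(now - timedelta(hours=VIEW_LOOKBACK_HOURS))
    stop_ts = truncate_to_hour(now)
    return start_ts, stop_ts


def refresh_lower_bound(now: datetime) -> datetime:
    """Assumed behavior: the refresh looks back REFRESH_WINDOW_HOURS on activity_ts."""
    return now - timedelta(hours=REFRESH_WINDOW_HOURS)


def effective_range(now: datetime) -> Tuple[datetime, datetime]:
    """Range of activity_ts that both the view and the assumed refresh window cover."""
    start_ts, stop_ts = view_bounds(now)
    return max(start_ts, refresh_lower_bound(now)), stop_ts


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 10, 20, tzinfo=timezone.utc)
    print("view bounds:    ", view_bounds(now))
    print("effective range:", effective_range(now))
```

Under these assumptions, a refresh running at 10:20 UTC gets view bounds of 04:00 to 10:00 but an effective range of only 07:20 to 10:00, so the current partial hour is not visible to that refresh at all.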
The problem
- On the Quick Sight dashboard, some rows are missing.
- If I change the view’s bounds from 6 hours to something very large (e.g. 9000 hours) and do a manual full refresh, the missing rows appear correctly.
- That tells me the data really is in S3 and queryable by Athena, but Quick Sight doesn’t pick it up during normal incremental refresh.