I currently have an S3 bucket folder that receives data every 5 minutes. I used an S3 (manifest file) dataset in QuickSight, which allows data refreshes at most once an hour, but I need my datasets refreshed at least every 15 minutes. If it is not possible to refresh that often from the S3 source, what workflow of other AWS services can I use to achieve a 15-minute dataset refresh interval? (I still need the S3 bucket to hold the incoming data.)
Hello @Adnan and welcome to the QuickSight community!
Would using Direct Query be an option? For an S3 data source, the shortest scheduled refresh interval is 1 hour.
Another workaround could be to use Athena to query the data in the S3 bucket and turn on incremental refresh every 15 minutes.
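A minimal boto3 sketch of the Athena + incremental refresh suggestion, assuming the S3 data is already exposed as an Athena table, the QuickSight dataset built on it is imported into SPICE, and it has a date/time column (the hypothetical `event_time` in the usage below) that can serve as the lookback column. `MINUTE15` is the shortest `Interval` value the CreateRefreshSchedule API accepts; as far as I know, QuickSight offers it only for incremental refreshes. The one-hour lookback and the schedule ID are arbitrary choices for illustration.

```python
def configure_15_minute_incremental_refresh(quicksight, account_id, dataset_id,
                                            timestamp_column,
                                            lookback_hours=1,
                                            schedule_id="every-15-minutes"):
    """Enable incremental refresh on a SPICE dataset and schedule it every 15 min.

    `quicksight` is a boto3 QuickSight client, e.g. boto3.client("quicksight").
    `timestamp_column` is a (hypothetical) date/time column in the dataset;
    only rows whose value falls inside the lookback window are re-ingested.
    """
    quicksight.put_data_set_refresh_properties(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        DataSetRefreshProperties={
            "RefreshConfiguration": {
                "IncrementalRefresh": {
                    "LookbackWindow": {
                        "ColumnName": timestamp_column,
                        "Size": lookback_hours,
                        "SizeUnit": "HOUR",
                    }
                }
            }
        },
    )
    # Schedule the incremental refresh at the shortest available interval.
    schedule = {
        "ScheduleId": schedule_id,
        "ScheduleFrequency": {"Interval": "MINUTE15"},
        "RefreshType": "INCREMENTAL_REFRESH",
    }
    return quicksight.create_refresh_schedule(
        AwsAccountId=account_id, DataSetId=dataset_id, Schedule=schedule
    )
```

Usage would look like `configure_15_minute_incremental_refresh(boto3.client("quicksight"), "123456789012", "my-dataset-id", "event_time")`, with your own account ID, dataset ID, and column name. The same settings can also be made in the QuickSight console; the API route is mainly useful if the setup needs to be repeatable.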
Hi @Adnan - For real time, Direct Query is the solution. A better approach is to land the file in S3, crawl the data with an AWS Glue crawler to create a table you can query in Athena, convert the files to Parquet and partition them (the partitioning scheme depends on the source system), and then use Direct Query against the Athena table. If near real time is acceptable, the 15-minute incremental SPICE refresh suggested by @duncan is probably a good approach.
Regards - Sanjeeb
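The Parquet-and-partition step above could be done with an Athena CTAS (CREATE TABLE AS SELECT) query. A minimal sketch, assuming the Glue crawler has already created a CSV-backed table; the table names, S3 locations, column list, and partition column (`dt`) in the usage are hypothetical. Athena requires partition columns to come last in the SELECT list, so the helper moves the partition column to the end.

```python
def build_parquet_ctas(source_table, target_table, target_location,
                       columns, partition_column):
    """Build an Athena CTAS statement that rewrites a table as partitioned Parquet."""
    if partition_column not in columns:
        raise ValueError(f"{partition_column!r} is not in columns")
    # Partition columns must be the last columns in the SELECT list.
    ordered = [c for c in columns if c != partition_column] + [partition_column]
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{target_location}',\n"
        f"  partitioned_by = ARRAY['{partition_column}']\n"
        ") AS\n"
        f"SELECT {', '.join(ordered)} FROM {source_table}"
    )


def run_ctas(athena, sql, database, output_location):
    """Submit the query with a boto3 Athena client and return its execution ID."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

For example, `build_parquet_ctas("raw_csv", "events_parquet", "s3://example-bucket/parquet/", ["id", "dt", "value"], "dt")` ends with `SELECT id, value, dt FROM raw_csv`. CTAS creates the table once; new data arriving afterwards would need to be appended (for instance with Athena `INSERT INTO`) before Direct Query in QuickSight can see it.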
@Adnan
There's a lot of missing information…
Does your S3 bucket receive delta files or a full file every 5 minutes?
How big is the file?
Does it carry aggregated data or raw data?
These answers will determine which additional AWS service or architecture piece fits…
Hi @nshah-quicksight,
The bucket receives a full file every 5 minutes. The objects within the bucket are named by day (e.g., 07/10/23), and the bucket has object versioning enabled. Each incoming CSV file becomes the new version of the current day's object.
The day's final object ends up at around 275-325 KB, while the size difference between successive versions is about 2 KB.
The file contains raw data with a few aggregated fields.
Thank you! I will look into this.
@Adnan you can use an S3 event to trigger a Lambda function, and from Lambda call the UpdateDataSet API, which itself triggers a data refresh.
For example, change the S3 file name of your dataset through the API (update_data_set - Boto3 1.28.1 documentation).
Lambda steps:
- (optional) Check whether the S3 bucket contains files following the file_name_{YYYYMMDDHHMM}.csv convention, and delete them.
- Copy the latest version of the file to a new file named file_name_{YYYYMMDDHHMM}.csv.
- Call the UpdateDataSet API to change the file source to this file (this updates the dataset with the new data).
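A minimal sketch of those three Lambda steps with boto3, under several assumptions. `ACCOUNT_ID`, `DATASET_ID`, and `SNAPSHOT_PREFIX` are hypothetical placeholders. For an S3 dataset the file location lives in the data source's manifest rather than in the dataset definition, so this sketch assumes the manifest points at the `file_name_` prefix (for example via `URIPrefixes`) and simply re-submits the dataset's existing definition through UpdateDataSet, relying on the refresh side effect described above. The snapshot copies also fire S3 events, so the handler ignores them to avoid triggering itself in a loop.

```python
from datetime import datetime, timezone
from urllib.parse import unquote_plus

# Hypothetical values: replace with your own account, dataset and prefix.
ACCOUNT_ID = "123456789012"
DATASET_ID = "your-dataset-id"
SNAPSHOT_PREFIX = "file_name_"

# Optional UpdateDataSet fields copied over from DescribeDataSet.
FORWARDED_FIELDS = (
    "LogicalTableMap", "ColumnGroups", "FieldFolders",
    "RowLevelPermissionDataSet", "ColumnLevelPermissionRules",
    "DataSetUsageConfiguration",
)


def snapshot_key(now):
    """Key following the file_name_{YYYYMMDDHHMM}.csv convention."""
    return f"{SNAPSHOT_PREFIX}{now:%Y%m%d%H%M}.csv"


def delete_old_snapshots(s3, bucket):
    """Step 1 (optional): delete earlier file_name_*.csv copies.

    list_objects_v2 returns at most 1,000 keys per call, which is enough
    when old snapshots are removed on every run. In a versioned bucket,
    delete_object adds a delete marker instead of purging old versions.
    """
    response = s3.list_objects_v2(Bucket=bucket, Prefix=SNAPSHOT_PREFIX)
    for obj in response.get("Contents", []):
        s3.delete_object(Bucket=bucket, Key=obj["Key"])


def copy_latest(s3, bucket, source_key, target_key):
    """Step 2: copy the latest version of the day's object to a new key."""
    # With no VersionId in CopySource, S3 copies the current version.
    s3.copy_object(Bucket=bucket, Key=target_key,
                   CopySource={"Bucket": bucket, "Key": source_key})


def resubmit_dataset(quicksight, account_id, dataset_id):
    """Step 3: call UpdateDataSet with the dataset's existing definition."""
    ds = quicksight.describe_data_set(
        AwsAccountId=account_id, DataSetId=dataset_id)["DataSet"]
    extra = {k: ds[k] for k in FORWARDED_FIELDS if k in ds}
    return quicksight.update_data_set(
        AwsAccountId=account_id, DataSetId=dataset_id, Name=ds["Name"],
        PhysicalTableMap=ds["PhysicalTableMap"],
        ImportMode=ds["ImportMode"], **extra)


def lambda_handler(event, context, s3=None, quicksight=None, now=None):
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # S3 event keys are URL-encoded
    # Our own snapshot copies also fire ObjectCreated events; skip them.
    if key.startswith(SNAPSHOT_PREFIX):
        return {"skipped": True}
    if s3 is None or quicksight is None:
        import boto3  # included in the AWS Lambda Python runtime
        s3 = s3 or boto3.client("s3")
        quicksight = quicksight or boto3.client("quicksight")
    target = snapshot_key(now or datetime.now(timezone.utc))
    delete_old_snapshots(s3, bucket)
    copy_latest(s3, bucket, key, target)
    response = resubmit_dataset(quicksight, ACCOUNT_ID, DATASET_ID)
    return {"skipped": False, "snapshot": target,
            "status": response.get("Status")}
```

The extra `s3`, `quicksight`, and `now` parameters default to `None`, so Lambda can still call `lambda_handler(event, context)`; they exist only so the steps can be exercised with stub clients. Because the bucket receives a file every 5 minutes, this would refresh roughly every 5 minutes, which is within the 15-minute target but worth checking against your QuickSight ingestion limits.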
The idea I am proposing here is to use the UpdateDataSet API to trigger a data refresh as a side effect of updating the dataset.
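As an alternative to the UpdateDataSet workaround, the QuickSight API also has CreateIngestion (`create_ingestion` in boto3), which starts a SPICE refresh for a dataset directly. A minimal sketch, assuming the same S3-triggered Lambda calls it in place of the update step; the `refresh-` ingestion ID format is my own choice, and a fresh ID is generated per call. API-started refreshes may still be subject to QuickSight's ingestion quotas for your edition.

```python
import uuid


def start_spice_refresh(quicksight, account_id, dataset_id, ingestion_id=None):
    """Start a full SPICE refresh of a dataset via the CreateIngestion API.

    `quicksight` is a boto3 QuickSight client. Returns the ingestion ID and
    its initial status, which can later be checked with describe_ingestion.
    """
    # Generate a new ID for each refresh unless the caller supplies one.
    ingestion_id = ingestion_id or f"refresh-{uuid.uuid4()}"
    response = quicksight.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
        IngestionType="FULL_REFRESH",
    )
    return response["IngestionId"], response["IngestionStatus"]
```

Compared with re-submitting the dataset definition, this leaves the dataset untouched and only asks SPICE to re-import, so there is no risk of accidentally dropping a dataset setting during the update call.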