We have been using QuickSight for a few weeks now.
We created several analyses, and we use a date range control to display our monthly data.
However, we noticed that for a given month (for example June 2023), the data is not always the same if we close the analysis and open it again about 10 minutes later, even though the date range is exactly the same, down to the second.
How is that possible? Has anyone experienced this before, and what could be causing it?
Our setup is the following:
- We use an S3 manifest to ingest our data into QuickSight. Our manifest looks like this:
- A Python script uploads new data (a .csv file) to folder A every hour and then refreshes the dataset using boto3.
- The refresh usually completes in about 1 min 10 s, with no skipped rows and around 4,067,013 ingested rows (and dataset rows).
- We saw data differences for the same month within the same hour, and not at the moment an automatic refresh was running.
- In QuickSight itself: at the dataset level we have calculated fields and one parameter. At the analysis level we have several filters applied to all visuals, calculated fields, and a linked parameter coming from the dataset.
- (Not sure if this is important to mention: at the end of the day, a script merges all the hours.csv files into a days.csv file and deletes the hours.csv files. It also deletes days.csv files older than 2 years, because the S3 manifest is limited to 1000 files. In any case, it is not a merging error, since the merge only touches the last day, and we are in July looking at June data. So the June csv files are not modified by any script.)
- The dates of the newly inserted data are always the current date. We were in July when we looked at the June analysis dashboard, so the newly arriving data can’t fall within the June range.
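
To make the hourly step concrete, here is a simplified sketch of what an upload-and-refresh script of this kind can look like. It is an illustration, not our exact code: the function names and all arguments (bucket, key, account ID, dataset ID) are placeholders, and the S3 and QuickSight clients are passed in rather than created inside the functions.

```python
import uuid


def upload_and_refresh(s3, quicksight, local_csv, bucket, key, account_id, dataset_id):
    """Upload one hourly CSV, then start a SPICE ingestion for the dataset.

    `s3` and `quicksight` are assumed to be boto3 clients, e.g.
    boto3.client("s3") and boto3.client("quicksight").
    All other arguments are placeholders for illustration.
    """
    # Put the new hourly file into the folder referenced by the manifest.
    s3.upload_file(local_csv, bucket, key)

    # Every ingestion needs its own unique ID.
    ingestion_id = f"hourly-{uuid.uuid4()}"
    quicksight.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
    )
    return ingestion_id


def ingestion_summary(quicksight, account_id, dataset_id, ingestion_id):
    """Return the status and row counts reported for one ingestion."""
    ingestion = quicksight.describe_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
    )["Ingestion"]
    # RowInfo may be missing while the ingestion is still running.
    row_info = ingestion.get("RowInfo", {})
    return {
        "status": ingestion["IngestionStatus"],
        "rows_ingested": row_info.get("RowsIngested"),
        "rows_dropped": row_info.get("RowsDropped"),
    }
```

Logging the result of `ingestion_summary` after each hourly run is how figures such as the ingested and skipped row counts above can be tracked over time.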
Let us know your thoughts on why we sometimes see data differences for the same (non-current) month when we close the analysis and open it again 5 minutes later.
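
In case it helps anyone reproduce the check, one way to confirm that no refresh was running when the numbers changed is to list the dataset's recent ingestions and compare their start times with the moment the difference was observed. The sketch below is only an assumption of how that could be done: `ingestions_near` and its arguments are hypothetical names, the QuickSight client is passed in, and only the first page of `list_ingestions` results is inspected.

```python
from datetime import timedelta


def ingestions_near(quicksight, account_id, dataset_id, observed_at, window_minutes=15):
    """Return (created_time, status) for ingestions started close to observed_at.

    `quicksight` is assumed to be boto3.client("quicksight").
    `observed_at` must be a timezone-aware datetime, because boto3 returns
    timezone-aware values for CreatedTime.
    """
    window = timedelta(minutes=window_minutes)
    # Only the first page is read here; follow NextToken for a longer history.
    response = quicksight.list_ingestions(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
    )
    hits = []
    for ingestion in response["Ingestions"]:
        created = ingestion["CreatedTime"]
        if abs(created - observed_at) <= window:
            hits.append((created, ingestion["IngestionStatus"]))
    # Oldest first, so the sequence of refreshes is easy to read.
    return sorted(hits)
```

An empty result for the time of the discrepancy would support the observation that the differences do not coincide with a refresh.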