QuickSight data unreliability?

Hello,

We have been using QuickSight for a few weeks now.
We created several analyses and use a date range control to display our monthly data.


However, we noticed that for a given month (for example, June 2023), the data is not always the same if we close the analysis and open it again, say, 10 minutes later, even though the date range is exactly the same, down to the second!

How is that possible? Has anyone ever experienced this? What could cause it?

Our setup is the following:

  • We use an S3 QuickSight manifest to ingest our data. Our manifest looks like this:

{
  "fileLocations": [
    {
      "URIPrefixes": [
        "s3://our_bucket/folderA/"
      ]
    }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ";",
    "textqualifier": "\"",
    "containsHeader": "true"
  }
}

  • A Python script uploads new data (a .csv file) to folderA every hour and refreshes the dataset with the boto3 create_ingestion method.
  • The refresh usually completes within 1 min 10 s, with no skipped rows and around 4 067 013 ingested rows (and dataset rows).
  • We saw differences for the same month within the same hour, and not at the moment an automatic refresh was running.
  • In QuickSight itself: at the dataset level we have calculated fields and one parameter. At the analysis level we have several filters applied to all visuals, calculated fields, and a parameter linked to the dataset parameter.
    (Not sure if this matters: at the end of each day, a script merges all the hourly .csv files into one daily .csv file and deletes the hourly files. It also deletes daily files older than 2 years, because an S3 manifest is limited to 1,000 files. In any case, this is not a merging error, since only the last day is merged. We are in July, looking at June data, so the June .csv files are not modified by any script.)
  • The dates of newly inserted data are always the current date. We were in July when we looked at the June analysis dashboard, so new incoming data cannot fall into the June range.
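For illustration, the hourly upload-and-refresh step described above might look roughly like the following Python sketch. It is not Pam's actual script: the function name, the S3 key, and the polling interval and timeout are assumptions. The clients are passed in (normally boto3.client("s3") and boto3.client("quicksight")), and the function waits for the ingestion to reach a terminal status, which makes it easy to log exactly when new SPICE data became visible.

```python
import time
import uuid

# Statuses after which an ingestion will not change any more.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def upload_and_refresh(s3, quicksight, account_id, dataset_id,
                       local_csv, bucket, key,
                       poll_seconds=10.0, timeout_seconds=600.0):
    """Upload one hourly CSV, start a SPICE ingestion and wait for it.

    `s3` and `quicksight` are expected to be boto3 clients. The function
    name, key layout and polling parameters are hypothetical choices.
    """
    # 1. Put the new hourly file under the prefix listed in the manifest.
    s3.upload_file(local_csv, bucket, key)

    # 2. Start the refresh; every ingestion needs a unique ID.
    ingestion_id = f"hourly-{uuid.uuid4()}"
    quicksight.create_ingestion(
        AwsAccountId=account_id,
        DataSetId=dataset_id,
        IngestionId=ingestion_id,
    )

    # 3. Poll until the ingestion reaches a terminal status (or time out).
    deadline = time.monotonic() + timeout_seconds
    while True:
        ingestion = quicksight.describe_ingestion(
            AwsAccountId=account_id,
            DataSetId=dataset_id,
            IngestionId=ingestion_id,
        )["Ingestion"]
        status = ingestion["IngestionStatus"]
        if status in TERMINAL_STATUSES:
            return ingestion_id, status, ingestion.get("RowInfo", {})
        if time.monotonic() > deadline:
            raise TimeoutError(f"ingestion {ingestion_id} still {status}")
        time.sleep(poll_seconds)
```

Logging the returned RowInfo (for example RowsIngested and RowsDropped) with a timestamp on every run gives a history to compare against the moments when a discrepancy was observed.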

Let us know your thoughts on why we sometimes see data differences for the same month (not the current one) when we close and reopen the analysis 5 minutes later.

Thanks
Pam

Hi @pam !

Sorry to hear that you’re seeing these discrepancies. Let me verify my understanding of the scenario:

  1. You have a relatively real-time system, updating the QuickSight data source every hour
  2. The data source is a set of CSV files in S3, read through a manifest
  3. You are seeing differences between the data source and the QuickSight visualizations after a refresh has completed, with the same time filters
  4. There is a nightly process that aggregates the hourly files into daily files

If those are all true, here are a few things to check:

  1. The SPICE cache. Are you using direct query or SPICE? If you’re using SPICE, how often is it refreshed?
  2. Filters on the visualizations. We often see that there are filters on individual visualizations that work against the global dashboard filter. Are there any filters on the individual visualizations?
  3. Whether the updates somehow contain back-dated records.
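To check point 1 against the times when a discrepancy was seen, one option is to list the dataset's ingestions and see whether any of them was running at that moment. The sketch below assumes a boto3 QuickSight client and relies on the CreatedTime and IngestionTimeInSeconds fields returned by list_ingestions; the function name and the one-minute margin are hypothetical choices. The moment passed in must be a timezone-aware datetime, since boto3 returns timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone


def ingestions_overlapping(quicksight, account_id, dataset_id, moment,
                           margin_seconds=60):
    """Return the ingestions whose run window contains `moment` (± margin).

    `quicksight` is expected to be a boto3 QuickSight client. The margin is
    an arbitrary allowance for clock skew and viewing delay.
    """
    margin = timedelta(seconds=margin_seconds)
    matches = []
    kwargs = {"AwsAccountId": account_id, "DataSetId": dataset_id}
    while True:
        page = quicksight.list_ingestions(**kwargs)
        for ing in page.get("Ingestions", []):
            start = ing["CreatedTime"]
            # An ingestion that is still running may not report a duration yet,
            # so treat it as lasting until now.
            duration = ing.get("IngestionTimeInSeconds")
            if duration is not None:
                end = start + timedelta(seconds=duration)
            else:
                end = datetime.now(timezone.utc)
            if start - margin <= moment <= end + margin:
                matches.append(ing)
        # Follow pagination until there are no more pages.
        token = page.get("NextToken")
        if not token:
            return matches
        kwargs["NextToken"] = token
```

An empty result for every observed discrepancy would make a "read during a refresh" explanation less likely.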

Also, can you verify record counts by day in the source files, in the QuickSight visualizations, and in the dataset? Do they match, or do you see the same discrepancies?
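For the per-day counts in the source files, a small script over a local copy of the folderA files could look like the sketch below. The ";" delimiter, the double-quote text qualifier and the header row match the manifest in the original post; the date column name (date) and its YYYY-MM-DD prefix format are hypothetical and would need to match the real files.

```python
import csv
from collections import Counter


def rows_per_day(csv_paths, date_column="date", delimiter=";", quotechar='"'):
    """Count rows per calendar day across delimited CSV files with a header.

    Assumes `date_column` (hypothetical name) holds a timestamp whose first
    10 characters are YYYY-MM-DD, e.g. "2023-06-01 10:00:00".
    """
    counts = Counter()
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter=delimiter, quotechar=quotechar)
            for row in reader:
                # Group by the date part of the timestamp.
                counts[row[date_column][:10]] += 1
    return dict(sorted(counts.items()))
```

Comparing this output with a QuickSight table visual that counts rows grouped by day would show whether a gap appears at ingestion time or only in the analysis (filters, calculated fields).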

Thanks!

ws

Hi @pam !

Just a quick check-in to see how you’re doing with this issue. Is it still a concern, or have you figured it out?

Happy to jump on a call if it will help.

ws

Hello ws,
Thanks a lot for your answer.
We spotted that behavior one day and then monitored it for several days. It never happened again.
So maybe it was just a mistake on our side, or a reading taken during a dataset refresh.
We consider the case closed.
Thanks a lot for your help and have an excellent day!
P

Thanks @pam !

Appreciate the update!!

You have a great day as well!

Steve