QuickSight Dataset scheduled refresh failures - Complex SQL Query Causing Temporary Table Overflow

Hello QuickSight Community, we’re experiencing issues with some of our datasets failing to refresh in QuickSight. This is causing significant disruptions to our reporting and analytics processes. Specifically:

  • Multiple datasets are not updating as scheduled
  • Error messages indicate temporary table overflow in our MySQL database

Has anyone encountered similar problems or have insights on how to resolve this?

Any help or guidance would be greatly appreciated. This is impacting our business operations, so we’re eager for solutions.

Hi @Kalyan_reddy

Welcome to the QuickSight community!

The error in the screenshot indicates that the temporary table on your MySQL database is full during the dataset refresh.

Check the storage space available for your MySQL database. If the database is running out of space, expand its capacity. Additionally, review and optimize query performance in your MySQL database.
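As a first diagnostic step, you can check how MySQL is configured for internal temporary tables; they spill to disk once they outgrow the in-memory limits, and the spilled tables can exhaust the volume that holds tmpdir. A minimal sketch using standard MySQL commands:

```sql
-- Inspect the in-memory temp table limits; internal temp tables spill
-- to disk once they exceed the smaller of tmp_table_size and
-- max_heap_table_size.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
SHOW VARIABLES LIKE 'tmpdir';

-- Check how often queries have had to create disk-based temp tables.
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
```

A high ratio of Created_tmp_disk_tables to Created_tmp_tables suggests the refresh query is building temporary tables that don’t fit in memory.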

Use SPICE for the dataset, as it will handle data processing in memory and reduce the load on your database. Instead of refreshing the entire dataset, load data incrementally to reduce the query size.
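For example, if the dataset is defined with custom SQL, a filter on a timestamp column keeps each refresh small. This is only a hedged sketch; the table and column names (orders, updated_at) are hypothetical placeholders for your schema:

```sql
-- Hypothetical custom SQL for a QuickSight dataset: pull only recent
-- rows instead of the full table, so the refresh query produces a
-- much smaller intermediate result.
SELECT order_id, customer_id, order_total, updated_at
FROM orders
WHERE updated_at >= NOW() - INTERVAL 30 DAY;
```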

@Xclipse Thank you for your response.

Our data changes regularly and includes multiple timestamps, so it’s not feasible to implement incremental refreshes. While expanding capacity might be a temporary fix, it won’t address the underlying issue since our data size continues to grow over time.
What would be the best long-term solution to effectively manage this situation? Any insights would be greatly appreciated!

I have the same issue occurring with multiple datasets.

Thank you!


Hi @Kalyan_reddy

To address this issue in the long term, here are some strategies for effectively managing your growing data size and the multiple-timestamp nature of your datasets.

  • Transition your data from MySQL to a scalable data warehouse like Amazon Redshift or Redshift Serverless. These platforms are optimized for analytical workloads and can handle large, growing datasets more efficiently.

    • Use Amazon Redshift Spectrum or Athena for querying large datasets directly in S3 if a data warehouse isn’t viable.
  • Use ETL tools like AWS Glue to preprocess and aggregate data before loading it into QuickSight; pre-aggregated data reduces query complexity and load times. Use pre-filters in the custom SQL or during the ETL process to limit the data being processed to only what’s required for analysis (see the sketch after this list).

  • Even though incremental refreshes are challenging for datasets with multiple timestamps, identify parts of the data that are less volatile and apply incremental updates to those segments.

  • Regularly monitor and optimize queries that are causing performance bottlenecks. Use MySQL’s EXPLAIN statement to analyze query execution plans.
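As a hedged illustration of the pre-filtering and EXPLAIN points above, the sketch below analyzes a pre-filtered, pre-aggregated dataset query; the table and column names (sales, event_ts, region, amount) are hypothetical:

```sql
-- Hypothetical example: inspect the execution plan of a pre-filtered,
-- pre-aggregated dataset query before scheduling it in QuickSight.
EXPLAIN
SELECT region,
       DATE(event_ts) AS event_day,
       SUM(amount)    AS total_amount      -- pre-aggregate before loading
FROM sales
WHERE event_ts >= NOW() - INTERVAL 90 DAY  -- pre-filter to what's needed
GROUP BY region, DATE(event_ts);
```

In the EXPLAIN output, “Using temporary” in the Extra column flags an internal temporary table; GROUP BY or ORDER BY on unindexed columns is a common source of large temp tables.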

If the above steps are not helpful, I would recommend filing a case with AWS Support so we can dive into the details and help you further. Here are the steps to open a support case. If your company has someone who manages your AWS account, you might not have direct access to AWS Support and will need to raise an internal ticket with your IT team or whoever manages your AWS account. They should be able to open an AWS Support case on your behalf.

Hope this helps!


Thank you for the recommendations. We will definitely consider transitioning our data to a scalable platform like Amazon Redshift to address these issues and accommodate our growing datasets.

However, I have a question about the existing dashboards. If we move the data to a different platform, will we need to rebuild the dashboards from scratch, or is there a way to update the data sources for the current dashboards without losing the visualizations and configurations we’ve already set up?

Also, could you please explain what is causing the issue? Is it because of MySQL or QuickSight?

Any guidance on this would be greatly appreciated.


Hi @Kalyan_reddy

In an analysis, you can add, edit, replace, or remove datasets. After creating the new dataset from Redshift or any other database, you can swap it in for the existing dataset in the analysis. Make sure to try replacing the dataset in a backup analysis first: save the existing analysis with a different name and replace the dataset there, so if anything goes wrong, it won’t affect the original analysis and its dashboard.

When you replace a dataset, the new dataset should have similar columns if you expect the visuals and dashboard to work the way you designed them. Replacing the dataset also clears the undo and redo history for the analysis.
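For example, if the visuals were built on MySQL columns named order_id and order_total, you could alias the corresponding Redshift columns in the new dataset’s custom SQL so the field names line up. A hedged sketch with hypothetical names:

```sql
-- Hypothetical custom SQL for the replacement (Redshift) dataset:
-- alias columns so names and types match the original MySQL dataset
-- the visuals were designed against.
SELECT o.id                   AS order_id,
       o.total_cents / 100.0  AS order_total,
       o.updated_at           AS updated_at
FROM analytics.orders AS o;
```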

Please refer to the documentation below; it might be helpful for you.

As per the error shown in the screenshot, the issue originates on the MySQL side: the temporary table on your MySQL database fills up while the refresh query runs, and QuickSight only surfaces that error. I cannot check any further details about what triggered it, but to my knowledge, if your MySQL query executes successfully, the data is automatically stored in SPICE memory. I would recommend filing a case with AWS Support for further analysis of the root cause.


Hi @Kalyan_reddy

It’s been a while since we last heard from you. If you have any further questions, please let us know how we can assist you.

If we don’t hear back within the next 3 business days, we’ll proceed with closing/archiving this topic.

Thank you!

Hi @Xclipse Thank you for your response.

I’ve noticed that whenever a data refresh error occurs, the RDS instance is at its maximum utilization (more than 99%). This might be causing the issue.

Would limiting simultaneous dataset refreshes to no more than 2 at a time address this issue?

Thank you!

Hi @Kalyan_reddy

Yes, limiting simultaneous dataset refreshes to no more than 2 at a time could help address this issue. Start with a single refresh at a time and gradually increase to 2 based on observed runtimes and RDS utilization.

Keeping concurrent refreshes low helps manage the CPU load on your RDS instance, preventing it from consistently reaching 99% utilization, which can lead to timeouts or errors.

Sequential or staggered refreshes allow each query to complete faster without overwhelming the database, improving overall performance and reliability.
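To confirm on the database side that refreshes are overlapping, you can watch the active statements while a refresh window runs. A minimal sketch using standard MySQL commands:

```sql
-- List currently running statements; overlapping QuickSight refresh
-- queries show up as concurrent long-running SELECTs.
SHOW FULL PROCESSLIST;

-- Or narrow it to statements that have been running for over a minute.
SELECT id, user, time AS seconds_running, LEFT(info, 120) AS query_snippet
FROM information_schema.processlist
WHERE command = 'Query' AND time > 60
ORDER BY time DESC;
```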


Great, thank you for your time. That’s all from me.