Huge datasets on QuickSight

Hi, we have a use case where the query output is more than 1M rows, and QuickSight is unable to handle it. I wanted to understand whether this is a limitation of QuickSight or whether there is a way around it.

Another question: I'm powering one of my QuickSight dashboards with a custom SQL query. Since there are many dashboards like this, it's putting a lot of stress on our cluster. I wanted to explore ways of running these queries off our cluster, and to see whether this is an option that QuickSight provides.

What do you mean when you say QuickSight is unable to handle it? Do you get an error message, or does your refresh take a very long time? I have a dataset with 55M rows and it takes about 30-40 minutes to refresh.

Have you tried breaking your dataset into smaller datasets? For example, you can create a dataset for each of your dimensions and facts (e.g., one for Customers, one for Products, and another for Orders) and then create a child dataset that joins all 3. The benefit of this approach is that you can reuse those parent datasets in other child datasets. For example, if you need to use Customers in another dataset, QuickSight only has to query your source once.
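To make the shape of that child dataset concrete, here is a sketch of the equivalent join in SQL. The table and column names (customers, products, orders, customer_id, etc.) are hypothetical, not from the original post:

```sql
-- Hypothetical child dataset joining the three parent datasets.
-- One fact table (orders) joined to two dimension tables.
SELECT
    o.order_id,
    o.order_date,
    o.quantity,
    c.customer_name,
    c.region,
    p.product_name,
    p.category
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products  p ON o.product_id  = p.product_id;
```

In QuickSight itself you would typically define these joins in the dataset editor on top of the parent datasets rather than writing raw SQL, but the query above shows the star-schema relationship the joins express.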


Thanks @David_Wong for the insights. Yes, I agree with your approach.

Hi @sindhusb - Can you provide more details on the requirements? There are many ways to handle business requirements like this. Do you have a huge amount of data in a single table that you want to report on top of? A couple of things you may want to explore:

  1. What is the size of the table, and what is its refresh interval?
  2. Can you use SPICE to store the data? Instead of hitting the database, QuickSight can serve the data from SPICE (its in-memory engine). More info on SPICE can be found here - Importing data into SPICE - Amazon QuickSight

Note - There is an upcoming SPICE best practices session - see the link - 2023 Amazon QuickSight Learning Series: SPICE data best practices (EMEA-friendly timing). Please join if possible, and keep an eye on future QuickSight events as well.

  3. If you can divide the data into a star-schema data model, as @David_Wong advised, that will also improve query performance.
  4. In some cases we have also built a summary table at the ETL layer and exposed that summary table at the QuickSight level.

Please provide more details. I do believe QuickSight can handle big datasets with some tuning. Give it a try, and if you are facing issues, please post your questions to the AWS QuickSight community for help.

Tagging QuickSight experts as well
@Max @Thomas @thomask @Kristin @Bhasi_Mehta @Tatyana_Yakushev @Biswajit_1993

Regards - San