We have a number of QuickSight visuals that are pivot tables with various filters on the data. Most of the data in these pivot tables loads in 3-5 seconds. However, total and subtotal calculations take more than 15 seconds to load. We experience this same issue for SPICE or DIRECT_QUERY data sets.
Is there any way to improve the performance of the calculation of totals? It is possible that some of our calculations are expensive, but the non-total rows in the table are calculated quickly. Why should it take much more time to calculate totals? My suspicion is that QuickSight is calculating totals on the full data set and not filtering data. However, even if we have a data set with data set parameters, we still see this slowness in totals calculation.
Are you including a lot of calculated fields in your pivot table that are being totaled, and are those calculated fields on the analysis level or the data prep level?
Pivot tables do have limitations such as row and column limits and a max of 500,000 records. Once you start to approach those limits it can cause things to slow down because it loads more computations on the dataset/source.
My recommendation is consider breaking up the pivot table if possible or adding more of the calculated fields to the data prep level.
We have a lot of calculated fields that we’ve added in our analysis, but there are few that we are actually using in the visual. Are calculated fields at the analysis calculated when they are not used in the visual? There is one calculated field that is used in the visual that we are planning to calculate when populating the glue table that is used by our DataSet. For these particular pivot tables, we are not close to any limits.
However, the other calculated fields we cannot calculate during the data prep (or before)… that I’m aware of. The calculated fields rely on dynamic selections by the user. Like comparing one thing that is selected against another thing that is selected. I know the calculations involved with this are expensive, but QuickSight does not seem to have a good way of doing this. We had multiple sessions back when one could get office hours with QuickSight, and the only solution was rather expensive calculated fields.
Why would the totals take so much longer to calculate than individual rows?
Breaking up the pivot table is also not a good option, since due to QuickSight limitations, our pivot tables are broken up more than our customer would like.
I wanted to test some things out, so I modified the DataSet I’m using to come from DIRECT_QUERY rather than SPICE. This way, I could see the queries that were executed. I then created an analysis / dashboard with just a single visualization.
I noticed that for the visual in question, QuickSight fired off 3 different queries.
The first query seemed to correspond the the regular (non-total) rows, and took about 2 seconds to run. It returned 49 rows.
Approximately 3.5 seconds after the first query, another query ran. This query took about 2.5 seconds and returned 0 results.
Approximately 7 seconds after the first query, the third query ran. This query’s results included the totals. It took approximately 3.3 seconds to complete and returned 49 rows.
To me, it seems like QuickSight is issuing queries in a sequential manner. The data that it is working on for these totals is rather small. It seems like there are two ways to improve performance. 1: QuickSight would need to run the Athena queries run in parallel (not sure if this is possible), or 2: I’d have to optimize what gets queried in Athena. However, what is surprising to me is that even when the DataSet is in SPICE, we are still seeing this delay in total calculation… and the loading of the regular (non-total) rows takes about the same time to load.
Are you doing any kind of dataset join within Quicksight on the data prep level? I could see that potentially causing a slow down even if they’re in Spice.
If that is not the case then I will tag this topic as a feature request based on your most recent response and the functionality that you are expecting in Quicksight.
Also, to answer a question from above, yes Quicksight will run every calculation when it refreshes the dashboard.
We are not performing any joins within QuickSight on the data prep level. I was thinking that part of the reason the SPICE DataSet has similar performance to the DIRECT_QUERY is that we are using some expensive calculations. There are some sumOver calculations that at least currently cannot be eliminated. My thought is that when DIRECT_QUERY is used, most of these calculations are off loaded to Athena, and perhaps these calculations can be executed more quickly with Athena? Otherwise, I assume that QuickSight resources are performing the calculations? Or do calculations get off loaded to the SPICE data store?
In theory, we could eliminate one of the sumOver calculations (we would still have at least one more sumOver), but it is not clear that we would see a significant improvement by removing the sumOver. At least when experimenting with the Athena queries produced by DIRECT_QUERY, the calculations do not appear to add a significant amount of time to the queries.