Large dataset with lots of calculated fields gives too long loading times

mikkel1 · January 20, 2025, 12:41am

Hi Community

I am having issues visualizing a large dataset with a bunch of calculated fields, and i am wondering what the best practice is.
My dataset consists of:
2.5 GB, 50 mio. rows
14 columns, 2 strings, the rest floats
Roughly 20 calculated fields

The calculated fields are a mix of extracting parts of strings, simple calculations like diffs, ifelse statements and some time calculations like taking the last 5 minutes. The primary function is showing averages within the last five minutes and labeling based on those averages.
Nothing too complex.

When I try to visualize the averages and the labels, it take 3-5 minutes load the data. This is the time it takes to refresh it seems. It doesn’t matter how many of the fields I put in the table i display. I am using SPICE.

What are the best practices for this sort of scenario? It makes sense if I can’t do too many calculations, as Quicksight is mainly a visualization tool. But should I be moving the calculations to Glue or somewhere different, or am I just doing it wrong in Quicksight?

I have tried to do incremental refresh to speed up the process, but as some of the data is from further back, I need full refreshing.
I have set all the calculations in the Data Prep area to try to increase speed, but without luck.

David_Wong · January 20, 2025, 3:01am

Hi @mikkel1,

Do you also have calculated fields at the analysis level or only at the dataset level?

mikkel1 · January 20, 2025, 3:08am

Hi @David_Wong

For the dataset in question, I only have calculated fields on dataset level.
I do have other datasets though, where I have calculated fields on analysis level. But they don’t take any time to visualize.

David_Wong · January 20, 2025, 3:47am

Normally if your dataset is in SPICE and all your calculated fields are in your dataset, I wouldn’t expect the calculated fields to impact the performance of your visuals that much. If your issue is with a table visual, it could be due to the visual-level aggregations. For example, if you have a large number of fields in the “Value” field well, those aggregations are calculated at runtime. How many fields do you have in your “Value” field well?

mikkel1 · January 20, 2025, 8:36pm

Just making sure I am looking at the right thing, the “value” field well, is in the “Visuals” tab right? Together with group by.
And when you say the calculated fields are in dataset, do you mean that i added them in the “Data Prep” section?

I had a play around with the table and how long it took to visualize as I added fields.
After adding the first three fields, 2 group by, 1 value, it stayed at around 20 seconds to load.
It rose with around 10 second extra pr. addition, as i added a 3rd group by and a 2nd value. So with 3 group by’s and 2 values, it took 40 seconds to load.
Adding a 3rd value took the load time to just under a minute and when i had 4 group by’s and 3 values, it took 1 minute 40 sec.

I went to “Data Prep” and did a full refresh. This took 10 min, 38 sec. I then went back to my analysis, where it took 1 min 24 sec, to visualize it again, with 4 group by’s and 3 values.

David_Wong · January 20, 2025, 11:07pm

Hi @mikkel1,

Yes, I’m referring to “Value” in the “Visuals” pane.

Calculated fields that are in the dataset are the ones that you added during data prep. Putting your calculated fields in your dataset and using SPICE help your visuals load faster. Since you’re already doing that, we need to see what we can do in the analysis/dashboard to further improve the performance.

When you put a field in the “Value” field well of your visual, you have to choose an aggregation, e.g., sum. That aggregation has to be calculated when the visual is loaded. As you have observed, the more fields you add to the “Value” and “Group by” field wells, the longer it takes the visual to load. How many fields did you have in your field wells when your visual couldn’t load? Is there any way to reduce that number by removing what you don’t absolutely need?

Sometimes the aggregations in the table visual are redundant if you’re already doing the same aggregations in your dataset. For example, if my dataset is already calculating the sum of sales grouped by Product, Industry, Region and Subregion, I can put all the fields in the “Value” field well and nothing in “Group by”. Can you try a similar approach? If you can’t remove any fields from your table visual, can you perform all the aggregations in your dataset?

You can also check out this blog for other tips about how to improve the performance of your dashboards.

mikkel1 · January 20, 2025, 11:44pm

Great.

So I had 4 group by’s and 7 values, when it couldn’t load. But that as before I started adding calculated fields to the dataset, so it might be possible to visualize now, but it would take a long time.

The data I have in group by are:
Name, which is extracted from a full name, which i extract from an id.

ifelse(
    {full_name} = ' ', 
    id,
    {full_name}
)

Latest date

maxOver(truncDate('MI', timestamp), [id], PRE_AGG)

Labels based on averages from the last 5 minutes

ifelse(
    isNull({latest_battery_level}), NULL,
    chargingstatus = 1, "Charging",
    {latest_battery_level} < 15, "Battery Low",
    "Battery OK"
)

I have tried moving them to values, but then i just get the count.

For context, the table I’m making is a warning overview. So each row is a user, with status labels based on the most recent data. So one row will usually be something like:

David_Wong · January 21, 2025, 1:21am

If you don’t want to aggregate anything at the visual level, you have to move all your fields to the “Value” field well. If you have at least one field in “Group by”, QuickSight will aggregate your data. That’s why it’s doing a count of your string values.

mikkel1 · January 21, 2025, 1:28am

Moving all the group by’s to value, still gives me count for them all. So I might not have done everything necessary in Data Prep.

That being said, I guess I still want the data to be grouped by user, so that I only get one row per user that I am monitoring.

David_Wong · January 21, 2025, 5:28am

Are you able to share a screenshot of your field wells?

Can you use SQL to group your data by user in your dataset so that you don’t need to aggregate again in your analysis?

mikkel1 · January 21, 2025, 8:39pm

Here is a screenshot

I might try that. For now I have removed the sensor data, which 10x’ed the size. It runs a lot smoother now.

Thanks for your help!

mikkel1 · January 22, 2025, 1:04am

I’ve been trying to just work with acceleration data now, which is the sensor that has a sampling rate of 25 HZ and i have 2 days worth of continuous data for 10 people, which gives me a 2.5 GB dataset.

I have added all the calculated fields within the dataset, but it still takes a long time whenever I change something in the table.

Right now I am down to 2 group by and one value.
I have 6 calculated fields, 3 for scaling data which is just a multiplication, two counts of a values over a threshold and one which labels data based on how many % is over a threshold.

What are the best practices here?

David_Wong · January 30, 2025, 6:29am

Is your table slow to load even with just 3 fields in it?

If it’s still slow despite having all your calculated fields in your dataset, my only suggestion is to move all your fields to the “Value” field well of your table. I’m not sure why that’s not working for you. You need to move every single field to the “Value” field well. If you have at least 1 field in the “Group by” field well, QuickSight will aggregate your data at the visual level.

If you want every row in your table to represent a user, you can use SQL to group your aggregations by user in your dataset.

Brett · February 10, 2025, 9:26pm

Hi @mikkel1,
It’s been awhile since we last heard from you, did you have any additional questions regarding your initial post?

If we do not hear back within the next 3 business days, I’ll close out this topic.

Thank you!

Brett · February 13, 2025, 10:27pm

Hi @mikkel1,
Since we haven’t heard back, I’ll go ahead and close out this topic. However, if you have any additional questions, feel free to create a new post in the community and link this discussion for relevant information if needed.

Thank you!