I am having issues visualizing a large dataset with a bunch of calculated fields, and i am wondering what the best practice is.
My dataset consists of:
2.5 GB, 50 mio. rows
14 columns, 2 strings, the rest floats
Roughly 20 calculated fields
The calculated fields are a mix of extracting parts of strings, simple calculations like diffs, ifelse statements and some time calculations like taking the last 5 minutes. The primary function is showing averages within the last five minutes and labeling based on those averages.
Nothing too complex.
When I try to visualize the averages and the labels, it take 3-5 minutes load the data. This is the time it takes to refresh it seems. It doesn’t matter how many of the fields I put in the table i display. I am using SPICE.
What are the best practices for this sort of scenario? It makes sense if I can’t do too many calculations, as Quicksight is mainly a visualization tool. But should I be moving the calculations to Glue or somewhere different, or am I just doing it wrong in Quicksight?
I have tried to do incremental refresh to speed up the process, but as some of the data is from further back, I need full refreshing.
I have set all the calculations in the Data Prep area to try to increase speed, but without luck.
For the dataset in question, I only have calculated fields on dataset level.
I do have other datasets though, where I have calculated fields on analysis level. But they don’t take any time to visualize.
Normally if your dataset is in SPICE and all your calculated fields are in your dataset, I wouldn’t expect the calculated fields to impact the performance of your visuals that much. If your issue is with a table visual, it could be due to the visual-level aggregations. For example, if you have a large number of fields in the “Value” field well, those aggregations are calculated at runtime. How many fields do you have in your “Value” field well?
Just making sure I am looking at the right thing, the “value” field well, is in the “Visuals” tab right? Together with group by.
And when you say the calculated fields are in dataset, do you mean that i added them in the “Data Prep” section?
I had a play around with the table and how long it took to visualize as I added fields.
After adding the first three fields, 2 group by, 1 value, it stayed at around 20 seconds to load.
It rose with around 10 second extra pr. addition, as i added a 3rd group by and a 2nd value. So with 3 group by’s and 2 values, it took 40 seconds to load.
Adding a 3rd value took the load time to just under a minute and when i had 4 group by’s and 3 values, it took 1 minute 40 sec.
I went to “Data Prep” and did a full refresh. This took 10 min, 38 sec. I then went back to my analysis, where it took 1 min 24 sec, to visualize it again, with 4 group by’s and 3 values.
Calculated fields that are in the dataset are the ones that you added during data prep. Putting your calculated fields in your dataset and using SPICE help your visuals load faster. Since you’re already doing that, we need to see what we can do in the analysis/dashboard to further improve the performance.
When you put a field in the “Value” field well of your visual, you have to choose an aggregation, e.g., sum. That aggregation has to be calculated when the visual is loaded. As you have observed, the more fields you add to the “Value” and “Group by” field wells, the longer it takes the visual to load. How many fields did you have in your field wells when your visual couldn’t load? Is there any way to reduce that number by removing what you don’t absolutely need?
Sometimes the aggregations in the table visual are redundant if you’re already doing the same aggregations in your dataset. For example, if my dataset is already calculating the sum of sales grouped by Product, Industry, Region and Subregion, I can put all the fields in the “Value” field well and nothing in “Group by”. Can you try a similar approach? If you can’t remove any fields from your table visual, can you perform all the aggregations in your dataset?
So I had 4 group by’s and 7 values, when it couldn’t load. But that as before I started adding calculated fields to the dataset, so it might be possible to visualize now, but it would take a long time.
The data I have in group by are: Name, which is extracted from a full name, which i extract from an id.
I have tried moving them to values, but then i just get the count.
For context, the table I’m making is a warning overview. So each row is a user, with status labels based on the most recent data. So one row will usually be something like:
name | latest recorded date device 1 | latest recorded date device 2 | status device 1 | status device 2 | flag device 1 | flag device 2
If you don’t want to aggregate anything at the visual level, you have to move all your fields to the “Value” field well. If you have at least one field in “Group by”, QuickSight will aggregate your data. That’s why it’s doing a count of your string values.
I’ve been trying to just work with acceleration data now, which is the sensor that has a sampling rate of 25 HZ and i have 2 days worth of continuous data for 10 people, which gives me a 2.5 GB dataset.
I have added all the calculated fields within the dataset, but it still takes a long time whenever I change something in the table.
Right now I am down to 2 group by and one value.
I have 6 calculated fields, 3 for scaling data which is just a multiplication, two counts of a values over a threshold and one which labels data based on how many % is over a threshold.
Is your table slow to load even with just 3 fields in it?
If it’s still slow despite having all your calculated fields in your dataset, my only suggestion is to move all your fields to the “Value” field well of your table. I’m not sure why that’s not working for you. You need to move every single field to the “Value” field well. If you have at least 1 field in the “Group by” field well, QuickSight will aggregate your data at the visual level.
If you want every row in your table to represent a user, you can use SQL to group your aggregations by user in your dataset.
Hi @mikkel1,
Since we haven’t heard back, I’ll go ahead and close out this topic. However, if you have any additional questions, feel free to create a new post in the community and link this discussion for relevant information if needed.