Level-Aware Calculations: Tips and Best Practices

Level-Aware Calculations (LAC) are powerful functions that allow us to perform calculations at a granularity level which is different from the granularity of our datasets and/or visuals.

The sample dataset referenced by this article is a dataset of product orders placed by customers. Each order can contain multiple products and each product in an order has a unique Line Item ID. Each row of our dataset represents one line item but we want to perform calculations at the order or customer level.

Aggregate functions

Aggregate functions group multiple rows in our dataset into one single row.
image

The grouping is dynamic based on the dimensions present in the visual. For example, if the dimension in our visual is the Product field, sum(Sales) calculates the sum grouped by product.
image

If the dimensions in our visual are Customer and Product, sum(Sales) calculates the sum grouped by customer and product.

We can aggregate our field by adding it to the Value field well and selecting an aggregation.

We can also use an aggregate function in a calculated field before adding it to the Value field well. When the aggregation is defined in a calculated field, it’s referred to as a custom aggregation.

Total Sales:
sum({Sales})

Refer to Aggregate functions for a list of supported aggregate functions.

Window functions

A window function performs a calculation over a window (set of rows) related to the current row. Window functions can be easily recognized by the fact that their syntax can include one or more partition fields.

Unlike an aggregate function, a window function does not group multiple rows into a single row but instead is conceptually similar to adding a calculated column to our underlying data at the row level. The same calculated value is repeated in every row of the window (partition).

image

Level-aware calculation - aggregate functions (LAC-A)

A LAC-A function is a type of aggregate function where the grouping is fixed and defined in the function. For example, in the following calculated field the sum of sales is grouped by Order ID.

Order Amount:
sum({Sales}, [{Order ID}])

When adding a calculated field containing a LAC-A function to our Value field well, we have to select another aggregation to perform on top of it.

Alternatively, we can nest the LAC-A function in an aggregate function in our calculated field before adding it to the Value field well as a custom aggregation. The inner LAC-A function is grouped by the dimensions specified in our calculated field and the outer aggregate function is grouped by the dimensions present in our visual.

Average Order Amount:
avg(sum({Sales}, [{Order ID}]))

Level-aware calculation - window functions (LAC-W)

A LAC-W function is a type of window function that we can perform at the pre-filter or pre-aggregate level. Refer to Order of evaluation in Amazon QuickSight for more information.

In the following calculated field the sum of Sales is calculated over all the rows having the same Order ID.

Order Amount:
sumOver({Sales}, [{Order ID}], PRE_AGG)

Order ID AMER-2018-100090 has two rows because it contains two line items and the same sum is displayed in both rows.

Use case #1: Calculations at different levels of granularity (LAC-A recommended)

We saw earlier that each row of our dataset represents a line item in an order. If we want to calculate the average order amount in each industry, we first need to calculate the amount for each order before calculating the average across all orders in each industry.

Average Order Amount:
avg(sum({Sales}, [{Order ID}]))

By grouping by Order ID, the inner LAC-A function calculates the amount for each order. The outer avg function then calculates the average, grouped by the Industry dimension at the visual level.
image

LAC-A is better for these sorts of multi-aggregation rollups because it will only count the aggregated value once for each partition. For instance, if we have 20 orders in an industry, QuickSight will average 20 values, regardless of the number of products in each order.

If we use LAC-W instead, the same value is calculated in every row of our partition. If we want to roll up the data at the next level using a min or max, it’s not an issue. However, if we want to roll up using avg or sum, we need to deal with the duplicates.

Use case #2: Deduplicating data (LAC-A recommended)

The other common use case for LAC-A is for deduplicating data. For example, if a customer in our dataset belongs to two industries, every line item will be present twice in the dataset and using sum(Sales) in our visual will double-count the sales for that customer.

To deduplicate the data we can use a LAC-A function with min, max or avg and wrap it with our sum function.

Order Amount:
sum(min({Sales}, [{Line Item ID}]))

By grouping our min of sales by Line Item ID, we’re effectively removing the duplicates before performing our sum to get the order amount.

Use case #3: Using calculated values as dimensions (LAC-W only)

Sometimes we want to use our calculated values as dimensions for filtering or grouping. In the example below we have a table visual showing the number of orders by customer and a KPI visual showing the total number of customers.

If we want to find the customers who have fewer than 20 orders, we can use LAC-W to calculate the number of orders by customer and then use it as filter.

Orders by Customer:
distinctCountOver({Order ID}, [{Customer ID}], PRE_AGG)

It gives the correct result in both visuals.

Note that if we use a custom aggregation instead, the table visual shows the correct result but the KPI visual does not show any data since the aggregation is being determined by the group by fields in the visual but the KPI is not grouping data by customer.

Number of Orders:
distinct_count({Order ID})

If we want to use our calculated values for grouping, e.g. in a bar chart, we can also use LAC-W. In the example below, the number of orders by customer calculated with LAC-W is on our y-axis and the number of customers is on our x-axis.

Using LAC-A as dimension is currently not supported. Custom aggregations are also not allowed as dimension.

Use case #4: Percentage of total calculations (LAC-A or LAC-W)

Since LAC-A and LAC-W functions allow us to specify the granularity at which our calculation is performed, they can be used for percentage of total calculations where the numerator is calculated at one granularity level and the denominator is calculated at another level. For example, if we want to show each customer’s sales as a percentage of total sales, the numerator is calculated at the customer level and the denominator is calculated across the entire dataset.

Percentage of Total Sales (LAC-A):
sum({Sales}) / sum({Sales}, [])

Percentage of Total Sales (LAC-W) (PRE_AGG):
sum({Sales}) / min(sumOver({Sales}, [], PRE_AGG))

Percentage of Total Sales (LAC-W) (PRE_FILTER):
sum({Sales}) / min(sumOver({Sales}, [], PRE_FILTER))

When LAC-W is used with PRE_FILTER, the calculation is performed before filters are applied. When we apply a filter, the total sales and the percentage for each customer remain unchanged. With LAC-W (PRE_AGG) and LAC-A, the percentage is calculated based on the total sales of the remaining customers after filters are applied and the percentages always add up to 100%.

PRE_FILTER is not supported with LAC-A, so if we need to keep the total fixed when calculating the percentage, our only option is to use LAC-W.

With no filters applied:

After applying a filter:

Nesting calculations

LAC-A and LAC-W functions can be nested in the following ways:
• Aggregation(LAC-A( ))
avg(sum({Sales}, [{Order ID}]))

• LAC-A(LAC-W( ))
sum(sumOver({Sales}, [{Order ID}], PRE_AGG), [{Customer ID}])

• LAC-W(LAC-W( ))
sumOver({Sales} / countOver({Line Item ID}, [{Line Item ID}], PRE_AGG), [{Customer ID}], PRE_AGG)

• Aggregation(LAC-A(LAC-W( )))
avg(min(sumOver({Sales}, [{Order ID}], PRE_AGG), [{Order ID}]))

The following calculations are not valid:

• Aggregation(Aggregation( ))
avg(sum({Sales}))

• LAC-A(Aggregation( ))
avg(sum({Sales}), [{Customer ID}])

• LAC-A(LAC-A( ))
sum(min({Sales}, [{Line Item ID}]), [{Order ID}])

• LAC-W(LAC-A( ))
sumOver(min({Sales}, [{Line Item ID}]), [{Order ID}], PRE_AGG)

Level-aware calculations with conditions

We can use the ifelse function with LAC functions to add conditions to our calculations.

Retail Sales (LAC-A):
sum(ifelse(Industry = 'Retail', Sales, null), [{Order ID}])

Retail Sales (LAC-W):
sumOver(ifelse(Industry = 'Retail', Sales, null), [{Order ID}], PRE_AGG)

Handling mismatched aggregation errors

This error happens when our calculated field contains both an aggregated value and a single value from one row. As we’ve seen, a LAC-W function adds a value to every row. The calculation below gives a mismatched aggregation error because we’re dividing an aggregated value by a single value from a row.

To fix the error, we can change the denominator to an aggregated value by wrapping it in an aggregate function like min, max or avg.
sum({Sales}) / min(sumOver({Sales}, [], PRE_FILTER))

The other way is to change the numerator to a single un-aggregated value, e.g. by using sumOver instead of sum.
sumOver({Sales}, [{Customer ID}]), PRE_AGG) / sumOver({Sales}, [], PRE_FILTER)

Another example of a mismatched aggregation error is when we use LAC-A inside an ifelse function. This happens because the condition is evaluated at the row level but when it is evaluated as true, the resulting calculation is performed by grouping multiple rows.

To fix the error, we simply have to move the ifelse function inside the LAC-A function.
sum(ifelse(Industry = 'Retail', Sales, null), [{Order ID}])

Conclusion

In this article, we’ve seen that both LAC-A and LAC-W functions can be used to specify the granularity at which our calculation is performed but in different ways.

LAC-A functions group the rows of our dataset by the dimensions defined in the function and calculate one single value for each group. LAC-W functions conceptually add a calculated column to our underlying table and perform the calculation over the set of rows (window) defined in the function. The same calculated value is repeated in every row of the window. Depending on the use case, LAC-A can be more suitable than LAC-W, or vice versa.


Author Bio

David Wong is Senior Business Analyst for Surge9, an adaptive microlearning platform that delivers a customized learning journey to users to help them close the competency gaps in their training. He led the initiative to adopt QuickSight as their BI platform to provide customers with actionable business insights.

Outside of analytics, David’s areas of focus for Surge9 include data migration, system integration and product design.

David has also been selected as an Amazon QuickSight Expert and an AWS Community Builder.

9 Likes

Thank you @David_Wong for this article, this will surely help wider audience of the community.

Regards - Sanjeeb

1 Like

Awesome article @David_Wong!

Really helpful explanation @David_Wong , thank you.

Can you post the sample dataset you used? I’d like to try recreating everything you did, to get hands on practice.

Hi @mcav,

I’m glad you found the article helpful.

Here are links for the sample datasets. The only difference between the 2 files is that the second one has some duplicates that I added to show how to deduplicate data.