The Ins and Outs of Where to Create Your Calculated Fields

When building a dashboard in any BI tool, you start with a dataset that is a combination of dimensions (things you want to group or slice the data by) and facts (things you can count or sum up). As you develop your dashboard, these raw materials get aggregated in various ways to fill the “Field Wells” in Amazon QuickSight. As a new BI developer, you might rely almost exclusively on the raw data and the drop-down menu to select your preferred aggregation, such as sum or count distinct. However, you can develop more advanced dashboards by learning how to create calculated fields.

In this article, we will discuss the use cases and tradeoffs for creating calculated fields in the dataset versus the analysis.

In order to make the most of this article, you should have experience creating datasets within QuickSight as well as creating calculated fields in an analysis.

Calculated fields in an Analysis
As you begin developing more advanced dashboards, you will quickly run into the need to create calculated fields for both dimensions and measures.

Some common situations that require calculated dimensions are:

  • Case logic: ifelse((month = 5 OR month < 3) AND year = 2000, 'yes', 'no')
  • Mapping data: ifelse(country = 'United States', 'US', country = 'China', 'CN', country = 'India', 'IN', 'Others')
  • Blending data fields: concat(salutation, ' ', firstName, ' ', lastName)
  • Null handling: coalesce(billingAddress, streetAddress, 'No address listed')
  • Better naming: Creating a calculated field is sometimes the best way to make your legend more interpretable.

Common situations that require calculated measures include:

  • Filtered metrics: sumIf(ProdRev, CalendarDay >= ${BasePeriodStartDate})
  • Combined aggregates: sum(ProdRev) / sum(ProdCost)
  • Date math: dateDiff(startDate, endDate, 'WK')

The key benefit of creating a calculated field in the analysis is that it is much easier to iterate and debug. On the other hand, as your calculations get more complex, performance can suffer, since analysis-level calculations are all computed on the fly. This is usually most noticeable for dimension calculations with complex ifelse logic or string manipulation. Some kinds of calculations cannot benefit, from a performance perspective, from moving into the dataset: fields that use parameters or aggregate functions will not be pre-calculated. For instance, in the example formulas above, all the dimensions and the third measure would be materialized in SPICE, but the first two measures would not, since they contain an aggregation and/or a parameter.

Calculated fields in a Dataset
As you develop more advanced dashboards, there are many cases where calculated fields should move upstream into the dataset instead of living in the analysis. One key reason is that, if you use SPICE, any row-level calculation (a formula that does not use aggregation such as sum, sumOver, or sumIf, and does not use parameters) is materialized, i.e. precomputed, in the SPICE dataset, improving dashboard query performance. The second key reason is that calculations in a dataset can be used across multiple dashboards, which improves reusability. In addition, certain kinds of calculated fields are much easier to create in the dataset from the start. Some more specific use cases:

  • If a single dataset is used by multiple analyses, creating the calculated field in the dataset lets you define it once and use it everywhere.
  • Row-level calculations that are slow to compute are good candidates, since they will be computed on refresh instead of dynamically at dashboard query time.
  • Row-level calculations that you use as filters are particularly good candidates.
  • If you would like the ability to roll back a change, version control in the dataset allows you to do so.
  • Date-based flags (e.g., Month-To-Date) are often easier to create via SQL as you load the data, and can then be used in filters.
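
As a sketch of that last point, a Month-To-Date flag can be added in the dataset's custom SQL. This is a hypothetical example: the orders table, the order_date column, and the date functions are assumptions, and the exact syntax varies by source database.

```sql
-- Hypothetical orders table with an order_date column.
-- The flag is recomputed on each SPICE refresh, so it stays current
-- without any analysis-level date math.
SELECT
    o.*,
    CASE
        WHEN o.order_date >= DATE_TRUNC('month', CURRENT_DATE)
        THEN 'MTD'
        ELSE 'Prior'
    END AS mtd_flag
FROM orders o
```

Because the flag is a row-level text column, it is materialized in SPICE and can be used directly in dashboard filters and controls.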

Long-term, the key tradeoff of moving calculations into the dataset is that the size of the dataset grows with every additional field, and refresh time may increase. However, unless your data is very large or you are managing SPICE costs very carefully, we generally find that moving calculations into the dataset is worth it. In the shorter term, it is often slower to iterate on calculations in the dataset, since the dataset must refresh each time you want to check the impact of a change. For this reason, it often makes sense to test calculations in an analysis and move them to the dataset once they are complete.

Even more advanced functionality through multiple datasets
In QuickSight, each dataset is a single table. However, if you include multiple datasets in the same analysis, QuickSight links fields with the same name across multiple datasets. For example, you can create a filter or control which applies to all datasets on a dashboard. Fundamentally, this approach is more complex, but solves some storage and performance issues when executed properly.

  • You can create an aggregated version of the dataset that is still drillable but has fewer records, allowing you to save on storage and get better performance.
  • You can use the aggregated data for high-level visuals and reserve the detailed data for visuals that truly need it, speeding up overall performance.
  • You can avoid duplicating data between datasets by thinking about which elements need to be viewed together, such as the number of orders on one visual and the number of order lines on a different visual.
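
To illustrate the first two points, a pre-aggregated companion dataset can be defined with custom SQL. The table and column names below (order_lines, order_id, line_revenue) are hypothetical; the idea is simply to collapse line-level detail to one row per customer per day:

```sql
-- Hypothetical order_lines table: one row per line item.
-- Aggregating to customer/day shrinks the dataset dramatically while
-- still supporting drill-down by customer and day on high-level visuals.
SELECT
    customer_id,
    CAST(order_date AS DATE)  AS order_day,
    COUNT(DISTINCT order_id)  AS num_orders,
    COUNT(*)                  AS num_order_lines,
    SUM(line_revenue)         AS revenue
FROM order_lines
GROUP BY customer_id, CAST(order_date AS DATE)
```

High-level visuals then point at this smaller dataset, while a detail dataset backs only the visuals that need line-level data; shared field names (e.g., customer_id) let a single control filter both.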

Concluding thoughts
As you identify the need to create calculated fields, consider the best place for those calculations to be done. If data storage is not a concern but performance is, consider moving those calculations into the dataset prep view. If you want to manage SPICE usage or are still developing a calculation, start in the analysis view.

Author and company bio

North Labs is a veteran-owned consultancy helping organizations build the future of their capabilities with long-term partnerships. We develop data analytic strategies to enhance operations, customer experiences, and overall performance. We were 1 of 20 Global Launch Partners selected to support Amazon QuickSight when it was released in 2016.

Zohar Strinka

Zohar Strinka is a Principal Solutions Architect who works with clients to navigate the often intertwined challenges of data and process problems. She has a PhD in Optimization which she often uses in her personal life to assess the right balance of life and work.

North Labs is an AWS Advanced Consulting Partner.
