How to evaluate the impact of changing a calculated field in a dashboard?

Yifei · May 31, 2023, 5:01pm

Overview: When a dashboard-level calculated field is used by multiple charts in a dashboard, I find it hard to evaluate the impact of making a change to a calculated field.

Context: I have a calculated field that computes the sum(A/B, [$visualDimensions]) - essentially calculating the ratio of A:B with respect to the pivot table aggregation dimensions. Now I want to change it to sum(A, [$visualDimensions])/sum(B, [$visualDimensions]) because the latter is more efficient for our usecase as the dataset is very verticle (close to 1 billions of rows with only 4 columns). We want to compute the sum before computing the division, instead of compute division for every A/B then add them together, since division is slower than addition.

Problem: What I find challenging is, how do I verify this change would not result in bad data in other charts? We have hundreds of them and it is very tedious to check them one by one, or to know what exactly each chart is doing (some charts are doing LAC aggregations, some others are doing LAC in pre-aggregation level). Since this dashboard is worked on by multiple stakeholders, it would be a time consuming task to do manual checks.

I am curious about what everyone thinks, did you encounter similar situations? AWS support please feel free to chime in! Love to hear your thoughts. I am especially interested in discovering a scalable solution to this problem.

Some ideas:

[SQL Result Equivalence]. Use boto3 to grab the chart metadata. Convert metadata to SQL queries. Make the changes, export metadata to get another set of SQL queries. Attach the SQL queries to the underlying database and verify the data producted by these two sets of SQLs is equalivant.
[PDF value Equivalence] Snapshot the dataset, and make a copy of the dashboard. Export the copy to a PDF. Make the changes, then export it to a PDF. Use python to compare every single value in these PDFs.
[Manual Checks] Ship the verification work oversea for cheaper and faster equivalence validation.

Sanjeeb2022 · May 31, 2023, 6:00pm

Hi @Yifei - Welcome to AWS QuickSight community and thanks for posting this interesting question. I believe there is NO straight forward approach available to understand or get the details in changing the logic of calculated field and find its impact across all dashboards. It has to be a manual process. The steps you have provided is interesting. Couple of things

Can we get all details of dashboard or analysis where the calculated field is used, I believe there is no simple process available as of now, like you mentioned we have to use boto3 API and describe the dashboard and see whether the calculated field is used or not, this will be a time taking process for sure. We need to see whether any meta data match can be done in the down the line which can provide this kind of info quickly.

Tagging some experts @David_Wong @Koushik_Muthanna @Karthik_Tharmarajan for their advise.

Regards - Sanjeeb

Yifei · June 1, 2023, 1:43pm

Hi Sanjeeb,

Thank you for the input and looking forward to more jumping in on this!

A1. I am afraid I cant provide ALL the details but we can narrow down the scope of the discussion with a specific scenario.

Specific Scenario
Calculated Field (CF): sumOver(A, [dim_1, dim_2, dim_3])
Two charts:
Chart 1: dim_1, dim_2 are in the pivot table
Chart 2: dim_1, dim_2, and dim3 are in the pivot table

Let’s say CF was initially designed for table 2, with the intention of investigating the sumOver with respect to dim_1 and dim_2. In this case, the calculation yields the same results in both charts.

Scenario Issue
Now the user added dim_4 to chart 1, and saw the dim_3 in the formular for CF. The user needs to know which charts are using CF, how would the user check that if there are hundreds of charts in a Dashboard? The proposed solutions can solve the problem but not in an efficient, scalable way.

Expanding Issue Scope
Let’s consider an even worse case, the user did not understand the purpose of dim_3 in CF, as the chart (chart 1) the user is working with only contains dim_1, dim_2. The user replaced dim_3 with dim_4 in CF. The user notices this change did generate correct results in chart 1. The user mark their task as finished, without noticing chart 2 (which could be in a different tab, or anywhere that’s not immediately visible to the user) has been changed and is now in a incorrect state.

Moving Forward
That was a specific case to help facilitate productive discussion. The bigger picture here is (preferably) automated Quality Control. We are seeing QA tools popping up as Data Infrastructure evolves. For example, GreatExpectations, etc. With QA issues that analysts face on a daily basis across different industries, it seems similar QA methodologies will be developed to solve these issue. It would be great if these issue can be addressed pre-emptively.

Sanjeeb2022 · June 1, 2023, 2:19pm

Thanks @Yifei . Basically you are looking for a lineage details and have a quick report or details report where the calculated field used. Unfortunately there is NO easy solution available at present by QuickSight and we can request for a feature request in down the line. However a custom script can be written which essentially scan all dashboard, analysis and dataset and find whether field is used in those object or not. This can be done using boto3 API’s. I never tried personally but it require good amount of analysis for sure.

Hi @Koushik_Muthanna @Karthik_Tharmarajan - Need your advise on this. If user wants to get a report on details which fields and calculated fields are used in QuickSight Dashboards, how efficiently user can get it. Is this something we can add as a feature request which will give this details from UI easy and simple way.

Regards - Sanjeeb

Yifei · June 1, 2023, 3:09pm

Totally! Having lineage/dependency information would certainly help. It would assist with the manual checks.

Is there any idea to QA the entire the dashboard in a scalable way? For example in software engineering, test cases are used to test against changes to the given code repository.

This does lure me to another question: Is my approach to this issue correct? I imaging a lot of other users would want some dependency information of the fields in a given dashboard. Are they doing something differently that dont require this piece of information?

EnriqueS · June 7, 2023, 1:01pm

Hello @Yifei , thanks for the feedback!!

At AWS, our roadmap is primarily driven by our customers. Your feedback helps us build a better service. I have tagged this as a feature request and our service team will consider to improve the tool in future releases.

Keep posted to the Whats new blog to be updated about this.

Thanks and happy dashboarding!