How to Compare Percentages of toal by Different Rating Columns in Amazon QuickSight

defulamingge · October 7, 2024, 6:06pm

I want to create a bar chart where:

The X-axis represents ratings.
For each rating, there are two bars side by side: one for Category A and one for Category B.
The Y-axis shows the Value percentage for each category.
The sum of the value percentages for Category A and Category B across all ratings should be 100%.

the category A is a column with rating of one standard
the category B is another column with rating of another standard
for example for

row A: entity A, category A rating:5, category B rating:7, entity value: 50000
row B: entity b, category A rating:6, category B rating:6, entity value: 200000

I want the bar chart that can display value percentage of total of each category rating, which means for each rating in X-axis, there are two bars, one bar is Value percentage of total of category A, the another is value of percentage of total of Category B. I tried all the way I could , like percentTotal({value},[category A,categoryB]), percentTotal({value},[category A), they all have erro like “the result is table calculation attribute reference are missing in field wells”
Please help

defulamingge · October 8, 2024, 1:46pm

Anyone could help. I appreciate it

Brett · October 8, 2024, 6:37pm

Hi @defulamingge and welcome to the QuickSight community!
This is a fairly complex scenario that may depend on the data you have available within your dataset.

One quick question regarding your scenario; you mentioned that the sum of value percentages for Cat. A and Cat. B should be 100%…so ideally, your two different rating categories will combine to create 100% and you want to see what percentage each of the categories equal of that total per entity?

My initial thought is that you may be able to achieve by the following:
Set up a denseRank by a common field like Entity for example.

Then you can use that denseRank in the following to create 2 arbitrary groups based around the rank.
ifelse(maxOver({rank}, [], PRE_AGG)*(1/2) >= {rank}, "Category A", "Category B")

Last, you’ll want to utilize that in another calculated field that will provide you with the y-axis:
ifelse({Above Calc} = 'Category A', sumOver({value}, [{Category A}, {entity}], PRE_AGG)/sumOver({value}, [{entity}], PRE_AGG), {Above Calc} = 'Category B', sumOver({value}, [{Category B}, {entity}], PRE_AGG)/sumOver({value}, [{entity}], PRE_AGG), NULL)

Note that some of the fields utilized in this last calculated field will be dependent on how you name the first 2 and dependent on how fields are listed in your database.

Again, this may be dependent on your dataset and with the level of complexity, could be more beneficial if you were to upload a copy of your analysis with sample data to the QuickSight Arena view. That way we could potentially try out a few different scenarios and see how they respond for your case.

defulamingge · October 8, 2024, 8:12pm

Hi Brett,

Thanks for your detailed reply!

Due to data confidentiality, I’m unable to upload a sample directly, but I can provide an analogy to explain the scenario I’m working with. Here’s a simplified example:

What I want to visualize (need divided by total for each entry):

Rating	Rating A Value	Rating B Value
4	200,000	200,000
5	50,000	0
6	400,000	400,000
9	30,000	60,000

For each rating, there are two bars:

One bar representing the percentage of total for Category A.
One bar representing the percentage of total for Category B.
here is what I want, since there are only 4 rows, rating 4 and 6 are same, but with 100000 rows, they are bascally different.

The tricky part is that Rating A and Rating B are different, and whichever rating I place on the axis in QuickSight, I get an error about missing fields in the Field Well.

Here’s what I’ve done so far:

I duplicated the dataset: one with the Rating A column and one with the Rating B column.
Then, I created a new column to distinguish between Rating Type A or B.
I uploaded this new dataset, using a bar chart with “Rating Type” as the group/color, “Rating” as the X-axis, and percentage of total as the value.

This works, but duplicating the dataset causes the total values to double. If there’s a way to achieve this without duplicating the dataset, I’d love to learn it!

Thanks again for your help!

Brett · October 8, 2024, 8:45pm

Hi @defulamingge,
What are the total percentages going to be based off of?

Also, I’m a bit confused by your two table visuals and how you’re arriving at your Rating A and B values (in second table) based off the first table. So I assume ‘Rating’ is ‘Entity’; if Entity D’s Rating A is 9 and Rating B is 9, how do they have different values in the second table?

defulamingge · October 8, 2024, 9:09pm

Hi @Brett ,

Thank you for the question. The percentages are based on the total value of the “Value” column. Each entity represents something like a person who has been given different ratings by two organizations (A and B). Rating A and Rating B could either be the same or different, but both share the same range (let’s say 4-12).

For the visualization, I’m using “Rating” as the X-axis. One bar represents the total percentage of value for all rows where Rating A matches the value on the X-axis, and the other bar represents the total percentage of value for all rows where Rating B matches the value on the X-axis.

For example, let’s say Entity A has a value of 50,000, with a Rating A of 5 and a Rating B of 6. In the visualization, the value of Entity A would contribute to Rating 5 for Rating A and also to Rating 6 for Rating B. The numerator in both cases would be the entity’s value (50,000), and the denominator is the sum of all entity values across all ratings. So, the sum of percentages for Rating A would total 100%, and the same goes for Rating B.

I hope this clarifies things!

Brett · October 8, 2024, 9:48pm

Hi @defulamingge, Thank you for the additional explanation, this makes a bit more sense now.

What if you tried something like the following:

You’ll want to start by creating a denseRank calculation to provide a rank value to Entity (or some kind of ID…whatever has the most values)

denseRank([Entity DESC], [], PRE_AGG)

Then, create A and B like this:

ifelse({Rank} <= maxOver({Rank}, [], PRE_AGG)*(1/2), "A", "B")

So you’ll use the Rating field for your X-axis, the last calc. field (to create A & B) as your ‘Color By’ field, and then for your Y-axis, you could use:

ifelse({Rank Calc} = "A", sumOver({Rating A Value}, [{Rating}], PRE_AGG)/sumOver({Rating A Value}, [], PRE_AGG), {Rank Calc} = "B", sumOver({Rating B Value}, [{Rating}], PRE_AGG)/sumOver({Rating A Value}, [], PRE_AGG), NULL)

defulamingge · October 9, 2024, 3:38am

Hi @Brett ,

Thank you again for your thoughtful reply and suggestion. I appreciate the effort in outlining the approach with denseRank and grouping into “A” and “B” categories, but after reviewing it further, I believe this method might not be suitable for my specific scenario. Here’s why:

No Natural Ranking of Entities: In my case, each entity (think of it like a person or object) receives two scores from different sources, let’s call them Score A and Score B. These scores might be the same or different, but there’s no inherent ranking of the entities themselves, so applying a ranking like denseRank doesn’t really help in this context.
Already Working with Two Independent Groups: The main goal is to visualize and compare Score A and Score B independently, as both relate to the same entities but come from two different sources. The comparison needs to show how each score is distributed for the same range of values. Splitting the data into “A” and “B” groups through ranking doesn’t align with the nature of the task, as it risks losing the full comparison between Score A and Score B.
Quarterly Data Adds Complexity: Additionally, the data is collected on a quarterly basis, meaning each entity appears multiple times across different quarters. So, applying a rank might complicate things further since the entities are repeated and should be evaluated independently for each quarter. What’s more important is comparing the percentage contribution of Score A and Score B across the same range rather than splitting them into ranked groups.

Given these points, a simpler approach—directly calculating and visualizing the percentage of values for both Score A and Score B within the same range—seems more suitable for my scenario. This allows for a clear side-by-side comparison without the additional complexity of ranking.

Thanks again for your help, and I’m happy to discuss further!

Brett · October 10, 2024, 3:19pm

Hi @defulamingge,
I understand. With the way the data is currently set-up though, something like the above mentioned, would be the way to accomplish this.
Now, if you were able to alter the table further and have columns like this:

Entity Ranking Ranking Type Value
A 4 A 300,000
A 6 B 300,000
B 8 A 600,000
B 2 B 200,000

With a dataset like this, making the visual will be a lot simpler.
X-Axis could be the Ranking,
Y-Axis can be the percent value:
sumOver({Value}, [{Entity}, {Ranking}, {Ranking Type}], PRE_AGG)/sumOver({Value}, [{Entity}, {Ranking Type}], PRE_AGG)
And the Group/Color by will just be the Ranking Type field.

If you want to simplify the process of visual creation, that data will need to be adjusted for this visual layout!

Brett · October 21, 2024, 3:39pm

Hi @defulamingge,
It’s been awhile since we last heard from you so following up to see if you had any additional questions?

If we do not hear back within the next 3 business days, I’ll go ahead and close out this topic.

Thank you!

Brett · October 24, 2024, 5:21pm

Hi @defulamingge,
Since we haven’t heard back, I’ll close out this topic. However, if you have any additional questions, feel free to create a new post in the community and link this discussion for relevant information if needed.

Thank you!