How to calculate average value based on distinct values of another field

Bharat_Kumar_Jangir · February 23, 2022, 8:00pm

I have data that looks like this:

Employee	Employment	Age
1	a	35
1	b	35
2	a	40
3	a	35
4	a	51
4	b	51

I want to calculate the average age of the individual employees. When I take the average value of the Age column then Employee 1 and 4’s ages are twice in data, giving me an average age of 41.2 .

I need a calculation that first get the Age value for distinct Employee then averages those Age values. This should give me an average age of 40.3.

Any idea about a formula that could do this?

darcoli · February 23, 2022, 10:36pm

You can do the average by considering just the first row for each employee, as follows:

ifelse(
  rank([{Employment ASC}, [{Employee}], PRE_AGG) = 1,
  {Age},
  NULL
)

Bharat_Kumar_Jangir · February 24, 2022, 6:58am

@darcoli. Thanks, It worked for me for this dataset. But It doesn’t work when the Employment column has duplicate value for an Employee.

When data is like below. It gives the incorrect average. So, in this scenario, the formula you have suggested doesn’t work.

Employee	Employment	Age
1	a	35
1	b	35
2	a	40
3	a	35
4	a	51
4	b	51
4	a	51
4	b	51

darcoli · February 24, 2022, 9:37pm

Ok so you basically could have duplicate rows. This would cause an issue with my previous approach because the duplicated rows could all have rank=1 and then the average would be incorrect.

Another approach could be to divide age by the number of rows for each employee. If we sum all these and divide by the count (distinct) of employees, then we would be working out the average.

Can you check if this works:

SUM( age / countOver(age, [{Employee}, {Employment}], PRE_AGG) )
/
DISTINCT_COUNT({Employee})

Bharat_Kumar_Jangir · March 4, 2022, 6:08pm

Thanks @darcoli, Your suggested formula needed a slight change. It helped me to move in the right direction.
The below formula worked for me. So, instead of doing partition by both Employee and Employment. Need to partition by Employee only.

SUM( Age / countOver(Age, [Employee], PRE_AGG) )/distinct_count(Employee)

darcoli · March 4, 2022, 10:08pm

doh that was actually a typo from my end - had the right description

Eleri · July 7, 2022, 12:11pm

Hi, thank you so much, i was able to use your formula.
Do you maybe know how to create a calculated measure to get a Percentage of overlapping sets -i.e. the sets are not mutually exclusive. I have described my issue here:

I don’t think QuickSight is able to do it.
Many thanks in adavance!

darcoli · July 7, 2022, 1:21pm

@Eleri this is not really a related question and it would be better to post it as a separate question.

However, from what I understand, it boils down how you are organizing the data in your dataset. Example if you format your dataset as follows:

Person	Symptom
person1	A
person1	B
person2	B
person2	C
person2	D
person3	NULL
person4	A

Then you can use something like:

  distinct_count(Person, [Symptom]) /   distinctCountOver(Person, [], PRE_AGG)

and then putting this in the Values field well of a Table visual (with Symptom in the Group By)

Eleri · July 7, 2022, 2:45pm

thank you so much!!! i had posted it separately, but nobody had replied. This is the solution btw:

avg(distinct_count({user_id_string}, [{condition_name}] ))
/avg(distinctCountOver
(
{user_id_string},
[], PRE_AGG
))

Topic		Replies	Views
Sum by Unique Value Not Q&A calculations	2	1005	September 29, 2022
SumOver -Summing distinct entries from a table with duplicate rows Q&A analysis , calculations	10	7538	July 1, 2022
Calculated measure help - need total average across total of all itemIDs Q&A calculations , how-to	9	2034	August 10, 2022
Count distinct numbers and email ids and make in a measure so that we can take average of those Q&A error , calculations , how-to	8	388	September 12, 2023
Avg nested with percentileContOver not giving expecpted results Q&A calculations	2	244	January 17, 2023

How to calculate average value based on distinct values of another field

Related topics