How to calculate average value based on distinct values of another field

I have data that looks like this:

Employee Employment Age
1 a 35
1 b 35
2 a 40
3 a 35
4 a 51
4 b 51

I want to calculate the average age of the individual employees. When I take the average value of the Age column then Employee 1 and 4’s ages are twice in data, giving me an average age of 41.2 .

I need a calculation that first get the Age value for distinct Employee then averages those Age values. This should give me an average age of 40.3.

Any idea about a formula that could do this?

You can do the average by considering just the first row for each employee, as follows:

  rank([{Employment ASC}, [{Employee}], PRE_AGG) = 1,

@darcoli. Thanks, It worked for me for this dataset. But It doesn’t work when the Employment column has duplicate value for an Employee.

When data is like below. It gives the incorrect average. So, in this scenario, the formula you have suggested doesn’t work.

Employee Employment Age
1 a 35
1 b 35
2 a 40
3 a 35
4 a 51
4 b 51
4 a 51
4 b 51

Ok so you basically could have duplicate rows. This would cause an issue with my previous approach because the duplicated rows could all have rank=1 and then the average would be incorrect.

Another approach could be to divide age by the number of rows for each employee. If we sum all these and divide by the count (distinct) of employees, then we would be working out the average.

Can you check if this works:

SUM( age / countOver(age, [{Employee}, {Employment}], PRE_AGG) )

Thanks @darcoli, Your suggested formula needed a slight change. It helped me to move in the right direction.
The below formula worked for me. So, instead of doing partition by both Employee and Employment. Need to partition by Employee only.

SUM( Age / countOver(Age, [Employee], PRE_AGG) )/distinct_count(Employee)

doh that was actually a typo from my end - had the right description :smiley:

1 Like

Hi, thank you so much, i was able to use your formula.
Do you maybe know how to create a calculated measure to get a Percentage of overlapping sets -i.e. the sets are not mutually exclusive. I have described my issue here:

I don’t think QuickSight is able to do it.
Many thanks in adavance!

@Eleri this is not really a related question and it would be better to post it as a separate question.

However, from what I understand, it boils down how you are organizing the data in your dataset. Example if you format your dataset as follows:

Person Symptom
person1 A
person1 B
person2 B
person2 C
person2 D
person3 NULL
person4 A

Then you can use something like:

  distinct_count(Person, [Symptom]) /   distinctCountOver(Person, [], PRE_AGG) 

and then putting this in the Values field well of a Table visual (with Symptom in the Group By)

thank you so much!!! i had posted it separately, but nobody had replied. This is the solution btw:

avg(distinct_count({user_id_string}, [{condition_name}] ))