How to calculate average value based on distinct values of another field

I have data that looks like this:

Employee Employment Age
1 a 35
1 b 35
2 a 40
3 a 35
4 a 51
4 b 51

I want to calculate the average age of the individual employees. When I take the average value of the Age column then Employee 1 and 4’s ages are twice in data, giving me an average age of 41.2 .

I need a calculation that first get the Age value for distinct Employee then averages those Age values. This should give me an average age of 40.3.

Any idea about a formula that could do this?

You can do the average by considering just the first row for each employee, as follows:

ifelse(
  rank([{Employment ASC}, [{Employee}], PRE_AGG) = 1,
  {Age},
  NULL
)

@darcoli. Thanks, It worked for me for this dataset. But It doesn’t work when the Employment column has duplicate value for an Employee.

When data is like below. It gives the incorrect average. So, in this scenario, the formula you have suggested doesn’t work.

Employee Employment Age
1 a 35
1 b 35
2 a 40
3 a 35
4 a 51
4 b 51
4 a 51
4 b 51

Ok so you basically could have duplicate rows. This would cause an issue with my previous approach because the duplicated rows could all have rank=1 and then the average would be incorrect.

Another approach could be to divide age by the number of rows for each employee. If we sum all these and divide by the count (distinct) of employees, then we would be working out the average.

Can you check if this works:

SUM( age / countOver(age, [{Employee}, {Employment}], PRE_AGG) )
/
DISTINCT_COUNT({Employee})
3 Likes

Thanks @darcoli, Your suggested formula needed a slight change. It helped me to move in the right direction.
The below formula worked for me. So, instead of doing partition by both Employee and Employment. Need to partition by Employee only.

SUM( Age / countOver(Age, [Employee], PRE_AGG) )/distinct_count(Employee)

doh that was actually a typo from my end - had the right description :smiley:

1 Like