I want to get the average amount per day and category.
In the example below, it should be 1350/6 = 225 in average for taxi per day.
Considering I canât change my data structure and adjust it to this specific calculation (doing group by in the data prep layer etc.), what can be the suggested calculation?
What this is doing it take the sum of amounts per category, and then dividing by the sum of the days where the days are averages over Category, Start Date, and End Date. This denominator should work as how you expected it by taking just the averages of days from start date to end date and summing them.
By âaverage by daysâ you mean all distinct dates in either Start or End date, right? This is how you end up with 6 as the denominator: Oct 20, 21, 25, 28 and Nov 5 and 7. (There is another way to look at it: include all days, even the missing dates. Or we can we include the dates between start to end dates and we end up with more than 6 days as the denominator.)
Considering there are duplicate dates (start and end dates), we need first to find a way to calculate the number of distinct dates. For example, for the first 2 rows, you want to sum up the amounts (100+600) but use only one day when you calculate the average.
If there are no overlapping dates on different rows, then we use a
distinct_count({Start Date}, [Category]) *2
to get the denominator (the number of distinct dates). I multiplied by 2 because for every Start Date there is another distinct End Date that needs to be included in the denominator (ânumber of daysâ)