sumOver function resulting in incorrect median

Hi All,

I am trying to show average AOV by customer (ie, getting the average order value for each customer, then taking the median of this), and then plotting this over time, with multiple lines representing each category, your standard line graph.

When I put these AOV values beside date, category and customer ID it calculates the correct values.

However the function I am using to calculate AOV does not produce the correct median value:

(
sumOver
(
     ({revenue})
     ,[{category}, {date}, {customer_id}],
     PRE_AGG
)
/
sumOver
(
     (orders)
     ,[{category}, {date}, {customer_id}],
     PRE_AGG
))

When I put this calculation into my graph and select aggregate via medium, for a given date and category it returns the 7th highest AOV out of a list of 9 AOV’s when it should be returning the 5th highest (median). I have checked this for different dates and it’s always returning the AOV of one of the customers… but not the median, it’s always off, either too high or too low.

Has anyone seen this before?

What I think is happening is that it isn’t looking at just your groupings but all of the rows.

I’m assuming that instead of 9 AOV’s there are a lot more. There might be 9 distinct AOV’s but depending on how many rows have each AOV it will skew to a new AOV.

Can you try to wrap it in an median expression and group it by customer?

Hi Max,

I have wrapped the calculation in a median expression, but it gives me the same result. It returns one of the AOV’s, but not the median AOV.

This is what my calculation looks like now:

median(
(
sumOver
(
     ({revenue})
     ,[{category}, {date}, {customer_id}],
     PRE_AGG
)
/
sumOver
(
     (orders)
     ,[{category}, {date}, {customer_id}],
     PRE_AGG
)), [{category}, {date}])

I need my median to be grouped by date + category rather than customer ID, as I want to produce the median for a given date + category, calculated by looking at the AOV’s for the customer_ids and taking the median.

Another thing I have noticed, if I calculate the mean via the same method, it’s also off, is lower than it should be. The values of the median and mean suggest there are hidden lower values that are not showing but being used in the calculation which is strange.

Yes, so I’m assuming there are rows that have multiple of the same value.

Like such

Column A | Column B | Value

a | b | 10
a | b | 10
a | b | 10
a | a | 5
a | c | 2

Median value should be 5 but it will instead show 10.

Sorry, did you try and group it by the customer_id?