Percentile question - QuickSight vs Athena

Hi team,

I’m working with some data and trying to calculate percentile (75th), and I’m to checking it in QS vs Athena.

In Athena I’m using approx_percentile and in QS I’m using percentileContOver and percentDiscOver, but I’m running into some isses as the numbers are not tieing out.

Can someone please help me on what function QS is using in Athena to caclculate both percentileContOver and percentDiscOver? I dont believe it is using the approx_percentile. I have checked “Recent Queries” in Athena, but can’t seem to find what is using.

I’m using data in QS as Direct Query and have loaded my table from Athena where my data is stored.

Any links to documentation or if people could share experiences with this, woudl be very helpful.

Regards,
Q

what are your calculated fields in quicksight and what is it in athena?

Here is documentation for quicksight:

Here is it for Athena:

https://prestodb.io/docs/current/functions/aggregate.html

2 Likes

Hey @Max

Thanks for the reply (as always! :slight_smile: ).

Not sure I understand what you mean? Do you want to see the calculation or do you want to see the values used in the calculation?

I’m trying to work out what the top (75th) quartile is for our sales people for a given product category and ccy , based on daily average sales. So the question im trying to answer is - how many sales do I need to make (on average per day) to be in the top quartile of sales for the given category.

I have done a little more work today and it seems the difference I was having was coming from the categories (or values) I was including in my partition calculation.

The differing numbers appear to be coming from the percentile calcualtion methodology. I understand that some calculations will consume all values in the range, then use the 75th percentile value (using an exect value from the range) - others may use a calculation where they exclude the median, then provide a percentile of the numbers. Some other will calculate the 75th pc value and where there is at tie, add the two values and divide by 2 - some other may interpolate similar to the last calculation I mentioned.

I’m just trying to understand which calculation approach the below are using:

  • approx_percentile (athena) - the presto link doesnt appear to define this (from what I can find)
  • percentileContOver (QS)
  • percentDiscOver (QS)

For reference, I have used excel and python to try and find the correct percentile value - each with differing results (admittedly, some are the same, but others differing. I’m trying to find out the reason why I see the differing results - if i can explain these, then I can have some comfort in the query and calculations.)

Basically, I need to know what is happening ‘under the covers’ so I can explain the quartilel value (if and when questioned)

Help appreicated!
Regards,
Q

@QSCommUser -
This link provides an example of how PercentContOver works.

This link provides an example of how PercentDiscOver works.

The PRE_FILTER, PRE_AGG or POST_AGG_FILTER will affect the calculations. Does this help give you the information you need to determine how the calculation is evaluating?

1 Like