# Calculating the percentage difference against the initial distinct count and not the previous column in a Pivot Table

Hi all,

I am trying to create a table showing the current retention against initial month of purchase.
Using pivot tables I can find the percentage difference against the previous month but not against the initial month.
This is my data (simplified):

Date User First Purchase Type Amount Q.ty
01/07/2022 10:00 001 01/02/22 09:20 Purchase 50 1
01/07/2022 12:00 001 01/02/22 09:20 Sale 50 1
02/07/2022 09:38 001 01/01/22 09:20 Purchase 100 2
03/07/2022 15:46 002 03/01/22 20:20 Purchase 50 1
04/07/2022 12:10 003 04/07/22 12:10 Purchase 50 1
04/07/2022 19:50 004 19/06/22 05:01 Purchase 50 1

So I’ve got a table with different transactions (at daily level) and I also have the users first purchase date field, which is what I care about.
A pivot table aggregating data by month would look like this (let’s assume it’s only three months):

First Purchase Month 2022-05 2022-06 2022-07
2022-05 200 150 100
2022-06 NULL 120 90
2022-07 NULL NULL 250

Which means that in May we had 200 First-time-purchasers. Then 150 of these made a purchase in June, so 75%, while in July, 100 of those initial 200 made a purchase, so it’s 50%.
The table would then look like this:

First Purchase Month 2022-05 2022-06 2022-07
2022-05 100% 75% 50%
2022-06 NULL 100% 75%
2022-07 NULL NULL 100%

(Showing NULL instead of 100% for the first month works as well.)

Is there any way to get this in QuickSight?
I am thinking that I would need to use distinctCountOver pre-aggregating the data, but the fact that I’ve daily data and need to pre-aggregate it monthly is throwing me off, I was guessing I could use formatDate and Extract but I cannot find the way to make it work (if that is even the way).

Thank you!

Hi Massi- You could accomplish this using calculated fields. You could find first purchase date by customer with a minOver calculation similar to

minOver({Order Date}, [Customer], PRE_AGG)

You could then using if else and/or datediff to calculate your counts for your months.

This blog on QuickSight level aware calculations may be helpful - Calculated fields, level-aware aggregations, and evaluation order in Amazon QuickSight | AWS Big Data Blog

First of all thank you for finding the time to reply.
Actually I do not thing I need the minOver calculation since I have the first purchase date as a field (as per my example), so that is already calculated.

What I would need to find is how to calculate unique customers who deposited during the month and whose first deposit was equal to selected month, divided all customers who had deposited in that selected month. Maybe it’s not even that hard but I cannot find how to do that (but yes, I believe I need PRE_AGG for the distinct count of users who deposited in a selected month).

Hi @Massi -

You are almost there, Kellie’s answer is a LAC-W calculation, this is needed if you didn’t have that “first_purchase” column, since it allows you to create a “window” across the dataset in relation to the current row. You just need the ifelse + dateTrunc functions she mentioned.

Breaking down your requirements. I don’t think you need a LAC-W function
. You can just use the new LAC-A functions.

calc 1 - “What I would need to find is how to calculate unique customers who deposited during the month”

c_LAC-A_customers_dcnt
`distinct_count(user,[truncDate("MM",date)])`

calc 2 - “divided all customers who had deposited in that selected month”

c_customers_withfirstdeposit_dcnt
`distinct_count(ifelse(truncDate('MM',{first_purchase})=truncDate('MM',date),user,null))`

Example:

Hi @robdhondt,

Thank you for the help
Unfortunately that does not work for me, and it’s close to something I had already come up with. Let’s take the second formula, I had already tried something similar using:

`distinct_countIf({User}, truncDate('MM', {First_Purchase}) = truncDate('MM', {_Date})) `

The problem with these formulas is that I get:

First Purchase Date 2022-05 2022-06 2022-06
2022-05 200 NULL NULL
2022-06 NULL 150 NULL
2022-07 NULL NULL 250

This is because the formula is counting the distinct user who were first time purchasers and the month of first purchase was the same as the date.
What the data should look like is:

First Purchase Date 2022-05 2022-06 2022-06
2022-05 200 200 200
2022-06 NULL 150 150
2022-07 NULL NULL 250

This is why I was thinking that here I’d need a level aware aggregation.

I need to check the other formula, since what I need is that the user is a purchaser in the current month (based on the column) AND was a first time purchaser based on the row.

In the meantime thank you again!

Ciao,
Massimo.

Hi @Massi -

I understand. This is a really cool use case!

There’s an added level of complexity here where you want to identify the point where a customer stops having consecutive purchases (not retained). QuickSight today doesn’t support a level-aware running calculations.

You can get around this with some creativity by using the Rank function which is a level-aware calculation.

There are probably some steps that can be cleaned up but leaving these expanded to help explain the method.

Step 1 - Create a calc field to check if the first purchase is <= the purchase month.
c_1_purchase_first_purchase_beforedt
`ifelse(truncDate('MM',{first_purchase})<=truncDate('MM',date),1,0)`

Step 2 - Create a calc field to count the months between the first purchase and purchase month
c_2_purchase_months_since_first_purchase
`maxover(dateDiff({first_purchase},date,'MM')+1,[user, truncDate('MM',date)], PRE_AGG)`

Step 3 - Use the rank function I mentioned to get the number of consecutive months with a purchase
c_3_purchase_running_months_with_purchase
`rank([ifelse({c_1_purchase_first_purchase_beforedt}>0,user,null) ASC, truncDate('MM',date) ASC],[user], PRE_AGG)`

Step 4 - distinct count of purchasers retained
c_4_purchase_retained_cnt
`distinct_count(ifelse({c_2_purchase_months_since_first_purchase}={c_3_purchase_running_months_with_purchase},user,null))`

Step 5 - get number of purchasers in the initial purchase month… a simple max works here because the first value is always the highest
c_5_max_purchase_retained_cnt
`maxOver({c_4_purchase_retained_cnt},[truncDate('MM',{first_purchase})],POST_AGG_FILTER)`

Step 6 - create your retained % … initial purchasers retained / beginning initial purchasers
c_6_percent_retained
`{c_4_purchase_retained_cnt}/{c_5_max_purchase_retained_cnt}`

Result: