Calculating the percentage difference against the initial distinct count and not the previous column in a Pivot Table

Massi · July 11, 2022, 12:57pm

Hi all,

I am trying to create a table showing the current retention against initial month of purchase.
Using pivot tables I can find the percentage difference against the previous month but not against the initial month.
This is my data (simplified):

Date	User	First Purchase	Type	Amount	Q.ty
01/07/2022 10:00	001	01/02/22 09:20	Purchase	50	1
01/07/2022 12:00	001	01/02/22 09:20	Sale	50	1
02/07/2022 09:38	001	01/01/22 09:20	Purchase	100	2
03/07/2022 15:46	002	03/01/22 20:20	Purchase	50	1
04/07/2022 12:10	003	04/07/22 12:10	Purchase	50	1
04/07/2022 19:50	004	19/06/22 05:01	Purchase	50	1

So I’ve got a table with different transactions (at daily level) and I also have the users first purchase date field, which is what I care about.
A pivot table aggregating data by month would look like this (let’s assume it’s only three months):

First Purchase Month	2022-05	2022-06	2022-07
2022-05	200	150	100
2022-06	NULL	120	90
2022-07	NULL	NULL	250

Which means that in May we had 200 First-time-purchasers. Then 150 of these made a purchase in June, so 75%, while in July, 100 of those initial 200 made a purchase, so it’s 50%.
The table would then look like this:

First Purchase Month	2022-05	2022-06	2022-07
2022-05	100%	75%	50%
2022-06	NULL	100%	75%
2022-07	NULL	NULL	100%

(Showing NULL instead of 100% for the first month works as well.)

Is there any way to get this in Quick Sight?
I am thinking that I would need to use distinctCountOver pre-aggregating the data, but the fact that I’ve daily data and need to pre-aggregate it monthly is throwing me off, I was guessing I could use formatDate and Extract but I cannot find the way to make it work (if that is even the way).

Thank you!

Kellie_Burton · July 11, 2022, 10:02pm

Hi Massi- You could accomplish this using calculated fields. You could find first purchase date by customer with a minOver calculation similar to

minOver({Order Date}, [Customer], PRE_AGG)

You could then using if else and/or datediff to calculate your counts for your months.

This blog on Quick Sight level aware calculations may be helpful - Calculated fields, level-aware aggregations, and evaluation order in Amazon QuickSight | AWS Big Data Blog

Massi · July 12, 2022, 1:25pm

Hi @Kellie_Burton,

First of all thank you for finding the time to reply.
Actually I do not thing I need the minOver calculation since I have the first purchase date as a field (as per my example), so that is already calculated.

What I would need to find is how to calculate unique customers who deposited during the month and whose first deposit was equal to selected month, divided all customers who had deposited in that selected month. Maybe it’s not even that hard but I cannot find how to do that (but yes, I believe I need PRE_AGG for the distinct count of users who deposited in a selected month).

robdhondt · July 12, 2022, 4:27pm

Hi @Massi -

You are almost there, Kellie’s answer is a LAC-W calculation, this is needed if you didn’t have that “first_purchase” column, since it allows you to create a “window” across the dataset in relation to the current row. You just need the ifelse + dateTrunc functions she mentioned.

Breaking down your requirements. I don’t think you need a LAC-W function
. You can just use the new LAC-A functions.

calc 1 - “What I would need to find is how to calculate unique customers who deposited during the month”

c_LAC-A_customers_dcnt
distinct_count(user,[truncDate("MM",date)])

calc 2 - “divided all customers who had deposited in that selected month”

c_customers_withfirstdeposit_dcnt
distinct_count(ifelse(truncDate('MM',{first_purchase})=truncDate('MM',date),user,null))

Example:

Massi · July 13, 2022, 11:21am

Hi @robdhondt,

Thank you for the help
Unfortunately that does not work for me, and it’s close to something I had already come up with. Let’s take the second formula, I had already tried something similar using:

distinct_countIf({User}, truncDate('MM', {First_Purchase}) = truncDate('MM', {_Date}))

The problem with these formulas is that I get:

First Purchase Date	2022-05	2022-06	2022-06
2022-05	200	NULL	NULL
2022-06	NULL	150	NULL
2022-07	NULL	NULL	250

This is because the formula is counting the distinct user who were first time purchasers and the month of first purchase was the same as the date.
What the data should look like is:

First Purchase Date	2022-05	2022-06	2022-06
2022-05	200	200	200
2022-06	NULL	150	150
2022-07	NULL	NULL	250

This is why I was thinking that here I’d need a level aware aggregation.

I need to check the other formula, since what I need is that the user is a purchaser in the current month (based on the column) AND was a first time purchaser based on the row.

In the meantime thank you again!

Ciao,
Massimo.

robdhondt · July 13, 2022, 4:57pm

Hi @Massi -

I understand. This is a really cool use case!

There’s an added level of complexity here where you want to identify the point where a customer stops having consecutive purchases (not retained). Quick Sight today doesn’t support a level-aware running calculations.

You can get around this with some creativity by using the Rank function which is a level-aware calculation.

There are probably some steps that can be cleaned up but leaving these expanded to help explain the method.

Step 1 - Create a calc field to check if the first purchase is <= the purchase month.
c_1_purchase_first_purchase_beforedt
ifelse(truncDate('MM',{first_purchase})<=truncDate('MM',date),1,0)

Step 2 - Create a calc field to count the months between the first purchase and purchase month
c_2_purchase_months_since_first_purchase
maxover(dateDiff({first_purchase},date,'MM')+1,[user, truncDate('MM',date)], PRE_AGG)

Step 3 - Use the rank function I mentioned to get the number of consecutive months with a purchase
c_3_purchase_running_months_with_purchase
rank([ifelse({c_1_purchase_first_purchase_beforedt}>0,user,null) ASC, truncDate('MM',date) ASC],[user], PRE_AGG)

Step 4 - distinct count of purchasers retained
c_4_purchase_retained_cnt
distinct_count(ifelse({c_2_purchase_months_since_first_purchase}={c_3_purchase_running_months_with_purchase},user,null))

Step 5 - get number of purchasers in the initial purchase month.. a simple max works here because the first value is always the highest
c_5_max_purchase_retained_cnt
maxOver({c_4_purchase_retained_cnt},[truncDate('MM',{first_purchase})],POST_AGG_FILTER)

Step 6 - create your retained % .. initial purchasers retained / beginning initial purchasers
c_6_percent_retained
{c_4_purchase_retained_cnt}/{c_5_max_purchase_retained_cnt}

Result:

Topic		Replies	Views
Calculating the percentage difference against first column in a Pivot Table Q&A calculations , pivot-table	3	1306	August 25, 2023
Calculating retention % Q&A analysis , period-over-period , calculations , pivot-table , Business-Intelligence-Engineer	8	860	April 16, 2024
Pivot table % error Q&A calculations	4	361	October 5, 2023
How to add month over month percentage difference in Pivot table when the table sorted Q&A calculations , pivot-table	2	614	March 13, 2023
Percent of Total( Initial Cohort) Q&A calculations , Business-Intelligence-Engineer	2	37	November 11, 2024

Calculating the percentage difference against the initial distinct count and not the previous column in a Pivot Table

Related topics