Segments based on activity in a specified time period for cohort analysis

Hello everyone,

I would need help to try understanding how to perform a basic segmentation for a cohort report.
I watched the cohort tutorial video but it does not explain what I am trying to achieve, and while the logic behind it is pretty basic to me, I cannot wrap my head around the way to get it in QuickSight.

My starting data is a table containing daily users transactions, and what I want to achieve is to have segments based on the amount spent in a specified time interval.
For example I would select thirty days as time interval.
My segments would be based on how much the users have spent in this time window: between €0 and €100, between €101 and €500 etc…
I would then show them either in a (pivot) table or in a column chart, so that I can see that in the last 30 days, the users who used to spend over €1,000 have dropped, while the smaller ones have increased.
I have been able to segment based on lifetime spend, but not on custom intervals.

My starting dataset is in this format:

Date User Spent Reversed
01/01/2023 123 €250 €0
02/01/2023 123 €250 €0
03/01/2023 123 €250 €0
04/01/2023 123 €300 €0
05/01/2023 123 €350 €50
06/01/2023 123 €350 €0
01/01/2023 456 €500 €0
02/01/2023 456 €400 €0
03/01/2023 456 €500 €100
04/01/2023 456 €100 €0
05/01/2023 456 €100 €0
06/01/2023 456 €200 €50
01/01/2023 789 €50 €0
02/01/2023 789 €50 €0
03/01/2023 789 €50 €0
04/01/2023 789 €100 €0
05/01/2023 789 €100 €0
06/01/2023 789 €100 €0

I would want to segment based on “Spent - Reversed”:

  1. €0 - €500
  2. €500 - €1,000
  3. €1,000+

If I break it into two three-day time periods, I would get that player 123 is in segment 2 for both the first (€750) and the second (€950) period. While 456 goes from segment 1 (€1,300) to segment 3 (€300).

So if I checked the unique users count I would get:

Segment P1 P2
€0 - € 500 1 2
€500 - €1,000 1 1
€1,000+ 1 0

Now, if I wanted to get segments based on lifetime data I would do:

revenue
({Spent} - {Reversed})

lifetime_revenue
sumOver({Revenue}, [{User}], PRE_AGG)

revenue_segments
ifelse({lifetime_revenue} >= 1000, '03. €1,000+', {lifetime_revenue} > 500, '02. €500 - €1,000', '01. €0 - €500')

Now I could use revenue_segments and count the unique users, or split the total revenues.

But how do I make it so the segment is for a period only?
Of course I cannot use PRE_AGG so I thought about:
sumOver(sum({Revenue}), [{User}])
And since it did not work out, I tried several other ways, but even though I am sure this problem has an easy solution, I cannot seem to figure it out (in several attempts I got the error that I cannot use a custom aggregation field as a dimension).

Thank you and sorry for the long post, trying to give the full picture.

Thank you for a very precise presentation of your case, and your initial efforts.

Having only vague ideas on how to proceed, the fact that you wish to distinguish between periods of time, it immediately strikes me, that you need to have the dataset field “Date” brought into play. More precisely, that field should, somehow, be part of your calculations.

Second, since you wish to count the occurences of user transactions, within a segment, within a time period, and display that, then the calculated field you introduce for that, needs to be converted to a dimension, by right-clicking in the field menu.

Hope these thoughts are able to bring you at least a bit forward,

Cheers
Kåre

1 Like

Hi @kaare,

First of all thank you for spending your time looking into this!

I do expect as well the date to be part of the calculation. My idea was to use the sumOver with the time window specified, but unfortunately I am not able to understand how to make that work.

With my current calculation, which I am pretty sure is wrong, I cannot convert the measure to dimension.

As a first attempt I tried to check the monthly amounts (then I’d have tried to check custom periods instead, which adds complexity), with a formula similar to my lifetime_revenue above, so:

sumOver(sum({Revenue}), [{User}, truncDate('MM', {Date})])

I was expecting that this formula would return the monthly sum of revenue per user, which I could then use in the revenue_segments formula, but that did not work.

Thank you again.

1 Like

@Massi thanks for sharing the details, since you want to calculate the revenue by segment, you need to include the Period as part of the formula.
For instance * ifelse(sumOver(Revenue, [User,Period], PRE_AGG) >= 1000, ‘03. €1,000+’,
sumOver(Revenue, [User,Period], PRE_AGG) > 500, ‘02. €500 - €1,000’, ‘01. €0 - €500’)*

In the above example, I has added a column “Period” in my data source.

Can I understand more on your use case, do you want to have a Date Range selection filter(e.g. 30days) in the dashboard. And users can also select number of days for the period(e.g. 3days in your example)?
According to the number of days and date range selected, the table can break into multiple periods dynamically? In the example, the first 3days(Jan1-3) will be period 1, Jan4-6 will be period 2… etc.
Or the period will be a fixed one?

2 Likes

Hi @royyung

Good questions that you ask here - How the date ranges are required to be, is an important thing to know.

On the side of this, could you please share the definition of your calculated field “Period”?

Cheers,
Kåre

1 Like

Hello @royyung and @kaare!

In an ideal world I would be able to select custom periods.
My aim is to add parameters to select the time frame, the segment thresholds (that is pretty easy as I just need to replace the hard-coded amounts with parameters) and limits periods.

So how would this work is:

  1. User selects the time frame: E.g. from 1st of May 2022 up to yesterday;
  2. User selects the period split: say it could be a ten days stretch, or a 30 day stretch, starting from the date I selected as a start;
  3. User selects the revenue segments: if we are looking at 10 days periods for example, the segments could be lower since we assume the revenues per user won’t be the same as what they would be in 30 days;

If this is not going to be feasible, then I will need to rely on fixed values.
In this case I am thinking that I will most probably need to run some analysis beforehand, say in Excel, selecting different periods and segments, check which one is the most meaningful and then build the dashboard around it. It is not ideal, but again, if it is impossible to do better, I will go with that route.

@royyung: You mention that you added a Period column, may I ask you how you calculated that, or if you added it as a fixed amount? I guess that logically I need to:

  1. Add the Period column based on the parameter: so, starting from the first date in my range, check the date field to calculate the difference and return period it falls under. Now, assuming that my starting date parameter is StartDate and my period amount selector is PeriodSelect, I guess that I would need a formula to check the difference between Date and StartDate, divide the result by PeriodSelect, and get the floor of this amount. (For simplicity I am not considering dates earlier than my StartDate, which can then be filtered out.)
  2. Check the segments calculating the revenues in the specified period. I see that you use PRE_AGG which I thought would not return the correct result since we are aggregating by period but seems like it is working in your example, so probably that is fine (I did not apply the formula to my dataset, yet).

I hope I did not miss anything, and thank you again for the help! :slight_smile:

1 Like

I used the logic above and seems like I managed :slight_smile:

I created the Period field:
floor(dateDiff(${StartDate}, {Date}, "DD") / ${PeriodSelect})

I then used your formula, @royyung, to segment the players, adding parameters so that everything could be customized.

I had an issue with a distinct count since I was using a filter but no time window function there, but after a lot of cursing I managed.

@royyung Thanks for the formula you gave me and the tip that I needed a Period column!

Thanks everyone!

2 Likes