Cumulative Distinct Count for a given time period

I have a dataset like this

I want to create a line graph with Cumulative distinct count of people by year. For example, Purple line represents cumulative distinct number of people by month for year 2023 and pink line represents year 2024 in the line chart.

I have to set the date range between 2023 and 2024 in order to calculate the cumulative distinct count for 2023 using PeriodOverPeriodLastValue, then I hide the 2023 on the x-axis. However, I have some issues to calculate the cumulative distinct count for 2024. One person can join an event multiple times, so I use MinOver to calculate the first event date and partition by name and event_year.

This calculated field still doesn’t calculate the distinct cumulative count. For example, if Tom joins 2 events - one on Jan 2 2024 and the other on Feb 1 2024, then Tom would be counted twice in the line chart for both Jan and Feb 2024, but he should be counted only once in Jan.

Is there any way to modify this calculated field?

Hello @kiko !

My thought here initially is that you could try adding another attribute to your distinct count function to partition by. So maybe adding event_id.

I think the tricky part here is convincing the table that it should only count one completion event in a year rather than month by month.

Hey @kiko !

Were you able to try my suggestion above of adding another attribute to partition by or did you find another work around?

Hi @duncan Thank you for your reply! Your advice helps but I am having some errors if I add a partition field for the RunningSum function. The partition field needs to be included in the graph otherwise it will show an error.

I agree the tricky part is convincing the table to count one completion by year. I added a RANK function in the data source to only select the first record by name for each year and it seems to work.

1 Like