I have a dataset which contains a snapshot of data from an external system for each month. Every month we add a new snapshot into the dataset, and the rows have a {Snapshot} field indicating the month it relates to. The rows in each snapshot have an {ID} field.
I am trying to add a calculated field, which will tell me if an {ID} value in one month, was in the snapshot data from the previous month? If it was, then the field should return “Existing”, otherwise it returns “New”.
I suspect I need a lag function to do this, but I cannot get this working - either the field shows “Unavailable” in the dataset, or I cannot visualise the breakdown of New/Existing per snapshot month.
For the October snapshot there are 3 items (A, B and C). For the November snapshot there are 4 items - A, B and C are “Existing”, D is “New” since the ID didn’t exist in the October snapshot.
The idea is that I would have a table visual, grouped by Snapshot date, and under each date group I would have a count of the number of Existing/New items within each snapshot. November would show 3 existing and 1 new. October would show 3 existing (because there is no previous month of snapshot data, so we assume it was existing).
Hopefully this is something I can calculate at the dataset level, rather than an analysis-level calculation or having to setup the aggregation on the visual itself (as we will be using the same query across multiple analyses/dashboards).
Can you try to use the rank function to calculate a snapshot number based on the date in ascending order? If the snapshot number is 1, you know it’s a new item. Otherwise, it’s an existing item. You should be able to do it at the dataset level.