Calculate Conditional Timespan Between Two Timestamps Within Group

Hi,

I’d like to calculate some kind of processing time for each item. The process has some status which indicate that the item is “in processing” like status 1 or 2. All other status are end status. Given the case that the item is in processing, the processing time equals the interval between start date of the process and “now”. In contrast, if the item reaches an end status the processing time equals the interval between the start date and the end date of the process. I created a calculated field:

ifelse(
in({current_status_id}, [1, 2]), dateDiff(min(Date,[{item_id}]), now()),
dateDiff(min(Date,[{item_id}]), max(Date,[{item_id}]))
)

Unfortunately, the formula returns an error

Mismatched aggregation. Custom aggregations can’t contain both aggregate “NESTED_MIN” and non-aggregated fields “NESTED_MIN(“Date”, “item_id”)”, in any combination.

Any idea how to solve this issue?

Regards,
David

Hello @dbb1 - I believe you can calculate it simply for each row of your dataset based on the current_status_id and you can use that derived calculated field into another calculation where you can use min and max based on your need. Hope this helps!

It will be really helpful if you can share a sample of your dataset and expected value so that I can try to replicate it at my end.

1 Like

Hi @sagmukhe,

the dataset has the format of an event log. Each activity with an item (process step) gets logged as a single record with a timestamp and further information. Please have a look at the dummy data which have a similar format to my original data.

case_id;activity;timestamp;costs;resource
3;register request;2010-12-30 14:32:00+01:00;50;Pete
3;examine casually;2010-12-30 15:06:00+01:00;400;Mike
3;check ticket;2010-12-30 16:34:00+01:00;100;Ellen
3;decide;2011-01-06 09:18:00+01:00;200;Sara
3;reinitiate request;2011-01-06 12:18:00+01:00;200;Sara
3;examine thoroughly;2011-01-06 13:06:00+01:00;400;Sean
3;check ticket;2011-01-08 11:43:00+01:00;100;Pete
3;decide;2011-01-09 09:55:00+01:00;200;Sara
3;pay compensation;2011-01-15 10:45:00+01:00;200;Ellen
2;register request;2010-12-30 11:32:00+01:00;50;Mike
2;check ticket;2010-12-30 12:12:00+01:00;100;Mike
2;examine casually;2010-12-30 14:16:00+01:00;400;Sean
2;decide;2011-01-05 11:22:00+01:00;200;Sara
2;pay compensation;2011-01-08 12:05:00+01:00;200;Ellen
1;register request;2010-12-30 11:02:00+01:00;50;Pete
1;examine thoroughly;2010-12-31 10:06:00+01:00;400;Sue
1;check ticket;2011-01-05 15:12:00+01:00;100;Mike
1;decide;2011-01-06 11:18:00+01:00;200;Sara
1;reject request;2011-01-07 14:24:00+01:00;200;Pete
6;register request;2011-01-06 15:02:00+01:00;50;Mike
6;examine casually;2011-01-06 16:06:00+01:00;400;Ellen
6;check ticket;2011-01-07 16:22:00+01:00;100;Mike
6;decide;2011-01-07 16:52:00+01:00;200;Sara
6;pay compensation;2011-01-16 11:47:00+01:00;200;Mike
5;register request;2011-01-06 09:02:00+01:00;50;Ellen
5;examine casually;2011-01-07 10:16:00+01:00;400;Mike
5;check ticket;2011-01-08 11:22:00+01:00;100;Pete
5;decide;2011-01-10 13:28:00+01:00;200;Sara
5;reinitiate request;2011-01-11 16:18:00+01:00;200;Sara
5;check ticket;2011-01-14 14:33:00+01:00;100;Ellen
5;examine casually;2011-01-16 15:50:00+01:00;400;Mike
5;decide;2011-01-19 11:18:00+01:00;200;Sara
5;reinitiate request;2011-01-20 12:48:00+01:00;200;Sara
5;examine casually;2011-01-21 09:06:00+01:00;400;Sue
5;check ticket;2011-01-21 11:34:00+01:00;100;Pete
5;decide;2011-01-23 13:12:00+01:00;200;Sara
5;reject request;2011-01-24 14:56:00+01:00;200;Mike
4;register request;2011-01-06 15:02:00+01:00;50;Pete
4;check ticket;2011-01-07 12:06:00+01:00;100;Mike
4;examine thoroughly;2011-01-08 14:43:00+01:00;400;Sean
4;decide;2011-01-09 12:02:00+01:00;200;Sara
4;reject request;2011-01-12 15:44:00+01:00;200;Ellen

Based on the data above my formula would be like:

ifelse(
in(activity, ["pay compensation", "reject request" ]), dateDiff(min(timestamp,[{case_id}]), now()),
dateDiff(min(timestamp,[case_id]), max(timestamp,[case_id]))
)

Thank you @dbb1 for your response. Based on your explanation, I believe you are looking for an output which is at a case_id level. So for your given example, we will end up with 6 records having case_id and timespan. Secondly based on the sample that you provided, all of the cases will fall under the first category i.e. where we need to measure between the minimum timestamp of a given ID and current timestamp as all the cases contain either of the 2 activities that you mentioned i.e. “pay compensation”, “reject request”. Please confirm if this is the right understanding. This will help me to replicate it at my end. Thank you!

Yes, you are right. The processing time is a case level calculation and we will have 6 processing times, one for each case.
For example let’s look at case_id 3. The process is finished, so the processing time would be the timespan between first timestamp 2010-12-30 14:32:00+01:00 and the last 2011-01-15 10:45:00+01:00, which would be roughly 16 days.
Let’s consider case_id 2 is not finished (ignore the last “pay compensation” record of case 2). In this situation the processing time would be the timespan between 2010-12-30 11:32:00+01, the first timestamp of case 2, and today, since we never finished the process…

hope that helps!

1 Like

Thank you @dbb1 for the details. I think this can be done by creating 2 calculated fields. First Create a Activity Indicator Field to denote the value 0 for rows having activities “pay compensation”, “reject request” and remaining activities as 1.

Once that is done, please leverage that field to conditionally calculate the timespan. I have tried a smaller simpler dataset and was able to achieve it. Please find the snapshots below. Hope this helps!

Calculated Field Details:

// Instead of the below, you can use yours like this : if(in(activity, [“pay compensation”, “reject request” ]), 0, 1)
ifelse(activity = ‘X’, 0, 1)

ifelse(minOver({X Activity Indicator}, [{case_id}], PRE_AGG) = 0,
datediff(minOver({Activity_Date}, [{case_id}], PRE_AGG), now(), ‘DD’),
datediff(minOver({Activity_Date}, [{case_id}], PRE_AGG), maxOver({Activity_Date}, [{case_id}], PRE_AGG), ‘DD’))

Did my suggestion help you in resolving your query? If yes, would request you to mark the post as “Solution”. This will help the community to find guidance and answers to similar question. Thank you!

1 Like

thanks for the swift reply! I’ll test your suggestion from my end and come back to you with the results :slight_smile:

@sagmukhe, I tested your approach based on this formula to omit the additional calculated field and it seems to work pretty well!

ifelse(
in({activity_id}, [1, 2]),
datediff(minOver(Date, [{case_id}], PRE_AGG), now()),
datediff(minOver(Date, [{case_id}], PRE_AGG), maxOver(Datum, [{case_id}], PRE_AGG))
)

One question that came to my mind refers to the PRE_AGG parameter. I tried to see how the calculation changes when I remove PRE_AGG, so it defaults to POST_AGG_FILTER but in this case the formula returns an error:

Mismatched aggregation. Custom aggregations can't contain both aggregated and nonaggregated fields, in any combination.

Could you shed some light on why this is the case?

PRE_AGG and POST_AGG filters work differently. As the name suggests, Quicksight will allow you to use un-aggregated measure or dimension. However, for POST_AGG it is mandatory to provide some aggregation function as the essence of the calculation is that everything will happen after the aggregation. Hence the syntax will become invalid as soon as you change the parameter to POST_AGG_FILTER, as we are not using any aggregation function over the “Date” column. Please see the below reference to go through the details of Level Aware Calculations. Hope this helps!

In case my suggestion helped you to resolve your query, would request you to mark the post as “Solution”. This will help the community to find guidance and answers to similar question. Thank you!

1 Like