Ifelse + LAC-A/LAC-W

Hi all,
I’m currently struggling with some blockers for ifelse function. I’m basically trying to utilize {outlier} parameter for determining if a specific item partitioned by project/subject/type is an outlier. I’ve been testing the same solution by the mean but now trying to replace it with median and QS isn’t allowing me to do so.

Here’s the working formula with the mean approach:


ifelse(${outlier} = 1, avgOver(Minutes,[project,subject,type],PRE_AGG) + stdevpOver(Minutes,[project,subject,type],PRE_AGG),
ifelse(${outlier} = 2, avgOver(Minutes,[project,subject,type],PRE_AGG) + (stdevpOver(Minutes,[project,subject,type],PRE_AGG)*2),
ifelse(${outlier} = 3, avgOver(Minutes,[project,subject,type],PRE_AGG) + (stdevpOver(Minutes,[project,subject,type],PRE_AGG)*3),Minutes)))

Here’s the formula which i’m struggling with. Could it be linked to LAC-A/LAC-W QS limitation?

ifelse(${outlier} = 1,median(Minutes,[project,subject,type]) + stdevpOver(Minutes,[project,subject,type],PRE_AGG),
ifelse(${outlier} = 2,median(Minutes,[project,subject,type]) + (stdevpOver(Minutes,[project,subject,type],PRE_AGG)*2),
ifelse(${outlier} = 3,median(Minutes,[project,subject,type]) + (stdevpOver(Minutes,[project,subject,type],PRE_AGG)*3),Minutes)))

I’m getting this error when saving 2nd formula:

image

Is there any workaround for this? Thanks in advance for any feedback!

The issue is with the median. When you use ifelse, the condition is evaluated in every row but median doesn’t give you a value from a single row - it’s an aggregated value obtained by grouping multiple rows.

Since the condition doesn’t apply to the median part of the calculation, can you try to move the median outside?

stpdevOver will also give you a single value and when you add it to the median, you’ll get another error. To fix that, wrap it in a min. Let me know if this works.

median(Minutes,[project,subject,type]) + 
min(
	ifelse(${outlier} = 1, stdevpOver(Minutes,[project,subject,type],PRE_AGG),
	${outlier} = 2, stdevpOver(Minutes,[project,subject,type],PRE_AGG)*2,
	${outlier} = 3, stdevpOver(Minutes,[project,subject,type],PRE_AGG)*3,
	Minutes)
)
1 Like

@David_Wong thanks for the reply and let me to clarify more on this. The min() function for stdevpOver cannot be applied, since i need an aggregated values. I need an aggregated median + aggregated standard deviation to assess if specific item with`project,subject,type’ values assigned is an outlier or not. I want to compare item’s minutes vs median + standard deviation.

As I mentioned - the same issues was solved with the mean by avgOver function.

What’s more - the median function as a standalone one works as expected, returning proper values:

median(Minutes,[project,subject,type])

The tricky part is why addition is not possible for the median since it’s possible for the mean?

Thanks in advance.

1 Like

Edit:

I solved it with LAC-A approach for stdevp instead of stdevpOver.

It’s working fine when have 2 nested ifelse:

Field name: OutlierFilter

ifelse(${outlier} = 1, median(Minutes,[project,subject,type]) + stdevp(Minutes,[project,subject,type),
ifelse(${outlier} = 2, median(Minutes,[project,subject,type) + stdevp(Minutes,[project,subject,type])*2,Minutes))

But somehow cannot add 3rd line - QS says i’m having syntax issue but not sure what’s missing there?


ifelse(${outlier} = 1, median(Minutes,[project,subject,type]) + stdevp(Minutes,[project,subject,type]),
ifelse(${outlier} = 2, median(Minutes,[project,subject,type]) + stdevp(Minutes,[project,subject,type])*2),
ifelse(${outlier} = 2, median(Minutes,[project,subject,type]) + stdevp(Minutes,[project,subject,type])*3,Minutes)))

Even when i utilize the 1st formula for my exercise and trying to create such a filter

ifelse({Minutes}>{OutlierFilter},"Yes","No")

Then i’m getting this error:

image

A mismatched aggregation error happens when you try to combine unaggregated values and aggregated values in a calculated field.

In the table below if you try to do something like Quantity * Sum(Price), it will give you a mismatched aggregation error. Sum(Price) is a single value obtained by grouping multiple rows. Quantity refers to a value in each row. When you do Quantity * Sum(Price), QuickSight doesn’t know which value of Quantity to multiply.

Quantity Price
1 10
2 10
2 15
5 20
2 22
3 5

To fix the error, you have to change everything to be aggregated or unaggregated.

  • sum(Quantity) * sum(Price) is valid.
  • Quantity * Price is valid.
  • Quantity * sum(Price) isn’t valid.

Now if we go back to the 3rd line in your calculated field, the reason why it’s giving you the mismatched aggregation error is because median and stdevp both give you an aggregated value but Minutes is unaggregated.

ifelse(${outlier} = 2, median(Minutes,[project,subject,type]) + stdevp(Minutes,[project,subject,type])*3,Minutes)))

To fix it, change Minutes to an aggregated value by putting it in a min, max or avg.

In your original calculated field there is no mismatched aggregation because both avgOver and stdevpOver are unaggregated.

ifelse(${outlier} = 1, avgOver(Minutes,[project,subject,type],PRE_AGG) + stdevpOver(Minutes,[project,subject,type],PRE_AGG),
ifelse(${outlier} = 2, avgOver(Minutes,[project,subject,type],PRE_AGG) + (stdevpOver(Minutes,[project,subject,type],PRE_AGG)*2),
ifelse(${outlier} = 3, avgOver(Minutes,[project,subject,type],PRE_AGG) + (stdevpOver(Minutes,[project,subject,type],PRE_AGG)*3),Minutes)))

When you change the avgOver to median, there’s a mismatched aggregation because median is aggregated and stdevOver is unaggregated.

ifelse(${outlier} = 1,median(Minutes,[project,subject,type]) + stdevpOver(Minutes,[project,subject,type],PRE_AGG),
ifelse(${outlier} = 2,median(Minutes,[project,subject,type]) + (stdevpOver(Minutes,[project,subject,type],PRE_AGG)*2),
ifelse(${outlier} = 3,median(Minutes,[project,subject,type]) + (stdevpOver(Minutes,[project,subject,type],PRE_AGG)*3),Minutes)))

@David_Wong thank you for a great explanation! Let me give a try on this and will get back if I’m still stuck with this task.

@David_Wong sorry for getting back on this pretty late. I’ve been trying to adjust the syntax as per hints and i’m affraid this isn’t working. Same goes for max/avg:

ifelse(${outlierMS} = 1, median(min(Minutes),[project,subject,type]) + stdevp(min(Minutes),[project,subject,type]),
ifelse(${outlierMS} = 2, median(min(Minutes),[project,subject,type]) + stdevp(min(Minutes),[project,subject,type])*2,Minutes))

I’m getting this error when trying to save the formula:

I’ve been trying to do computation with median/stddev field as a standalone calculated fields and was able to save the formula but QS doesn’t read it properly:

Do you think it’s even doable? I think this might be some QS limitation, as per QS guidance:

Thanks for any inputs.

There’s a syntax error in your calculated field.

This is incorrect:

median(min(Minutes),[project,subject,type])

It should be changed to this:

median(min(Minutes,[project,subject,type]))

Same for stdevp.

Even if you fix the syntax, I think you’ll still get another error because of the ifelse. The condition in your ifelse function is evaluated in each row of the dataset but median and stdevp are aggregated.

I posted an article about LAC today. I have a section that covers LAC with ifelse and mismatched aggregation errors. Take a look and let me know if it’s helpful.

Thanks @David_Wong the syntax works now but having same issue as before:

image

As for your article it’s very helpful and i’ve tried with the ifelse/LAC-A workaround but it’s not working for me (maybe the syntax is not right):

min
(
ifelse(
${outlierMS} = 1, 
median(Minutes,[project,subject,type]) + stdevp(Minutes,[project,subject,type]),
Minutes
      )
)

I believe if median function would be available as LAC-W function, this could work as expected. Any plans for having this implemented in the future?

Yes, I think a medianOver function could make this calculation easier. I’m not on the QuickSight product team, so I don’t know if it’s on their roadmap. I would suggest creating a separate topic for it and tagging it as a feature request.

Since we have no choice but to use LAC-A, let’s try this:

median(
	min(
		ifelse(${outlier} = 1 OR ${outlier} = 2, Minutes, null)
		[project,subject,type]
	)
) +
stdevp(
	min(
		ifelse(${outlier} = 1, Minutes, null),
		[project,subject,type]
	)
) +
stdevp(
	min(
		ifelse(${outlier} = 2, Minutes, null),
		[project,subject,type]
	)
) * 2 +
min(ifelse(${outlier} = 1 OR ${outlier} = 2, null, Minutes))

We have to break the calculation into 4 parts and move the ifelse inside each LAC-A function.

@David_Wong thanks for sharing, will give LAC-A a try and get back if still stuck with this.