QS Experts, Why does the standard deviation function return zero here?

This table aims to calculate the standard deviation of the number of sold dishes on the same weekday and hour across a selected date range.

The filters applied to the table above are:

  • Branch
  • Weekday
  • order hour
  • order_date

I wrote two expressions to calculate the standard deviation: the one in red returns the wrong value, and the one in green returns the correct value.

I created the table above to debug both expressions. The end goal is to have a table like this:

Because this table does not have the order_date dimension in the field well, the std expression now returns all 0’s, while the Std Deviation expression returns values despite being the wrong one. This table is supposed to show the standard deviation for every hour of every weekday.
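[Editor's note] A minimal sketch of why an aggregate-level standard deviation can collapse to zero once a dimension is removed (illustrative Python with made-up numbers, not QuickSight internals):

```python
from statistics import pstdev

# With order_date in the field well, each (weekday, hour) partition still
# contains one hourly total per date, so there is a real spread to measure:
print(pstdev([5, 70]))   # population std dev across two dates -> 32.5

# Without order_date, sum() collapses all dates into a single aggregated
# value per (weekday, hour); a one-element partition has zero spread:
print(pstdev([75]))      # -> 0.0
```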

Here are the expressions:

Number of Dishes

sum({total_item_quantity})

Std Deviation

stdevpOver(sumOver({total_item_quantity},[ {dummy branch name},{order_date},{order_hour},{branch_id}],PRE_AGG),[{dummy branch name},{weekday_full},{order_hour}],PRE_AGG)

std

stdevpOver(sum({total_item_quantity}),[{dummy branch name},{weekday_full},{order_hour}])

It is worth mentioning that there are multiple Items sold in any hour (there are other item dimensions that are not part of this table). So, taking the sum(total_item_quantity) is a must to get the total number of items sold in any hour.
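[Editor's note] For intuition, here is a pure-Python sketch of what the PRE_AGG expression appears to compute (an assumption based on the thread, with made-up numbers; the field names mirror the ones above): first sum item quantities per branch/date/hour, then take the population standard deviation of those hourly totals across all dates that share the same weekday and hour.

```python
from collections import defaultdict
from statistics import pstdev

rows = [
    # (order_date, weekday_full, order_hour, branch_id, total_item_quantity)
    ("2024-10-04", "Friday", 12, 1, 2),
    ("2024-10-04", "Friday", 12, 1, 3),   # two items that hour -> total 5
    ("2024-10-11", "Friday", 12, 1, 30),
    ("2024-10-11", "Friday", 12, 1, 40),  # total 70
]

# Step 1 (inner sumOver at PRE_AGG): total dishes per branch/date/hour,
# since multiple item rows exist within any single hour
hourly = defaultdict(int)
for date, weekday, hour, branch, qty in rows:
    hourly[(branch, date, weekday, hour)] += qty

# Step 2 (outer stdevpOver): population std dev of those hourly totals
# across all dates sharing the same weekday and hour
groups = defaultdict(list)
for (branch, date, weekday, hour), total in hourly.items():
    groups[(branch, weekday, hour)].append(total)

stds = {key: pstdev(vals) for key, vals in groups.items()}
print(stds)  # -> {(1, 'Friday', 12): 32.5}
```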

How can I get the correct std to show up in the second table?

Hello @Ali_B, my apologies for the delayed response. I don’t have data that is exactly the same as what you are working with, but I was able to put together something close. I had to switch it to the month level and compare weekdays, rather than comparing hours within weekdays.

Here is an arena dashboard I created for us to test from: QS Experts, Why does the standard deviation function return zero here?

Make sure, when you run tests here, to publish your analysis as a dashboard and share the link with me. Once you publish the analysis, click this icon:

Select the switch next to the dashboard name you created and type confirm. Then you will be able to share the dashboard link with me.

I believe I have it returning values as expected. In both tables I am seeing matching values for the standard deviation of Fridays in March. Check out the calculated fields I wrote and implemented and let me know if you are able to resolve the issue.

Thank you @DylanM ,

I created a copy of your dashboard and uploaded my data. Check it here please: Standard Deviation Calculation

Please check the second tab

I’m unsure if I understood your std expression in the first tab, but it seems to return the wrong value. For example:

We have two Fridays in October for this customer, with totals of 5 and 70. If you calculate the standard deviation of the set {5, 70}, you get 32.5, not 14.95.
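[Editor's note] The arithmetic here checks out: the mean of {5, 70} is 37.5, and each value sits exactly 32.5 away from it, so the population standard deviation is 32.5:

```python
from statistics import pstdev

print(pstdev([5, 70]))  # -> 32.5
```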

But going back to my issue: the std2 expression in the second tab gives the correct value, but only when order_date is added to the field well. In the second table below it, you can see that std2 becomes 0.


Hello @Ali_B, I ran some more tests and I have good news and bad news. The good news is that I was able to validate the correct standard deviation values when comparing a specific weekday in a month across all years for that client. So, as above for AnyCompany Stores Intl (with a fix to the table display to show the proper quantity values), I am returning 17.68 for the values 5, 22, and 48.
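[Editor's note] That 17.68 figure is consistent with a population standard deviation over {5, 22, 48}: the mean is 25, the squared deviations are 400, 9, and 529, and the variance is 938/3 ≈ 312.67:

```python
from statistics import pstdev

print(round(pstdev([5, 22, 48]), 2))  # -> 17.68
```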

The calculation seems to work as expected.

Now, the issue is that for some reason it required switching to a POST_AGG_FILTER aggregation level (the calculated field is labeled with POST_AGG_FILTER in the name for your reference). This is a problem because it now requires all of the fields referenced in the calculated field to be used in the visual. That really expands the pivot table once the specific order date has been added.

For some reason I was able to retrieve the correct value when only comparing a specific week day across a single month in a year, but I am getting 14.5 instead of 17.68 when comparing across all Fridays for October in every year.

Here you can see my changes to the first sheet:
Standard Deviation Follow-up

Hello @Ali_B, did the updates I made to the dashboard I shared help resolve the issue? If so, please mark my response as the solution. Otherwise, let me know if you have any further questions. If we do not hear back from you in 3 days, I will close out this topic. Thank you!

Hi @DylanM

Unfortunately, this does not resolve my issue. In my case, I want the user to select any date range and see the sales and the standard deviation for every hour and weekday. The final table should have 24 rows (one for every hour) and 7 columns (one for every weekday).

I’m trying to understand whether I’m using the standard deviation function incorrectly or if there was a bug in it that has to be fixed by the QuickSight team.


Hello @Ali_B, I went through and tried to do some more testing, but I am not entirely sure what is causing the deviation between the PRE_AGG and POST_AGG_FILTER versions of the calculation. I suppose this could be some type of bug with the aggregation.

It may be worth submitting a support ticket on this issue:
Here are the steps for opening a support ticket: Creating support cases and case management - AWS Support. If your company has someone who manages your AWS account, you might not have direct access to AWS Support and will need to raise an internal ticket with your IT team or whoever manages your AWS account. They should be able to open an AWS Support case on your behalf.

Alternatively, it may be worth running this calculation in SQL when building out the dataset to work around the limitation we are hitting when using calculated fields.


Thank you @DylanM

I think I will open a support ticket with AWS.

Your suggestion to do this in the backend via SQL won’t work in this case, as I want to give the end user the ability to examine the standard deviation over any date period they select.


Hello @Ali_B, right, that makes sense. The standard deviation depends on how many values (for example, Sunday at hour 12) are present in the selected period, so it makes sense that moving to static SQL would complicate that. You could create a smaller dataset specific to this functionality and handle the date filtering in the dataset through Direct Query. That would allow the user to update the period they are viewing, which could then be passed as parameters into the SQL query to manage the WHERE clause.

That would be the only other option I can think of to handle this aggregation if the calculated fields are not working as expected.

I have already opened a ticket with AWS. Let’s wait and see what happens.


Hello @Ali_B, sounds great! I will tag this topic appropriately as well to provide visibility to our support team. I will archive this question.

If you want to follow-up on this after hearing back from the support team, please post a new question in the community and include a link to this topic to provide relevant information. Thank you!