Get count where number of occurrences = N

Say I have this dataset:

order_id,user_id,source
abc1, Dave, twitter
abc2, Dave, twitter
abc3, John, twitter
abc4, John, twitter
abc5, Sarah, facebook
abc6, Sarah, facebook
abc7, Kyle, tiktok
abc8, Seb, tiktok
abc9, Max, tiktok

I’d like to be able to get the number of users that have two orders and display that count against each source.

I’ve tried using countOver within an ifelse in a calculated field on the dataset:

ifelse(countOver({user_id}, [{user_id}], PRE_AGG) = 2, 1, 0)

I was hoping I could then display that in a table filtered on values that are 1:

Unfortunately the numbers aren’t what I expected, which would be this:

source	   usersWithTwoOrders
twitter	   2
facebook   1
tiktok	   0

I think I’m close, but I’m not sure what I’m doing wrong.

Hi, @jgmtsp. Welcome to the QuickSight Community, and thanks so much for posting your question! :slight_smile:

Due to the US holiday, our team will review this question on Monday. In the meantime, we encourage anyone form the Community to reply.

You are very close.

You need to count the users over the source.

ifelse(countOver({user_id}, [{source}], PRE_AGG) = 2, 1, 0)

Let me know if that new calculated field works

1 Like

Thanks for your response @Max. Unfortunately, that doesn’t seem to produce the correct results either:

I’m starting to wonder, big picture, if what I’m trying to do is even possible directly in Quicksight or if I need to do the work in a SQL query and pull that in as a dataset instead. Building on from my previous question, if I have this dataset (now with order_date and order_value columns):

order_date, order_id, order_value, user_id, source
2022-01-01, abc1, 9.99, Dave, twitter
2022-01-01, abc2, 9.99, Dave, twitter
2022-01-01, abc3, 9.99, John, twitter
2022-01-01, abc4, 19.99, John, twitter
2022-01-02, abc5, 19.99, Sarah, facebook
2022-01-02, abc6, 9.99, Sarah, facebook
2022-01-02, abc7, 9.99, Kyle, tiktok
2022-01-02, abc8, 29.99, Seb, tiktok
2022-01-02, abc9, 29.99, Max, tiktok

Is it possible in Quicksight to do the following calculation, over a given date period, and display it in a table against each source:

  1. Get average number of orders for users with at least 2 orders
  2. Get average value of those orders
  3. Multiply those two values together (and display against each source in a table)

ifelse(countOver({user_id}, [{source}], PRE_AGG) = 2, 1, 0)

I would sum this instead of count. Count will also include zeros.

image

Maybe you can try things like such.

  1. ifelse(countOver({session_id}, [{client_id[users]}], PRE_AGG) >= 2, countOver({order_id},[{source},{{user_id}],PRE_AGG), countOver(NULL,[{source},{{user_id}],PRE_AGG),PRE_AGG))

Filter out zeros ^ and average it.

  1. ifelse(countOver({session_id}, [{client_id[users]}], PRE_AGG) >= 2, avgOver({order_value},[{source},{{user_id}],PRE_AGG), avgOver(0,[{source},{{user_id}],PRE_AGG),PRE_AGG)))

Filter out zeros ^ and average it.

  1. Multiply the two fields.

Hi @Max, thanks again for your response. Unfortunately summing it just produces the same result:

ifelse(countOver({user_id}, [source], PRE_AGG) >= 2, 1, 0)

What are the session_id and {client_id[users]} parts in your solutions for steps 1 and 2? Are those typos?

Those are fields in my dataset that I am testing.

Could you try and sum on this calculated field?

ifelse(countOver({user_id}, [source,{user_id}], PRE_AGG) >= 2, 1/countOver({user_id}, [source,{user_id}], PRE_AGG), 0)

That worked! I see the 1 / ... division in the first branch of the ifelse did the trick, but how?

order_id,user_id,source, calculate_field
abc1, Dave, twitter, 1/2
abc2, Dave, twitter, 1/2
abc3, John, twitter, 1/2
abc4, John, twitter, 1/2
abc5, Sarah, facebook, 1/2
abc6, Sarah, facebook, 1/2
abc7, Kyle, tiktok, 0
abc8, Seb, tiktok, 0
abc9, Max, tiktok, 0

It will divide by the count so that when you sum it up it will be 1 for each user.

Ah yes, that makes sense, thank you.

Are you still able to help with those two other steps?

Did you try my solution? And if so what are the numbers that are off?

I couldn’t because of the session_id and {client_id[users]} values as they aren’t in my dataset. What should I replace those with?

Replace session_id with {user_id} and {client_id[users]} with {source}

I’m trying step one:

ifelse(countOver({user_id}, [{source}], PRE_AGG) >= 2, countOver({order_id},[{source},{user_id}],PRE_AGG), countOver(NULL,[{source},{user_id}],PRE_AGG),PRE_AGG)

But that’s giving me the following error:

The syntax of the calculated field expression is incorrect. Correct the syntax and choose Create again.

Ah, get rid of the pre_agg at the end

ifelse(countOver({user_id}, [{source}], PRE_AGG) >= 2, countOver({order_id},[{source},{user_id}],PRE_AGG), countOver(NULL,[{source},{user_id}],PRE_AGG))

Thanks, that fixed it. I’ve also updated the first countOver to take both source and user_id as partitions:

ifelse(countOver({user_id}, [source,{user_id}], PRE_AGG) >= 2, countOver({order_id},[source,{user_id}],PRE_AGG), countOver(NULL,[source,{user_id}],PRE_AGG))

And the results look correct! I’m going to continue playing about with this in the morning against my actual dataset and see if this does the trick. I really appreciate your help with this, thank you.

Hi @Max, I’ve been playing around with your solutions today and individually they work, however when I try to combine them into a single table I believe the usage of zeros are causing the average calculations to be inaccurate:

In order to include all of these in a single table, I can’t filter out zeros otherwise I’ll end up missing source columns for the multi or single order calculations. I tried replacing the use of zeros in the calculated fields with NULL so that they aren’t included in the calculations, however that causes some of the cells to be empty (see the “CLTV - ALL” table bottom left) and some of the calculations to be off.

Splitting them out into separate tables works fine because I can filter out the zeros (or NULLs), but when I then try to calculate the overall values in a separate table I run into the same issue with being unable filter.

Do you know what the best approach to solving this is?

Still stuck on this, so will focus in on the problem a bit in case @Max or anyone else is able to help.

Given the same dataset:

order_date,order_id,order_value,user_id,source
2022-01-01,abc1,9.99,Philip,twitter
2022-01-01,abc1,9.99,Dave,twitter
2022-01-01,abc2,9.99,Dave,twitter
2022-01-01,abc3,9.99,John,twitter
2022-01-01,abc4,19.99,John,twitter
2022-01-02,abc5,19.99,Sarah,facebook
2022-01-02,abc6,9.99,Sarah,facebook
2022-01-02,abc7,9.99,Kyle,tiktok
2022-01-02,abc8,29.99,Seb,tiktok
2022-01-02,abc9,29.99,Max,tiktok

I’m rendering this table:

Focusing just on the values pertaining to order volume, these are the calculations I’m doing:

numOfSingleOrders (Aggregate: Sum):

ifelse(countOver({user_id}, [source,{user_id}], PRE_AGG) = 1, 1, NULL)

numOfSingleOrderUsers (Aggregate: Sum):

ifelse(countOver({user_id}, [source,{user_id}], PRE_AGG) = 1, 1 / countOver({user_id}, [source,{user_id}], PRE_AGG), NULL)

avgNumOfSingleOrders (Aggregate: Average)::

ifelse(numOfSingleOrderUsers = 0, NULL, numOfSingleOrders / numOfSingleOrderUsers)

numOfMultiOrders (Aggregate: Sum):

ifelse(countOver({user_id}, [source,{user_id}], PRE_AGG) >= 2, 1, NULL)

numOfMultiOrderUsers (Aggregate: Sum):

ifelse(countOver({user_id}, [source,{user_id}], PRE_AGG) >= 2, 1 / countOver({user_id}, [source,{user_id}], PRE_AGG), NULL)

avgNumOfMultiOrders (Aggregate: Average):

ifelse(numOfMultiOrderUsers = 0, NULL, numOfMultiOrders / numOfMultiOrderUsers)

All of the above calculate correctly, but when I try to calculate the overall average number of orders (overallAvgNumOfOrders) using the following calculation I get no output:

(avgNumOfSingleOrders + avgNumOfMultiOrders) / 2

I’ve tried various different ways of calculating this value and aggregating it and I either get an incorrect result or no output (see screenshot). The value I’m expecting here is 1.5 ( (2 + 1) / 2 ).

What am I doing wrong?