Distinct Count from a Dataset with RIGHT JOIN

Hello everyone, relatively new to Quick Sight and having issues obtaining counts from given data sources and datasets. I appreciate any feedback or relevant links for guidance on the following scenario:

There are 2 tables in question, one is a list “Eligible Users” (users) and the other is an “Activity Log” (activities). The table “users” includes an ID, Email, and some Division/Department data which is irrelevant to the topic. The table “activities” does NOT include a dedicated ID for each record - there are only columns for type, user email, and datetime stamp. The data source is based upon the “users” table, which has a RIGHT JOIN upon the “activities” table, using the “user_email” as a key. This intent is to allow reporting off the entire activity log table, and only enriching the activity records which were generated by “Eligible Users” instead of “All Users”.

For reference, there are 350 total distinct user emails in the activity log table, but only 125 are in the users table relevant to the reporting, and only about 50 of those Eligible Users appear on records from the activity table. Conceptually - there are 350 historical users, 125 were trained on the system, but only 50 have adopted the tool since the training. This is the goal of the dashboards, tracking adoption over time (and ignoring users who are not on the eligible list).

Whenever I add visuals to the data set, and attempt distinct_counts of the Eligible Users from the “users” table, it seems to only be in the context of the activities table. I can get to my counts of 50 or 350, but I cannot achieve the count of 125. I have attempted a variety of approaches through calculated fields, and I have limited familiarity with Parameters. If anyone could point me in correct direction, or provide an example - it would be greatly appreciated!

Hi @jvetere

Based on your description there is no basis in the data for the 125 eligible users right? From a data perspective you either get the 350 from the users table or the 50 from the activity table based on data availability.

How do you define the 125 eligible users based on the data in the two tables?

Regards,
Giri

Thank you for the response, however I believe there is a misunderstanding. Let me clarify some points on the data:

  1. User table has 125 distinct user records. This reflects a subset of the broader organization, but it only includes records for 125 target users out of >350 total actual human users.

  2. Activity table has >500k records, with a column containing the “user_email” which serves as a Foreign Key for the RIGHT JOIN. However, if this table were not joined to the Users table, a query to SELECT DISTINCT user_email from the Activity table would return 350 unique “user_email” values. There is nearly triple the count of distinct user_email values on the Activity table, versus what could be found on the User table.

Maybe the answer is to catch the distinct count of users in a parameter, if that can be used in visuals with different data sources. When I run a distinct count in a calculated field leveraging the user_id from the user table (not the user_email from activity table), I get a count of 50 users, which reflects the 50 users from the user table, who also appear among the 350 users on the activity table.

I hope this provides some clarity. Again, I appreciate the assistance.

Hi @jvetere

I re-read your original post where you said the following:
The data source is based upon the “users” table, which has a RIGHT JOIN upon the “activities” table, using the “user_email” as a key.

Since you are tracking the valid user activities shouldn’t this be a left join?

  1. That way you can get your 125 distinct users from the users table
  2. sum(ifelse( isNull(“user email” (from activities table)), 0, 1) ) will give you the 50 you are looking for (the 50?)

Regards,
Giri

While I agree this would achieve the goal, it seems like it creates a new problem. There are other visuals in this dashboard collection which rely upon the RIGHT JOIN. These other visuals report off the entire pool of Activity records.

Perhaps the only solution is a new dataset with the same tables using a LEFT JOIN, and including it in the same dashboard collection on another tab. For some reason, I was hoping / assuming the tool would allow me to look at the individual tables in the dataset, almost as if they were being sub-queried. I may just need to adjust my approach - thank you for your response.