Multiple duplicate rows created when joining tables

Kindly answer this, because based on the answer we have to decide whether to move forward with building our dashboard in QuickSight or use another BI tool.

When I create a join between the tables, duplicate rows are created. Because of this, when I use the sum function on a field, the visual shows the wrong value: it takes the sum of all the duplicated rows.

For example, I have a "User" table with basic data like 'age', 'name', and 'email'. There are other tables such as "Food", which holds each day's fat, calories, and protein data for the user, and "Sleep", which holds each day's max, min, and resting heart rate. If there are N days of data, joining these tables creates N rows per user. That is not a problem by itself; I was able to handle it using a filter. [Rows are joined on UserId and Date for the intraday data.]
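To make the join behavior concrete, here is a minimal pandas sketch (illustrative only; the table and column names are stand-ins for the actual dataset) showing how joining a one-row-per-user table to an intraday table yields one row per user per day:

```python
import pandas as pd

# One row per user
users = pd.DataFrame({"user_id": [1], "name": ["Alice"], "age": [30]})

# One row per user per day (N = 3 days here)
food = pd.DataFrame({
    "user_id": [1, 1, 1],
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "protein": [50, 60, 55],
})

# Joining on user_id fans the single user row out to N rows
joined = users.merge(food, on="user_id", how="left")
print(len(joined))  # 3 rows: one per day, as described above
```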

The main problem: if a user has multiple health conditions and we join the User table to the health condition table, the rows are duplicated once for each condition.

Because of these duplicated rows, sums no longer work correctly. If I want to show the total protein a user consumed, the visual shows a higher multiple of the true total. In Power BI this error does not occur, because of relationships.
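The inflation described above can be reproduced in a small pandas sketch (hypothetical data; not the real schema): joining a two-condition table multiplies each daily row by two, so the sum doubles unless the duplicates are collapsed first.

```python
import pandas as pd

# One row per user per day
daily = pd.DataFrame({
    "user_id": [1, 1],
    "date": ["2024-01-01", "2024-01-02"],
    "protein": [50, 60],
})

# One row per health condition for the same user
conditions = pd.DataFrame({
    "user_id": [1, 1],
    "condition": ["diabetes", "hypertension"],
})

joined = daily.merge(conditions, on="user_id")
print(joined["protein"].sum())   # 220: inflated (2 conditions x 110)

# Collapsing back to one row per user/date restores the correct total
dedup = joined.drop_duplicates(subset=["user_id", "date"])
print(dedup["protein"].sum())    # 110: correct
```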

So please clarify: can QuickSight handle complex data with multiple joins, and how do I get a correct sum of fields? I have added the data model in the image below.

Hi @AjayV,
There are a few options we could explore here to alleviate this situation.

First off though, regarding your health condition table: is this just a list of conditions for each client, or does it also have unique data attached to each condition that needs to be accounted for (like protein, etc.)?
If it's more of just a list of conditions, instead of joining the health condition table to your others, why not just import 2 separate datasets into your analysis?

Another way you can handle these duplicates is in your sum calculations: you can nest a maxOver or minOver calculation within a sumOver to remove the duplicate measures from the calculation.


I am new to QuickSight, so I don't know how it works. I worked with Power BI earlier, where I handled data using relationships, so my analysis may be wrong; kindly guide me.

Let's take a Company table instead of the condition table. The Company table has two columns, "company id" and "company name", for different companies like "company 1", "company 2", "company 3", and "company 4"; it is a static table. The user_company table has two columns, "user id" and "company id". I will reduce these two tables to one table with two columns, "user id" and "company name", and consider it dataset 2.
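That reduction is a single join followed by a column selection. A minimal pandas sketch of the step (with made-up IDs):

```python
import pandas as pd

# Static lookup table: company id -> company name
company = pd.DataFrame({
    "company_id": [1, 2],
    "company_name": ["company 1", "company 2"],
})

# Bridge table: which user belongs to which company
user_company = pd.DataFrame({
    "user_id": [10, 11],
    "company_id": [1, 2],
})

# Reduce the two tables to one: user_id -> company_name ("dataset 2")
dataset2 = user_company.merge(company, on="company_id")[["user_id", "company_name"]]
print(dataset2)
```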

Now I have the main dataset, in which the User table is left joined with the sleep, food, heart rate, and activity tables, which contain patient intraday data (one row per day per user); consider it dataset 1. Should I join the two datasets, or store each as a separate dataset and use a cross-dataset filter?

If they are kept as separate datasets, the main concern is that I have to apply row-level security on the User table based on the "company name" column. If I apply RLS on dataset 2, which has the company column, how will it restrict which users are shown in dataset 1?