Multiple duplicate rows created when joining tables

Kindly answer this, because based on the answer we have to decide whether to move forward with building our dashboard in QuickSight or use another BI tool.

When I create a join between the tables, duplicate rows are created. Because of this, when I use the sum function on a field, the visual shows the wrong value: it takes the sum of all the duplicated rows.

For example, I have a "User" table with basic data like 'age', 'name', and 'email'. There are other tables such as "Food", which holds each day's fat, calories, and protein data for the user, and "Sleep", which holds each day's max, min, and resting heart rate. If there are N days of data, joining these tables creates N rows per user. That is not a problem by itself; I was able to handle it using a filter. [Rows are joined on UserId and Date for the intraday data.]
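To make the join behavior concrete, here is a minimal pandas sketch (illustrative only; the table and column names are stand-ins for the actual dataset) showing how joining a one-row-per-user table to an intraday table yields one row per user per day:

```python
import pandas as pd

# One row per user
users = pd.DataFrame({"user_id": [1], "name": ["Alice"], "age": [30]})

# One row per user per day (N = 3 days here)
food = pd.DataFrame({
    "user_id": [1, 1, 1],
    "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "protein": [50, 60, 55],
})

# Joining on user_id fans the single user row out to N rows
joined = users.merge(food, on="user_id", how="left")
print(len(joined))  # 3 rows: one per day, as described above
```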

The main problem: if a user has multiple health conditions and we join the User table to the health condition table, the rows are duplicated once for each condition.

Because of these duplicated rows, sums no longer work correctly. If I want to show the total protein a user consumed, the visual shows a higher multiple of the true total. In Power BI this error does not occur, because of relationships.
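The inflation described above can be reproduced in a small pandas sketch (hypothetical data; not the real schema): joining a two-condition table multiplies each daily row by two, so the sum doubles unless the duplicates are collapsed first.

```python
import pandas as pd

# One row per user per day
daily = pd.DataFrame({
    "user_id": [1, 1],
    "date": ["2024-01-01", "2024-01-02"],
    "protein": [50, 60],
})

# One row per health condition for the same user
conditions = pd.DataFrame({
    "user_id": [1, 1],
    "condition": ["diabetes", "hypertension"],
})

joined = daily.merge(conditions, on="user_id")
print(joined["protein"].sum())   # 220: inflated (2 conditions x 110)

# Collapsing back to one row per user/date restores the correct total
dedup = joined.drop_duplicates(subset=["user_id", "date"])
print(dedup["protein"].sum())    # 110: correct
```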

So please clarify: can QuickSight handle complex data with multiple joins, and how do I get a correct sum of fields? I have added the data model in the image below.

Hi @AjayV,
There are a few options we could explore here to alleviate this situation.

First off though, regarding your health condition table: is this just a list of conditions for each client, or does it also have unique data attached to each condition that needs to be accounted for (like protein, etc.)?
If it's more of just a list of conditions, instead of joining the health condition table to your others, why not just import 2 separate datasets into your analysis?

Another way you can handle these duplicates is in your sum calculations: you can nest a maxOver or minOver calculation within a sumOver to remove the duplicate measures from the calculation.


I am new to QuickSight, so I don't know how it works. I worked with Power BI earlier, where I handled data using relationships, so my analysis may be wrong; kindly guide me.

Let's take a Company table instead of the condition table. The Company table has two columns, "company id" and "company name", for different companies like "company 1", "company 2", "company 3", and "company 4"; it is a static table. The user_company table has two columns, "user id" and "company id". I will reduce these two tables to one table with two columns, "user id" and "company name", and consider it dataset 2.
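That reduction is a single join followed by a column selection. A minimal pandas sketch of the step (with made-up IDs):

```python
import pandas as pd

# Static lookup table: company id -> company name
company = pd.DataFrame({
    "company_id": [1, 2],
    "company_name": ["company 1", "company 2"],
})

# Bridge table: which user belongs to which company
user_company = pd.DataFrame({
    "user_id": [10, 11],
    "company_id": [1, 2],
})

# Reduce the two tables to one: user_id -> company_name ("dataset 2")
dataset2 = user_company.merge(company, on="company_id")[["user_id", "company_name"]]
print(dataset2)
```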

Now I have the main dataset, in which the User table is left joined with the sleep, food, heart rate, and activity tables, which contain patient intraday data (one row per day per user); consider it dataset 1. Should I join the two datasets, or store each as a separate dataset and use a cross-dataset filter?

If they are kept as separate datasets, the main concern is that I have to apply row-level security on the User table based on the "company name" column. If I apply RLS on dataset 2, which has the company column, how will it restrict which users are shown in dataset 1?