Joining Data -- Resultant Dataset Issues

amc_5 · June 26, 2023, 10:50am

Hi,

I am trying to join two .xlsx files, which I've uploaded as datasets in AWS QuickSight, on a common column. One file contains cases (all unique and identifiable through the alm_case_no column), and the second file contains people info (the people associated with a case; since a single case can have multiple people, the alm_case_no column contains duplicates). I am using the alm_case_no column from both files as the join key.

However, after joining the files, the resulting dataset only returns rows from the people info file no matter what type of join I use.

Additionally, if I use two columns from the Cases file in a visual, the resulting numbers are not valid and are way off from what I get when I analyze the same data in Excel or Power BI. My assumption is that, since both columns come from the same file, the join should be irrelevant here.

One last thing: if I import both files into Power BI, I form a relationship between them through the alm_case_no column and can then reference columns from both files in whatever visual I create. The resulting numbers also validate. I am assuming this is how joins should work in AWS QuickSight as well.

Any help with the above problems would be appreciated. Thanks!

Sanjeeb2022 · June 26, 2023, 11:54am

Hi @amc_5 - Welcome to the AWS QuickSight community, and thanks for posting the question. Can you please share sample input records for both datasets, along with the join key? This will help in replicating the issue and providing the right solution.

Note - Please do not expose any PII.

Regards -Sanjeeb

amc_5 · June 26, 2023, 12:31pm

Hi @Sanjeeb2022,

Here’s some sample data in a single file.

If you compare, say, a horizontal bar chart of Case_type vs. Award in QuickSight against a pivot table built from the Excel file, you'll see that the aggregated values differ.
You can try different join types as well; the issue remains the same.

amc_5 · July 1, 2023, 4:44pm

@Sanjeeb2022 Were you able to replicate the issue?

Sanjeeb2022 · July 1, 2023, 6:13pm

Hi @amc_5 - Not yet. I will spend some time on this next week. Apologies for the delay.
Hi @Biswajit_1993 - Can you also have a look at this problem statement?

Regards - Sanjeeb

amc_5 · July 2, 2023, 2:43pm

@Sanjeeb2022 Sure, looking forward to it.

Hi @Biswajit_1993 Could you please have a look?

Biswajit_1993 · July 3, 2023, 11:00am

Hi @amc_5,
Thanks for posting your query in the community.
No problem, I am going through it and will get back to you with any findings.

Thanks & Regards
Biswajit Dash

Biswajit_1993 · July 4, 2023, 8:54am

Hi @amc_5, I checked by creating two datasets, cases and people_info, from the data you shared. I then joined the two datasets, and on my end I see exactly 104 rows, which matches the row count of your people_info data.

Please find below the screenshots of the joined datasets and the record count.

Left Outer Join between cases with people_info

Result KPI Chart
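
A row count equal to that of people_info is what a one-to-many left join would be expected to produce when every case has at least one person. The following is a minimal sketch of that behaviour in plain Python, not the QuickSight implementation; the rows, the `person` column, and the `left_join` helper are hypothetical and only stand in for the shared sample data.

```python
# Hypothetical sample rows; the real files from this thread are not reproduced here.
cases = [
    {"alm_case_no": "C1", "case_type": "Civil", "award": 100},
    {"alm_case_no": "C2", "case_type": "Civil", "award": 50},
    {"alm_case_no": "C3", "case_type": "Labor", "award": 200},
]
people_info = [
    {"alm_case_no": "C1", "person": "P1"},
    {"alm_case_no": "C1", "person": "P2"},
    {"alm_case_no": "C1", "person": "P3"},
    {"alm_case_no": "C2", "person": "P4"},
    {"alm_case_no": "C3", "person": "P5"},
    {"alm_case_no": "C3", "person": "P6"},
]


def left_join(left, right, key):
    """Left outer join: each left row is emitted once per matching right row."""
    joined = []
    for left_row in left:
        matches = [r for r in right if r[key] == left_row[key]]
        if not matches:
            # Keep the unmatched left row, with the right-hand column empty.
            joined.append({**left_row, "person": None})
        for right_row in matches:
            joined.append({**left_row, **right_row})
    return joined


joined = left_join(cases, people_info, "alm_case_no")

# Every case has at least one person, so the result has one row per person.
print(len(cases), len(people_info), len(joined))
```

In this sketch the case-level columns (case_type, award) are copied onto every person row, so the joined result is at the person grain even though those columns came from the cases file.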

Thanks & Regards
Biswajit Dash

amc_5 · July 4, 2023, 6:25pm

Thanks for your reply @Biswajit_1993

Actually, I am used to working with Power BI and am not able to replicate the same behaviour here.

When I connect two tables in Power BI, it creates a relationship between them and then allows cross-referencing, but it also lets me use the individual tables in visuals as if no relationship existed between them (even when the relationship is active).

What is happening in AWS QuickSight is that it creates a new (resultant) dataset based on the specified join condition. That gives me the ability to cross-reference, but at the same time it confines me entirely to the resultant dataset instead of letting me use the individual datasets as well.

Yes, the count is 104, but see the 2nd image. The Case_type and Award columns are both present in the casesFF file, yet the resulting figures are not accurate. If I were to present this to someone, it wouldn't make sense.

I hope you are getting my point!

amc_5 · July 4, 2023, 6:26pm

2nd image:

Ramon_Lopez · July 7, 2023, 10:03pm

Hi @amc_5

I believe this is something that occurs naturally when we do joins in data prep. It can be mitigated at the formula/calculated field level by creating Level Aware Calculations (LAC) and applying the appropriate group-bys. There is more info about LAC-A and LAC-W in this blog post.
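
To illustrate why the join changes the totals even for columns that come from only one file, here is a minimal sketch in plain Python. It is not QuickSight code; the rows and the `people_per_case` counts are hypothetical, and it assumes the award is a case-level value that gets repeated on every person row after the join.

```python
from collections import defaultdict

# Hypothetical case-level rows: (alm_case_no, case_type, award).
cases = [("C1", "Civil", 100), ("C2", "Civil", 50), ("C3", "Labor", 200)]

# Hypothetical number of people attached to each case.
people_per_case = {"C1": 3, "C2": 1, "C3": 2}

# After a one-to-many join, each case row appears once per person.
joined = [row for row in cases for _ in range(people_per_case[row[0]])]


def sum_award_by_case_type(rows):
    """Plain sum of award per case_type over whatever rows are passed in."""
    totals = defaultdict(int)
    for _case_no, case_type, award in rows:
        totals[case_type] += award
    return dict(totals)


print(sum_award_by_case_type(cases))   # {'Civil': 150, 'Labor': 200}
print(sum_award_by_case_type(joined))  # {'Civil': 350, 'Labor': 400}
```

The second result counts each award once per person on the case, which is the kind of inflation a level-aware calculation is meant to undo.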

amc_5 · July 10, 2023, 11:45am

Hi @Ramon_Lopez,

The closest I've come to solving this issue (i.e., the sum of award by case_type over unique alm_case_no values, since alm_case_no is duplicated in the data) is:

sumOver(max({AWARD}),[{CASE_TYPE}])

but this only returns the sum of the max values. Could you please help me further here?

Thanks!

Ramon_Lopez · July 10, 2023, 7:30pm

hi @amc_5

I would think that something like this would do the trick.

max(award,[{Case_Type},{alm_case_no}])
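
As a rough sketch of the idea behind this expression (plain Python, not QuickSight code): take the max award within each (Case_Type, alm_case_no) group so that each case contributes one value, then let the visual sum those values per Case_Type. The rows are hypothetical, and the sketch assumes the visual applies a Sum aggregation to the calculated field and that the award is the same on every row of a case.

```python
from collections import defaultdict

# Hypothetical joined rows, one per person: (alm_case_no, case_type, award).
joined = [
    ("C1", "Civil", 100), ("C1", "Civil", 100), ("C1", "Civil", 100),
    ("C2", "Civil", 50),
    ("C3", "Labor", 200), ("C3", "Labor", 200),
]

# Step 1: max award per (case_type, alm_case_no), one value per case.
max_per_case = {}
for case_no, case_type, award in joined:
    key = (case_type, case_no)
    max_per_case[key] = max(award, max_per_case.get(key, award))

# Step 2: sum the per-case values by case_type (the visual's aggregation).
sum_by_type = defaultdict(int)
for (case_type, _case_no), award in max_per_case.items():
    sum_by_type[case_type] += award

# For contrast: taking the max at the case_type level only keeps a single
# award per case type, which is why it falls short of the per-case total.
max_by_type = {}
for _case_no, case_type, award in joined:
    max_by_type[case_type] = max(award, max_by_type.get(case_type, award))

print(dict(sum_by_type))  # {'Civil': 150, 'Labor': 200}
print(max_by_type)        # {'Civil': 100, 'Labor': 200}
```

Grouping by alm_case_no in the first step is what removes the duplication introduced by the join.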

Please try it out and let me know.

thanks!

amc_5 · July 13, 2023, 6:30am

It works. Thanks @Ramon_Lopez!
