Removing Duplicates after joining three data sources

I have 3 datasets that are joined: the first dataset which lists each product once and other product information, the second is left joined and contains additional data and lists each product multiple times, and the 3rd is sales data which also is left joined to the first dataset and lists each product multiple times. I want to sum sales by product but when I do this the sales is multiplied by the number of times the product is listed on the additional data. I tried calculated fields to get an accurate sales number but they didn’t work: I tried sum(sales)/ count(product) and I tried using sum over and max over preagg. How can I accurately remove duplicates?

Hi @Sophia_Koch,

Glad to see you back in the community! The issue here is a fan-out problem caused by how the datasets are joined. When you join Dataset 1 to both Dataset 2 and Dataset 3, and both of those datasets have multiple rows per product, the rows multiply against each other, inflating your sales totals.

You can try a calculated field to work around this, but it probably won’t give you a reliable result. When the duplication is coming from two sides of the join at the same time, calculated fields have a hard time undoing that multiplication cleanly.

I believe the real solution is most likely going to come from restructuring your datasets.

My initial thoughts on how to approach this:

  • Left join Dataset 1 to Dataset 3

  • Left join Dataset 2 to Dataset 3

This way your joins flow in one direction and you avoid the row explosion entirely. Give that structure a try and let me know what output you get.

Hope this helps.

Hi Cesar,

Thanks for the response. Is this only possible on the new data preparation experience? I am currently using legacy experience as one of the datasets was prepared using the legacy experience. I am only getting the option to join dataset 3 to dataset 1.

Hi @Sophia_Koch,

It is to my knowledge that you cannot join datasets if they are from different data prep experiences. I would recommend creating a brand-new dataset from scratch.

  • Go to Datasets and click New Dataset
  • Start with your sales data as the base table
  • Left join your product info table to it (one row per product, so no duplication)
  • Left join your additional data table separately if needed

Additionally, depending on your scenario you could select “Use custom SQL” during the dataset configuration screen. Write a standard query that establishes Dataset 3 as your primary FROM clause, then left join the other tables to it. I believe this community post might be helpful for this case: Can I join multiple tables in Quicksight?

Feel free to reply with any updates!

Hi @Sophia_Koch,

Just checking back in since this thread hasn’t received a response in a while. Was Cesar’s reply helpful to you and/or were you able to find a solution yourself in the meantime? Please help the community by marking this answer as “Solution” or following up in general within the next 3 business days!

Thanks!

Still working on this, will follow up. Thank you.