Hi, I have a dataset in Athena and I am trying to add a left join through Quicksight to a csv upload. The csv upload is only 3 columns, which are date, fascia and orders. The data in the upload is correct in Quicksight, I then left join on date and fascia which both appear in my main dataset in Athena, I have also ensured the format of the date and fascia are exactly the same in both sets of data. But as soon as I perform the join the number of orders returns in the millions rather than actual numbers that are in the thousands. I have checked for duplicates in both datasets and there aren’t any. Does any one have any suggestions on where I am going wrong?
Hello @tom.rogers, welcome to the QuickSight community!
If you are going to join 2 datasets together, you would really need to join them on some kind of ID field that matches between them. Joining with something like a date seems a bit risky and could definitely cause duplications. Something like a user ID, order ID, or another field that is unique is important. Otherwise, any row that has a matching date and fascia field could be duplicated. Let me know if this helps!
Hi,
I have tried to create a composite key that combines the date and fascia and have added it to both datasets. Now when I view the data I have uploaded via csv the data the numbers are correct, but as soon as I join to my Athena dataset via the composite key, the numbers are being massively inflated. I have checked for any duplicates and there aren’t any.
Regards
Tom
Hello @tom.rogers, are you aggregating any of the data in Athena before bringing it into QuickSight? Like running any sums or counts on any of the fields?
Also, how are you creating the join? If you use the Athena dataset as your base dataset, then left join the CSV on the composite key, maybe that would work.
Hi Dylan,
I am using the Athena dataset as the base, and then left join to the CSV. The composite key from the CSV then returns as null and the order numbers return as blanks. I tried an inner join and the composite key data from the csv returns correctly, but then the order numbers are then all out of sync (increase substantially). I do have a calculated field with the Athena dataset that organizes the applicable queues and tags into the fascia’s, Would I be right in thinking this may be easier if I add the order data to an S3 bucket instead of using the CSV or is the calculated field within my base dataset in Athena that is causing the issue?
Hello @tom.rogers, it is hard to know for sure what the issue is without directly seeing the datasets you are merging, but I am also not sure pulling in the data from the S3 bucket would make much of a difference. If you were able to bring the data into Athena and merge them in an Athena view then I think that would be the best way to ensure the join is working as expected.
As for a possible fix, elimintation the calculated fields on the dataset level for the Athena dataset and trying the merge afterwards may help resolve the error and would be worth testing. It may help me debug this issue if you are able to send an anonymized screenshot of the join in the QuickSight console as well. Thank you!
Hi @tom.rogers - Can you also check whether you have any NULL data in the joining key, looks like it is not doing left join rather than doing a cartesian join.
Regards - Sanjeeb
Hello @tom.rogers, do you need any further guidance to help resolve this issue you are facing in QuickSight? Please let us know if you were able to find a solution, or mark one of the above responses as the solution if it helped you fix this issue.
If we do not hear back from you in 3 days, I will archive this topic. Thank you!