2 Fact tables in one dataset

Hello,
I’m struggling to understand how to combine two fact tables with different granularity in one dataset.
In my experience, best practice is not to connect fact tables to each other, only to dimension tables.
How can I build a proper dataset when all connections are made with joins, which means the result is one big table? How can I implement a star schema approach?

Thanks in advance,
Mia


Hi @miap - Thanks for the question. This is a common problem in data warehousing. To do efficient data modelling, we need to understand the granularity and other details. In QS, you can always do the join and build a star model. However, before doing it, please analyze the data; you may need to model facts and dimensions separately, bring them into QS, and join on the right key so that you can develop the report.

Regards - Sanjeeb


Sorry, I don’t understand.
Traditional BI involves manipulating data and creating a data model with relationships between fact and dimension tables.
The QS editor only allows joins between tables, the same as executing a SQL query, which means that if I have more than one fact table with different levels of granularity, some of my values will certainly be duplicated, assuming I want to maintain the same level of granularity for the reports.
What am I missing?
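To make the fan-out Mia describes concrete, here is a minimal plain-Python sketch (with made-up session and waste data, not taken from the thread) of what a SQL-style left join between a session-level fact and a finer-grained fact does to the coarser measure:

```python
# Hypothetical coarse fact: one row per session.
sessions = [
    {"session_id": 1, "sowing_amount": 100},
    {"session_id": 2, "sowing_amount": 50},
]
# Hypothetical fine-grained fact: many rows per session.
waste = [
    {"session_id": 1, "stage": "seedling", "waste": 5},
    {"session_id": 1, "stage": "harvest", "waste": 8},
    {"session_id": 2, "stage": "harvest", "waste": 2},
]

# A left join repeats each session row once per matching waste row.
joined = [
    {**s, **w}
    for s in sessions
    for w in waste
    if w["session_id"] == s["session_id"]
]

# Naively summing the session-level measure now double-counts it:
total = sum(row["sowing_amount"] for row in joined)
print(total)  # 250, not the true 150
```

The true total sowing amount is 150, but the join produced two copies of session 1's row, so a plain sum reports 250.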

Hi @miap - Yes, you are correct. QS will generate the SQL as per the join and you will observe the duplicates. You have to manage this outside of QS.

Tagging @David_Wong for his advice as well.

Regards - Sanjeeb


Hi @miap,

Your understanding is correct. Relationships are not supported in QuickSight. You have to join the tables. If the joins create duplicates in your dataset, you’ll have to use level-aware calculations to handle them.
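As a rough illustration of what a level-aware min/max does (simulated here in plain Python with hypothetical data, not actual QuickSight syntax): the measure is taken once per session rather than once per joined row, so the duplicates collapse while the fine-grained rows stay intact.

```python
# Joined rows after a one-to-many join (hypothetical data):
joined = [
    {"session_id": 1, "sowing_amount": 100, "waste": 5},
    {"session_id": 1, "sowing_amount": 100, "waste": 8},
    {"session_id": 2, "sowing_amount": 50, "waste": 2},
]

# Simulate min({sowing_amount}) computed at the session level:
per_session = {}
for row in joined:
    sid = row["session_id"]
    per_session[sid] = min(per_session.get(sid, row["sowing_amount"]),
                           row["sowing_amount"])

total_sowing = sum(per_session.values())        # 150 - duplicates collapsed
total_waste = sum(r["waste"] for r in joined)   # 15  - detail preserved
```

Since the duplicated values are identical copies, min and max return the same (correct) per-session amount.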

Take a look at use case #3 in this article:


Thank you very much for the article.
If I understand correctly, to solve the duplicate values caused by the join (since there is no better option), the solution is to use an aggregate function like min/max on the duplicated fields?

@miap - You are right. As @David_Wong alluded to, you need to handle this at the calculation level using level-aware calculations. Sharing a similar post thread for your reference. Hope this helps!


@sagmukhe thanks.
In my opinion, it’s not really a sustainable solution if the dataset grows and contains multiple fact tables.


Here is an example where the min/max solution won’t help:

I have an ‘Agricultural task’ table with different types of agricultural tasks, and I have a ‘Waste’ table.
The relationship is one-to-many, meaning that one growing ‘session’ can have many ‘waste’ rows according to the growing ‘stage’.
I can’t use a min/max function to solve the duplicate values because I will lose the waste data.
What can I do in this case?

Thanks.
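To restate the concern in code (hypothetical data again): if instead of a min/max calculation you tried to deduplicate the joined rows down to one per session, the sowing total would come out right but the waste detail would be thrown away.

```python
# Joined rows: session-level measure repeated across waste rows.
joined = [
    {"session_id": 1, "sowing_amount": 100, "stage": "seedling", "waste": 5},
    {"session_id": 1, "sowing_amount": 100, "stage": "harvest", "waste": 8},
    {"session_id": 2, "sowing_amount": 50, "stage": "harvest", "waste": 2},
]

# Keep only one row per session (e.g. the first seen):
seen, deduped = set(), []
for row in joined:
    if row["session_id"] not in seen:
        seen.add(row["session_id"])
        deduped.append(row)

sum_sowing = sum(r["sowing_amount"] for r in deduped)  # 150 - correct
sum_waste = sum(r["waste"] for r in deduped)           # 7   - waste rows lost
```

This is why the fix has to be per-measure (a level-aware calculation on sowing_amount) rather than per-row deduplication of the whole dataset.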

@miap - Sorry, it was a bit difficult for me to understand your use case. If you can provide the following details, I can try to replicate it at my end:

  1. Agricultural Task Table example data
  2. Waste Table example data (related with your Agricultural Task Table data)
  3. Joining condition between the above 2 tables
  4. Expected Table Visual Output
  5. Current Challenge that you are facing

Please let me know in case of any queries. Thank you!

  1. Agricultural Task Table:
  2. Waste Table:
  3. Left join on the ‘SessionID’ field.
  4. All kinds of visuals.
  5. Current challenge: how to avoid duplication of ‘sowing_amount’ and ‘planting_amount’ values.

Thanks.

@miap - Thanks for the response. The sowing amount and planting amount are present at the session-level granularity, and there is no information on how they should be broken down to the waste level, since that information is unavailable in your tables / data model. Hence, I don’t see any issues here.

Please provide a snapshot of your expected visual output with the “expected/correct” values for these 2 measures i.e. sowing amount and planting amount that you would like to see based on the information available in these 2 table data snapshots that you have shared. That would help to understand the problem that you are currently having. Thank you!

We are looking forward to a native solution. Quicksight is really powerful but the lack of table relationships is making our reporting really difficult.

Mia, I can see what you’re saying. In order for you to always display the correct sowing_amount and planting_amount together with data from the Waste table, you would need to create an LAC for both sowing_amount and planting_amount using avgOver, e.g. avgOver(sowing_amount, [SessionId], PRE_FILTER). This way the amount will not get doubled when reported with the Waste table. Please let us know if this solves your concern. Thanks!
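A plain-Python sketch of what an avgOver-style calculation partitioned by SessionId does to the joined rows (hypothetical data; QuickSight evaluates this per row in the dataset, this is only a simulation of the idea):

```python
from collections import defaultdict

# Joined rows; sowing_amount comes from the session-level table,
# so it repeats identically within a session (hypothetical data):
joined = [
    {"session_id": 1, "sowing_amount": 100, "waste": 5},
    {"session_id": 1, "sowing_amount": 100, "waste": 8},
    {"session_id": 2, "sowing_amount": 50, "waste": 2},
]

# avgOver-style: average the measure over rows partitioned by session,
# and attach the result to every row.
sums, counts = defaultdict(float), defaultdict(int)
for row in joined:
    sums[row["session_id"]] += row["sowing_amount"]
    counts[row["session_id"]] += 1
for row in joined:
    row["sowing_avg"] = sums[row["session_id"]] / counts[row["session_id"]]

# Because the measure is constant within a session, the per-row average
# equals the true session amount no matter how many waste rows the join
# produced. Reporting at the session level:
per_session = {sid: sums[sid] / counts[sid] for sid in sums}
grand_total = sum(per_session.values())
print(per_session, grand_total)  # {1: 100.0, 2: 50.0} 150.0
```

The key point is that the averaged field carries the undoubled session value on every row, so session-level visuals show the correct amounts while the waste rows remain available at their own granularity.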

Hey @bennygene,
Thanks for your solution. Is it supposed to be a calculated field in the dataset?


@miap - I tested it at the analysis level. Normally you could do this at the dataset (semantic layer), but since it’s an average calc it makes more sense at the reporting layer. You can also test whether QS allows it at the dataset level. Please let me know. Thanks!