2 Fact tables in one dataset

Hello,
I’m struggling to understand how to combine two fact tables with different granularity in one dataset.
In my experience, best practice is not to connect fact tables to each other, only to dimension tables.
How can I build a proper dataset when all connections are made with joins, which means the result is one big table? How can I implement a star schema approach?

Thanks in advance,
Mia


Hi @miap - Thanks for the question. This is a common problem in data warehousing. To do efficient data modelling, we need to understand the granularity and other details. In QS, you can always do the join and build a star model. However, before doing it, please analyze the data; you may need to model facts and dimensions separately, bring them into QS, and join on the right key so that you can develop the report.

Regards - Sanjeeb


Sorry, I don’t understand.
Traditional BI involves manipulating data and creating a data model with relationships between fact and dimension tables.
The QS editor only allows joins between tables, the same as executing a SQL query, which means that if I have more than one fact table with different levels of granularity, some of my values will certainly be duplicated, assuming I want to maintain the same level of granularity for the reports.
What am I missing?
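To make the fan-out Mia describes concrete, here is a minimal plain-Python sketch (with made-up session and waste data, not taken from the thread) of what a SQL-style left join between a session-level fact and a finer-grained fact does to the coarser measure:

```python
# Hypothetical coarse fact: one row per session.
sessions = [
    {"session_id": 1, "sowing_amount": 100},
    {"session_id": 2, "sowing_amount": 50},
]
# Hypothetical fine-grained fact: many rows per session.
waste = [
    {"session_id": 1, "stage": "seedling", "waste": 5},
    {"session_id": 1, "stage": "harvest", "waste": 8},
    {"session_id": 2, "stage": "harvest", "waste": 2},
]

# A left join repeats each session row once per matching waste row.
joined = [
    {**s, **w}
    for s in sessions
    for w in waste
    if w["session_id"] == s["session_id"]
]

# Naively summing the session-level measure now double-counts it:
total = sum(row["sowing_amount"] for row in joined)
print(total)  # 250, not the true 150
```

The true total sowing amount is 150, but the join produced two copies of session 1's row, so a plain sum reports 250.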

Hi @miap - Yes, you are correct. QS will generate the SQL as per the join and you will observe the duplicates. You have to manage this outside of QS.

Tagging @David_Wong for his advice as well.

Regards - Sanjeeb


Hi @miap,

Your understanding is correct. Relationships are not supported in QuickSight. You have to join the tables. If the joins create duplicates in your dataset, you’ll have to use level-aware calculations to handle them.
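As a rough illustration of what a level-aware min/max does (simulated here in plain Python with hypothetical data, not actual QuickSight syntax): the measure is taken once per session rather than once per joined row, so the duplicates collapse while the fine-grained rows stay intact.

```python
# Joined rows after a one-to-many join (hypothetical data):
joined = [
    {"session_id": 1, "sowing_amount": 100, "waste": 5},
    {"session_id": 1, "sowing_amount": 100, "waste": 8},
    {"session_id": 2, "sowing_amount": 50, "waste": 2},
]

# Simulate min({sowing_amount}) computed at the session level:
per_session = {}
for row in joined:
    sid = row["session_id"]
    per_session[sid] = min(per_session.get(sid, row["sowing_amount"]),
                           row["sowing_amount"])

total_sowing = sum(per_session.values())        # 150 - duplicates collapsed
total_waste = sum(r["waste"] for r in joined)   # 15  - detail preserved
```

Since the duplicated values are identical copies, min and max return the same (correct) per-session amount.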

Take a look at use case #3 in this article:


Thank you very much for the article.
If I understand correctly, to solve the duplicate values caused by the join (since there is no better option), the solution is to use an aggregate function like min/max on the duplicated fields?

@miap - You are right. As @David_Wong alluded to, you need to handle this at the calculation level using level-aware calculations. Sharing a similar post thread for your reference. Hope this helps!


@sagmukhe thanks.
In my opinion, it’s not really a sustainable solution if the dataset grows and contains multiple fact tables.


Here is an example where the min/max solution won’t help:

I have an ‘Agricultural task’ table with different types of agricultural tasks, and I have a ‘Waste’ table.
The relationship is one-to-many, meaning that one growing ‘session’ can have many ‘waste’ rows according to the growing ‘stage’.
I can’t use a min/max function to solve the duplicate values because I will lose the waste data.
What can I do in this case?

Thanks.
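To restate the concern in code (hypothetical data again): if instead of a min/max calculation you tried to deduplicate the joined rows down to one per session, the sowing total would come out right but the waste detail would be thrown away.

```python
# Joined rows: session-level measure repeated across waste rows.
joined = [
    {"session_id": 1, "sowing_amount": 100, "stage": "seedling", "waste": 5},
    {"session_id": 1, "sowing_amount": 100, "stage": "harvest", "waste": 8},
    {"session_id": 2, "sowing_amount": 50, "stage": "harvest", "waste": 2},
]

# Keep only one row per session (e.g. the first seen):
seen, deduped = set(), []
for row in joined:
    if row["session_id"] not in seen:
        seen.add(row["session_id"])
        deduped.append(row)

sum_sowing = sum(r["sowing_amount"] for r in deduped)  # 150 - correct
sum_waste = sum(r["waste"] for r in deduped)           # 7   - waste rows lost
```

This is why the fix has to be per-measure (a level-aware calculation on sowing_amount) rather than per-row deduplication of the whole dataset.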

@miap - Sorry, it was a bit difficult for me to understand your use case. If you can provide the following details, I can try to replicate it at my end:

  1. Agricultural Task Table example data
  2. Waste Table example data (related with your Agricultural Task Table data)
  3. Joining condition between the above 2 tables
  4. Expected Table Visual Output
  5. Current Challenge that you are facing

Please let me know in case of any queries. Thank you!

  1. Agricultural Task Table:
  2. Waste Table:
  3. Left join on the ‘SessionID’ field.
  4. All kinds of visuals.
  5. Current challenge: how to avoid duplication of ‘sowing_amount’ and ‘planting_amount’ values.

Thanks.

@miap - Thanks for the response. The sowing amount and planting amount are present at the session-level granularity, and there is no information on how they should be broken down to the waste level, since that information is unavailable in your tables / data model. Hence, I don’t see any issues here.

Please provide a snapshot of your expected visual output with the “expected/correct” values for these 2 measures i.e. sowing amount and planting amount that you would like to see based on the information available in these 2 table data snapshots that you have shared. That would help to understand the problem that you are currently having. Thank you!

We are looking forward to a native solution. Quicksight is really powerful but the lack of table relationships is making our reporting really difficult.

Mia, I can see what you’re saying. In order for you to always display the correct sowing_amount and planting_amount together with data from the Waste table, you would need to create an LAC for both sowing_amount and planting_amount using avgOver, e.g. avgOver(sowing_amount, [SessionId], PRE_FILTER). This way the amount will not get doubled when reported with the Waste table. Please let us know if this solves your concern. Thanks!
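A plain-Python sketch of what an avgOver-style calculation partitioned by SessionId does to the joined rows (hypothetical data; QuickSight evaluates this per row in the dataset, this is only a simulation of the idea):

```python
from collections import defaultdict

# Joined rows; sowing_amount comes from the session-level table,
# so it repeats identically within a session (hypothetical data):
joined = [
    {"session_id": 1, "sowing_amount": 100, "waste": 5},
    {"session_id": 1, "sowing_amount": 100, "waste": 8},
    {"session_id": 2, "sowing_amount": 50, "waste": 2},
]

# avgOver-style: average the measure over rows partitioned by session,
# and attach the result to every row.
sums, counts = defaultdict(float), defaultdict(int)
for row in joined:
    sums[row["session_id"]] += row["sowing_amount"]
    counts[row["session_id"]] += 1
for row in joined:
    row["sowing_avg"] = sums[row["session_id"]] / counts[row["session_id"]]

# Because the measure is constant within a session, the per-row average
# equals the true session amount no matter how many waste rows the join
# produced. Reporting at the session level:
per_session = {sid: sums[sid] / counts[sid] for sid in sums}
grand_total = sum(per_session.values())
print(per_session, grand_total)  # {1: 100.0, 2: 50.0} 150.0
```

The key point is that the averaged field carries the undoubled session value on every row, so session-level visuals show the correct amounts while the waste rows remain available at their own granularity.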

Hey @bennygene,
Thanks for your solution. Is it supposed to be a calculated field in the dataset?


@miap - I tested it at the analysis level. Normally you could do this at the dataset (semantic layer), but since it’s an average calc it makes more sense at the reporting layer. You can also test whether QS allows it at the dataset level. Please let me know. Thanks!