Complex dataset execution flow

How are the QS datasets executed? Is there an execution plan we can see? If there are multiple datasets and sources joined together in one dataset with different join types. In what direction or order is the data retrieved and then joined? Does QS know to use the join logic to filter the data? Or each dataset is retrieved and then joined together?

For example:
If one dataset returns just one row which will join only 10/10000000 rows on another dataset does QS know to take that into account in some way?

Hi @Rotem - Welcome to AWS Quick Sight community and thanks for posting the question. If all your data sets are from same source and you did a join in QS, you can always see the sql in the database server. Once you can find the sql, you can see the execution plan of the sql. When you are joining multiple data sources data, QS usually bring the data to SPICE and do the joining in memory.

Regards - Sanjeeb

Thank you for the fast reply! So it doesn’t matter in which direction you structure your joins and which dataset you choose as your “root” it will still run them all separately into memory and then join. Do you know of any documentation for dataset performance best practices from multiple sources? Or it is just “bring the least amount of data into memory”?

Hi @Rotem - Good question. To be honest I am not sure how Spice is performing in extracting the data from different sources and this is something internal to QS. I do not see any documentation on this and this is something we can always ask to QS team.

Hi @DylanM @duncan - Any advise on this?

Regards - Sanjeeb

1 Like

Hello @Rotem and @Sanjeeb2022 !

That’s a good question @Rotem , and I’m not sure if there is a specific answer. Here is the most specific documentation I could find on the joins in Quick Sight:

In my opinion, if you need to really control how the joins happen, I would recommend doing the SQL outside Quick Sight or use the Custom SQL option.

Another thing to add, from our troubleshooting, to get the best results you should only use the join function in the console to join datasets from different datasources. If the datasets are from the same datasource you should use the Custom SQL option. Also, if you use the joins make sure that you refresh all of the separate datasets prior to your joined dataset. If you don’t, it will reference the data that was pulled in the last refresh and won’t actually bring in new rows.

2 Likes