Are Second and Third Level SPICE dataset materialized or they just have the metadata of the lowest level and queries them to get the data?

bennygene · May 13, 2024, 3:01pm

DylanM · May 14, 2024, 2:16pm

Hello @bennygene, can you explain a little more about the situation you are facing so I can help guide you further? Is this question in reference to a dataset that contains multiple joins in QuickSight to other datasets?

Depending on how you have set this up, the data should be materialized for each upon refresh, especially if you are running full refreshes in SPICE. You could double check this as well by having the datasets split up individually, running the refreshes, and comparing to the main table. You can even connect the individually ingested datasets.

Let me know if this helps!

bennygene · May 14, 2024, 7:10pm

@DylanM thank you for your reply I really appreciate it. Here’s more detail that pertains to my Q.
You have SPICE datasets DS1 and DS2 (level ones). You join and connect these two (2) datasets to create a dataset let’s say Lvl2. That’s going to be second level of dataset.
Lvl2 = DS1 join DS2

Similarly if you join that Lvl2 with another dataset say DS3 that will give you Lvl3.
Lvl3 = Lvl2 join DS3

Now, my Qs are:

Are Lvl2 and Lvl3 materialized SPICE dataset or just metadatas pointing to DS1,DS2,DS3 and Lvl2?
If these are materialized, why are we limited to 3 levels?

Undertanding what’s happening in the background will help us design our data model properly (efficiently). If everything is materialized then we’re just basically loading our entire data into memory and everytime we request for data or refresh our dashboards, QS will go through all these rows one by one to fetch our data even though we are only querying for Summary information.

As to the SPICE 3 level limit, I’m just curious why would there be a limit if everything is materialized, then the next level should always be the second.?.? (period, question mark, period as in Im not sure)

Thanks and I really appreciate it!

DylanM · May 16, 2024, 3:02pm

Hello @bennygene, okay I did some testing on this and I understand the issue. My first question is, are you pulling all of these datasets that you are joining from the same data source? If so, I would highly recommend switching the joins to custom SQL and running it as a single dataset.

Now, if they are different datasources, then it the problem is exactly what you were mentioning. On refresh, it will keep the stored data of the Lvl1 dataset, and query the Lvl2 and Lvl3 datasets and their linked datasources on refresh. The way to manage this would be to set your Lvl1 dataset to refresh first, and then stagger the refresh of the joined dataset to run after the initial dataset completes.

I understand that is not ideal, but one of those 2 options will be your path forward depending on the number of datasources you are utilizing.

DylanM · May 20, 2024, 3:24pm

Hello @bennygene, since we have not heard back from you with any remaining questions on this issue, I will mark my previous response as the solution. Please let me know if you still need assistance and I can guide you further. Thank you!