On the dataset page, users can join multiple QuickSight datasets with inner/left/right/full joins.
What is happening when users join two different mode datasets, SPICE and Direct Query datasets on the UI?
Does QuickSight run a SQL query for the joins? Does it run a direct query on the underlying database for the Direct Query dataset and joins the data with SPICE afterwards?
I would like to know what happens on the backend side for this join.
Yes, QuickSight effectively runs a SQL-like query in the background for Direct Query datasets. However, it doesn’t run a SQL query in the traditional sense for SPICE datasets since SPICE is an in-memory engine. The join happens after QuickSight retrieves the Direct Query data and combines it with SPICE-stored data in-memory.
When users view the dashboard that is generated from the dataset which has joins with Direct query datasets and SPICE datasets in QuickSight, does it always run each Direct query dataset first and combine them in-memory?
What is the timeout limit for joins?
What is the language to query data from in-memory data if that’s not SQL? I am curious how it efficiently retrieves data for dashboarding in seconds.
In addition to @shravya , when you join a spice data set and with direct query data set, the result is always store in SPICE. Please check this as well.
@Sanjeeb2022
Q1: If joined output is always stored in SPICE, when does it refresh data for direct query dataset that used in joins?
Q2: If a dataset enabled RLS rules, and joins with other dataset, does the joined dataset also convey/apply the RLS rules for the derived output?
Does joining logic care for RLS rules for the base dataset?
RLS enabled SPICE datasets are changed to Direct Query mode in joins. It fails when joining RLS enabled dataset with other datasets. (Error message: Can’t create join. Joining tables from different sources requires SPICE, which this dataset isn’t permitted to use. To continue, you’ll need to remove the join.)