How backend works when joining datasets

On the dataset page, users can join multiple QuickSight datasets with inner/left/right/full joins.
What is happening when users join two different mode datasets, SPICE and Direct Query datasets on the UI?
Does QuickSight run a SQL query for the joins? Does it run a direct query on the underlying database for the Direct Query dataset and joins the data with SPICE afterwards?

I would like to know what happens on the backend side for this join.

1 Like

Hi @hoyeon,

Yes, QuickSight effectively runs a SQL-like query in the background for Direct Query datasets. However, it doesn’t run a SQL query in the traditional sense for SPICE datasets since SPICE is an in-memory engine. The join happens after QuickSight retrieves the Direct Query data and combines it with SPICE-stored data in-memory.

Just a reference → Joining data - Amazon QuickSight

Let me know if that helps.

Thank you,
Shravya

2 Likes
  1. When users view the dashboard that is generated from the dataset which has joins with Direct query datasets and SPICE datasets in QuickSight, does it always run each Direct query dataset first and combine them in-memory?
  2. What is the timeout limit for joins?
  3. What is the language to query data from in-memory data if that’s not SQL? I am curious how it efficiently retrieves data for dashboarding in seconds.

Hello @hoyeon

Thank you for the follow up questions.

Timeout: QuickSight enforces a 2-minute timeout for visual generation. If the Direct Query exceeds this time, the visual may fail to load.

I would suggest you to have a look on these blogs, they are really very insightful and provides the required answers.

  1. Best practices for Amazon QuickSight SPICE and direct query mode
  2. SPICE and Direct Queries for Enhanced Data Analytics and Visualization
  3. Joining across data sources on Amazon QuickSight | AWS Big Data Blog

Let me know if these helps.

Thank you,
Shravya

2 Likes

In addition to @shravya , when you join a spice data set and with direct query data set, the result is always store in SPICE. Please check this as well.

Regards - Sanjeeb

2 Likes

@Sanjeeb2022
Q1: If joined output is always stored in SPICE, when does it refresh data for direct query dataset that used in joins?

Q2: If a dataset enabled RLS rules, and joins with other dataset, does the joined dataset also convey/apply the RLS rules for the derived output?
Does joining logic care for RLS rules for the base dataset?

Here are my findings.

  1. CLS enabled datasets cannot be used for join
  2. RLS enabled SPICE datasets are changed to Direct Query mode in joins. It fails when joining RLS enabled dataset with other datasets. (Error message: Can’t create join. Joining tables from different sources requires SPICE, which this dataset isn’t permitted to use. To continue, you’ll need to remove the join.)
1 Like