Joining 2 SPICE datasets to create a new one takes too much time

ashu.ap · April 28, 2023, 10:03am

Hi There,
I am trying to join two datasets out of which the left one is a huge fact table with almost half a billion records and the right one is just a lookup of values with a handful of records. Joining these 2 in SPICE, a refresh is taking around 3 hours.
Reason I can’t do it at the source is that my lookup is a SCD2 but I need to pick the latest value from it for all the records on fact and doing it at the source(GLUE DB) would mean reloading the full table each time.
Thanks

Sanjeeb2022 · April 28, 2023, 10:25am

Hi @ashu.ap - Welcome to AWS Quick Sight community and thanks for posting the question. are these tables are physicalize in any relational database? If yes, can you see the explain plan of the joining query. Possibly you may need to create an index for faster query performance.

Please give more details on the source of the data, and joining conditions so that it will help in guiding the solution.

Regards - Sanjeeb

ashu.ap · April 28, 2023, 10:29am

I have created 2 datasets in SPICE first from these tables and then joining these datasets into another one in SPICE using a left join. The underlying tables for these datasets are in a glue DB but it does not matter as we have already loaded the data into SPICE

Sanjeeb2022 · April 28, 2023, 10:35am

Thanks @ashu.ap - Thanks for the details. if both the data sets are in SPICE, then joining should be done faster. Can you check joining conditions whether the data type of join fields are same or not.

Also the SPICE refresh which is taking time, collect the ingestion id and raise a ticket to AWS Customer support team so that they can view the logs backend and give details why it is taking more time. To raise a ticket to AWS, please follow the link - Creating support cases and case management - AWS Support. If you do not have access to create ticket, request your admin to help you on that.

Regards - Sanjeeb

ashu.ap · April 28, 2023, 11:12am

Thanks for the prompt reply

Koushik_Muthanna · April 28, 2023, 1:22pm

Hi @ashu.ap ,

Can you
1/share a screenshot with an example of the huge fact table and the lookup table join ?
2/If the lookup table is supposed to pick the latest value , Is this like max date for records ?
3/screenshot of an example record and what you expect to see after the join happens ?

A few other questions
4/What is the total number of records after join ?
5/ You are glue as a metadata catalog , are you loading this data using the following method S3 > Athena > Quick Sight or is it S3 > Quick Sight (using manifest file ) ?
6/What is the frequency of reloading the huge fact table into SPICE if there are updates happening ? ( Is it a full refresh everytime ? )

Kind regards,
Koushik

ashu.ap · May 1, 2023, 4:17am

Hi @Koushik_Muthanna information attached below:

Can’t share exact details but it is like any other fact, some FK columns followed by some measures. It has around 40 columns and half a billion rows.
Query for lookup dataset in SPICE already gets the latest record and there would only be one matching record per row of fact.
Join fact and lookup on lookup_key and pick the lookup_value and add it to columns in fact.
Same as there were in the fact, ~half a billion.
We are materializing a table in a GLUE DB and then reading data using Athena Query into the SPICE.
Right now the dataset is an on demand refresh but in future it would be a daily incremental load.

Thanks
Ashutosh

Max · May 11, 2023, 7:23pm

Hi @ashu.ap

You have a lot of data which means it will take sometime to load.

The only thing I can think of is trying to incrementally refreshing to limit the amount of additional data that needs to be imported.

Topic		Replies	Views
Data Prep join not matching on any records Q&A data-source , quick-sight	3	775	March 27, 2023
Does dataset join location affect SPICE capacity usage? Q&A quick-sight , qls-spice	2	490	April 21, 2023
Joining spice dataset and Query mode dataset strategy Q&A data-source , quick-sight , dataset , Business-Intelligence-Engineer	7	115	March 24, 2025
Loading a dataset into SPICE take a very long time Q&A developer , amazon-quick	3	135	December 15, 2025
Dataset used in another dataset - Keys not recognized Q&A spice , error , quick-sight , data-preparation	7	1387	October 24, 2022

Joining 2 SPICE datasets to create a new one takes too much time

Related topics