Data Prep join not matching on any records

I have a direct sql query for my primary data source and then I am attempting to join on a dataset which has already been imported into spice.
My primary data source is around 105.3 MB whereas my dataset that I am trying to join is quite large at 8.5GB.
I am not getting any matches on resulting dataset but the refresh doesn’t give any errors. I know there are matches since I have manually verified that there are an have been using another product for years without issue.
What am I doing wrong?

@bstrech : The large data set which is in SPICE, please see when it was refreshed. When you join the primary data set with SPICE, QuickSight will not hit the DB for large data set and try to check with memory. For quick verification, take some sample from large data set and check which is already in SPICE, see those are available in Primary data set.

Is it possible to refresh your large data set again and also check the joining condition and give some screenshot as well. If both fields are string, see whether you need to trim it or not as sometime space may be an issue as well.

Regards - San

@Sanjeeb2022 I actually tried that before writing in here.
I checked that my large data set was in spice and did some quick analysis on it. The join is a string so I checked that both fields were trimmed.
Then I tried the join, the preview didn’t show any matched values, but I assumed it was due to the small sample from both sides didn’t find a match, but after the refresh there was still no matched values.
I then tried a SQL join for this. I was able to run the query directly from the Azure Data Studio, but the data refresh in Quicksight failed.

1 Like

Thanks @bstrech for the details. Can you please provide the below details.

  1. When you performed the joining of large data set with small in Azure data studio and it gave the result, what is the execution time? If it is taking more than 30 Mins, QuickSight spice refresh is timing out ( See the blog - Spice query_timeout - #2 by Jesse).
  2. What is your underline data source, if it is relational database, can you extract the explain plan for the joining query. The explain plan will provide the details on the cost of the query.
  3. Is it possible to make the small table as SPICE and then join both tables ( just for POC) and see the result.

Most of the performance tuning we usually do at ETL or database layer or final data set stored in SPICE so that it will NOT hit the DB again. However your usecase is interesting one, and happy to help you as well.

In parallel, please submit a ticket to AWS for see the details as well.

Tagging @Max @Kristin @Tatyana_Yakushev @Bhasi_Mehta @David_Wong @Biswajit_1993 for the expert advise as well.

Regards - San