Joining across data sources with more than a primary key (<, >, <=, >=)

I did some searching on if anyone did have success for the use case I am asking about. Below are the two resources I looked into:

  1. Joining datasets based on ranges
  2. Cross source joins: https://www.youtube.com/watch?v=rNuq4hZHzEw

I am looking to join a athena table (consider it a left join) for referencing purpose, with a redshift table (created as a spice table).

  1. Is there way to join with more than just primary key? (using greater than, less than operators). Visually, I was not able to see in Quicksight.

  2. I tried custom SQL but I was only able to find athena tables to join with. Identifying the redshift dataset created as a spice table was not possible within the custom SQL.

  3. I am trying using the redshift source as a starting point and then do a right join on athena table as referenced in the above video. However, this is intensive because the redshift data source is ~10x larger in number of records.

Please share any workarounds or other ways to achieving what I want.

Hello @akshayk, there are a few things you will need to consider when utilizing this join functionality that is built into the QuickSight console.

First of all, you are on the right track. Since you are joining datasets from 2 different data sources, this will be the way to accomplish it. You cannot access 2 different data sources from a single custom SQL query.

As for using <. >, <=, or >= opperands on the join, that is not possible functionality. The join is only to determine the fields that are matching between the 2 datasets so the columns can be mapped appropriately. If you want to reduce the number of fields that you are importing from Redshift, you can build the base redshift dataset with a custom SQL query. There you can use a where clause to filter out certain rows in the dataset that you do not require in the joined dataset. Once you are returning the expected values, you can join the 2 datasets based on their matching key.

Let me know if you have any further questions. I hope this helps!

2 Likes

Hi @akshayk, did this solution work for you? I am marking this reply as, “Solution,” but let us know if this is not resolved. Thanks for posting your questions on the QuickSight Community!

Thanks for the response.
Actually, in my use case, the usage of <, >, <=, and >= was not just to filter out the data but to use it for closely matching the timestamps in different data sources.
My takeaway is not being able to access 2 different data sources in single custom SQL.
The other option I could think of was to create a dataset in the same type of data source to be able to join in the custom SQL with time granularity. If you have any other ideas, let me know.

Yes, my question has been answered regarding the functionality.

1 Like

Hi akshayk, thanks. Also, if you join across sources, the result must be a SPICE dataset. Here is a link to documentation details of joins in QuickSight as well as a blog post in regards to joining data across sources.