UNION two datasets together instead of JOIN?

Is it possible? I have two SPICE datasets created with Custom SQL with same columns, I want to use a UNION ALL to concatenate these datasets into one, but the only option that seems to be available in the visual editor is JOIN… Why isn’t UNION an option here? Is it hiding somewhere else?
These are both small datasets which each only take about 2 minutes to load from their queries, but using UNION ALL within a single custom query for some reason makes it take so long that it is automatically cancelled, which is the reason I had to split into two datasets…

1 Like

Hi,

this is a good question. I wanted to use this in the past, too.

Do you have many columns in your datasets?

You could make a FULL JOIN on a column, where nothing matches, so you get every row from both tables.

col1| col2| col3| col4| col5|col6
x | x | x | | |
| | | y | y | y

after this you could use COALESCE on every column to move them together.
COALESCE(col1,col4) | COALESCE(col2,col5) | COALESCE(col3,col6)
x | x | x
y | y | y

This is not an optimal solution, but could work.

2 Likes

I have same issue, I have week few weeks data in excel and few in redshift i want to union manual and redshift data, please help me how to do it

Hi @mjdhav - You can use @thomask solution but you have make sure manual data set will be in SPICE and then result data set will also be in SPICE.

Also another note, why you are not loading the excel sheet to Redshift and do the union at database level?

Regards - Sanjeeb

1 Like

Can you please advise, how the data can be loaded to redshift, since I do not have write access to S3 bucket…

Also guide me on thomask…

Hi @mjdhav - One of the option is to provide the excel data set to your data team so that they can put the data in Redshift and you can use custom sql and do the union at data set level. However there is some development work require to import the excel data to Redshift. If you have direct access to Redshift, you can also load the data to Redshift via any client tool.

I prefer the first option so that there is a data pipeline created and data loaded in RedShift but it may have some operational work, Thomas suggestion is independent and work well if number of columns are less but a proper e2e POC is require and you can try that.

Regards - Sanjeeb

For less data its nor feasible to get data team involved, please help me with steps for thomas option

Hi @mjdhav - Ok in that case, you can follow below steps.

Step -1: Upload the excel sheet in QuickSight, By default File option data will be in SPICE. This is a simple and straight approach.

Step -2: Your Redshift data is already in QuickSight as a data set.

Step -3: Create a new data set by the below steps.
Note - In point #5, select the “Use in Data Set”

Step -4: Add the second data set by the below steps and do a FULL Join with a common field and apply.

Step -5: Create 2 calculated field, see one example below and remove all unwanted columns, you have create that many calculated fields that require in your analysis.

coalesce(id,{id[poc11.CSV]})

Step -6: Exclude the fields from that is not require ( mainly the columns which are from data sets) and then save it in a new data set so that it will NOT over write of any existing data set.

Note - Please do not forget to Save and Publish. You can use your final data set for analysis. This is what @thomask explained above.

Hope this will help you in fixing your issue.

Regards - Sanjeeb

1 Like