I have two datasets: one with a few hundred rows (dataset A) and one with more than 49 million rows (dataset B). My goal is to create a new dataset that joins the two on a common field.
Dataset B comes from an Athena query and is stored in SPICE. Every attempt to join against this dataset has failed with a timeout.
My thought was to filter dataset B first, so that it is much smaller before the join runs.
Any suggestions on how to do this?
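
To make the idea concrete, here is a minimal sketch of the filter-then-join logic I have in mind. It is plain Python, not anything QuickSight- or Athena-specific, and the function name `prefilter_then_join`, the field `customer_id`, and the sample rows are all made up for illustration:

```python
def prefilter_then_join(small_rows, large_rows, key):
    """Inner-join two row lists on `key`, discarding non-matching large rows early."""
    # Index the small dataset (a few hundred rows) by the join key.
    small_by_key = {}
    for row in small_rows:
        small_by_key.setdefault(row[key], []).append(row)

    joined = []
    for big in large_rows:  # any iterable works, so the large side can be streamed
        matches = small_by_key.get(big[key])
        if matches is None:
            continue  # filter step: this row's key is not in the small dataset
        for small in matches:
            joined.append({**small, **big})
    return joined


# Hypothetical sample data standing in for dataset A and dataset B.
dataset_a = [
    {"customer_id": 1, "region": "east"},
    {"customer_id": 2, "region": "west"},
]
dataset_b = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 3, "amount": 99},
    {"customer_id": 1, "amount": 5},
]

result = prefilter_then_join(dataset_a, dataset_b, "customer_id")
```

In this sketch only the rows of dataset B whose key also appears in dataset A survive, which is the reduction I am hoping to get before the join.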