Hi, I am attempting to re-architect some of my datasets because they use multiple one-to-many left joins and the data is taking almost an hour to refresh. My filters are based on 4-5 different child datasets, which are all joined together on a unique id. As an alternative to doing this at the dataset level, I am hoping to replicate an inner join so that I can have multiple smaller datasets, one for each one-to-many relationship, and filter them as if they were one larger dataset, without the long refresh times.

I attempted to use filters on the unique id and also on a calculated field that sets the id to null when the current dataset's filter is not null, but that filter doesn't apply across all datasets to remove those ids, since every dataset's calculated field is different. How can I replicate an inner join so that when I apply an analysis filter on one dataset, the same unique ids are filtered out of all the other datasets, with something like a cross-dataset filter?
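For reference, the current dataset-level setup looks roughly like the sketch below (table and column names are illustrative, not our real schema):

```sql
-- Illustrative sketch: one parent dataset left-joined to several
-- one-to-many child datasets on a shared unique id.
SELECT p.id,
       a.field_from_child_a,
       b.field_from_child_b
FROM parent AS p
LEFT JOIN child_a AS a ON a.id = p.id   -- one-to-many
LEFT JOIN child_b AS b ON b.id = p.id;  -- one-to-many
-- Every one-to-many join multiplies the rows per id, which is what
-- pushes the refresh time toward an hour.
```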
Hello @dclutes3!
Have you tried mapping dataset parameters to your analysis?
Thanks for the response. I have not tried that; I am looking into it now and am a little confused about how it can work with our data. I am not familiar with dataset parameters, so some guidance would be helpful for me to understand them better.
For example, I have one dataset with id, name, and status, and another with id and platform name. How can I leverage dataset parameters to create filters for name, status, and platform name without needing to join those datasets at the dataset level?
I have done some research and this doesn't seem to quite solve my problem. Currently, I am using a SPICE dataset that is joined in the dataset editor, and each child dataset in the join is also a SPICE dataset that comes from this same source and is already imported into QuickSight. This gives me 40 million+ rows, but it achieves the filter functionality, since the data is ingested with all of the fields we are filtering on right now. For example, one child dataset has an id and a condition_id, where an id can have multiple condition ids, and another child dataset that joins on id contains fields such as game name and game platform, which I filter on. While this works, it uses more than 250 GB of SPICE capacity and is not sustainable at all.
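To put rough numbers on the fan-out (the counts and id below are made up, just to show the multiplication):

```sql
-- Made-up example: suppose id 123 has 5 rows in the conditions dataset
-- and 3 rows in the games dataset. The dataset-level join then carries
-- 5 x 3 = 15 rows for that single id, and across all ids this is how
-- the combined SPICE dataset grows past 40 million rows.
SELECT p.id, COUNT(*) AS joined_rows   -- 15 for id 123 in this example
FROM parent AS p
LEFT JOIN conditions AS c ON c.id = p.id
LEFT JOIN games AS g ON g.id = p.id
WHERE p.id = 123
GROUP BY p.id;
```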
I could be wrong, but ideally I would filter each dataset individually, store the unique ids in a parameter, and use each dataset's parameter of ids to check whether or not an id should be displayed in the visual(s) across all datasets, similar to doing an INNER JOIN between the datasets on that id value. I tried using a calculated field that filters out all ids that do not pass the parameter-based filters, and it works within its own dataset, but when you apply it cross-dataset it doesn't work: each dataset's version of the calculated field only takes into account the fields in that dataset, not those from the other datasets, so it still includes all values. I basically need a way to apply dataset filters individually and combine them so that the entire analysis is filtered correctly, without having to join the datasets at the dataset level, to avoid using unnecessary SPICE capacity. In SQL terms, the behavior I am after is roughly the semi-join sketched below.
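Here is that sketch (table names, columns, and filter values are illustrative):

```sql
-- Illustrative semi-join: keep an id only when it also passes the
-- filters applied to each of the other (child) datasets.
SELECT p.id, p.name, p.status
FROM parent AS p
WHERE EXISTS (SELECT 1
              FROM conditions AS c
              WHERE c.id = p.id
                AND c.condition_id = 'some-condition')    -- filter on child dataset 1
  AND EXISTS (SELECT 1
              FROM games AS g
              WHERE g.id = p.id
                AND g.game_platform = 'some-platform');   -- filter on child dataset 2
-- This is the effect I would like an analysis filter on one dataset to
-- have on visuals built from the other datasets, without a dataset-level join.
```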
Hope this helps,
Drew
Hi @dclutes3,
It's been a while since the last communication on this thread. Were you able to find a workaround for your case, or are you still encountering the same issues?
From my understanding, while your proposed setup could help achieve the desired outcome, connecting filters across datasets is currently not an available feature.
I can mark this as a feature request to give it visibility with the support team. If you have any other questions, or if you were able to find a workaround that helped your case, let us know.
Otherwise, if we do not hear back within the next 3 business days, I’ll close out this topic.
Thank you!
Thanks for checking in! I talked to AWS support and tried a bunch of different things but had no luck, so I am working on optimizing my data to limit the number of rows in our current dataset architecture. You can absolutely mark this as a feature request, since there isn't a current workaround for my use case.
Best, Drew
Hi @dclutes3,
Thanks for sharing your update, and no problem; I will mark this as a feature request!