I have a huge dataset that takes 45 minutes to finish a full refresh and it will grow more and more by the day. it makes sense to make it incrementally refresh.
However, by adding a tiny CSV to the dataset, QuickSight no longer allows me to refresh incrementally.
Is there a workaround?
Another question: Will be faster over all if I remove the CSV file from this dataset so it can refresh incrementally from Aurora, then I create another dataset where I add my previous dataset and join it with the CSV file?
i think you described one workaround already.
Maybe you can save the csv in S3 and join it from there.
I have tried to join with the csv in a separate dataset but it took a similar time to finish.
Hello @Ali_B, were you able to find a way to reduce your refresh time on this dataset? If not, I feel like there are a few options to improve performance.
- Create a view/table in Aurora to manage the joins, then query the new view from QuickSight with any remaining SQL updates
- Split this dataset into smaller, more specified queries. If you are able to find some groupings that would still allow you to access all of the data you need, you can use multiple datasets in a single analysis. You just need to make sure they are grouped in a way that all of your visuals could be made from one of the options.
- Add some where clauses to exclude any unnecessary data or maybe reduce the amount of time you are querying data from. If you are pulling data from multiple years, maybe just pull in data from the beginning of 2022 until now. Then you could make another dataset with historical data if it is needed for some visuals.
- The final option would be to make some efficiency improvements on your SQL and manage joins within the query rather than requiring QuickSight to run 4 queries and merge them together.
I hope this helps! I will mark this as a solution for now, but if you have any follow-up questions or are still experiencing issues, please let me know!
Unfortunately, I can’t relegate the join with the data from the CSV to Aurora. I don’t have access to the database beyond read access and the data in the CSV changes infrequently and the end users need to be able to update the mappings themselves and send the file to me. I’m ok with running a full refresh every time the data in the CSV changes but for any other time, the dataset should preferably refresh incrementally.