Have a question on strategy for optimizing performance with direct query based datasets with Athena as the data source.
Lets say I have a table in Athena that has the columns: id, DateTime and a bunch of other metrics. The table is already partitioned by the id column.
My dashboard will use the id, DateTime and a small subset of the metrics from this table. I expect users to filter visuals in the dashboard by Id, Year(DateTime) and Month(DateTime).
I’m thinking that in terms of optimizing performance:
- I should create a new table in Athena, with explicit Year(DateTime) and Month(DateTime) columns added and partition that table by Id, and these 2 new columns
- In QuickSight, I should create a Dataset using a custom sql query that has the Id, Year and Month as parameters
- Link filters in my Quicksight Analysis to these dataset parameters.
In general, I’m assuming that:
-
having appropriately partitioned/indexed tables in Athena would yield the fastest query performance relative to joining data sources in Quicksight.
-
Linking analysis filters to dataset parameters will result in the least amount of data being scanned by the query engine and return results faster
Is there anything I’m missing conceptually?
thanks!