Performance recommendations

Have a question on strategy for optimizing performance with direct query based datasets with Athena as the data source.

Lets say I have a table in Athena that has the columns: id, DateTime and a bunch of other metrics. The table is already partitioned by the id column.

My dashboard will use the id, DateTime and a small subset of the metrics from this table. I expect users to filter visuals in the dashboard by Id, Year(DateTime) and Month(DateTime).

I’m thinking that in terms of optimizing performance:

  1. I should create a new table in Athena, with explicit Year(DateTime) and Month(DateTime) columns added and partition that table by Id, and these 2 new columns
  2. In QuickSight, I should create a Dataset using a custom sql query that has the Id, Year and Month as parameters
  3. Link filters in my Quicksight Analysis to these dataset parameters.

In general, I’m assuming that:

  • having appropriately partitioned/indexed tables in Athena would yield the fastest query performance relative to joining data sources in Quicksight.

  • Linking analysis filters to dataset parameters will result in the least amount of data being scanned by the query engine and return results faster

Is there anything I’m missing conceptually?

thanks!

@evgeny55 ,

At a high level looks good, would say some testing and looking at queries and data scanned should give you enough data to validate. Furthermore follow the blog for Athena as there are other factors to consider : Top 10 Performance Tuning Tips for Amazon Athena | AWS Big Data Blog

I will be marking this question as solved.

Kind regards,
Koushik