I just read about AWS Athena query result reuse and it seems like a very useful feature. I'm using Amazon QuickSight with direct query over 150M records per day stored in Parquet, and I'm trying to optimize QuickSight cost and performance. I enabled query result reuse through the console and it works when I run queries directly, but it doesn't work through QuickSight: every time I load the view, QuickSight triggers a new query. How can I get QuickSight to reuse the Athena results each time I load the view?
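One detail worth noting: Athena result reuse is opted into per query execution, so a client that issues its own `StartQueryExecution` calls only benefits if it passes the reuse configuration itself. A minimal sketch of what such a request looks like (the workgroup and bucket names here are placeholders, and you would pass the result to boto3's Athena client):

```python
def result_reuse_kwargs(sql: str, workgroup: str, output_s3: str,
                        max_age_minutes: int = 60) -> dict:
    """Build kwargs for athena.start_query_execution() that opt in to
    Athena query result reuse. Names here are illustrative placeholders."""
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,
        "ResultConfiguration": {"OutputLocation": output_s3},
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                # Athena may return a cached result if one exists that is
                # no older than this many minutes.
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }

kwargs = result_reuse_kwargs("SELECT count(*) FROM my_db.events",
                             "primary", "s3://my-bucket/athena-results/")
# With AWS credentials configured, you would then call:
# boto3.client("athena").start_query_execution(**kwargs)
```

Since QuickSight constructs its own Athena requests for direct-query datasets, there is no knob on the QuickSight side to force this per-query setting, which is consistent with what you are observing.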
Welcome to the QuickSight community, it is great having you here.
If query reusability is a good use case for you, then why not look into getting the data from Athena into SPICE?
In that case, you will be reading from Athena only once, which resembles Athena's query result reuse, and at the same time you will get all the benefits of having your data in SPICE, such as better performance and cost effectiveness.
Thank you for your reply.
My concern with SPICE is the size of the data: at roughly 150 GB per day, retaining the data for a year would be very costly.
I could use SPICE for a retention period of maybe 3 months, but a full year would be too expensive.
I am also not sure about SPICE limits, since I saw there is a limit of 1 billion rows and 1 TB of data per dataset.
So every day you add 150 GB of data to the Athena tables, and all of it is queried every day? On day 2 you have 300 GB, on day 3 you have 450 GB, and you are scanning all the data each day? If that is the amount of data you need to store every day, then yes, SPICE is not going to cut it for 365 days of history, which would amount to 50+ TB.
Are you using Parquet in S3? Do you have partitioned tables? If so, you could create a dataset parameter that filters on a specific partition, so the dashboard only queries the data it actually needs.
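To illustrate the idea, here is a hypothetical custom SQL for a QuickSight dataset over a partitioned Athena table, using dataset parameter placeholders. The database, table, and parameter names are all assumptions for the sketch, not your actual schema:

```python
# Hypothetical custom SQL for a QuickSight dataset backed by a partitioned
# Athena table. <<$pYear>> and <<$pMonth>> stand in for QuickSight dataset
# parameters; my_db.events and the column names are placeholders.
custom_sql = """
SELECT *
FROM my_db.events
WHERE year = <<$pYear>>
  AND month = <<$pMonth>>
"""
```

With a filter like this on the partition columns, Athena prunes the S3 prefixes it reads, so each dashboard load scans only the selected partition rather than the whole table.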
For this sheer size of data, I am not even sure how Athena will perform, or whether it will time out when scanning 50+ TB, even before taking the Athena scan cost into consideration.
I am currently using Parquet and compacting the files to reduce the file count. Direct-query results are promising, 5-7 seconds to scan 200 million records, but I am not sure how it will perform in the future.
I am partitioning the data in S3 as /year/month/day and created partitioned tables on those keys. I am not sure how to apply dataset filters to retrieve only the last 3 months, and I think querying 3 months back will be an issue, especially since this will happen every day. Keeping a full year will be impossible, as you mentioned.
I will try parameter filters, but sometimes the dashboards need to query all the data.
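For a rolling "last 3 months" window over /year/month partitions, one approach is to generate the partition predicate outside QuickSight (for example in a script that refreshes a view or a dataset's custom SQL). A small sketch, assuming year/month partition columns, with the month-window arithmetic approximated as 30 days per month:

```python
from datetime import date, timedelta

def last_n_months_predicate(today: date, n_months: int = 3) -> str:
    """Build a WHERE-clause fragment over year/month partition columns
    covering roughly the last n months. Column names are placeholders."""
    cutoff = today - timedelta(days=30 * n_months)  # rough month window
    clauses = []
    y, m = cutoff.year, cutoff.month
    while (y, m) <= (today.year, today.month):
        clauses.append(f"(year = {y} AND month = {m})")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return " OR ".join(clauses)

pred = last_n_months_predicate(date(2024, 5, 15))
# e.g. "(year = 2024 AND month = 2) OR ... OR (year = 2024 AND month = 5)"
```

Because the predicate only touches partition columns, Athena can prune everything outside the window, so daily dashboards stop paying for the full table scan.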
Are the query results the same amount of data that you are scanning? If so, then even if you could use result reuse in QuickSight, the only thing you would save is the scan cost; the amount of data transferred would be the same.
Maybe you can summarize the data if you don't need all the granularity in QuickSight, and then use that summarization query in Athena to populate a SPICE dataset?
Sorry for my late reply. I'm looking into reusing Athena query results because I'm currently on direct query: loading data into SPICE failed due to an Athena timeout after only 4 days of data, so I switched to direct query instead. After optimizing the data (converting to Parquet and repartitioning into larger files), I reduced scan time from 40 s to 3-4 s. Now I need to reduce the time it takes to load a visualization in QuickSight when I select a specific visual for a period of time,
since QuickSight triggers a query even if I load the same parameters into the visual or refresh the browser.
Other suggestions to reduce the size or number of records loaded/scanned would also be helpful.
This will not solve the SPICE size limitation, but it will reduce the amount of time spent reading from Athena.
Another thing to look at: is the full granularity of the data needed for the visualization, or are you aggregating in the visuals? If you are aggregating, you could create a SPICE dataset with custom SQL that already aggregates the data before it is stored in SPICE.
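As a sketch of that pre-aggregation idea: if the visuals only show daily totals, the custom SQL backing the SPICE dataset can collapse raw events to one row per day and dimension before ingestion. Table and column names below are placeholders, not your actual schema:

```python
# Hypothetical custom SQL for a SPICE dataset: pre-aggregate in Athena so
# SPICE stores one row per day/dimension instead of every raw event.
# my_db.events, some_dimension, and amount are illustrative placeholders.
aggregated_sql = """
SELECT year, month, day,
       some_dimension,
       COUNT(*)    AS event_count,
       SUM(amount) AS total_amount
FROM my_db.events
GROUP BY year, month, day, some_dimension
"""
```

Even at 150M raw rows per day, a daily rollup over a handful of dimensions is usually orders of magnitude smaller, which can bring a year of history back within SPICE's per-dataset limits.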
Since we have not heard back from you, I’ll go ahead and close/archive this topic. However, if you have any additional questions, feel free to create a new topic in the community and link this discussion for relevant information.