QuickSight / Iceberg - Best read speed setup for large dataset?

Kai2 · September 28, 2024, 6:55am

Hi, I came across this AWS prescriptive guide:

I am excited about how a well-optimised Iceberg dataset could speed up reads for my large Google Analytics dataset, in particular the last graph in that guide, which shows the potential read-time savings. The dataset has 30+ columns and is 300+ GB for 3 months' worth of data, and I need to set up for at least 2 years' worth.

I am concerned, however, because I realised that our GA dataset in S3 / Athena (imported from BigQuery) has already been written in Iceberg format.

Setting up a test Quicksight dashboard connecting via Athena, my simple line / bar combo chart failed to load for 3 months worth of data.

I have tried two QuickSight datasets, both using direct query against the same Athena table: one with all columns, and one pulling only 3 columns via custom SQL.

Neither made much difference.

I know there’s SPICE, but for my use case I doubt it has enough capacity for this much data before we start getting charged.
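
For scale, here is a rough linear extrapolation of the figures in this post. It assumes the volume grows evenly month to month, which is an assumption rather than something measured here. The space a dataset takes in SPICE can also differ from its raw size in S3, so treat the result as an order-of-magnitude figure only.

```python
def extrapolate_size_gb(observed_gb: float, observed_months: int, target_months: int) -> float:
    """Linearly scale an observed data volume to a longer retention window."""
    if observed_months <= 0:
        raise ValueError("observed_months must be positive")
    return observed_gb / observed_months * target_months


# Figures from the post: roughly 300 GB for 3 months, 24 months needed.
# Assumption: growth is linear; real traffic may be seasonal.
two_year_gb = extrapolate_size_gb(300, 3, 24)
print(f"Estimated raw volume for 2 years: {two_year_gb:.0f} GB")
```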

Is there some kind of best practices in play, where after creating a highly read-optimised ICEBERG dataset, I can have a great QS experience for my users by having QS configured correctly to make full use of what ICEBERG is offering, as prescribed by that AWS documentation?

Koushik_Muthanna · September 30, 2024, 8:09am

@Kai2,

In direct query mode, QuickSight sends SQL queries to your query engine. The query engine is then responsible for generating the results, so the speed at which those queries complete depends on the query engine. BI queries tend to be ad hoc, so does the Iceberg optimization done on your end cover most of the common query patterns for your use case?
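
As an illustration of a "common query pattern", the sketch below builds the kind of SQL a daily line or bar chart might need: one metric aggregated per day over a bounded date range. The table and column names (`ga_events`, `event_date`, `sessions`) are hypothetical and not taken from this thread. The date filter only helps Athena skip data if the Iceberg table is partitioned or sorted on that column, which is also an assumption here.

```python
from datetime import date


def build_daily_rollup_sql(table: str, date_col: str, metric_col: str,
                           start: date, end: date) -> str:
    """Return SQL that sums one metric per day over [start, end).

    Identifiers are interpolated as-is, so pass only trusted names.
    """
    return (
        f"SELECT {date_col}, SUM({metric_col}) AS total_{metric_col}\n"
        f"FROM {table}\n"
        f"WHERE {date_col} >= DATE '{start.isoformat()}'\n"
        f"  AND {date_col} < DATE '{end.isoformat()}'\n"
        f"GROUP BY {date_col}\n"
        f"ORDER BY {date_col}"
    )


# Hypothetical names; replace with the real table and columns.
sql = build_daily_rollup_sql("ga_events", "event_date", "sessions",
                             date(2024, 7, 1), date(2024, 10, 1))
print(sql)
```

Running the generated statement directly in Athena and looking at the data scanned is one way to see whether the table layout actually supports this pattern.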

Setting up a test Quicksight dashboard connecting via Athena, my simple line / bar combo chart failed to load for 3 months worth of data.

QuickSight times out if the query does not complete within 2 minutes. Are your queries completing within 2 minutes at your dataset volume? A note on Athena: [How does Athena prepare a cluster of compute nodes for a specific query? | AWS re:Post]. Athena provisioned capacity could also be an option for you: [Introducing Athena Provisioned Capacity | AWS News Blog].
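
One way to check this is to compare Athena's reported execution time with the 2-minute limit. The sketch below is a minimal example that reads the statistics from a response shaped like the one boto3's Athena client returns from `get_query_execution`. The sample response is hand-written with made-up numbers, and treating total execution time as a proxy for what QuickSight waits on is an assumption.

```python
QUICKSIGHT_TIMEOUT_MS = 2 * 60 * 1000  # the 2-minute limit mentioned above


def summarize_execution(response: dict) -> dict:
    """Extract timing and scan volume from a GetQueryExecution-style response."""
    stats = response["QueryExecution"]["Statistics"]
    total_ms = stats["TotalExecutionTimeInMillis"]
    return {
        "total_seconds": total_ms / 1000,
        "scanned_gb": stats.get("DataScannedInBytes", 0) / 1e9,
        "within_timeout": total_ms < QUICKSIGHT_TIMEOUT_MS,
    }


# Made-up sample; a real one would come from something like
# boto3.client("athena").get_query_execution(QueryExecutionId=...).
sample = {
    "QueryExecution": {
        "Statistics": {
            "TotalExecutionTimeInMillis": 150_000,
            "DataScannedInBytes": 42_000_000_000,
        }
    }
}
print(summarize_execution(sample))
```

If the queries behind a visual regularly land above the limit, the options discussed in this thread are to reduce the data scanned per query or to look at Athena provisioned capacity.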

Is there some kind of best practices in play, where after creating a highly read-optimised ICEBERG dataset, I can have a great QS experience for my users by having QS configured correctly to make full use of what ICEBERG is offering, as prescribed by that AWS documentation?

Which part of the documentation are you referring to when asking about configuration? Note that there is no configuration on the QuickSight side apart from creating the data source that connects to Athena.

Kind regards,
Koushik

Brett · October 14, 2024, 4:15pm

Hi @Kai2,
It’s been a while since we last heard from you; did you have any additional questions regarding your initial topic?

If we do not hear back within the next 3 business days, I’ll go ahead and close out this topic.

Thank you!

Brett · October 17, 2024, 3:49pm

Hi @Kai2,
Since we haven’t heard back, I’ll close out this topic. However, if you have any additional questions, feel free to create a new post in the community and link this discussion for relevant information if needed.

Thank you!