QuickSight Incremental refresh with Athena

Mykhailo_Sorochev · February 4, 2025, 10:16am

Hi QuickSight Community,

I’m working on setting up incremental refresh in QuickSight using Athena, but I’m facing a performance issue where Athena scans all partitions instead of pruning them properly.

My Setup:
• My Athena table is partitioned by year, month, day.
• QuickSight incremental refresh is based on updated_at.
• Issue: When QuickSight generates SQL, it only filters by updated_at and does not apply partition filtering (year, month, day).
• Result: Athena scans all partitions instead of just the relevant ones, making the queries slow and expensive.
The question is: how can I connect updated_at with partitioning to let Athena know to scan just limited partitions?

Xclipse · February 4, 2025, 1:33pm

Hi @Mykhailo_Sorochev

To optimize your QuickSight incremental refresh with Athena and ensure partition pruning, follow these steps,

Modify your QuickSight dataset’s custom SQL query to manually filter partitions based on updated_at.
- Example:
```
SELECT *
FROM your_table
WHERE updated_at >= DATE_ADD('day', -1, CURRENT_DATE)  -- Incremental filter
```
Enable Partition Projection (Athena Optimization). If your dataset is large, enabling partition projection in Athena speeds up performance by reducing metadata scans.
Run EXPLAIN SELECT in Athena to verify partitions are being used.

Mykhailo_Sorochev · February 4, 2025, 2:11pm

Thank you @Xclipse! So. if I update it and use the ‘manual’ incremental filtering, how it will work when I need a full refresh? As I understand, I can’t have different SQL for Incremental refresh and for a Full refresh, correct? Why do you need Incremental refresh option in UI in this case? Also, do you know if there is any option to update existing data in SPICE by row id number or something like that?

Xclipse · February 4, 2025, 5:09pm

Hi @Mykhailo_Sorochev

You’re absolutely right. QuickSight does not allow different SQL queries for Incremental Refresh and Full Refresh in the UI. However, a possible approach is to create a view for the full load and later modify it to display only recent changes in the data based on your incremental refresh interval.

The Incremental Refresh option in QuickSight UI automatically generates SQL queries based on a timestamp column but may not efficiently handle partition pruning. If your data is not partitioned, QuickSight default incremental refresh should work well. However, for partitioned datasets, using custom SQL is often necessary to ensure optimal performance and avoid scanning all partitions.

No, QuickSight does not support row-level updates in SPICE, similar to a database UPDATE statement. Instead, SPICE allows only Append (adding new rows) and Replace (performing a full dataset refresh).

Mykhailo_Sorochev · February 4, 2025, 5:21pm

@Xclipce thank you! My test shows this:
/* QuickSight d51933e8-e20b-4df2-b111-9578697e5721 */

SELECT “posted_date”, “document_type”, “document_id”, “seller_id”, “company_id”, “marketplace_id”, “inventory_id”, “pnl_class_id”, “transaction_type”, “customer_order_id”, “order_item_id”, “amount_type”, “amount_description”, “quantity”, “currency”, “amount”, “debit”, “credit”, “sku”, “fnsku”, “asin”, “fulfillment_channel”, “sales_channel”, “neon_prt_data_type”, “neon_prt_company_id”, “neon_prt_date_month”, “date_day”, “updated_at”

FROM (SELECT

*,

CAST(concat(neon_prt_date_month,‘-’,date_day) as DATE) as updated_at

FROM “amzn_transac_journal”.“neon_prt_report_type_transaction_journal”

where neon_prt_date_month >= ‘2025-01’

and neon_prt_data_type =‘cogs’) AS “New custom SQL”

WHERE “updated_at” > from_unixtime(cast(substr(cast(1738502510855 as varchar), 1, 10) AS bigint))

as you can see in my SQL I used partitioned column neon_prt_date_month to cast as date. So, QuickSight worked out correctly added the last Where statement and scanned just 2 days according partitions. I checked the amount of data scanned - it was resonably small comparing with the full SLQ. So, I am going to use it as a solution.