SPICE Incremental refresh with column for partitioning the original source dataset

vitality1994 · December 17, 2024, 7:51am

Hello,

I’m having trouble reducing the cost of scanning data. I’ve implemented incremental refresh, but it seems like the entire original dataset is still being scanned, and only then filtered according to the incremental refresh window. As a result, while the refresh time might improve, I don’t see any real cost savings for data scanning.

My question is: If I partition the original dataset based on the same column used for the incremental refresh window, can I avoid a full data scan and achieve real cost savings?

Giridhar.Prabhu · December 17, 2024, 8:08am

Hi @vitality1994

We had similar issues with some incremental datasets. The explain plan for our MySQL data would go haywire when the incremental refresh query appended its time clause which looks something like the following

WHERE flightDate > CONVERT_TZ(DATE_ADD(FROM_UNIXTIME(0), INTERVAL cast(cast(1724989230622 as char(10)) as decimal) SECOND), @@session.time_zone,‘UTC’);

If I took the literal value from this and used in the query the plan would work fine. But eventually, when we simplified our underlying queries and reduced the level of nesting that we had things worked fine.

So, in nutshell you should still explore the query execution plan in your DB and look for opportunities to improve that.

Partitioning may help since the db engine may determine the parition that it requires to query. So you can give that a try

Regards,
Giri

vitality1994 · December 17, 2024, 9:06am

Thank you for replying!

So you’re suggesting that I can try using partitioning on the column to improve incremental refresh performance or focus on optimizing my query for better efficiency. But during the experiment, how can I check if the query is scanning the entire table or not?

Giridhar.Prabhu · December 17, 2024, 10:44am

Hi @vitality1994

When your dataset refresh is run you can have your DBA capture the complete query that is running in the DB (that include the condition that Quick Sight puts for incremental refresh).

You can then check the explain plan of the query. Based on the plan you could determine if the indexes on the underlying table are being used or not and take appropriate action.

This is my approach. I haven’t used partitioning; but if that is something in your control you could do that as well; if it provides the required benefits for you.

Brett · December 30, 2024, 4:57pm

Hi @vitality1994,
It’s been awhile since we last heard from you, did you have any additional questions regarding your initial post?

If we do not hear back within the next 3 business days, I’ll close out this topic.

Thank you!

Brett · January 3, 2025, 4:33pm

Hi @vitality1994,
Since we haven’t heard back, I’ll go ahead and close out this topic. However, if you have any additional questions, feel free to create a new post in the community and link this discussion for relevant information if needed.

Thank you!

Topic		Replies	Views
Quick Sight Incremental refresh with Athena Q&A Business-Intelligence-Engineer	4	201	February 4, 2025
Not Experiencing Refresh Duration Gains When Switching to Incremental Refresh Q&A reports , dashboards , business-user	5	24	August 22, 2025
Incremental refresh taking long Q&A dataset , Business-Intelligence-Engineer	3	84	March 10, 2025
Incremental dataset refresh taking too long Q&A admin	4	155	June 18, 2024
SPICE incremental refresh on int partition column in Athena Q&A dataset , Business-Intelligence-Engineer	3	441	April 18, 2024

SPICE Incremental refresh with column for partitioning the original source dataset

Related topics