SPICE Incremental refresh with column for partitioning the original source dataset

Hello,

I’m having trouble reducing the cost of scanning data. I’ve implemented incremental refresh, but it seems like the entire original dataset is still being scanned, and only then filtered according to the incremental refresh window. As a result, while the refresh time might improve, I don’t see any real cost savings for data scanning.

My question is: If I partition the original dataset based on the same column used for the incremental refresh window, can I avoid a full data scan and achieve real cost savings?

1 Like

Hi @vitality1994

We had similar issues with some incremental datasets. The explain plan for our MySQL data would go haywire when the incremental refresh query appended its time clause which looks something like the following

WHERE flightDate > CONVERT_TZ(DATE_ADD(FROM_UNIXTIME(0), INTERVAL cast(cast(1724989230622 as char(10)) as decimal) SECOND), @@session.time_zone,‘UTC’);

If I took the literal value from this and used in the query the plan would work fine. But eventually, when we simplified our underlying queries and reduced the level of nesting that we had things worked fine.

So, in nutshell you should still explore the query execution plan in your DB and look for opportunities to improve that.

Partitioning may help since the db engine may determine the parition that it requires to query. So you can give that a try

Regards,
Giri

2 Likes

Thank you for replying!

So you’re suggesting that I can try using partitioning on the column to improve incremental refresh performance or focus on optimizing my query for better efficiency. But during the experiment, how can I check if the query is scanning the entire table or not?

Hi @vitality1994

When your dataset refresh is run you can have your DBA capture the complete query that is running in the DB (that include the condition that QuickSight puts for incremental refresh).

You can then check the explain plan of the query. Based on the plan you could determine if the indexes on the underlying table are being used or not and take appropriate action.

This is my approach. I haven’t used partitioning; but if that is something in your control you could do that as well; if it provides the required benefits for you.

Hi @vitality1994,
It’s been awhile since we last heard from you, did you have any additional questions regarding your initial post?

If we do not hear back within the next 3 business days, I’ll close out this topic.

Thank you!

Hi @vitality1994,
Since we haven’t heard back, I’ll go ahead and close out this topic. However, if you have any additional questions, feel free to create a new post in the community and link this discussion for relevant information if needed.

Thank you!