Incremental Refresh Limitations Question

Hi,

I’m trying to understand how incremental dataset refreshes work. We have the enterprise edition, and I’m trying to understand what criteria needs to be true for incremental to be displayed as an option on a dataset.

In my situation, I have multiple datasets that are set up as SPICE storage and use the same Data Source type (MySQL RDS instance). In some cases, I am able to configure incremental refresh on dataset, but in other cases the only option shown to me is a full dataset refresh.

I cannot seem to understand how/why I cannot set up incremental on all my datasets. I have a few particularly large datasets (400M+ rows) where incremental refresh would allow faster refresh of my reporting data.

Has anyone else faced this issue, or is there some clarifications the QuickSight team can provide on why certain datasets allow configuration of incremental, while others do not despite the fact that they share the same data source connection?

Thanks in advance,
Brian

Hi,

Incremental refresh work on “TIMESTAMP” date type.
Can you please verify it from your dataset ?

Otherwise there is no limitation or issue in Quicksight.

Regards,
Naveed Ali

Hi Brian,

If your datasource is MySQL RDS instance and the query mode is SPICE : you would have the option to configure incremental refresh to load the data into SPICE.

RDS Source to SPICE

Incremental Refresh

Now if you have used this dataset ( device_data ) as a child dataset in a new dataset (device_3) , then incremental refresh here is not possible.
New dataset

Previously created dataset is used as a child ( Query mode : SPICE )

Only full refresh on a SPICE dataset is possible

Following video on how incremental dataset refresh work ( inserts/updates/deletes ) : 2022 QuickSight Learning Series: Keep your data fresh with incremental SPICE refresh - YouTube

If you have a different configuration, provide us an example with a few screenshots .

Kind Regards,
Koushik

1 Like

Hi @Koushik_Muthanna and @Naveed,

Thanks for the quick reply. Yes, my datasource is a MySQL RDS instance and the storage configuration is set to SPICE.

I would like to schedule incremental refresh based on the ‘Last Activity On’ Date attribute that is in my dataset. In my dataset, I am joining multiple data tables (all sourced by custom SQL queries against db views residing in the same MySQL db).

Here’s some screengrabs for added context around my situation:

Multi-table dataset:

Multiple Date Fields Present in dataset:
image

Desired timestamp attribute to use:
image

Here’s what I see in the Refresh tab for this dataset:

Any ideas on what might be preventing incremental from showing on this dataset?

1 Like

I think there’s a limitation if your dataset is built from multiple data sources. Is that the case here? When you have cross-source dataset, incremental refresh is not possible in Quicksight.

@newQSUser - this dataset is built on a single MySQL data source though.

I do know what you are describing is true when you have a cross-source dataset, incremental is not an option, which makes total sense.

Hi Brian,

I was able to replicate the same at my end and checked it with QuickSight team. They are aware and looking into the possibility of incremental refresh when using custom sql with joins.

I will provide an update once this is available.

In the meanwhile, to unblock yourself to use incremental refresh, you will have to write a single customSQL which joins the tables instead of performing the join in QuickSight.

In the screenshot you posted with multiple custom sql joins, I am curious to understand if there is reason why it structured this way and if it could have been done in 1 custom SQL ?

Kind Regards,
Koushik

1 Like

Hi Thank you @Koushik_Muthanna for raising this with the QS team. I had set up the dataset with multiple custom joins through the development process extending this dataset with different attributes.

We recently created database views to match these multiple tables, so that would make a single customSQL query easier to construct to match the multi-table dataset.

Is there any performance impact on how QS handles a Custom SQL and whether the table joins are contained in single table script vs. multi-table scripts?

Hi Brian,

QuickSight is sending a SQL query to the backend datasource for data retrieval, hence the processing depends on database.
Regarding table joins ,based on what I tested in Amazon Athena, below are examples of SQL when I refreshed data into SPICE.

  1. Custom SQL with joins in QuickSight > The query sent to Athena

  2. Accessing a view from QuickSight > The query sent to Athena ( View is defined in Athena )

Regards,
Koushik

1 Like