I’m trying to understand how incremental dataset refreshes work. We have the enterprise edition, and I’m trying to understand what criteria needs to be true for incremental to be displayed as an option on a dataset.
In my situation, I have multiple datasets that are set up as SPICE storage and use the same Data Source type (MySQL RDS instance). In some cases, I am able to configure incremental refresh on dataset, but in other cases the only option shown to me is a full dataset refresh.
I cannot seem to understand how/why I cannot set up incremental on all my datasets. I have a few particularly large datasets (400M+ rows) where incremental refresh would allow faster refresh of my reporting data.
Has anyone else faced this issue, or is there some clarifications the QuickSight team can provide on why certain datasets allow configuration of incremental, while others do not despite the fact that they share the same data source connection?
If your datasource is MySQL RDS instance and the query mode is SPICE : you would have the option to configure incremental refresh to load the data into SPICE.
Now if you have used this dataset ( device_data ) as a child dataset in a new dataset (device_3) , then incremental refresh here is not possible.
New dataset
Thanks for the quick reply. Yes, my datasource is a MySQL RDS instance and the storage configuration is set to SPICE.
I would like to schedule incremental refresh based on the ‘Last Activity On’ Date attribute that is in my dataset. In my dataset, I am joining multiple data tables (all sourced by custom SQL queries against db views residing in the same MySQL db).
Here’s some screengrabs for added context around my situation:
I think there’s a limitation if your dataset is built from multiple data sources. Is that the case here? When you have cross-source dataset, incremental refresh is not possible in Quicksight.
I was able to replicate the same at my end and checked it with QuickSight team. They are aware and looking into the possibility of incremental refresh when using custom sql with joins.
I will provide an update once this is available.
In the meanwhile, to unblock yourself to use incremental refresh, you will have to write a single customSQL which joins the tables instead of performing the join in QuickSight.
In the screenshot you posted with multiple custom sql joins, I am curious to understand if there is reason why it structured this way and if it could have been done in 1 custom SQL ?
Hi Thank you @Koushik_Muthanna for raising this with the QS team. I had set up the dataset with multiple custom joins through the development process extending this dataset with different attributes.
We recently created database views to match these multiple tables, so that would make a single customSQL query easier to construct to match the multi-table dataset.
Is there any performance impact on how QS handles a Custom SQL and whether the table joins are contained in single table script vs. multi-table scripts?
QuickSight is sending a SQL query to the backend datasource for data retrieval, hence the processing depends on database.
Regarding table joins ,based on what I tested in Amazon Athena, below are examples of SQL when I refreshed data into SPICE.
Custom SQL with joins in QuickSight > The query sent to Athena