I originally created an Athena query based dataset with a column called “timestamp (UTC)” and set up an incremental refresh schedule using that column. I later renamed the column to “Timestamp” and saved & published the change. The refresh kicked off by the save failed with the following:
CUSTOMER_ERROR
Cannot find column [d4677354-34c2-44d4-a0f8-42479e3d9d89.timestamp (UTC)] in the query schema, please make sure that this column exists, and not a calculated column.
I deleted the incremental refresh schedules and tried again to manual refresh. The refresh failed again with the same error. I then put the field back into the dataset and the manual refresh succeeded. I then excluded the field and the refresh failed again with the same error (even though the incremental refresh schedules were deleted).
I’m assuming here that if there were an analysis using that column the error would be in the analysis, not in the dataset refresh. So wondering if the dataset refresh is caching something causing this error.
Found it.. it looks like the dialog for configuring the incremental refresh is holding onto the last used column which was causing the error. When I selected another column and saved a new refresh schedule I was able to go back and remove the column from the dataset.
When renaming, changing, or removing columns that are referenced in incremental refresh schedules or dataset configurations, it is important to understand that Quick Sight maintains internal schema references for those fields. If the name of a column used for incremental refresh is changed, or the field is removed, Quick Sight may continue to look for the original column during refresh operations even after schedules are deleted or updated.
@Xclipse - thanks for responding. On a related note.. My dataset previously contained data from two data sources an Athena Db and an RDS database. I removed all sub-queries/data from the dataset that referenced the RDS Db datasource and yet it still shows up in the Sources for my dataset and prevents me from adding incremental refreshes to the dataset. IS there anyway to clear this?
Check for and remove any joins, calculated fields, or hidden dependencies in the dataset editor. If you’re unable to remove the datasource using standard editing options due to errors, create a new dataset using only the Athena source and replace/re-point your analyses to this new dataset.
Please refer to the below community post this might be helpful for you.
Thanks again for following up. Apparently the actual issue was that I had two different Athena data sources in my dataset (after removing the RDS one). Ultimately they were pointing towards the same Athena database but the second was intended to be an eventual replacement for the first. In any event.. once I moved the custom queries to use the original Athena datasource I was able to turn on incremental refreshes.
I didn’t see any mention of this restriction in the incremental refresh documentation. If it’s not a bug, this probably should be documented.