I have a couple of questions / issues with using some data in from Athena in SPICE.
We previously had a table of around 40mn rows and were ingesting a “Full Refresh” every day - the main reason for the full refresh is because when I add a schedule, I’m not given the functionality / opportunity to select “Incremental Referesh”
- Any ideas why this might be occuring? (I have an enterprise edition of QS btw)
We have decided to backfill some data back to 2020, which has effectively doubled the table size. Because of the above Full Referesh, the process now takes nearly 2 hours to process.
Ideally we would use the incremental refresh to update on a daily basis (for previous day’s data) - but given the above this is not an option.
Any thoughts or suggestions here?
Is this the error you are getting?
For Incremental Refresh to work you need to have a date field that QuickSight can use when querying for new data.
You can find more information about this on the documentaion.
If this is not the error you are getting, can you please share a screen shot of the error?
Thanks for the prompt response.
Unfortunately I do not see that option when I try and select a refresh schedule - this is what I see below:
I do have a date field and this is set as a “Date” in the data type within the dataset.
Just to note - I do have enterpise and in the Frequency drop down, I do see “hourly” (as suggesting in the link provided).
Further info - the main table I have in SPICE would typically be referred to a fact table and I have created a join in the Data Prep screen on a dimension table (this contains static data). The ‘Query Mode’ in the bottom left of the screen is ‘SPICE’. When selecting ‘Add Data’ in the data prep screen, I see the below listed next to my table:
Could this have something to do with the incremental refresh?
For testing, I added a small Custom SQL data source with a date field and no join (as the join is done in the SQL query itself) - and now when I try and set up a schedule for this dataset, it gives me the option to implement incremental refresh.
What am i missing?
Incremental refresh does not work with joined dataset, that is why you do not see the option.
Some options will be to create the join between your facts and dimension tables on a custom SQL, or create a view on your source database that already has the the join performed.
Hope this helps.
A way around is to add datasource inside main DB and incremental refresh will work.
Thanks for the tip.
Are you able to elaborate a little more? or point me to a link which describes this work around?
I’m a little new to loading the data, and I typically would load a dataset from “Create a Dataset - from new data sources” in the Datasets section. Not sure if I’m doing this the right way or not.
@andres007 - thank you for your response also. Why is there a limitation in SPICE for this?
Is it best practice to have one large table in SPICE rather than have a conventional join of multiple tables?
Hi, in case you create a dataset as ‘New Dataset’ and then another, and another… and then you go to one of those datasets and join another to it. It won’t be able to make an incremental refresh regardless of the datasource type. So, alternatively, you can create a new dataset. After it is created, go to this dataset, edit the dataset, and click ‘Add Data.’ Then, use the “data source” and add the query you want here, and make the join, etc… and the incremental refresh will work.
Let me see if I understand correctly, because that does not work for me.
- Create a dataset which has a date field, and then setup an incremental refresh
- Open the dataset and add new tables, join them with the one from the original dataset
- The incremental refresh should still work.
In this scenario, I am getting this error.
Maybe you are doing it on a different way? I am curious on how this will work.
Looks like you are almost there as now you are at least able to see and manage incremental refresh Just sadly I dont know what this error is for. Maybe is problem as you scheduled Incremental refresh before join. Am normally joining all then creating schedule.
I am not able to see the incremental refresh, it is gone from the console as expected and the leftover incremental refresh that is scheduled is failing.
Do you have this setup working yourself with a dataset that is joined?
I have asked internally if this is related on how SPICE works and store the tables that are being joined. I have no further information on this as of now, as soon as I have something more specific I will come back to you.
I would also recommend filing a case with AWS Support where we can dive into the details so that we can help you further. Here are the steps to open a support case. If your company has someone who manages your AWS account, you might not have direct access to AWS Support and will need to raise an internal ticket to your IT team or whomever manages your AWS account. They should be able to open an AWS Support case on your behalf.
I have not been able to reproduce the way @matjazz is managing to do this. So I cannot comment if this can be a valid solution for your use case.
Hope this helps.