Optimizing processing time of QuickSight SPICE refresh

I have a QuickSight SPICE dataset that is set to refresh daily, processes around 500M rows (~200GB), and typically takes 1 to 1.5 hours to refresh. The data source is Redshift. More recently, the refresh has been failing due to WLM rules (1.5-hour limit) configured for query execution time. The dataset is written as custom SQL containing all the logic, and it takes about the same time as doing a select * from the underlying table.
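For reference, a minimal sketch of how the aborts can be confirmed on the Redshift side (stl_wlm_rule_action is the Redshift system log of query monitoring rule actions; the rule name is whatever was configured in WLM):

```sql
-- List recent queries killed by a query monitoring rule (QMR).
SELECT query,        -- query ID of the aborted statement
       rule,         -- name of the QMR rule that fired
       action,       -- 'abort' means WLM terminated the query
       recordtime
FROM stl_wlm_rule_action
WHERE action = 'abort'
ORDER BY recordtime DESC
LIMIT 20;
```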
Another approach was creating this dataset in Athena; however, the ingestion time was worse, close to 3 hours.
We have an Enterprise edition account, and the datasets are within the SPICE limitations for QuickSight. SPICE usage on the account is ~80% of the available SPICE capacity.

Any pointers on best practices for datasets of this size? What factors should be taken into account when estimating the processing time? Any recommendations on how the SPICE ingestion time can be optimized?

Other notes:
Running a CREATE TABLE in Redshift completes in less than 5 minutes, but the QuickSight data ingestion takes longer than an hour.
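This suggests the bottleneck is the transfer into SPICE rather than the query itself. One possible workaround (a sketch, with hypothetical table and column names): pre-materialize the custom SQL so the statement QuickSight runs during ingestion is a plain scan:

```sql
-- Sketch with hypothetical names: run the heavy logic once as a CTAS
-- (fast, per the note above), then point the QuickSight dataset at the
-- resulting plain table.
DROP TABLE IF EXISTS analytics.daily_spice_extract;

CREATE TABLE analytics.daily_spice_extract AS
SELECT order_id,
       customer_id,
       amount,
       last_modified_date
FROM sales.order_facts;   -- stand-in for the real custom SQL logic

-- The QuickSight custom SQL then becomes just:
--   SELECT * FROM analytics.daily_spice_extract;
```

The long-running statement during the refresh is then a simple scan issued by the QuickSight database user, so the WLM execution-time rule could be scoped to exclude that user or queue rather than being relaxed for all ETL queries.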

Hi @alexresh

Could you please check the suggestions below and see if they reduce the time taken to load 500M records into SPICE.

  1. Check that QuickSight (SPICE) and Redshift are in the same region.
  2. Considering the 500M rows, is there any option to identify the incremental records using a created date / last modified date column and enable incremental SPICE refresh?
  3. Try to remove unused/unwanted fields from the dataset, especially fields with a string datatype (see the sketch after this list).
  4. Extracting 500M rows through Redshift Spectrum might take additional time compared to Redshift managed storage.
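A hypothetical sketch of points 2 and 3 combined (table and column names are made up): project only the fields the analyses actually need, and expose a change-tracking date column the incremental refresh can key on:

```sql
-- Hypothetical example: drop wide unused string fields (they dominate both
-- transfer time and SPICE usage) and keep a date column for incremental
-- refresh to filter on.
SELECT order_id,
       customer_id,
       amount,
       order_status,        -- short code rather than a long free-text field
       last_modified_date   -- date column for QuickSight incremental refresh
FROM sales.order_facts;
-- With incremental refresh enabled on last_modified_date, QuickSight appends
-- only the rows inside the configured look-back window.
```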

Looking at the throughput, ~5.5M records per minute (500M / 90 min) is already relatively good…

Thanks
Vinod

Hi @alexresh
One additional question: you are processing 500M rows per day. Is it a full or an incremental load?
BR

Processing 500M rows per day as a full refresh. Incremental refresh in this case is not a viable solution: the first load has to be a full refresh, and since the incremental load only appends, we would hit the SPICE size limitation in less than a month.

  1. The Redshift cluster is in the us-west-2 region.
  2. I have tried incremental loads, but the overall processing times did not change much, and I have concerns about hitting the SPICE size limit since the incremental load is constantly appending to the dataset.
  3. Could try this approach.
  4. OK.

So is the final takeaway here that there is no way to optimize other than reducing the dataset size? What's the benefit of having SPICE and Redshift in the same region?


But shouldn't the row count be the same whether I do a full or an incremental load?
e.g.
1st load: 500M → 500M in SPICE
2nd load: 534M → 500M + 34M
etc.

Each day's increment is close to 100M rows or more.

@alexresh,

> I have tried incremental loads, but the overall processing times did not change much, and I have concerns about hitting the SPICE size limit since the incremental load is constantly appending to the dataset.

What do you mean by "the overall processing time did not change much"? Did ingesting 100M records with an incremental refresh take the same time as a full refresh of 500M?

Is the custom SQL that you are executing only looking at data for certain periods of time? (For example: the last 12 months.)
If the incremental load append works for your data, you can use a multi-schedule approach (Setting data retention period on incremental refresh dataset? - #6 by Giang_Nguyen).
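For instance, if only a trailing 12 months is analyzed, a fixed window in the custom SQL (hypothetical table and column names) keeps a full refresh at a roughly constant size:

```sql
-- Hypothetical sketch: bound the dataset to a rolling 12-month window so
-- the dataset size stays roughly constant instead of growing every day.
SELECT order_id,
       customer_id,
       amount,
       last_modified_date
FROM sales.order_facts
WHERE last_modified_date >= DATEADD(month, -12, CURRENT_DATE);
```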

Kind regards,
Koushik


Hi @alexresh
Any update for us?
BR

Since we have not received a response from you on this question, we are going to archive the question. If you still need this question answered, please create a new question topic.