Optimizing processing time of QuickSight SPICE refresh

I have a QuickSight SPICE dataset that is set to refresh daily, processes around 500M rows (~200GB), and typically takes 1 to 1.5 hours to refresh. The data source is Redshift. More recently, the refresh has been failing due to WLM rules (1.5-hour limit) configured for query execution time. The dataset is written as custom SQL containing all the logic, and it takes about the same time as doing a select * from the underlying table.
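For reference, a minimal sketch of how the aborts can be confirmed on the Redshift side (stl_wlm_rule_action is the Redshift system log of query monitoring rule actions; the rule name is whatever was configured in WLM):

```sql
-- List recent queries killed by a query monitoring rule (QMR).
SELECT query,        -- query ID of the aborted statement
       rule,         -- name of the QMR rule that fired
       action,       -- 'abort' means WLM terminated the query
       recordtime
FROM stl_wlm_rule_action
WHERE action = 'abort'
ORDER BY recordtime DESC
LIMIT 20;
```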
Another approach was creating this dataset in Athena; however, the ingestion time was worse, close to 3 hours.
We have an Enterprise edition account, and the datasets are within the SPICE limitations for QuickSight. SPICE usage on the account is ~80% of the available SPICE capacity.

Any pointers on best practices for datasets of this size? What factors should be taken into account when estimating the processing time? Any recommendations on how the SPICE ingestion time can be optimized?

Other notes:
Running a CREATE TABLE in Redshift completes in less than 5 minutes, but the QuickSight data ingestion takes longer than an hour.
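This suggests the bottleneck is the transfer into SPICE rather than the query itself. One possible workaround (a sketch, with hypothetical table and column names): pre-materialize the custom SQL so the statement QuickSight runs during ingestion is a plain scan:

```sql
-- Sketch with hypothetical names: run the heavy logic once as a CTAS
-- (fast, per the note above), then point the QuickSight dataset at the
-- resulting plain table.
DROP TABLE IF EXISTS analytics.daily_spice_extract;

CREATE TABLE analytics.daily_spice_extract AS
SELECT order_id,
       customer_id,
       amount,
       last_modified_date
FROM sales.order_facts;   -- stand-in for the real custom SQL logic

-- The QuickSight custom SQL then becomes just:
--   SELECT * FROM analytics.daily_spice_extract;
```

The long-running statement during the refresh is then a simple scan issued by the QuickSight database user, so the WLM execution-time rule could be scoped to exclude that user or queue rather than being relaxed for all ETL queries.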

Hi @alexresh

Could you please check the suggestions below and see if they reduce the time taken to load 500M records into SPICE.

  1. Check that QuickSight (SPICE) and Redshift are in the same region.
  2. Considering the 500M rows, is there any option to identify the incremental records using a created date / last modified date column and enable incremental SPICE refresh?
  3. Try to remove unused/unwanted fields from the dataset, especially fields with a string datatype (see the sketch after this list).
  4. Extracting 500M rows through Redshift Spectrum might take additional time compared to Redshift managed storage.
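A hypothetical sketch of points 2 and 3 combined (table and column names are made up): project only the fields the analyses actually need, and expose a change-tracking date column the incremental refresh can key on:

```sql
-- Hypothetical example: drop wide unused string fields (they dominate both
-- transfer time and SPICE usage) and keep a date column for incremental
-- refresh to filter on.
SELECT order_id,
       customer_id,
       amount,
       order_status,        -- short code rather than a long free-text field
       last_modified_date   -- date column for QuickSight incremental refresh
FROM sales.order_facts;
-- With incremental refresh enabled on last_modified_date, QuickSight appends
-- only the rows inside the configured look-back window.
```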

Looking at the throughput, ~5.5M records per minute (500M / 90 min) is already relatively good…

Thanks
Vinod

Hi @alexresh
One additional question: you are processing 500M rows per day. Is it a full or an incremental load?
BR

Processing 500M rows per day as a full refresh. Incremental refresh in this case is not a viable solution: the first load has to be a full refresh, and since the incremental load only appends, we would hit the SPICE size limitation in less than a month.

  1. The Redshift cluster is in the us-west-2 region.
  2. I have tried incremental loads, but the overall processing times did not change much, and I have concerns about hitting the SPICE size limit since the incremental load is constantly appending to the dataset.
  3. Could try this approach.
  4. OK.

So is the final takeaway here that there is no way to optimize other than reducing the dataset size? What's the benefit of having SPICE and Redshift in the same region?


But shouldn't the row count be the same whether I do a full or an incremental load?
e.g.
1st load: 500M → 500M in SPICE
2nd load: 534M → 500M + 34M
etc.

Each day's increment is close to 100M rows or more.

@alexresh,

> I have tried incremental loads, but the overall processing times did not change much, and I have concerns about hitting the SPICE size limit since the incremental load is constantly appending to the dataset.

What do you mean by "the overall processing time did not change much"? Did ingesting 100M records with an incremental refresh take the same time as a full refresh of 500M?

Is the custom SQL that you are executing only looking at data for certain periods of time? (For example: the last 12 months.)
If the incremental load append works for your data, you can use a multi-schedule approach (Setting data retention period on incremental refresh dataset? - #6 by Giang_Nguyen).
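For instance, if only a trailing 12 months is analyzed, a fixed window in the custom SQL (hypothetical table and column names) keeps a full refresh at a roughly constant size:

```sql
-- Hypothetical sketch: bound the dataset to a rolling 12-month window so
-- the dataset size stays roughly constant instead of growing every day.
SELECT order_id,
       customer_id,
       amount,
       last_modified_date
FROM sales.order_facts
WHERE last_modified_date >= DATEADD(month, -12, CURRENT_DATE);
```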

Kind regards,
Koushik


Hi @alexresh
Any update for us?
BR

Since we have not received a response from you on this question, we are going to archive the question. If you still need this question answered, please create a new question topic.