Dataset parameters with SPICE

sav0819 · December 14, 2023, 5:50am

I wanted to use dataset parameters to pull the required data into spice from my DB instance.
When I do that I need to do it in Direct Query Mode. Does that mean my data doesn’t store inside SPICE?
And how do i refresh the data in this case when I use Direct Query mode? Is this like a real time data from my DB without SPICE?

Then I don’t need to implement any refresh mechanism? How do I make sure my dashboard as latest data always?

ErikG · December 14, 2023, 6:46am

Hi @sav0819
Yes, Direct Query is always querying your DB on runtime. No need to refresh. Be aware each visual and user are creating queries.
BR

sav0819 · December 14, 2023, 8:30am

So If I write any custom sql which gets the data by joining some tables from the database always like direct query and it’s a real time data from DB.

Instead of loading all the millions of data into SPICE I think using dataset parameters with Direct customer query might be more effective solution right? Do you agree?

ErikG · December 14, 2023, 8:35am

Depends!!!

Are real time data needed?
What datasource?
What are the parameter for?
How many users?

sav0819 · December 14, 2023, 8:39am

Consider 10k users and snowflake is the DB and we use client market and some unique id which is the main source of insights where UI sends these 3 params to get the insights embedded.

ErikG · December 14, 2023, 8:47am

If 10k users are using the dashboard at the same time you end up to 10k queries to the Snowflake DB.
Not sure Snowflake will handle it and if its an cost issue.

Your DB is updated real time?

To be honest, I have used 90% SPICE datasets so far. Because the performance is usually better and no real time data is required. A daily refresh of the SPICE is usually cheaper in terms of costs and resources than DirectQuery.

Neeraj · December 19, 2023, 8:55pm

Hi @sav0819

Amazon QuickSight offers two types of data sets: Direct Query data sets, which provides real-time access to data sources, and SPICE (Super-fast, Parallel, In-memory Calculation Engine) data sets, which are pre-aggregated and cached for faster performance and scalability that can be refreshed on a schedule.

You can opt for Direct query in case you need data realtime.In case of direct query your visuals will execute query against source based on aggregation performed in visual , results are then rendered reflecting data as it is in source with no delay.
In this case Query performance will depend on the instance running the data source and there will be cost associated to each query running against source.

If you are fine with frequent data refresh ( full refresh upto 1 hour or doing incremental refresh upto 15 min) then highly recomend you to opt for SPICE query mode . In this case when data refresh happens only then query will execute in Snowflake and results are ingested into SPICE. It will be highly scalable for 100 or 1000 or 100k users providing your dashboard consistent performance.
As Erik mentioned this is going to cost effective and highly performant option.

sagmukhe · December 20, 2023, 9:19am

@sav0819 - While @Neeraj and @ErikG have already addressed most of the pointers, I just want to highlight one additional option regarding SPICE dataset refresh. You can even leverage createIngestion() method also via SDK and integrate with your existing data pipeline to refresh the SPICE dataset in an event driven manner. That would be even more cost effective and optimum avoiding unwanted queries. Hope this helps!