Questions About QuickSight and SPICE

Hi Team,

I have some questions about QuickSight and SPICE. Could you help clarify them?

1.How does SPICE handle data retention when querying from Athena? If we need to maintain a rolling six-month dataset, do we have to manage data removal manually, or is there an automated way to handle this?
2.When a new month’s data is added, does SPICE overwrite the previous dataset, or does it append the data? How can we ensure that only relevant data is retained?
3.What is the impact of frequent data refreshes on performance and cost? Would a full refresh or incremental refresh be more efficient for our use case?
4.How does SPICE compression work, and what factors influence the actual storage size compared to raw data in Athena?
5.How can we estimate the SPICE storage required before importing data, especially when working with consolidated billing data?
6.Since SPICE storage is billed based on allocated capacity, what strategies can we use to optimize usage and avoid unnecessary costs?
7.Is there a way to monitor and analyze SPICE usage trends over time to ensure we don’t exceed allocated capacity?

@akshay_4129

1.How does SPICE handle data retention when querying from Athena? If we need to maintain a rolling six-month dataset, do we have to manage data removal manually, or is there an automated way to handle this?
2.When a new month’s data is added, does SPICE overwrite the previous dataset, or does it append the data? How can we ensure that only relevant data is retained?

You would need to handle data retention/ new data with incremental refresh [ replied here : Setting data retention period on incremental refresh dataset? - #7 by Koushik_Muthanna ]

3.What is the impact of frequent data refreshes on performance and cost? Would a full refresh or incremental refresh be more efficient for our use case?

From a cost perspective, if we take Athena as example, then costs will be on Athena , performance from a dashboard view perspective is not going to be affected as you are viewing a snapshot based on SPICE. Once the refresh is complete, the latest snapshot on SPICE is visible. SPICE storage costs based on data ingestion.

4.How does SPICE compression work, and what factors influence the actual storage size compared to raw data in Athena?
5.How can we estimate the SPICE storage required before importing data, especially when working with consolidated billing data?

6.Since SPICE storage is billed based on allocated capacity, what strategies can we use to optimize usage and avoid unnecessary costs?
1/ Bring in required columns into SPICE [ If your business requirement needs 20 columns and your table has 100, don’t bring all the 100 ]
2/ Tips and tricks (Tips and tricks for high-performant dashboards in Amazon QuickSight | AWS Big Data Blog)

7.Is there a way to monitor and analyze SPICE usage trends over time to ensure we don’t exceed allocated capacity?
SPICE Usage Analytics (Dataset Sizes by User/Group/Namespace, Permissions, etc)

Kind regards,
Koushik

1 Like

Hi @akshay_4129

  1. SPICE doesn’t automatically manage data retention based on time frames. To maintain a rolling six-month dataset, you should implement a data source query that filters data to include only the past six months. This ensures that each SPICE refresh captures only the relevant data, effectively managing retention.

  2. SPICE refresh behavior depends on the refresh type:,

    • Full Refresh: Replaces the entire dataset in SPICE with the latest data from the source.​
    • Incremental Refresh: Adds or updates data within a specified look-back window without removing data outside this window.​
    • To ensure only relevant data is retained, especially when using incremental refresh, configure your data source query to include only the desired time frame (e.g., the last six months). This approach ensures that each refresh captures only the pertinent data.
  3. Incremental refreshes are generally faster and more efficient than full refreshes, as they process only new or changed data within a specified window. ​SPICE capacity is billed based on allocated storage. Frequent refreshes may increase data transfer and processing costs, especially if full refreshes are used.

  4. SPICE uses columnar storage, which is optimized for analytics and often compress data, optimizing storage and query performance. The actual storage size in SPICE compared to raw data in Athena depends on factors such as data types, cardinality, and redundancy. Highly repetitive data compresses more effectively, leading to greater storage savings. ​

  5. To estimate the size of SPICE datasets refer Estimating the size of SPICE datasets- Importing data into SPICE - Amazon QuickSight

  6. To optimize SPICE usage and control costs, refer to best practices of SPICE

    • Efficient Data Modeling: Import only necessary data fields and apply filters to exclude irrelevant data.​
    • Scheduled Refreshes: Align refresh schedules with data update frequencies to avoid unnecessary refreshes.​
    • Monitor Usage: Regularly review SPICE capacity utilization and adjust allocations as needed.
    • Auto-Purchase Settings: Enable SPICE auto capacity purchasing to automatically acquire necessary capacity and avoid ingestion failures due to capacity constraints
  7. Yes, QuickSight administrators can monitor SPICE capacity and usage,

    • SPICE Capacity Dashboard: Provides insights into allocated capacity, usage, and available space.​
    • Alerts and Notifications: Set up alerts to notify administrators when usage approaches capacity limits.
1 Like

Hi @akshay_4129,
It’s been awhile since we last heard from you on this thread, did you have any additional questions regarding your post?

If we do not hear back within the next 3 business days, I’ll close out this topic.

Thank you!

Hi @akshay_4129,
Since we have not heard back, I’ll go ahead and close out this topic. However, if you have any additional questions, feel free to create a new post in the community and link this discussion for relevant information if needed.

Thank you!