Spice Computation

woliveiras · August 21, 2024, 11:42pm

Hello everyone,

I have a simple question about Spice and Dataset. Let’s imagine I have a CSV file with 10,000 rows and 4 columns: LoadDate, ColumnA, ColumnB, and ColumnC. If I load this file into Quick Sight, it gives me a 10MB dataset. If I generate this file over 10 days, changing only the LoadDate column (same LoadDate for all rows), and then load all these files into Quick Sight Dataset, it gives me a 100MB dataset. I expected a smaller dataset considering columnar storage, given that ColumnA, ColumnB, and ColumnC have the same data. Is my expectation correct?

Xclipse · August 22, 2024, 5:33am

Hi @woliveiras

SPICE uses columnar storage, which is optimized for analytics and often compresses data by storing columns separately. When many rows share the same values in certain columns (like ColumnA, ColumnB, and ColumnC in your example), SPICE can apply compression effectively.

SPICE also stores metadata, indexing, and other overhead for each file, which can result in a dataset size larger than expected when concatenating multiple files.

Full Refresh vs. Incremental Refresh: If you perform a full refresh, SPICE reloads the entire dataset, potentially leading to increased storage requirements as new, unique data accumulates. On the other hand, using incremental refresh can optimize storage by only appending new rows (e.g., for a new LoadDate) instead of reloading the entire dataset. This method is more storage-efficient when only a portion of your data changes over time.

Please refer the below documentation this might be helpful for you.

woliveiras · August 22, 2024, 6:58pm

Hey @Xclipse ,

Thank you for your response. Unfortunately, the behavior I’m observing in my environment is different from what you described.

I created two files:

File1.csv (5 MB)
Filetotal.csv (15 MB)
Both files have the same columns. The only difference is that Filetotal.csv contains three snapshots of the same data found in File1.csv, with the LoadDate column having different dates and the value columns having different values.

When I import File1.csv into a dataset in Quick Sight, it generates a 10 MB dataset. However, when I import Filetotal.csv, Quick Sight generates a 45 MB dataset.

This is not what I expected, given the repetition of column data.

Could you please provide some insights into why this might be happening?

File1.csv (5mb)
LoadDate, ColumnA, ColumnB, and ColumnC.
01/01/2024,AAA,BBB,10
01/01/2024,AAA,BBB2,15
01/01/2024,AAA,BBB3,15
01/01/2024,AAA1,BBB,100
01/01/2024,AAA1,BBB2,13
01/01/2024,AAA1,BBB3,11

Filetotal.csv (15mb)
LoadDate, ColumnA, ColumnB, and ColumnC.
01/01/2024,AAA,BBB,10
01/01/2024,AAA,BBB2,15
01/01/2024,AAA,BBB3,15
01/01/2024,AAA1,BBB,100
01/01/2024,AAA1,BBB2,13
01/01/2024,AAA1,BBB3,11
02/01/2024,AAA,BBB,102
02/01/2024,AAA,BBB2,152
02/01/2024,AAA,BBB3,151
02/01/2024,AAA1,BBB,190
02/01/2024,AAA1,BBB2,130
02/01/2024,AAA1,BBB3,111
03/01/2024,AAA,BBB,9
03/01/2024,AAA,BBB2,153
03/01/2024,AAA,BBB3,152
03/01/2024,AAA1,BBB,70
03/01/2024,AAA1,BBB2,135
03/01/2024,AAA1,BBB3,110

Koushik_Muthanna · August 22, 2024, 9:45pm

@woliveiras ,

SPICE is geared for performance and the expectation that repeated data would lead to a smaller dataset size is not true . If you want to get an estimate of the logical size of your dataset , use the following where we describe how it’s calculated spice-capacity-formula .

Kind regards,
Koushik

Topic		Replies	Views
Json file size in S3 to spice dataset size in quickSight Q&A data-source	4	544	September 15, 2023
Quicksight Spice Capacity & data load Q&A admin	3	79	October 17, 2024
How to reduce dataset spice space Q&A data-source , dataset	3	590	July 3, 2023
SPICE Capacity filled more that what the dataset weighs Q&A spice	3	237	July 28, 2023
Quicksight SPICE dataset size Q&A dataset , Business-Intelligence-Engineer	2	56	March 21, 2025

Spice Computation

Related topics