Hi team,
We have JSON files in an S3 bucket. If the total size of all files under the S3 bucket is ~100 GB, then approximately what will be the corresponding size of the SPICE dataset in QuickSight?
Hi @Deepti - This is a very good question. I am not sure whether SPICE compresses the data before storing it in memory; we need to connect with the QuickSight team on that. However, is it possible to create a 1 GB file, ingest it, and check the SPICE capacity it consumes? That will give some idea of whether SPICE applies any compression algorithm or not.
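For reference, once the test ingestion finishes, one way to read back how much SPICE it actually consumed could be the DescribeIngestion API. A rough boto3 sketch (the account ID, dataset ID, and ingestion ID below are placeholders, and the size field is read defensively):

```python
# Rough sketch: after ingesting the 1 GB test file, ask QuickSight how much
# SPICE the ingestion consumed and compare it against the source file size.
import boto3

qs = boto3.client("quicksight", region_name="us-east-1")

resp = qs.describe_ingestion(
    AwsAccountId="111122223333",          # placeholder AWS account ID
    DataSetId="my-test-dataset-id",       # placeholder dataset ID
    IngestionId="my-test-ingestion-id",   # placeholder ingestion ID
)

ingestion = resp["Ingestion"]
print("Status:", ingestion["IngestionStatus"])
# IngestionSizeInBytes (when returned) is the SPICE size the ingestion used;
# comparing it to the 1 GB source gives a rough compression ratio.
print("SPICE bytes:", ingestion.get("IngestionSizeInBytes"))
```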
Hi @duncan @sagmukhe @David_Wong - Need your expert advice on this. Does SPICE apply any compression before storing the data in memory? For example, if the file size is 100 GB, how much SPICE space will it occupy?
Regards - Sanjeeb
@Deepti - Thank you for posting your query. I believe SPICE capacity can be approximately calculated using the rule below.
Total logical row size in bytes =
(Number of Numeric Fields * 8 bytes per field)
+ (Number of Date Fields * 8 bytes per field)
+ (Number of Text Fields * (8 bytes + UTF-8 encoded character length per field) )
Total bytes of data = Number of rows * Total logical row size in bytes
GB of SPICE Capacity Needed = Total bytes of data / 1,073,741,824
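To make that arithmetic easy to repeat for your own schema, here is a small Python sketch of the same rule of thumb. It is only an estimate based on the formula above, not an official sizing tool, and the row count and field mix in the example are made up:

```python
# Sketch of the SPICE sizing rule of thumb quoted above: 8 bytes per numeric
# field, 8 bytes per date field, and 8 bytes + UTF-8 length per text field.
def estimate_spice_gb(num_rows, numeric_fields, date_fields, text_field_avg_chars):
    """text_field_avg_chars: average UTF-8 length of each text field, one entry per field."""
    row_bytes = (
        numeric_fields * 8
        + date_fields * 8
        + sum(8 + avg_len for avg_len in text_field_avg_chars)
    )
    total_bytes = num_rows * row_bytes
    return total_bytes / 1_073_741_824  # bytes -> GB of SPICE capacity

# Example (made-up schema): 500M rows, 10 numeric, 2 date, 5 text fields averaging 20 chars
print(round(estimate_spice_gb(500_000_000, 10, 2, [20] * 5), 1), "GB")  # ~109.9 GB
```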
However, it would be great to get a confirmation from someone within the AWS QuickSight Team. @eperts @Kellie_Burton @abacon @Asem @Jesse @Wakana
Based on your calculation, the SPICE size comes out roughly similar to the S3 data size.
I want to know whether QuickSight applies some sort of compression in between.
Also, I want to explore whether QuickSight can work with big data volumes of ~200 TB. There is also the 1 TB per-dataset SPICE limit to consider.
Hi @Deepti - A 200 TB dataset is not a good fit for any reporting use case. Can you please provide more details on your use case? You will likely need to preprocess the data, store it in a compressed format (Parquet), create an Athena table over it (using Glue), and point QuickSight at the Athena table for reporting - a rough sketch of that conversion is below.
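To illustrate what that preprocessing could look like, here is a rough sketch (all database, table, and bucket names are placeholders) that uses an Athena CTAS statement to rewrite a Glue-cataloged JSON table into Parquet, which QuickSight can then query through the Athena connector:

```python
# Sketch only: assumes the raw JSON files are already cataloged by Glue as
# raw_db.events_json. The CTAS statement rewrites them as Parquet (Snappy
# compression is Athena's default for Parquet), which is far cheaper for
# Athena/QuickSight to scan than raw JSON.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

ctas_sql = """
CREATE TABLE curated_db.events_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://my-curated-bucket/events_parquet/'
) AS
SELECT *
FROM raw_db.events_json
"""

athena.start_query_execution(
    QueryString=ctas_sql,
    QueryExecutionContext={"Database": "curated_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
)
```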
Hi @David_Wong @sagmukhe - What is your advice on this? 200 TB of data for reporting looks quite big to me.
Regards - Sanjeeb