We have data in S3 in Parquet format, around 300 MB in size. When we load this data without any transformation into QuickSight via Athena, the data size in SPICE shows as around 8.8 GB.
I'm unable to understand why the data volume increases this much.
Is it because of the data types?
Looking for some understanding here. Most tools compress data as it is imported into the system, but here it shoots up so high.
The size of a dataset in SPICE relative to your account’s SPICE capacity is called logical size. A dataset’s logical size isn’t the same as the size of the dataset’s source file or table. The computation of a dataset’s logical size occurs after all the data type transformations and calculated columns are defined during data preparation. These fields are materialized in SPICE in a way that enhances query performance. Any changes you make in an analysis have no effect on the logical size of the data in SPICE. Only changes that are saved in the dataset apply to SPICE capacity.
The logical size of a SPICE dataset depends on the data types of the dataset fields and the number of rows in the dataset. The three types of SPICE data are decimals, dates, and strings. You can transform a field’s data type during the data preparation phase to fit your data visualization needs. For example, the file you want to import might contain all strings (text). But for these to be used in a meaningful way in an analysis, you prepare the data by changing the data types to their proper form. Fields containing prices can be changed from strings to decimals, and fields containing dates can be changed from strings to dates. You can also create calculated fields and exclude fields that you don’t need from the source table. When you are finished preparing your dataset and all transformations are complete, you can estimate the logical size of the final schema.
In the formula below, decimal and date cells are calculated as 8 bytes per cell plus 4 extra auxiliary bytes. String cells are calculated as the text's length in UTF-8 encoding plus 24 auxiliary bytes. String data types require more space because of the extra indexing SPICE needs to provide high query performance.
Logical dataset size in bytes =
(Number of Numeric cells * (12 bytes per cell))
+ (Number of Date cells * (12 bytes per cell))
+ SUM ((24 bytes + UTF-8 encoded length) per Text cell)
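The formula above can be sketched as a quick estimator. The schema in the example (row count, column mix, average string length) is entirely hypothetical -- plug in your own dataset's numbers to see where the growth comes from:

```python
def spice_logical_size_bytes(rows, numeric_cols, date_cols, avg_string_bytes_per_col):
    """Rough estimate of SPICE logical size in bytes.

    avg_string_bytes_per_col: list of average UTF-8 lengths, one entry
    per string column in the dataset (hypothetical values below).
    """
    numeric = rows * numeric_cols * 12  # 8 bytes per cell + 4 auxiliary
    dates = rows * date_cols * 12       # 8 bytes per cell + 4 auxiliary
    # Each string cell costs 24 auxiliary bytes plus its UTF-8 length.
    strings = rows * sum(24 + n for n in avg_string_bytes_per_col)
    return numeric + dates + strings

# Example: 10 million rows, 5 numeric columns, 2 date columns,
# and 10 string columns averaging 20 UTF-8 bytes each.
size = spice_logical_size_bytes(
    rows=10_000_000,
    numeric_cols=5,
    date_cols=2,
    avg_string_bytes_per_col=[20] * 10,
)
print(f"{size / 1024**3:.1f} GiB")  # → 4.9 GiB
```

Note how the string columns dominate: the 24-byte-per-cell overhead alone can easily exceed the size of a compressed, columnar Parquet file, which dictionary-encodes repeated strings instead of materializing every cell.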
Thanks for answering, Neelay.
Does normalising the data help reduce the size here in QuickSight? I tried testing that, but was unable to reach a solution.
Since the data size of 300 MB is inflated to 8.5 GB, I need your opinion on this.
@shweta maybe; you would have to try it.
I'd like to see the sample data, just out of curiosity.
If possible, can you share some sample data?
Here is the attached sample data from one of the tables. I'm unable to upload an Excel file here; try below if you can access it.