How to optimize the storage in QS

achourey · May 28, 2024, 6:53am

Hi All,

In QS if with create the dataset using start schema model like multiple dimensions and fact table then QS will take too much storage as compare to power bi.

In QS if we join the 1 fact table with 3-dimension table then it will create a master table using all these 4 tables, and if we have fact table with huge data then unnecessarily the bcoz of the dimension tables it will increase the data set size.

can anyone tell me what is the best model for QS and how to optimize the storage in QS.

Naveed · May 28, 2024, 7:22am

Hi,

First of all you can exclude those column not being used in you dataset and check your join condition. Big table should be the first table and use the left joins for other tables.
Rest it depends on you data model.

if the you getting your data from a database, you Create view in database. Finally get data into QS this will use les storage memory.

Storage cost is very low in SPICE

Regards,
Naveed Ali

Sanjeeb2022 · May 28, 2024, 11:23am

Hi @achourey - When you are talking about storage in QS, it refers to SPICE storage only. If your final data set is SPICE based, I believe one of the potential area to reduce or remove the column which is not require for your reporting. Most of the cases I saw SPICE limit hit on the basis of number of rows not size.

When you are comparing the size of the data w.r.t power BI, do you have any stats. Please share it, as per my understanding SPICE store the data optimally compressed format in the backend.

Regards - Sanjeeb

achourey · May 28, 2024, 3:11pm

@Sanjeeb2022
I have implemented same report in power bi as well as in QS.

I have around 55 million records in my data set where power bi took ~4 GB space for dataset and QS took ~30 GB space.

Sanjeeb2022 · May 28, 2024, 3:41pm

Hi @achourey - Thanks for the details.
Hi @Koushik_Muthanna @Ramon_Lopez - Can you please highlight this to internal team of QS. We are expecting QS should compress the file while loading to SPICE. There is a huge difference in size compare to power BI. It will be helpful if internal team can have a look on this please.

Hi @Kristin - just fyi.

Regards - Sanjeeb

Ramon_Lopez · May 28, 2024, 4:23pm

Hi All

QuickSight when storing data in SPICE is optimizing for Performance. This entails fast query times and support to scale for many users. Data is not compressed and due to focus on performance (columnar data store). This can be very noticeable when working with Parquet files. This is by design.

To reduce size focus should be placed on columns that contain large and varying strings. Any efforts to trim and eliminate unnecessary columns will impact the size it consumes.

Some info from our docs - Importing data into SPICE - Amazon QuickSight

Thanks,
Ramon Lopez

Sanjeeb2022 · May 29, 2024, 3:22am

Thank you @Ramon_Lopez . Out of curiosity, the data loading size is decreased a lot in Power BI compare QuickSight. See the notes from @achourey
“I have around 55 million records in my data set where power bi took ~4 GB space for dataset and QS took ~30 GB space”

This is bit surprise for me. Any idea on this?

Regards - Sanjeeb

Ramon_Lopez · May 29, 2024, 3:50pm

@Sanjeeb2022 I’ve seen this before. Likely due to limitations on client capacity they have spent considerable effort to compress the dataset when saving to disk. This has an impact on performance. We don’t do that.

Sanjeeb2022 · May 30, 2024, 3:07am

Thanks @Ramon_Lopez for providing this insight.

Regards - Sanjeeb