How to optimize the storage in QS

Hi All,

In QS if with create the dataset using start schema model like multiple dimensions and fact table then QS will take too much storage as compare to power bi.

In QS if we join the 1 fact table with 3-dimension table then it will create a master table using all these 4 tables, and if we have fact table with huge data then unnecessarily the bcoz of the dimension tables it will increase the data set size.

can anyone tell me what is the best model for QS and how to optimize the storage in QS.

Hi,

First of all you can exclude those column not being used in you dataset and check your join condition. Big table should be the first table and use the left joins for other tables.
Rest it depends on you data model.

if the you getting your data from a database, you Create view in database. Finally get data into QS this will use les storage memory.

Storage cost is very low in SPICE :slight_smile:

Regards,
Naveed Ali

1 Like

Hi @achourey - When you are talking about storage in QS, it refers to SPICE storage only. If your final data set is SPICE based, I believe one of the potential area to reduce or remove the column which is not require for your reporting. Most of the cases I saw SPICE limit hit on the basis of number of rows not size.

When you are comparing the size of the data w.r.t power BI, do you have any stats. Please share it, as per my understanding SPICE store the data optimally compressed format in the backend.

Regards - Sanjeeb

@Sanjeeb2022
I have implemented same report in power bi as well as in QS.

I have around 55 million records in my data set where power bi took ~4 GB space for dataset and QS took ~30 GB space.

Hi @achourey - Thanks for the details.
Hi @Koushik_Muthanna @Ramon_Lopez - Can you please highlight this to internal team of QS. We are expecting QS should compress the file while loading to SPICE. There is a huge difference in size compare to power BI. It will be helpful if internal team can have a look on this please.

Hi @Kristin - just fyi.

Regards - Sanjeeb

Hi All

QuickSight when storing data in SPICE is optimizing for Performance. This entails fast query times and support to scale for many users. Data is not compressed and due to focus on performance (columnar data store). This can be very noticeable when working with Parquet files. This is by design.

To reduce size focus should be placed on columns that contain large and varying strings. Any efforts to trim and eliminate unnecessary columns will impact the size it consumes.

Some info from our docs - Importing data into SPICE - Amazon QuickSight

Thanks,
Ramon Lopez

2 Likes

Thank you @Ramon_Lopez . Out of curiosity, the data loading size is decreased a lot in Power BI compare QuickSight. See the notes from @achourey
“I have around 55 million records in my data set where power bi took ~4 GB space for dataset and QS took ~30 GB space”

This is bit surprise for me. Any idea on this?

Regards - Sanjeeb

@Sanjeeb2022 I’ve seen this before. Likely due to limitations on client capacity they have spent considerable effort to compress the dataset when saving to disk. This has an impact on performance. We don’t do that.

2 Likes

Thanks @Ramon_Lopez for providing this insight.

Regards - Sanjeeb

1 Like