What would be considered a best practice in QuickSight, to create a new data source for every single dataset or to reuse a data source for multiple datasets? What are the pros and cons of each approach?
hello david.
tl;dr: we reuse the data source, transform via dbt, and build any and all aggregations as views there, which quicksight either ingests into spice or queries directly.
i am not claiming to know best practice, nor do i want to preach that our way is the best. still, happy to share how we work at my workplace for the sake of discussion and perspective. happy to hear more thoughts on this.
there are so many ways to aggregate our transaction data (by week, month, supplier, category, region, currency, order status, delivery method, payment method, order type, and by bespoke combinations of the above) that we tend to run each requested aggregation in our data warehouse (redshift) and load only the aggregates + key performance indicators into quicksight for visualisation. an aggregated dataset compresses the transaction-level data by a factor of 30-30,000, so it is worth it from the point of view of spice storage as well. the lighter data also loads faster (although i do find that after the initial load, quicksight tends to be quite responsive).
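to make the pattern concrete, here is a minimal sketch of the kind of warehouse view we ingest. the schema, table, and column names are made up for illustration, not our real model:

```sql
-- hypothetical redshift view (managed as a dbt model):
-- monthly order totals per supplier, replacing millions of
-- transaction rows with one row per (month, supplier)
create view analytics.monthly_supplier_kpis as
select
    date_trunc('month', order_date) as order_month,
    supplier_id,
    count(*)                        as order_count,
    sum(order_amount)               as total_amount,
    avg(order_amount)               as avg_order_value
from analytics.transactions
group by 1, 2;
```

quicksight then either ingests this view into spice or queries it directly; the transaction-level table never has to leave redshift.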
the immediate challenge comes from users who do not yet know what aggregations they want to run on the data, i.e. users who want to use quicksight as a discovery tool. they basically need transaction-level data, but whenever they have a defined question to track over time, we can convert it into an aggregated dataset and build a report around it.
your mileage may vary, especially if your data source is a data lake (s3) or another source that is not well suited for analytical computations. i am aware that with dataset construction, filtering and joining, and with calculated fields and parameters, quicksight is itself quite flexible for ad hoc calculations. still, i have found it unnecessarily difficult to visualise some of the bespoke aggregations we need for the kpis. and the data warehouse is right there, and in it i can manage the semantic transformations and metric definitions with dbt.
governance and security are also more manageable outside of quicksight than inside it.
this may help: Best practices for Amazon QuickSight SPICE and direct query mode | AWS Business Intelligence Blog
Hi @david.adds - Welcome to AWS QuickSight and thanks for posting the question. Please find some of the best practices below.
- If your data is refreshed in batch mode and the same dataset is used for multiple analyses, it is better to store it in a common folder (for everyone's access) and use SPICE mode. This way you refresh the data once and the same dataset can be reused by multiple people.
- Try to create calculated fields in the data preparation stage (if your dataset is in SPICE). This way the calculated fields are computed during the SPICE refresh and are ready to use in your analyses.
- If you have any real-time reporting requirement, direct query mode is the better approach, as there will be a delay or lag with SPICE refreshes.
- If you only require a subset of the data and your data source is relational, you can use the custom SQL functionality, including joins and SQL functions, to extract just the data you need into QuickSight.
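For example, a custom SQL statement along these lines (the table and column names are purely illustrative) pulls only the joined subset you need into the dataset:

```sql
-- Hypothetical custom SQL in the QuickSight dataset editor:
-- join orders to customers and keep only the last 90 days
SELECT o.order_id,
       o.order_date,
       o.total_amount,
       c.customer_region
FROM sales.orders AS o
JOIN sales.customers AS c
  ON c.customer_id = o.customer_id
WHERE o.order_date >= DATEADD(day, -90, CURRENT_DATE);
```

This keeps the SPICE footprint small and pushes the filtering and join work down to the source database.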
The link provided by @jens is also useful, so have a look at that as well.
Regards,
Sanjeeb
Hi @david.adds,
It’s been a while since we last heard from you. Do you have any follow-up questions regarding your original topic, or were the responses listed above helpful?
If we do not hear back within the next 3 business days, I’ll go ahead and close this topic.
Thank you!
Hi @david.adds,
Since we have not heard back from you yet, I’ll go ahead and mark the solution. However, if you have any additional questions, feel free to create a new topic in the community and link this discussion for relevant information.
Thank you!