Best way to re-use same data in different datasets

Andrea_M · December 19, 2021, 2:56pm

Hi all,

I have a “design” question.

Our datasets are based on custom mysql/psql queries: many times, we used case statements to map our data.

For example
WITH order_attribution AS (
SELECT order_id,
CASE
WHEN LOWER(referer) LIKE ‘aaaa%’ THEN ‘one’
WHEN LOWER(referer) LIKE ‘bbb’ THEN ‘two’
[CUT]

For different reasons, currently the same part of the queries are in common in different datasets (ie. 5 datasets).
Which is the best way to avoid updating the queries in all the 5 datasets when you need to change some condition?

I was thinking that an option could be creating a simple dataset only with the subquery containing the “mapping” and join it with the main datasets where you need to use that mapping.

What do you think?

To create new tables/views in the database is not an option (I prefer to have the mapping on QS).

Thanks,
Andrea

Ying_Wang · December 20, 2021, 9:07pm

hi Andrea, there are two ways to do that:

Composite dataset (dataset as source). You can create a dataset with SQL:
SELECT order_id, xxx, yyy
CASE
From Table
Where mandatory conditions
Group by 1,2,3

After then, reuse this dataset by creating child dataset on top of it. You can apply filter, create calculation and etc on the children datasets.

Create a view with the same query in DWH. And then, create a dataset based on the DB view. And then, you can apply filter, calculation and etc on the datasets.

Either way is fine. Option 1 you can have everything in BI layer but you have to share the parent dataset to the team and pay for the SPICE cost if you are using SPICE. Option 2 you don’t have to worry about the refresh scheduling, assets access permission in BI layer. But you have to get the access to create DB view in your DWH.

Wish it helps.

– Ying

Andrea_M · December 21, 2021, 5:15pm

Hi Ying,

thank you for your reply.

The problem is that this datasets is usually used as subquery in other datasets, so I’ve always need to join it in other queries.
It can’t be really used in a “generic” datasets and create all the children datasets based on it.

So if I well understand, option 1 doesn’t suit my need.

For option 2, I was looking an alternative to to create views directly on db, because it’s more easy to see the conditions used in the datasets on QS instead of db.

To join datasets on QS is not a good option?

Thanks,
Andrea

Ying_Wang · December 21, 2021, 9:44pm

Join datasets in QS is a good option. But please be careful about the 1GB limitation. For SPICE join SPICE, even if the two SPICE are from same data source, QS treat them as multiple data sources right now. So, the SPICE from 2nd data source (the right hand side of the join) can’t be larger than 1GB. FYI. This limitation will be removed in future.

Andrea_M · December 22, 2021, 9:31am

Ok, I think this should not be an issue, since the secondary datasets will be used just to clean the data, so they should never reach 1GB.

So, I’ll try to implement this option and let’s see.

Thanks!

Topic		Replies	Views
Dataset available only in SPICE query mode Q&A quick-sight , dataset	3	855	January 11, 2023
How to best model several custom sql datasets from the same datasource Q&A data-source , quick-sight , dataset , Business-Intelligence-Engineer	6	311	November 3, 2023
How to use SQL query to create new dataset from existing dataset Q&A data-source , quick-sight	1	1249	August 16, 2022
When I am trying to join two Data Sets then the data is automatically move to SPICE instead of Direct Query? Q&A data-source , quick-sight	4	946	December 6, 2022
Is it better to add additional data as a dataset or datasource? Q&A data-source , quick-sight , dataset	7	1967	August 22, 2023

Best way to re-use same data in different datasets

Related topics