Best way to re-use same data in different datasets

Hi all,

I have a “design” question.

Our datasets are based on custom mysql/psql queries: many times, we used case statements to map our data.

For example
WITH order_attribution AS (
SELECT order_id,
CASE
WHEN LOWER(referer) LIKE ‘aaaa%’ THEN ‘one’
WHEN LOWER(referer) LIKE ‘bbb’ THEN ‘two’
[CUT]

For different reasons, currently the same part of the queries are in common in different datasets (ie. 5 datasets).
Which is the best way to avoid updating the queries in all the 5 datasets when you need to change some condition?

I was thinking that an option could be creating a simple dataset only with the subquery containing the “mapping” and join it with the main datasets where you need to use that mapping.

What do you think?

To create new tables/views in the database is not an option (I prefer to have the mapping on QS).

Thanks,
Andrea

hi Andrea, there are two ways to do that:

  1. Composite dataset (dataset as source). You can create a dataset with SQL:
    SELECT order_id, xxx, yyy
    CASE
    From Table
    Where mandatory conditions
    Group by 1,2,3

After then, reuse this dataset by creating child dataset on top of it. You can apply filter, create calculation and etc on the children datasets.

  1. Create a view with the same query in DWH. And then, create a dataset based on the DB view. And then, you can apply filter, calculation and etc on the datasets.

Either way is fine. Option 1 you can have everything in BI layer but you have to share the parent dataset to the team and pay for the SPICE cost if you are using SPICE. Option 2 you don’t have to worry about the refresh scheduling, assets access permission in BI layer. But you have to get the access to create DB view in your DWH.

Wish it helps.

– Ying

Hi Ying,

thank you for your reply.

The problem is that this datasets is usually used as subquery in other datasets, so I’ve always need to join it in other queries.
It can’t be really used in a “generic” datasets and create all the children datasets based on it.

So if I well understand, option 1 doesn’t suit my need.

For option 2, I was looking an alternative to to create views directly on db, because it’s more easy to see the conditions used in the datasets on QS instead of db.

To join datasets on QS is not a good option?

Thanks,
Andrea

Join datasets in QS is a good option. But please be careful about the 1GB limitation. For SPICE join SPICE, even if the two SPICE are from same data source, QS treat them as multiple data sources right now. So, the SPICE from 2nd data source (the right hand side of the join) can’t be larger than 1GB. FYI. This limitation will be removed in future.

Ok, I think this should not be an issue, since the secondary datasets will be used just to clean the data, so they should never reach 1GB.

So, I’ll try to implement this option and let’s see.

Thanks!