How to create a new dataset from existing dataset?

Have a existing dataset, want to create a new dataset by running group by query to create a new dataset for analysis, is it possible? If it is not supported, I am facing below use case, could it be achieved?

My Use Case

dataeset 1:

userName  value tempId
123         1     abc
123         2     abc
123         5     abc

dataeset 2:

realId  realValue
abc       8  

Want to validate below:

In dataset 1, group by(tempId) and get the SUM(value) = tempValue(in this case, tempValue = 1+2+5=8), tempValue should equals realValue from dataset 2 where realId == tempId.

Hi @Jihui_Zhang,

Is there a particular reason this would need to be a seperate dataset rather than just using a calculated field such as: sum({Value},[{TempID}])?

emmm, not sure how would this work? where should the result be? wouldn’t it be duplicated for each row? And can you use SUM to add up values from different rows?

Hi @Jihui_Zhang
you will create a new calculated field and more or less for each row. But as you use the group attribute you also will calculate is for each ID over all rows.
I guess the reason why you ask for a second dataset you expect only one row for each ID?
BR

Basically in my head the solution is:

In dataset 1, group by(tempId) and generate dataset 3:

tempId  value
abc         8     
def         10   
...

In dataset 2, for each (dataset2.realId == dataset3.tempId) compare if dataset3.value == dataset2.realValue

What I do not understand is, how could this be achieved by using calculated field? I thought calculated field can only be created using the value from the same row? Can it also take values from other row with the same id?

HI @Jihui_Zhang
you can calculate over multible rows.
Pls have a look at

and

BR

1 Like

Looks like LAC-A is not supported in dataset functions, only supported in Analysis mode.

LAC-A calculations are not available in dataprep yet, only in the analysis. Therefore, the only workaround is to create these expressions in the analysis for now.

An in analysis isn’t an option?
Datasource is file or database?

1 Like

Analysis can only operates on one single dataset, however in this case, I want to operates on these 2 dataset both which means:

  1. Need to create an extra colum with SUM(value) group by tempId in dataset 1
  2. Join dataset 1 with dataset 2 on realId == tempId
1 Like

hi @Jihui_Zhang ,

Analysis can have multiple datasets but one visual per one dataset. and LAC-A can’t use at dataset level. so you would have to do LAC-A calculation at data source level. which data source are you using?

1 Like

I am using Athena as data source

Then you could do the group and validation on Athena and create a new dataset,right?

So as per my understanding, this cannot be achieved by quickSight right? I would have to look else where for a solution.

In theory you can use a custom sql during dataset creation in QS. In that sql you can do the group and validation and create a new dataset with the result.

Hello @Jihui_Zhang !

Did @ErikG 's post above about using custom sql help you, and if so could you mark their post as a solution?

We have not heard from you in a few days but would still like to help find a solution. If we do not hear from you in 3 days this post will be archived.