Delete duplicate with Incrementally refreshing a dataset

Hi

I need delete de duplicate value when i ncrementally refreshing a dataset and i have duplicate.

I try create a calculated field with this code so that when it detects the same pk, it gives me the duplicate value if so, and then add a filter of only those that are not duplicates. all this at the dataset level.

ifelse(lead({PK},[{DATE} ASC],1)=firstValue({PK},[{DATE} ASC],[{DATE}]),‘Duplicate’,‘Keep’)

the problem is that I get a field not available in quicksight, it doesn’t read the code well… could you give me some help or a new option?

1 Like

You need to substitute your fields in.

What is your {PK} and {DATE} field in your dataset?

1 Like

Pk is the primary key. And date is the date of update, the idea is that when I have the same primary key, I select the most recent date and the other put duplicate and thus put a filter and remove duplicates.

Thank you!

What are the exact names of your fields?

QuickSight cannot find the ones you put in so you need to put in the correct names.

1 Like

Yes! I put the same name :s

The logical is ok?

1 Like

yes that logic should work

2 Likes

But the field say:
FLAG

And the calculate field is :

MODIF_EN is the date and PK_CONCAT is the primary key
i dont understand why dont work… :frowning:

That’s because it’s an aggregation.

Create a table with the pk_concat and modif_en in a table and add this as a value to that field and you will see the values.

1 Like

Ok ok, i understand,

but that field now I want to use it to add a filter to remove duplicates and it tells me that it can not be used at the data set level, so I do the filter in the analysis, right?

And only can filtre de visual where this field ‘flag’ stay, and another visual where i use other field, can work :frowning:

thanks you!

Yep you will need to do it at the analsysis level and only use it with visuals that have the pk_concat and modif_en fields.

I would suggest moving this logic to sql if you can.

Maybe do a SELECT DISTINCT if your sql allows it.

1 Like

You are pretty close, but the lead and firstValue functions are table calculations, which means you need to use the dimensions in your visual. Try it this way:

ifelse(maxOver({DATE}, [{PK}], PRE_AGG) = {DATE}, 1, 0)

Then set a filter on this field and only keep ‘1’.

2 Likes

Yes, but this solution is not valid. Since I do incremental loading of data is spice, and my idea is to remove when there is updated and repeated data, hence the calculated field. What other option can you think of?

This filter is going to happen in QuickSight when the visuals load. If you need to remove those duplicate rows before importing into SPICE then you will have to do it in the DB or at least with Custom SQL. SPICE doesnt have any way to delete certain rows from the data, can just filter them out when running queries.

2 Likes

I will try this solution, to see if I can see the duplicates and filter it at the level of the whole analysis. Thanks you

1 Like