Delete duplicates when incrementally refreshing a dataset

Raulsc · March 2, 2023, 4:24pm

Hi

I need to delete duplicate values when incrementally refreshing a dataset that contains duplicates.

I tried creating a calculated field with the code below, so that when it detects the same PK more than once it marks the row as a duplicate. I would then add a filter that keeps only the rows that are not duplicates. All of this would be at the dataset level.

ifelse(lead({PK},[{DATE} ASC],1)=firstValue({PK},[{DATE} ASC],[{DATE}]),'Duplicate','Keep')

The problem is that I get a "field not available" error in QuickSight; it doesn't parse the code correctly. Could you give me some help or suggest another option?

Max · March 2, 2023, 7:39pm

You need to substitute your fields in.

What is your {PK} and {DATE} field in your dataset?

Raulsc · March 2, 2023, 7:52pm

PK is the primary key and DATE is the update date. The idea is that when several rows share the same primary key, I keep the one with the most recent date and mark the others as duplicates, so that I can add a filter and remove them.

Thank you!

Max · March 2, 2023, 7:58pm

What are the exact names of your fields?

QuickSight cannot find the ones you put in so you need to put in the correct names.

Raulsc · March 2, 2023, 8:13pm

Yes! I used the same names :s

Is the logic OK?

Max · March 2, 2023, 8:54pm

Yes, that logic should work.

Raulsc · March 2, 2023, 9:15pm

But the field shows:
FLAG

And the calculated field is:

MODIF_EN is the date and PK_CONCAT is the primary key.
I don't understand why it doesn't work…

Max · March 2, 2023, 9:27pm

That’s because it’s an aggregation.

Create a table visual with pk_concat and modif_en, add this calculated field to it as a value, and you will see the values.

Raulsc · March 2, 2023, 9:41pm

OK, I understand.

But now I want to use that field in a filter to remove duplicates, and it tells me that it cannot be used at the dataset level. So I apply the filter in the analysis, right?

And I can only filter the visual that contains this 'flag' field? Would it work in another visual where I use other fields?

Thank you!

Max · March 2, 2023, 9:53pm

Yep, you will need to do it at the analysis level and only use it with visuals that have the pk_concat and modif_en fields.

I would suggest moving this logic to sql if you can.

Maybe do a SELECT DISTINCT if your sql allows it.

Jesse · March 2, 2023, 10:00pm

You are pretty close, but the lead and firstValue functions are table calculations, which means you need to use the dimensions in your visual. Try it this way:

ifelse(maxOver({DATE}, [{PK}], PRE_AGG) = {DATE}, 1, 0)

Then set a filter on this field and only keep ‘1’.
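To make the logic of that expression concrete, here is a minimal Python sketch of what the flag computes, row by row. It is only an illustration of the idea, not how QuickSight evaluates it internally; the sample rows and the names `rows`, `latest`, and `flags` are hypothetical.

```python
from datetime import date

# Hypothetical sample rows as (PK, DATE) pairs; values are illustrative only.
rows = [
    ("A", date(2023, 1, 1)),
    ("A", date(2023, 2, 1)),
    ("B", date(2023, 1, 15)),
]

# Rough equivalent of maxOver({DATE}, [{PK}], PRE_AGG): the latest date per PK.
latest = {}
for pk, d in rows:
    if pk not in latest or d > latest[pk]:
        latest[pk] = d

# Rough equivalent of ifelse(... = {DATE}, 1, 0): flag the newest row of each PK.
flags = [1 if d == latest[pk] else 0 for pk, d in rows]
print(flags)  # [0, 1, 1]
```

Note that if two rows share the same PK and the same date, both get flagged 1, so the filter would keep both.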

Raulsc · March 2, 2023, 10:01pm

Yes, but this solution does not work for me. I load data incrementally into SPICE, and my idea is to remove rows that have been updated and therefore appear more than once, hence the calculated field. What other option can you think of?

Jesse · March 2, 2023, 10:03pm

This filter is applied in QuickSight when the visuals load. If you need to remove those duplicate rows before importing into SPICE, then you will have to do it in the DB or at least with Custom SQL. SPICE doesn't have any way to delete specific rows from the data; it can only filter them out when running queries.
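As a rough sketch of what the SQL-side approach could look like, the Python snippet below runs a "keep the latest row per key" query against an in-memory SQLite database. The table name `source_table`, the `amount` column, and the sample data are assumptions made up for illustration; only `pk_concat` and `modif_en` come from this thread, and the exact syntax may need adjusting for your database.

```python
import sqlite3

# In-memory database with hypothetical sample data (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_table (pk_concat TEXT, modif_en TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [
        ("A", "2023-01-01", 10),
        ("A", "2023-02-01", 12),  # updated version of key A
        ("B", "2023-01-15", 7),
    ],
)

# SELECT DISTINCT only collapses rows that are identical in every column,
# so the two versions of key A both survive here.
distinct_rows = conn.execute(
    "SELECT DISTINCT pk_concat, modif_en, amount FROM source_table"
).fetchall()
print(len(distinct_rows))  # 3

# Keep only the most recent row per key by joining on MAX(modif_en).
dedup_sql = """
SELECT s.pk_concat, s.modif_en, s.amount
FROM source_table AS s
JOIN (
    SELECT pk_concat, MAX(modif_en) AS max_modif_en
    FROM source_table
    GROUP BY pk_concat
) AS latest
  ON latest.pk_concat = s.pk_concat
 AND latest.max_modif_en = s.modif_en
ORDER BY s.pk_concat
"""
deduped = conn.execute(dedup_sql).fetchall()
print(deduped)  # [('A', '2023-02-01', 12), ('B', '2023-01-15', 7)]
```

The inner query plays the same role as the maxOver expression above, just evaluated in the database before the data reaches SPICE. How this interacts with an incremental refresh window is not covered in this thread, so treat it as a starting point rather than a complete fix.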

Raulsc · March 2, 2023, 10:08pm

I will try this solution to see if I can identify the duplicates and filter them out at the level of the whole analysis. Thank you!
