Marking Duplicate records

Hi Team,
I have a dataset in Amazon QuickSight with two columns: Container ID and Destination ID. I want to create a new column that labels each Container ID as either ‘Remove’ or ‘Keep’ based on the following rules:

  1. If a Container ID appears more than once, I want to mark one among them as ‘Keep’ and label the others as ‘Remove’.
  2. If any Container ID has a Destination ID of ‘#0’, label this record as ‘Keep’ and the other duplicates as ‘Remove’.
  3. If a Container ID is not duplicated, label it as ‘Keep’.

I have used the following code, but its showing error.

ifelse(
countOver(Container ID, [Container ID]) > 1 and {Destination ID} = ‘#0’,
‘Keep’,
ifelse(
countOver(Container ID, [Container ID]) > 1,
‘Remove’,
‘Keep’
)
)

Can you please let me know if this is correct?

Hi @cijod,
I believe this may due to your ‘and’ in the first portion of your calculated field needs to be capitalized. Additionally, you don’t need to nest ifelse statements so let’s simplify this a bit further and see if this works for you:

ifelse(
countOver(Container ID, [Container ID], PRE_AGG) > 1 AND {Destination ID} = ‘#0’,
‘Keep’,
countOver(Container ID, [Container ID], PRE_AGG) > 1,
‘Remove’,
‘Keep’
)

Let me know if this works for you or not.

Thank you!

Hi Duncan,
I have tried your statement, and its showing syntax error. I have also missed one scenario here and have updated the question.

  • If a Container ID appears more than once, I want to mark one among them as ‘Keep’ and label the others as ‘Remove’.
  • If any Container ID has a Destination ID of ‘#0’, mark this record as ‘Keep’ and the other duplicates as ‘Remove’.
  • If a Container ID is not duplicated, label it as ‘Keep’.

Any specific workaround to solve this. Do I need to use Rank function for this?

Hi @cijod,
What if you try the following:

count_over ( [Container ID], [{  }]) > 1 AND
ifelse (min_over ( [Destination ID], [{  }]) = '#0', 'Keep', 'Remove'),
'Keep'

Let me know if this provides the desired outcome you’re looking to achieve.

Thank you!

1 Like

Thanks Duncan,
I have tried this as well, and its showing syntax error. I am not sure how this would mark one among the Duplicate records as ‘Keep’ and label the others as ‘Remove’.

I have used ‘Rank’ as well in the query, but its not working out.

This is one of the example.
-There are 5 duplicate SampleContainer1, and I need to mark the one with destination ID: #0 as Keep. All the others need to be marked as Remove.
-There are 2 SampleContainer2. Out of which any one can be marked as Keep and other need to marked as Remove.
-SampleContainer3 doesnt have any duplicate and this need to be marked as Keep.

I think only Rank query would work in this. Unfortunately I am not able to find a solution for this here.

Hi @cijod ,

Yes this can be done using rank.

Duplicate record issue
Here is a sample solution you can check.

Have considered record count and ranked based on destination id to identify the filter criteria. Have broken the solution into several steps for easy debug.

Thanks,
Prantika

3 Likes

Wow. Thank you so much Prantika. I just looked into it, and it worked. :slight_smile:

Out of curiosity, I justed wanted to check with you how you added the ‘desired output’ field(Keep or remove). Looks like its not a calculated field, and it was not there in the existing file.

The desired output was a field I ingested along with containerid and destinationid from the snip you shared above. I ingested it to check the desired output you mentioned. Apologies for confusing you with that!

1 Like

Got it. Thanks Prantika & Duncan for your support. Appreciate it. :slight_smile:

Hi @cijod and @prantika_sinha I am having this exact requirement for one of my analysis. Could you share the code to do this please? I am not able to access the solution Prantika shared above. Thanks in advance.

Hey @nitshree

You would be able to see the code by copying the analysis(See screenshot)