How can we use the primary key feature to avoid duplication during incremental refresh?

I noticed that QuickSight now supports defining a primary key for each dataset. Can this be used to prevent data duplication during incremental refresh? This is a capability already supported in many BI tools like Power BI and Tableau.

If it’s not currently supported, is this functionality on the roadmap? If so, is there an estimated timeline for its availability? Thanks

p.s. We are using Athena for dataset ingestion

Hello Edison,

First of all, welcome to the QuickSight community!

I am not sure whether a primary key or unique key column can prevent data duplication during incremental refresh, but it should prevent duplicate records by ensuring that the dataset contains only unique entries for that column.

This unique key acts as a global sorting key for the dataset and optimizes query generation for table visuals. When a user creates a table visual in QuickSight and adds the unique key column to the value field well, data is sorted from left to right up to the unique key column. All columns to the right of the unique key column are ignored in the sort order. Tables that don’t contain a unique key are sorted based on the order in which the columns appear in the dataset.

The following limitations apply to unique keys:

  • Unique keys are only supported for unaggregated tables.
  • If a dataset column is used for column level security (CLS), the column can’t also be used as the unique key.

See reference here

Another public reference (not from AWS): Amazon QuickSight Launches Unique Key for Datasets

Hope this gives some idea.

Cheers,
Deep

Hi @Deep, thanks for your quick response. I have read the documentation you provided and have a question. After enabling Unique Keys for the dataset, during dataset ingestion:

  • If any duplicates are detected, QuickSight will prompt you with an error notification.
  • You’ll need to resolve any errors, such as eliminating duplicate entries or adjusting the dataset, before proceeding.

I’m wondering if QuickSight can identify which records or unique keys are duplicated in the response. If so, I could use that information to trigger an S3 deletion and remove the old files to avoid duplication.

Hello Edison,

I have not tried it myself, but I think that when QuickSight detects duplicates during dataset ingestion with Unique Keys enabled, it will show an error but may not provide detailed information about which specific records are duplicated. The error message is likely generic, indicating only that duplicates were found.

To handle this situation, you could potentially:

  1. Export your dataset to a temporary location
  2. Use other AWS services such as Athena or Glue to identify the duplicate records by querying the data directly
  3. Once identified, programmatically remove the duplicate files from S3 (see the sketch after this list)
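
I haven’t run this against your setup, but here is a minimal sketch of steps 2 and 3, assuming an Athena table named my_table with a unique key column order_id (both hypothetical names) and using Athena’s built-in "$path" pseudo-column to find which S3 files the duplicated rows came from:

```python
import time
import boto3

# --- assumptions (illustrative names, not from this thread) ---
ATHENA_DATABASE = "analytics"                 # hypothetical Athena database
ATHENA_OUTPUT   = "s3://my-athena-results/"   # hypothetical query-result location
UNIQUE_KEY      = "order_id"                  # hypothetical unique key column

# List duplicated keys together with the S3 file each copy came from.
# "$path" is Athena's built-in pseudo-column holding the source object's path.
FIND_DUPES_SQL = f"""
SELECT {UNIQUE_KEY}, "$path" AS source_file
FROM my_table
WHERE {UNIQUE_KEY} IN (
    SELECT {UNIQUE_KEY} FROM my_table
    GROUP BY {UNIQUE_KEY} HAVING COUNT(*) > 1
)
ORDER BY {UNIQUE_KEY}
"""

athena = boto3.client("athena")

def run_query(sql: str) -> str:
    """Start an Athena query and block until it finishes, returning its execution id."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": ATHENA_DATABASE},
        ResultConfiguration={"OutputLocation": ATHENA_OUTPUT},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    return qid

def duplicate_files(qid: str) -> set:
    """Collect the distinct S3 paths that contain duplicated keys."""
    paths = set()
    first_page = True
    for page in athena.get_paginator("get_query_results").paginate(QueryExecutionId=qid):
        rows = page["ResultSet"]["Rows"]
        if first_page:
            rows = rows[1:]  # the first page includes a header row
            first_page = False
        for row in rows:
            paths.add(row["Data"][1]["VarCharValue"])
    return paths

if __name__ == "__main__":
    qid = run_query(FIND_DUPES_SQL)
    for path in duplicate_files(qid):
        # path looks like s3://bucket/prefix/file.parquet; review before deleting anything.
        print("duplicate source file:", path)
        # Once you've decided which copy is the stale one, it could be removed with:
        # bucket, key = path.replace("s3://", "").split("/", 1)
        # boto3.client("s3").delete_object(Bucket=bucket, Key=key)
```

The query output gives you the offending keys and the files they live in; deciding which copies count as “old” before deleting is still up to your pipeline.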

Alternative approach:

  • Use AWS Glue or AWS Lambda to pre-process your data before ingesting into QuickSight
  • Implement deduplication logic that maintains a record of processed files
  • Only then load the deduplicated data into QuickSight

This way, you can handle duplicates proactively rather than relying on QuickSight’s error messages; a rough sketch of that pre-processing approach follows.
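
Again, not something I have run myself, but as a sketch: a Lambda triggered by new files landing in a raw bucket, keeping a simple manifest of already-processed keys in S3 and only copying unseen files into the curated prefix that Athena/QuickSight reads (all bucket and key names below are made up for illustration):

```python
import json
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

# --- assumptions (illustrative names, not from this thread) ---
MANIFEST_BUCKET = "my-etl-state"         # where the list of already-processed files lives
MANIFEST_KEY    = "processed-files.json"
CURATED_BUCKET  = "my-curated-data"      # the bucket Athena / QuickSight actually reads
CURATED_PREFIX  = "orders/"

def load_manifest() -> set:
    """Read the set of source keys already copied to the curated prefix."""
    try:
        body = s3.get_object(Bucket=MANIFEST_BUCKET, Key=MANIFEST_KEY)["Body"].read()
        return set(json.loads(body))
    except s3.exceptions.NoSuchKey:
        return set()

def save_manifest(seen: set) -> None:
    """Persist the updated set of processed keys back to S3."""
    s3.put_object(Bucket=MANIFEST_BUCKET, Key=MANIFEST_KEY,
                  Body=json.dumps(sorted(seen)).encode("utf-8"))

def handler(event, context):
    """Triggered by an S3 PUT notification on the raw landing bucket."""
    seen = load_manifest()
    copied = 0
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        if key in seen:
            continue  # this file was already ingested once: skip it
        s3.copy_object(
            Bucket=CURATED_BUCKET,
            Key=CURATED_PREFIX + key.split("/")[-1],
            CopySource={"Bucket": bucket, "Key": key},
        )
        seen.add(key)
        copied += 1
    save_manifest(seen)
    return {"copied": copied}
```

This deduplicates at the file level, which matches the “record of processed files” idea; if you also need row-level deduplication, a Glue/Spark job could drop duplicate rows on the key column before writing to the curated prefix.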

Hope this gives some idea.

Cheers,
Deep