Duplicate rows when loading data in S3

sakshisl · March 21, 2024, 9:51am

Hi All,
I have a CSV data file in my S3 bucket. I have connected it as a QuickSight dataset source using a manifest file. The problem is that every time I refresh the dataset, it contains double the number of rows that are in my S3 file. This is affecting my sums and other counts.
Any help? Do I need to change anything in my manifest JSON below?

{
  "fileLocations": [
    {
      "URIs": [
        "s3://bucket/Controllership_SLA_output.csv"
      ]
    },
    {
      "URIPrefixes": [
        "s3://bucket/"
      ]
    }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ",",
    "containsHeader": "true"
  }
}

Koushik_Muthanna · March 22, 2024, 8:38am

@sakshisl ,

Based on the above, you have just one file, which is loaded into SPICE and used in your dashboard.
Check the number of rows after the refresh completes. Could you post a screenshot showing what you mean by double the number of rows?

Kind regards,
Koushik

sakshisl · March 22, 2024, 10:17am

I only have 4 rows in my S3 file, but the row count doubles when I load it.

Koushik_Muthanna · March 22, 2024, 11:13am

@sakshisl ,
Do you see the duplicated rows in the QuickSight analysis as well?

duncan · April 30, 2024, 10:23pm

Hello @sakshisl and @Koushik_Muthanna!

@sakshisl, were you able to find a solution for this issue? If you still need help, could you follow up on @Koushik_Muthanna's questions above?

sakshisl · May 2, 2024, 8:11am

Hi, no, it is not solved yet. I am seeing duplicates in my analysis, even when I do a count.
I have to use a distinct count in these cases.
My S3 file has the correct records, but when I connect it to QuickSight each record is duplicated and the output is doubled.

duncan · May 2, 2024, 2:27pm

Hello @sakshisl !

Have you tried the suggested solutions in this post?

sakshisl · May 7, 2024, 9:17am

This is not useful for me

Koushik_Muthanna · May 7, 2024, 12:03pm

@sakshisl ,

Can you:
1. Post a screenshot of your S3 bucket? Is there only 1 file with 4 records, and do you see 8 records when it is finally ingested into SPICE?
2. Post a screenshot of the data prep (we should be seeing only 4 records).

I would like to validate the above 2 before asking you to log a support ticket for further assistance.

Kind regards,
Koushik

sakshisl · May 7, 2024, 1:29pm

Input/ingested file

When loaded in QuickSight, the count goes to more than double.

Koushik_Muthanna · May 7, 2024, 2:41pm

@sakshisl ,

Please remove the URIPrefixes entry from the manifest file and test the ingestion again. Let us know if that solves the issue. If not, I would recommend opening a support ticket.
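
For reference, here is a sketch of the manifest from the first post with the URIPrefixes entry removed. The likely cause of the doubling (an assumption based on the manifest shown, not something confirmed in this thread) is that the prefix s3://bucket/ also matches Controllership_SLA_output.csv, so the same file is picked up once through URIs and again through URIPrefixes. The bucket and file names are the ones from the original post.

```json
{
  "fileLocations": [
    {
      "URIs": [
        "s3://bucket/Controllership_SLA_output.csv"
      ]
    }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ",",
    "containsHeader": "true"
  }
}
```

If the goal were instead to load every file under a prefix, keeping only the URIPrefixes entry and dropping URIs should avoid the overlap in the same way.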

Kind regards,
Koushik

sakshisl · May 8, 2024, 4:32am

Hi Koushik,
That worked for me. Thanks a lot.
