Data from S3 source shows a duplicate row that differs only in the date format of one column

I have checked the files in the S3 path for duplicate rows and found only a single row for each serial number. However, once I use the source and get to the analysis, there appear to be two rows that differ only in the format of the value in the date column, as shown below.


The data type of this column is string. I have tried the approach given in "Duplicate rows when loading data in S3", with no luck. Could someone please suggest a solution for this issue?

Hello @p.ry,

Are all your rows duplicated or only this one?

@andres007 It seems that rows with dates like 01/01/2025, 20/02/2025, etc. are being duplicated as rows with dates like 1/1/2025 and 20/2/2025.
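One way to confirm that the source files really contain only one spelling of each date is to normalize the zero padding and look for serial numbers that appear with more than one raw date string. The following is a minimal sketch, not a confirmed fix: the column names `serial` and `date`, the sample values, and the helper names `normalize_date` and `find_format_duplicates` are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def normalize_date(value):
    """Strip zero padding from a D/M/YYYY string, e.g. '01/01/2025' -> '1/1/2025'."""
    parts = value.strip().split("/")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return value.strip()  # leave anything unexpected untouched
    return "/".join(str(int(p)) for p in parts)


def find_format_duplicates(csv_texts):
    """Return {(serial, normalized date): [raw dates]} for keys seen with
    more than one raw spelling of the same date."""
    seen = defaultdict(set)
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["serial"], normalize_date(row["date"]))
            seen[key].add(row["date"].strip())
    return {key: sorted(raws) for key, raws in seen.items() if len(raws) > 1}


# Hypothetical file contents, standing in for the objects under the S3 path.
file_a = "serial,date\nA1,01/01/2025\nA2,20/02/2025\n"
file_b = "serial,date\nA1,1/1/2025\nA3,20/2/2025\n"

print(find_format_duplicates([file_a, file_b]))
```

If this reports nothing for the real files, the second spelling is probably not coming from the files themselves, which would point at how the data is loaded or typed rather than at the data.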

Thanks, I will try to reproduce this behaviour. Can you tell me which file format you are using?

Thanks @andres007, I'm using CSV files.

Hi,

I tried reproducing this behaviour but it does not look like it is related to the dates.

I am using a manifest with a URI prefix and 2 files containing this data:

serial,date
'123456789012,01/01/2025
'234567890123,20/02/2025

serial,date
'34567890123456,01/01/2025
'56789012334567,20/02/2025

Is there anything else you can tell us about the data in the files or how you are loading them in the manifest file?
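For comparison, a manifest that loads by URI prefix could be generated as in the sketch below. It assumes the Amazon QuickSight S3 manifest layout (`fileLocations`, `URIPrefixes`, `globalUploadSettings`); the bucket name, the prefix, and the helper `build_manifest` are hypothetical placeholders, not values from this thread.

```python
import json


def build_manifest(uri_prefixes, contains_header=True):
    """Build a manifest dict that loads every CSV file under the given prefixes."""
    return {
        "fileLocations": [{"URIPrefixes": list(uri_prefixes)}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            # The manifest expects the flag as a string, not a JSON boolean.
            "containsHeader": "true" if contains_header else "false",
        },
    }


# Hypothetical bucket and prefix, for illustration only.
manifest = build_manifest(["s3://example-bucket/example-prefix/"])
print(json.dumps(manifest, indent=2))
```

With a prefix, every file under that location is picked up, so it may be worth checking whether an extra file with differently formatted dates sits under the same prefix.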

Hi @p.ry,
It’s been a while since we last heard from you. Did you have any additional questions regarding your initial post, or were you able to find a workaround?

If we do not hear back within the next 3 business days, I’ll close out this topic.

Thank you!

Hi @p.ry,
Since we have not heard back, I’ll go ahead and close out this topic. However, if you have any additional questions, feel free to create a new post in the community and link this discussion for relevant information if needed.

Thank you!