Data Tolerance Exception

Hi all,
I have a problem ingesting a .csv file from S3 into QuickSight. In short, I produce two files by running some fairly complex queries in Redshift. The two resulting tables are then unloaded to S3 and finally imported into QS.
In Redshift, for both files, I added a new field by joining an extra table to the rest. Both tables are produced successfully in Redshift (containing the new field) and are then unloaded to S3. However, when I attempt to ingest the files into QS, only one succeeds. For the other .csv I get a “Data Tolerance Exception” with a message that some rows have a different number of fields! I find this strange because the UNLOAD to S3 command I use is the same for both files:

      unload ('select * from public.{REPORT_NAME}')
      to '{report_dir}'
      iam_role '{IAM_ROLE}'
      header parallel off ALLOWOVERWRITE DELIMITER ',' ADDQUOTES ESCAPE;

I suspect that the new field I added contains a character that splits it into more fields when encountered. That said, this is unexpected, since in both reports the field contains very similar text values and I add quotes in my UNLOAD command.
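One way I could check for this (just a sketch; public.my_report and new_field are placeholders for my actual report table and the added field) is to count rows where the new value contains an embedded line break, which could split a row even when the value is quoted:

      -- Placeholder names: public.my_report / new_field
      select count(*)
      from public.my_report
      where position(chr(10) in new_field) > 0   -- line feed
         or position(chr(13) in new_field) > 0;  -- carriage return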
Does anyone have any idea what could be going wrong, and why one file is ingested successfully while the other fails?
Thx

[Screenshot of the error, 2023-11-03]

Hello @Fotis_flex !

I know this might be tedious, but I would try removing fields from your select statement to find which field is causing the problem. Then I would check for things like commas that might be tricking the parser into thinking there should be another field.
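For example (a sketch only; the column names below are placeholders), you could unload an explicit column list instead of select * and drop one candidate column at a time until the ingestion succeeds:

      unload ('select field_a, field_b, new_field from public.{REPORT_NAME}')
      to '{report_dir}'
      iam_role '{IAM_ROLE}'
      header parallel off ALLOWOVERWRITE DELIMITER ',' ADDQUOTES ESCAPE;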

Hello @Fotis_flex ! Are you still having trouble with this? If my comment above helped, could you mark it as a solution to help the community?

Hi @duncan . I am afraid I still have not solved this. The queries are quite complex, and so far my attempts to remove fields still fail. My main concern is that the query successfully produces the report in Redshift. In Redshift, everything seems to be in place and the new field is displayed correctly. The import to QS is what is causing the problem, which is why I have run out of ideas. I would expect that if the Redshift table is fine, QS should have no trouble importing it.

Hey @Fotis_flex !

Could you use a SQL query against Redshift directly rather than going through S3?

Hey @Fotis_flex !

Were you able to try using a SQL query against Redshift directly rather than S3?

No, I haven’t tried that yet, as it’s essential to have the tables as .csv files (we share them with customers). I will try it out of curiosity just to see how it reacts.


Hey @Fotis_flex !

Were you able to run the test to see if using SQL worked as you expected?

Hi @duncan , thanks for the tip. Yes, importing the dataset from Redshift into QS directly is successful and everything looks as it should. I guess then that something is wrong with my UNLOAD command? I find this strange, since I specify ADDQUOTES and ESCAPE, which should be enough to correctly delimit the text values and special characters of the problematic field and not split it, right? As a reminder, adding this field to the other dataset did not create any problems! Any ideas?
Thx
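For what it’s worth, one variant I could also try (a sketch only, reusing the same placeholders as above; I haven’t confirmed QS parses it any differently) is Redshift’s CSV format option for UNLOAD, which quotes and escapes values in standard CSV style instead of ADDQUOTES/ESCAPE:

      unload ('select * from public.{REPORT_NAME}')
      to '{report_dir}'
      iam_role '{IAM_ROLE}'
      csv header parallel off ALLOWOVERWRITE;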

To present the problem better, I will provide screenshots of the records in Redshift: one record that is ingested successfully in QS, and another that fails to be ingested and appears in the “error report” I download from QS after the dataset import failure. Please note that in QS I have also included some calculated fields, which is why you see them in the error report and not in Redshift. As you can see from the error report, for the record that fails to be ingested, the “aoc_name” field does not appear at all; instead, its value is shifted into the calculated field “bookingYear”. So, in the end, the records that are ingested successfully do include the “aoc_name” field and the failing ones do not, producing the DATA_TOLERANCE_EXCEPTION error. I just cannot understand why this happens, since the records look very similar. The 1st pic shows the successfully imported record, the 2nd shows the failed one, and the 3rd shows the problem in the QS error report.
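To cross-check against the error report, a query along these lines (a sketch; public.my_report is a placeholder for the actual report table) could list the aoc_name values that contain characters the UNLOAD has to escape, such as double quotes or backslashes:

      -- Placeholder table name: public.my_report
      select aoc_name
      from public.my_report
      where position(chr(34) in aoc_name) > 0   -- double quote
         or position(chr(92) in aoc_name) > 0;  -- backslash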



Hey @Fotis_flex !

Does the ingestion work if you remove the calculated fields from the dataset? Those can be added at the analysis level once your dataset is ingested.

If that doesn’t work and your client needs the CSV, you can always pull all the fields into a table on QuickSight and export that as a CSV.

Also, you can create a support ticket with AWS Support if the problem persists. Here are the steps to open a support case. If your company has someone who manages your AWS account, you might not have direct access to AWS Support and will need to raise an internal ticket to your IT team or whoever manages your AWS account. They should be able to open an AWS Support case on your behalf.

Hi @duncan . Thank you for your time on this. I can’t remove the calculated fields, since we share the datasets with our customers so they can do their own analysis as well, and those fields are required. On top of that, the CSVs are updated daily via a cron job, so manually exporting them from QS would not do. I think my problem has to be taken up with support! Thanks a lot though :slight_smile:


No problem! @Fotis_flex , if you do create a support ticket, could you share any insights you get from them with the community?