Hi all,
I have a problem ingesting a .csv file from S3 into QuickSight. In short, I produce two files from some fairly complex queries in Redshift. The two resulting tables are then unloaded to S3 and finally imported into QS.
In Redshift, for both files, I added a new field by joining an extra table to the rest. Both tables are produced successfully in Redshift (containing the new field) and then unloaded to S3. However, when I attempt to ingest the files into QS, only one succeeds. For the other .csv I get a “Data Tolerance Exception” with a message saying that some rows have a different number of fields! I find this strange because the UNLOAD-to-S3 command I use is the same for both files:
UNLOAD ('select * from public.{REPORT_NAME}')
TO '{report_dir}'
IAM_ROLE '{IAM_ROLE}'
HEADER PARALLEL OFF ALLOWOVERWRITE DELIMITER ',' ADDQUOTES ESCAPE;
I suspect that the new field I added contains a character that splits it into more fields when encountered. This is unexpected, though, since the field contains very similar text values in both reports, and I have added quotes in my UNLOAD command.
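To test this suspicion, I plan to scan the new field for characters that could confuse the split. A minimal sketch of the check, where public.my_report and new_field are stand-ins for my actual table and column names:

SELECT COUNT(*)
FROM public.my_report                      -- stand-in for the failing report table
WHERE CHARINDEX(CHR(44), new_field) > 0   -- embedded comma
   OR CHARINDEX(CHR(34), new_field) > 0   -- embedded double quote
   OR CHARINDEX(CHR(92), new_field) > 0;  -- embedded backslash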
Does anyone have any idea what could be going wrong, and why one of the files is ingested successfully while the other fails?
Thx
I know this might be tedious, but I would try removing fields from your select statement to find which field is causing the problem. Then I would check for things like commas that might be tricking the parser into thinking there should be another field.
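For example, something along these lines, unloading a subset of the columns at a time and re-ingesting until the failure disappears (field_1 through field_3 are just placeholders for your actual columns):

UNLOAD ('select field_1, field_2, field_3 from public.{REPORT_NAME}')
TO '{report_dir}_debug'
IAM_ROLE '{IAM_ROLE}'
HEADER PARALLEL OFF ALLOWOVERWRITE DELIMITER ',' ADDQUOTES ESCAPE;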
Hi @duncan. I am afraid I still have not solved it. The queries are quite complex, and so far my attempts to remove fields still fail. My main concern is that the query successfully produces the report in Redshift; everything seems to be in place there and the new field is displayed correctly. The import into QS is what is causing the problem, which is why I have run out of ideas. I would expect that if the Redshift table is OK, QS should be fine importing it.
No, I haven’t tried that yet, as it’s essential to have the tables as .csv’s (we share them with customers). I will try it out of curiosity, though, just to see how it reacts.
Hi @duncan, thanks for the tip. Yes, importing the dataset from Redshift into QS directly is successful and everything looks as it should. I guess then something is wrong with my UNLOAD command? I find this strange, since I specify ADDQUOTES and ESCAPE, which should be enough to correctly quote the text values, escape any special characters in the problematic field, and keep it from being split, right? I also remind you that adding this field to the other dataset did not create any problems! Any ideas?
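One thing I’m considering trying, in case QS doesn’t understand Redshift’s backslash-style ESCAPE output: switching the UNLOAD to FORMAT AS CSV, which (as far as I can tell from the docs) produces standard CSV quoting on its own and can’t be combined with ADDQUOTES or ESCAPE. Roughly:

UNLOAD ('select * from public.{REPORT_NAME}')
TO '{report_dir}'
IAM_ROLE '{IAM_ROLE}'
FORMAT AS CSV
HEADER PARALLEL OFF ALLOWOVERWRITE;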
Thx
To present the problem better, I will provide two screenshots of the records in Redshift: one that is ingested successfully into QS, and another that fails to be ingested and is included in the “error report” I download from QS after the dataset import failure. Please note that in QS I have also included some calculated fields, which is why you see them in the error report and not in Redshift.

As you can see from the error report, in the record that fails to be ingested the “aoc_name” field does not appear at all; instead, its value is shifted into the calculated field “bookingYear”. So, in the end, the records that are ingested successfully do include the “aoc_name” field while the failing ones do not, giving the DATA_TOLERANCE_EXCEPTION error. I just cannot understand why this happens, since the records seem very similar. The 1st pic shows the successfully imported record, the 2nd shows the failed one, and the 3rd presents the problem from the QS error report.
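Since the value seems to shift into the next field, my current guess is a line break hiding inside one of the text values, which would split the record into two short rows. A quick check I intend to run column by column, where suspect_field stands in for each text column in turn:

SELECT *
FROM public.my_report                        -- stand-in for the failing report table
WHERE CHARINDEX(CHR(10), suspect_field) > 0  -- embedded line feed
   OR CHARINDEX(CHR(13), suspect_field) > 0; -- embedded carriage return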
Does the ingestion work if you remove the calculated fields from the dataset? Those can be added at the analysis level once your dataset is ingested.
If that doesn’t work and your client needs the CSVs, you can always pull all the fields into a table in QuickSight and export that as CSV.
Also, you can create a support ticket with AWS Support if the problem persists. Here are the steps to open a support case. If your company has someone who manages your AWS account, you might not have direct access to AWS Support and will need to raise an internal ticket with your IT team or whoever manages your AWS account. They should be able to open an AWS Support case on your behalf.
Hi @duncan. Thank you for your time on this. I can’t remove the calculated fields, since we share the datasets with our customers for their own analysis and the fields are required. On top of that, the CSVs are updated daily via a cron job, so manually exporting them from QS would not do. I think my problem needs to go to support! Thanks a lot though.