Blank values in numeric columns are causing S3 dataset to fail

cactus · December 7, 2023, 8:11pm

I can connect succesfully to a series of .csvs in S3. When I publish the data, it fails to refresh because too many rows have a blank value where the other rows have an integer value. Is there a way to fix this without using other tools, just in Quicksight?

awsvig · December 7, 2023, 8:21pm

Hi @cactus ,

Are you having trouble preparing an analysis from your dataset or publishing your dataset? If the former, did you try using data formatting options on the fields that have blank values? For example, you can pre-fill the empty values with “0” or another value that aligns for your usecase.

cactus · December 7, 2023, 8:35pm

The trouble is in publishing the dataset. I get failures because the errors exceed 10K (where QS isn’t recognizing blank as 0 for numeric datasets)

Where is the appropriate place to do that data preparation, can it be done in QS itself? The dataset is 1000 .csvs in S3.

awsvig · December 7, 2023, 8:45pm

If you have a large workload that has a variety of data, then we recommend that you use Amazon EMR or AWS Glue for your data preparation and cleaning tasks. Amazon EMR and AWS Glue both work with unstructured, semi-structured, and relational data, and both can use Apache Spark to create a DataFrame or DynamicFrame to work with horizontal processing. Moreover, you can use AWS Glue DataBrew to clean and process data with a no-code approach. Additionally, DataBrew can profile your dataset with column statistics, provide data lineages, and include data quality rules for all or specified columns.

Reference: Data preparation and cleaning - AWS Prescriptive Guidance

Topic		Replies	Views
Formatting blank columns from S3 Q&A data-source , feature-request	7	375	May 11, 2023
Missing rows when unload datasets from ETL to S3 Q&A data-source , analysis , s3	3	637	October 26, 2022
DataSet change in Quicksight Q&A data-source , accessibility , spice , database , data-preparation	2	949	December 22, 2021
Display NULL for empty strings Q&A redshift , feature-request , Business-Intelligence-Engineer	7	1758	December 21, 2023
Duplicate rows when loading data in S3 Q&A developer , SDK	13	428	May 8, 2024

Blank values in numeric columns are causing S3 dataset to fail

Related topics