Blank values in numeric columns are causing S3 dataset to fail

I can connect succesfully to a series of .csvs in S3. When I publish the data, it fails to refresh because too many rows have a blank value where the other rows have an integer value. Is there a way to fix this without using other tools, just in Quicksight?

Hi @cactus ,

Are you having trouble preparing an analysis from your dataset or publishing your dataset? If the former, did you try using data formatting options on the fields that have blank values? For example, you can pre-fill the empty values with “0” or another value that aligns for your usecase.

The trouble is in publishing the dataset. I get failures because the errors exceed 10K (where QS isn’t recognizing blank as 0 for numeric datasets)

Where is the appropriate place to do that data preparation, can it be done in QS itself? The dataset is 1000 .csvs in S3.

If you have a large workload that has a variety of data, then we recommend that you use Amazon EMR or AWS Glue for your data preparation and cleaning tasks. Amazon EMR and AWS Glue both work with unstructured, semi-structured, and relational data, and both can use Apache Spark to create a DataFrame or DynamicFrame to work with horizontal processing. Moreover, you can use AWS Glue DataBrew to clean and process data with a no-code approach. Additionally, DataBrew can profile your dataset with column statistics, provide data lineages, and include data quality rules for all or specified columns.

Reference: Data preparation and cleaning - AWS Prescriptive Guidance