How to Handle Empty Strings When Casting String to TIMESTAMP in Quick Sight from Athena?

I’m working on transferring data from Athena to Quick Sight, and I’ve encountered an issue with transforming a deleted_at column from a STRING in Athena to a TIMESTAMP in Quick Sight.

Here’s some context:

  • In my Athena table, the deleted_at column is of type STRING, and some rows have empty strings ("") instead of NULL.
  • In Quick Sight, I want to cast this column to TIMESTAMP using the format yyyy-MM-dd HH:mm:ss

To achieve this, I’m using the following transform operation in my Quick Sight dataset definition:

        var createdAtColumn = InputColumn.builder().name("created_at").type(InputColumnDataType.DATETIME).build();
        var idColumn = InputColumn.builder().name("id").type(InputColumnDataType.STRING).build();
        var deletedAtColumn = InputColumn.builder().name("deleted_at").type(InputColumnDataType.STRING).build();
        
        var relationalTable = RelationalTable.builder()
                .dataSourceArn(dataSourceArn)
                .catalog("AwsDataCatalog")
                .schema(DATABASE_NAME)
                .name(dataSetName)
                .inputColumns(createdAtColumn, idColumn, deletedAtColumn)
                .build();
				
	   var deletedAtOperation = TransformOperation.builder()
                .castColumnTypeOperation(CastColumnTypeOperation.builder()
                        .columnName("deleted_at")
                        .newColumnType(ColumnDataType.DATETIME)
                        .format("yyyy-MM-dd HH:mm:ss")
                        .build())
                .build();

        var physicalTable = PhysicalTable.builder()
                .relationalTable(relationalTable)
                .build();

        var physicalTableId = UUID.randomUUID().toString();
        var physicalTableMap = new HashMap<String, PhysicalTable>();
        physicalTableMap.put(physicalTableId, physicalTable);

        var logicalTable = LogicalTable.builder()
                .alias(dataSetName)
                .dataTransforms(deletedAtOperation)
                .source(LogicalTableSource.builder().physicalTableId(physicalTableId).build())
                .build();
				
	var logicalTableId = UUID.randomUUID().toString();
        var logicalTableMap = new HashMap<String, LogicalTable>();
        logicalTableMap.put(logicalTableId, logicalTable);
		
		        var createDataSetRequest = CreateDataSetRequest.builder()
                .awsAccountId(awsAccountId)
                .name(dataSetName)
                .dataSetId(UUID.randomUUID().toString())
                .importMode(DataSetImportMode.SPICE)
                .physicalTableMap(physicalTableMap)
                .logicalTableMap(logicalTableMap)
                .folderArns(folderArn)
                .permissions(resourcePermission)
                .build();

        var response = quickSightClient.createDataSet(createDataSetRequest);

However, when processing the dataset, I get the following error in Athena:

INVALID_FUNCTION_ARGUMENT: Invalid format: ""

This error occurs because the Athena query generated by Quick Sight tries to cast the empty string to a TIMESTAMP, which is invalid. For example:

SELECT "created_at", "id", CAST(PARSE_DATETIME("deleted_at", 'yyyy-MM-dd HH:mm:ss') AS TIMESTAMP) AS "deleted_at" 
FROM "AwsDataCatalog"."quicksight"."users";

What is the best way to handle empty strings in the deleted_at column when transferring data from Athena to Quick Sight and casting it to a TIMESTAMP?

In Athena, the deleted_at column is a STRING where some values are empty strings (""). I want to transform this column to a TIMESTAMP in Quick Sight during dataset creation. However, empty strings cause an error when Quick Sight generates the query.

How can I handle empty strings in the deleted_at column in Quick Sight so the transformation to TIMESTAMP works correctly?

Are there any specific techniques or configurations in Quick Sight that allow:

  1. Replacing empty strings ("") with NULL during the data transformation?
  2. Adding logic to skip invalid rows or apply a default value?

Hello @maxim, after looking through the Java SDK documentation, my initial thought is to bring the deleted_at field into Quick Sight as a string. Then you could utilize the parseDate function inside of an ifelse statement to convert the values that are not empty strings into a date else return NULL. Athena can be a bit fickle with date values as well, so handling this in Quick Sight may yield the best results.

Alternatively, you could try utilizing the try_cast conversion function instead. This is supposed to return NULLs where the cast function fails.

One of those options should yield the desired result, but let me know if you have any further questions. Thank you!