I’m working on transferring data from Athena to Quick Sight, and I’ve encountered an issue with transforming a deleted_at column from a STRING in Athena to a TIMESTAMP in Quick Sight.
Here’s some context:
- In my Athena table, the
deleted_atcolumn is of typeSTRING, and some rows have empty strings ("") instead ofNULL. - In Quick Sight, I want to cast this column to
TIMESTAMPusing the formatyyyy-MM-dd HH:mm:ss
To achieve this, I’m using the following transform operation in my Quick Sight dataset definition:
var createdAtColumn = InputColumn.builder().name("created_at").type(InputColumnDataType.DATETIME).build();
var idColumn = InputColumn.builder().name("id").type(InputColumnDataType.STRING).build();
var deletedAtColumn = InputColumn.builder().name("deleted_at").type(InputColumnDataType.STRING).build();
var relationalTable = RelationalTable.builder()
.dataSourceArn(dataSourceArn)
.catalog("AwsDataCatalog")
.schema(DATABASE_NAME)
.name(dataSetName)
.inputColumns(createdAtColumn, idColumn, deletedAtColumn)
.build();
var deletedAtOperation = TransformOperation.builder()
.castColumnTypeOperation(CastColumnTypeOperation.builder()
.columnName("deleted_at")
.newColumnType(ColumnDataType.DATETIME)
.format("yyyy-MM-dd HH:mm:ss")
.build())
.build();
var physicalTable = PhysicalTable.builder()
.relationalTable(relationalTable)
.build();
var physicalTableId = UUID.randomUUID().toString();
var physicalTableMap = new HashMap<String, PhysicalTable>();
physicalTableMap.put(physicalTableId, physicalTable);
var logicalTable = LogicalTable.builder()
.alias(dataSetName)
.dataTransforms(deletedAtOperation)
.source(LogicalTableSource.builder().physicalTableId(physicalTableId).build())
.build();
var logicalTableId = UUID.randomUUID().toString();
var logicalTableMap = new HashMap<String, LogicalTable>();
logicalTableMap.put(logicalTableId, logicalTable);
var createDataSetRequest = CreateDataSetRequest.builder()
.awsAccountId(awsAccountId)
.name(dataSetName)
.dataSetId(UUID.randomUUID().toString())
.importMode(DataSetImportMode.SPICE)
.physicalTableMap(physicalTableMap)
.logicalTableMap(logicalTableMap)
.folderArns(folderArn)
.permissions(resourcePermission)
.build();
var response = quickSightClient.createDataSet(createDataSetRequest);
However, when processing the dataset, I get the following error in Athena:
INVALID_FUNCTION_ARGUMENT: Invalid format: ""
This error occurs because the Athena query generated by Quick Sight tries to cast the empty string to a TIMESTAMP, which is invalid. For example:
SELECT "created_at", "id", CAST(PARSE_DATETIME("deleted_at", 'yyyy-MM-dd HH:mm:ss') AS TIMESTAMP) AS "deleted_at"
FROM "AwsDataCatalog"."quicksight"."users";
What is the best way to handle empty strings in the deleted_at column when transferring data from Athena to Quick Sight and casting it to a TIMESTAMP?
In Athena, the deleted_at column is a STRING where some values are empty strings (""). I want to transform this column to a TIMESTAMP in Quick Sight during dataset creation. However, empty strings cause an error when Quick Sight generates the query.
How can I handle empty strings in the deleted_at column in Quick Sight so the transformation to TIMESTAMP works correctly?
Are there any specific techniques or configurations in Quick Sight that allow:
- Replacing empty strings (
"") withNULLduring the data transformation? - Adding logic to skip invalid rows or apply a default value?