Hi,
at our company we manage our data sets defined by custom SQL and Athena by Terraform. I noticed that although the column names and data types are automatically inferred from the query when you use the UI to add a custom SQL dataset, this is not the case for CLI or TF dataset creation, even when the API spec says they are optional.
From the API Spec:
Columns
The column schema from the SQL query result set. Type: Array of InputColumn objects Array Members: Minimum number of 1 item. Maximum number of 2048 items. Required: No
When I create a dummy data set with e.g. this CLI command:
aws quicksight create-data-set \
--aws-account-id <redacted_account_id> \
--data-set-id custom-sql-test \
--name "custom-sql-test" \
--import-mode SPICE \
--physical-table-map '{
"custom-sql-test": {
"CustomSql": {
"DataSourceArn": "arn:aws:quicksight:eu-west-1:<redacted_account_id>:datasource/<redacted_datasource_id>",
"Name": "custom-sql-test",
"SqlQuery": "SELECT \n1 as example_integer_column, \n0.1 as example_float_column, \n'some_text' as example_string_column"
}
}
}
It fails with the error:
An error occurred (InvalidParameterValueException) when calling the CreateDataSet operation: InputColumns is required
While specifying the columns in the command works:
aws quicksight create-data-set \
--aws-account-id <redacted_account_id> \
--data-set-id custom-sql-test \
--name "custom-sql-test" \
--import-mode SPICE \
--physical-table-map '{
"custom-sql-test": {
"CustomSql": {
"DataSourceArn": "arn:aws:quicksight:eu-west-1:<redacted_account_id>:datasource/<redacted_datasource_id>",
"Name": "custom-sql-test",
"SqlQuery": "SELECT \n1 as example_integer_column, \n0.1 as example_float_column, \n'some_text' as example_string_column",
"Columns": [
{"Name": "example_integer_column", "Type": "INTEGER"},
{"Name": "example_float_column", "Type": "DECIMAL"},
{"Name": "example_string_column", "Type": "STRING"}
]
}
}
}'
I would think the CLI and TF data set creation should also be able to infer the columns automatically, just like the UI. However this does not seems to be the case. This causes quite some overhead in our management of our datasets. Is there a way around this requirement of specifying the columns manually?
Kind regards
Nelis