Custom SQL Data set creation: CLI/TF requires column definition while API doc states 'optional'

Nelis · October 29, 2024, 1:43pm

Hi,
at our company we manage our data sets defined by custom SQL and Athena by Terraform. I noticed that although the column names and data types are automatically inferred from the query when you use the UI to add a custom SQL dataset, this is not the case for CLI or TF dataset creation, even when the API spec says they are optional.
From the API Spec:

Columns

The column schema from the SQL query result set.

Type: Array of InputColumn objects

Array Members: Minimum number of 1 item. Maximum number of 2048 items.

Required: No

When I create a dummy data set with e.g. this CLI command:

aws quicksight create-data-set \
--aws-account-id <redacted_account_id> \
--data-set-id custom-sql-test \
--name "custom-sql-test" \
--import-mode SPICE \
--physical-table-map '{
"custom-sql-test": {
"CustomSql": {
"DataSourceArn": "arn:aws:quicksight:eu-west-1:<redacted_account_id>:datasource/<redacted_datasource_id>",
"Name": "custom-sql-test",
"SqlQuery": "SELECT \n1 as example_integer_column, \n0.1 as example_float_column, \n'some_text' as example_string_column"
}
}
}

It fails with the error:
An error occurred (InvalidParameterValueException) when calling the CreateDataSet operation: InputColumns is required
While specifying the columns in the command works:

aws quicksight create-data-set \
--aws-account-id <redacted_account_id> \
--data-set-id custom-sql-test \
--name "custom-sql-test" \
--import-mode SPICE \
--physical-table-map '{
"custom-sql-test": {
"CustomSql": {
"DataSourceArn": "arn:aws:quicksight:eu-west-1:<redacted_account_id>:datasource/<redacted_datasource_id>",
"Name": "custom-sql-test",
"SqlQuery": "SELECT \n1 as example_integer_column, \n0.1 as example_float_column, \n'some_text' as example_string_column",
"Columns": [
{"Name": "example_integer_column", "Type": "INTEGER"},
{"Name": "example_float_column", "Type": "DECIMAL"},
{"Name": "example_string_column", "Type": "STRING"}
]
}
}
}'

I would think the CLI and TF data set creation should also be able to infer the columns automatically, just like the UI. However this does not seems to be the case. This causes quite some overhead in our management of our datasets. Is there a way around this requirement of specifying the columns manually?

Kind regards
Nelis

duncan · November 1, 2024, 2:46pm

Hello @Nelis !

This is an issue that I have seen pop up before with QuickSight ([Bug]: Quicksight dataset custom_sql requires columns definition · Issue #33273 · hashicorp/terraform-provider-aws · GitHub) , but I wasn’t able to find a straightforward workaround/fix.

What you could try to do is use a lambda to query the table in Athena as a json file to get the column details. Then create the dataset from that json file or try to pull the details and add them to your current process.

Nelis · November 4, 2024, 8:35am

Hi Duncan,

Thank you for your response. That is also the direction I am thinking of. To work with Terraform resources I will implement an external data source to fetch the column names and types. Determining the output columns and types of a query however, requires running the query fully so you are billed for the query cost every time you run terraform apply. Therefore I think I will add the custom SQL first a veiw to Athena and fetch the column names and types from the schema (which should be free).

Kind regards
Nelis