New custom SQL SPICE import failures--recent code update?

Was there a recent update to how SPICE imports work, perhaps when used in conjunction with custom SQL? I’m seeing an odd bug involving column ordering, like so:

Given custom SQL that looks like this:

SELECT
    SOME_VARCHAR_COLUMN_A,
    SOME_DATETIME_COLUMN_B,
    SOME_INTEGER_COLUMN_C
FROM MY_TABLE

that returns rows like this:

SOME_VARCHAR_COLUMN_A | SOME_DATETIME_COLUMN_B | SOME_INTEGER_COLUMN_C
======================+========================+======================
abcdefg               | 1659355922000          | 1234
Some other string     | 1659432273000          | 192384719

And given a dataset where I had previously renamed SOME_VARCHAR_COLUMN_A to “Column A”, I am now getting errors on SPICE refresh that say the following:

Error threshold exceeded for dataset (10000) (maxErrors = 10000)

SKIPPED ROWS
10001 rows where SOME_DATETIME_COLUMN_B field date values were not in a supported date format.

The error file from the refresh dialog looks something like this:

ERROR_TYPE     | COLUMN_NAME            | SOME_DATETIME_COLUMN_B | SOME_INTEGER_COLUMN_C | Column A
===============+========================+========================+=======================+=========
MALFORMED_DATE | SOME_DATETIME_COLUMN_B | abcdefg                | 1659355922000         | 1234
MALFORMED_DATE | SOME_DATETIME_COLUMN_B | Some other string      | 1659432273000         | 192384719

So, it appears that the query result is still serving the fields to SPICE in the original order (A, B, C), but the renamed column has been pushed to the end of the parsing order (B, C, A). That means every value gets parsed against the wrong column, which is why the varchar values end up under SOME_DATETIME_COLUMN_B and trip the date parser.
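
In theory the dataset-level rename could be avoided entirely by aliasing the column in the custom SQL instead, though I haven’t confirmed that this dodges the bug (identifier quoting varies by engine):

SELECT
    SOME_VARCHAR_COLUMN_A AS "Column A",  -- rename in the SQL, not in data prep
    SOME_DATETIME_COLUMN_B,
    SOME_INTEGER_COLUMN_C
FROM MY_TABLE

If the refresh succeeds with the alias in the SQL and no rename transform in the dataset, that would point squarely at the RenameColumnOperation.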

As a workaround, I changed my renamed columns back to their original names, but I’m still getting SPICE ingest errors. (I’ve even confirmed via the CLI with aws quicksight describe-data-set --aws-account-id my-acct-number --data-set-id my-dataset-id that the RenameColumnOperation blocks are gone from the DataTransforms section in the LogicalTableMap).
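
For reference, that check looks something like this (the account and dataset IDs are placeholders, and the --query filter is just a convenience to pull out any remaining rename transforms):

aws quicksight describe-data-set \
    --aws-account-id 123456789012 \
    --data-set-id my-dataset-id \
    --query 'DataSet.LogicalTableMap.*.DataTransforms[] | [?RenameColumnOperation]'

An empty result ([]) means no RenameColumnOperation blocks remain anywhere in the LogicalTableMap.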

I’ve just dealt with this myself recently as well. Do you have any calculated columns in your dataset? After a long-standing dataset started falling over, I started debugging a couple of days ago and found that any calculated column doing any form of division produced the error you’re describing. Have you tested whether the query by itself, with no additional transformations, still fails 100% of the time? The error message was a bit of a red herring for me before I retraced my steps.

The query by itself runs just fine, and the SPICE import succeeds when nothing is renamed and there are no custom fields (I haven’t tried all the permutations, though).

(But yes, the impacted dataset does include at least one calculated field involving division.)

As a first step, then, I would recommend deleting the fields that involve division (saving the formulas elsewhere) just to see if that sorts the issue. I ended up solving the problem by adding the calculated fields in the analysis stage instead of the data preparation stage. It’s annoying, because it means those calculated fields won’t be available to all of the analyses that use the dataset, and you’ll have to re-create the same calculations multiple times. However, having a dataset that isn’t broken means you can actually use it, so there’s that.
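
If deleting them does confirm the division is the trigger, another angle I haven’t tested against this particular bug is pushing the division into the custom SQL itself, so SPICE never sees a data-prep calculated field at all (column names below are made up):

SELECT
    NUMERATOR_COLUMN,
    DENOMINATOR_COLUMN,
    -- hypothetical: compute the ratio in the query, with a divide-by-zero guard
    CASE
        WHEN DENOMINATOR_COLUMN = 0 THEN NULL
        ELSE NUMERATOR_COLUMN * 1.0 / DENOMINATOR_COLUMN
    END AS MY_RATIO
FROM MY_TABLE

Multiplying by 1.0 also forces a decimal result, which sidesteps int/int truncation along the way.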

Hi Don, as mentioned in our 1:1 message, we will work with you offline to figure out the calculation issue. Thanks for reaching out; I hope we can solve that problem soon.

Hi @nate.estrada, can you share more about the division issue you mentioned? It sounds like a different issue from Don’s, but related to our recent findings w.r.t. the numeric and double data types. What is your division formula? Is it int/int, int/numeric, etc.?
cc @yubozhao

@emilyzhu I posted about it here, but TLDR it is as follows:

  • Dataset that was automatically updating with no issues for a long time got stuck in a refresh
  • Duplicating dataset reproduced error
  • Copy/pasting SQL worked fine
  • Error being thrown referenced a datetime column being in the wrong format
  • After individually removing calculated fields, found that any field doing division of any kind (none of mine involved a datetime column, curiously) would cause the entire dataset to fail to update and caused all dashboards built on it to stop working

I have tested this on other non-production datasets and reproduced this error by creating a calculated field in the data prep stage with 1/2 (or any math you like) as the formula. My calculated columns were int/int.

Thanks Nate. I am passing the issue to the engineering team. We will try to find the root cause and will reach out to you if more information is needed :slight_smile:
