Calculated field working on preview, but not in analysis

caio.almeida · February 7, 2023, 10:08pm

Hello!

Recently, we’ve stumbled upon the following issue: we created a particular calculated field when defining the dataset, which shows up fine on the data preview and seems to be working properly. However, when you import the dataset into an analysis, the field doesn’t work anymore. We confirmed it by comparing the behaviour on some rows which were right during data prep.

Now, being more specific, the table structure relevant to the question is something like

| col_1 |   col_2   | calc_field |
|:-----:|:---------:|:----------:|
|   A   | 1.0       |          1 |
|   A   | 3.0       |          3 |
|   B   | some text |            |
|   A   | 8.0       |          8 |
|   B   | some text |            |

where both col_1 and col_2 are stored with string data types, and the calc_field represents our numerical (and troublesome) calculated field, defined as follows

ifelse({col_1}='A', parseInt({col_2}), null)

The sample table above is essentially what we see during preview in data prep, but, when attaching the dataset in an analysis, a quick table visual to check those fields yield the following picture,

| col_1 |   col_2   | calc_field |
|:-----:|:---------:|:----------:|
|   A   | 1.0       |       null |
|   A   | 3.0       |       null |
|   B   | some text |       null |
|   A   | 8.0       |       null |
|   B   | some text |       null |

In summary, that’s it.

We’ve tried searching similar questions here in the community, however, the underlying problems in all of those didn’t seem to be applicable in our case.

One thing we’d also like to mention is that we have a dataset coming from a relational database source (MySQL) that follows the exact same logic and everything works fine, both in data preparation and inside an analysis. This particular issue showed up in a dataset coming from an Athena source. SPICE query mode was used in all scenarios.

Thanks,
C.

Max · February 7, 2023, 10:15pm

You can’t have mismatched datatypes in columns.

They either need to all be int’s or all be strings in this case.

caio.almeida · February 7, 2023, 10:27pm

Hi, @Max. (:

Thanks for the response. Unfortunately, there is no mismatch. Both col_1 and col_2 have strings only. The idea was to parse some of those values into integers, by some specific rule that we know it is always true: every entry on col_2 is a string that can be parsed into a numerical data type when the corresponding value on col_1 is equal to 'A'.

It is also worth noting that if we create the calculated field inside the analysis, it works as expected. The problem arises when creating it at the dataset definition.

Max · February 7, 2023, 10:38pm

The calculated field has different datatypes right?

You can’t do that

caio.almeida · February 7, 2023, 11:06pm

Maybe I’m missing your point, but… How come? If the condition statement is true, then it parses our string in the second column into an integer. Otherwise, it will return a NULL entry, and not a string containing the word ‘null’.

If the problem was due to different datatypes that remain after the processing of the calculated field, it would raise an error when trying to save the calculation, which I just tested. However, in our original issue, no error is raised by the system.

The problem is an inconsistency between what is seen from calc_field at the data preview and from the same field when the dataset is imported into an analysis. I also reiterate: if I copy the exact same expression of calc_field, as it was defined in data prep, but now create the calculated field from inside the analysis, it works fine.

Max · February 7, 2023, 11:33pm

Hmm, maybe I miss read this.

That’s interesting it works find in the analysis.

I can mark this as a bug and see if someone can get to the root cause.

For now, can you use it in the analysis?

caio.almeida · February 8, 2023, 1:35pm

We certainly could do that, but it’s not ideal, since the dataset will be used in several analysis. Right now, we are going to work around the problem with a custom SQL query, as the processing will (at least should) be naturally enforced.

In any case, it would be nice to report the original issue as a bug, since it could happen again for other people.

Anyway, thanks for the discussion. (: