Custom sql - Select all key-value data from a field with string data type

Hi, I have a sample test string value below. The string is patterned like an array of hashes or dictionaries. I want to get the name values with type type02.

Expected output: id = “test03, test04, test05”

Sample test string:

data = "[{'testdata': 'has', 'name': 'test01', 'type': 'type01'}, {'testdata': 'does not have', 'name': 'test02', 'type': 'type03'}, {'testdata': 'has', 'id': 'test03', 'type': 'type02'}, {'testdata': 'has', 'id': 'test04', 'type': 'type02'}, {'testdata': 'has', 'id': 'test05', 'type': 'type02'}]"

At first I used calculated field but I can only retrieve the first type02 info in the string.

ifelse(
    contains(split(data, ',' , 2),'type02'),
    replace(replace(split(data, ',' , 2), "'id':", ''), "'", ''),
    NULL
)

output: id = “test03”

Since the value datatype is a string and there are a few ids that I need, I am trying to use custom SQL and create a new table field.
In Athena, I trimmed the data value then tried to use json_extract_scalar() to get the key-value data inside but was unsuccessful.

trim('[]' from data) as trimmed_data
json_extract_scalar(replace(replace(trimmed_data, '''', '"'), ': False', ': false'),'$.id') AS id

Do you have any suggestions on how to parse this type of data? Thank you.

Hi,

Question, Is the number of items in the data dictionary fixed or can it vary per row?

Hello @andres007 , thank you for checking. The number of items are not fixed. And yes, it varies per row. Sometimes some keys like the id or type is not existing. Thanks

Hi,

If the number of elems and items is not known I have no ideas to try out, maybe someone else can help you here. If they were fixed you could just try nesting ifelse with splits.

1 Like

thanks for your help, @andres007 !

I’m still getting the same issue. Thank you.