Text Extract Code Help

andyhob · September 17, 2024, 4:28pm

substring(um.notes, charindex(‘#$#’, um.notes) + 4, charindex(‘$#$’, um.notes) - charindex(‘#$#’, um.notes))

The above code pulls the bolded information from the below entry.

Documentation Delay - ]0;3 (For Data mining purpose only) #$# Waited 64 minutes for paperwork. $#$ (JAH).

I am only trying to pull the information bolded text below.

Documentation Delay - ]0;3 (For Data mining purpose only) #$# Waited 64 minutes for paperwork. $#$ (JAH).

I have tried this to try and cut the extraction by 3 character places.

substring(um.notes, charindex(‘#$#’, um.notes) + 3, charindex(‘$#$.’, um.notes) - charindex(‘#$#’, um.notes) - 3)

However, this causes the error below:

DylanM · September 17, 2024, 4:55pm

Hello @andyhob, welcome to the QuickSight community!

I think the 3rd part of your substring function is just barely off. If you try changing it slightly and add some parenthesis for order of operations, you should be able to resolve it:

substring(um.notes, charindex(‘#$#’, um.notes) + 4, (charindex(‘$#$’, um.notes) - 1) - (charindex(‘#$#’, um.notes) + 4))

Please let me know if that helps!

andyhob · September 17, 2024, 5:26pm

I tried your suggestion with the below in bold being displayed:

Documentation Delay - ]0;3 (For Data mining purpose only) #$# Waited 64 minutes for paperwork. $#$ (JAH).

DylanM · September 17, 2024, 5:55pm

Hello @andyhob, what database are you querying data from? I am looking at some SQL Server documentation for the charindex function you are using, and it looks like we may be able to just use the starting position parameter to try and fix this. Maybe try this instead of my previous suggestion:

substring(um.notes, charindex(‘#$#’, um.notes, 4), (charindex(‘$#$’, um.notes, 1) - 1) - charindex(‘#$#’, um.notes, 4))

It seems like we need to specific to start at the $ for the $#$ portion. I think this should give us a better result. If you are still having issues can you run each charindex function separately to see what values are being returned?

Provide me with the full string length, charindex(‘$#$’, um.notes, 1), and charindex(‘#$#’, um.notes, 4). Then maybe we can find an alternative solution if this doesn’t work.

andyhob · September 18, 2024, 12:34pm

Your first suggestion caused the same error as before.

When I tried charindex(‘$#$’, notes,1) it generated the error below.

DylanM · September 18, 2024, 2:56pm

Hello @andyhob, I just noticed in your error code it mentions Redshift, so identifying the starting position is not an option with that version of charindex().

In that case, we need to piece the function together to try and get the expected result. It also seems like the starting point of the substring isn’t our issue, but rather the number of characters in the string. I went ahead and counted and we need to find a way for this to return the number 32.

If you just run 2 separate calculations:

charindex(‘#$#’, um.notes) + 4
charindex(‘$#$’, um.notes) - 1
len(um.notes)

What values are you getting?

andyhob · September 18, 2024, 3:18pm

These are the values I got:

351
383
393

DylanM · September 18, 2024, 3:22pm

Hello @andyhob, okay, well that seems to be working as expected. So then if you run this:

(charindex(‘$#$’, um.notes) - 1) - (charindex(‘#$#’, um.notes) + 4)

Are you not getting the result of 32?

andyhob · September 18, 2024, 3:33pm

I am getting 32 when I run this.

DylanM · September 18, 2024, 6:14pm

Hello @andyhob, if that is the case, does this work or is it still throwing an error?

substring(um.notes, charindex(‘#$#’, um.notes, 4), (charindex(‘$#$’, um.notes) - 1) - (charindex(‘#$#’, um.notes) + 4))

If the last part is returning 32, this should work unless we are hitting some weird Redshift limitation. Let me know what happens.

andyhob · September 18, 2024, 6:34pm

New error with that one.

DylanM · September 18, 2024, 9:54pm

Hello @andyhob, sorry, I used to the wrong version of the first charindex function. Try this instead:

substring(um.notes, charindex(‘#$#’, um.notes) + 4, (charindex(‘$#$’, um.notes) - 1) - (charindex(‘#$#’, um.notes) + 4))

That should resolve the issue from the last error.

andyhob · September 19, 2024, 2:12pm

Hi @DylanM, I tried the last recommendation and received this error. It seems we’re getting a different variation of the same error each time. When we run the substring all together is coming up with a negative length, but running the individual pieces of the substring, they are showing the correct character length.

DylanM · September 19, 2024, 2:51pm

Hello @andyhob, I know in the example returned value, you mentioned this:

Documentation Delay - ]0;3 (For Data mining purpose only) #$# Waited 64 minutes for paperwork. $#$ (JAH).

I am assuming, this is not the only value that is being returned from the um.notes field, am I correct? If that is the case, is there a scenario where the value isn’t formatted with the #$# before the $#$ in the string? If those values are ever switched, then this substring calculation would not work. That is the only thing I can think of as an issue here because all of our testing shows the current format should return the expected value.

Something else you can try is building this function as a calculated field rather than in SQL to see if there is a similar error. That would look like this:

substring({notes}, locate({notes}, '#$#')+4, (locate({notes}, ‘$#$’) - 1) - (locate({notes}, ‘#$#’) + 4))

I am curious if the result from the QuickSight calculated field will be any different than the one from Redshift directly.

andyhob · September 19, 2024, 5:45pm

Ok, so it seems to work as a calculated field. I can see it pulling the correct information in the QS preview pain. However, when going down this path the refresh fails, and the dataset will not refresh with this calculated field.

DylanM · September 19, 2024, 5:56pm

Hey @andyhob, what if you build it on the analysis instead of on the dataset directly? It seems like we are hitting a Redshift limitation, so maybe we can bypass it that way.

andyhob · September 19, 2024, 6:13pm

Well that worked!!! Thanks for all your help with this!