Text Extract Code Help

substring(um.notes, charindex(‘#$#’, um.notes) + 4, charindex(‘$#$’, um.notes) - charindex(‘#$#’, um.notes))

The above code pulls the bolded information from the below entry.

Documentation Delay - ]0;3 (For Data mining purpose only) #$# Waited 64 minutes for paperwork. $#$ (JAH).

I am only trying to pull the information bolded text below.

Documentation Delay - ]0;3 (For Data mining purpose only) #$# Waited 64 minutes for paperwork. $#$ (JAH).

I have tried this to try and cut the extraction by 3 character places.

substring(um.notes, charindex(‘#$#’, um.notes) + 3, charindex(‘$#$.’, um.notes) - charindex(‘#$#’, um.notes) - 3)

However, this causes the error below:

image

2 Likes

Hello @andyhob, welcome to the QuickSight community!

I think the 3rd part of your substring function is just barely off. If you try changing it slightly and add some parenthesis for order of operations, you should be able to resolve it:

substring(um.notes, charindex(‘#$#’, um.notes) + 4, (charindex(‘$#$’, um.notes) - 1) - (charindex(‘#$#’, um.notes) + 4))

Please let me know if that helps!

I tried your suggestion with the below in bold being displayed:

Documentation Delay - ]0;3 (For Data mining purpose only) #$# Waited 64 minutes for paperwork. $#$ (JAH).

2 Likes

Hello @andyhob, what database are you querying data from? I am looking at some SQL Server documentation for the charindex function you are using, and it looks like we may be able to just use the starting position parameter to try and fix this. Maybe try this instead of my previous suggestion:

substring(um.notes, charindex(‘#$#’, um.notes, 4), (charindex(‘$#$’, um.notes, 1) - 1) - charindex(‘#$#’, um.notes, 4))

It seems like we need to specific to start at the $ for the $#$ portion. I think this should give us a better result. If you are still having issues can you run each charindex function separately to see what values are being returned?

Provide me with the full string length, charindex(‘$#$’, um.notes, 1), and charindex(‘#$#’, um.notes, 4). Then maybe we can find an alternative solution if this doesn’t work.

Your first suggestion caused the same error as before.

When I tried charindex(‘$#$’, notes,1) it generated the error below.

image

Hello @andyhob, I just noticed in your error code it mentions Redshift, so identifying the starting position is not an option with that version of charindex().

In that case, we need to piece the function together to try and get the expected result. It also seems like the starting point of the substring isn’t our issue, but rather the number of characters in the string. I went ahead and counted and we need to find a way for this to return the number 32.

If you just run 2 separate calculations:

  1. charindex(‘#$#’, um.notes) + 4
  2. charindex(‘$#$’, um.notes) - 1
  3. len(um.notes)

What values are you getting?

1 Like

These are the values I got:

  1. 351
  2. 383
  3. 393
1 Like

Hello @andyhob, okay, well that seems to be working as expected. So then if you run this:

(charindex(‘$#$’, um.notes) - 1) - (charindex(‘#$#’, um.notes) + 4)

Are you not getting the result of 32?

I am getting 32 when I run this.

1 Like

Hello @andyhob, if that is the case, does this work or is it still throwing an error?

substring(um.notes, charindex(‘#$#’, um.notes, 4), (charindex(‘$#$’, um.notes) - 1) - (charindex(‘#$#’, um.notes) + 4))

If the last part is returning 32, this should work unless we are hitting some weird Redshift limitation. Let me know what happens.

1 Like

New error with that one.

image

Hello @andyhob, sorry, I used to the wrong version of the first charindex function. Try this instead:

substring(um.notes, charindex(‘#$#’, um.notes) + 4, (charindex(‘$#$’, um.notes) - 1) - (charindex(‘#$#’, um.notes) + 4))

That should resolve the issue from the last error.

Hi @DylanM, I tried the last recommendation and received this error. It seems we’re getting a different variation of the same error each time. When we run the substring all together is coming up with a negative length, but running the individual pieces of the substring, they are showing the correct character length.

image

Hello @andyhob, I know in the example returned value, you mentioned this:

Documentation Delay - ]0;3 (For Data mining purpose only) #$# Waited 64 minutes for paperwork. $#$ (JAH).

I am assuming, this is not the only value that is being returned from the um.notes field, am I correct? If that is the case, is there a scenario where the value isn’t formatted with the #$# before the $#$ in the string? If those values are ever switched, then this substring calculation would not work. That is the only thing I can think of as an issue here because all of our testing shows the current format should return the expected value.

Something else you can try is building this function as a calculated field rather than in SQL to see if there is a similar error. That would look like this:

substring({notes}, locate({notes}, '#$#')+4, (locate({notes}, ‘$#$’) - 1) - (locate({notes}, ‘#$#’) + 4))

I am curious if the result from the QuickSight calculated field will be any different than the one from Redshift directly.

Ok, so it seems to work as a calculated field. I can see it pulling the correct information in the QS preview pain. However, when going down this path the refresh fails, and the dataset will not refresh with this calculated field.

Hey @andyhob, what if you build it on the analysis instead of on the dataset directly? It seems like we are hitting a Redshift limitation, so maybe we can bypass it that way.

1 Like

Well that worked!!! Thanks for all your help with this!

2 Likes