Dataset with comma separated values in one row

debanjana · November 4, 2023, 5:32am

I am trying to join two datasets based on a unique key (SPMS ID). For some of the records the value in the unique key has a comma (,). In the aggregated table by revenue, it is not aggregating the revenue corresponding to the record with the comma (,). How can we achieve that?

Eg. Datadog 3294, 3294 is not getting accounted for.

Screenshot 2023-11-03 at 2.36.06 PM

Thanks!

ErikG · November 4, 2023, 6:20am

Hi @debanjana
but your join is working for these rows?
What data type do you use for the id column?
BR

debanjana · November 4, 2023, 5:54pm

Hi @ErikG The join is working for the rows that do not have comma. Eg. It is populating the revenue against 3294, but it is not populating the revenue against 3294, 3294. My raw file is structured as below.

Name ID. Rev
Datadog 3294 100
Datadog 3294, 3294. 50

I need to show in results

Name Rev
Datadog 150

Today it is showing in results

Name Rev
Datadog 100

The ID is in string datatype since it has comma (,) in its values.

ErikG · November 4, 2023, 8:21pm

Hi @debanjana ,
are you always interested in the first ID (3294, 3294)?
Then you could create a new ID field by cutting the string at the comma.
BR

Sanjeeb2022 · November 5, 2023, 2:59am

Hi @debanjana - Welcome to AWS QuickSight community and thanks for posting the question. Couple of observation on the file -

If the file is comma separated, I believe the record 3294,3294 should be double quoted embedded and that is why when you have imported to QuickSight, it is considered as one field.
Looking at the such sample record, it is the same record after comma(,). if you are OK to take one value, then @ErikG suggestion is good and you can use split function for those records contain a comma(,) for SPMS ID and take only the first record and ignore the second record and possibly do a cast to int as well. After that the join should work.
If you do not want to ignore and make this as 2 records, possibly you need put some intelligence spilt the same row into multiple row. This may be a tricky part and need some detail analysis.

Regards - Sanjeeb

debanjana · November 5, 2023, 5:00am

Hi @Sanjeeb2022 and @ErikG Thanks for your responses. The SPMS ID field is string so it does consider in the dataset both the records before joining but after joining I dont get the revenue attributed to the record with the comma. The raw dataset has different revenues associated with “3294” record and “3294, 3294” record. So I dont want to ignore the record(s) associated with “3294, 3294”. Is there any office hour where I can join and someone can help me real time? Thanks!

Sanjeeb2022 · November 5, 2023, 6:00am

Hi @debanjana - Thanks. If you are looking for a dedicate time for AWS support, please raise a ticket to AWS Customer support so that they can guide you. To raise the request, please follow the link - Creating support cases and case management - AWS Support

Regards - Sanjeeb

ErikG · November 5, 2023, 6:32am