I am trying to join two datasets based on a unique key (SPMS ID). For some of the records the value in the unique key has a comma (,). In the aggregated table by revenue, it is not aggregating the revenue corresponding to the record with the comma (,). How can we achieve that?
Eg. Datadog 3294, 3294 is not getting accounted for.
but your join is working for these rows?
What data type do you use for the id column?
Hi @ErikG The join is working for the rows that do not have comma. Eg. It is populating the revenue against 3294, but it is not populating the revenue against 3294, 3294. My raw file is structured as below.
Name ID. Rev
Datadog 3294 100
Datadog 3294, 3294. 50
I need to show in results
Today it is showing in results
The ID is in string datatype since it has comma (,) in its values.
Hi @debanjana ,
are you always interested in the first ID (3294, 3294)?
Then you could create a new ID field by cutting the string at the comma.
Hi @debanjana - Welcome to AWS QuickSight community and thanks for posting the question. Couple of observation on the file -
- If the file is comma separated, I believe the record 3294,3294 should be double quoted embedded and that is why when you have imported to QuickSight, it is considered as one field.
- Looking at the such sample record, it is the same record after comma(,). if you are OK to take one value, then @ErikG suggestion is good and you can use split function for those records contain a comma(,) for SPMS ID and take only the first record and ignore the second record and possibly do a cast to int as well. After that the join should work.
- If you do not want to ignore and make this as 2 records, possibly you need put some intelligence spilt the same row into multiple row. This may be a tricky part and need some detail analysis.
Regards - Sanjeeb
Hi @Sanjeeb2022 and @ErikG Thanks for your responses. The SPMS ID field is string so it does consider in the dataset both the records before joining but after joining I dont get the revenue attributed to the record with the comma. The raw dataset has different revenues associated with “3294” record and “3294, 3294” record. So I dont want to ignore the record(s) associated with “3294, 3294”. Is there any office hour where I can join and someone can help me real time? Thanks!
Hi @debanjana - Thanks. If you are looking for a dedicate time for AWS support, please raise a ticket to AWS Customer support so that they can guide you. To raise the request, please follow the link - Creating support cases and case management - AWS Support
Regards - Sanjeeb
my suggestion is not to ignore the record it is to cut the string that you will get a field to join.
Name | SPMS ID | Revenue | ID
Datadog | 3294 | 100 | 3294
Datadog | 3294, 3294 | 50 | 3294
The join can be done by the ID column but you will keep the SPMS ID if you need.