Turn off UUID in sql statements so that Performance insights can aggregate

eweb · August 8, 2024, 9:13am

Quicksight adds a comment with a unique identifier to sql queries. e.g.
/* QuickSight 01456d7b-af4b-4e6f-8fed-30cd97968bdd */ which means that Performance insights fails to aggregate. Is there a way to turn theses comments off?

jayeshrd · August 8, 2024, 10:38am

Hello @eweb, the unique identifiers are added as comments to the query text by design so it cannot be turned off. Can you please elaborate on what performance insights do you refer to?

eweb · August 8, 2024, 10:59am

AWS RDS performance insights. It groups similar queries and gives averages. But if all queries are distinct then it is harder to see where performance problems lie. e.g. if I am loading 6 datasets I’d like to see 6 aggregates. But what I see is each statement separately, as below

jayeshrd · August 8, 2024, 2:12pm

You can see aggregated metrics like IngestionInvocationCount, IngestionLatency and IngestionRowCount per data-set in your CloudWatch. For more information see monitoring data in Amazon QuickSight. Let me know if this what you are looking for.

eweb · August 8, 2024, 2:31pm

No it isn’t. We are looking to utilize aws rds performance insights. The unique identifiers in the queries breaks that.

robdhondt · August 8, 2024, 6:26pm

Hi @eweb -

No way to turn this off in QuickSight.

Do you have api / programmatic access? You could use boto3 and python to do this.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/pi.html

Example:

import boto3
import re
from datetime import datetime, timedelta

# Function to strip UUID from QuickSight SQL comments
def strip_uuid_from_query(sql):
    return re.sub(r'/\* QuickSight [0-9a-fA-F-]+ \*/', '', sql)

# Initialize a session using Amazon RDS
client = boto3.client('pi')

# Define your parameters
db_identifier = 'your-db-identifier'
start_time = datetime.utcnow() - timedelta(hours=1)
end_time = datetime.utcnow()
period_in_seconds = 60

# Define the metric query
metric_query = {
    "Metric": "db.sql_tokenized.avg",
    "GroupBy": {
        "Group": "db.sql_tokenized.query",
        "Limit": 10
    }
}

# Get the performance insights data
response = client.get_resource_metrics(
    ServiceType='RDS',
    Identifier=db_identifier,
    StartTime=start_time,
    EndTime=end_time,
    PeriodInSeconds=period_in_seconds,
    MetricQueries=[metric_query]
)

# Process the response
aggregated_data = {}
for metric in response['MetricList']:
    for datapoint in metric['DataPoints']:
        query = strip_uuid_from_query(metric['Key']['Dimensions']['db.sql_tokenized.query'])
        timestamp = datapoint['Timestamp']
        value = datapoint['Value']

        if query not in aggregated_data:
            aggregated_data[query] = []
        aggregated_data[query].append((timestamp, value))

# Print the aggregated data
for query, data_points in aggregated_data.items():
    print(f"Query: {query}")
    for timestamp, value in data_points:
        print(f"  {timestamp}: {value}")

Brett · August 20, 2024, 7:13pm

Hi @eweb,
It’s been awhile since we last heard from you. Were you able to try Rob’s last suggestion or find a different work around for your solution? If not, did you have any additional questions?

If we do not hear back within the next 3 business days, I’ll go ahead and close out this topic.

Thank you!

eweb · August 21, 2024, 9:00am

We use Performance insights to monitor and troubleshooting, the work around won’t work for us. But thanks for the suggestion. It would be nice if your approach was compatible with other AWS products. Feel free to close, but perhaps something for your team to consider in the future.

Brett · August 23, 2024, 3:54pm

Thanks for the response and the suggestion! I’ll mark this as a feature request for make visible to our support team!