Building a Python App for QuickSight SPICE Capacity

Use Case and Problem
We have plans to create distinct Quicksight Analysis templates for each of our clients, with each client having at least four “out of the box” analyses that come with the onboarding package of our service product, with additional analyses potentially getting added in the future. Every analysis template will have the same number of datasets, but some have over 10, with varying size levels of SPICE capacity. Being able to roll up the SPICE capacity at the analysis level enables us to better forecast SPICE costs based off the number of new client onboarding in our pipeline. To do this roll up job, I developed a Python script using the Boto3 library that retrieves SPICE capacity details for each dataset in a QuickSight analysis. In this blog post, I will guide you through the process of building this script and understanding its functionality.

Prerequisites
• AWS account credentials (for role/user that has permissions to run QuickSight APIs).
• Python installed on your computer.
• Boto3 Python library installed on your computer.

Step by step solution
To get started, we need to import the boto3 library, which allows us to interact with AWS services. We also define some variables, including an empty list for dataset IDs, an empty list for storing the SPICE capacity data, and a variable to track the total SPICE capacity.

Next, we create a QuickSight client using boto3.client('quicksight', region). This client will enable us to communicate with the QuickSight service.

We prompt the user to enter the AWS account id, analysis id and region. Once this info is entered and confirmed by user, we call the describe_analysis() method to retrieve the analysis details.

After obtaining the analysis details, we extract the dataset IDs from the response and store them in the dataset_ids list.

Now comes the crucial part. We iterate through each dataset ID and use the describe_data_set() method to retrieve the dataset’s SPICE capacity information. We extract the dataset name, SPICE capacity in bytes, and perform calculations to convert it to both kilobytes (KB) and megabytes (MB). We store this information in the data list for further processing.

During the iteration, if any errors occur while retrieving the dataset information, we gracefully handle them by appending an error message to the data list.

To provide additional insights, we also calculate the total SPICE capacity by summing up the capacity of all the datasets. This information is stored in the total_SPICE_capacity_bytes variable and displayed as total SPICE capacity in megabytes (MB).

Finally, we print the dataset names, their corresponding SPICE capacity in kilobytes (KB), and the total SPICE capacity in megabytes (MB).

Conclusion
In this article, we explored how to retrieve SPICE capacity information for datasets in a QuickSight analysis using Python and the Boto3 library. Try using the below script to get SPICE usage of datasets linked with an analysis and tweak as needed to meet your specific use cases.

Sample Code

import boto3
dataset_ids = []
data = []
error_data = []
total_spice_capacity_bytes = 0

analysisArnAvailable = False

#Collect user input and get confirmation
while not(analysisArnAvailable):
    account_id = input('Please enter your AWS account number - ')
    analysis_id = input('Please enter your analysis id - ')
    region = input('Please enter region where this analysis exists (eg us-east-1) - ')
    print('----------------------------------------------------')
    print ('You entered \n AWS account id : '+str(account_id)+'\n analysis id : '+str(analysis_id)+'\n region : '+str(region))
    userResponse = input('Please confirm if this is correct [yes/no] - ')
    if userResponse.upper() == 'YES':
        analysisArnAvailable = True

print('----------------------------------------------------')
print('Please wait..')
print('----------------------------------------------------')
# Create a QuickSight client
client = boto3.client('quicksight', region)

# Call describe_analysis() to retrieve the analysis details
try:
    response = client.describe_analysis(
        AnalysisId=analysis_id,
        AwsAccountId=account_id
    )
    analysis_name = response['Analysis']['Name']

    # Extract the dataset IDs from the response
    dataset_ids = [d.split('/')[-1] for d in response['Analysis']['DataSetArns']]

    # Use the retrieved dataset IDs to get the corresponding Spice capacity and name
    for dataset_id in dataset_ids:
        try:
            response = client.describe_data_set(
                AwsAccountId=account_id,
                DataSetId=dataset_id
            )
            dataset_name = response['DataSet']['Name']
            spice_capacity_bytes = response['DataSet'].get('ConsumedSpiceCapacityInBytes', 'N/A')  # add a check for empty value
            import_mode = response['DataSet'].get('ImportMode')
            spice_capacity_mb = '{:.2f} MB'.format(float(spice_capacity_bytes) / (1024 * 1024)) if spice_capacity_bytes != 'N/A' else spice_capacity_bytes
            spice_capacity_kb = '{:.2f} KB'.format(float(spice_capacity_bytes) / 1024) if spice_capacity_bytes != 'N/A' else spice_capacity_bytes

            data.append((dataset_name[:40], dataset_id, import_mode, spice_capacity_kb))
            
            if spice_capacity_bytes != 'N/A':
                total_spice_capacity_bytes += int(spice_capacity_bytes)
        except Exception as e:
            error_data.append((dataset_id,  str(e)))
    total_spice_capacity_mb = '{:.2f} MB'.format(float(total_spice_capacity_bytes) / (1024 * 1024))

    if data != []:
        print('SPICE Usage')
        print('----------------------------------------------------')
        print('| {:<40} | {:<50} | {:<12} | {:>25} |'.format('Dataset name','Dataset id', 'Query mode', 'SPICE capacity in KB'))
        for row in data:
            print('| {:<40} | {:<50} | {:<12} | {:>25} |'.format(*row))
        print('----------------------------------------------------')

    if error_data != []:
        print('Errors encountered')
        print('----------------------------------------------------')
        print('| {:<50} | {:<100} |'.format('Dataset id', 'Error text'))
        for row in error_data:
            print('| {:<50} | {:<100} |'.format(row[0],row[1]))
        print('----------------------------------------------------')    

    print(f"Total SPICE capacity: {total_spice_capacity_mb}")
    print('----------------------------------------------------')

except TypeError as e:
    print('Caught TypeError:', e)

Author Bio

Company Bio

Acxiom partners with the world’s leading brands to create customer intelligence, enabling data-driven marketing experiences that generate value for people and for brands. The experts in identity, the ethical use of data, cloud-first customer data management, and analytics solutions, Acxiom makes the complex marketing ecosystem work, applying customer intelligence wherever brands and customers meet. For more than 50 years, Acxiom has improved clients’ customer acquisition, growth, and retention. Visit Acxiom.com to learn more.

6 Likes

Wow … This is great and congrats @Ben_Berliner

Regards - Sanjeeb

2 Likes

Thanks, @Sanjeeb2022!

1 Like

cool solution @Ben_Berliner !

Thanks!

2 Likes

This solution works fine for normal datasets. But, it skips the preparedDataset type of datasets and do not add up the spice capacity that is being consumed and the capacity printed would be less than the actual capacity and can be misleading.