Programmatically obtaining the SPICE capacity consumed by each dataset in Amazon QuickSight

This article provides a python code which utilizes boto3 Application Programming Interfaces (API’s) to programmatically obtain the list of Amazon QuickSight datasets, calculates the SPICE capacity consumed by each dataset and the total SPICE capacity consumed by all the datasets.

Our customers often reach out to us seeking guidance in finding the consumed SPICE capacity for all of the datasets present in their account with the breakdown of SPICE capacity consumed by each individual dataset. Currently, Amazon QuickSight does not have a feature to view the breakdown of SPICE capacity consumed by each dataset. This is a difficult task to be performed in the console as it may require manual effort to open every dataset and then find the SPICE capacity consumed by the dataset. In order to make this task easier, I have developed a python code which provides the list of all Amazon QuickSight datasets, calculates the SPICE capacity consumed by each dataset and the total SPICE capacity consumed by all the datasets.

Please see below:

import boto3
import pandas as pd
from botocore.exceptions import ClientError

# Specify your AWS credentials and region
aws_access_key_id = 'enter_aws_access_key_id'
aws_secret_access_key = 'enter_aws_secret_access_key'
aws_account_id = 'enter_aws_account_id'
aws_region = 'enter_aws_region'

# Initialize Boto3 client for QuickSight
client = boto3.client('quicksight', region_name=aws_region)

def list_datasets():
    print("Retrieving list of datasets...")
    datasets = []
    paginator = client.get_paginator('list_data_sets')
    try:
        for page in paginator.paginate(AwsAccountId=aws_account_id):
            datasets.extend(page['DataSetSummaries'])
    except ClientError as e:
        print("Error listing datasets:", e)
    print(f"Found {len(datasets)} datasets")
    return datasets

def get_dataset_spice_capacity(dataset_arn):
    try:
        response = client.describe_data_set(
            AwsAccountId=aws_account_id,
            DataSetId=dataset_arn
        )
        spice_capacity = response['DataSet']['ConsumedSpiceCapacityInBytes']
        return spice_capacity
    except ClientError as e:
        return None

def get_datasets_spice_capacity():
    datasets = list_datasets()
    data = []
    total_spice_capacity = 0
    total_datasets = len(datasets)
    
    print("\nProcessing dataset details...")
    for index, dataset in enumerate(datasets, 1):
        dataset_id = dataset['DataSetId']
        dataset_name = dataset['Name']
        dataset_arn = dataset['Arn']
        
        # Progress indicator
        if index % 5 == 0 or index == total_datasets:  # Show status every 5 datasets or at the end
            print(f"Processing dataset {index} of {total_datasets}")
        else:
            print("---", end="", flush=True)
            
        spice_capacity = get_dataset_spice_capacity(dataset_id)
        if spice_capacity is not None:
            total_spice_capacity += spice_capacity
            spice_capacity_gb = spice_capacity / (1024**3)
            data.append({
                'Dataset Name': dataset_name,
                'Dataset ARN': dataset_arn,
                'Spice Capacity Consumed (Bytes)': spice_capacity,
                'Spice Capacity Consumed (GB)': spice_capacity_gb
            })

    # Calculate total SPICE capacity in GB
    total_spice_capacity_gb = total_spice_capacity / (1024**3)
    
    # Create a DataFrame and add total row
    df = pd.DataFrame(data)
    df.loc['Total'] = ['Total', '', total_spice_capacity, total_spice_capacity_gb]
    
    # Export to Excel
    df.to_excel('QuickSight_Spice_Capacity.xlsx', index=False)
    print("\nData on SPICE capacity breakdown exported to QuickSight_Spice_Capacity.xlsx")
    print(f"Total SPICE capacity consumed = {total_spice_capacity_gb:.2f} GB")

# Run the function

get_datasets_spice_capacity()


NOTE: The above code has package dependencies such as pandas and openpyxl.

The output will provide the list of Amazon QuickSight datasets, calculates the SPICE capacity consumed by each dataset and the total SPICE capacity consumed by all the datasets. In order to make it easy for viewing, the code will save the result as an Excel file in the local machine.

Sample output:

Retrieving list of datasets… Found 20 datasets

Processing dataset details…
------------Processing dataset 5 of 20
------------Processing dataset 10 of 20
------------Processing dataset 15 of 20
------------Processing dataset 20 of 20

Data on SPICE capacity breakdown exported to QuickSight_Spice_Capacity.xlsx Total SPICE capacity consumed = 57.73 GB

NOTE: AWS Support does not provide code support or code development support. This is an example of how API’s can be made useful for automating certain tasks. The customers can modify the code in order to get desired output.

Please feel free to reach out to me if you have any questions.


Author & Company Bio

Mahaswin Ramalingam Balaji is a Cloud Support Engineer-II at AWS and a certified Subject Matter Expert in QuickSight. Mahaswin is interested in Data visualization, Data engineering and in enhancing the features of QuickSight.

Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

1 Like