Streamline Amazon QuickSight dataset management: Dataset insights at a glance

Amazon QuickSight is a cloud-based, fully managed business intelligence (BI) service. It offers you the flexibility to create datasets through direct query mode or by importing data into SPICE (Super-fast, Parallel, In-memory Calculation Engine).

For data ingested into SPICE, you can establish automated refresh schedules, making sure datasets are updated at predetermined intervals. However, monitoring various aspects of these datasets can be time-consuming. These aspects include:

  • Refresh status (for example INITIALIZED, QUEUED, RUNNING, FAILED, COMPLETED, or CANCELLED)
  • SPICE consumption
  • Dataset usage metrics
  • Dataset and data source access permissions
  • Associated data sources along with file-based datasets
  • SPICE usage for each dataset across customer namespaces and groups for the deployed AWS Region
  • Identifying duplicate and orphaned datasets and data sources

Typically, to access this information, you need to navigate to each individual dataset and manually review its details. This process can be cumbersome, especially when managing multiple datasets across multiple Regions and QuickSight accounts.

This post offers a solution for monitoring and analyzing QuickSight datasets by providing valuable insights for efficient QuickSight usage and maintenance using a custom QuickSight dashboard. The solution provides a comprehensive view of SPICE ingestion status and type, SPICE usage metrics along with total capacity limit for the account, dataset relationships with dashboards, analyses, and linked data sources along with dataset query mode.

Solution overview

This solution uses QuickSight in conjunction with other AWS services to process and prepare data for analysis. To simplify deployment, we provide an AWS CloudFormation template that automates the provisioning of necessary resources. The solution consists of the following key components:

  • An AWS Glue job is scheduled to run at regular intervals. It calls QuickSight APIs to get the QuickSight dataset details along with permissions from a specific Region.
  • The AWS Glue job saves the results in an Amazon Simple Storage Service (Amazon S3) bucket.
  • The QuickSight dashboard is deployed based on the dataset, which fetches the data using Amazon Athena.

The following architecture diagram illustrates three main steps in its workflow:

  1. Automatically gather key metrics on QuickSight datasets, data sources, analyses, dashboards, user information, and SPICE limits from Amazon CloudWatch. AWS Glue can run daily extract, transform, and load (ETL) jobs, so decision-makers always have access to up-to-date information.
  2. Centralize this data in a dedicated S3 bucket, creating a single source of truth for QuickSight usage across the AWS account Region.
  3. Import data in SPICE and get insights into usage patterns using the SPICE analytics dashboard.

Key advantages of implementing this solution include:

  • Enhanced QuickSight insights:
    • Obtain a comprehensive view of your QuickSight usage patterns
    • Optimize SPICE resource allocation effectively
    • Analyze ingestion accuracy and performance on datasets
    • Increase productivity and efficiency of QuickSight admins
  • Data-driven decision making:
    • Inform strategic choices regarding QuickSight implementations
    • Facilitate accurate SPICE capacity planning
    • Streamline QuickSight management based on concrete usage data

Prerequisites

Before proceeding with this walkthrough, make sure you have the following:

Create resources using AWS CloudFormation

The first step is to create resources using a CloudFormation template to extract QuickSight assets and usage details along with CloudWatch QuickSight metrics. When using this template, it’s essential to specify the QuickSight identity Region. This step makes it possible to retrieve comprehensive user information for your QuickSight account. Selecting and providing the correct identity Region is a mandatory parameter and can’t be skipped.

The template creates the following resources:

  • An IAM role for AWS Glue jobs with permissions for QuickSight, Amazon S3, and CloudWatch called qs-assets-glue-role-{AWS::Region}.
  • An AWS Glue trigger scheduled to run all six AWS Glue jobs daily at 6 AM ET named QuickSightInsightsDailyTrigger.
  • An S3 bucket for storing QuickSight datasets and insights called qs-datasets-insights-{AWS::AccountId}-{AWS::Region}.
  • AWS Glue jobs to extract the metadata:
    • GlueJobDatasetsInfo: Job-ETL-qs-datasets-info – This job automates the extraction of comprehensive QuickSight dataset information in your AWS account deployed Region. It iterates through the datasets, gathering details such as import mode, SPICE capacity usage, ingestion status, and permissions. The script handles both SPICE and direct query datasets, processing various ingestion states and error conditions. Finally, it compiles this valuable information into a CSV file and uploads it to the S3 bucket qs-datasets-insights-{AWS::AccountId}-{AWS::Region}, providing a centralized view of QuickSight dataset metrics and health. The following diagram illustrates this workflow.

  • GlueJobAnalysisInfo: Job-ETL-qs-analysis-info – This job automates the extraction of QuickSight analysis information for your AWS account deployed Region. It iterates through the analyses, gathering details such as analysis name, ID, status, and associated datasets. It compiles this information into a CSV file, including dataset details like name, ID, and last updated time. Finally, it uploads the CSV to an S3 bucket, providing a comprehensive view of QuickSight analyses and their associated datasets.
  • GlueJobDashboardInfo: Job-ETL-qs-dashboard-info – This job automates the extraction of QuickSight dashboard information in your AWS account deployed Region, including dashboard details and associated datasets. It handles pagination and exceptions, compiling the data into a CSV file. The result is a comprehensive overview of QuickSight dashboards and their datasets, stored in Amazon S3 for access and analysis.
  • GlueJobDatasourceInfo: Job-ETL-qs-datasource-info – This job automates the extraction of QuickSight data source information in your AWS account deployed Region, including dataset details and their associated data sources. It handles various data source types (RelationalTable, CustomSQL, S3Source) and file-type datasets, compiling the information into a CSV file. The result is a comprehensive overview of QuickSight datasets and their data sources, stored in Amazon S3 for further analysis.
  • GlueJobSPICELimitInfo: Job-ETL-qs-SPICELimit-info – This script automates the extraction of QuickSight SPICE capacity metrics from CloudWatch, covering both capacity limit and consumed capacity over the last 1,000 days. It processes the data, handling special cases, and compiles it into a CSV file. The result provides a comprehensive historical view of SPICE capacity usage, stored in Amazon S3 for trend analysis.
  • GlueJobUsersInfo: Job-ETL-qs-users-info – This job automates the extraction of QuickSight user information across all namespaces. This is where the QuickSight deployed Region is highly important as part of the job execution. It collects detailed user data including user Amazon Resource Name (ARN), user name, email, role, identity type, active status, and namespace. The script compiles this information into a comprehensive CSV file, which is then stored in Amazon S3, providing a view of all QuickSight users in an account.

Choose Launch Stack to deploy the resources.

Make sure to provide the necessary parameters as shown in the following screenshot during deployment. You can provide the stack name as spice-usage-glue-jobs-stack.

Run AWS Glue ETL jobs

On the AWS Glue console, open each job one at a time to run them. Make sure all jobs complete successfully.

Set S3 bucket permissions on QuickSight

After successful deployment, go to Security & Permissions on the QuickSight console to enable bucket access to qs-datasets-insights-{AWS::AccountId}-{AWS::Region} and Athena access.

Create QuickSight assets using AWS CloudFormation

The next step is to create QuickSight assets required for the SPICE analytics dashboard using another CloudFormation template. This template requires you to provide a user ARN that can be given access to all assets that are deployed. You can get the QuickSight admin or user’s ARNs by running the following command in the AWS Command Line Interface (AWS CLI):

aws quicksight describe-user --aws-account-id <> --namespace default --user-name <>

The following is an example ARN: arn:aws:quicksight:us-east-1:12345678910:user/default/admin/xyz.
The CloudFormation template creates the following resources:

  • An IAM role called ManifestCreationLambdaRole
  • A Lambda function to generate Amazon S3 data source manifest files called ManifestCreationLambda; these manifest files get stored in the S3 bucket qs-datasets-insights-{AWS::AccountId}-{AWS::Region} in the Manifest folder, pointing to outputs from AWS Glue ETL jobs
  • QuickSight assets:
    • Data sources: qs-analysis-info-datasource, qs-dashboard-info-datasource, qs-datasets-info-datasource, qs-datasource-info-datasource, qs-users-info-datasource, qs-SPICECapacity-info-datasource
    • Datasets: qs-analysis-info (joining with qs-users-info), qs-dashboard-info, qs-datasets-info, qs-datasource-info, qs-SPICECapacity-info
    • Theme: SPICEUsage
    • Dashboard: Spice Analytics Dashboard

Choose Launch Stack to deploy the resources.

Provide the necessary parameters as shown in the following screenshot during deployment.

To find your QuickSight user name and account information, log in to QuickSight and choose the user icon on the upper right of the page.

You can provide the stack name as spice-usage-QS-assets-stack.

Use the dashboard

The SPICE analytics dashboard contains two sheets:

  • Dataset Insights
    • The top section provides KPIs like SPICE capacity utilization, ingestion status for refreshes, dataset types, and data source breakdown

  • The bottom section provides dataset ingestion details along with dataset access details

  • Related Assets – Provides the asset information related to any dataset like data source, analyses, dashboard, and users

You can filter the dashboard by choosing a visual or selecting the dataset values in the control.

Configure alerts

A standout feature of this solution is the ability to set up proactive alerts. By configuring an alert on the first gauge chart, which displays SPICE capacity utilization, you can stay ahead of potential issues. This powerful tool notifies you when SPICE consumption reaches a predetermined threshold, allowing you to take timely action and manage your resources more effectively. It’s a game changer for maintaining optimal performance and avoiding unexpected capacity constraints.

Automate assets and ingestion details extraction

Data in the dashboard will be pulled after 12 hours from the deployment time; refresh the controls to pull the latest data.

To keep your dashboard fresh and automated, QuickSight SPICE datasets let you schedule dataset refreshes exactly when you need them. Solution owners can seamlessly configure periodic dataset updates during deployment, making sure your SPICE usage analytics reflects the most current information.

Clean up

Complete the following steps to remove the resources deployed by this solution in the following sequence:

  1. On the AWS CloudFormation console, delete the CloudFormation stack that deployed the QuickSight assets (Spice-usage-QS-assets-Stack).

This removes the following resources:

  • The QuickSight dashboard: Spice Analytics Dashboard
  • The following six datasets:
    • qs-analysis-info
    • qs-dashboard-info
    • qs-datasets-info
    • qs-datasource-info
    • qs-users-info
    • qs-SPICECapacity-info
  • The following six Amazon S3 data sources:
    • qs-analysis-info-datasource
    • qs-dashboard-info-datasource
    • qs-datasets-info-datasource
    • qs-datasource-info-datasource
    • qs-users-info-datasource
    • qs-SPICECapacity-info-datasource
  • The QuickSight usage theme: SPICEUsage
  • The Lambda function that created manifest files for the Amazon S3 connection: ManifestCreationLambda
  • The IAM role that executed the Lambda function: ManifestCreationLambdaRole
  1. On the Amazon S3 console, empty the S3 bucket qs-datasets-insights-{AWS::AccountId}-{AWS::Region}.
  2. On the AWS CloudFormation console, delete the CloudFormation stack that created the AWS Glue ETL jobs to extract QuickSight assets information (spice-usage-glue-jobs-stack).

This removes the following resources:

  • The following six AWS Glue ETL Jobs:
    • Job-ETL-qs-datasets-info
    • Job-ETL-qs-analysis-info
    • Job-ETL-qs-dashboard-info
    • Job-ETL-qs-datasource-info
    • Job-ETL-qs-SPICELimit-info
    • Job-ETL-qs-users-info
  • The AWS Glue trigger: QuickSightInsightsDailyTrigger
  • The IAM role: qs-assets-glue-role-{AWS::AccountId}-{AWS::Region}

Conclusion

As QuickSight adoption grows, more customers need to analyze how their datasets are performing. With the provided CloudFormation template, organizations can deploy this solution, gaining a powerful tool for QuickSight administration. By offering a holistic view of QuickSight usage across Regions and accounts, this solution empowers businesses to maximize the value of their BI investments, enhance user experiences, and drive more informed decision-making processes.

QuickSight offers powerful capabilities as a cloud-centered BI platform that are enhanced by this innovative solution, which addresses the challenges of monitoring and managing datasets across multiple Regions and accounts. By using AWS services such as AWS Glue, Amazon S3, and Athena, this solution automates the collection and centralization of critical QuickSight metrics, providing a comprehensive view of dataset performance, usage, and resource allocation.

The custom QuickSight dashboard offers invaluable insights into SPICE ingestion status, consumption, and dataset relationships, so users can make data-driven decisions about their BI implementations. This streamlined approach not only saves time by alleviating the need for manual dataset review, but also promotes efficient resource management and helps identify areas for optimization.

If you have any feedback or questions, leave them in the comments section. For more in-depth discussions, or to connect with other users facing similar challenges, we encourage you to explore the Amazon QuickSight Community.


About the Authors

Siriguppa Venkat Sai is a Cloud Engineer at Amazon, where he offers expert technical support and top-notch solutions to address customer concerns. His role involves providing advanced engineering assistance and delivering high-quality resolutions to ensure customer satisfaction.

Salim Khan is a Specialist Solutions Architect for Amazon QuickSight. Salim has over 16 years of experience implementing enterprise business intelligence (BI) solutions. Prior to AWS, Salim worked as a BI consultant catering to industry verticals like Automotive, Healthcare, Entertainment, Consumer, Publishing and Financial Services. He has delivered business intelligence, data warehousing, data integration and master data management solutions across enterprises.

Neeraj Kumar is a Senior Worldwide Solutions Architect at AWS, where he architects enterprise-scale business intelligence solutions that transform how organizations leverage their data. With over two decades of experience in data and analytics, Neeraj has driven digital transformation across automotive, manufacturing, and telecom sectors. Today, he guides global organizations to unlock breakthrough insights using Amazon QuickSight and is focused on pushing the boundaries of AI-powered analytics with Q in QuickSight, passionately helping customers modernize their BI landscape and accelerate their journey to becoming truly data-driven enterprises.


This is a companion discussion topic for the original entry at https://aws.amazon.com/blogs/business-intelligence/streamline-amazon-quicksight-dataset-management-dataset-insights-at-a-glance/