by Antonio Samaniego Jurado and Pascal Vogel | on 17 NOV 2023 | in Advanced (300), Amazon Athena, Amazon DynamoDB, Amazon QuickSight, AWS Glue, Best Practices, Technical How-to | Permalink | Comments | Share
Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale. DynamoDB offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data import and export tools. The scalability and flexible data schema of DynamoDB make it well-suited for a variety of use cases. These include internet-scale web and mobile applications, low-latency metadata stores, high-traffic retail websites, Internet of Things (IoT) and time series data, online gaming, and more.
Data stored in DynamoDB is the basis for valuable business intelligence (BI) insights. To make this data accessible to data analysts and other consumers, you can use Amazon Athena. Athena is a serverless, interactive service that allows you to query data from a variety of sources in heterogeneous formats, with no provisioning effort. Athena accesses data stored in DynamoDB via the open source Amazon Athena DynamoDB connector. Table metadata, such as column names and data types, is stored using the AWS Glue Data Catalog.
Finally, to visualize BI insights, you can use Amazon QuickSight, a cloud-powered business analytics service. QuickSight makes it straightforward for organizations to build visualizations, perform ad hoc analysis, and quickly get business insights from their data, anytime, on any device. Its generative BI capabilities enable you to ask questions about your data using natural language, without having to write SQL queries or learn a BI tool.
This post shows how you can use the Athena DynamoDB connector to easily query data in DynamoDB with SQL and visualize insights in QuickSight.
Solution overview
The following diagram illustrates the solution architecture.
- The Athena DynamoDB connector runs in a pre-built, serverless AWS Lambda function. You don’t need to write any code.
- AWS Glue provides supplemental metadata from the DynamoDB table. In particular, an AWS Glue crawler is run to infer and store the DynamoDB table format, schema, and associated properties in the Glue Data Catalog.
- The Athena editor is used to test the connector and perform analysis via SQL queries.
- QuickSight uses the Athena connector to visualize BI insights from DynamoDB.
This walkthrough uses data from the ProductCatalog
table, part of the DynamoDB developer guide sample data files.
Prerequisites
Before you get started, you should meet the following prerequisites:
- An AWS account (refer to How do I create and activate a new AWS account? to create a new account)
- A QuickSight account (see Signing up for an Amazon QuickSight subscription for sign up instructions)
- Familiarity with AWS Identity and Access Management (IAM)
Set up the Athena DynamoDB connector
The Athena DynamoDB connector comprises a pre-built, serverless Lambda function provided by AWS that communicates with DynamoDB so you can query your tables with SQL using Athena. The connector is available in the AWS Serverless Application Repository, and is used to create the Athena data source for later use in data analysis and visualization. To set up the connector, complete the following steps:
- On the Athena console, choose Data sources in the navigation pane.
- Choose Create data source.
- In the search bar, search for and choose Amazon DynamoDB.
- Choose Next.
- Under Data source details, enter a name. Note that this name should be unique and will be referenced in your SQL statements when you query your Athena data source.
- Under Connection details, choose Create Lambda function.
This will take you to the Lambda applications page on the Lambda console. Do not close the Athena data source creation tab; you will return to it in a later step.
- Scroll down to Application settings and enter a value for the following parameters (leave the other parameters as default):
SpillBucket
– Specifies the Amazon Simple Storage Service (Amazon S3) bucket name for storing data that exceeds Lambda function response size limits. To create an S3 bucket, refer to Creating a bucket.AthenaCatalogName
– A lowercase name for the Lambda function to be created.
- Select the acknowledgement check box and choose Deploy.
Wait for deployment to complete before moving to the next step.
- Return to the Athena data source creation tab.
- Under Connection details, choose the refresh icon and choose the Lambda function you created.
- Choose Next.
- Review and choose Create data source.
Provide supplemental metadata via AWS Glue
The Athena connector already comes with a built-in inference capability to discover the schema and table properties of your data source. However, this capability is limited. To accurately discover the metadata of your DynamoDB table and centralize schema management as your data evolves over time, the connector integrates with AWS Glue.
To achieve this, an AWS Glue crawler is run to automatically determine the format, schema, and associated properties of the raw data stored in your DynamoDB table, writing the resulting metadata to a Glue database. Glue databases contain tables, which hold metadata from different data stores, independent from the actual location of the data. The Athena connector then references the Glue table and retrieves the corresponding DynamoDB metadata to enable queries.
Create the AWS Glue database
Complete the following steps to create the Glue database:
- On the AWS Glue console, under Data Catalog in the navigation pane, choose Databases.
- Choose Add database (you can also edit an existing database if you already have one).
- For Name, enter a database name.
- For Location, enter the string literal
dynamo-db-flag
. This keyword indicates that the database contains tables that the connector can use for supplemental metadata. - Choose Create database.
Following security best practices, it is also recommended that you enable encryption at rest for your Data Catalog. For details, refer to Encrypting your Data Catalog.
Create the AWS Glue crawler
Complete the following steps to create and run the Glue crawler:
- On the AWS Glue console, under Data Catalog in the navigation pane, choose Crawlers.
- Choose Create crawler.
- Enter a crawler name and choose Next.
- For Data sources, choose Add a data source.
- On the Data source drop-down menu, choose DynamoDB. For Table name, enter the name of your DynamoDB table (string literal).
- Choose Add a DynamoDB data source.
- Choose Next.
- For IAM Role, choose Create new IAM role.
- Enter a role name and choose Create. This will automatically create an IAM role that trusts AWS Glue and has permissions to access the crawler targets.
- Choose Next.
- For Target database, choose the database previously created.
- Choose Next.
- Review and choose Create crawler.
- On the newly created crawler page, choose Run crawler.
Crawler runtimes depend on your DynamoDB table size and properties. You can find crawler run details under Crawler runs.
Validate the output metadata
When your crawler run status shows as Completed, follow the below steps to validate the output metadata:
- On the AWS Glue console, choose Tables in the navigation pane. Here, you can confirm a new table has been added to the database as a result of the crawler run.
- Navigate to the newly created table and take a look at the Schema tab. This tab shows the column names, data types, and other parameters inferred from your DynamoDB table.
- If needed, edit the schema by choosing Edit schema.
- Choose Advanced properties.
- Under Table properties, verify the crawler automatically created and set the
classification
key todynamodb
. This indicates to the Athena connector that the table can be used for supplemental metadata. - Optionally, add the following properties to correctly catalog and reference DynamoDB data in AWS Glue and Athena queries. This is due to capital letters not being permitted in AWS Glue table and column names, but being permitted in DynamoDB table and attribute names.
- If your DynamoDB table name contains any capital letters, choose Actions and Edit Table and add an extra table property as follows:
* Key:sourceTable
* Value:YourDynamoDBTableName
- If your DynamoDB table has attributes that contain any capital letters, add an extra table property as follows:
* Key:columnMapping
* Value:yourcolumn1=YourColumn1
,yourcolumn2=YourColumn2
, …
Test the connector with the Athena SQL editor
After the Athena DynamoDB connector is deployed and the AWS Glue table is populated with supplemental metadata, the DynamoDB table is ready for analysis. The example in this post uses the Athena editor to make SQL queries to the ProductCatalog
table. For further options to interact with Athena, see Accessing Athena.
Complete the following steps to test the connector:
- Open the Athena query editor.
- If this is your first time visiting the Athena console in your current AWS Region, complete the following steps. This is a prerequisite before you can run Athena queries. See Getting Started for more details.
- Choose Query editor in the navigation pane to open the editor.
- Navigate to Settings and choose Manage to set up a query result location in Amazon S3.
- Under Data, select the data source and database you created (you may need to choose the refresh icon for them to sync up with Athena).
- Tables belonging to the selected database appear under Tables. You can choose a table name for Athena to show the table column list and data types.
- Test the connector by pulling data from your table via a SELECT statement. When you run Athena queries, you can reference Athena data sources, databases, and tables as
<datasource_name>.<database>.<table_name>
. Retrieved records are shown under Results.
For increased security, refer to Encrypting Athena query results stored in Amazon S3 to encrypt query results at rest.
For this post, we run a SELECT statement to validate the process. You can refer to the SQL reference for Athena to build more complex queries and analyses.
Visualize in QuickSight
QuickSight allows for building modern interactive dashboards, paginated reports, embedded analytics, and natural language queries through a unified BI solution. In this step, we use QuickSight to generate visual insights from the DynamoDB table by connecting to the Athena data source previously created.
Allow QuickSight to access to resources
Complete the following steps to grant QuickSight access to resources:
- On the QuickSight console, choose the profile icon and choose Manage QuickSight.
- In the navigation pane, choose Security & Permissions.
- Under QuickSight access to AWS services, choose Manage.
- QuickSight may ask you to switch to the Region in which users and groups in your account are managed. To change the current Region, navigate to the profile icon on the QuickSight console and choose the Region you want to switch to.
- For IAM Role, choose Use QuickSight-managed role (default).
Subsequent instructions assume that the default QuickSight-managed role is being used. If this is not the case, make sure to update the existing role to the same effect.
- Under Allow access and autodiscovery for these resources, select IAM and Amazon S3.
- For Amazon S3, choose Select S3 buckets.
- Choose the spill bucket you specified in earlier when deploying the Lambda function for the connector and the bucket you specified as the Athena query result location in Amazon S3.
- For both buckets, select Write permission for Athena Workgroup.
- Choose Amazon Athena.
- In the pop-up window, choose Next.
- Choose Lambda and choose the Amazon Resource Name (ARN) of the Lambda function previously used for the Athena data source connector.
- Choose Finish.
- Choose Save.
Create the Athena dataset
To create the Athena dataset, complete the following steps:
- On the QuickSight console, choose the user profile and switch to the Region you deployed the Athena data source to.
- Return to the QuickSight home page.
- In the navigation pane, choose Datasets.
- Choose New dataset.
- For Create a Dataset, select Athena.
- For Data source name, enter a name and choose Validate connection.
- When the connection shows as Validated, choose Create data source.
- Under Catalog, Database, and Tables, select the Athena data source, AWS Glue database, and AWS Glue table previously created.
- Choose Select.
- On the Finish dataset creation page, select Import to SPICE for quicker analytics.
- Choose Visualize.
For additional information on QuickSight query modes, see Importing data into SPICE and Using SQL to customize data.
Build QuickSight visualizations
Once the DynamoDB data is available in QuickSight via the Athena DynamoDB connector, it is ready to be visualized. The QuickSight analysis in the below example shows a vertical stacked bar chart with the average price per product category for the ProductCatalog
sample dataset. In addition, it shows a donut chart with the proportion of products by product category, and a tree map containing the count of bicycles per bicycle type.
If you use data imported to SPICE in a QuickSight analysis, the dataset will only be available after the import is complete. For further details, see Using SPICE data in an analysis.
For comprehensive information on how to create and share visualizations in QuickSight, refer to Visualizing data in Amazon QuickSight and Sharing and subscribing to data in Amazon QuickSight.
Clean up
To avoid incurring continued AWS usage charges, make sure you delete all resources created as part of this walkthrough.
- Delete the Athena data source:
- On the Athena console, switch to the Region you deployed your resources in.
- Choose Data sources in the navigation pane.
- Select the data source you created and on the Actions menu, choose Delete.
- Delete the Lambda application:
- On the AWS CloudFormation console, switch to the Region you deployed your resources in.
- Choose Stacks in the navigation pane.
- Select
serverlessrepo-AthenaDynamoDBConnector
and choose Delete.
- Delete the AWS Glue resources:
- On the AWS Glue console, switch to the Region you deployed your resources in.
- Choose Databases in the navigation pane.
- Select the database you created and choose Delete.
- Choose Crawlers in the navigation pane.
- Select the crawler you created and on the Action menu, choose Delete crawler.
- Delete the QuickSight resources:
- On the QuickSight console, switch to the Region you deployed your resources in.
- Delete the analysis created for this walkthrough.
- Delete the Athena dataset created for this walkthrough.
- If you no longer need the Athena data source to create other datasets, delete the data source.
Summary
This post demonstrated how you can use the Athena DynamoDB connector to query data in DynamoDB with SQL and build visualizations in QuickSight.
Learn more about the Athena DynamoDB connector in the Amazon Athena User Guide. Discover more available data source connectors to query and visualize a variety of data sources without setting up or managing any infrastructure while only paying for the queries you run.
For advanced QuickSight capabilities powered by AI, see Gaining insights with machine learning (ML) in Amazon QuickSight and Answering business questions with Amazon QuickSight Q.
About the Authors
Antonio Samaniego Jurado is a Solutions Architect at Amazon Web Services. With a strong passion for modern technology, Antonio helps customers build state-of-the-art applications on AWS. A creator at heart, he loves community-driven learning and sharing of best practices across the AWS service portfolio to make the best of customers cloud journey.
Pascal Vogel is a Solutions Architect at Amazon Web Services. Pascal helps startups and enterprises build cloud-native solutions. As a cloud enthusiast, Pascal loves learning new technologies and connecting with like-minded customers who want to make a difference in their cloud journey.
This is a companion discussion topic for the original entry at **Visualize Amazon DynamoDB insights in Amazon QuickSight using the Amazon Athena DynamoDB connector and AWS Glue | AWS Big Data Blog.