We have a use case to trigger a dataset refresh via an API call, either from another API call or from another AWS service such as S3 events. Is there a way to do this with dynamic variables?
We have developed two dynamic solutions for dataset refresh automation, tailored to various tools and services:
- **Dynamic S3 Trigger Based on Multiple S3 Buckets (Method 1):**
  - This method monitors S3 bucket events and triggers dataset refreshes in response to object creation events.
  - It uses a CSV file stored in S3 to map S3 buckets to their respective dataset IDs.
  - The solution is designed to handle multiple S3 buckets dynamically, and it requires the IAM role associated with the Lambda function to have access to these buckets.
  - This approach requires additional setup on both the ADM and customer sides, including configuring S3 event notifications and setting appropriate bucket policies.
- **Multi-Dataset Approach for Tools like DJS, BDT Services, and API Calls (Method 2):**
  - This method allows dynamic triggering of dataset refreshes via API calls.
  - It maps IAM roles (both ADM-owned and customer-owned) to specific datasets through an onboarding process.
  - The trigger event, initiated by external tools, uses a CSV file stored in S3 to map IAM roles to the corresponding dataset IDs, ensuring only authorized roles can initiate dataset refreshes.

In both methods, we have implemented a robust onboarding process to assign roles to datasets, ensuring dynamic and secure dataset refreshes. The S3-based method focuses on monitoring bucket events, while the event-based process for tools integrates IAM roles; each requires specific configuration to function seamlessly.
Method 1: Setting Up a Lambda Function to Refresh Datasets Based on S3 Events
Overview
The Lambda function will:
- Be triggered by S3 events (object creation) from multiple buckets.
- Map the bucket to the appropriate QuickSight dataset using a CSV file stored in S3.
- Trigger a dataset refresh in QuickSight.
Prerequisites
- AWS account with appropriate permissions.
- S3 buckets to monitor for new object creation.
- QuickSight account and datasets.
Step-by-Step Instructions
Step 1: Update the Existing IAM Role for the Lambda Function
- **Go to the IAM Console in Your Account:**
  - Open the AWS Management Console.
  - Navigate to the IAM console.
- **Select the Existing Role:**
  - Click on “Roles” in the left sidebar, then find and select the role used by your Lambda function (e.g., `LambdaQS`).
- **Set Permissions for the Role:**
  - Attach the following policy to the existing role:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::YOUR_S3_BUCKET/path/to/access_control.csv"
    },
    {
      "Effect": "Allow",
      "Action": [
        "quicksight:CreateIngestion",
        "quicksight:DescribeDataSet",
        "quicksight:ListDataSets"
      ],
      "Resource": "*"
    }
  ]
}
```
- Replace `YOUR_S3_BUCKET` with your specific bucket name.
- **Update the Trust Relationship:**
  - Edit the trust relationship to allow the external account’s role to assume this role, if necessary.
Step 2: Set Up the Lambda Function
- **Go to the Lambda Console:**
  - Open the AWS Management Console.
  - Navigate to the Lambda console.
- **Create a New Lambda Function:**
  - Click on “Create function.”
  - Choose “Author from scratch.”
  - Provide a function name (e.g., `S3EventProcessor`).
  - Select the runtime (e.g., Python 3.x).
  - Choose the execution role updated in Step 1.
  - Click “Create function.”
- **Add the Lambda Function Code:**
  - Replace the default code with the following:
```python
import boto3
import csv
import datetime
import io
import logging

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize clients
s3_client = boto3.client('s3')
quicksight_client = boto3.client('quicksight', region_name='YOUR_REGION')

# S3 bucket and key for the access control file
S3_BUCKET = 'YOUR_S3_BUCKET'
S3_KEY = 'path/to/access_control.csv'


def get_dataset_id_for_bucket(bucket_arn):
    try:
        # Retrieve the access control file from S3
        response = s3_client.get_object(Bucket=S3_BUCKET, Key=S3_KEY)
        content = response['Body'].read().decode('utf-8')

        # Read the CSV content, skipping the header row
        csv_reader = csv.reader(io.StringIO(content))
        next(csv_reader)

        # Map the bucket to the dataset ID (the dataset ID is in the second column)
        for row in csv_reader:
            if row[0] == bucket_arn:
                return row[1]
        return None
    except Exception as e:
        logger.error(f'Error retrieving dataset ID for bucket {bucket_arn}: {e}')
        raise


def lambda_handler(event, context):
    try:
        # Extract bucket name and object key from the event
        bucket_name = event['Records'][0]['s3']['bucket']['name']
        object_key = event['Records'][0]['s3']['object']['key']

        # Construct the bucket ARN
        bucket_arn = f"arn:aws:s3:::{bucket_name}"

        # Get the dataset ID for the bucket
        dataset_id = get_dataset_id_for_bucket(bucket_arn)
        if not dataset_id:
            raise ValueError(f'No dataset ID found for bucket {bucket_arn}')

        # AWS account ID
        account_id = context.invoked_function_arn.split(':')[4]

        # Create a unique ingestion ID based on the current timestamp
        ingestion_id = f"ingestion-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"

        # Log the bucket, object, dataset ID, and ingestion ID
        logger.info(f'Bucket: {bucket_arn}, Object: {object_key}')
        logger.info(f'Dataset ID: {dataset_id}')
        logger.info(f'Ingestion ID: {ingestion_id}')

        # Call the CreateIngestion API to start the dataset refresh
        response = quicksight_client.create_ingestion(
            DataSetId=dataset_id,
            IngestionId=ingestion_id,
            AwsAccountId=account_id
        )
        logger.info(f'Response: {response}')

        return {
            'statusCode': 200,
            'body': response
        }
    except ValueError as ve:
        logger.error(f'Value Error: {ve}')
        return {
            'statusCode': 400,
            'body': str(ve)
        }
    except Exception as e:
        logger.error(f'Error: {e}')
        return {
            'statusCode': 500,
            'body': str(e)
        }
```
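Before wiring up real S3 events, you can exercise the handler with a hand-trimmed test event that mimics the S3 put-event shape. This is a minimal sketch, assuming the function name `S3EventProcessor` from Step 2; the bucket and key are placeholders and the bucket must appear in the access control CSV:

```python
import boto3
import json

lambda_client = boto3.client('lambda')

# Minimal hand-trimmed S3 object-created event; bucket and key are placeholders
test_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "bucket1"},
                "object": {"key": "data/new_file.csv"}
            }
        }
    ]
}

# Invoke the function synchronously and print its response
response = lambda_client.invoke(
    FunctionName='S3EventProcessor',
    Payload=json.dumps(test_event)
)
print(json.loads(response['Payload'].read()))
```

The same JSON can also be pasted into a test event in the Lambda console.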
Step 3: Set Up the Access Control CSV File
- **Create a CSV File** with the columns `bucket_arn` and `dataset_id`.
- **Example CSV Content:**

```csv
bucket_arn,dataset_id
arn:aws:s3:::bucket1,dataset_id1
arn:aws:s3:::bucket2,dataset_id2
```

- **Upload the CSV File to S3:**
  - Upload the CSV file to the specified S3 bucket and key (e.g., `YOUR_S3_BUCKET/path/to/access_control.csv`).
Step 4: Configure S3 Event Notifications
- **Go to the S3 Console:**
  - Open the AWS Management Console.
  - Navigate to the S3 console.
- **Select a Bucket and Configure Event Notifications:**
  - Choose one of the buckets you want to monitor.
  - Go to the “Properties” tab.
  - Scroll down to the “Event notifications” section and click “Create event notification.”
- **Configure the Event Notification:**
  - Give your event a name.
  - Select “All object create events” as the event type.
  - In the “Send to” section, select “Lambda function.”
  - Choose the Lambda function.
  - Click “Save changes.”
- **Repeat for Other Buckets:**
  - Repeat the above steps for each S3 bucket you want to monitor. To automate this across many buckets, see the sketch after this list.
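Onboarding many buckets through the console is tedious, so the notification setup can be scripted. Below is a minimal boto3 sketch, assuming a hypothetical function ARN and bucket list (replace with your own). Note that `put_bucket_notification_configuration` overwrites a bucket’s existing notification configuration, so merge with any existing rules before running this against buckets already in use:

```python
import boto3

lambda_client = boto3.client('lambda')
s3_client = boto3.client('s3')

# Placeholder values -- replace with your function ARN and monitored buckets
FUNCTION_ARN = 'arn:aws:lambda:YOUR_REGION:YOUR_ACCOUNT_ID:function:S3EventProcessor'
BUCKETS = ['bucket1', 'bucket2']

for bucket in BUCKETS:
    # Allow S3 (scoped to this bucket) to invoke the Lambda function
    lambda_client.add_permission(
        FunctionName=FUNCTION_ARN,
        StatementId=f's3-invoke-{bucket}',
        Action='lambda:InvokeFunction',
        Principal='s3.amazonaws.com',
        SourceArn=f'arn:aws:s3:::{bucket}'
    )
    # Register an object-created event notification on the bucket
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': FUNCTION_ARN,
                'Events': ['s3:ObjectCreated:*']
            }]
        }
    )
```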
Step 5: Modify External S3 Buckets for Lambda Function Access
- **Go to the S3 Console:**
  - Open the AWS Management Console.
  - Navigate to the S3 console.
- **Select a Bucket and Edit Permissions:**
  - Choose one of the external buckets you want to monitor.
  - Go to the “Permissions” tab.
  - Scroll down to the “Bucket policy” section and click “Edit.”
- **Add a Bucket Policy:**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:role/LambdaExecutionRole"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::BUCKET_NAME/*"
    }
  ]
}
```

  - Replace `YOUR_ACCOUNT_ID` with your account ID and `BUCKET_NAME` with the name of the external bucket.
  - Click “Save changes.”
- **Repeat for Other External Buckets:**
  - Repeat the above steps for each external S3 bucket you want to monitor. A scripted alternative is sketched after this list.
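As with the event notifications, applying the bucket policy across many external buckets can be scripted. A minimal sketch, assuming the same placeholder role ARN and bucket names as above; it must run with credentials from the account that owns each bucket, and `put_bucket_policy` replaces any existing policy, so merge in existing statements first:

```python
import boto3
import json

s3_client = boto3.client('s3')

# Placeholder values -- replace with your account ID, Lambda role name, and buckets
LAMBDA_ROLE_ARN = 'arn:aws:iam::YOUR_ACCOUNT_ID:role/LambdaExecutionRole'
EXTERNAL_BUCKETS = ['bucket1', 'bucket2']

for bucket in EXTERNAL_BUCKETS:
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": LAMBDA_ROLE_ARN},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*"
        }]
    }
    # Overwrites the bucket's current policy -- merge statements beforehand
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```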
Step 6: Test the Setup
- **Upload a Test File to a Monitored Bucket:**
  - Go to one of the monitored S3 buckets and upload a test file.
- **Verify the Lambda Function Execution:**
  - Check the CloudWatch logs for the Lambda function to verify that it executed correctly and triggered the dataset refresh in QuickSight.
- **Check QuickSight:**
  - Verify that the dataset in QuickSight has been refreshed. A scripted check is sketched below.
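You can also confirm the refresh programmatically with the DescribeIngestion API. A minimal sketch; the account ID, dataset ID, and ingestion ID are placeholders (the ingestion ID is written to the Lambda function’s CloudWatch logs):

```python
import boto3

quicksight_client = boto3.client('quicksight', region_name='YOUR_REGION')

# Placeholder values -- take the ingestion ID from the Lambda function's logs
response = quicksight_client.describe_ingestion(
    AwsAccountId='YOUR_ACCOUNT_ID',
    DataSetId='YOUR_DATASET_ID',
    IngestionId='ingestion-20240101120000'
)

# Status moves through QUEUED/INITIALIZED/RUNNING to COMPLETED or FAILED
print(response['Ingestion']['IngestionStatus'])
```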
CSV Template
Here is the CSV template content for the access control file:
```csv
bucket_arn,dataset_id
arn:aws:s3:::bucket1,dataset_id1
arn:aws:s3:::bucket2,dataset_id2
```
Method 2: Using a CSV File for Access Control to Refresh Datasets in QuickSight
Overview
The Lambda function will:
- Be triggered by an external tool via an API call.
- Check access permissions using an access control CSV file stored in S3.
- Trigger a dataset refresh in QuickSight if access is granted.
Prerequisites
- AWS account with appropriate permissions.
- S3 bucket to store the access control CSV file.
- QuickSight account and datasets.
Step-by-Step Instructions
Step 1: Create the IAM Role for the External Tool
- **Go to the IAM Console in the External Account:**
  - Open the AWS Management Console.
  - Navigate to the IAM console.
- **Create a New Role:**
  - Click on “Roles” in the left sidebar, then click “Create role.”
  - Choose “Another AWS account” as the trusted entity.
  - Enter the AWS account ID of the account where the Lambda function resides.
- **Set Permissions for the Role:**
  - Attach a policy that allows invoking the Lambda function. You can create a custom policy like this:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": "arn:aws:lambda:YOUR_REGION:YOUR_ACCOUNT_ID:function:YOUR_LAMBDA_FUNCTION_NAME"
    }
  ]
}
```
- Replace `YOUR_REGION`, `YOUR_ACCOUNT_ID`, and `YOUR_LAMBDA_FUNCTION_NAME` with your specific details.
- Click “Next: Tags,” then “Next: Review,” provide a role name (e.g., `ExternalInvokeRole`), and click “Create role.”
Step 2: Update the Trust Relationship of the Existing Lambda Execution Role
- **Go to the IAM Console in Your Account:**
  - Open the AWS Management Console.
  - Navigate to the IAM console.
- **Find the Existing Lambda Execution Role:**
  - Click on “Roles” in the left sidebar.
  - Find and select the IAM role that your Lambda function is using.
- **Edit the Trust Relationship:**
  - In the “Trust relationships” tab, click “Edit trust relationship.”
  - Update the trust relationship to allow the external account’s role to assume this role. The trust relationship should look like this:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::EXTERNAL_ACCOUNT_ID:role/ExternalInvokeRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
Step 3: Ensure the Existing Lambda Execution Role Has Necessary Permissions
Ensure the existing Lambda execution role has the necessary permissions to read from S3 and manage QuickSight datasets. Here is a custom policy you can attach if one is not already in place:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::YOUR_S3_BUCKET/path/to/access_control.csv"
    },
    {
      "Effect": "Allow",
      "Action": [
        "quicksight:CreateIngestion",
        "quicksight:DescribeDataSet",
        "quicksight:ListDataSets"
      ],
      "Resource": "*"
    }
  ]
}
```

Replace `YOUR_S3_BUCKET` with your specific bucket name.
Step 4: Update the External Tool
Ensure the external tool sends the `dataset_id` and the ARN of the calling role in the event payload when triggering the Lambda function.
Example Event Payload:
```json
{
  "dataset_id": "YOUR_DATASET_ID",
  "caller_role": "arn:aws:iam::EXTERNAL_ACCOUNT_ID:role/ExternalInvokeRole"
}
```
Step 5: Deploy and Test
- Update the Lambda function with the code below, which handles the dynamic input and verifies the caller role and dataset ID against the CSV.
- Upload the CSV file to the specified S3 bucket and key.
- Test the Lambda function by triggering it with an event payload that includes `dataset_id` and `caller_role`; an invocation sketch follows this list.
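For reference, here is a minimal sketch of how the external tool might make the cross-account call: assume `ExternalInvokeRole` via STS (assuming the tool’s own credentials are permitted to assume it), then invoke the function with the Step 4 payload. The region, account IDs, and function name are placeholders:

```python
import boto3
import json

# Assume the invoke role created in Step 1 (placeholder ARN)
sts_client = boto3.client('sts')
creds = sts_client.assume_role(
    RoleArn='arn:aws:iam::EXTERNAL_ACCOUNT_ID:role/ExternalInvokeRole',
    RoleSessionName='dataset-refresh'
)['Credentials']

# Create a Lambda client with the temporary credentials
lambda_client = boto3.client(
    'lambda',
    region_name='YOUR_REGION',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken']
)

# Invoke the function with the payload described in Step 4
payload = {
    "dataset_id": "YOUR_DATASET_ID",
    "caller_role": "arn:aws:iam::EXTERNAL_ACCOUNT_ID:role/ExternalInvokeRole"
}
response = lambda_client.invoke(
    FunctionName='YOUR_LAMBDA_FUNCTION_NAME',
    Payload=json.dumps(payload)
)
print(json.loads(response['Payload'].read()))
```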
Lambda Function Code
Here is the Lambda function code for Method 2:
```python
import boto3
import csv
import datetime
import io
import logging

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize clients
s3_client = boto3.client('s3')
quicksight_client = boto3.client('quicksight', region_name='YOUR_REGION')

# S3 bucket and key for the access control file
S3_BUCKET = 'YOUR_S3_BUCKET'
S3_KEY = 'path/to/access_control.csv'


def check_access(role_arn, dataset_id):
    """Return True if the access control CSV grants role_arn access to dataset_id."""
    try:
        # Retrieve the access control file from S3
        response = s3_client.get_object(Bucket=S3_BUCKET, Key=S3_KEY)
        content = response['Body'].read().decode('utf-8')

        # Read the CSV content, skipping the header row
        csv_reader = csv.reader(io.StringIO(content))
        next(csv_reader)

        # Columns: role, bucket_arn, dataset_id (bucket_arn is part of the
        # onboarding record but is not needed for API-triggered refreshes)
        for row in csv_reader:
            if row[0] == role_arn and row[2] == dataset_id:
                return True
        return False
    except Exception as e:
        logger.error(f'Error checking access: {e}')
        raise


def lambda_handler(event, context):
    try:
        # Extract the dataset ID and calling role ARN from the event payload
        dataset_id = event['dataset_id']
        caller_role = event['caller_role']

        # Verify the caller is authorized to refresh this dataset
        if not check_access(caller_role, dataset_id):
            raise PermissionError(
                f'Role {caller_role} does not have access to dataset {dataset_id}'
            )

        # AWS account ID
        account_id = context.invoked_function_arn.split(':')[4]

        # Create a unique ingestion ID based on the current timestamp
        ingestion_id = f"ingestion-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"

        # Log the caller, dataset ID, and ingestion ID
        logger.info(f'Caller role: {caller_role}')
        logger.info(f'Dataset ID: {dataset_id}')
        logger.info(f'Ingestion ID: {ingestion_id}')

        # Call the CreateIngestion API to start the dataset refresh
        response = quicksight_client.create_ingestion(
            DataSetId=dataset_id,
            IngestionId=ingestion_id,
            AwsAccountId=account_id
        )
        logger.info(f'Response: {response}')

        return {
            'statusCode': 200,
            'body': response
        }
    except PermissionError as pe:
        logger.error(f'Permission Error: {pe}')
        return {
            'statusCode': 403,
            'body': str(pe)
        }
    except Exception as e:
        logger.error(f'Error: {e}')
        return {
            'statusCode': 500,
            'body': str(e)
        }
```
CSV Template
Here is the CSV template content for the access control file:
```csv
role,bucket_arn,dataset_id
arn:aws:iam::123456789012:role/RoleA,arn:aws:s3:::bucket1,dataset_id1
arn:aws:iam::123456789012:role/RoleB,arn:aws:s3:::bucket2,dataset_id2
```
Hi @pagekevi - Thanks for the details. Can you write a blog post with some diagrams and publish it on the AWS QuickSight page? These are very useful details and should be published as a blog. Kristin can help you if you need more details around it.
Hi @Kristin - Can you please help on this.
Regards - Sanjeeb
This has been a long-pending requirement. Thanks a lot, Kevin, for taking up this task and explaining the steps in so much detail. Loved it!
Thanks Sanjeeb! I will work with @mytree on this as I am working on another blog post as well.