Triggering Dataset Refreshes via API

We have a use case to trigger a dataset refresh via an API call, initiated either by another API call or by another AWS service such as S3 events. Is there a way to do this with some dynamic variables?

We have developed two dynamic solutions for dataset refresh automation, tailored to various tools and services:

  1. Multi-Dataset Approach for Tools like DJS, BDT Services, and API Calls (detailed below as Method 2):
  • This method allows dynamic triggering of dataset refreshes via API calls.
  • It maps IAM roles (both ADM-owned and customer-owned) to specific datasets through an onboarding process.
  • The trigger event, initiated by external tools, uses a CSV file stored in S3 to map IAM roles to the corresponding dataset IDs, ensuring only authorized roles can initiate dataset refreshes.
  2. Dynamic S3 Trigger Based on Multiple S3 Buckets (detailed below as Method 1):
  • This method monitors S3 bucket events and triggers dataset refreshes in response to object creation events.
  • It uses a CSV file stored in S3 to map S3 buckets to their respective dataset IDs.
  • The solution handles multiple S3 buckets dynamically and requires the IAM role associated with the Lambda function to have access to these buckets.
  • This approach requires additional setup on both the ADM and customer sides, including configuring S3 event notifications and setting appropriate bucket policies.

In both methods, we have implemented a robust onboarding process to assign roles to datasets, ensuring dynamic and secure dataset refreshes. The event-based process for tools integrates IAM roles, while the S3-based method focuses on monitoring bucket events, each requiring specific configurations to function seamlessly.

Method 1: Setting Up a Lambda Function to Refresh Datasets Based on S3 Events

Overview

The Lambda function will:

  1. Be triggered by S3 events (object creation) from multiple buckets.
  2. Map the bucket to the appropriate QuickSight dataset using a CSV file stored in S3.
  3. Trigger a dataset refresh in QuickSight.

Prerequisites

  1. AWS account with appropriate permissions.
  2. S3 buckets to monitor for new object creation.
  3. QuickSight account and datasets.

Step-by-Step Instructions

Step 1: Update the Existing IAM Role for the Lambda Function
  1. Go to the IAM Console in Your Account:

    • Open the AWS Management Console.
    • Navigate to the IAM console.
  2. Select the Existing Role:

    • Click on “Roles” in the left sidebar, then find and select the role used by your Lambda function (e.g., LambdaQS).
  3. Set Permissions for the Role:

    • Attach the following policy to the existing role:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::YOUR_S3_BUCKET/path/to/access_control.csv"
        },
        {
            "Effect": "Allow",
            "Action": [
                "quicksight:CreateIngestion",
                "quicksight:DescribeDataSet",
                "quicksight:ListDataSets"
            ],
            "Resource": "*"
        }
    ]
}
  • Replace YOUR_S3_BUCKET with your specific bucket name.
  4. Update the Trust Relationship:
    • If necessary, edit the trust relationship to allow the external account’s role to assume this role (a scripted sketch follows this step).
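If you prefer to script this step, the trust relationship can be set with boto3. Below is a minimal sketch, assuming the execution role is named LambdaQS as above; note that a Lambda execution role must always remain assumable by the Lambda service, and update_assume_role_policy overwrites the existing trust document:

import json
import boto3

iam = boto3.client('iam')

# Baseline trust policy: the Lambda service assumes the execution role.
# Add an external account's role as an additional principal here only
# if your setup requires cross-account assumption.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole"
        }
    ]
}

# Overwrites the role's current trust relationship
iam.update_assume_role_policy(
    RoleName='LambdaQS',
    PolicyDocument=json.dumps(trust_policy)
)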
Step 2: Set Up the Lambda Function
  1. Go to the Lambda Console:

    • Open the AWS Management Console.
    • Navigate to the Lambda console.
  2. Create a New Lambda Function:

    • Click on “Create function.”
    • Choose “Author from scratch.”
    • Provide a function name (e.g., S3EventProcessor).
    • Select the runtime (e.g., Python 3.x).
    • Choose the execution role created in Step 1.
    • Click “Create function.”
  3. Add the Lambda Function Code:

    • Replace the default code with the following:
import boto3
import csv
import datetime
import json
import logging
import io

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize clients
s3_client = boto3.client('s3')
quicksight_client = boto3.client('quicksight', region_name='YOUR_REGION')

# S3 bucket and key for the access control file
S3_BUCKET = 'YOUR_S3_BUCKET'
S3_KEY = 'path/to/access_control.csv'

def get_dataset_id_for_bucket(bucket_arn):
    try:
        # Retrieve the access control file from S3
        response = s3_client.get_object(Bucket=S3_BUCKET, Key=S3_KEY)
        content = response['Body'].read().decode('utf-8')

        # Read the CSV content
        csv_reader = csv.reader(io.StringIO(content))
        
        # Skip the header row
        next(csv_reader)

        # Map the bucket to the dataset ID
        for row in csv_reader:
            if row[0] == bucket_arn:
                return row[1]  # Assuming the dataset ID is in the second column
        return None
    except Exception as e:
        logger.error(f'Error retrieving dataset ID for bucket {bucket_arn}: {e}')
        raise

def lambda_handler(event, context):
    try:
        # Extract bucket name and object key from the event
        bucket_name = event['Records'][0]['s3']['bucket']['name']
        object_key = event['Records'][0]['s3']['object']['key']
        
        # Construct the bucket ARN
        bucket_arn = f"arn:aws:s3:::{bucket_name}"

        # Get the dataset ID for the bucket
        dataset_id = get_dataset_id_for_bucket(bucket_arn)
        if not dataset_id:
            raise ValueError(f'No dataset ID found for bucket {bucket_arn}')

        # AWS account ID
        account_id = context.invoked_function_arn.split(':')[4]

        # Create a unique ingestion ID based on the current timestamp
        ingestion_id = f"ingestion-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"

        # Log the ingestion ID and dataset ID
        logger.info(f'Bucket: {bucket_arn}, Object: {object_key}')
        logger.info(f'Dataset ID: {dataset_id}')
        logger.info(f'Ingestion ID: {ingestion_id}')

        # Call the CreateIngestion API
        response = quicksight_client.create_ingestion(
            DataSetId=dataset_id,
            IngestionId=ingestion_id,
            AwsAccountId=account_id
        )

        # Log the response
        logger.info(f'Response: {response}')

        return {
            'statusCode': 200,
            'body': json.dumps(response, default=str)  # response contains datetimes; stringify for JSON
        }
    except ValueError as ve:
        logger.error(f'Value Error: {ve}')
        return {
            'statusCode': 400,
            'body': str(ve)
        }
    except Exception as e:
        logger.error(f'Error: {e}')
        return {
            'statusCode': 500,
            'body': str(e)
        }
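Before wiring up the S3 trigger, you can sanity-check the handler with a hand-built event. The sketch below is illustrative only: the ARN, bucket, and key are placeholders, the FakeContext stub exposes just the attribute the handler reads, and the call still reaches S3 and QuickSight, so it needs credentials that can read the access control file:

# Minimal stand-in for the Lambda context object (placeholder account ID)
class FakeContext:
    invoked_function_arn = 'arn:aws:lambda:YOUR_REGION:123456789012:function:S3EventProcessor'

# An S3 object-created notification, trimmed to the fields the handler parses
test_event = {
    'Records': [
        {'s3': {'bucket': {'name': 'bucket1'}, 'object': {'key': 'data/new_file.csv'}}}
    ]
}

print(lambda_handler(test_event, FakeContext()))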
Step 3: Set Up the Access Control CSV File
  1. Create a CSV File with the following columns: bucket_arn, dataset_id.
  2. Example CSV Content:
bucket_arn,dataset_id
arn:aws:s3:::bucket1,dataset_id1
arn:aws:s3:::bucket2,dataset_id2
  3. Upload the CSV File to S3:
    • Upload the CSV file to the specified S3 bucket and key (e.g., YOUR_S3_BUCKET/path/to/access_control.csv); a scripted upload sketch follows.
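If you prefer to script the upload, a single put_object call is enough. A minimal sketch, assuming access_control.csv exists locally and the bucket and key match the placeholders in the Lambda code:

import boto3

s3 = boto3.client('s3')

# Upload the access control file to the location the Lambda function reads from
with open('access_control.csv', 'rb') as f:
    s3.put_object(
        Bucket='YOUR_S3_BUCKET',
        Key='path/to/access_control.csv',
        Body=f
    )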
Step 4: Configure S3 Event Notifications
  1. Go to the S3 Console:

    • Open the AWS Management Console.
    • Navigate to the S3 console.
  2. Select a Bucket and Configure Event Notifications:

    • Choose one of the buckets you want to monitor.
    • Go to the “Properties” tab.
    • Scroll down to the “Event notifications” section and click “Create event notification.”
  3. Configure the Event Notification:

    • Give your event a name.
    • Select “All object create events” as the event type.
    • In the “Send to” section, select “Lambda function.”
    • Choose the Lambda function created in Step 2 (e.g., S3EventProcessor).
    • Click “Save changes.”
  4. Repeat for Other Buckets:

    • Repeat the above steps for each S3 bucket you want to monitor (or script the configuration as sketched below).
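The console steps above can also be scripted per bucket. A sketch with placeholder names; S3 must first be granted permission to invoke the function, and note that put_bucket_notification_configuration replaces any notification configuration already on the bucket:

import boto3

lambda_client = boto3.client('lambda')
s3 = boto3.client('s3')

bucket = 'bucket1'  # one of the monitored buckets
function_arn = 'arn:aws:lambda:YOUR_REGION:YOUR_ACCOUNT_ID:function:S3EventProcessor'

# Allow S3 (scoped to this bucket) to invoke the Lambda function
lambda_client.add_permission(
    FunctionName=function_arn,
    StatementId=f'AllowS3Invoke-{bucket}',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn=f'arn:aws:s3:::{bucket}'
)

# Route all object-create events from the bucket to the function
s3.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'Id': 'dataset-refresh-trigger',
                'LambdaFunctionArn': function_arn,
                'Events': ['s3:ObjectCreated:*']
            }
        ]
    }
)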
Step 5: Modify External S3 Buckets for Lambda Function Access
  1. Go to the S3 Console:

    • Open the AWS Management Console.
    • Navigate to the S3 console.
  2. Select a Bucket and Edit Permissions:

    • Choose one of the external buckets you want to monitor.
    • Go to the “Permissions” tab.
    • Scroll down to the “Bucket policy” section and click “Edit.”
  3. Add a Bucket Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:role/LambdaExecutionRole"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::BUCKET_NAME/*"
        }
    ]
}
  • Replace YOUR_ACCOUNT_ID with your account ID and BUCKET_NAME with the name of the external bucket.
  • Click “Save changes.”
  4. Repeat for Other External Buckets:
    • Repeat the above steps for each external S3 bucket you want to monitor (a scripted sketch follows).
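Equivalently, the bucket policy can be applied programmatically. A sketch using the same placeholders as the policy above; put_bucket_policy overwrites any existing policy, so merge statements first if the bucket already has one:

import json
import boto3

s3 = boto3.client('s3')

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::YOUR_ACCOUNT_ID:role/LambdaExecutionRole"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::BUCKET_NAME/*"
        }
    ]
}

# Overwrites the bucket's current policy
s3.put_bucket_policy(Bucket='BUCKET_NAME', Policy=json.dumps(bucket_policy))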
Step 6: Test the Setup
  1. Upload a Test File to a Monitored Bucket:

    • Go to one of the monitored S3 buckets and upload a test file.
  2. Verify the Lambda Function Execution:

    • Check the CloudWatch logs for the Lambda function to verify that it executed correctly and triggered the dataset refresh in QuickSight.
  3. Check QuickSight:

    • Verify that the dataset in QuickSight has been refreshed (see the verification sketch below).
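If you want to verify the refresh from code rather than the QuickSight UI, you can upload a test object and then list recent ingestions for the dataset. A sketch with placeholder IDs:

import boto3

s3 = boto3.client('s3')
quicksight = boto3.client('quicksight', region_name='YOUR_REGION')

# 1. Upload a test file to a monitored bucket to fire the trigger
s3.put_object(Bucket='bucket1', Key='test/trigger.txt', Body=b'test')

# 2. After the Lambda has run, inspect the most recent ingestions
response = quicksight.list_ingestions(
    DataSetId='dataset_id1',
    AwsAccountId='YOUR_ACCOUNT_ID'
)
for ingestion in response['Ingestions']:
    print(ingestion['IngestionId'], ingestion['IngestionStatus'])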

CSV Template

Here is the CSV template content for the access control file:

bucket_arn,dataset_id
arn:aws:s3:::bucket1,dataset_id1
arn:aws:s3:::bucket2,dataset_id2

Method 2: Using a CSV File for Access Control to Refresh Datasets in QuickSight

Overview

The Lambda function will:

  1. Be triggered by an external tool via an API call.
  2. Check access permissions using an access control CSV file stored in S3.
  3. Trigger a dataset refresh in QuickSight if access is granted.

Prerequisites

  1. AWS account with appropriate permissions.
  2. S3 bucket to store the access control CSV file.
  3. QuickSight account and datasets.

Step-by-Step Instructions

Step 1: Create the IAM Role for the External Tool
  1. Go to the IAM Console in the External Account:

    • Open the AWS Management Console.
    • Navigate to the IAM console.
  2. Create a New Role:

    • Click on “Roles” in the left sidebar, then click “Create role.”
    • Choose “Another AWS account” as the trusted entity.
    • Enter the AWS account ID of the account where the Lambda function resides.
  3. Set Permissions for the Role:

    • Attach a policy to allow invoking the Lambda function. You can create a custom policy like this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": "arn:aws:lambda:YOUR_REGION:YOUR_ACCOUNT_ID:function:YOUR_LAMBDA_FUNCTION_NAME"
        }
    ]
}
  • Replace YOUR_REGION, YOUR_ACCOUNT_ID, and YOUR_LAMBDA_FUNCTION_NAME with your specific details.
  • Click “Next: Tags,” then “Next: Review,” provide a role name (e.g., ExternalInvokeRole), and click “Create role.”
Step 2: Update the Trust Relationship of the Existing Lambda Execution Role
  1. Go to the IAM Console in Your Account:

    • Open the AWS Management Console.
    • Navigate to the IAM console.
  2. Find the Existing Lambda Execution Role:

    • Click on “Roles” in the left sidebar.
    • Find and select the IAM role that your Lambda function is using.
  3. Edit the Trust Relationship:

    • In the “Trust relationships” tab, click “Edit trust relationship.”
    • Update the trust relationship to allow the external account’s role to assume this role. The trust relationship should look like this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::EXTERNAL_ACCOUNT_ID:role/ExternalInvokeRole"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
Step 3: Ensure the Existing Lambda Execution Role Has Necessary Permissions

Ensure the existing Lambda execution role has the necessary permissions to read from S3 and manage QuickSight datasets. Here is a custom policy you can attach if not already attached:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::YOUR_S3_BUCKET/path/to/access_control.csv"
        },
        {
            "Effect": "Allow",
            "Action": [
                "quicksight:CreateIngestion",
                "quicksight:DescribeDataSet",
                "quicksight:ListDataSets"
            ],
            "Resource": "*"
        }
    ]
}

Replace YOUR_S3_BUCKET with your specific bucket name.

Step 4: Update the External Tool

Ensure the external tool sends the dataset_id and the ARN of the calling role in the event payload when triggering the Lambda function.

Example Event Payload:
{
    "dataset_id": "YOUR_DATASET_ID",
    "caller_role": "arn:aws:iam::EXTERNAL_ACCOUNT_ID:role/ExternalInvokeRole"
}
Step 5: Deploy and Test
  1. Update the Lambda function with the code below, which handles the dynamic input and verifies the caller role and dataset ID against the CSV.
  2. Upload the CSV file to the specified S3 bucket and key.
  3. Test the Lambda function by triggering it with an event payload that includes dataset_id and caller_role (see the invocation sketch below).
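As a sketch of the external tool's side (using the placeholder role and function names from the earlier steps): assume ExternalInvokeRole, then invoke the function with the Step 4 payload. Depending on your setup, the Lambda function may also need a resource-based policy that permits cross-account invocation:

import json
import boto3

# Assume the role created in Step 1 (run from the external account)
sts = boto3.client('sts')
creds = sts.assume_role(
    RoleArn='arn:aws:iam::EXTERNAL_ACCOUNT_ID:role/ExternalInvokeRole',
    RoleSessionName='dataset-refresh'
)['Credentials']

# Invoke the Lambda function with the temporary credentials
lambda_client = boto3.client(
    'lambda',
    region_name='YOUR_REGION',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken']
)

payload = {
    'dataset_id': 'YOUR_DATASET_ID',
    'caller_role': 'arn:aws:iam::EXTERNAL_ACCOUNT_ID:role/ExternalInvokeRole'
}
response = lambda_client.invoke(
    FunctionName='YOUR_LAMBDA_FUNCTION_NAME',
    InvocationType='RequestResponse',
    Payload=json.dumps(payload).encode('utf-8')
)
print(json.loads(response['Payload'].read()))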

Lambda Function Code

Here is the Lambda function code for Method 2:

import boto3
import csv
import datetime
import json
import logging
import io

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Initialize clients
s3_client = boto3.client('s3')
quicksight_client = boto3.client('quicksight', region_name='YOUR_REGION')

# S3 bucket and key for the access control file
S3_BUCKET = 'YOUR_S3_BUCKET'
S3_KEY = 'path/to/access_control.csv'

def is_role_authorized(role_arn, dataset_id):
    try:
        # Retrieve the access control file from S3
        response = s3_client.get_object(Bucket=S3_BUCKET, Key=S3_KEY)
        content = response['Body'].read().decode('utf-8')

        # Read the CSV content
        csv_reader = csv.reader(io.StringIO(content))

        # Skip the header row
        next(csv_reader)

        # Check whether this role is authorized for this dataset
        for row in csv_reader:
            if row[0] == role_arn and row[1] == dataset_id:
                return True
        return False
    except Exception as e:
        logger.error(f'Error checking access for role {role_arn}: {e}')
        raise

def lambda_handler(event, context):
    try:
        # Extract the dataset ID and calling role from the event payload
        dataset_id = event['dataset_id']
        caller_role = event['caller_role']

        # Check that the caller role is authorized for this dataset
        if not is_role_authorized(caller_role, dataset_id):
            raise PermissionError(f'Role {caller_role} does not have access to dataset {dataset_id}')

        # AWS account ID
        account_id = context.invoked_function_arn.split(':')[4]

        # Create a unique ingestion ID based on the current timestamp
        ingestion_id = f"ingestion-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"

        # Log the caller role, dataset ID, and ingestion ID
        logger.info(f'Caller role: {caller_role}')
        logger.info(f'Dataset ID: {dataset_id}')
        logger.info(f'Ingestion ID: {ingestion_id}')

        # Call the CreateIngestion API
        response = quicksight_client.create_ingestion(
            DataSetId=dataset_id,
            IngestionId=ingestion_id,
            AwsAccountId=account_id
        )

        # Log the response
        logger.info(f'Response: {response}')

        return {
            'statusCode': 200,
            'body': json.dumps(response, default=str)  # response contains datetimes; stringify for JSON
        }
    except PermissionError as pe:
        logger.error(f'Permission Error: {pe}')
        return {
            'statusCode': 403,
            'body': str(pe)
        }
    except Exception as e:
        logger.error(f'Error: {e}')
        return {
            'statusCode': 500,
            'body': str(e)
        }

CSV Template

Here is the CSV template content for the access control file:

role,dataset_id
arn:aws:iam::123456789012:role/RoleA,dataset_id1
arn:aws:iam::123456789012:role/RoleB,dataset_id2

Hi @pagekevi - Thanks for the details. Can you write a blog with some diagrams and publish it on the AWS QuickSight page? These are very useful details and should be published as a blog. Kristin can help you if you need more details around it.

Hi @Kristin - Can you please help on this.

Regards - Sanjeeb

This has been a long-pending requirement. Thanks a lot, Kevin, for taking up this task and explaining the steps in so much detail. Loved it!

Thanks Sanjeeb! I will work with @mytree on this as I am working on another blog post as well.
