Creating dashboard for data reporting using data from external APIs

Hi QuickSight Community,

Could someone help me with displaying data from 3 different external APIs?

Here is an example of exactly what I am trying to build:

How should I structure all the data coming from the external APIs? Do I need to use S3, Lambda, RDS, Glue, Redshift, Athena… I'm a bit lost with all those services and not sure how to create a dashboard within a full stack web app like the example above! Any help please?

Many thanks in advance for your help

Hi @asli,
There's a decent amount going on with the linked page. If you're looking for more information on embedding your dashboard or specific visuals into a website, you can find details about embedded analytics here.
Additionally, here is the Boto3 QuickSight SDK documentation for development and API calls.
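
For reference, here is a minimal sketch of generating a dashboard embed URL with the Boto3 QuickSight client. The account ID, dashboard ID and user ARN are placeholders to replace with your own values, and this assumes the user is already registered in QuickSight:

import boto3

# Placeholder identifiers; replace with your own account, dashboard and user
ACCOUNT_ID = "111122223333"
DASHBOARD_ID = "your-dashboard-id"
USER_ARN = "arn:aws:quicksight:us-east-1:111122223333:user/default/your-user"

quicksight = boto3.client("quicksight")

# Request a one-time embed URL for the dashboard experience
response = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId=ACCOUNT_ID,
    SessionLifetimeInMinutes=60,
    UserArn=USER_ARN,
    ExperienceConfiguration={
        "Dashboard": {"InitialDashboardId": DASHBOARD_ID}
    },
)

print(response["EmbedUrl"])  # URL to load in an iframe on your site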

Do you have QuickSight dashboards that you'd like to embed? Do you already have datasets or data connections in QuickSight? Overall, which part of the process are you currently on, so that we can assist further?

Thank you!

Hi @asli,

I think you can use Lambda to call the APIs, save the responses in S3, bring the data into Athena, and create an Athena data source in QuickSight.

This blog can give you an idea of the workflow. Yours would just not involve CloudTrail and the QuickSight APIs.
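
To make the Athena piece concrete, here is a rough sketch of registering the S3 data as an Athena table from Python. The database, table, bucket, columns and output location are illustrative assumptions, not your actual schema; adjust them to match the CSV your Lambda writes:

import boto3

athena = boto3.client("athena")

# Illustrative names; adjust the database, bucket, columns and result location
DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS energy_db.generation_mix (
  reading_time string,
  gas double,
  coal double,
  wind double,
  solar double
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://your-bucket/carbonintensitydata/'
TBLPROPERTIES ('skip.header.line.count' = '1')
"""

# Run the DDL in Athena; the table then appears as a QuickSight data source option
athena.start_query_execution(
    QueryString=DDL,
    QueryExecutionContext={"Database": "energy_db"},
    ResultConfiguration={"OutputLocation": "s3://your-bucket/athena-query-results/"},
)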

Hi @asli - Thanks for your question. As QuickSight does not have direct support for reading data from a REST API, you have to persist the data somewhere before doing the reporting. I agree with the solution approach provided by @David_Wong; you can follow the steps below.

  1. Create 1 or 3 Lambda functions (depending on your requirement) to extract the data from the external APIs. If the data needs some massaging or minor transformation, you can use Python to make it structured. Ensure the whole process completes within 15 minutes, since that is the Lambda timeout. Persist the data to S3 (ideally partitioned and in Parquet format), create an Athena table over it, and add the new partition on each run. If you run in near real time you may end up with too many small files, so you will need a separate process to handle the small-file problem.
  2. Schedule the Lambda with EventBridge, using a rate (for example every 5 minutes) or a cron expression (see the sketch after this list).
  3. Once the pipeline is in place, create your QuickSight data source, use direct query, and build the analysis and dashboard.
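
For step 2, a minimal sketch of wiring up the schedule with the Boto3 EventBridge client. The rule name and Lambda ARN are placeholders, and the Lambda also needs a resource-based permission allowing EventBridge to invoke it (omitted here):

import boto3

events = boto3.client("events")

# Placeholder names; replace with your own rule name and Lambda ARN
RULE_NAME = "fetch-external-apis"
LAMBDA_ARN = "arn:aws:lambda:eu-west-2:111122223333:function:fetch-external-apis"

# Create (or update) a scheduled rule; adjust the rate to however often
# the source data updates, or use a cron expression instead
events.put_rule(
    Name=RULE_NAME,
    ScheduleExpression="rate(30 minutes)",
    State="ENABLED",
)

# Point the rule at the Lambda function
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{"Id": "fetch-lambda", "Arn": LAMBDA_ARN}],
)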

The overall architecture looks like the diagram below.

Regards - Sanjeeb

Thanks @Brett. For now, I am still stuck on fetching and storing the data. I was previously advised to use RDS, but I failed to get a Lambda function to pull the data from the external APIs and store it in RDS. However, I was able to use a Lambda function to store the data in S3, BUT the data is just not organized, so I might need to review my Python code (see below):

import json
import requests
import boto3
from botocore.exceptions import NoCredentialsError

# Define the API URL and S3 bucket details
api_url = "https://api.nationalgrideso.com/dataset/88313ae5-94e4-4ddc-a790-593554d8c6b9/resource/f93d1835-75bc-43e5-84ad-12472b180a98/download/df_fuel_ckan.csv"
s3_bucket_name = "natioanlgridapi"
s3_bucket_folder = "carbonintensitydata/"
s3_object_name = "df_fuel_ckan.csv"

def fetch_data_from_api(url):
    """Fetch data from the provided API URL."""
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an HTTPError for bad responses
        return response.content
    except requests.exceptions.RequestException as e:
        print(f"Error fetching data from API: {e}")
        raise

def upload_to_s3(data, bucket_name, object_name):
    """Upload data to an S3 bucket."""
    s3 = boto3.client('s3')
    try:
        s3.put_object(Bucket=bucket_name, Key=object_name, Body=data)
        print(f"File uploaded successfully to {bucket_name}/{object_name}")
    except NoCredentialsError as e:
        print(f"Error with AWS credentials: {e}")
        raise
    except Exception as e:
        print(f"Error uploading file to S3: {e}")
        raise

def lambda_handler(event, context):
    """Lambda function handler to fetch data from API and upload to S3."""
    try:
        # Fetch data from API
        data = fetch_data_from_api(api_url)

        # Define the full S3 object path
        s3_object_path = f"{s3_bucket_folder}{s3_object_name}"

        # Upload data to S3
        upload_to_s3(data, s3_bucket_name, s3_object_path)

        return {
            'statusCode': 200,
            'body': json.dumps('Data fetched and uploaded successfully!')
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps(f"An error occurred: {e}")
        }
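
One way to keep the S3 data better organized, and ready for the partitioned Athena table suggested above, is to write each run to a date-partitioned key instead of overwriting the same object. A minimal sketch, reusing the folder and file-name variables from the code above; the dt=/hour= layout is just one possible convention:

from datetime import datetime, timezone

def build_partitioned_key(folder, file_name):
    """Build a key like carbonintensitydata/dt=2024-01-31/hour=14/df_fuel_ckan.csv."""
    now = datetime.now(timezone.utc)
    return f"{folder}dt={now:%Y-%m-%d}/hour={now:%H}/{file_name}"

# In lambda_handler, replace the fixed s3_object_path with, for example:
# s3_object_path = build_partitioned_key(s3_bucket_folder, s3_object_name)

After each run you would then register the new partition in Athena, for example with ALTER TABLE ... ADD PARTITION or MSCK REPAIR TABLE.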

Thanks @David_Wong for the link, much appreciated.
Is there a possibility to use RDS instead of S3? What is the main difference, actually?
I was previously advised to use relational database storage, and now I'm not sure which option is technically better.

Thanks @Sanjeeb2022, the diagram looks great!
The data is actually updated EVERY 30 minutes, and I want my dashboard to update automatically accordingly. Anything to add?

I will try to build this today and let you know if I'm struggling somewhere. Could S3 be replaced by RDS, or is there no need for that?

Ultimately, the final result should look like the Energy Dashboard link I posted in my first message.

Here are two of the 3 APIs as examples:

7 Day Ahead Demand Forecast:

https://www.nationalgrideso.com/data-portal/7-day-ahead-national-forecast

Historic generation mix and carbon intensity:

https://www.nationalgrideso.com/data-portal/historic-generation-mix

Hi @asli - You can use SPICE instead of direct query since your refresh schedule is every 30 minutes. This will improve performance and save cost on the Athena side. If you opt for RDS, that is purely down to your requirements; it adds cost and requires an additional data load. Looking at your use case, I believe RDS is not required at all. The simple and easy solution is to use S3 for storage, create Athena tables (with partitions), and then use QuickSight with SPICE for reporting. A sketch of scheduling the SPICE refresh follows below.
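
If you go the SPICE route, the refresh can also be scheduled programmatically. A rough sketch with the Boto3 QuickSight client, assuming the dataset is already imported into SPICE; the account ID, dataset ID and interval are placeholders to adapt, and sub-hourly intervals require incremental refresh (which also needs a lookback window configured on the dataset):

import boto3

quicksight = boto3.client("quicksight")

# Placeholder identifiers; replace with your own account and dataset
ACCOUNT_ID = "111122223333"
DATASET_ID = "your-spice-dataset-id"

# Schedule a refresh roughly every 30 minutes; for a simpler setup you can
# use Interval "HOURLY" with RefreshType "FULL_REFRESH" instead
quicksight.create_refresh_schedule(
    AwsAccountId=ACCOUNT_ID,
    DataSetId=DATASET_ID,
    Schedule={
        "ScheduleId": "every-30-minutes",
        "ScheduleFrequency": {"Interval": "MINUTE30"},
        "RefreshType": "INCREMENTAL_REFRESH",
    },
)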

Regards - Sanjeeb