I upload Excel files as my dataset, but I will soon hit the Excel row limit. I tried splitting and uploading the files, but I only see a join/merge option, not an option to append them. Is there any way to upload two files and append them? Or maybe a better solution altogether?
I also tried adding a helper column to all files and merging, but the second data file gets added on the right instead of after the last row.
Hi @paruanil, it sounds like you’re hitting the Excel row limit and looking for a way to efficiently manage and append large datasets in Amazon QuickSight.
Here’s a detailed approach to address this issue:
Approach: Converting Excel Files to CSV and Using S3 with Amazon Athena
Step 1: Convert Excel Files to CSV
- Convert Each Excel File to CSV:
  - Open each Excel file and save it as a CSV file. This can be done manually, or via a script if you have many files (see the sketch below).
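If you have many files, a short script can do the conversion in bulk. Here is a rough sketch using pandas; the folder path is a placeholder, and it assumes .xlsx files with the data on the first sheet (openpyxl is needed for .xlsx support):

Pseudocode (Syntax may vary)
import glob
import os

import pandas as pd

# Convert every .xlsx file in a folder to CSV.
# Requires: pip install pandas openpyxl
for xlsx_path in glob.glob('path/to/excel/files/*.xlsx'):
    df = pd.read_excel(xlsx_path)  # reads the first sheet by default
    csv_path = os.path.splitext(xlsx_path)[0] + '.csv'
    df.to_csv(csv_path, index=False)  # write without the pandas index column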
Step 2: Upload CSV Files to S3
- Create an S3 Bucket:
  - Create an S3 bucket (if you don’t have one already) to store your CSV files.
- Upload CSV Files:
  - Upload all the converted CSV files to the S3 bucket, either through the S3 console or with a script (see the sketch below). You can organize them in folders if needed.
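If you prefer to script the upload rather than use the S3 console, here is a rough sketch using boto3; the bucket name, prefix, and local folder path are placeholders, and it assumes your AWS credentials are already configured:

Pseudocode (Syntax may vary)
import glob
import os

import boto3

s3 = boto3.client('s3')
bucket = 'your-bucket'          # placeholder bucket name
prefix = 'path-to-csv-files/'   # keep all files under one prefix so Athena can read them together

for csv_path in glob.glob('path/to/csv/files/*.csv'):
    key = prefix + os.path.basename(csv_path)
    s3.upload_file(csv_path, bucket, key)  # uploads to s3://your-bucket/path-to-csv-files/<file>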
Step 3: Create an Athena Table
- Configure Amazon Athena:
  - Go to the Athena console.
  - Create a new database if you don’t have one.
- Create an Athena Table for CSV Files:
  - Write a DDL statement to create a table in Athena that reads from the S3 bucket. Here’s an example:
Pseudocode (Syntax may vary)
CREATE EXTERNAL TABLE IF NOT EXISTS my_dataset (
  column1 STRING,
  column2 STRING,
  column3 STRING,
  ...
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
)
LOCATION 's3://your-bucket/path-to-csv-files/'
TBLPROPERTIES (
  'has_encrypted_data' = 'false',
  'skip.header.line.count' = '1'  -- assumes each CSV has a header row; drop this if not
);
  - Adjust the column definitions to match your data schema.
- Partitioning (Optional):
  - If your data is large and you want to improve query performance, consider partitioning the Athena table by a logical key, such as date (a rough sketch follows this list).
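As a rough sketch of what partitioning could look like: store each period’s CSV files under a date-named folder (for example s3://your-bucket/path-to-csv-files/dt=2024-01-01/), declare the table with PARTITIONED BY (dt string), and register each new folder as a partition. The snippet below runs the ALTER TABLE statement through the Athena API with boto3; the bucket, database, result location, and date value are all placeholders:

Pseudocode (Syntax may vary)
import boto3

athena = boto3.client('athena')

# Register a newly uploaded folder as a partition of the table.
athena.start_query_execution(
    QueryString=(
        "ALTER TABLE my_dataset ADD IF NOT EXISTS "
        "PARTITION (dt='2024-01-01') "
        "LOCATION 's3://your-bucket/path-to-csv-files/dt=2024-01-01/'"
    ),
    QueryExecutionContext={'Database': 'your_database'},
    ResultConfiguration={'OutputLocation': 's3://your-bucket/athena-query-results/'},
)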
Step 4: Use Athena Table in QuickSight
- Connect QuickSight to Athena:
  - In QuickSight, create a new dataset.
  - Choose Athena as the data source and select the database and table you created.
- SPICE (Optional):
  - Import the dataset into SPICE for faster dashboard performance and schedule refreshes so newly uploaded data is picked up, or use direct query if you prefer to query Athena live.
Benefits of This Approach
- Scalability: You can manage and query large datasets without hitting row limits.
- Automation: New CSV files added to the S3 bucket are picked up the next time Athena queries the table, and flow through to QuickSight (after a SPICE refresh, if you use SPICE).
- Flexibility: You can append data by simply uploading new CSV files to the S3 bucket.
Alternative Simple Approach
If you prefer a simpler method and don’t need the scalability of Athena, you can directly append CSV files:
- Combine CSV Files Locally:
  - Use a script to combine your CSV files locally into a single large CSV file.
- Upload to QuickSight:
  - Upload the combined CSV file directly to QuickSight.
Example Script to Combine CSV Files (Python)
Pseudocode (Syntax may vary)
import glob

import pandas as pd

# Get the list of CSV files (sorted for a deterministic order)
csv_files = sorted(glob.glob('path/to/csv/files/*.csv'))

# Read and append all files, resetting the row index so rows are numbered continuously
combined_csv = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)

# Export to a single CSV file
combined_csv.to_csv('path/to/output/combined.csv', index=False)
By following these approaches, you can efficiently manage large datasets and ensure smooth operation within Amazon QuickSight.
Did this solution work for you? I am marking this reply as “Solution,” but let us know if this is not resolved. Thanks for posting your question on the QuickSight Community!
Hi,
I’m sort of new to all this. How do I check if I have access to S3, Athena, etc.? Where do I start with them?