Best way to work with multiple-sheet excel file?

risk33 · July 1, 2022, 4:41pm

I have access in AWS Glue Databrew, AWS S3 and AWS Athena. Is there a best way to work with multiple sheet excel file using these services?

Scenario:

I have an excel file that contains 100+ sheets.
The sheets are named per date.

Example: Sheet 1 is named as 1.1.2022, Sheet 2 is named 1.2.2022, etc.

This is an example of what is contained per sheet. The date is not to be seen in the table except the name of the sheet. The ‘Periods’ basically just tell the duration each person spent on a certain period.

Name	ID	Period1	Period2	Period 3
Adam	510	05:11:20	03:10:33	07:19:58
Ben	205	04:00:00	02:02:02	00:25:68

I am lost and have no idea how I could possibly combine all of these sheets together to form a single table so I could query them per date in Athena and visualize it on Quicksight.

I have checked a few tutorial videos about partitioning the data in S3 but I am unsure how to proceed. Please help. Thank you.

SeanBoon · July 19, 2022, 10:27pm

Hi,

Thanks for reaching out. One recommendation that would not require you to individually load the files and combine them which would understandably be pretty tedious, would be to do this in Python and Pandas. read_excel() method in Pandas can return a dictionary of data frames, one DF per sheet. Then just need to concat() into one DataFrame and write to the file type of your choosing. You can then upload that single file.

Sean

risk33 · July 20, 2022, 4:29pm

Thank you! I just remembered I posted this question here. I finally was able to do this on AWS Glue 2.0 with python and panda > then to S3 > to Quicksight.