My data, about 230 columns wide, arrives as files posted to S3. Glue of course provides the initial schema, and I have to modify it to meet my needs. My challenge is that new files often have a column added or removed. If I change my schema to match the new files, the old files no longer line up correctly. Creating a new Glue crawler and S3 folder each time the columns change is a heavy price to pay, so I'm hoping someone has a better solution. Any ideas?
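The misalignment described above typically happens when columns are matched by position. One way around it is to match them by header name instead. The sketch below assumes the files are CSVs with a header row and that you keep one fixed target column list for the Glue table. `TARGET_COLUMNS` and `normalize_csv` are hypothetical names. Each incoming file is rewritten to that column order: columns a file lacks are left empty, and columns the schema does not know about are dropped. How this step would be wired into S3 (for example, run on upload before the crawler sees the file) is an assumption and is not shown.

```python
import csv
import io

# Hypothetical target schema: the fixed column list the Glue table is defined with.
# In practice this would be the full ~230-column list.
TARGET_COLUMNS = ["id", "name", "amount", "region"]


def normalize_csv(text: str, target_columns: list[str]) -> str:
    """Re-emit a CSV so its columns match target_columns by header name.

    Columns missing from the input are written as empty values;
    columns not in target_columns are dropped.
    """
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=target_columns,
        extrasaction="ignore",  # drop columns the target schema doesn't know about
        restval="",             # fill columns this file doesn't have
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    old_file = "id,name,amount\n1,a,10\n"                  # older file: no "region"
    new_file = "id,region,name,amount,extra\n2,EU,b,20,x\n"  # newer file: reordered, extra column
    print(normalize_csv(old_file, TARGET_COLUMNS), end="")
    print(normalize_csv(new_file, TARGET_COLUMNS), end="")
```

With this approach, both old and new files end up with the same header and column order. A single table definition can then cover all of them. When a column is genuinely new, you add it once to the target list rather than creating a new crawler.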