My dataset is based on a CSV file that I uploaded to S3. The CSV file is not final yet, and as soon as I add a new column to it, I can no longer refresh the dataset. I then have to start over: create a new dataset and, worse, rebuild the analysis from scratch.
How can I solve this? It bothers me immensely. I read somewhere that I should first import the CSV into Redshift and that it would then work. Is that the only solution?
Example CSV:
name;date;balance
JOHN DOE;2022-01-01;100
JOE SIXPACK;2022-01-01;500
My JSON manifest file:
{
    "fileLocations": [
        {
            "URIs": [
                "s3://bucket-name/probie.csv"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "CSV",
        "delimiter": ";",
        "containsHeader": "true"
    }
}
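In case it helps anyone reproduce this, here is a minimal sketch of how the same manifest could be generated with Python's standard library instead of being written by hand. The helper name build_manifest is made up for illustration; the keys and values are simply the ones from my manifest above.

```python
import json


def build_manifest(uris, delimiter=";", contains_header=True):
    """Build a manifest dict with the same keys as the hand-written one.

    build_manifest is a hypothetical helper, not part of any AWS SDK.
    """
    return {
        "fileLocations": [{"URIs": list(uris)}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": delimiter,
            # The manifest above stores this flag as a string, so do the same.
            "containsHeader": "true" if contains_header else "false",
        },
    }


manifest = build_manifest(["s3://bucket-name/probie.csv"])
print(json.dumps(manifest, indent=4))
```

The manifest only points at the file and describes its format. It does not list the columns, so nothing in it changes when I add a column to the CSV.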
After creating the analysis with all the necessary graphs and data, I noticed that I needed some additional information, so I created a new CSV with an additional column:
name;date;balance;new
JOHN DOE;2022-01-01;100;20
JOE SIXPACK;2022-01-01;500;30
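To make the change between the two files concrete, here is a small sketch (Python, standard library only) that compares the header rows of the old and new CSV. The function names read_header and added_columns are my own and purely illustrative; the data is the two samples from this post, embedded as strings.

```python
import csv
import io

# The two sample files from this post, embedded as strings.
OLD_CSV = """name;date;balance
JOHN DOE;2022-01-01;100
JOE SIXPACK;2022-01-01;500
"""

NEW_CSV = """name;date;balance;new
JOHN DOE;2022-01-01;100;20
JOE SIXPACK;2022-01-01;500;30
"""


def read_header(text, delimiter=";"):
    """Return the column names from the first row of a CSV string."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader)


def added_columns(old_header, new_header):
    """Return the columns present in new_header but not in old_header."""
    return [col for col in new_header if col not in old_header]


old_header = read_header(OLD_CSV)
new_header = read_header(NEW_CSV)

print(added_columns(old_header, new_header))         # ['new']
# True when the existing columns keep their names and order.
print(new_header[:len(old_header)] == old_header)    # True
```

So the only difference is one column appended at the end; the existing columns keep their names and their order.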
Because of this extra column, I have to start over with a new dataset and, most importantly, recreate the analysis. That can take several days of work.
Is there really no better solution?
Ivan