Refresh a CSV dataset

Ivan_Eulaers · July 7, 2022, 8:22am

My dataset is a CSV file that I uploaded to S3. The file is not yet final, and as soon as I add a new column to it, I can no longer refresh the dataset. I then have to start over: create a new dataset and, worse, rebuild the analysis.

How can I solve this? This bothers me immensely. I read somewhere that I should first import the CSV into Redshift and then it would work. Is this the only solution?

Example

name;date;balance
JOHN DOE;2022-01-01;100
JOE SIXPACK;2022-01-01;500

My JSON manifest file:

{
    "fileLocations": [
        {
            "URIs": [
                "s3://bucket-name/probie.csv"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "CSV",
        "delimiter": ";",
        "containsHeader": "true"
    }
}
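
For reference, a manifest like the one above could also be generated from a script instead of being written by hand. This is only a sketch: `build_manifest` is a hypothetical helper name, and the field names and values are copied from the manifest shown above.

```python
import json


def build_manifest(uris, delimiter=";", contains_header=True):
    """Build a manifest dict with the same fields as the hand-written one.

    Hypothetical helper: the keys mirror the manifest in this post.
    """
    return {
        "fileLocations": [{"URIs": list(uris)}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": delimiter,
            # The manifest in this post stores the flag as a string.
            "containsHeader": "true" if contains_header else "false",
        },
    }


manifest = build_manifest(["s3://bucket-name/probie.csv"])
print(json.dumps(manifest, indent=4))
```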

After creating the analysis with all the necessary graphs and data, I noticed that I needed some additional information, so I created a new CSV with an extra column.

name;date;balance;new
JOHN DOE;2022-01-01;100;20
JOE SIXPACK;2022-01-01;500;30
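
To make the schema change between the two files explicit, the header rows can be compared with a small script. This is a minimal sketch using only the Python standard library. `added_columns` is a hypothetical helper, it assumes both files use `;` as the delimiter and have a header row, and it only reports the difference; it does not touch the dataset.

```python
import csv
import io


def added_columns(old_csv, new_csv, delimiter=";"):
    """Return header names that appear in new_csv but not in old_csv."""
    old_header = next(csv.reader(io.StringIO(old_csv), delimiter=delimiter))
    new_header = next(csv.reader(io.StringIO(new_csv), delimiter=delimiter))
    return [name for name in new_header if name not in old_header]


old = (
    "name;date;balance\n"
    "JOHN DOE;2022-01-01;100\n"
    "JOE SIXPACK;2022-01-01;500\n"
)
new = (
    "name;date;balance;new\n"
    "JOHN DOE;2022-01-01;100;20\n"
    "JOE SIXPACK;2022-01-01;500;30\n"
)

print(added_columns(old, new))  # ['new']
```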

Because of this extra column, I have to start over with a new dataset and, most importantly, recreate the analysis. That is sometimes several days of work.

Is there really no better solution?

Ivan

Koushik_Muthanna · July 7, 2022, 3:15pm

Hi Ivan,

I posted a solution that you can test when the schema of your file changes: Dataset from an S3 folder, cannot add new columns on latest CSV file - #7 by Koushik_Muthanna

Regards,
Koushik