I have tested an S3 manifest two ways. The first way uses URIs only, pointing the manifest file directly at a single CSV file.
{
  "fileLocations": [
    {
      "URIs": [
        "https://work.s3.amazonaws.com/file.csv"
      ]
    }
  ],
  "globalUploadSettings": {
    "textqualifier": "\"",
    "containsHeader": true
  }
}
The second way uses URIPrefixes (see below), since I will have multiple CSV files that I want included in the Quick Sight dataset. With this method, Quick Sight ingests data that does not exist in the CSV. For example, with no CSV files in the S3 bucket at all, Quick Sight still ingests about 24 rows of data. With the first method above, Quick Sight ingests only the rows in the CSV file(s). When there is a CSV file in S3 and we use the method below, it ingests null rows that do not exist in the CSV file.
{
  "fileLocations": [
    {
      "URIPrefixes": [
        "https://s3-us-east-1.amazonaws.com/work/"
      ]
    }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ",",
    "containsHeader": true
  }
}
Where are these null records coming from if not from the CSV file? Is it possible they are picking up some raw or hidden data inside of S3?
Hi @jasanderson - Welcome to the AWS Quick Sight community, and thanks for posting the question. If you have multiple CSV files in a particular path along with other files, the best approach is the first option: list each file as a separate entry in the array. The second option is good when only CSV files are present under the particular prefix.
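As a rough illustration of the first option with several files, here is a minimal Python sketch that builds a manifest listing each CSV explicitly under "URIs". The helper name `build_uri_manifest` and the file names `file1.csv` and `file2.csv` are hypothetical; the bucket name `work` and the URL style are taken from the manifest earlier in the thread.

```python
import json


def build_uri_manifest(bucket, keys):
    """Return a manifest dict that lists each CSV object explicitly under "URIs"."""
    # Same URL style as the single-file manifest earlier in the thread.
    uris = [f"https://{bucket}.s3.amazonaws.com/{key}" for key in keys]
    return {
        "fileLocations": [{"URIs": uris}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": "\"",
            "containsHeader": True,
        },
    }


# Hypothetical file names, for illustration only.
manifest = build_uri_manifest("work", ["file1.csv", "file2.csv"])
print(json.dumps(manifest, indent=2))
```

Because every file is named explicitly, nothing else stored under the same path is pulled in. The trade-off is that the manifest has to be regenerated whenever a new CSV arrives.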
Hi @WLS-DM - Do we know whether wildcard matching can be used in manifest files when uploading data from S3 to Quick Sight?
Regards - Sanjeeb
The only files we had in the root of the S3 bucket were the manifest and a single CSV file.
If you point the manifest to the root of the bucket, Quick Sight ingests null values into the dataset whether or not a CSV file is present. We tested this by deleting the CSV so that the only object in the bucket was the manifest itself.
So we created a subfolder in the S3 bucket for the CSV files and pointed the manifest to the subfolder. This solved the null-value issue, so apparently the manifest picks up something from the root of the bucket (even when no file there matches the format set in globalUploadSettings).
Strange behavior, but the subfolder resolved the issue.
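A minimal Python sketch of the subfolder workaround, under stated assumptions: the subfolder name `csv/` and the object name `manifest.json` are hypothetical, and the helper `keys_matching_prefix` only mimics how an S3 prefix selects keys (plain string-prefix matching). One possible explanation, not confirmed anywhere in this thread, is that a root-level prefix also matches the manifest object itself.

```python
import json


def keys_matching_prefix(keys, prefix):
    """Return the object keys that start with the given prefix."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical bucket contents after moving the CSV into a subfolder.
bucket_keys = ["manifest.json", "csv/file.csv"]

root_matches = keys_matching_prefix(bucket_keys, "")       # prefix at bucket root
subfolder_matches = keys_matching_prefix(bucket_keys, "csv/")

print("root prefix matches:", root_matches)
print("subfolder prefix matches:", subfolder_matches)

# Manifest pointing at the (hypothetical) subfolder instead of the bucket root.
manifest = {
    "fileLocations": [
        {"URIPrefixes": ["https://s3-us-east-1.amazonaws.com/work/csv/"]}
    ],
    "globalUploadSettings": {
        "format": "CSV",
        "delimiter": ",",
        "containsHeader": True,
    },
}
print(json.dumps(manifest, indent=2))
```

With the prefix narrowed to the subfolder, only the CSV key is selected, which is consistent with the null rows disappearing once the manifest stopped pointing at the bucket root.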
Hi @jasanderson - Thanks for highlighting the issue. As I understand it, NULL values are added to the data when the file sits in the root of the S3 bucket, but the same issue does not happen when the file is placed in a subfolder. This is interesting, and we can raise it with the Quick Sight team as an enhancement request.
Hi @WLS-DM @WLS-D - Can you please help with this?
Regards - Sanjeeb