Hello,
I am importing a relatively large (around 400 MB) JSON file (DynamoDB-JSON format) into QuickSight as a dataset, either by uploading it directly (“Upload a file”) or by specifying a manifest file on S3 that points to the data file on S3.
Some columns/database fields only start to appear in roughly the last third of that data file, because these fields were only recently added to the database the data originally comes from.
QuickSight successfully imports all lines from the JSON data file. I checked this by comparing the number of successfully imported rows (no skipped rows) with the number of lines in the JSON file, and they match.
However, in the resulting dataset, the newer columns/fields are not displayed anywhere, neither in the field list nor in the sample data. There are no excluded fields, so all fields ought to be displayed. Also, when I create a small subset of the imported JSON file (e.g. with only a handful of data records, some of which contain those newer columns), the newer columns are displayed in the QuickSight dataset.
Why are those newer columns not displayed at all in the dataset for the whole JSON file? Please note that this is not a refresh of an existing dataset; I am creating a new dataset based on the large JSON file.
Could it be that QuickSight only scans the first part of that large JSON file to determine the data structure, and since the newer columns/fields likely do not appear within that first part, QuickSight ignores them when it encounters them while importing the remaining data from the file? If so, how can I make those columns/fields available in the dataset?
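To check how far into the file each field first shows up, a small script along these lines can help. It is only a sketch: it assumes the file holds one JSON record per line and that each record may be wrapped in an "Item" key, as in DynamoDB exports. The function names `record_fields` and `first_seen` are made up for this example, and nothing here relies on knowing how QuickSight actually infers the schema.

```python
import json


def record_fields(line):
    """Return the top-level field names of one JSON record."""
    record = json.loads(line)
    # Assumption: DynamoDB-JSON exports wrap each record in a single "Item" key.
    if isinstance(record, dict) and set(record) == {"Item"}:
        record = record["Item"]
    return list(record)


def first_seen(lines):
    """Map each field name to the 1-based line number where it first appears."""
    seen = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        for field in record_fields(line):
            seen.setdefault(field, number)
    return seen


# Usage (the path is a placeholder):
# with open("data.json", encoding="utf-8") as handle:
#     for field, number in sorted(first_seen(handle).items(), key=lambda kv: kv[1]):
#         print(number, field)
```

Fields whose first line number is far into the file are the ones that would be missed if only the beginning of the file is sampled.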
More generally, how can I ensure that all columns/fields in a data file end up in the data structure QuickSight creates from its analysis of the data file, even if a column/field appears only a few times and not necessarily near the beginning of the file?
Thanks,
Kaspar
PS: After moving a few data records that contain the newer fields from near the end of the data file to near the beginning, those columns finally get displayed in the dataset. However, this result required manual editing of the data file. Is there a way to tell QuickSight to build the data structure based on all columns/fields found in a data file?
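The manual reordering described in the PS could be scripted roughly as follows. This is a sketch of that workaround, not a QuickSight feature: it assumes one JSON record per line, optionally wrapped in an "Item" key, and the names `fields_of` and `promote_new_fields` are hypothetical. It moves the first record that introduces each field to the top of a copy of the file and leaves the order of all other records unchanged.

```python
import json


def fields_of(line):
    """Return the set of top-level field names in one JSON record."""
    record = json.loads(line)
    # Assumption: DynamoDB-JSON exports wrap each record in a single "Item" key.
    if isinstance(record, dict) and set(record) == {"Item"}:
        record = record["Item"]
    return set(record)


def _copy_lines(src_path, dst, promoted, want_promoted):
    """Write the non-blank lines whose promoted status matches want_promoted."""
    with open(src_path, encoding="utf-8") as src:
        for index, line in enumerate(src):
            if line.strip() and (index in promoted) == want_promoted:
                dst.write(line if line.endswith("\n") else line + "\n")


def promote_new_fields(src_path, dst_path):
    """Copy src to dst with every field-introducing record moved to the top.

    Returns the number of records that were moved to the front.
    """
    seen = set()
    promoted = set()  # 0-based indexes of lines that introduce a new field
    # Pass 1: find the first record in which each field appears.
    with open(src_path, encoding="utf-8") as src:
        for index, line in enumerate(src):
            if not line.strip():
                continue
            new_fields = fields_of(line) - seen
            if new_fields:
                promoted.add(index)
                seen |= new_fields
    # Pass 2: write the promoted records first, then all remaining records.
    with open(dst_path, "w", encoding="utf-8") as dst:
        _copy_lines(src_path, dst, promoted, True)
        _copy_lines(src_path, dst, promoted, False)
    return len(promoted)
```

Reading the file in passes keeps memory use low for a 400 MB input, since only the indexes of the promoted lines are held in memory. Blank lines are dropped in the copy; the record count is otherwise unchanged.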