S3 to QuickSight

Hello Everyone,

I want to display S3 data on a QuickSight dashboard. The data in S3 is in Parquet format. Can I connect an S3 bucket to a QuickSight dashboard directly? Also, can QuickSight pick up new data added to S3 automatically? New data arrives in S3 daily, and the expectation is that once the data is put into S3, it will show up on the QS dashboard.

Please guide me,

Thanks a lot.

Hello DeepPatel,

You can refer to Creating a dataset using Amazon S3 files - Amazon QuickSight to connect to S3 from QuickSight.
As far as I know, QS does not support S3 files in Parquet format directly. However, you can use Athena as the dataset source to handle Parquet.

You can use files in Amazon S3 or on your local (on-premises) network as data sources. QuickSight supports files in the following formats:

  • CSV and TSV – Comma-delimited and tab-delimited text files
  • ELF and CLF – Extended and common log format files
  • JSON – Flat or semistructured data files
  • XLSX – Microsoft Excel files

QuickSight supports UTF-8 file encoding, but not UTF-8 (with BOM).

Files in Amazon S3 that have been compressed with zip or gzip can be imported as-is. If you used another compression program for files in Amazon S3, or if the files are on your local network, remove the compression before importing them.

Hope this helps.
Cheers,
Deep

1 Like

Hi DeepPatel - Welcome to AWS QuickSight community and thanks for posting the question.

Yes, QuickSight can connect to S3 to import (delimited) text files.

You have to create a manifest file for this; however, as @Deep mentioned, QuickSight does not support Parquet format directly.
https://docs.aws.amazon.com/quicksight/latest/user/supported-manifest-file-format.html
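For reference, a minimal manifest pointing at a prefix of newline-delimited JSON files might look like the sketch below (the bucket name and prefix are placeholders; the exact schema is in the link above):

```json
{
    "fileLocations": [
        {
            "URIPrefixes": [
                "s3://your-bucket/daily-exports/"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "JSON"
    }
}
```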

Since your data is in Parquet format, the best approach is to query it through Athena, which also gives you a serverless solution:
https://docs.aws.amazon.com/quicksight/latest/user/create-a-data-set-athena.html
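As a sketch, an external Athena table over Parquet files could be declared like this (the database, table, columns, and S3 location are all placeholders; match them to your actual schema):

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS my_db.daily_events (
    event_id   string,
    event_time timestamp,
    amount     double
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://your-bucket/daily-exports/';
```

After new partitions land in S3, you would register them (for example with `MSCK REPAIR TABLE my_db.daily_events`, or by letting a Glue crawler do it) before they appear in queries.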

Remember to follow partitioning best practices for cost effectiveness and optimised performance.

Take a look at β€œBest practices when using Athena with AWS Glue”:
https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html

Choose a SerDe based on the types of values your data contains:
https://docs.aws.amazon.com/athena/latest/ug/serde-reference.html

Regarding your question about how frequently the data is updated in QuickSight:

You can set the Glue crawler schedule to control how often it updates the Glue Data Catalog for Athena.
Schedule the crawler to keep the AWS Glue Data Catalog and Amazon S3 in sync (see the best-practices link above).
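For a daily S3 drop, the crawler schedule is just a cron expression set in the Glue console or API; for example, running shortly after the expected upload time (the 06:30 UTC below is only an illustration):

```
cron(30 6 * * ? *)
```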

You can still decide whether you would like to query the data from QuickSight directly against Athena or use SPICE:
https://docs.aws.amazon.com/quicksight/latest/user/spice.html

SPICE can be set up to do incremental refreshes:
https://docs.aws.amazon.com/quicksight/latest/user/refreshing-imported-data.html

Regards - Dylan

2 Likes

Hi @deeppatel7981 - The preferred option is to create an Athena table for the Parquet files and connect QuickSight to Athena. There is NO direct integration of Parquet files with QS at present.

Regards - Sanjeeb

1 Like

Hi Deep,

Thanks a lot for your quick response. Parquet format is not a hard requirement for me; I was able to change the format of the S3 data to JSON. With that in mind, I am now trying to gzip-compress the S3 files.

Something like this:

glue_context.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={"path": output_path},
    format="json",
)

However, write_dynamic_frame.from_options() does not seem to support compression for JSON output. Is this a viable way to connect S3 to QuickSight?
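(It may also be worth checking whether the S3 `connection_options` accept a `"compression"` key in your Glue version.) If Glue really won't emit compressed JSON here, one workaround is to gzip the objects after writing. A local standard-library-only sketch of the target layout, since QuickSight imports gzip-compressed files as-is, with hypothetical records standing in for the job's output:

```python
import gzip
import json
import os
import tempfile

# Hypothetical records standing in for the Glue job's JSON output rows.
records = [
    {"id": 1, "amount": 10.5},
    {"id": 2, "amount": 7.25},
]

# Write newline-delimited JSON, gzip-compressed: a layout QuickSight
# can import from S3 as-is (per the supported-formats list above).
path = os.path.join(tempfile.mkdtemp(), "part-00000.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Round-trip check: the file reads back as the same records.
with gzip.open(path, "rt", encoding="utf-8") as f:
    restored = [json.loads(line) for line in f]
```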

Thanks,
Deep

Hello @deeppatel7981, @Deep, @Sanjeeb2022, and @dylange,

@deeppatel7981 were you able to find a solution or workaround for this or was one of the suggestions above helpful?

Hello DeepPatel,

It's been a while and we have not heard back from you, so we assume the issue has been solved based on the suggestions provided. I will mark the question as solved. Let us know otherwise.

Thank you
cheers,
Deep