Minimising data scanned with RLS and partitioned datasets

Hi QuickSight Community :slight_smile: ,

I have an analysis which is shared with clients as an embedded dashboard through a website. The main dataset is stored in Athena and partitioned by two levels (L1 and L2). I have RLS enabled on the dataset which uses the same values as the partitions, some users only have an L1 value and some have both an L1 and L2.

My question is when the RLS is being applied to filter the rows, does it look at the partitions and only load in the relevant ones for that user? Or does it load the entire dataset and then filter?

My aim is to minimise the amount of data that is loaded in, meaning visuals would display faster and cheaper costs on the ingestion side. If it loads the entire dataset, would I be able to use dataset parameters?

1 Like

Hello @jtelford, welcome to the QuickSight community! In regards to this topic, are your datasets that you are using set up to be stored in SPICE? If so then it will only query your dataset on the schedule you have set. Maybe once a day in the middle of the night with an incremental refresh set by a date field in the dataset.

If you haven’t done this, it will greatly improve the load times of your visuals and reduce the burden on your database because the data utilized in the visuals will be stored in QuickSight.

Let me know if this helps with your issue!

Hi @DylanM, the datasets are directly queried via Athena. The load times for visuals haven’t been a noticeable issue, I was just thinking of performance improvements I’d expect with the best data process. The analysis has most of the dataset in it, and then the RLS works to filter based on the end-user with the L1 and L2 values I mentioned in my post.

The data is partitioned in S3/Athena, I’m just wondering if when loading the dataset, the RLS filters the partitions that are scanned and loaded in - or if I could achieve this with dataset parameters?

Hello @jtelford, if you wanted to remain on Direct Query for your dataset, which I wouldn’t recommend for more scalability within your dashboard, using the dataset parameters would make sure the query would only pull the data required for the user’s RLS permissions. Otherwise, the dataset will ingest everything whenever someone is viewing the connected dashboards or analyses, and then filter it to display only the data they have access rights for.

Does your data require a constant refresh or do you think it could be reduced to once a day or a few times a day? That would allow for reduced strain on your S3/Athena resources, especially if you set up an incremental refresh that will only pull in new values.

Hi @DylanM, the data is updated weekly with an ETL script in Glue processing the raw .csv data and pushing it to S3 with the partitions I mentioned.

If we were to switch this to SPICE and use an incremental refresh, would it be better to partition this by date? We don’t have a created date for each row per se, but each row has the start and end of the week that the data corresponds to. Could this be used for creating the look-back window?

The main objective is to reduce resource calls and to maximise speed. I assumed that partitioning the data by the permissions of each user would be the best way to do this, but would partitioning by date and ingesting the data in to SPICE with an incremental refresh work better? Or (because SPICE doesn’t use partitions) could I still partition it by the L1 and L2 values? That way any Athena queries are still optimised.

I’m aware that’s a lot of questions :sweat_smile:, I’m just trying to get my head around the different ways of querying the data!

1 Like

Hello @jtelford, yes if you were going to add this dataset to SPICE and set up an incremental refresh, you would need a Date value the query would know which data was recent and what new data needs to be ingested.

It should continue to apply RLS as it was with the L1 and L2 values, that should not be impacted.

Let me know if that helps!

Hello @jtelford, did my response help you resolve the issue you were facing in QuickSight?

Hi @DylanM, sorry for the late response! Will give the incremental refresh a try when next working on the dashboard

Thanks :slight_smile:

1 Like