I’m experimenting with RLS and CLS setups using different approaches. I want to know the limits, pros and cons, and the QuickSight team’s recommendation for each approach.
sharing one RLS dataset across all SPICE datasets vs. giving each SPICE dataset its own separate RLS dataset
aggregating all RLS IDs/rules into one row per user/group vs. an RLS dataset with one RLS ID/rule per row per user/group
performance impact or latency of the RLS dataset (we want to understand the factors that may affect performance)
RLS dataset schema (see the sketch at the end of this post)
#1 RLS: username, groupname + rls_id (a single column concatenating the a, b, and c column values)
#2 RLS: username, groupname + separate a, b, and c columns
Limits on the number of RLS IDs per user/group for SPICE (the documentation says 999 rules per user and 192,000 filter values per user; I’m not sure of the difference between filter rules and filter values), and the limits for Direct Query mode (when sourcing data from Athena or Redshift)
Any other restrictions that we should consider when applying RLS
For the #1 RLS dataset, what I’ve found so far is that if the RLS ID is too long (about 30-40 characters), QuickSight silently truncates the value.
I’d also like recommendations and limitations for the CLS setup:
the limit on the number of users/groups per column security rule
the limit on the number of users/groups per SPICE dataset
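For reference, here is a minimal sketch of the two RLS schemas I’m comparing (the a/b/c column names are from my setup; the sample users and values are made up):

```python
import pandas as pd

# Schema #1: one concatenated rls_id column built from the a, b, c values
rls_schema_1 = pd.DataFrame([
    {"UserName": "alice", "GroupName": "", "rls_id": "a1-b1-c1"},
    {"UserName": "bob",   "GroupName": "", "rls_id": "a2-b2-c2"},
])

# Schema #2: the a, b, c columns kept separate, matching the SPICE dataset
rls_schema_2 = pd.DataFrame([
    {"UserName": "alice", "GroupName": "", "a": "a1", "b": "b1", "c": "c1"},
    {"UserName": "bob",   "GroupName": "", "a": "a2", "b": "b2", "c": "c2"},
])

# Either frame would be exported and uploaded as the permissions dataset
rls_schema_2.to_csv("rls_rules.csv", index=False)
```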
Hello @hoyeon, welcome to the QuickSight community! I will try to get through your questions in order; let me know if you have any follow-up questions. RLS:
I generally try to reuse the same RLS dataset across SPICE datasets to make them easier to manage. That said, if certain fields do not match or certain datasets require additional permissions, it may be necessary to build more than one.
This will depend on whether the rules are linked. If it is hierarchical data, where each rule is nested within the others, then a single row can be used per user/group. Otherwise, if the rules are independent of each other and you do not want one rule to impact another, then the user/group should be split into multiple rows.
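A quick sketch of the two layouts (the region/segment columns are hypothetical). Within a single row, the column values apply together, while separate rows each grant access on their own:

```python
import pandas as pd

# Linked/hierarchical rules: one row per user; the columns apply together,
# so alice only sees rows where region=east AND segment=retail.
linked = pd.DataFrame([
    {"UserName": "alice", "region": "east", "segment": "retail"},
])

# Independent rules: multiple rows per user; each row grants access on its
# own (OR). A blank field means no restriction on that column.
independent = pd.DataFrame([
    {"UserName": "alice", "region": "east", "segment": ""},
    {"UserName": "alice", "region": "",     "segment": "retail"},
])
```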
I suppose you could see some latency impact if your list of users/groups is massive, but otherwise I have noticed very little impact.
#2 is the right choice for the RLS schema. One thing to note: you will want the rules at either the user level or the group level, and the columns should be split up exactly as they are in the dataset the RLS will be applied to.
Filter values relate to each individual value you assign to a user/group, whether you have five values in a column separated by commas or five rows for the user with different values. If you are using Direct Query, the limits are the same, but I would definitely make sure you stay well below them to avoid latency.
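For example, these two layouts grant the same access, and my understanding is that both count as five filter values for the user (the customer_id column is hypothetical):

```python
import pandas as pd

# Five values in one field, comma separated (one rule, five filter values) ...
wide = pd.DataFrame([{"UserName": "alice", "customer_id": "1,2,3,4,5"}])

# ... or the same five values as one row each (five rules, five filter values)
tall = pd.DataFrame([{"UserName": "alice", "customer_id": str(i)} for i in range(1, 6)])
```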
In regards to CLS, the documentation for this feature is rather vague. I will see if I can find more specific rules on the limitations, but I would recommend testing a bit locally to see what issues you run into. I would imagine the limitations for users/groups look similar to those for RLS. Nesting should work as anticipated if the column values are linked. Building cascading filters in your dashboard could also help ensure the user has the proper permissions.
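If you end up scripting those tests, here is a rough sketch of how CLS rules can be attached to a dataset through the API (the account ID, ARNs, dataset details, and column names below are placeholders):

```python
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

# ColumnLevelPermissionRules is accepted by create_data_set and update_data_set.
# Once a column appears in a rule, only the listed principals retain access to it.
quicksight.update_data_set(
    AwsAccountId="111122223333",
    DataSetId="my-dataset-id",
    Name="my-dataset",
    ImportMode="SPICE",
    PhysicalTableMap={  # the dataset's existing physical table, resent unchanged
        "tbl1": {
            "RelationalTable": {
                "DataSourceArn": "arn:aws:quicksight:us-east-1:111122223333:datasource/my-source",
                "Name": "sales",
                "InputColumns": [
                    {"Name": "customer", "Type": "STRING"},
                    {"Name": "salary", "Type": "DECIMAL"},
                ],
            }
        }
    },
    ColumnLevelPermissionRules=[
        {
            "Principals": [
                "arn:aws:quicksight:us-east-1:111122223333:group/default/FinanceTeam"
            ],
            "ColumnNames": ["salary"],
        }
    ],
)
```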
I’ll link some documentation below on cascading filters and CLS, but I hope this helps guide you further!
Q1.
It’s still not clear to me what this means: 999 rules per user and 192,000 filter values per user.
What are the 999 rules per user? Does it mean the RLS dataset cannot exceed 999 rows per user?
Does the 192,000 filter values limit mean the number of rows per user multiplied by the number of columns in the RLS dataset?
Does this limit apply to groups too?
Q2.
“I suppose you could see some latency impact if your list of users/groups is massive.”
How massive does it have to be before it has an impact? I expect the RLS dataset to have about 300,000 rows, which may grow slowly.
Q3.
If a user belongs to multiple groups, which RLS rules are applied? Are all rows unioned (OR) for the user, or is only one of them applied?
Hello @hoyeon, the 999 rules per user is the limit on rows per user, whereas the 192,000 limit is on the number of specific values. I believe that limit includes comma-separated values within a single field.
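Under that reading (my interpretation, not official guidance), here is a quick way to sanity-check one user’s footprint against both limits; the rows and column names are hypothetical:

```python
# Documented SPICE limits: 999 rules (rows) per user, 192,000 filter values per user.
MAX_RULES_PER_USER = 999
MAX_VALUES_PER_USER = 192_000

# Hypothetical RLS rows for one user; comma-separated entries count individually.
rows_for_user = [
    {"region": "east,west", "customer_id": "1,2,3"},  # 2 + 3 = 5 values
    {"region": "north", "customer_id": ""},           # 1 value; blank means unrestricted
]

rule_count = len(rows_for_user)
value_count = sum(len(v.split(",")) for row in rows_for_user for v in row.values() if v)

print(rule_count <= MAX_RULES_PER_USER, value_count <= MAX_VALUES_PER_USER)
# -> True True (2 rules, 6 filter values)
```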
As for Q2, 300,000 rows shouldn’t be an issue at all. If your custom SQL is highly complex you may see some latency, but if the SQL is simple or you are directly linking a table or view from your database, I wouldn’t be worried. Impact only becomes a consideration in the millions of rows.
Group RLS is applied separately to the users within each group. If Group1 can see Customer A and Group2 can see Customer B, a user in both groups would see Customer A and Customer B. I hope this helps!
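A small sketch of that union behavior (the group and customer names are made up):

```python
import pandas as pd

# RLS rules keyed by group; the user belongs to both groups.
rls = pd.DataFrame([
    {"GroupName": "Group1", "customer": "Customer A"},
    {"GroupName": "Group2", "customer": "Customer B"},
])
data = pd.DataFrame({"customer": ["Customer A", "Customer B", "Customer C"]})

user_groups = ["Group1", "Group2"]
allowed = rls.loc[rls["GroupName"].isin(user_groups), "customer"]
print(data[data["customer"].isin(allowed)])  # the user sees Customer A and Customer B
```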