I have a question whether my type of use-case will be suitable for building a quicksight dashboard. Basically I have IDs with bunch of columns and some columns are in concatenated string by pipe such as below. And I have few variations of filters that I would like to create
Filter by a specific value e.g., count number of rows that contain ‘a’
Filter by multiple values with OR e.g., count number of rows that contain either ‘a’ OR ‘c’. there could be three or more values that can be filtered to
Filter by multiple values with AND e.g., count number of rows that contain both ‘a’ AND ‘c’. there could be three or more values that can be filtered to
Not just counting total number of rows but there will be additional charts for further breakdown of other columns
id concatenated_column
1 a|b|c|d|e|f|g|h|i|
2 e|f|l|a
3 b
4
5 d|b|f|i
6 g|l|c
7 a|b|c|f
8 a|e
9 a|c|g
From above example, what I would like to know are:
Is there an appropriate way to design the filters such that it can fulfill three cases? Note that there will be like 30+ columns with concatenated values and will need to give users to filter to those columns if needed from the business side
Or is it better to pre-process it at the backend considering the dataset resides in Amazon Redshift?
QuickSight can certainly handle the required filtering scenarios, but you should consider both data modeling and user experience aspects to decide the best approach.
Option 1: Handling Filters in QuickSight does provide the capability to filter data dynamically, and for your scenario, you could use calculated fields or create filters that handle complex cases.
You can create individual filters on the concatenated_column to search for specific values. Use Contains or OR logic to set up multiple values, allowing users to filter for either a OR c OR other values.
AND logic becomes a bit trickier to support complex operations like finding multiple values in the same field. However, you can create custom calculated fields to check for the presence of all specific values. You could create a calculated field to count occurrences of a and c, and then filter based on rows that contain both. This can be done using a combination of ifelse and contains() functions.
Option 2: Pre-Processing in Backend (Recommended for Complex Cases)
Pre-processing the concatenated columns into a normalized format will help in scaling the dashboard as you add more rows or columns. QuickSight can run against the transformed data, making the experience seamless for end-users.
Considering the complexity of multiple concatenated columns and the need for flexibility in filtering, it’s advisable to transform the data within Redshift. A transformation step could break the concatenated strings into separate rows/columns, which allows for simpler, more performant queries.
Once the data is normalized, QuickSight will provide a smoother experience for users to filter across multiple columns, handle OR and AND conditions, and visualize the data without requiring custom calculated fields.
If you need to provide an excellent user experience with seamless filtering, normalizing the data in Redshift is the best option. This will reduce the complexity of managing calculated fields in QuickSight and improve overall dashboard performance, especially if you have large datasets with multiple columns to filter.
Regarding Option 1: I’ve been exploring the quicksight documentation as well as some youtube videos regarding setting parameters, controls and filters but do you recommend any feature that we could use both OR and AND statement in one filter? This is because we will be having more than 30 different columns that we would need to apply OR and AND so was wondering if you there’s a better way to approach this
Sorry, combining AND and OR conditions within a single filter is not supported yet. However, you can achieve complex filtering logic by using filter groups and calculated fields.
This calculated field returns 1 or 0 for rows meeting the condition, which you can then filter on.
Example: (Syntax may vary - replace the fields from your dataset)
ifelse(
(contains(concatenated_column, 'a') AND contains(concatenated_column, 'c')) OR
NOT( contains(concatenated_column, 'b')),
1,
0
)
--
ifelse(
(contains(concatenated_column1, 'a') AND contains(concatenated_column1, 'c')) AND
contains(concatenated_column2, 'a') AND contains(concatenated_column2, 'c')),
1,
0
)
Since we have not heard back from you, I’ll go ahead and close this topic. However, if you have any additional questions, feel free to create a new topic in the community and link this discussion for relevant information.