Sankey Diagram Nodes

bilforde · February 17, 2022, 11:42pm

Hi,

Can anyone shed light on how QuickSight picks the nodes for Sankey diagrams? I have read the docs and viewed the video. I also looked at another post on Sankey diagrams, but none of them really explains what is happening under the hood.

My issue is that when I choose the Sankey diagram for my current dataset, it chooses the wrong nodes. As a result, the flow is wrong. I have many dimensions of varying granularities, and I suspect that is causing the issue. Is there any way to tell QuickSight which column it should follow? Having to aggregate datasets into much simpler ones just to use a Sankey diagram is prohibitively burdensome.

Thank you!

Tatyana_Yakushev · February 18, 2022, 12:06am

Think of the data provided for the Sankey diagram as a collection of links between nodes. QuickSight looks at the collection and figures out which nodes should go where. In simple cases such as
a->b
b->c
it is very clear that “a” will be in the first column, “b” in the second column and “c” in the third column.
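
As a rough illustration of how columns can follow from the links alone, here is a minimal Python sketch. It is not QuickSight's actual implementation. It assumes a simple rule (each node sits one column to the right of its deepest upstream node), and the function name assign_columns is hypothetical.

```python
from collections import defaultdict, deque


def assign_columns(links):
    """Place each node one column to the right of its deepest predecessor.

    Assumption: the links contain no cycles. This is an illustrative rule,
    not QuickSight's documented algorithm.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for source, target in links:
        successors[source].append(target)
        indegree[target] += 1
        nodes.update((source, target))

    # Nodes with no incoming links start in column 0.
    column = {node: 0 for node in nodes}
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            # Push the successor right if this path to it is longer.
            column[nxt] = max(column[nxt], column[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if visited != len(nodes):
        raise ValueError("links contain a cycle")
    return column


columns = assign_columns([("a", "b"), ("b", "c")])
print(sorted(columns.items()))  # [('a', 0), ('b', 1), ('c', 2)]
```

Under this assumed rule, the two links above put "a" in column 0, "b" in column 1 and "c" in column 2, matching the three-column layout described. The sketch refuses input that contains a cycle, since no consistent left-to-right order exists in that case.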

The Sankey diagram layout becomes less predictable if there are circular links, such as
a->b
b->c
c->a
In such cases, QuickSight picks one of the nodes (“a”, “b” or “c”) and shows it twice to break the cycle. You can’t really know which one it will choose to repeat. (The repeated node will have all outgoing links coming out of one instance and all incoming links going into the other instance.)
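
To make the idea of a repeated node concrete, here is a minimal Python sketch. It is only an illustration, not QuickSight's code. The helper names split_node and has_cycle and the " (repeat)" suffix are hypothetical, and the node to split is chosen by hand here, whereas QuickSight makes that choice itself.

```python
def split_node(links, node, suffix=" (repeat)"):
    """Redirect every link that ends at `node` to a second instance of it.

    The original instance keeps all outgoing links; the second instance
    receives all incoming links.
    """
    repeat = node + suffix
    return [(src, repeat if dst == node else dst) for src, dst in links]


def has_cycle(links):
    """Return True if the directed links contain a cycle (depth-first search)."""
    graph = {}
    for src, dst in links:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    state = {}  # node -> "visiting" or "done"

    def visit(n):
        if state.get(n) == "visiting":
            return True  # reached a node already on the current path
        if state.get(n) == "done":
            return False
        state[n] = "visiting"
        found = any(visit(m) for m in graph[n])
        state[n] = "done"
        return found

    return any(visit(n) for n in graph)


links = [("a", "b"), ("b", "c"), ("c", "a")]
broken = split_node(links, "a")
print(has_cycle(links))   # True
print(broken)             # [('a', 'b'), ('b', 'c'), ('c', 'a (repeat)')]
print(has_cycle(broken))  # False
```

Splitting "a" turns the loop into the chain a -> b -> c -> a (repeat). Splitting "b" or "c" instead would break the cycle just as well, which is why the resulting layout is hard to predict.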

The exact X and Y coordinates of each node are determined by an algorithm that tries to make the diagram look better (e.g., by reducing overlaps between links).

bilforde · February 18, 2022, 5:15pm

Thank you for your response. I understand better now. So, since there are some circular paths, QuickSight is automatically pulling some nodes to the left so that the diagram has the fewest overlaps between links.