This is a place for theory about graph layout, hopefully creating some more understanding of graphs and making them easier to work with.

"We are drowning in information but starved for knowledge. -- John Naisbitt, Megatrends"

Google has a problem here: it mostly shows uninteresting spring embedders or Graphviz,
and other directed graph layout programs and libraries are really very difficult to find.

The theory can be hard to read in scientific papers; this guide to reading math notation can help:
https://github.com/Jam3/math-as-code

There is a Twitter account dedicated to graphs and networks at https://twitter.com/NetworkFact

GDEA has many graph drawing papers at http://gdea.informatik.uni-koeln.de/

Graph theory is also used in machine learning and is explained accessibly at
 https://ericmjl.github.io/essays-on-data-science/machine-learning/graph-nets/
"demystify graph deep learning", or in the local graphlearning.pdf

There is a free computer science curriculum, including graph algorithms with video lectures, from the Open Source Society University at https://github.com/ossu/computer-science

Graph layout is used in chip design, also with AI; an example is at https://www.nature.com/articles/s41586-021-03544-w

Graph algorithms are used to improve robot designs and movements; see https://people.csail.mit.edu/jiex/papers/robogrammar/index.html#

Via Google there are many PDFs to be found about graph topics, or try this link: https://www.semanticscholar.org/

The book on graph drawing by Kozo Sugiyama is available from bookstores and is recommended.

The book about the EDGE graph tool by Paulisch is still interesting because it covers graph data with added constraints. The constraints are implemented in the last positioning step, where they get a higher priority than dummy nodes: constrained nodes are handled first, then the dummy nodes, and then the other nodes.

Here is a free PDF about graph data structures and algorithms by Jeff Erickson: http://jeffe.cs.illinois.edu/teaching/algorithms/

A local copy of Jeff Erickson's book is in this Algorithms-JeffE.pdf

A few links to graph theory math can be found in this math list:
https://github.com/rossant/awesome-math#graph-theory

Also freely available is a GNU GPL book on algorithmic graph theory at https://code.google.com/archive/p/graphbook/

A local copy of the graphbook is in this latest-r1991.pdf

Wikibooks offers free books, for example on graph theory at https://en.wikibooks.org/wiki/Category:Book:Graph_Theory

On freeCodeCamp graph theory is explained in the article i-dont-understand-graph-theory

The Brandes node placement is detailed in brandes.html

Some old links about this topic are at oldlinks.html

Pseudocode of a few graph algorithms in Java: https://github.com/mayankraghuwanshi/Graph-Theory
Optimizing a flow graph in C: https://github.com/satukoskinen/lem_in

There are many free graph PDFs to be found using https://scholar.archive.org/

For example, this old PDF shows how a graph layout works: ADA173877.pdf



The graph layout puts the nodes into relative vertical levels, using a DFS algorithm or another method depending on how the nodes are connected.
The nodes in one level get a relative horizontal position from 1 to the last node. The Sugiyama algorithm sorts the nodes of a level according to their barycenter values, and the order of the sorted nodes becomes the new relative x position of each node.
The graph is scanned from the top level to the bottom and vice versa, and two adjacent levels are considered at every iteration step.
The barycenter of a node is the average of the positions of its incoming or outgoing neighbour nodes, depending on the scan direction, or the average of both values.
In a second phase, nodes with equal barycenter values are swapped in x position.
During these scans the number of edge crossings is counted, and when the improvement in crossings drops below some value the Sugiyama iteration can stop, because further scans may keep changing the crossings without improving them.
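Here is a minimal C sketch of that sorting step (written just for illustration, not taken from any particular layout program): each node of a level carries a barycenter computed from its neighbours on the adjacent level, the level is sorted by barycenter with ties keeping their existing order, and the resulting order becomes the new relative x position.

#include <stdio.h>
#include <stdlib.h>

struct node {
    int    old_pos;     /* current relative x position within the level         */
    double barycenter;  /* average position of neighbours on the adjacent level */
};

/* sort by barycenter; ties keep their existing order (qsort is not stable,
 * so the old position is used as an explicit tie breaker) */
static int by_barycenter(const void *a, const void *b)
{
    const struct node *na = a, *nb = b;
    if (na->barycenter < nb->barycenter) return -1;
    if (na->barycenter > nb->barycenter) return  1;
    return na->old_pos - nb->old_pos;
}

int main(void)
{
    /* one level of four nodes with barycenters already computed */
    struct node level[] = {
        { 0, 2.0 }, { 1, 0.5 }, { 2, 2.0 }, { 3, 1.0 },
    };
    int n = (int)(sizeof level / sizeof level[0]);

    qsort(level, n, sizeof level[0], by_barycenter);

    /* the position in the sorted order becomes the new relative x position */
    for (int newpos = 0; newpos < n; newpos++)
        printf("new x %d <- old x %d (barycenter %.1f)\n",
               newpos, level[newpos].old_pos, level[newpos].barycenter);
    return 0;
}

The explicit tie breaker on the old position implements the rule that nodes with equal barycenters keep their current order.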


layout flow

 The graph can be constructed using lists in C or via a matrix. The following overview (formatted as a source-code comment) describes the matrix-based approach; a small C sketch of the realization-matrix bookkeeping follows after it:

/*
This algorithm is for routing hierarchies of elements.  A "good route" is
one that has a minimum number of link crossings.  An algorithm that was
truly optimal (for minimizing link crossings) would be combinatorial and
therefore cost prohibitive; therefore, this algorithm uses a heuristic
approach that finds a route with close to the minimum number of crossings
in a reasonable amount of time.

This algorithm assumes that all the elements form a directed acyclic graph
(DAG), which means (1) that links flow one way between elements and (2) for
any given node there is no way to get back to the node if, starting at the
node, you traverse the links going from node to node.  This algorithm also
assumes that AT MOST only ONE link may exist between a pair of nodes.

-------------------------------------------------------------------------------

OVERVIEW OF ALGORITHM

All elements that do not have any parents are placed in the first row (row 0).
Elements are assigned to rows, where the row number for each child is equal to
the [maximum(row number of all its parents) + 1].  Crossovers are determined
by examining links between elements on adjacent rows, so if a parent is in a
row that is not adjacent to its child's row, "dummy" nodes are created on the
rows in between the parent and child, and the parent and child are connected
via these dummy nodes.

Once the elements (now called nodes) are assigned to individual rows, the
rows are sorted (repeatedly) in order to minimize link crossings.  The
sort criteria involves attempting to both center children under parents and
to center parents over children.  The sort orders are then tweaked by swapping
nodes that have the same sort value.

After the column orders are finalized, the nodes are spread out so they are
more or less centered above their children and below their parents.  When
centering children below parents, a row of children is sorted by which node
has the greatest number of parents.  These get first choice of where to be
placed under the parents (actually, dummy nodes get first preference, then
all of the others).  Centering parents above children is analogous.

When done with node placement, there may be some empty columns, and the
numbering scheme may not start at 0.  Therefore, the empty columns must
be eliminated and every node needs its column renumbered, starting at 0.

Then you are done.

-------------------------------------------------------------------------------

REALIZATION MATRIX

When it comes to both sorting nodes and horizontally spacing the nodes, two
adjacent rows are always involved.  For example, if we are sorting row[i]
based on the children of row[i]'s nodes, then row[i+1] is also involved
at this step.  These two rows are called the "i-th realization", and form
the "i-th realization matrix".  A realization matrix shows the parent-child
relationships between adjacent rows, with the parents on the rows and the
children on the columns.  If there is a parent-child relationship, a 1 is
stored in the matrix at the position, else a 0 is stored.

An example:

                            A     B     C  D
                            \ \  / \  / /  |
                             \ /\  / \ /   |
                             /\ / \  / \   |
                            / /\   /\    \ |
                           //   \ /   \   \|
                          E      F     G   H

   E  F  G  H
A  0  1  1  0    In this example, parent 'A' has children 'F' and 'G',
B  1  0  0  1                     parent 'B' has children 'E' and 'H',
C  1  1  0  0                     parent 'C' has children 'E' and 'F',
D  0  0  0  1                 and parent 'D' has child 'H'.

-------------------------------------------------------------------------------

ROW AND COLUMN BARYCENTERS

Two other important concepts are the "row barycenter" and the "column
barycenter" for a node.  The "row barycenter" is the basically the average
of the positions of a node's children.  The "column barycenter" is the average
of the positions of a node's parents.  These numbers tell us where a node
would like to be positioned in its row, depending whether we are positioning
relative to children or parents.

For example, using the above realization matrix, we can calculate the row
barycenters for A, B, C, and D, and the column barycenters for E, F, G, and H.
Since the row barycenter of a node is equal to the sum of the positions of
the node's children divided by the number of children of the node, the row
barycenter for A is (1 + 2)/2 = 1.5.  This assumes that we start numbering
rows and columns at 0.  Similarly, the column barycenter of a node is equal
to the sum of the positions of the node's parents divided by the number of
parents of the node.  So, the column barycenter of F is (0 + 2)/2 = 1.0.
The complete example is as follows:


                             Row
      |  E    F    G    H  | Barys
------+--------------------+-----
A     |  0    1    1    0  | 1.5
B     |  1    0    0    1  | 1.5
C     |  1    1    0    0  | 0.5
D     |  0    0    0    1  | 3.0
------+--------------------+-----
Col   | 1.5  1.0  0.0  2.0 |
Barys |                    |


If we were to sort the child nodes here by their column barycenters, the new
order would be G, F, E, H.  If we were to sort the parent nodes here by their
row barycenters, then the order would be C, A, B, D (if two or more nodes have
the same value, be sure to keep the same precedence that already exists
between them, e.g., make sure that order after sorting is not C, B, A, D).

If a node has no parents then it can't have a column barycenter.  This case
should never happen, as all nodes that have no parents should be in the first
level of the hierarchy, and these nodes would only be represented in
realization matrix 0, and they would only have row barycenters.

If a node has no children then it can't have a row barycenter.  In this case,
while sorting based on row barycenters, sort AROUND these nodes, i.e., do
not change their positions at all.  For example, if we had the following:

                             Row
      |  W    X    Y    Z  | Barys
------+--------------------+-----
Q     |  0    1    1    1  | 2.0
R     |  0    0    0    0  | ???
S     |  1    0    0    0  | 0.0
T     |  0    1    0    1  | 2.0
------+--------------------+-----
Col   | 2.0  1.5  0.0  1.5 |
Barys |                    |

and we were to sort by row barycenters, the resulting order should be S, R,
Q, T.  Notice how R stayed in its position, and even though Q and T had the
same barycentric value, Q stayed before T.

The whole reason for sorting rows and columns by their barycenters is to
decrease the number of crossovers.

-------------------------------------------------------------------------------

CROSSOVERS

The realization matrix is also used to count the number of crossovers between
two adjacent rows of nodes.  For each row, starting with the second row, if
a row element has a 1, then sum up all of the matrix elements that are above
AND to the right of this element.  Looking again at the first example:




                            A     B     C  D
                            \ \  / \  / /  |
                             \ /\  / \ /   |
                             /\ / \  / \   |
                            / /\   /\    \ |
                           //   \ /   \   \|
                          E      F     G   H




                             Row
      |  E    F    G    H  | Barys
------+--------------------+-----
A     |  0    1    1    0  | 1.5
B     |  1    0    0    1  | 1.5
C     |  1    1    0    0  | 0.5
D     |  0    0    0    1  | 3.0
------+--------------------+-----
Col   | 1.5  1.0  0.0  2.0 |
Barys |                    |

Starting with the second row (parent B's row), position B-E has a 1.  Looking
at positions above and to the right, we see positions A-F and A-G both have
a 1, so the number of crossovers is currently = 2.  Position B-H has a 1, but
there are no elements above and to the right, so crossovers is still = 2.
For parent row of C, position C-E crosses over with B-H, A-F, and A-G, so
crossovers = 5.  C-F crosses over with B-H and A-G, so crossovers = 7.  For
parent row D, position D-H doesn't cross over with any other link.  So for
this realization matrix representing these two rows, the number of crossovers
is 7.

The total number of crossovers for the whole graph would be the sum of the
crossovers from each matrix realization.

-------------------------------------------------------------------------------

NODE CENTERING

After the nodes for each row have their final sort order, the nodes need to
be assigned to grid positions.  Their initial grid position will be their
column position, by which we mean their array position in the row.  From now
on, when we take a row or column barycenter, we will be using grid positions
instead of column positions.

Note:  The following examples will be based on centering children with respect
to their parents' positions.  Centering parents based on their children's
positions is analogous.

When positioning the nodes on a row based on their parents' positions, the
nodes must be initially sorted to see which nodes get first choice.  The dummy
nodes go first, and the rest of nodes are sorted in descending order based on
the number of parents the node has.  If a dummy node has a parent that has
multiple dummy nodes, all of these dummy nodes are again sorted by how close
to the center of the parent's children they are.  This is a confusing
statement, best illustrated by example:

            P1              P2
             \              |
               \  __________^__________
                 \|    |    |    |    |
                  C1   D1   D2   C2   D3

Here, parent P1 has one child, C1.  Parent P2 has five children, and three of
the child nodes are dummy nodes: D1, D2, and D3.  C1 is child 0 of P2, D1 is
child 1 of P2, D2 is child 2 of P2, C2 is child 3 of P2, and D3 is child 4 of
P2.  The child midpoint underneath the parent is equal to
(the number of children - 1) / 2, so (5 - 1) / 2 = 2.  Since the dummy nodes
go first, the initial order is D1, D2, D3, C1 (because it has 2 parents), and
finally C2.  All of the dummy nodes have the same parent, so we will sort them
based on how far away they are from the parent's child midpoint.  D1 is child
1 of P2, so it is 1 away.  D2 is child 2 of P2, so it is 0 away.  D3 is child
4 of P2, so it is 2 away.  Therefore, the final order for choosing positions
is D2, D1, D3, C1, C2.

In a situation similar to the dummy nodes, if a non-dummy node has only one
parent, and that parent has other children with just one parent, then these
one parent child nodes that have the same parent need additional sorting in
the exact same manner that we just did the dummy nodes.

The whole purpose behind this is so that the left most node doesn't get first
choice.  If it did, we would get graphs that look like:

                 A                                     A
                 |                                     |
                 |_________       instead of      _____^_____
                 |    |    |                      |    |    |
                 B    C    D                      B    C    D

Anyway, once we have a sort order for the nodes of a row, we place the nodes
in their preferred positions.  Using the previous example, assume that P1
is in grid position 2 and P2 is in grid position 5.  D2 gets first choice,
and its column barycenter (based now on parent grid positions, not column
positions) is 5, so we place D2 in position 5.  D1 is next, its barycenter
is also 5.  We can't give it 5 since that position is occupied, so we give
it the closest possible position we can, which in this case is 4.  D3 is next,
and its barycenter is also 5.  The closest position that we can give it is
position 7, since we must allow room for C2.  C1 is next, and its barycenter
is (2 + 5)/2 = 3.5, which we round to 3.  Position 3 is open, so we go ahead
and give it position 3.  C2 is last, and its barycenter is 5.  However, the
first position available to it based on its left neighbor is position 6, so
we assign it position 6.

-------------------------------------------------------------------------------

GOING 'UP' OR 'DOWN' THE GRAPH

"Going down the graph" means taking each realization matrix situation,
starting with Realization Matrix 0, and performing some action on it, then
going to the next realization matrix, and continuing until all of the
realization matrices have been visited.

"Going up the graph" is analogous, except you start at the bottom with the
last realization matrix and work up to Realization Matrix 0.
*/
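To make the realization matrix bookkeeping concrete, here is a small self-contained C sketch (written for illustration, not the implementation described above) that stores the A..D / E..H example as a 0/1 matrix, computes the row and column barycenters, and counts the crossovers. It reproduces the numbers from the tables above.

#include <stdio.h>

#define ROWS 4  /* parents  A B C D */
#define COLS 4  /* children E F G H */

/* realization matrix: m[parent][child] = 1 if the edge exists */
static const int m[ROWS][COLS] = {
    /*        E  F  G  H */
    /* A */ { 0, 1, 1, 0 },
    /* B */ { 1, 0, 0, 1 },
    /* C */ { 1, 1, 0, 0 },
    /* D */ { 0, 0, 0, 1 },
};

/* row barycenter of a parent = average column index of its children */
static double row_barycenter(int parent)
{
    int sum = 0, count = 0;
    for (int c = 0; c < COLS; c++)
        if (m[parent][c]) { sum += c; count++; }
    return count ? (double)sum / count : -1.0;  /* -1 marks "no children" */
}

/* column barycenter of a child = average row index of its parents */
static double col_barycenter(int child)
{
    int sum = 0, count = 0;
    for (int r = 0; r < ROWS; r++)
        if (m[r][child]) { sum += r; count++; }
    return count ? (double)sum / count : -1.0;  /* -1 marks "no parents" */
}

/* for every 1 at (r, c), add the number of 1s strictly above and to the right */
static int count_crossings(void)
{
    int crossings = 0;
    for (int r = 1; r < ROWS; r++)
        for (int c = 0; c < COLS; c++) {
            if (!m[r][c]) continue;
            for (int rr = 0; rr < r; rr++)
                for (int cc = c + 1; cc < COLS; cc++)
                    crossings += m[rr][cc];
        }
    return crossings;
}

int main(void)
{
    for (int r = 0; r < ROWS; r++)
        printf("row barycenter of parent %c = %.1f\n", 'A' + r, row_barycenter(r));
    for (int c = 0; c < COLS; c++)
        printf("col barycenter of child  %c = %.1f\n", 'E' + c, col_barycenter(c));
    printf("crossovers = %d\n", count_crossings());
    return 0;
}

Compiling and running this prints the row barycenters 1.5, 1.5, 0.5, 3.0, the column barycenters 1.5, 1.0, 0.0, 2.0, and 7 crossovers, matching the tables above.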
Also explained like this:

The Sugiyama layout algorithm
-----------------------------
This layout algorithm's goal is to reduce edge crossings in the graph.

The Sugiyama layout algorithm consists of four phases: 

Phase 1 does a topological sorting of the graph, which assigns nodes to levels.
    Each level will be displayed at the same vertical position.
    There may be edges which cross more than a single level. These edges,
    called `long' edges, are split into a series of shorter ones by 
    inserting special nodes, called `dummy' nodes, on each intermediate 
    level. The result is called a `proper' graph.  

    The label of the long edge is copied to (only) the first edge in the 
    series (i.e.  between source and the first dummy node).  The long
    edge is marked invisible.  When a new layout is requested, the 
    procedure MakeImproper is called.  MakeImproper removes all dummy 
    nodes and the edges between them and restores the long invisible edges
    back to visible.  The topological sort is then done on this graph
    and then the graph is made back into a proper one (MakeProper).
    The Sugiyama layout uses what we call the `level' structure when
    laying out the graph.  This is basically a linked list of levels
    where each level contains a linked list of all the nodes in that
    level.  NewLevelStruct generates such a level structure from the
    graph.

Phase 2 does a repeated barycentric ordering of the nodes on all levels 
    starting on level 0, going down the level structure and then up again. 
    For barycentric ordering one always considers only two adjacent levels.
    Barycentric ordering while going down/up orders the nodes on
    the lower/upper level according to the average position of their 
    predecessors/successors. Nodes with equal barycenters keep their order.
    This heuristic method works quite well to reduce the number of 
    edge crossings.  One can either simply sort the nodes in this level
    according to their barycenter values or one can first check whether
    the new (sorted) arrangement leads to fewer crossings (and only if
    so, change the ordering).  This option is specified in the fourth
    (sortcc) parameter to the algorithm (0=don't count crossings, 
    1=do count crossings).  Phase 2 is repeated until either the number 
    of edge crossings remains the same or a maximum number of iterations 
    is reached (the first (BC_I) parameter specifies this maximum).  

Phase 3 reorders nodes which have equal barycenters and then calls phase 2
    again.  With the second parameter you can give the maximum number of 
    iterations of phase 3.  Again, there is an option that controls how
    the crossings should be counted in this phase.  When considering
    a level, one can have it count the crossings in both the level
    above and below or only in the next level (depending on whether
    you're going up or down).  The fifth (CC) parameter is 1 to count
    in one direction only or 2 to count in both directions.

Phase 4 does the finetuning of the graph by a procedure similar to phase 2
    (up/down-ward passes in the graph).  This phase calculates the
    x,y position of each node.  The relative positions of the nodes 
    in each level, however, are preserved.  Each node in a level is 
    assigned a priority and nodes with the highest priority are set first.
    The position of a node is set to the average position of its successors
    and predecessors, with the restriction that a node can never cause
    a previously set node which has higher priority to move.  The priority 
    of a node is defined as follows: Dummy nodes have the highest priority.
    This has the effect that long edges crossing several levels are drawn 
    straight. For other nodes the priority is the number of incoming or 
    outgoing edges.  We have added our own modification to the Sugiyama
    algorithm in that we assign dummy nodes which are in the middle
    of the graph an even higher priority than those near the edges.  This
    tends to make the graph a bit wider and much more symmetrical.  Doing 
    phase 4 one time usually is sufficient. Nevertheless with the third 
    parameter you can give the number of repetitions. Every repetition 
    tends to spread the graph a little.  Downward passes tend to make
    the bottom nodes appear centered, and upward passes vice versa.  
    After making a series of down/up passes, the final pass is made from
    the widest level downwards.  This should make both the top and the
    bottom of the graph appear nicely centered.  This pass is usually
    the cause for any unexplainable bend in the long (should be straight) 
    dummy edges.
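The priority-based positioning of phase 4 can be sketched roughly as follows. This is a simplified illustration written for this text (the names, parameters, and exact bookkeeping differ from the implementation described above): the nodes of one level are visited from highest to lowest priority, each is moved toward the average x of its neighbours on the adjacent levels, and no node may push a previously placed, higher-priority node out of its position.

#include <stdio.h>

#define GAP 1.0   /* minimum horizontal distance between adjacent nodes */

struct lnode {
    double x;        /* current x coordinate (starts as the grid position) */
    double desired;  /* average x of predecessors/successors               */
    int    priority; /* dummy nodes get the highest value                  */
    int    fixed;    /* already placed during this pass                    */
};

static void place_level(struct lnode v[], int n)
{
    for (int done = 0; done < n; done++) {
        /* pick the not yet placed node with the highest priority */
        int i = -1;
        for (int k = 0; k < n; k++)
            if (!v[k].fixed && (i < 0 || v[k].priority > v[i].priority))
                i = k;

        /* the nearest already placed neighbours bound how far this node may
         * move; not yet placed nodes in between only need GAP units of room */
        double lo = -1e9, hi = 1e9;
        for (int k = i - 1; k >= 0; k--)
            if (v[k].fixed) { lo = v[k].x + GAP * (i - k); break; }
        for (int k = i + 1; k < n; k++)
            if (v[k].fixed) { hi = v[k].x - GAP * (k - i); break; }

        double x = v[i].desired;
        if (x < lo) x = lo;
        if (x > hi) x = hi;
        v[i].x = x;
        v[i].fixed = 1;
    }
}

int main(void)
{
    /* one level of four nodes; node 1 is a dummy node with the top priority */
    struct lnode level[] = {
        { 0.0, 0.5, 2, 0 },
        { 1.0, 3.0, 9, 0 },
        { 2.0, 2.5, 1, 0 },
        { 3.0, 4.0, 3, 0 },
    };

    place_level(level, 4);
    for (int k = 0; k < 4; k++)
        printf("node %d: x = %.1f\n", k, level[k].x);
    return 0;
}

In this example the dummy node is placed first, exactly at its desired position, and the lower-priority nodes then settle around it; that is what makes long edges through dummy nodes come out straight.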

A related operation that is sometimes applied to a DAG before layout is transitive reduction, which removes edges already implied by longer paths. An example implementation in Go:

// TransitiveReduction performs the transitive reduction of graph g in place.
// The transitive reduction of a graph is a graph with as few edges as
// possible with the same reachability as the original graph. This means
// that if there are three nodes A => B => C, and A connects to both
// B and C, and B connects to C, then the transitive reduction is the
// same graph with only a single edge between A and B, and a single edge
// between B and C.
//
// The graph must be valid for this operation to behave properly. If
// Validate() returns an error, the behavior is undefined and the results
// will likely be unexpected.
//
// Complexity: O(V(V+E)), or asymptotically O(VE)
func (g *AcyclicGraph) TransitiveReduction() {
    // For each vertex u in graph g, do a DFS starting from each vertex
    // v such that the edge (u,v) exists (v is a direct descendant of u).
    //
    // For each v-prime reachable from v, remove the edge (u, v-prime).
    for _, u := range g.Vertices() {
        uTargets := g.downEdgesNoCopy(u)

        g.DepthFirstWalk(g.downEdgesNoCopy(u), func(v Vertex, d int) error {
            shared := uTargets.Intersection(g.downEdgesNoCopy(v))
            for _, vPrime := range shared {
                g.RemoveEdge(BasicEdge(u, vPrime))
            }

            return nil
        })
    }
}

The median is the middle value, where half of the data is above and half is below. The mean (aka average) is defined as the sum(value) / count(value),  and the mode is defined as the most common or frequently occurring value.

The median, or the middle value, is also known as the 50th percentile (the middle percentile out of 100). This is the value at which 50% of the data is less than the value, and 50% is greater than the value (or equal to it).

This leads us to percentiles: a percentile is defined as the value where x percent of the data falls below the value.

For example, if we call something “the 10th percentile,” we mean that 10% of the data is less than the value and 90% is greater than (or equal to) the value.

Simply put, using averages can have a dramatic (and negative) impact on how values are reported, while percentiles can help you get closer to the “truth.”
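As a small worked example (using the simple nearest-rank definition of a percentile; other definitions interpolate between values), the sketch below shows how a single outlier drags the mean far away from the median and the 90th percentile:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* nearest-rank percentile: the smallest value such that at least p percent
 * of the data is less than or equal to it (the array must be sorted) */
static double percentile(const double *sorted, int n, double p)
{
    int rank = (int)ceil(p / 100.0 * n);
    if (rank < 1) rank = 1;
    if (rank > n) rank = n;
    return sorted[rank - 1];
}

int main(void)
{
    double v[] = { 1, 1, 2, 2, 2, 3, 3, 4, 5, 100 };   /* one extreme outlier */
    int n = (int)(sizeof v / sizeof v[0]);
    double sum = 0;

    qsort(v, n, sizeof v[0], cmp_double);
    for (int i = 0; i < n; i++)
        sum += v[i];

    printf("mean         = %.1f\n", sum / n);
    printf("median (p50) = %.1f\n", percentile(v, n, 50));
    printf("p90          = %.1f\n", percentile(v, n, 90));
    return 0;
}

For these ten values the mean is 12.3 while the median is 2.0 and the 90th percentile is 5.0, so the one outlier dominates the average but barely affects the percentiles.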


The breadth first search algorithm of the chopping phase performs a limited exploration of the graph. During that exploration, it disconnects multiple small subgraphs from the graph, as described by the pseudocode below.

Graph chopping algorithm
Input:      a forgetting graph G = (V, E), a root vertex r ∈ V
Parameters: n, the number of sites to extract; s, the maximum number of nodes in a site
Output:     a graph Gr containing at most n sites

Let Gr = (Vr, Er) be an empty graph.

while |V| > 0 and n > 0 do
    Let q1 be a queue.
    Let q2 be a queue.
    Let g = (N, A) be an empty graph.
    push r onto q1
    while |N| < s and |q1| > 0 do
        empty q2
        for all x in q1 do
            if |N| > s then break end if
            for all y such that (x, y) ∈ E do
                if |N| > s then break end if
                if y ∈ N then
                    A ← A ∪ {(x, y)}
                else
                    N ← N ∪ {y}
                    A ← A ∪ {(x, y)}
                    push y onto q2
                end if
            end for
        end for
        q1 ← q2
    end while
    V  ← V \ N
    E  ← E \ {(u, v) | u, v ∈ N}
    r  ← first vertex left in V
    Vr ← Vr ∪ N
    Er ← Er ∪ A
    n  ← n - 1
end while

return Gr