Force-directed graph drawing

This article uses the MathML standard to display mathematical equations. It is however not supported by most browsers yet. Consider using Mozilla Firefox which has the best support for it.

Introduction

Motivation

Graph drawing [gra01] is a computationally difficult topic. No efficient algorithm can be found to represent a graph in the visually most appealing way. A graph consists of vertices and edges which have to be positioned. Typical criteria for good graph representations are

Best way to draw the Petersen graph Bad way to draw the Petersen graph
Example to represent the Petersen graph [pet01] in an appealing way (left) and the same graph structure involving intersections and collinearity (right)

Force-directed graph drawing (FDG) is a layouting algorithm for graphs. We define forces (in an “update function”) to move vertices towards a direction and iteratively come to better appealing results. Once certain constraints (we call it “criterion”) are satisfied we terminate the algorithm and accept the intermediate result as final result.

graph = initial_graph
while not criterion(graph):
    graph = update(graph)
High-level pseudocode for our FDG algorithm

The initial_graph is the problem provided as input to our algorithm. Our goal in this tutorial is to discuss criterion and update functions. I will provide a framework so we can focus on those functions and neglect implementation components like visualization.

Wikipedia [wik01] lists 6 advantages of a force-directed graph drawing approach to other alternatives:

Good-quality results
real-world implementations; uniform edge length, uniform vertex distribution and showing symmetry as criteria
Flexibility
can be adjusted to further constraints or more dimensions
Intuitive
easy to predict and understand
Simplicity
few lines of code
Interactivity
pull several nodes out of their equilibrium state and watch migration back into position
Strong theoretical foundations
theory from physics, optimization & statistics research in math

Among the disadvantages the following items are listed:

High running time
We vaguely estimate the running time of FDG algorithms: Given a graph with n vertices, approximately n iterations are required. In every iteration one vertex has to consider all other n (or n-1) vertices to determine the force [01]. Those three n yield a runtime of Ο(n³) .
Poor local minima
Depending on the definition of the criterion, certain special graphs might satisfy the criterion very early (local maximum) even though better layouts (global maximum) exist.

Requirements

What do you need?

The framework requires the following software

On debian-based systems (Ubuntu, Damn Small Linux, Grml, Xandros) you can install those packages using:

sudo apt-get install python3 python3-pyside

Mathematical preliminaries redux

Coordinate system

The coordinate system is defined by Qt [qtc01] using positive x to move horizontally to the right and positive y to move vertically downwards.

However I mapped the coordinate system to apply the usual coordinate system used in geometry. So if you want to specify the position of a vertex, you have to use the following coordinate system: Bottom-Left is coordinate (0, 0) and all positions are positive. The coordinate of the position Top-Right is given as (width, height).

Coordinate system on the canvas
Coordinate system applied to vertex positioning

If you specify a motion, you don't specify the target position, but a relative position. For example a position <5, 5> and motion of <-2, 1> yields the new position <3, 6>. As you can see in that coordinate system negative values are also possible.

Coordinate system on the canvas
Coordinate system applied to motions

Coordinates in both systems are specified with Vector instances.

Trigonometry and Geometry

To specify my desired graphs I also had to use basic trigonometry. I always computed the center point on my canvas and put my vertices along a circle around this central point. So the question is:

Given a central point, a radius and an angle in degree, compute the xy-coordinate of the point lying on a circle specified by a central point and a radius with the given angle.
Compute the xy coordinate of C
Visually the question is:
Given center = (0, 0), radius = 4 and α = 41.33°.
Compute C's x and y coordinate.

We can use the following definition of sine and cosine:

sin ( α ) = | a | | radius | cos ( α ) = | b | | radius |

The length of the opposite side (a) is the y value of C. The length of the adjacent side (b) is the x value of C. Given angle with 41.33°, its sine is:

Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> math.sin(math.radians(41.33))
0.6603949384518162

Be aware that python3's math.sin accepts radians as parameter but not degrees. Hence we use math.radians to convert it (which is basically a multiplication with π 180 ).

>>> math.cos(math.radians(41.33))
0.7509184544724027

Now we can use that to compute x and y:

sinα=aradiussinα·radius=a 0.6603949384518162·4=a 2.571150438746157=a cosα=bradiuscosα·radius=b 0.7509184544724027·4=b 3.064177772475912=b

Hence point C has coordinates (3.064177772475912, 2.571150438746157)

Simple higher-degree polynomials

I am pretty sure you know that from high school:

p=anxn+an-1xn-1++a2x2+a1x1+a0x0
A general polynomials of degree n with coefficients ai

We only look at simple polynomials. The polynomials we will use and their corresponding plots are given below. Roughly speaking a linear function states that one step in one dimension, will always lead to one step in the other dimension. A quadratic or cubic function will not take one step in the other dimension, but several depending on the size of the first step.

Implementation

This tutorial comes with a framework. Several python3 files are given. The framework is provided because we don't want to be concerned with visualization in this tutorial. We want to focus on the definition of update and criterion functions. Also the discussed concepts here are already available as functions in the framework.

Download framework (tar.bz2)
Download framework (zip)

Properties:

Vector

An abstraction for two dimensional vectors.

A two-dimensional vector with x and y
A vector with x and y values

Create a vector:

v = Vector(int x, int y)

Retrieve length of vector:

len(v)

The length of a vector on a canvas or plane is defined by

l=x²+y²

Retrieve unit vector of the given vector (same direction, length 1):

v.unit()

Access its coordinates:

v.x
v.y

Graph

A graph is a tuple of vertices and edges.

For example the Petersen graph is specified by the following data structure:

({0: <460.000000, 200.000000>, 1: <349.000000, 352.000000>, 2: <171.000000, 294.000000>, 3: <171.000000, 106.000000>, 4: <349.000000, 48.000000>, 5: <380.000000, 200.000000>, 6: <324.000000, 276.000000>, 7: <236.000000, 247.000000>, 8: <236.000000, 153.000000>, 9: <324.000000, 124.000000>}, {(0, 5), (0, 1), (5, 7), (1, 6), (1, 2), (6, 8), (2, 7), (2, 3), (7, 9), (3, 8), (3, 4), (8, 5), (4, 9), (4, 0), (9, 6)})

where <x, y> specifies the xy-coordinates of a Vector instance respectively.

Motion specification

A motion specification (motion_spec) is an association of every vertex of the graph to a Vector instance specifying in which direction the vertex should move in the next iteration.

{ int: Vector }

One example would be to move all vertices of the Petersen graph 1 pixel towards bottom-left:

{ 0: <-1, -1>,
  1: <-1, -1>, 
  2: <-1, -1>, 
  3: <-1, -1>, 
  4: <-1, -1>, 
  5: <-1, -1>, 
  6: <-1, -1>, 
  7: <-1, -1>, 
  8: <-1, -1>, 
  9: <-1, -1> }

criterion interface

A criterion specifies when a graph update [won't] be performed, because a local minimum has been reached. Criteria are implemented as functions which take four parameters:

graph
The graph before the iteration.
motion_spec
A motion specification which was used in the last iteration ({} in the first iteration).
iterations
How many iterations have been executed so far? This parameter contains a continuous integer.
cum
A vector cumulating the total distance of all motion specification vectors of previous iterations so far.

update interface

Given a graph, which motion should every vertex perform for the next iteration? Updates are implemented as functions which take four parameters:

graph
The graph before the iteration.
vertex
The vertex (as identifier) to consider.
width
The width of the canvas.
height
The height of the canvas.

Updates are applied after all updates for all vertices are read.

Our first run

Now we want to learn to use the framework. We will use the update function to compute forces between components and compute motion specifications which will be applied on vertices.

First of all we will execute python3 to run the current implementation.

$ python3 draw.py

The following window should appear:

Default window of the application
Default window of the application

Per default, a random graph is created and as the name indicates, your random graph will most likely look different than mine. If you click on “Update”, the next iteration will be computed and visualized. Okay, let's see which graph, criterion and update function is used:

def main():
    """Main routine."""
    logging.basicConfig(level=logging.INFO)

    # fdg run
    width = 600
    height = 400

    def cont():
        """One FDG iteration."""
        update = updates.center_update
        graph = graphs.random_graph(width=width, height=height)
        criterion = criteria.avg_inside_criterion
        count = 0

        for graph in fdg.run(graph, update, criterion, width, height):
            count += 1
            yield (True, graph, count)

        while True:
            count += 1
            yield (False, graph, count)

    # GUI initialization
    app = QtGui.QApplication(sys.argv)
    window = GraphDrawing(cont, width, height)

    window.show()
    return app.exec_()

The main function defines and sets some parameters, defines the cont function and initializes the GUI. GUI initialization happens with the cont function and the canvas dimensions as parameters. GraphDrawing is defined above and not shown in this snippet. Then the Qt event loop is started.

The cont function defines which initial graph, update function and criterion will be used. Predefined ones are stored respectively in:

We will now use other parameters. We replace

        update = updates.center_update
        graph = graphs.random_graph(width=width, height=height)
        criterion = criteria.avg_inside_criterion

with

        update = updates.const_update
        graph = graphs.circle_graph(nodes=5, width=width, height=height)
        criterion = criteria.fifty_criterion

The following video shows how the implementation behaves now:

Instead of a random graph we use a graph where 5 vertices are positioned in a circle. The update function is defined as follows and implements the update interface:

def const_update(graph, vertex, width=600, height=400):
    return Vector(-1, -1)

The criterion function is implemented as follows (and correspondingly implements the criterion interface) (docstrings have been omitted):

def fifty_criterion(graph, motion_spec, iterations, cum):
    return iterations < 50

The criterion always allows exactly 50 iterations. This might be helpful for testing the functions, but not to find good graph representations.

Update functions

Vertex rejection

Obviously the constant update function does not provide us beautiful graphs. We define theorems, implement them and look at random graphs to improve upon our update function.

A vertex is rejected by other vertices. The rejection is proportional to the distance to all other vertices.

The rejection direction r of a vertex v is defined as follows:

r= 1r ( wwv wx-vx , wwv wy-vy )

Intuitively, the rejection is defined as sum of vertices pointing from any vertex w to the given vertex v normalized to length 1. In code it looks like this:

def simple_vertex_rejection(graph, vertex, width=600, height=400):
    cum = Vector(0, 0)
    for w, pos in graph[0].items():
        if w != vertex:
            cum = graph[0][vertex] - pos

    return cum.unit()

But the result does not look promising:

You should recognize that force only rejects components from each other. Nothing attracts them. So vertices move away from each other. Before we fix that, we apply a different theorem.

A vertex is rejected by other vertices. The rejection is quadratic to the distance to all other vertices.

The corresponding equation and code is defined as follows:

r= 1r2 ( wwv wx-vx , wwv wy-vy )
def quadratic_vertex_rejection(graph, vertex, width=600, height=400):
    cum = Vector(0, 0)
    for w, pos in graph[0].items():
        if w != vertex:
            cum = graph[0][vertex] - pos

    if len(cum) == 0:
        return Vector(0, 0)

    return (cum / len(cum)**2)

Nothing really happens. In general vertices close to each other should move more than vertices far away. The force we defined is too small. Most of the times the computed motion of a vertex does not even exceed the 1-pixel threshold. We can simply apply a factor like 100, but we will discuss that later on.

Edge attraction

To define attraction we apply the following theorem:

Vertices connected by an edge are attracted by each other. Edge length should converge towards some distance dependent on the canvas width or height and number of edges or vertices.

I claim the following theorem:

The perfect uniform edge length is given as the distance between two vertices if the vertices are distributed uniformly in a grid on the canvas.

Let's discuss that more visually:

Edge length derived from a grid of vertices on a canvas
Given a number n of vertices. Find the width of a square s used to distributed those vertices uniformly in a grid. If the vertices do not fit the grid size, a minimal number of vertices can be inserted (gray colorized vertices).

The question raised in Figure 16 is unsolvable. Squares do not necessarily cover the whole canvas. We cannot determine the number of rows, number of columns, number of additional vertices and finally the size of the squares. The problem is under-determined.

We consider a different approximation: Consider width w and height h as ratio of rectangles. We divide both values by the greatest common divisor yielding wg and hg. If the product hg·wg is too large, we divide both values by 2 and round down. If the product is too small, we multiply both values by 2.

import itertools
import fractions

def comp(n, w, h):
    g = fractions.gcd(w, h)
    w_g, h_g = w // g, w // g
    if n < 2:
        raise ValueError("No perfect edge length computable for 0 edges")
    elif w_g * h_g > n:
        for i in itertools.repeat(2):
            new_w_g, new_h_g = w_g // i, h_g // i
            if new_w_g * new_h_g < n:
                return w_g, h_g
            w_g, h_g = new_w_g, new_h_g
    elif w_g * h_g < n:
        for i in itertools.repeat(2):
            w_g, h_g = w_g * i, h_g * i
            if w_g * h_g > n:
                return w_g, h_g
    elif w_g * h_g == n:
        return w_g, h_g


for n in range(2, 200):
    w, h = comp(n, 600, 400)
    print("600x400 with {} vertices ⇒ {}x{}   error={}".format(n, w, h, w * h - n))
Source code illustrating our approximation approach. The result of the function is the number of rows and columns (minus 1 each). We need to compute the rectangle dimensions in a separate step.
Approximation error
Let 600×400 be the width and height of an image. Increment the number of vertices and plot the error (i.e. number of additional vertices). We conclude the error increases linearly to the number of vertices and jumps in exponential intervals.

This approximation might not be perfect and share your thoughts on that, but I think it suffices and it provides good performance measures. Let us write the corresponding update function.

The direction of the motion is given by the adjacent vertex of an edge. The length is given by the distance between both vertices minus the perfect edge length. We don't want the vertex to jump directly towards the perfect edge length. Therefore we only take 10% of the mentioned length.

We can observe in the video that two vertices connected by a vertex converge towards a certain length. After infinite iterations, it jumps between two positions because the vertex of degree 2 is attracted in two directions does not find a perfect position to achieve the desired length. Vertices without edges are kept the same way.

Edge attraction is implemented in the functions uniform_edge_attraction and the helper function compute_uniform_edge_attraction. We use functools.lru_cache [lru01] to enable memoization [02].

Wall rejection

We ignored a problem. Vertices might be rejected and walk outside the canvas. We want to avoid that, because all vertices should always be visible.

All vertices are rejected from the wall in an elliptical vignette.
An elliptical vignette
A general elliptical vignette. Dark areas indicate a stronger rejection towards the center of the ellipsis.

If a point lies outside the ellipse, we want to compute a vector pointing towards the ellipse.

An elliptical vignette
Our ellipse problem: Given a point q outside the ellipse and ellipse parameters a and b, determine a point p satisfying the ellipse equation and being closest to q.

The general equation for points on an ellipse is given as

1= x2a2 + y2b2

x and y refer to the x and y-coordinates of p. From trigonometry we know that

tanα= yx=yqxq

We express both equations by variable y and use equality:

tan2α·x2=1-x2a2·b2

And now we solve it by x. y follows from the second equation.

x=b2tan2α·a2+b2 y=tanα·x

We have solved our problem and our rejection vector is given as vector between q and p. If the vertex is inside the ellipse, no force is applied. A vertex is inside an ellipse if the right-hand side of the ellipse equation is smaller 1. Greater 1 means outside.

We introduce the update function [03]:

def vertex_wall_rejection(graph, vertex, width=600, height=400):
    a, b = width * 0.4, height * 0.4
    c = Vector(int(width // 2), int(height // 2))
    q = graph[0][vertex] - c

    # inside or outside?
    rhs = q.x**2 / a**2 + q.y**2 / b**2
    if rhs <= 1:
        return Vector(0, 0)

    # determine p, given q
    tan_alpha = q.y / q.x
    x = math.sqrt(b**2 / (tan_alpha**2 * a**2 + b**2))
    y = tan_alpha * x

    # compute motion vector
    motion = Vector(x, y) - q
    return motion * 0.01

The grid_graph beautifully shows us which points are affected by this rejection. The implementation is given in updates.uniform_edge_attraction.

Modelling Newton's theory of gravitation

Sir Isaac Newton's theory of gravitation is based on a formula:

F=Gm1m2r2

where m1 and m2 define masses of two objects. r defines the radius or distance between those objects and G is the gravitational constant. Then F yields the force towards each other.

Just for fun, we want to apply the same principle to our vertices. If we want to apply this formula, we need to solve three issues:

Time for a new update function:

def newton_update(graph, vertex, width=600, height=400):
    def degree(v):
        degree = 0
        for edge in graph[1]:
            if edge[0] == v or edge[1] == v:
                degree += 1
        return degree

    motion = Vector(0, 0)
    deg = degree(vertex)
    g = min(width, height)

    for v in graph[0]:
        if v == vertex:
            continue

        deg2 = degree(v)
        diff = graph[0][vertex] - graph[0][v]
        dist = len(diff)
        force = g * deg * deg2 / dist

        motion += diff.unit() * force

    return motion

If we add wall rejection and edge attraction, this might result in a nice force-directed graph drawing algorithm.

Criterion functions

Finding good criteria is a difficult task. If we allow too little iterations, a suboptimal solution will be returned. If we allow too many iterations, the running time is high. The following criteria are my suggestions:

constant_criterion
A fixed number of iterations will certainly always lead to a result without exceeding a runtime threshold. But if the number of vertices varies, a fixed number of iterations will certainly lead to a bad result. I only recommend this criterion for debugging purposes.
average_motion_criterion
Given a motion specification of one iteration. Sum up the total distance of the motion. If the distance is under the limit, terminate. I consider this criterion as the most useful and pragmatic.
maximal_motion_criterion
Similar to the average_motion_criterion you can also require that the greatest motion vector of one iteration exceeds a certain length.
structural criteria
If you have domain-specific graphs, where the number of vertices is constant or a certain structure is desirable, you can introduce criteria like the following: the distance between vertices must be within a specific boundary. Or: All vertices must touch a circle with a certain border-width. In that particular case it is required that the update function asks all vertices to position themselves in a circle.
Certainly those criteria require special constraints regarding the update function. Therefore those criteria are not provided as implementation in the framework.

A good idea is to combine those criteria. For example we use the average_motion_criterion, but if the number of iteration exceeds 200 iterations, the graph layout is returned in any case (this criterion is implemented in criteria.average_motion_maximal_iterations_criterion).

Putting it together

We have looked at various update functions, criteria and you can select from a variety of sample graphs. Use the framework to build your own force-directed graph drawing algorithm!