In this post I thought I’d take you through the start-to-finish process of implementing and optimizing one of the new functions in Dgraph: from a fairly straight copy of the pseudo-code in a research paper, to its form in the repository today. The function in question is `dgraph.metric.betweenness`, which implements the betweenness centrality metric for networks. Following the implementation of this algorithm offers some interesting opportunities to examine simple optimization strategies when coding in D, things that experienced D’ers know well but which are easy for a newcomer to miss.

First, some background. The concept of *centrality* in network analysis is used to describe the relative importance of individual vertices or edges in the network’s structure. Multiple different centrality measures exist, including *degree centrality* (how many other vertices are you connected to?), *closeness centrality* (how far do you have to go to reach other vertices?) and *eigenvector centrality* (which derives from the eigenvector with largest eigenvalue of the network’s adjacency matrix).

*Betweenness centrality* is an alternative measure that essentially reflects a vertex’s importance as a channel of communication between other vertices in the network. Specifically, the betweenness centrality of a vertex *v* corresponds to

\[C_{B}(v) = \sum_{s \neq t \neq v \in V}\frac{\sigma_{st}(v)}{\sigma_{st}}\]

where *V* is the set of all vertices in the graph, *σ _{st}*(*v*) is the number of shortest paths between vertices *s* and *t* that pass through *v*, and *σ _{st}* is the total number of shortest paths between *s* and *t*.

In non-mathematical terms, what we are doing is to calculate, for every pair of other nodes in the graph, the fraction of shortest paths between them that include the vertex *v* — that is, the importance of *v* as a point of connection in going from one node to the other. We then sum these fractions to get the total betweenness centrality of *v*. Its practical importance is that if we knock out a vertex with high betweenness centrality, we are going to make it much more difficult to travel between other vertices, because for many vertex pairs the network distance between them will have increased.
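The definition above can be checked directly by brute force. Here is a sketch in Python (for brevity; the ideas are language-independent): enumerate every shortest path between each ordered pair of vertices and accumulate, for each intermediate vertex, the fraction of those paths passing through it. The function names and the toy graph are my own, purely for illustration:

```python
from collections import deque
from itertools import permutations

def shortest_paths(adj, s, t):
    """Enumerate all shortest paths from s to t in an unweighted graph."""
    # Breadth-first search to get distances from s.
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    # Walk backwards from t along edges that decrease the distance,
    # collecting every shortest path.
    def build(u):
        if u == s:
            return [[s]]
        return [path + [u]
                for w in adj[u] if dist.get(w, -1) == dist[u] - 1
                for path in build(w)]
    return build(t) if t in dist else []

def betweenness_naive(adj):
    """C_B(v) = sum over ordered pairs (s, t) of the fraction of
    shortest s-t paths that pass through v (with v != s, t)."""
    cb = {v: 0.0 for v in adj}
    for s, t in permutations(adj, 2):
        paths = shortest_paths(adj, s, t)
        if not paths:
            continue
        for v in adj:
            if v in (s, t):
                continue
            through = sum(1 for p in paths if v in p)
            cb[v] += through / len(paths)
    return cb

# Toy undirected path graph 0-1-2: vertex 1 lies on every shortest
# path between 0 and 2, so it picks up 1.0 from each ordered pair.
adj = {0: [1], 1: [0, 2], 2: [1]}
cb = betweenness_naive(adj)   # cb[1] == 2.0; cb[0] == cb[2] == 0.0
```

This enumerates paths explicitly, so it is hopelessly slow on anything but toy graphs, but it makes the definition concrete and gives a ground truth to test faster implementations against.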

It’s a tricky quantity to calculate, because to do so for even a single vertex winds up scaling with the size of the whole graph. A popular algorithm for (relatively!) efficient computation of betweenness centrality was proposed by Ulrik Brandes in 2001 in a paper in the Journal of Mathematical Sociology (also available as a University of Konstanz e-print), which scales with *O*(|*V*||*E*|), i.e. the product of the total numbers of vertices and edges.
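To make the pseudo-code easier to digest before diving into the D version, here is a compact Python transcription of Brandes’ algorithm for the unweighted case (my own sketch, not Dgraph code): a breadth-first search from each source counts shortest paths, and then a dependency accumulation walks back down the BFS order:

```python
from collections import deque

def brandes(adj):
    """Brandes' O(|V||E|) betweenness centrality for unweighted graphs.
    adj maps each vertex to a list of its out-neighbours.  For an
    undirected graph stored with both edge directions, halve the
    results at the end (as the D code's static if branch does)."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                     # vertices in order of non-decreasing distance
        pred = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest s-v paths
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        q = deque([s])
        while q:                       # BFS from s
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest first
            w = stack.pop()
            for v in pred[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb
```

The key trick is that the dependency δ of a vertex can be computed recursively from the dependencies of its successors, so each source vertex costs one BFS plus one linear unwind rather than an explicit enumeration of paths.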

My original version, which was written before the API change introduced by recent updates, was a fairly straightforward copy of the pseudo-code Brandes offered in his paper:

```d
auto betweenness(T = double, bool directed)(ref Graph!directed g, bool[] ignore)
    if (isFloatingPoint!T)
{
    T[] centrality = new T[g.vertexCount];
    centrality[] = to!T(0);
    size_t[] stack = new size_t[g.vertexCount];
    T[] sigma = new T[g.vertexCount];
    T[] delta = new T[g.vertexCount];
    long[] d = new long[g.vertexCount];
    auto q = VertexQueue(g.vertexCount);

    foreach (s; 0 .. g.vertexCount)
    {
        if (!ignore[s])
        {
            size_t[][] p = new size_t[][g.vertexCount];
            size_t stackLength = 0;
            assert(q.empty);
            sigma[] = to!T(0);
            sigma[s] = to!T(1);
            d[] = -1L;
            d[s] = 0L;
            q.push(s);

            while (!q.empty)
            {
                size_t v = q.front;
                q.pop();
                stack[stackLength] = v;
                ++stackLength;
                foreach (w; g.neighboursOut(v))
                {
                    if (!ignore[w])
                    {
                        if (d[w] < 0L)
                        {
                            q.push(w);
                            d[w] = d[v] + 1L;
                        }
                        if (d[w] == (d[v] + 1L))
                        {
                            sigma[w] = sigma[w] + sigma[v];
                            p[w] ~= v;
                        }
                    }
                }
            }

            delta[] = to!T(0);
            while (stackLength > 0)
            {
                auto w = stack[stackLength - 1];
                --stackLength;
                foreach (v; p[w])
                {
                    delta[v] = delta[v] + ((sigma[v] / sigma[w]) * (to!T(1) + delta[w]));
                }
                if (w != s)
                {
                    centrality[w] = centrality[w] + delta[w];
                }
            }
        }
    }

    static if (!directed)
    {
        centrality[] /= 2;
    }

    return centrality;
}
```

A few small things may be of interest. First, note the template constraint `if (isFloatingPoint!T)`: the user of this function can optionally specify the return type, but we insist that it’s floating point.

Second, the `ignore` array passed to the function is not part of Brandes’ algorithm. A typical application of betweenness centrality is to calculate an attack strategy to break apart a network: you knock out the node with highest betweenness centrality, then recalculate, then knock out the highest remaining node, and so on. Passing an array of boolean types that indicate which nodes have already been knocked out is cheaper and easier than actually modifying the network.
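To see how the `ignore` array gets used, here is a hypothetical sketch (in Python, with invented names) of the attack loop it supports. `surviving_degree` is a cheap stand-in scorer so the example is self-contained; the real thing would call the betweenness function itself:

```python
def attack_sequence(adj, score):
    """Knock out the highest-scoring vertex, recompute, repeat.
    `score(adj, ignore)` is any centrality callable that skips
    vertices flagged True in `ignore` (a hypothetical interface
    mirroring the `ignore` array described above)."""
    ignore = {v: False for v in adj}
    order = []
    while not all(ignore.values()):
        scores = score(adj, ignore)
        target = max((v for v in adj if not ignore[v]), key=scores.get)
        ignore[target] = True   # flagging is far cheaper than rebuilding the graph
        order.append(target)
    return order

# Stand-in scorer for demonstration: degree counted among surviving vertices.
def surviving_degree(adj, ignore):
    return {v: 0 if ignore[v] else sum(1 for w in adj[v] if not ignore[w])
            for v in adj}

# Path graph 0-1-2: the middle vertex falls first.
order = attack_sequence({0: [1], 1: [0, 2], 2: [1]}, surviving_degree)
```

The point of the pattern is that the graph structure is never mutated; each recalculation just skips the flagged vertices.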

Third, `VertexQueue` is a non-growable circular queue implementation that I knocked up fairly quickly as a simplification of Bearophile’s GrowableCircularQueue example on RosettaCode (simplified because we don’t need the growable part, as we can guarantee the required capacity in advance). It’s there simply because D’s standard library currently has no queue implementation of its own, and it will be replaced by the standard library version as soon as one is available. Bearophile was kind enough to offer his code under the terms of the Boost licence, enabling its use (and that of its derivatives) in other codebases.
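For readers unfamiliar with the data structure, here is a minimal Python sketch of such a fixed-capacity circular queue (my own illustrative version, not Bearophile’s code): a flat buffer plus a head index and a length, with push and pop wrapping modulo the capacity:

```python
class VertexQueue:
    """Fixed-capacity circular FIFO queue.  Because the capacity is
    known up front (the vertex count), no resizing logic is needed."""
    def __init__(self, capacity):
        self._data = [0] * capacity
        self._head = 0      # index of the front element
        self._length = 0    # number of stored elements

    @property
    def empty(self):
        return self._length == 0

    @property
    def front(self):
        assert self._length > 0, "queue is empty"
        return self._data[self._head]

    def push(self, v):
        assert self._length < len(self._data), "queue is full"
        # The tail wraps around to the start of the buffer when needed.
        tail = (self._head + self._length) % len(self._data)
        self._data[tail] = v
        self._length += 1

    def pop(self):
        assert self._length > 0, "queue is empty"
        self._head = (self._head + 1) % len(self._data)
        self._length -= 1
```

A breadth-first search pushes and pops each vertex at most once, so a capacity of `vertexCount` is always sufficient; the wrap-around means storage is reused with no allocation inside the search loop.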

Anyway, this betweenness centrality implementation produces identical results to its igraph counterpart, but is more than 3 times slower — so how do we get it up to par?

Profiling makes clear that memory allocation and garbage collection are a major source of slowdown. Contrary to what some might expect, this isn’t an inevitable consequence of D’s use of garbage collection. My personal experience is that it helps to approach memory allocation in D much as you would in C or C++, which is to say that you should choose the places in which you allocate memory so as to minimize the overall number of allocations and deallocations required. A nice example is probably the most glaring flaw of the above code: the declaration

`size_t[][] p = new size_t[][g.vertexCount];`

should very obviously be moved outside the `foreach` loop. I put it where it is because that’s where `p` is declared in Brandes’ pseudo-code, but there’s simply no need to re-allocate it entirely from scratch at each pass of the loop: we can allocate once before the loop begins, and then inside the loop set `p[] = []`, i.e. set every element of `p` to be an empty array.

As it happens, this hardly saves us anything, probably because the D compiler and garbage collector are smart enough to figure out that the memory can be re-used rather than re-allocated at each pass of the loop. A much bigger improvement comes from some tweaks I made after examining the igraph implementation: while the original re-initializes the entire `sigma`, `d` and `delta` arrays with each pass of the loop, in fact these arrays need only be initialized once, and thereafter we can reset just the values that were touched by the calculation. Similarly, the individual arrays stored in `p` need only have their lengths reset to zero if we actually re-use them in a later pass of the loop. Implementing this optimization reduces the time taken to about 3/4 of that required by the original version, but it’s still much slower than igraph. What else can we do?
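The idea can be sketched as follows (in Python, with illustrative names): fold the clean-up into the dependency-accumulation loop, so that each pass restores exactly the entries its own BFS dirtied, and untouched vertices never need revisiting:

```python
def unwind_and_reset(stack, pred, sigma, delta, d, centrality, s):
    """Dependency accumulation for source s, with the reset folded in.
    `stack` holds exactly the vertices the BFS from s visited, so by
    the time it is empty, sigma/delta/d are back in their clean state
    (sigma == 0, delta == 0, d == -1) without touching any other entry."""
    while stack:
        w = stack.pop()
        for v in pred[w]:
            delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
        if w != s:
            centrality[w] += delta[w]
        # w was touched by this source's BFS; restore its entries.
        sigma[w] = 0.0
        delta[w] = 0.0
        d[w] = -1
```

On a sparse or fragmented graph the BFS from a given source may reach only a small fraction of the vertices, so resetting just the stack contents beats a blanket `O(|V|)` re-initialization per source.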

Removing the obligation to pass an `ignore` array doesn’t really buy us anything, but there is something that will. Note that this implementation involves a lot of array appending, and doing this with regular dynamic arrays and the append operator `~=` is not an efficient choice. Instead, we convert the variable `p` to be an array of `Appender`s, a specially-defined type that is dedicated to efficient array appending. This change results in another huge speedup: taken together with the previous changes, the function is now running at twice the speed of the original implementation.
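The cost difference is easy to demonstrate in any language. The Python sketch below is an analogy rather than a faithful model (D’s `~=` is smarter than rebuilding the array each time, but it still has to query the block capacity on every append, which is the overhead `Appender` caches away); it contrasts a rebuild-per-append pattern with amortized appending:

```python
from timeit import timeit

def naive_concat(n):
    xs = []
    for i in range(n):
        xs = xs + [i]   # builds a brand-new list every time: O(n) per append
    return xs

def amortized_append(n):
    xs = []
    for i in range(n):
        xs.append(i)    # reserves spare capacity: O(1) amortized per append
    return xs

n = 5000
print("concat: %.3fs" % timeit(lambda: naive_concat(n), number=3))
print("append: %.3fs" % timeit(lambda: amortized_append(n), number=3))
```

Both produce the same list, but the first does quadratic work overall; the same asymptotic gap is what makes dedicated append machinery worthwhile in a hot loop.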

Add in further tweaks to make the function agnostic about the graph implementation, and we come to its current final form:

```d
auto betweenness(T = double, Graph)(ref Graph g, bool[] ignore = null)
    if (isFloatingPoint!T && isGraph!Graph)
{
    T[] centrality = new T[g.vertexCount];
    centrality[] = to!T(0);
    size_t[] stack = new size_t[g.vertexCount];
    T[] sigma = new T[g.vertexCount];
    T[] delta = new T[g.vertexCount];
    long[] d = new long[g.vertexCount];
    auto q = VertexQueue(g.vertexCount);
    Appender!(size_t[])[] p = new Appender!(size_t[])[g.vertexCount];

    sigma[] = to!T(0);
    delta[] = to!T(0);
    d[] = -1L;

    foreach (immutable s; 0 .. g.vertexCount)
    {
        if (ignore && ignore[s])
        {
            continue;
        }

        size_t stackLength = 0;
        assert(q.empty);
        sigma[s] = to!T(1);
        d[s] = 0L;
        q.push(s);

        while (!q.empty)
        {
            size_t v = q.front;
            q.pop();
            stack[stackLength] = v;
            ++stackLength;
            foreach (immutable w; g.neighboursOut(v))
            {
                if (ignore && ignore[w])
                {
                    continue;
                }

                if (d[w] < 0L)
                {
                    q.push(w);
                    d[w] = d[v] + 1L;
                    assert(sigma[w] == to!T(0));
                    sigma[w] = sigma[v];
                    p[w].clear;
                    p[w].put(v);
                }
                else if (d[w] > (d[v] + 1L))
                {
                    /* w has already been encountered, but we've
                       found a shorter path to the source.  This
                       is probably only relevant to the weighted
                       case, but let's keep it here in order to
                       be ready for that update. */
                    d[w] = d[v] + 1L;
                    sigma[w] = sigma[v];
                    p[w].clear;
                    p[w].put(v);
                }
                else if (d[w] == (d[v] + 1L))
                {
                    sigma[w] += sigma[v];
                    p[w].put(v);
                }
            }
        }

        while (stackLength > to!size_t(0))
        {
            --stackLength;
            auto w = stack[stackLength];
            foreach (immutable v; p[w].data)
            {
                delta[v] += ((sigma[v] / sigma[w]) * (to!T(1) + delta[w]));
            }
            if (w != s)
            {
                centrality[w] += delta[w];
            }
            sigma[w] = to!T(0);
            delta[w] = to!T(0);
            d[w] = -1L;
        }
    }

    static if (!g.directed)
    {
        centrality[] /= 2;
    }

    return centrality;
}
```

There’s probably still more we could do at this point to improve the betweenness centrality calculation (among other things, researchers have developed parallelized versions of Brandes’ algorithm), but by this point our primary bottleneck is something else: the graph type itself. We’ll discuss this in my next blog post.

**EDIT:** The order of the template parameters for the final version of `betweenness` has been reversed. This means that you can specify the desired return type without having to also specify the graph type (which can be inferred by the compiler).

**EDIT 2:** Bearophile and I had an interesting follow-up discussion on further possibilities for optimization. Some of these I’ll leave to explore in future when Dgraph is more fully developed, but one tweak I made available immediately was to allow users to pass a buffer to `betweenness` for the centrality values to be returned in.
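That buffer-passing pattern looks something like this (a hypothetical Python sketch with invented names, not the actual Dgraph API): allocate only when the caller didn’t supply storage, and otherwise reuse what was handed in:

```python
def centrality_scores(n, buf=None):
    """Sketch of the buffer-reuse pattern: the caller may pass a
    pre-allocated list of length >= n to receive the results, so that
    repeated calls (e.g. the attack-strategy loop) reuse one allocation."""
    if buf is None:
        buf = [0.0] * n     # allocate only if the caller didn't supply storage
    assert len(buf) >= n, "supplied buffer is too small"
    for v in range(n):
        buf[v] = 0.0        # reset, then fill with the real results
    return buf

# Repeated calls share one allocation.
scratch = [0.0] * 100
out = centrality_scores(100, scratch)   # out is scratch: no new allocation
```

This matters most in exactly the iterated knock-out scenario described earlier, where the function is called once per removed vertex.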