Most D programmers will be familiar with `std.random`. D’s range-based approach to random number generation makes for elegant and beautiful code, and there is support for a broad range of random number generators and related functionality. However, there have always been a few annoying issues with this module that simply cannot be resolved without breaking changes. It’s with the resolution of these issues in mind that I’d like to announce a new random number library for D: `hap.random`.

This is intended as a candidate for later standard library inclusion, and so I would like to request that anyone with an interest in such functionality take the opportunity to try it out and report any issues that arise. An alpha draft was announced on the D forums under the name `std.random2`; the current library is a version 1.0.0 release candidate.

Before discussing anything else, here’s how you can get your hands on `hap.random` and try it out:

- Requires D 2.065 or later.
- Source code is available from https://github.com/WebDrake/hap.
- Library documentation (including migration information) is at http://code.braingam.es/hap/random/.
- It’s available as a dub package named `hap`.

**hap.random** implements broadly the same API as `std.random`, but the RNG types and those that derive from them are implemented as classes rather than structs (the reasons for this are discussed below). Much of the code is adapted directly from `std.random`, which is reflected in the author and copyright credits. Some code has also flowed back from `hap.random` to `std.random` during the development process, including fixes for `uniform` and `partialShuffle` and a fast `uniform01` implementation ported from Boost.Random; more such patches are currently under review, and others are planned for the future, so even if you never use `hap.random` yourself, you will get some of its benefits. Since the first preview of what is now `hap.random`, some improvements have been independently implemented in `std.random` by other developers, such as the use of `pure`, `nothrow` and `@safe` function attributes.

**Library structure.** The library is separated into four different modules:

- `hap.random.generator` — pseudo-random number generators;
- `hap.random.traits` — compile-time template checks for RNG-using code;
- `hap.random.distribution` — random distributions such as uniform, normal, etc.;
- `hap.random.adaptor` — shuffling, sampling, and similar.

Importing the package **hap.random** will bring in all of these, so you can replace your code’s `import std.random` statements very easily. `hap.random` uses the modern D convention of nesting module imports as deeply as possible inside the code requiring them. This should help to avoid unnecessary dependencies being pulled in.
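As a rough sketch of the migration (the selective import list and the seeding constructor here are assumptions based on `std.random`’s API):

```d
// Drop-in replacement: one import now brings in generator, traits,
// distribution and adaptor.
import hap.random; // previously: import std.random;

auto makeGenerator()
{
    // Nested import, mirroring the convention hap.random itself follows:
    // the dependency is visible only where it is actually needed.
    // (Assumes Mt19937 and unpredictableSeed keep their std.random names.)
    import hap.random.generator : Mt19937, unpredictableSeed;
    return new Mt19937(unpredictableSeed); // RNGs are classes, hence `new`
}
```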

**Naming differences.** Several function and type names have been simplified compared to `std.random`:

- `randomShuffle` is now simply `shuffle`;
- `randomSample` (helper function) and `RandomSample` (struct) are now `sample` (function) and `Sample` (class) respectively;
- similarly, `randomCover` and `RandomCover` are now `cover` and `Cover`.

Aliases are provided to ease migration, so if you prefer the old names, there’s no need to change anything.
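For instance, a minimal sketch (it assumes `Random` and `unpredictableSeed` keep their `std.random` meanings, and that `shuffle` keeps `randomShuffle`’s signature):

```d
import hap.random;

void demo()
{
    auto rng = new Random(unpredictableSeed); // class, hence `new`
    auto arr = [1, 2, 3, 4, 5];
    arr.shuffle(rng);        // new, simplified name
    arr.randomShuffle(rng);  // old std.random name, kept as an alias
}
```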

**New random distributions.** `hap.random.distribution` introduces range-type random distributions, which offer a way to draw an endless sequence of random variates from a given distribution via an Input or Forward Range interface. Currently the following distributions are implemented:

- `UniformDistribution`, a range-based equivalent to the `uniform` function;
- `Uniform01Distribution`, a faster floating point uniform distribution drawing variates from the half-open interval [0, 1), and its function equivalent `uniform01`;
- `NormalDistribution`, together with a (less efficient) function implementation, `normal`;
- `DiscreteDistribution`, a range-based equivalent to `dice`, which offers a significant speed boost.

The `dice` function has been updated to use an improved algorithm that matches the output of `DiscreteDistribution`. Consequently, this is the one part of `hap.random` that can be expected to produce different behaviour to its `std.random` counterpart.
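A quick sketch of the range-type distributions in use; note that the helper-function name and argument order below are assumptions following the usual D convention of lower-case helpers for the `NormalDistribution` class:

```d
import hap.random;
import std.range : take;
import std.stdio : writeln;

void main()
{
    auto rng = new Random(unpredictableSeed);

    // An endless range of normal variates with mean 0, standard deviation 1.
    // (Assumed helper; the class equivalent would be constructed with `new`.)
    auto normals = normalDistribution(0.0, 1.0, rng);
    normals.take(5).writeln; // lazily draw five variates
}
```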

**New random number generators.** `hap.random.generator` has a couple of new features compared to `std.random`:

- `Mt19937_64`, a 64-bit Mersenne Twister;
- `UniformRNGTypes`, a typetuple of all the uniform random number generators implemented in `hap.random.generator`.

The latter can be particularly useful for testing purposes, if you want to check that RNG-using code is compatible with all the potential generators that could be used.
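A minimal sketch of that testing pattern (the seeding constructor is an assumption based on the `std.random` API):

```d
import hap.random;

unittest
{
    // Compile-time foreach over the typetuple: the body is instantiated
    // once per generator type, so the code is tested against all of them.
    foreach (UniformRNG; UniformRNGTypes)
    {
        auto rng = new UniformRNG(unpredictableSeed);
        auto x = uniform(0, 10, rng);
        assert(x >= 0 && x < 10);
    }
}
```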

**New compile-time checks.** `hap.random.traits` has two new features:

- the `isUniformRNG` template implements stricter checks that verify properties uniform random number generators are expected to possess;
- the `isRandomDistribution` template checks that a type implements a random distribution range interface (both are illustrated in the sketch below).
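In practice these are used as template constraints. A sketch, assuming `uniform` keeps its `std.random` signature:

```d
import hap.random;

// Accepts any uniform RNG, checked at compile time:
auto rollDie(UniformRNG)(UniformRNG rng)
    if (isUniformRNG!UniformRNG)
{
    return uniform!"[]"(1, 6, rng); // uniform on the closed interval [1, 6]
}

// Accepts any random distribution range:
auto firstVariate(Dist)(Dist dist)
    if (isRandomDistribution!Dist)
{
    return dist.front;
}
```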

**Experimental features.** Besides the well-developed modules listed above, `hap.random` also includes an experimental module, **hap.random.device**, which implements a very provisional first go at providing access to ‘true’ or otherwise unpredictable sources of randomness, such as hardware random devices. The existing functionality works only on Posix systems, and provides access to `/dev/random` and `/dev/urandom`. The API here is extremely likely to change in future, so use it with caution; for this reason, the module is not included in those imported via `import hap.random` but must be imported separately.

**Test coverage.** With the exception of the experimental `hap.random.device`, all code in `hap.random` is heavily covered with unittests, to a much more comprehensive degree than `std.random`. Where possible, these unittests will also be ported over to `std.random`.

New random distributions such as the exponential and Pareto distributions will be added in future; the main limit here is simply time, so if anybody wishes to make a contribution to `hap.random`, this is a good place to start. The same goes for new uniform random number generators. There’s a lot here that can simply be ported from Boost.Random if anyone is up for it!

Following discussion with Nick Sabalausky on the D forums, one likely future addition will be the development of *random streams*, that is, uniformly distributed raw binary data implemented via a stream interface. Work on this may be conditional on Steven Schveighoffer’s ongoing work on new standard library stream APIs. A `hap.random.crypto` module for crypto RNGs is also a likely work target at some point.

Other possible projects include improving the algorithms used by different parts of the library: for example, the Box-Muller implementation of `NormalDistribution` could be replaced with a Ziggurat algorithm, and `unpredictableSeed` could be brought in line with the state of the art. It might also be fun to get around to things like reservoir sampling, possibilities for which have previously been discussed but never implemented.

Finally, the unittests, while extensive, could probably do with some tidying. This again offers a nice opportunity for relatively straightforward contribution.

**Why ‘hap’?** The name is Welsh for ‘random’ or ‘chance’, but I think what you’re really asking is, why launch a separate library like this? ;-)

The basic motivation is that in certain fundamental ways, `std.random`’s design is broken and can only be fixed with backwards-incompatible changes. The problem is that RNGs are implemented as value-type structs. This means in turn that data structures that need to store an internal copy of an RNG instance — for example, `RandomSample` — can only take such a copy by value. And hence you can have code like this:

```d
auto rng = Random(unpredictableSeed);
auto sample1 = randomSample(iota(100), 5, rng);
auto sample2 = randomSample(iota(100), 5, rng);
sample1.writeln; // we'd expect this to update rng ...
sample2.writeln; // but this produces the same output!
```

In other words, the obligation to store value copies inevitably leads to unintended and possibly unobserved correlations. With functions, at least, one can avoid this by passing in the RNG parameter via `ref`; however, types like `RandomSample` and `RandomCover` that need to store the RNG internally cannot work around things in this way. Nor can we just store an internal pointer to an RNG instance, because that’s unsafe — for things to work properly, the RNG type itself needs to have reference-type semantics. For example, it would have been pointless trying to add the new random distribution ranges from `hap.random.distribution` to `std.random`; it would just have introduced more broken code.
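Because `hap.random`’s generators are classes, both samples hold a reference to the same underlying RNG, and the problem disappears. A minimal sketch of the example above rewritten for `hap.random`, assuming `sample` mirrors `randomSample`’s signature:

```d
import hap.random;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    auto rng = new Random(unpredictableSeed);
    auto sample1 = sample(iota(100), 5, rng);
    auto sample2 = sample(iota(100), 5, rng);
    sample1.writeln; // consuming sample1 advances the shared RNG state ...
    sample2.writeln; // ... so sample2 produces different output, as hoped
}
```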

This much has long been acknowledged by all who have looked at `std.random`; the question has been what to do about it. One option would be to keep RNGs implemented as structs, but to implement reference semantics around the RNG’s internal state data. This would have the advantage of allowing (in API terms) a drop-in replacement. However, it would still be a breaking change, and worse, the actual changes would be below the radar for users, so if (for example) the allocation strategy used to create the internal payload had problematic effects on speed or garbage collection, users might only find out when their programs started running like treacle. It’s also finicky, because it relies on the programmer manually implementing reference semantics correctly for each type.
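To make the trade-off concrete, here is a minimal sketch of the struct-with-reference-payload idea (illustrative only, not `hap.random`’s design):

```d
// The struct is still copied by value, but every copy shares the same
// heap-allocated state, so copies stay in sync.
struct SharedStateRNG
{
    private uint[] _state; // slice of heap memory, shared between copies

    this(uint seed)
    {
        _state = new uint[1]; // the hidden allocation the text warns about
        _state[0] = seed;
    }

    enum bool empty = false;

    @property uint front() const { return _state[0]; }

    void popFront()
    {
        // Toy linear congruential step, purely illustrative; calling this
        // on any copy advances all copies, since they share _state.
        _state[0] = _state[0] * 1664525u + 1013904223u;
    }
}
```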

Classes give you reference semantics for free, but their use in Phobos is rare and may even be considered somewhat un-idiomatic. Their use also makes for a more intrusive change (the requirement to use `new`), which prevents a drop-in replacement but may be beneficial inasmuch as it forces the user to appreciate where their own code’s behaviour may change. It’s also possible that the different allocation requirements may be problematic in some use-cases.

In short, these kinds of changes need significant testing and review ‘in the wild’ before they might be acceptable in even an experimental Phobos module — certainly before any formal update or successor to `std.random` could be brought in. Hence the need for a third-party library, available via the `code.dlang.org` D package repository, where things can be tried out and changed if necessary.

`hap.random` has benefited greatly from code that others have written. Existing `std.random` code by Andrei Alexandrescu, Chris Cain, Andrej Mitrović and Masahiro Nakagawa made it possible to concentrate primarily on adapting the architecture rather than implementing algorithms. Chris’ work, in a pull request submitted to Phobos but not yet merged, was also the basis of the `DiscreteDistribution` range. Some parts of the code, such as the updated Mersenne Twister, were ported from Boost.Random.

The design of `hap.random` has also been influenced by many useful discussions with members of the D community. Besides the authors already mentioned, I would like to thank H. S. Teoh, Jonathan M. Davis, and particularly bearophile and monarch_dodra.

**Edit:** I’ve made one small authorship correction: Andrej Mitrović, not Nils Boßung, was responsible for `std.random`’s uniform distribution for enum types.

Please use and test `hap.random`, and report your experiences, either here or on the D forums. The future quality of your random numbers may depend upon it. :-)

* * *

This post looks at `dgraph.metric.betweenness`, which implements the betweenness centrality metric for networks. Following the implementation of this algorithm offers some interesting opportunities to examine simple optimization strategies when coding in D — things that experienced D’ers know well but which are easy to miss as a newcomer.

First, some background. The concept of *centrality* in network analysis is used to describe the relative importance of individual vertices or edges in the network’s structure. Multiple different centrality measures exist, including *degree centrality* (how many other vertices are you connected to?), *closeness centrality* (how far do you have to go to reach other vertices?) and *eigenvector centrality* (which derives from the eigenvector with largest eigenvalue of the network’s adjacency matrix).

*Betweenness centrality* is an alternative measure that essentially reflects a vertex’s importance as a channel of communication between other vertices in the network. Specifically, the betweenness centrality of a vertex *v* corresponds to

\[C_{B}(v) = \sum_{s \neq t \neq v \in V}\frac{\sigma_{st}(v)}{\sigma_{st}}\]

where *V* is the set of all vertices in the graph, *σ*_{st} is the total number of shortest paths from vertex *s* to vertex *t*, and *σ*_{st}(*v*) is the number of those paths that pass through *v*.

In non-mathematical terms, what we are doing is to calculate, for every pair of other nodes in the graph, the fraction of shortest paths between them that include the vertex *v* — that is, the importance of *v* as a point of connection in going from one node to the other. We then sum these fractions to get the total betweenness centrality of *v*. For example, in a star network every shortest path between two leaf vertices passes through the hub, so the hub has maximal betweenness while every leaf has zero. Its practical importance is that if we knock out a vertex with high betweenness centrality, we are going to make it much more difficult to travel between other vertices, because for many vertex pairs the network distance between them will have increased.

It’s a tricky quantity to calculate, because to do so for even a single vertex winds up scaling with the size of the whole graph. A popular algorithm for (relatively!) efficient computation of betweenness centrality was proposed by Ulrik Brandes in 2001 in a paper in the Journal of Mathematical Sociology (also available as a University of Konstanz e-print), which scales with *O*(|*V*||*E*|), i.e. the product of the total numbers of vertices and edges.

My original version, which was written before the API change introduced by recent updates, was a fairly straightforward copy of the pseudo-code Brandes offered in his paper:

```d
auto betweenness(T = double, bool directed)(ref Graph!directed g, bool[] ignore)
    if (isFloatingPoint!T)
{
    T[] centrality = new T[g.vertexCount];
    centrality[] = to!T(0);
    size_t[] stack = new size_t[g.vertexCount];
    T[] sigma = new T[g.vertexCount];
    T[] delta = new T[g.vertexCount];
    long[] d = new long[g.vertexCount];
    auto q = VertexQueue(g.vertexCount);

    foreach (s; 0 .. g.vertexCount)
    {
        if (!ignore[s])
        {
            size_t[][] p = new size_t[][g.vertexCount];
            size_t stackLength = 0;
            assert(q.empty);
            sigma[] = to!T(0);
            sigma[s] = to!T(1);
            d[] = -1L;
            d[s] = 0L;
            q.push(s);

            while (!q.empty)
            {
                size_t v = q.front;
                q.pop();
                stack[stackLength] = v;
                ++stackLength;
                foreach (w; g.neighboursOut(v))
                {
                    if (!ignore[w])
                    {
                        if (d[w] < 0L)
                        {
                            q.push(w);
                            d[w] = d[v] + 1L;
                        }
                        if (d[w] == (d[v] + 1L))
                        {
                            sigma[w] = sigma[w] + sigma[v];
                            p[w] ~= v;
                        }
                    }
                }
            }

            delta[] = to!T(0);
            while (stackLength > 0)
            {
                auto w = stack[stackLength - 1];
                --stackLength;
                foreach (v; p[w])
                {
                    delta[v] = delta[v] + ((sigma[v] / sigma[w]) * (to!T(1) + delta[w]));
                }
                if (w != s)
                {
                    centrality[w] = centrality[w] + delta[w];
                }
            }
        }
    }

    static if (!directed)
    {
        centrality[] /= 2;
    }
    return centrality;
}
```

A few small things that may be of interest: first, note the template constraint `if (isFloatingPoint!T)` — the user of this function can optionally specify the return type, but we insist it’s floating point.

Second, the `ignore` array passed to the function is not part of Brandes’ algorithm. A typical application of betweenness centrality is to calculate an attack strategy to break apart a network — you knock out the node with highest betweenness centrality, then recalculate, then knock out the highest remaining node, and so on. Passing an array of boolean types that indicate which nodes have already been knocked out is cheaper and easier than actually modifying the network.

Third, `VertexQueue` is a non-growable circular queue implementation that I knocked up fairly quickly as a simplification of bearophile’s GrowableCircularQueue example on RosettaCode (simplified because we don’t need the growable side of this, as we can guarantee the required capacity). It’s there simply because D’s standard library currently has no queue implementation of its own, and it will be replaced by the standard library version as soon as one is available. Bearophile was kind enough to offer his code under the terms of the Boost licence, enabling its use (and that of its derivatives) in other codebases.

Anyway, this betweenness centrality implementation produces identical results to its igraph counterpart, but is more than 3 times slower — so how do we get it up to par?

Profiling makes clear that memory allocation and garbage collection are a major source of slowdown. Contrary to what some might expect, this isn’t an inevitable consequence of D’s use of garbage collection. My personal experience is that it can help to approach memory allocation in D much as you would in C or C++, which is to say that you should choose the places in which you allocate memory so as to minimize the overall number of allocations and deallocations required. A nice example is probably the most glaring flaw of the above code: the line `size_t[][] p = new size_t[][g.vertexCount];` should very obviously be moved outside the `foreach` loop. I put it where it is because that’s the place in Brandes’ pseudo-code where `p` is declared, but there’s simply no need to re-allocate it entirely from scratch at each pass of the loop: we can allocate before the loop begins, and then inside the loop set `p[] = []`, i.e. set every element of `p` to be an empty array, as sketched below.
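A sketch of that rearrangement (the per-vertex body is elided; `g` is the graph as before):

```d
size_t[][] p = new size_t[][g.vertexCount]; // allocate once, before the loop

foreach (s; 0 .. g.vertexCount)
{
    p[] = []; // reset every element to an empty array; no fresh allocation of p itself
    // ... per-source computation as before ...
}
```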

As it happens, this hardly saves us anything, probably because the D compiler and garbage collector are smart enough to figure out that the memory can be re-used rather than re-allocated at each pass of the loop. A much bigger improvement is gained by some tweaks that I made after examining the igraph implementation: while the original re-initializes the entire arrays `sigma`, `d` and `delta` with each pass of the loop, in fact each array need only be initialized once, and thereafter we can reset only the values that have been touched by the calculation. Similarly, the individual arrays stored in `p` need only have their lengths reset to zero in the event that we actually re-use them in a later pass of the loop. Implementing this optimization reduces the time taken to about 3/4 of that required by the original version — but it’s still much slower than igraph. What else can we do?

Removing the obligation to pass an `ignore` array doesn’t really buy us anything, but there is something that will. Note that this implementation involves a lot of array appending — and doing this with regular arrays and the append operator `~` is not an efficient choice.

Instead, we convert the variable `p` to be an array of `Appender`s, a specially-defined type that is dedicated to efficient array appending. This change results in another huge speedup — taken together with the previous changes, it’s now running at twice the speed of the original implementation.
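The essentials of the `Appender` API as used here, in a self-contained sketch:

```d
import std.array : Appender;

void appenderDemo(size_t vertexCount)
{
    // One Appender per vertex, allocated up front:
    Appender!(size_t[])[] p = new Appender!(size_t[])[vertexCount];

    p[0].put(42);              // amortized-efficient append, unlike `arr ~= 42`
    assert(p[0].data == [42]); // .data exposes the accumulated slice
    p[0].clear();              // keeps the underlying buffer, resets the length
    assert(p[0].data.length == 0);
}
```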

Add in further tweaks to make the function agnostic as to the graph implementation, and we come to the current final form:

```d
auto betweenness(T = double, Graph)(ref Graph g, bool[] ignore = null)
    if (isFloatingPoint!T && isGraph!Graph)
{
    T[] centrality = new T[g.vertexCount];
    centrality[] = to!T(0);
    size_t[] stack = new size_t[g.vertexCount];
    T[] sigma = new T[g.vertexCount];
    T[] delta = new T[g.vertexCount];
    long[] d = new long[g.vertexCount];
    auto q = VertexQueue(g.vertexCount);
    Appender!(size_t[])[] p = new Appender!(size_t[])[g.vertexCount];

    sigma[] = to!T(0);
    delta[] = to!T(0);
    d[] = -1L;

    foreach (immutable s; 0 .. g.vertexCount)
    {
        if (ignore && ignore[s])
        {
            continue;
        }

        size_t stackLength = 0;
        assert(q.empty);
        sigma[s] = to!T(1);
        d[s] = 0L;
        q.push(s);

        while (!q.empty)
        {
            size_t v = q.front;
            q.pop();
            stack[stackLength] = v;
            ++stackLength;
            foreach (immutable w; g.neighboursOut(v))
            {
                if (ignore && ignore[w])
                {
                    continue;
                }

                if (d[w] < 0L)
                {
                    q.push(w);
                    d[w] = d[v] + 1L;
                    assert(sigma[w] == to!T(0));
                    sigma[w] = sigma[v];
                    p[w].clear;
                    p[w].put(v);
                }
                else if (d[w] > (d[v] + 1L))
                {
                    /* w has already been encountered, but we've found a shorter
                       path to the source.  This is probably only relevant to the
                       weighted case, but let's keep it here in order to be ready
                       for that update. */
                    d[w] = d[v] + 1L;
                    sigma[w] = sigma[v];
                    p[w].clear;
                    p[w].put(v);
                }
                else if (d[w] == (d[v] + 1L))
                {
                    sigma[w] += sigma[v];
                    p[w].put(v);
                }
            }
        }

        while (stackLength > to!size_t(0))
        {
            --stackLength;
            auto w = stack[stackLength];
            foreach (immutable v; p[w].data)
            {
                delta[v] += ((sigma[v] / sigma[w]) * (to!T(1) + delta[w]));
            }
            if (w != s)
            {
                centrality[w] += delta[w];
            }
            sigma[w] = to!T(0);
            delta[w] = to!T(0);
            d[w] = -1L;
        }
    }

    static if (!g.directed)
    {
        centrality[] /= 2;
    }
    return centrality;
}
```

There’s probably still more we could do to improve the betweenness centrality calculation — among other things, other researchers have developed a parallelized version of Brandes’ algorithm — but by this point our primary bottleneck is something else: the graph type itself. We’ll discuss this in my next blog post.

**EDIT:** The order of the template parameters for the final version of `betweenness` has been reversed. This means that you can specify the desired return type without having to also specify the graph type (which can be inferred by the compiler).

**EDIT 2:** Bearophile and I had an interesting follow-up discussion on further possibilities for optimization. Some of these I’ll leave to explore in future when Dgraph is more fully developed, but one tweak I made available immediately was to allow users to pass a buffer to `betweenness` for the centrality values to be returned in, as sketched below.
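A hypothetical usage sketch — the buffer parameter’s name and position are assumptions, but the idea is to allocate the results array only once across repeated calls:

```d
void attackSimulation(Graph)(Graph g, bool[] ignore)
    if (isGraph!Graph)
{
    auto buf = new double[g.vertexCount]; // reusable across repeated calls
    auto centrality = betweenness!double(g, ignore, buf);
    // centrality now aliases buf instead of freshly allocated memory
}
```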

* * *

The first and most important change is that where previously there was a single `Graph` class, the concept of a graph has been generalized to any type that implements the expected graph API. A new template, `isGraph`, can be used to check whether a type conforms to expectations.

The original `Graph` class has been renamed to `IndexedEdgeList`, reflecting the name given to this data structure by the authors of igraph. It has been supplemented by a new class, `CachedEdgeList`, which offers greatly enhanced speed at the cost of a higher overall memory footprint. Other graph types will probably be added to the library in the not-too-distant future.

Because of its performance advantages, I recommend using `CachedEdgeList` as the default graph type in your code. The additional memory cost will be insignificant for all except the very largest graphs. The simplest way to adapt your code is therefore to do a search-and-replace of `Graph` with `CachedEdgeList` (or, in regex, `s/Graph/CachedEdgeList/` :-).

A better approach is to adapt your code to take advantage of the `isGraph` template. Here’s an example of a function using the previous version of the library:

```d
void foo(bool directed)(Graph!directed g)
{
    static if (directed)
    {
        // ... a directed graph requires one approach
    }
    else
    {
        // ... an undirected graph requires another
    }
}
```

Now, using `isGraph`, we can generalize this to accept any graph type:

```d
void foo(Graph)(Graph g)   // template parameter Graph could be any type ...
    if (isGraph!Graph)     // ... but we check to make sure it really is a graph
{
    static if (Graph.directed)
    {
        // ... a directed graph requires one approach
    }
    else
    {
        // ... an undirected graph requires another
    }
}
```

You can see examples of this in how the functions in `dgraph.metric` have been adapted for use with the new design.

I’ll be following up with a more in-depth look at the changes, particularly in terms of how they impact performance. In the meantime, feedback is very welcome on the new design, and I hope that any inconvenience caused by the breaking changes is repaid by improved speed and flexibility.

* * *

I’m planning on making a regular habit, whenever writing on D, of highlighting at least one blog piece by another writer. Sometimes this will be related to the work at hand — more often, it’ll just be something that took my fancy, and there will be a definite bias in favour of *new* bloggers with interesting things to say.

On that note, make sure to check out Gary Willoughby’s very nice tutorial on templates in D, and Vladimir’s own recent post on low-overhead components.

And … if you have something to say about D, don’t be shy. Get writing!

* * *

**Interruption!** If you’re interested in following developer blogs on the D programming language, do check out John Colvin’s `foreach(hour; life) {/* do something here */}`. John’s a lovely guy who contributes much helpful advice and insight on the D forums.

Let’s start by reminding ourselves of the basic data structure we’re dealing with:

```d
final class Graph(bool dir)
{
  private:
    size_t[] _head;
    size_t[] _tail;
    size_t[] _indexHead;
    size_t[] _indexTail;
    size_t[] _sumHead = [0];
    size_t[] _sumTail = [0];

  public:
    enum bool directed = dir;

    // ... Yes, there was more than this, but this is what we need to recall for now ...
}
```

`_head` and `_tail` contain respectively the head and tail vertices of each edge; `_indexHead` and `_indexTail` contain indices of the edges sorted respectively according to head and tail IDs; and `_sumHead` and `_sumTail` contain cumulative sums of the number of edges with head/tail less than the given head/tail value. This is a transcription into D of the `igraph_t` datatype from igraph; Gábor Csárdi, one of the authors of igraph, refers to it as an *indexed edge list*.

Let’s start with adding vertices. We could get a bit complicated and make the `vertexCount` property writeable, but for now let’s keep it simple and just match igraph with a function to increase the number of vertices in the graph:

```d
void addVertices(immutable size_t n)
{
    immutable size_t l = _sumHead.length;
    _sumHead.length += n;
    _sumTail.length += n;
    _sumHead[l .. $] = _sumHead[l - 1];
    _sumTail[l .. $] = _sumTail[l - 1];
}
```

All we have to do is make sure that the cumulative sums of edges get extended, and since the new vertices don’t have any edges, this is a simple matter of copying the final value from the original sum array. We’re doing this using D’s slicing syntax: `arr[i .. j]` simply means the sub-array going from the `i`’th index up to the `j - 1`’st index; the dollar sign `$` is shorthand for the array’s length. If you’re interested, you can read in much more detail about slices in Ali Çehreli’s excellent online book.
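A self-contained illustration of the slice operations used above:

```d
void main()
{
    auto arr = [10, 20, 30, 40, 50];
    assert(arr[1 .. 3] == [20, 30]);     // indices 1 up to (not including) 3
    assert(arr[2 .. $] == [30, 40, 50]); // $ is shorthand for arr.length
    arr[3 .. $] = 0;                     // array-wise assignment to a slice
    assert(arr == [10, 20, 30, 0, 0]);
}
```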

Anyway, adding vertices is boring — it’s adding edges where the exciting stuff starts to happen. In principle there are two ways we can do this: we can add individual edges, one at a time, or we can add *many* edges, all in one go; igraph defines two different functions, `igraph_add_edge` and `igraph_add_edges`, to do this.

We’ll go with adding one at a time to start with. The task is simple — we need to add the head and tail values, place the new edge ID in its correct location in the head/tail-sorted indices, and increment the sum values:

```d
  public:
    void addEdge(size_t head, size_t tail)
    {
        static if (!directed)
        {
            if (tail < head)
            {
                swap(head, tail);
            }
        }
        _head ~= head;
        _tail ~= tail;
        indexEdges();
        ++_sumHead[head + 1 .. $];
        ++_sumTail[tail + 1 .. $];
    }

  private:
    void indexEdges()
    {
        _indexHead ~= iota(_indexHead.length, _head.length).array;
        _indexTail ~= iota(_indexTail.length, _tail.length).array;
        _indexHead.sort!((a, b) => _head[a] < _head[b]);
        _indexTail.sort!((a, b) => _tail[a] < _tail[b]);
    }
```

I’m actually cheating a bit here, because when I originally wrote the `indexEdges()` function, I used `schwartzSort!(a => _head[a], "a < b")` and not the `sort` formulation given here. `schwartzSort` caches the results of the transformation `a => f(a)`, making it cheaper where that transformation is costly; but as `a => _head[a]` is so cheap, we’re actually better off using regular `sort`, and the `(a, b)` lambda here provides the rule for determining whether or not `a` should come before `b` in the sort order.

Note that we insist that in an undirected graph, the head of an edge has a vertex ID less than or equal to that of the tail. This may seem like a fussy convention but it’s actually useful — we’ll see why later.

Anyway, this all seems nice and in order, so why not test it? Something like this:

```d
// Let's suppose we have a list of edges stored as an array ...
size_t[] edges = [head1, tail1, head2, tail2 /* ... and so on */];

// We create an undirected graph (but it could also be directed if we wanted:-)
auto g = new Graph!false;
g.addVertices(/* however many we need */);

foreach (i; 0 .. edges.length / 2)
{
    g.addEdge(edges[2 * i], edges[2 * i + 1]);
}
```

Benchmarking this with a 50-vertex graph with 200 edges, the D code comes out more than 10 times faster than similar code written using igraph. A win, you think? You speak too soon — if we try 10,000 vertices and 20,000 edges, the D code is over 7 times slower! You might think that something must be seriously wrong here, and you’d be right — in fact, we’ve run into an unfortunate bug in D’s sort algorithm. In the corner case where an array is almost entirely sorted except for the last few entries, the sort displays worst-case behaviour and scales with *n*^{2} rather than the expected *n* log *n*. Fortunately for us there’s an alternative sorting algorithm available, which uses Timsort rather than the standard Quicksort:

```d
_indexHead.sort!((a, b) => _head[a] < _head[b], SwapStrategy.semistable);
_indexTail.sort!((a, b) => _tail[a] < _tail[b], SwapStrategy.semistable);
```

This actually works out slightly slower on the smaller graph, but gives a major speedup for the larger one — suddenly what was taking over a minute to run on my system is running in less than a second! Most likely the scaling bug wasn’t being triggered at all for the smaller network, but kicks in as the length of the arrays gets above a certain size, and *O*(*n*^{2}) as *n* → several thousand is going to leave a mark …

… but wait a minute. Aren’t we missing something here? We’re talking about adding one (or very few) new values to an already sorted list — we shouldn’t be using QuickSort or TimSort at all, we should be using an insertion sort. And it turns out that we can write this very simply indeed — even simpler than I’d anticipated, thanks to a technique pointed out to me by Dmitry Olshansky:

```d
void indexEdgesInsertion()
{
    immutable size_t l = _indexHead.length;
    foreach (e; iota(l, _head.length))
    {
        size_t i = _indexHead.map!(a => _head[a]).assumeSorted.lowerBound(_head[e]).length;
        insertInPlace(_indexHead, i, e);
        i = _indexTail.map!(a => _tail[a]).assumeSorted.lowerBound(_tail[e]).length;
        insertInPlace(_indexTail, i, e);
    }
}
```

`lowerBound` uses a binary search to extract from the already-sorted index list the largest left-hand part of the array such that these edges’ head/tail values are all less than the head/tail value of our new edge. `insertInPlace` then drops the new index value into its correct location. This is going to be expensive if instead we’re adding multiple edges in one go, so let’s make it nicer by doing all the potential memory allocation in one go:

```d
void indexEdgesInsertion()
{
    immutable size_t l = _indexHead.length;
    _indexHead.length = _head.length; // all the allocation
    _indexTail.length = _tail.length; // at once ...
    foreach (e; iota(l, _head.length))
    {
        size_t i, j;
        i = _indexHead[0 .. e].map!(a => _head[a]).assumeSorted.lowerBound(_head[e]).length;
        for (j = e; j > i; --j)
            _indexHead[j] = _indexHead[j - 1];
        _indexHead[i] = e;
        i = _indexTail[0 .. e].map!(a => _tail[a]).assumeSorted.lowerBound(_tail[e]).length;
        for (j = e; j > i; --j)
            _indexTail[j] = _indexTail[j - 1];
        _indexTail[i] = e;
    }
}
```

… and suddenly our adding of new nodes is super-fast all round — whereas it takes igraph about 13 seconds on my machine to build a 10,000-vertex, 20,000-edge graph adding edges one at a time, the D code here delivers it in less than 0.1 seconds. Woop!

* * *

Let’s introduce a note of cold sobriety, before we go rushing off and opening the champagne. To compare with igraph like this may give a frisson of pleasure, but we’re not comparing like with like. igraph’s `igraph_add_edge` function simply wraps its function for adding multiple edges:

```c
int igraph_add_edge(igraph_t *graph, igraph_integer_t from, igraph_integer_t to)
{
    igraph_vector_t edges;
    int ret;

    IGRAPH_VECTOR_INIT_FINALLY(&edges, 2);
    VECTOR(edges)[0] = from;
    VECTOR(edges)[1] = to;

    IGRAPH_CHECK(ret = igraph_add_edges(graph, &edges, 0));

    igraph_vector_destroy(&edges);
    IGRAPH_FINALLY_CLEAN(1);
    return ret;
}
```

This is bound to take a performance hit, because the function for adding multiple edges does a complete rearrangement of the sorted index and sum arrays — whereas our `addEdge` function just tweaks the bits it needs to. The same could be implemented in igraph, and it would almost certainly beat the present D code in speed. The take-home point here isn’t that D is able to be faster — it’s that because we can write so quickly and simply and effectively in D, we had the time and mental space to implement a tailored solution like this from the get-go. (It also helps that D’s community is very friendly and helpful with suggestions!)

The second sobriety check comes when we try using igraph how it’s meant to be used — passing the graph all the edges in one big go as a vector. Suddenly it’s generating that big 10,000-vertex, 20,000-edge network in about 0.001 seconds! We clearly have some catching up to do.

Let’s assume, as igraph does (and as we already did in an example above), that our list of many edges is going to come in the form of an array `edges = [head1, tail1, head2, tail2, head3, tail3, ...]` (maybe we’ll change this in future, but for now it serves). So now let’s write a variant of `addEdge` that will accept this:

```d
void addEdge(T : size_t)(T[] edgeList)
{
    immutable size_t l = _head.length;
    _head.length += edgeList.length / 2;
    _tail.length += edgeList.length / 2;
    foreach (i; 0 .. edgeList.length / 2)
    {
        size_t head = edgeList[2 * i];
        size_t tail = edgeList[2 * i + 1];
        static if (!directed)
        {
            if (tail < head)
            {
                swap(head, tail);
            }
        }
        _head[l + i] = head;
        _tail[l + i] = tail;
    }
    indexEdgesInsertion();
    sumEdges(_sumHead, _head, _indexHead);
    sumEdges(_sumTail, _tail, _indexTail);
}

void sumEdges(ref size_t[] sum, ref size_t[] vertex, ref size_t[] index)
{
    size_t v = vertex[index[0]];
    sum[0 .. v + 1] = 0;
    for (size_t i = 1; i < index.length; ++i)
    {
        size_t n = vertex[index[i]] - vertex[index[sum[v]]];
        sum[v + 1 .. v + n + 1] = i;
        v += n;
    }
    sum[v + 1 .. $] = vertex.length;
}
```

So, we add the head/tail values to their respective arrays, we sort the indices, and we recalculate from scratch the `_sumHead` and `_sumTail` arrays. If a large number of edges has been added, the latter should be faster than just incrementing the sums one by one as we did when adding a single edge. For now we’ll stick with the insertion sort — it’s interesting to examine how it performs.

We’re obviously doing something right, because on my machine the time to create the smaller, 50-vertex graph is cut to less than a quarter of what it was before, placing it on a par with igraph. For the 10,000-vertex graph it’s almost 5 times faster than it was adding edges one at a time — but that’s still an order of magnitude slower than igraph. Presumably this is down to the sorting technique — if we’re adding many edges all at once, then insertion sort is probably no longer optimal. Let’s try out a regular sort:

```d
void indexEdgesSort()
{
    _indexHead ~= iota(_indexHead.length, _head.length).array;
    _indexTail ~= iota(_indexTail.length, _tail.length).array;
    assert(_indexHead.length == _indexTail.length);
    _indexHead.sort!((a, b) => _head[a] < _head[b]);
    _indexTail.sort!((a, b) => _tail[a] < _tail[b]);
}
```

This cuts the creation time for the 50-vertex graph very slightly, and for the 10,000-vertex graph very significantly — the latter is about 8 times faster than it was before! This is probably still very slightly slower than igraph, but only slightly — it’s about 0.002 seconds to generate the larger graph with D compared to about 0.0015 with igraph. (In case you’re wondering, this is an average over 1000 realizations, which takes about 1.5s for igraph, just over 2 for the D code.)

Clearly we’re standing on the shoulders of giants — we’re able to develop these solutions quickly and easily because we have the example of igraph already there in front of us to reverse-engineer. The `sumEdges()` function, for example, is a fairly straight D-ification of igraph’s `igraph_i_create_start()` function. But as with the previous post, the key thing to note is how cleanly we can implement these ideas; there’s no unnecessary cruft or boilerplate needed to ensure the code operates safely, and many of the solutions are taken straight out of the D standard library, Phobos. Others, such as sorting by head or tail, can be implemented very simply because of the elegant syntax D offers us for delegates and lambdas.

* * *

Actually we’re not quite finished here. We sorted `_indexHead` and `_indexTail` according to `_head` and `_tail` values respectively, but in fact we want a double sort criterion, as there is in igraph: `_indexHead` should be sorted primarily with respect to `_head`, but secondarily with respect to `_tail`; and vice versa for `_indexTail`. Thanks to suggestions from John Colvin and bearophile on the D forums, we know there is a ready solution for this in the form of the `multiSort` function:

```d
void indexEdgesSort()
{
    _indexHead ~= iota(_indexHead.length, _head.length).array;
    _indexTail ~= iota(_indexTail.length, _tail.length).array;
    _indexHead.multiSort!((a, b) => _head[a] < _head[b], (a, b) => _tail[a] < _tail[b]);
    _indexTail.multiSort!((a, b) => _tail[a] < _tail[b], (a, b) => _head[a] < _head[b]);
}
```

… while a small tweak to the insertion sort ensures correct ordering there as well:

```d
void indexEdgesInsertion()
{
    immutable size_t l = _indexHead.length;
    _indexHead.length = _head.length;
    _indexTail.length = _tail.length;
    foreach (e; l .. _head.length)
    {
        size_t i, j, lower, upper;
        upper = _indexHead[0 .. e].map!(a => _head[a]).assumeSorted.lowerBound(_head[e] + 1).length;
        lower = _indexHead[0 .. upper].map!(a => _head[a]).assumeSorted.lowerBound(_head[e]).length;
        i = lower + _indexHead[lower .. upper].map!(a => _tail[a]).assumeSorted.lowerBound(_tail[e]).length;
        for (j = e; j > i; --j)
            _indexHead[j] = _indexHead[j - 1];
        _indexHead[i] = e;
        upper = _indexTail[0 .. e].map!(a => _tail[a]).assumeSorted.lowerBound(_tail[e] + 1).length;
        lower = _indexTail[0 .. upper].map!(a => _tail[a]).assumeSorted.lowerBound(_tail[e]).length;
        i = lower + _indexTail[lower .. upper].map!(a => _head[a]).assumeSorted.lowerBound(_head[e]).length;
        for (j = e; j > i; --j)
            _indexTail[j] = _indexTail[j - 1];
        _indexTail[i] = e;
    }
}
```

We do take a performance hit here — we’re no longer level pegging with igraph, although we might get back that speed by instead using the radix sort that igraph uses. But the hit is worth it: among other things, this extra level of sorting means we are guaranteed to get a sorted list out of the `neighbours` function, but its main application is in making it easier to search for edges and their place in the edge list. We’ll examine this next time, when we look in more detail at the range of queries we can make of a graph object, and how they shape up in performance.

**Edit 19-07-2013:** I’ve tweaked the article to make clear that the double sort criterion described here is also used by igraph and — unsurprisingly — was the inspiration for doing it here as well.