hap.random — a new random number library for D | mad science, dreams and diversions

One of my great pleasures of the last 2–3 years has been contributing to the D programming language’s standard library, and in particular, its random-number-generation module std.random. D’s range-based approach to random number generation makes for elegant and beautiful code, and there is support for a broad range of random number generators and related functionality. However, there have always been a few annoying issues with this module that simply cannot be resolved without breaking changes. It’s with the resolution of these issues in mind that I’d like to announce a new random number library for D: hap.random.

This is intended as a candidate for later standard library inclusion, and so I would like to request that anyone with an interest in such functionality take the opportunity to try it out and report any issues that arise. An alpha draft was announced on the D forums under the name std.random2; the current library is a version 1.0.0 release candidate.

The essentials

Before discussing anything else, here’s how you can get your hands on hap.random and try it out:

Requires D 2.065 or later.
Source code is available from https://github.com/WebDrake/hap.
Library documentation (including migration information) is at http://code.braingam.es/hap/random/.
It’s available as a dub package named hap.

What’s new or different

hap.random implements broadly the same API as std.random, but the RNG types and those that derive from them are implemented as reference types (specifically, as final classes; we’ll discuss the reasons for this later in this post). As much code as possible has been re-used from std.random, which is reflected in the author and copyright credits.

Some code has also flowed back from hap.random during the development process, including fixes for uniform and partialShuffle and a fast uniform01 implementation ported from Boost.Random; more such patches are currently under review, and others are planned for the future, so even if you never use hap.random yourself, you will get some of its benefits. Since the first preview of what is now hap.random, some improvements have been independently implemented in std.random by other developers, such as the use of pure, nothrow and @safe function attributes.

Library structure. The library is separated into four different modules:

hap.random.generator — pseudo-random number generators

hap.random.traits — compile-time template checks for RNG-using code
hap.random.distribution — random distributions such as uniform, normal, etc.
hap.random.adaptor — shuffling, sampling, and similar.

Importing the package hap.random will bring in all of these, so you can replace your code’s import std.random statements very easily.

hap.random uses the modern D convention of nesting module imports as deeply as possible inside the code requiring them. This should help to avoid unnecessary dependencies being pulled in.

Naming differences. Several function and type names have been simplified compared to std.random:

randomShuffle is now simply shuffle;
randomSample (helper function) and RandomSample (struct) are now sample (function) and Sample (class) respectively;
similarly, randomCover and RandomCover are now cover and Cover.

Aliases are provided so as to allow ease of migration, so if you prefer the old names, there’s no need to change.

New random distributions. hap.random.distribution introduces range-type random distributions, which offer a way to draw an endless sequence of random variates from a given distribution, via an Input or Forward Range interface. Currently the following distributions are implemented:

UniformDistribution, a range-based equivalent to the uniform function;
Uniform01Distribution, a faster floating point uniform distribution drawing variates from the half-open interval [0, 1), and its function equivalent uniform01;
NormalDistribution, and a (less efficient) function implementation, normal;
DiscreteDistribution, a range-based equivalent to dice which offers significant speed boost.

The dice function has been updated to use an improved algorithm that matches the output of DiscreteDistribution. Consequently, this is the one part of hap.random that can be expected to produce different behaviour to its std.random counterpart.

New random number generators. hap.random.generator has a couple of new features compared to std.random:

Mt19937_64, a 64-bit Mersenne Twister;
UniformRNGTypes, a typetuple of all the uniform random number generators implemented in hap.random.generator.

The latter can be particularly useful for testing purposes, if you want to check that RNG-using code is compatible with all the potential generators that could be used.

New compile-time checks. hap.random.traits has two new features:

the isUniformRNG template implements stricter checks that verify properties uniform random number generators are expected to possess;
the isRandomDistribution template checks that a type implements a random distribution range interface.

Experimental features. Besides the well-developed modules listed above, hap.random also includes an experimental module, hap.random.device, which implements a very provisional first go at providing access to ‘true’ or otherwise unpredictable sources of randomness such as hardware random devices. The existing functionality will work only for Posix systems, and provides access to /dev/random and /dev/urandom. The API here is extremely likely to change in future, so use with caution; for this reason, the module is not included in those imported via import hap.random but must be imported separately.

Test coverage. With the exception of the experimental hap.random.device, all code in hap.random is heavily covered with unittests, to a much more comprehensive degree than std.random. Where possible, these unittests will be ported over.

What’s planned for the future

New random distributions such as the exponential and Pareto distributions will be added in future; the main limits here are simply time, so if anybody wishes to make a contribution to hap.random this is a good place to start. The same goes for new uniform random number generators. There’s a lot here that can simply be ported from Boost.Random if anyone is up for it!

Following discussion with Nick Sabalausky on the D forums, one likely future addition will be the development of random streams, that is, uniformly distributed raw binary data implemented via a stream interface. Work on this may be conditional on Steven Schveighoffer’s ongoing work for new standard library stream APIs. A hap.random.crypto module for crypto RNGs is also a likely work target at some point.

Other possible projects include improving the algorithms used by different parts of the library: for example, the Box-Muller implementation of NormalDistribution could be replaced with a Ziggurat algorithm, and unpredictableSeed could be brought in line with the state of the art. It might also be fun to get around to things like reservoir sampling, possibilities for which have previously been discussed but never implemented.

Finally, the unittests, while extensive, could probably do with some tidying. This again offers a nice opportunity for relatively straightforward contribution.

So … why hap?

The name is Welsh for ‘random’ or ‘chance’, but I think what you’re really asking is, why launch a separate library like this? ;-)

The basic motivation is that in certain fundamental ways, std.random’s design is broken and can only be fixed with backwards-incompatible changes. The problem is that RNGs are implemented as value-type structs. This means in turn that data structures that need to store an internal copy of an RNG instance — for example, RandomSample — can only take such a copy by value. And hence you can have code like this:

auto rng = Random(unpredictableSeed);

auto sample1 = randomSample(iota(100), 5, rng);

auto sample2 = randomSample(iota(100), 5, rng);

sample1.writeln; // we'd expect this to update rng ...

sample2.writeln; // but this produces the same output!

In other words, the obligation to store value copies inevitably leads to unintended and maybe unobservable correlations. With functions, at least one can avoid this by passing in the RNG parameter via ref; however, types like RandomSample and RandomCover that need to store the RNG internally cannot work around things in this way. Nor can we just store an internal pointer to an RNG instance, because that’s unsafe — in order for things to work properly, the RNG type itself needs to have reference type semantics. For example, it would have been pointless trying to add the new random distribution ranges from hap.random.distribution to std.random; it would just have introduced more broken code.

This much has long been acknowledged by all who have looked at std.random; the question has been what to do about it. One option would be to keep RNGs implemented as structs, but to implement reference semantics around the RNG’s internal state data. This would have the advantage of (in API terms) allowing a drop-in replacement. However, it would still be a breaking change, and worse, the actual changes would be below-the-radar for users, so if (for example) the allocation strategy used to create the internal payload had problematic effects on speed or garbage collection, users might only find out when their programs started running like treacle. It’s also finnicky, because it relies on the programmer manually implementing reference semantics correctly for each type.

Classes give you reference semantics for free, but their use in Phobos is rare and may even be considered somewhat un-idiomatic. Their use also makes for a more intrusive change (the requirement to use new), which prevents a drop-in replacement but may be beneficial inasmuch as it forces the user to appreciate where their own code’s behaviour may change. It’s also possible that the different allocation requirements may be problematic in some use-cases.

In short, these kind of changes need significant testing and review ‘in the wild’ before they might be acceptable in even an experimental Phobos module — certainly before any formal update or successor to std.random could be brought in. Hence the need for a 3rd-party library, available via the code.dlang.org D package repository, where things can be tried out and changed if necessary.

Acknowledgements

hap.random has benefited greatly from code that others have written. Existing std.random code by Andrei Alexandrescu, Chris Cain, Andrej Mitrović and Masahiro Nakagawa made it possible to concentrate primarily on adapting the architecture rather than implementing algorithms. Chris’ work, in a pull request submitted to Phobos but not yet merged, was also the basis of the DiscreteDistribution range. Some parts of the code, such as the updated Mersenne Twister, were ported from Boost.Random.

The design of hap.random has also been influenced by many useful discussions with members of the D community. Besides authors already mentioned, I would like to thank H. S. Teoh, Jonathan M. Davis, and particularly bearophile and monarch_dodra.

Edit: I’ve made one small authorship correction: Andrej Mitrović, not Nils Boßung, was responsible for std.random’s uniform distribution for enum types.

In brief?

Please use and test hap.random, and report your experiences, either here or on the D forums. The future quality of your random numbers may depend upon it. :-)