One of my great pleasures of the last 2–3 years has been contributing to the D programming language’s standard library, and in particular, its random-number-generation module
std.random. D’s range-based approach to random number generation makes for elegant and beautiful code, and there is support for a broad range of random number generators and related functionality. However, there have always been a few annoying issues with this module that simply cannot be resolved without breaking changes. It’s with the resolution of these issues in mind that I’d like to announce a new random number library for D:
This is intended as a candidate for later standard library inclusion, and so I would like to request that anyone with an interest in such functionality take the opportunity to try it out and report any issues that arise. An alpha draft was announced on the D forums under the name
std.random2; the current library is a version 1.0.0 release candidate.
Before discussing anything else, here’s how you can get your hands on
hap.random and try it out:
What’s new or different
hap.random implements broadly the same API as
std.random, but the RNG types and those that derive from them are implemented as reference types (specifically, as final classes; we’ll discuss the reasons for this later in this post). As much code as possible has been re-used from
std.random, which is reflected in the author and copyright credits.
Some code has also flowed back from
hap.random during the development process, including fixes for
partialShuffle and a fast
uniform01 implementation ported from Boost.Random; more such patches are currently under review, and others are planned for the future, so even if you never use
hap.random yourself, you will get some of its benefits. Since the first preview of what is now
hap.random, some improvements have been independently implemented in
std.random by other developers, such as the use of
@safe function attributes.
Library structure. The library is separated into four different modules:
hap.random.generator — pseudo-random number generators
hap.random.traits — compile-time template checks for RNG-using code
hap.random.distribution — random distributions such as uniform, normal, etc.
hap.random.adaptor — shuffling, sampling, and similar.
Importing the package
hap.random will bring in all of these, so you can replace your code’s
import std.random statements very easily.
hap.random uses the modern D convention of nesting module imports as deeply as possible inside the code requiring them. This should help to avoid unnecessary dependencies being pulled in.
Naming differences. Several function and type names have been simplified compared to
randomShuffle is now simply
randomSample (helper function) and
RandomSample (struct) are now
sample (function) and
Sample (class) respectively;
RandomCover are now
Aliases are provided so as to allow ease of migration, so if you prefer the old names, there’s no need to change.
New random distributions.
hap.random.distribution introduces range-type random distributions, which offer a way to draw an endless sequence of random variates from a given distribution, via an Input or Forward Range interface. Currently the following distributions are implemented:
UniformDistribution, a range-based equivalent to the
Uniform01Distribution, a faster floating point uniform distribution drawing variates from the half-open interval [0, 1), and its function equivalent
NormalDistribution, and a (less efficient) function implementation,
DiscreteDistribution, a range-based equivalent to
dice which offers significant speed boost.
dice function has been updated to use an improved algorithm that matches the output of
DiscreteDistribution. Consequently, this is the one part of
hap.random that can be expected to produce different behaviour to its
New random number generators.
hap.random.generator has a couple of new features compared to
Mt19937_64, a 64-bit Mersenne Twister;
UniformRNGTypes, a typetuple of all the uniform random number generators implemented in
The latter can be particularly useful for testing purposes, if you want to check that RNG-using code is compatible with all the potential generators that could be used.
New compile-time checks.
hap.random.traits has two new features:
isUniformRNG template implements stricter checks that verify properties uniform random number generators are expected to possess;
isRandomDistribution template checks that a type implements a random distribution range interface.
Experimental features. Besides the well-developed modules listed above,
hap.random also includes an experimental module,
hap.random.device, which implements a very provisional first go at providing access to ‘true’ or otherwise unpredictable sources of randomness such as hardware random devices. The existing functionality will work only for Posix systems, and provides access to
/dev/urandom. The API here is extremely likely to change in future, so use with caution; for this reason, the module is not included in those imported via
import hap.random but must be imported separately.
Test coverage. With the exception of the experimental
hap.random.device, all code in
hap.random is heavily covered with unittests, to a much more comprehensive degree than
std.random. Where possible, these unittests will be ported over.
What’s planned for the future
New random distributions such as the exponential and Pareto distributions will be added in future; the main limits here are simply time, so if anybody wishes to make a contribution to
hap.random this is a good place to start. The same goes for new uniform random number generators. There’s a lot here that can simply be ported from Boost.Random if anyone is up for it!
Following discussion with Nick Sabalausky on the D forums, one likely future addition will be the development of random streams, that is, uniformly distributed raw binary data implemented via a stream interface. Work on this may be conditional on Steven Schveighoffer’s ongoing work for new standard library stream APIs. A
hap.random.crypto module for crypto RNGs is also a likely work target at some point.
Other possible projects include improving the algorithms used by different parts of the library: for example, the Box-Muller implementation of
NormalDistribution could be replaced with a Ziggurat algorithm, and
unpredictableSeed could be brought in line with the state of the art. It might also be fun to get around to things like reservoir sampling, possibilities for which have previously been discussed but never implemented.
Finally, the unittests, while extensive, could probably do with some tidying. This again offers a nice opportunity for relatively straightforward contribution.
So … why hap?
The name is Welsh for ‘random’ or ‘chance’, but I think what you’re really asking is, why launch a separate library like this? ;-)
The basic motivation is that in certain fundamental ways,
std.random’s design is broken and can only be fixed with backwards-incompatible changes. The problem is that RNGs are implemented as value-type structs. This means in turn that data structures that need to store an internal copy of an RNG instance — for example,
RandomSample — can only take such a copy by value. And hence you can have code like this:
auto rng = Random(unpredictableSeed);
auto sample1 = randomSample(iota(100), 5, rng);
auto sample2 = randomSample(iota(100), 5, rng);
sample1.writeln; // we'd expect this to update rng ...
sample2.writeln; // but this produces the same output!
In other words, the obligation to store value copies inevitably leads to unintended and maybe unobservable correlations. With functions, at least one can avoid this by passing in the RNG parameter via
ref; however, types like
RandomCover that need to store the RNG internally cannot work around things in this way. Nor can we just store an internal pointer to an RNG instance, because that’s unsafe — in order for things to work properly, the RNG type itself needs to have reference type semantics. For example, it would have been pointless trying to add the new random distribution ranges from
std.random; it would just have introduced more broken code.
This much has long been acknowledged by all who have looked at
std.random; the question has been what to do about it. One option would be to keep RNGs implemented as structs, but to implement reference semantics around the RNG’s internal state data. This would have the advantage of (in API terms) allowing a drop-in replacement. However, it would still be a breaking change, and worse, the actual changes would be below-the-radar for users, so if (for example) the allocation strategy used to create the internal payload had problematic effects on speed or garbage collection, users might only find out when their programs started running like treacle. It’s also finnicky, because it relies on the programmer manually implementing reference semantics correctly for each type.
Classes give you reference semantics for free, but their use in Phobos is rare and may even be considered somewhat un-idiomatic. Their use also makes for a more intrusive change (the requirement to use
new), which prevents a drop-in replacement but may be beneficial inasmuch as it forces the user to appreciate where their own code’s behaviour may change. It’s also possible that the different allocation requirements may be problematic in some use-cases.
In short, these kind of changes need significant testing and review ‘in the wild’ before they might be acceptable in even an experimental Phobos module — certainly before any formal update or successor to
std.random could be brought in. Hence the need for a 3rd-party library, available via the
code.dlang.org D package repository, where things can be tried out and changed if necessary.
hap.random has benefited greatly from code that others have written. Existing
std.random code by Andrei Alexandrescu, Chris Cain, Andrej Mitrović and Masahiro Nakagawa made it possible to concentrate primarily on adapting the architecture rather than implementing algorithms. Chris’ work, in a pull request submitted to Phobos but not yet merged, was also the basis of the
DiscreteDistribution range. Some parts of the code, such as the updated Mersenne Twister, were ported from Boost.Random.
The design of
hap.random has also been influenced by many useful discussions with members of the D community. Besides authors already mentioned, I would like to thank H. S. Teoh, Jonathan M. Davis, and particularly bearophile and monarch_dodra.
Edit: I’ve made one small authorship correction: Andrej Mitrović, not Nils Boßung, was responsible for
std.random’s uniform distribution for enum types.
Please use and test
hap.random, and report your experiences, either here or on the D forums. The future quality of your random numbers may depend upon it. :-)