
When supercomputers go over to the dark side

Despite oodles of data and plenty of theories, we still don’t know what dark matter is. Martin White and Pat Scott describe how a new software tool called GAMBIT – run on supercomputers such as Prometheus – will test how novel theories stack up when confronted with real data

The most memorable thrillers are those where you get part-way through and then suddenly realize that nothing is as you imagined. Unexpected scientific paradigm shifts, where reality turns out not to be as we believed, can be just as exciting and perplexing. One such dramatic change in perspective has been the dawning realization over the last few decades that “ordinary” matter accounts for just a fifth of the matter in the universe, with the rest made of a mysterious “dark” matter. Physicists love unsolved problems, and they don’t come much bigger than working out the nature of this dark stuff.

If a blockbuster movie is ever made about the discovery of dark matter, the next decade may well be the climax. New data from experiments such as CERN’s Large Hadron Collider (LHC) are telling us more about what dark matter can and cannot be, while the recent discovery of gravitational waves reminds us that even century-old theories (general relativity in this case) can be spectacularly confirmed in the blink of an eye (or the chirp of a coalescing black hole). Meanwhile, searches for dark matter here on Earth are entering a crucial phase. Detectors weighing over a tonne are being placed deep underground to try to catch dark matter passing through the Earth itself, while construction is now starting on the Cherenkov Telescope Array – one of the finest facilities for hunting the annihilation of dark matter in outer space. These probes are so sensitive that they will be able to test – and potentially discover – many leading models of dark matter over the next 10 years.

What is less clear is how to make sense of these disparate datasets. Imagine being shown a 10-acre field and told a tiny new insect is hiding somewhere in the grass. You are also given, every few days, a new photo of a cow with a distinctive bite mark, and three distorted recordings of a strange chirrup (one of which is actually from a dying grasshopper). After 30 years – having seen a lot of cows and heard a lot of chirrups – your job is to determine the properties of the insect. Unfortunately, you have never seen it, and have only a vague hunch about its size given that it has never killed any of the cows.

Replace the insect with dark matter, and the photos and chirrups with astrophysical data (which may or may not be related to it), and you’ll see why studying dark matter is so hard. Still, hunters of dark matter perhaps have an easier job than our hypothetical entomologist, as they have much more data, plus decades of accurate astronomical observations (see box).

These include measurements of the orbital speed of stars as a function of their radial distance from their galaxy’s centre, so-called “galactic rotation curves”, which deviate from the form expected based on visible matter. There are gravitational-lensing observations – in which massive objects bend light as predicted by general relativity – that let us measure a system’s mass independently of the visible matter. Finally, we have very precise measurements of the temperature fluctuations in the cosmic microwave background (CMB), which are highly sensitive to the assumed matter content of the early universe.

But despite so much information, all we have are decades of null results – we only know that the insect is neither bee nor beetle, not what it is. Any one experiment we carry out has a huge range of possible outcomes, depending on what particle dark matter might be. In fact, the only way of learning the true nature of dark matter is to piece together clues from many different experiments, and seek distinct patterns in observations from totally different sources, such as evidence of a single property emerging in different ways in different datasets.

This quest puts us firmly in the realm of data science, which is booming thanks to the abundance of data provided by everything from social-media applications to government spies and noisy grasshoppers. The cross-pollination of techniques from a variety of disciplines is transforming how we look at and model the world, with research into dark matter benefiting from, and contributing to, this expertise. Machine-learning techniques, including neural networks, for example, are being used to improve algorithms for identifying particles at the LHC.

Genesis of GAMBIT

The GAMBIT (Global and Modular Beyond-the-Standard-Model Inference Tool) Collaboration can trace its origins back to the 2012 International Conference on High Energy Physics in Melbourne, Australia, where the discovery of the Higgs boson was announced via a live video link to CERN in Geneva. Celebrating in the bars on the south bank of the Yarra river, a group of us began a series of spirited discussions about how to find new physics, and identify dark matter, using software. Building on previous work by the particle and astroparticle communities, we dreamt of designing the first software package that could take generic theories of new physics and determine their viability by feeding them with data from all relevant particle and astroparticle experiments.

Realizing we’d need to recruit experts in everything from collider physics and astrophysics to cosmic-ray propagation, statistics and sampling theory, we grew the team by inviting members of our extended collaboration networks. By involving experts from every corner of particle and astroparticle physics, our aim was to ensure that every theoretical and experimental technique covered by GAMBIT is done carefully and rigorously. After a kick-off meeting at CERN in 2012, we’ve held online meetings and face-to-face workshops in Sweden, Australia, Norway, Mexico and Scotland, with a two-hour walk to the pub in the Scottish Highlands giving us the most effective example of gradient-descent techniques we have yet encountered (see photo).

Sloping away Members of the GAMBIT Collaboration take to the Scottish Highlands to test gradient-descent optimization methods firsthand. (GAMBIT Collaboration)

The GAMBIT team grew as we branched out into new areas of expertise. We now have about 30 researchers from Australia, Europe and North America, including members of most major particle and astroparticle experiments, split into overlapping working groups. For each recruit, a proven track record and enthusiasm for software development are essential. All code and collaboration materials are kept in a shared repository, to which all members of GAMBIT have full access.

Principles in practice

Our basic idea is to determine which theories for dark matter and its new particle friends are most likely to be correct, by performing a joint analysis of all the different searches. This essentially involves using known statistical methods to write an equation for a “composite likelihood function”, which describes the probability that a given theory will lead to the full range of results seen at different experiments. We then pick an algorithm for scanning over the parameters of a particular theory before simulating the signals expected at each experiment for each combination of parameters. The algorithm might, for example, choose a set of parameters for a model of dark matter interacting with a Higgs boson, after which GAMBIT tests if the resulting model is compatible with CMB observations, null observations in Earth-bound detectors, and observations of the Higgs boson by the ATLAS and CMS experiments at the LHC.
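Written out schematically (in our own shorthand here, rather than any notation taken from GAMBIT itself), the composite likelihood for a theory with parameters θ is simply the product of the likelihoods of the individual experiments, each evaluated on the signals those parameters predict:

\mathcal{L}_{\mathrm{comp}}(\theta) \;=\; \prod_{i\,\in\,\mathrm{experiments}} \mathcal{L}_i\bigl(\mathrm{data}_i \mid s_i(\theta)\bigr)

Here s_i(θ) stands for the simulated signal at experiment i. A parameter point that fits every dataset scores well in every factor, while a serious conflict with even one experiment is enough to suppress the whole product.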

The composite likelihood function compares the predictions from all the simulations to the results of all the experiments and gives a single number for every parameter combination that indicates how well it agrees with all data. From this, we can determine the preferred and excluded parameter ranges, and start to compare different theories. The same approach has already been widely applied to a variety of problems in many areas of science, including finding the correct parameters of the neutrino sector – namely their oscillation parameters and the differences between their masses.
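To make the workflow concrete, here is a deliberately toy sketch of a global fit in Python. The “model”, “experiments” and random scan below are hypothetical placeholders standing in for real physics calculations, full experimental simulations and far cleverer sampling algorithms.

import math
import random

# Toy stand-ins for real components: in a genuine global fit these would be a
# physics model, full experiment simulations and a sophisticated sampler.
def predicted_signal(params, experiment):
    """Hypothetical simulation of the signal this experiment would see."""
    return params["mass"] * experiment["sensitivity"] + params["coupling"]

def log_likelihood(observed, predicted, sigma):
    """Simple Gaussian (log-)likelihood comparing prediction with observation."""
    return -0.5 * ((observed - predicted) / sigma) ** 2

experiments = [
    {"name": "collider", "observed": 0.0, "sigma": 1.0, "sensitivity": 0.001},
    {"name": "underground", "observed": 0.2, "sigma": 0.5, "sensitivity": 0.003},
]

best = (-math.inf, None)
for _ in range(100_000):                      # naive random scan, for illustration only
    params = {"mass": random.uniform(1.0, 1000.0),   # hypothetical mass range in GeV
              "coupling": random.uniform(0.0, 1.0)}
    # Composite log-likelihood: sum the individual experimental terms
    total = sum(log_likelihood(e["observed"],
                               predicted_signal(params, e),
                               e["sigma"])
                for e in experiments)
    if total > best[0]:
        best = (total, params)

print("best-fit point:", best[1], "log-likelihood:", best[0])

The important structural point is the middle of the loop: every candidate parameter combination is pushed through every experiment, and the per-experiment terms are combined into a single number before moving on to the next point.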

This all sounds great in principle. In practice, such global statistical fits are horribly difficult to apply to complicated theories for new particles. Simulating multiple experiments requires a lot of computing power, so we have always had to develop approximations to the full simulations. It’s also vital to correctly account for a lot of systematic uncertainties arising from imperfect experimental measurements and inaccurate (or more often, incomplete) theoretical predictions.

Choosing an efficient parameter-sampling algorithm is therefore paramount, which is why global fits of models for new particles have traditionally been carried out for only a few very popular theories (mostly simple supersymmetric scenarios), using only a subset of the available data. No solution has existed for exploring generic new theories of dark matter and new particles, with generic datasets – until now.

Data mining for new particles

Our first challenge in designing GAMBIT was to divorce the calculation of experimental signatures from any knowledge of the fundamental parameters of the theories that might give rise to them (see box below). The idea here was to make experimental likelihood functions reusable with almost any theory. For example, given a model of the distribution of dark matter in a distant galaxy, plus some new theory of particle physics, a theorist can use GAMBIT to calculate the different processes that might occur to produce gamma rays via the annihilation of dark matter, and predict the flux of gamma rays expected here on Earth.

A gamma-ray astronomer, meanwhile, could search for dark matter by looking for a flux of gamma rays from the annihilation of dark matter in a distant dwarf galaxy, and then comment on how well the predicted flux compares with the observed one. To do this, they only need to know the predicted flux – not why, how or which theories predict it. The final assessment of the viability of the theorist’s model obviously relies on putting these two things together in a statistically principled way, but the only quantity that passes between the two halves of the calculation is the gamma-ray flux. The theory calculation can therefore be swapped for any other theory, so long as it still predicts this gamma-ray flux. Likewise, the experimental side can be swapped for data from any other gamma-ray telescope.
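In code, this separation can be pictured as two functions that only ever exchange a flux. The sketch below is purely illustrative – the function names, numbers and simplistic formulae are hypothetical, not GAMBIT’s actual interface – but it shows why either side can be replaced without touching the other.

def gamma_ray_flux(theory_params, dark_matter_profile):
    """Theory side: a hypothetical placeholder calculation of the gamma-ray
    flux from dark-matter annihilation in a given galaxy."""
    return theory_params["annihilation_rate"] * dark_matter_profile["density_squared"]

def dwarf_galaxy_log_likelihood(predicted_flux, observed_flux, sigma):
    """Experiment side: compare the predicted flux with the telescope's
    measurement; no knowledge of the underlying theory is needed."""
    return -0.5 * ((observed_flux - predicted_flux) / sigma) ** 2

# The flux is the only quantity that crosses the boundary
flux = gamma_ray_flux({"annihilation_rate": 3e-26},
                      {"density_squared": 1e17})
print(dwarf_galaxy_log_likelihood(flux, observed_flux=2.5e-9, sigma=1e-9))

Swapping in a different particle-physics theory changes only the first function; pointing at a different telescope or galaxy changes only the second.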

GAMBIT is therefore designed as a series of “plug-and-play” packages, where each function in the code is tagged with the quantity that it can calculate. A complex piece of code then stitches the functions together in the right order using techniques from a branch of maths known as “graph theory”, depending on which theory and which data are to be investigated. The user can therefore specify a theory, list the experiments that need to be included, and then let the tool do a statistical fit.
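One minimal way to picture that stitching step is a toy dependency resolver built on a topological sort, the textbook graph-theory tool for ordering tasks that depend on one another. Everything below is made up for illustration – the quantity names are hypothetical and the real GAMBIT machinery is far more involved – but the underlying idea is the same.

from graphlib import TopologicalSorter   # standard library, Python 3.9+

# Hypothetical calculations, tagged with the quantities they need as input
needs = {
    "model_parameters": set(),
    "dm_profile": set(),
    "annihilation_cross_section": {"model_parameters"},
    "relic_density": {"annihilation_cross_section"},
    "gamma_ray_flux": {"annihilation_cross_section", "dm_profile"},
}

# A topological sort returns an order in which every calculation runs
# only after the quantities it depends on have been produced.
order = list(TopologicalSorter(needs).static_order())
print(order)
# e.g. ['model_parameters', 'dm_profile', 'annihilation_cross_section',
#       'relic_density', 'gamma_ray_flux']

The real stitching code is of course far more involved, coping with many functions and with the user’s choice of theory and datasets, but the graph-theoretic core of the task is exactly this kind of ordering.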

This system has proven remarkably versatile, letting us study supersymmetric theories, models of dark matter interacting only with the Higgs boson, and axions. The package can also be easily extended by adding new models and observable calculations following a straightforward recipe. (We have an extensive manual to help new users do this.)

The second challenge was how to deal with the reams of relevant data in a way that does not “mark” them with theoretical assumptions. This requires not only fast simulation of many different experiments, but also a detailed knowledge of how to handle the related systematic and theoretical uncertainties. The conventional approach has been to use simple heuristic likelihood calculations based on published experimental results, which give only a rough idea of the correct likelihood, or one that can be applied to just a few models.

To get the most information out of the data, and to do so as quickly as possible, we instead developed new fast simulations of the LHC, of direct searches for dark matter, and of indirect searches with the Fermi-LAT gamma-ray telescope and the IceCube neutrino telescope. These simulations are more accurate than previous techniques and, most importantly, are generally applicable to new theories. Systematic uncertainties can be added as dedicated models in their own right, and scanned in tandem with the parameters of the theory for new physics.
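Schematically (again in our own illustrative notation, not GAMBIT’s), treating a systematic uncertainty as a model in its own right means promoting it to a nuisance parameter η that is scanned alongside the theory parameters θ, with its own constraint term multiplying the composite likelihood:

\mathcal{L}(\theta, \eta) \;=\; \Bigl[\,\prod_i \mathcal{L}_i\bigl(\mathrm{data}_i \mid \theta, \eta\bigr)\Bigr] \times \mathcal{L}_{\mathrm{constraint}}(\eta)

The constraint term encodes what is already known about the systematic, for instance a Gaussian centred on a measured calibration value, and the scan then varies η along with everything else instead of freezing it at a single assumed value.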

The final big challenge to making GAMBIT work was perhaps the most obvious: speed and computational power. Over the last four years, using a combination of computational trickery and brute force, we’ve had to make calculations that take hours run in less than five seconds. (Any longer and our global fits would not converge within any practical timescale, making them useless.) This has meant using very fast parameter-sampling algorithms from the data-science literature, and adapting the code to run on a bank of CPUs rather than on a single unit.

To efficiently use many thousands of CPU cores on clusters of multicore machines, GAMBIT uses a “nested parallelization” scheme: the top-level code is split up to run on multiple CPUs, and each of those in turn calls a piece of code that is itself split up to run on multiple CPUs. We have been fortunate to receive time to run and develop GAMBIT on the Prometheus supercomputer at Cyfronet in Krakow, Poland, where GAMBIT Collaboration member Marcin Chrząszcz splits his time with CERN. This has let us perform real-time simulations of proton collisions at the LHC, as well as rigorous simulations of a range of astrophysical experiments, for billions of different parameter combinations in various theories for new particles. Even though we haven’t found dark matter yet, these calculations have substantially narrowed the range of viable theories for its identity, bringing the field ever closer to that elusive discovery.
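The flavour of the scheme can be sketched in Python using the mpi4py and multiprocessing libraries. This is purely an illustration of the idea, not GAMBIT’s actual implementation: an outer MPI layer shares the parameter points out across compute nodes, and each point’s likelihood calculation is then spread over the local cores of whichever node it landed on.

from mpi4py import MPI                   # outer level: parallelism across nodes
from multiprocessing import Pool         # inner level: parallelism within a node

def simulate_one_experiment(task):
    """Hypothetical per-experiment simulation for a single parameter point."""
    experiment, params = task
    return sum(p ** 2 for p in params)   # placeholder for real work

def composite_log_likelihood(params):
    # Inner layer: spread this point's experiments over the node's cores
    tasks = [("collider", params), ("direct_detection", params), ("gamma_rays", params)]
    with Pool() as pool:
        return sum(pool.map(simulate_one_experiment, tasks))

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Outer layer: each MPI rank takes a different slice of the parameter points
all_points = [(m / 10.0, c / 10.0) for m in range(20) for c in range(20)]
my_points = all_points[rank::size]
my_results = [composite_log_likelihood(p) for p in my_points]

# Gather everything back to rank 0 for the statistical analysis
results = comm.gather(my_results, root=0)
if rank == 0:
    print("collected", sum(len(r) for r in results), "likelihood evaluations")

Run under mpirun across several nodes, the outer slicing keeps the ranks working on different parameter points while the inner pool keeps each node’s cores busy with different experiments.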

The future

We will soon release the GAMBIT code as a free-to-use, open-source public tool – the norm in academic collaborations, and rightly so – and we see several further benefits to doing so. First, physicists are known to write much better code when it is destined for public consumption. Second, all large physics software projects are so complex that it is hard to eliminate bugs entirely, and the scrutiny of the international community is invaluable for locating the hard-to-find bugs lurking in the least-used corners of the code, which don’t show up in regular testing. Finally, an open-source code opens up all sorts of opportunities for future collaboration and development, including the prospect that external users who invent smart new extensions can eventually merge their code into a subsequent GAMBIT release.

GAMBIT’s first release covers a range of the most popular scenarios for dark matter, but the best is yet to come. Many more models remain to be tested, while fresh data are on the way from the LHC running at higher energies than ever before, from direct searches for dark matter with detectors weighing over a tonne, and from indirect detection with ever-larger telescopes. Axion experiments now seem set to probe many of the most promising parts of the axion parameter space, and large-scale neutrino and rare-decay searches are also due to come online in the next few years.

We will keep GAMBIT updated and ready to incorporate these – and other – new streams of data. The first physics results from GAMBIT will emerge shortly after this article goes to press and we hope the package will go on to become a standard tool for solving problems in particle astrophysics and cosmology.

Eventually, we might be able to track down the mysterious chirruping phantom of the 10-acre field – only to notice some strange marks on its wings, and glimpse from the corner of the eye something expanding rapidly off into the hundred-acre field next door. By then, another mystery will be waiting and it will probably be time to draw on even newer cosmological data to solve it.