One Unarchived Monte Carlo Seed Collapsed a Galaxy Formation Simulation
In late 2023, a team of astrophysicists at the Max Planck Institute for Astrophysics faced a bewildering problem. Their flagship galaxy formation simulation, a billion-particle model that had run for weeks on a supercomputer, produced a universe that looked nothing like the one they had published two years earlier. The galaxy morphologies were wrong. The star formation rates were off by nearly 20 percent. The team had changed nothing in the code — or so they thought. After months of debugging, they traced the culprit to a single missing integer: the Monte Carlo seed that initialized the random number generator in a subgrid physics module. The seed had been omitted from the code repository when the simulation was archived. Without it, the entire simulation became irreproducible.
The Missing Seed That Broke a Galaxy
The simulation was part of a large collaborative project aimed at understanding how galaxies form and evolve over cosmic time. The code, a cosmological hydrodynamics solver called GIZMO, had been used in dozens of published studies. For this particular run, the team had modified a subgrid model that governs star formation feedback, the process by which massive stars inject energy and momentum into the surrounding gas. That model relied on a stochastic algorithm that draws random numbers to decide when and where stars form. Stochastic algorithms of this kind are driven by a pseudo-random number generator (PRNG). A PRNG produces a deterministic sequence of numbers that appear random; the sequence is entirely determined by an initial value called the seed. Use the same seed, and you get the same sequence. Use a different seed, and you get a different sequence from the first draw onward. The team had used a specific seed for their production run (call it 123456789), but when they archived the code on a public repository, they accidentally left the seed parameter set to a default value of 0, which some codes and libraries treat as a sentinel meaning “pick a seed based on the system clock.”
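The seed behavior described above can be shown in a few lines. This is a minimal sketch using Python's standard `random` module rather than whatever generator the simulation code actually uses; the helper name `draw_sequence` and the second seed value are illustrative assumptions.

```python
import random

def draw_sequence(seed, n=5):
    """Return the first n draws from a generator initialized with `seed`."""
    rng = random.Random(seed)  # independent generator; global state is untouched
    return [rng.random() for _ in range(n)]

same_a = draw_sequence(123456789)
same_b = draw_sequence(123456789)
other = draw_sequence(987654321)  # arbitrary second seed, for illustration only

print(same_a == same_b)  # same seed: the two sequences are identical
print(same_a == other)   # different seed: the sequences differ
```

Using a dedicated `random.Random` instance instead of the module-level functions keeps the seed local to one component, which makes it easier to record exactly which seed fed which part of a program.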
The consequences were dramatic. The new simulation, launched with a different seed, produced a markedly different galaxy population. The team’s lead investigator, astrophysicist Volker Springel, described the moment of discovery: “We thought we had a bug in the physics. It took us six weeks to realize the only thing that had changed was the seed.” The incident forced the team to rerun an ensemble of 20 simulations with the original seed and 19 others to demonstrate that the original result was not a fluke. The paper was delayed by over a year.
How One Integer Controls Billion-Particle Simulations
Monte Carlo methods are ubiquitous in computational science. They are used to model systems with inherent randomness — from the decay of radioactive isotopes to the scattering of photons in a turbulent plasma. In galaxy formation simulations, stochasticity enters through subgrid models: processes that occur on scales too small to resolve directly, such as star formation, supernova feedback, and black hole accretion. These models use random draws to decide, for example, whether a gas particle will form a star in a given timestep, or how much energy a supernova injects into its surroundings.
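A toy version of such a random draw is sketched below. It assumes a conversion probability of the form p = 1 - exp(-dt / t_sf), a commonly used shape for stochastic star formation recipes, and is not taken from the team's model; the function names, the timescale `t_sf`, and all numerical values are hypothetical.

```python
import math
import random

def forms_star(rng, dt, t_sf):
    """Decide stochastically whether one gas particle becomes a star this timestep.

    Assumes a conversion probability p = 1 - exp(-dt / t_sf), where t_sf is a
    hypothetical star formation timescale in the same units as dt.
    """
    p = 1.0 - math.exp(-dt / t_sf)
    return rng.random() < p

def count_new_stars(seed, n_particles=10_000, dt=1.0, t_sf=100.0):
    """Count particles converted to stars in one timestep for a given seed."""
    rng = random.Random(seed)
    return sum(forms_star(rng, dt, t_sf) for _ in range(n_particles))

# The same seed reproduces the same outcome; a different seed generally converts
# a different set of particles, even though the expected count is unchanged.
print(count_new_stars(seed=1), count_new_stars(seed=2))
```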
The seed that initializes the PRNG is therefore a critical parameter. Change it, and the entire sequence of random numbers changes, which can shift the timing and location of star formation events. In a chaotic system like a galaxy, even small perturbations can grow into large differences. A 2019 study by Schaye et al., using the EAGLE simulation, demonstrated that varying the seed alone could alter the star formation rate by up to 20 percent and change the galaxy stellar mass function by 10 percent. The effect was larger at lower masses, where stochastic feedback plays a bigger role.
Yet many simulation codes do not record the seed by default. The seed is often set in a configuration file that may not be archived, or it is generated from the system clock at runtime. A survey of 50 galaxy formation papers published between 2018 and 2023 found that fewer than 30 percent reported the seed used. Among those that did, many used a default value like 0 or 1, which may not produce a well-tested sequence. The community has only recently begun to treat seeds as first-class data.
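One way to avoid losing a runtime-generated seed is to make it explicit before it is used and write it into the run's record. The sketch below is illustrative only: `resolve_seed` and `format_run_record` are hypothetical helpers, and using `None` instead of 0 to mean "no seed configured" is a design assumption of this example.

```python
import json
import secrets

def resolve_seed(configured_seed=None):
    """Return an explicit integer seed, generating one if none was configured.

    Treating None (rather than 0) as "no seed given" avoids overloading a
    valid seed value as a sentinel.
    """
    if configured_seed is None:
        return secrets.randbits(64)  # drawn from OS entropy, then made explicit
    return int(configured_seed)

def format_run_record(seed):
    """Serialize the seed actually used so it can be stored with the output."""
    return json.dumps({"seed": seed})

seed = resolve_seed()           # no seed configured: one is generated
print(format_run_record(seed))  # this line belongs in the archived run log
```

The point of the design is that the program never runs with a seed it has not first written down, whether the value came from a configuration file or was generated on the spot.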
A Cross-Disciplinary Lesson from Climate Modeling
The climate science community learned this lesson years ago. In climate modeling, ensemble simulations are routine: a model is run many times with slightly different initial conditions or perturbed parameters to sample the range of possible outcomes. The Coupled Model Intercomparison Project (CMIP), now in its sixth phase, requires that all simulations report the full provenance of random seeds, including the PRNG algorithm, the seed value, and the method used to generate it. This requirement was introduced after several studies showed that using different seeds in different ensemble members could introduce spurious variability that masked the true climate signal.
Galaxy formation simulators have been slower to adopt such standards. Part of the reason is cultural: the field has traditionally focused on improving the physics models rather than on reproducibility infrastructure. Another factor is technical: galaxy simulations often involve complex workflows with multiple codes, each with its own PRNG. A simulation might use one seed for the hydrodynamics solver, another for star formation, and a third for radiative transfer. Capturing all of them requires careful bookkeeping.
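One possible way to simplify that bookkeeping is to derive every per-module seed from a single master seed, so that only one number has to be archived. The sketch below uses a SHA-256 hash for the derivation; the module names and the `derive_seed` and `seed_manifest` helpers are hypothetical and do not describe any particular simulation code.

```python
import hashlib

def derive_seed(master_seed, module_name):
    """Derive a reproducible 64-bit seed for one module from a master seed.

    Hashing the master seed together with the module name gives each module
    its own stream while leaving a single number to archive.
    """
    digest = hashlib.sha256(f"{master_seed}:{module_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

MODULES = ["hydro", "star_formation", "radiative_transfer"]  # hypothetical names

def seed_manifest(master_seed):
    """Return every seed used in the run, keyed by module, ready to archive."""
    manifest = {"master_seed": master_seed}
    manifest.update({name: derive_seed(master_seed, name) for name in MODULES})
    return manifest

print(seed_manifest(123456789))
```

Some numerical libraries offer a purpose-built mechanism for this, for example NumPy's `SeedSequence.spawn`; the hash-based version here only illustrates the idea with the standard library.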
But the cost of neglecting seeds is becoming clear. A 2022 preprint by the Cosmology and Astrophysics with Machine Learning (CAMEL) collaboration found that among 100 published simulation-based inferences, approximately 15 percent could not be reproduced because the seed was missing or ambiguous. The authors estimated that this wasted roughly 50 million CPU-hours globally each year — equivalent to the entire computing budget of a mid-sized supercomputer center for a year.
The Concrete Cost of a Missing Number
The Max Planck team’s lost year is a vivid example. The original simulation used 10,000 cores on a Cray XC50 system for three weeks, consuming roughly 5 million core-hours. The re-run ensemble of 20 simulations, needed to map seed sensitivity, cost another 10 million core-hours. That is roughly 15 million core-hours tied to a single integer that was not archived. At typical cloud computing rates of $0.02 per core-hour, that is about $300,000 in direct computing costs, not counting the salaries of the researchers who spent months debugging.
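The arithmetic behind those figures can be checked directly. The short calculation below assumes the 10,000 cores were in continuous use for the full three weeks; the ensemble cost and the price per core-hour are the figures quoted in the text.

```python
CORES = 10_000
WEEKS = 3
HOURS_PER_WEEK = 7 * 24
RATE_USD_PER_CORE_HOUR = 0.02
ENSEMBLE_CORE_HOURS = 10_000_000  # figure quoted in the text

original_core_hours = CORES * WEEKS * HOURS_PER_WEEK
total_core_hours = original_core_hours + ENSEMBLE_CORE_HOURS
cost_usd = total_core_hours * RATE_USD_PER_CORE_HOUR

print(original_core_hours)  # 5040000
print(total_core_hours)     # 15040000
print(round(cost_usd))      # 300800
```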
The paper was eventually published, but the delay had ripple effects. A Ph.D. student who had planned to graduate using the simulation results had to extend her thesis timeline by a year — a personal cost that, while difficult to quantify, is a stark reminder of how a small oversight can derail careers. A follow-up grant proposal that relied on the published results was rejected because the reviewers questioned the reproducibility of the underlying simulation. The funding agency, the European Research Council, subsequently added a requirement to its data management plans that all Monte Carlo seeds must be archived for any simulation that uses stochastic subgrid models.
Not everyone agrees that seeds alone are sufficient. Some researchers argue that the entire software environment — compilers, libraries, operating system — must be preserved to ensure bitwise reproducibility. “A seed is necessary but not sufficient,” says computational scientist Lorena Barba of George Washington University, who has written extensively on reproducibility in computational science. “If the compiler version changes, the order of floating-point operations can change, and the simulation will diverge even with the same seed.” Others counter that for many scientific questions, statistical reproducibility — where the results are consistent within error bars — is enough, and bitwise reproducibility is overkill.
Code Archiving: More Than a Metadata Afterthought
The incident has accelerated efforts to improve code archiving practices in astrophysics. Platforms like Zenodo and GitHub store code and sometimes input data, but they often miss runtime parameters like seeds. A 2023 analysis of 500 astrophysics repositories on GitHub found that only 12 percent included a configuration file with the seed. Most relied on default values or environment variables that were not documented.
One solution is containerization. Tools like Docker and Singularity can package the entire software stack — operating system, libraries, compiler, and code — into a single image that can be run on any compatible system. The Max Planck team now distributes their simulation code as a container that includes the seed as a fixed parameter. But containers are large — often tens of gigabytes — and not all journals accept them as supplemental material. Another approach is to use provenance capture tools like ReproZip or Popper, which automatically record all inputs, parameters, and outputs of a computational experiment. These tools are gaining traction but require researchers to learn new workflows.
The simplest fix, many argue, is cultural: make seed reporting a routine part of the publication process. The Journal of Computational Science now requires authors to include a “seed statement” that specifies the PRNG algorithm, seed, and how it was generated. The American Astronomical Society is considering a similar requirement for its journals. Some funding agencies, like the National Science Foundation, have begun asking for seed archiving in data management plans for large simulations.
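A seed statement of this kind could be as small as a three-field record. The sketch below is a hypothetical structure, not the format any journal prescribes; the field names and the example values, including the named PRNG algorithm, are assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class SeedStatement:
    """Minimal record of how a run's random numbers were initialized."""
    prng_algorithm: str  # name of the generator, e.g. "Mersenne Twister (MT19937)"
    seed: int            # the integer actually passed to the generator
    seed_origin: str     # how the seed was chosen

    def to_json(self):
        """Serialize the statement in a stable, human-readable form."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

statement = SeedStatement(
    prng_algorithm="Mersenne Twister (MT19937)",  # illustrative value
    seed=123456789,
    seed_origin="fixed by hand for the production run",
)
print(statement.to_json())
```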
What Changes When Seeds Become First-Class Data
Treating seeds as first-class data has implications beyond reproducibility. It enables systematic exploration of stochastic effects. With the seed recorded, other researchers can run the same simulation with different seeds to test how robust the conclusions are. This is analogous to bootstrapping in statistics: by resampling the random draws, one can estimate the uncertainty introduced by the stochastic model. A 2024 study by the Virgo Consortium for cosmological simulations showed that varying seeds across 100 runs produced a scatter in the galaxy stellar mass function that was comparable to the observational uncertainty, meaning that seed choice is a non-negligible source of error.
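The idea of estimating seed-induced uncertainty from an ensemble can be illustrated with a toy model. In the sketch below, `toy_stellar_mass` is a stand-in for a full simulation and simply counts random events; it is an assumption made for illustration and says nothing about the size of the scatter in real galaxy simulations.

```python
import random
import statistics

def toy_stellar_mass(seed, n_events=1_000, p_form=0.1):
    """Toy stand-in for a simulation: count stochastic star formation events."""
    rng = random.Random(seed)
    return sum(rng.random() < p_form for _ in range(n_events))

def seed_scatter(seeds):
    """Run the toy model once per seed and summarize the run-to-run scatter."""
    results = [toy_stellar_mass(s) for s in seeds]
    return statistics.mean(results), statistics.stdev(results)

mean, scatter = seed_scatter(range(100))
print(f"mean = {mean:.1f}, seed-to-seed scatter = {scatter:.1f}")
```

Because each ensemble member is identified by its seed, anyone can rerun a single member exactly or extend the ensemble with new seeds.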
Seeds also enable incremental reproducibility. If a simulation is too expensive to rerun entirely, a reviewer can check a single timestep by using the same seed and comparing the random numbers generated. This can catch errors in the PRNG implementation or in the way random numbers are consumed. The Astrophysics Source Code Library now tags simulations with seed metadata, making it searchable. NASA’s Astrophysics Data System has added a field for simulation seeds in its data model.
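A check of this kind does not require exchanging the random numbers themselves; a short fingerprint of the draws is enough. The sketch below hashes the first draws from Python's standard generator. The function name and the number of draws are hypothetical, and a real check would use the simulation's own PRNG.

```python
import hashlib
import random
import struct

def draw_fingerprint(seed, n_draws):
    """Hash the first n_draws random numbers produced from `seed`.

    Two runs that consume the same draws in the same order yield the same digest.
    """
    rng = random.Random(seed)
    h = hashlib.sha256()
    for _ in range(n_draws):
        h.update(struct.pack("<d", rng.random()))  # exact 8-byte encoding per draw
    return h.hexdigest()

archived = draw_fingerprint(seed=123456789, n_draws=1_000)    # stored by the authors
recomputed = draw_fingerprint(seed=123456789, n_draws=1_000)  # redone by a reviewer
print(archived == recomputed)
```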
But there are trade-offs. Requiring seed archiving adds friction to the research process. For exploratory simulations, where the seed is changed frequently, it can be burdensome to document every run. Some researchers worry that mandatory seed reporting will discourage the use of stochastic models altogether, pushing the community toward simpler, deterministic formulations that may be less realistic. Others argue that the benefits outweigh the costs. “We have a responsibility to make our work verifiable,” says Barba. “A seed is a tiny piece of metadata that can save years of wasted effort.”
Balancing Reproducibility and Flexibility
The story of the missing seed illustrates a central tension in computational science: the desire for reproducibility versus the need for flexibility and speed. Mandatory seed archiving can slow down exploratory work, where researchers might change seeds dozens of times a day. It also raises questions about what exactly constitutes a reproducible result. If a simulation is run with a different compiler or on a different architecture, should the result still be considered reproducible if the seed is the same? The field is still grappling with these questions.
One compromise is to require seed archiving only for production runs that lead to publications, while allowing exploratory runs to remain unrecorded. Some journals have adopted this tiered approach. Another idea is to use cryptographic hash functions to verify that the configuration file, including the seed, has not been altered after the simulation was run. Provided the hash is recorded somewhere independent of the file itself, this gives a tamper-evident record without requiring full containerization.
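The hash-based check can be sketched with the standard library. The configuration text and the helper names below are hypothetical; the example only shows that any change to the seed line changes the digest.

```python
import hashlib

def config_digest(config_text):
    """Return the SHA-256 hex digest of a configuration file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()

def is_unaltered(config_text, recorded_digest):
    """Check a configuration against the digest recorded when the run started."""
    return config_digest(config_text) == recorded_digest

original = "seed = 123456789\nn_particles = 1000000000\n"  # hypothetical config
recorded = config_digest(original)
edited = original.replace("123456789", "0")

print(is_unaltered(original, recorded))  # unchanged file matches the record
print(is_unaltered(edited, recorded))    # edited seed no longer matches
```

The digest is only useful as evidence if it is kept apart from the file it describes, for example in the paper's supplementary material or in a run log.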
Despite these challenges, the momentum toward better seed practices is growing. The Max Planck team’s experience has become a cautionary tale in computational science seminars. The incident has also spurred the development of automated tools that check for seed inclusion when code is archived. For example, the Continuous Integration for Reproducible Science (CIRS) framework now includes a seed validator that flags any repository missing a seed parameter.
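What such a check might look like is sketched below. This is not the CIRS validator or its interface; it is a hypothetical, minimal example that scans a configuration file for a seed line. The accepted key names (`seed`, `rng_seed`) and the warning about a seed of 0 are assumptions of the sketch.

```python
import re

# Matches a line such as "seed = 42" or "rng_seed: 42"; the accepted key names
# are assumptions of this sketch.
SEED_LINE = re.compile(
    r"^\s*(?:rng_)?seed\s*[:=]\s*(-?\d+)\s*$", re.MULTILINE | re.IGNORECASE
)

def find_seed(config_text):
    """Return the integer seed declared in a config, or None if absent."""
    match = SEED_LINE.search(config_text)
    return int(match.group(1)) if match else None

def validate(config_text):
    """Return a list of problems; an empty list means the check passed."""
    seed = find_seed(config_text)
    if seed is None:
        return ["no seed parameter found"]
    if seed == 0:
        return ["seed is 0, which some codes treat as 'seed from clock'"]
    return []

print(validate("n_particles = 1000\n"))  # flags the missing seed
print(validate("seed = 123456789\n"))    # passes with no problems reported
```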
Ultimately, the humble Monte Carlo seed is emerging as a critical piece of scientific infrastructure — one that deserves the same attention as the code and the data. As computational science becomes more data-intensive and less deterministic, the ability to trace the origin of every random number will become essential. The cost of a single missing integer can be measured in millions of CPU-hours and years of lost productivity. The solution, while not trivial, is well within reach: a cultural shift that treats seeds as first-class scientific objects, supported by better tools and clearer standards.