One Unanalyzable Python Script Blocked a Computational Epidemiology Paper for Two Years

Jun 11, 2026 By Jonas Eriksen

In 2021, a team of computational epidemiologists submitted a manuscript to a leading journal. The paper modeled the spread of an infectious disease through a metropolitan population, using a novel network-based simulation. The reviewers were intrigued. But they could not verify the results. The bottleneck was a single Python script—roughly 400 lines, no docstrings, no version pinning, hardcoded file paths from the lead author's laptop. For two years, the paper languished. The senior author later told colleagues, "We almost withdrew it."

The Two-Year Block That Almost Killed a Paper

The manuscript was first submitted in early 2021. The simulations had taken weeks to run on a university cluster, and the output files were large. The authors provided the raw data and a PDF of the code, as many journals request. But the script that parsed simulation outputs into figures—the critical link between raw numbers and published plots—was effectively a black box.

Reviewers on three separate rounds tried to run the script. Each time, it failed. The first reviewer reported a missing module. The second found that a deprecated pandas API call had been removed in a newer version. The third gave up after encountering hardcoded paths that pointed to a directory that did not exist on any machine but the author's.

The senior author, a well-regarded figure in computational epidemiology, described the frustration in a seminar months later: "We spent more time explaining why the code didn't work than defending the science." The paper's core finding—a prediction about superspreading events—was eventually validated by an independent group, but the publication delay cost the first author a tenure-track offer.

This case is far from isolated. In a similar incident from 2019, a computational neuroscience paper spent 14 months in review because a MATLAB script depended on a proprietary toolbox that the authors had not mentioned. The paper was eventually accepted only after the authors rewrote the critical function in open-source Python. A 2022 survey of early-career researchers in computational biology found that roughly one in three had experienced a publication delay of more than six months due to code-related issues. The problem is systemic, but each story is usually told in private, not in print.

Why Reproducibility Checks Rarely Touch the Code

Most journals ask authors to provide data, but executable code is rarely mandatory. A 2023 survey of 50 computational papers found that fewer than 5% included a complete, runnable environment. Reviewers are typically volunteers with limited time; they rarely attempt to execute code. As one editor put it, "We trust that the authors did what they said they did."

A well-known study from 2019 examined code snippets in computational biology papers and found that roughly 70% failed to reproduce the reported figures. The reasons ranged from missing dependencies to undocumented data transformations. Despite this, few journals have changed their policies. The culture of scientific publishing still treats code as a supplement, not a core artifact.

Containerization tools like Docker exist and are widely used in industry, but in academic epidemiology they remain optional. One prominent journal's author guidelines mention "code availability" but do not require a working environment. The result: a reproducibility gap that can block papers for years.

The trade-off is not trivial. Mandating executable code would impose a burden on authors—especially those in resource-limited settings—who may lack the time or expertise to containerize their workflows. A 2024 comment in Nature argued that strict reproducibility requirements could widen the gap between well-funded labs and others. The counter-argument is that code is an integral part of the scientific record, and that the cost of poor reproducibility—in retractions, wasted effort, and lost trust—far outweighs the cost of better practices. The debate is unresolved, but the status quo is clearly not working.

The Specific Python Script That Failed

The offending script was about 400 lines long. It depended on numpy 1.19 and matplotlib 3.3, but no version file was provided. It used pandas' append method, which was deprecated in version 1.4 and removed entirely in 2.0. The script also contained hardcoded absolute paths such as /Users/lead_author/data/sim_output_2020/—meaningless on any other system.
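For readers who hit the same failure, the fix for the removed method is mechanical: collect the frames in a list and combine them once with `pd.concat`. The sketch below is a hypothetical reconstruction, not the authors' code. The function name `collect_runs` and the columns `day` and `infected` are invented for illustration.

```python
import pandas as pd


def collect_runs(run_frames):
    """Stack per-run simulation tables into one DataFrame.

    The pre-2.0 pattern below now raises AttributeError:
        combined = pd.DataFrame()
        for frame in run_frames:
            combined = combined.append(frame, ignore_index=True)
    """
    # pd.concat takes the whole list at once and renumbers the index 0..n-1.
    # It raises ValueError if run_frames is empty.
    return pd.concat(run_frames, ignore_index=True)


# Hypothetical outputs from two simulation runs.
runs = [
    pd.DataFrame({"day": [0, 1], "infected": [10, 14]}),
    pd.DataFrame({"day": [0, 1], "infected": [12, 19]}),
]
combined = collect_runs(runs)
print(combined.shape)  # (4, 2)
```

Besides surviving the upgrade, concatenating once avoids the repeated copying that calling `append` inside a loop incurs. Because `pd.concat` rejects an empty list, a script that may find no output files should check for that case first.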

There were no docstrings, no comments explaining the logic, and no error handling. If a file was missing, the script simply raised an uninformative traceback. The lead author later admitted, "I wrote it quickly to get results for a grant deadline. I never expected anyone else to run it."
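An uninformative traceback is straightforward to replace with a message that tells a stranger what went wrong. Here is a minimal sketch using only the standard library. It assumes a hypothetical CSV output format with `day` and `infected` columns, because the real script's input format was undocumented.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column names; the original script's input format is undocumented.
REQUIRED_COLUMNS = {"day", "infected"}


def load_output(path):
    """Read one simulation output CSV, failing with a message a stranger can act on."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Simulation output not found: {path.resolve()}. "
            "Check the data directory passed to the script."
        )
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        # fieldnames is None for an empty file, so fall back to an empty list.
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path.name} is missing columns: {sorted(missing)}")
        return list(reader)


# Demo: a missing file now produces a readable error instead of a bare traceback.
try:
    load_output("no_such_dir/run_01.csv")
except FileNotFoundError as err:
    missing_message = str(err)

# Demo: a well-formed file in a temporary directory loads normally.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "run_01.csv"
    sample.write_text("day,infected\n0,10\n1,14\n")
    rows = load_output(sample)

print(missing_message)
print(rows)  # [{'day': '0', 'infected': '10'}, {'day': '1', 'infected': '14'}]
```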

The reviewers were not unreasonable. One spent an entire weekend trying to reverse-engineer the input format. Another suggested that the authors publish the code on GitHub, which they did—but without fixing the hardcoded paths. The script still failed for anyone who did not replicate the exact directory structure of a single laptop.

To put this in perspective, consider a different case from the same period. A team at a European university published a computational ecology paper with a fully containerized workflow from the start. Reviewers could start the container and run the code within minutes. The paper was accepted in four months. The difference was not in the complexity of the science but in the attention paid to code hygiene. The ecology team had a dedicated software engineer funded by a European Union infrastructure grant, a luxury most labs do not have. This contrast underscores that the problem is not technical impossibility but resource allocation.

Funding Incentives Disfavor Code Hygiene

Why would a skilled researcher produce code that is essentially unshareable? The answer lies in how academic science is funded. Grant reviewers rarely examine code quality. They evaluate hypotheses, methods, and expected outcomes. A clean, documented script is not a measurable deliverable.

Principal investigators are rewarded for novel findings and high-impact publications, not for maintaining software. Postdocs and graduate students, who write most of the code, are paid to produce results as quickly as possible. Documentation and containerization are seen as overhead. One estimate from a reproducibility workshop put the cost of cleaning up a typical research codebase at two to three months of a postdoc's salary—a cost that no grant line item covers.

Some funders are beginning to change. The National Institutes of Health now requires a data management and sharing plan, but software management plans remain rare. A 2024 pilot by one European agency asked grantees to submit a brief software sustainability plan; only 12% of them complied. The incentives still point toward speed, not hygiene.

There is a counter-argument worth considering: some researchers contend that mandating code sharing and reproducibility checks would stifle innovation and slow the pace of discovery. They argue that the primary purpose of a paper is to communicate ideas, not to provide a turnkey reproduction kit. But this view underestimates the long-term cost of irreproducible results. A 2016 survey in Nature found that over 70% of researchers had tried and failed to reproduce another scientist's experiment. In computational fields, the failure rate may well be higher. One widely cited estimate puts the annual cost of irreproducible preclinical research in the United States alone in the tens of billions of dollars, before counting retracted papers and lost careers. The burden of code hygiene is small by comparison.

Infrastructure Costs of Making Code Runnable

Even when a researcher wants to produce reproducible code, the infrastructure costs can be prohibitive. A Docker image for a simulation environment must be rebuilt and retested as base images and dependencies change. Public images can be hosted on a registry like Docker Hub at no charge, but private repositories require a paid subscription, a recurring cost that grant budgets rarely anticipate.

Continuous integration servers, which automatically test code on every commit, are standard in industry but rare in academic labs. GitHub Actions offers a free tier, but it is insufficient for large-scale simulations that require hours of compute time. University IT departments rarely support reproducible computing; they provide clusters for running jobs, not for packaging environments.
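A free CI tier cannot rerun a week-long simulation, but it can run a fast regression check on every commit. One common pattern works like this: store the numbers behind a published figure in the repository, regenerate them from a small configuration, and compare within a floating-point tolerance. The sketch below assumes that pattern. The reference values and the function name `figure_data_mismatches` are hypothetical.

```python
import math

# Hypothetical reference series: the numbers behind one published figure,
# committed to the repository next to the code.
REFERENCE = [0.010, 0.042, 0.118, 0.203, 0.151]


def figure_data_mismatches(regenerated, reference, rel_tol=1e-6, abs_tol=1e-9):
    """Return a list of mismatches; an empty list means the check passed."""
    if len(regenerated) != len(reference):
        return [("length", len(regenerated), len(reference))]
    return [
        (i, got, expected)
        for i, (got, expected) in enumerate(zip(regenerated, reference))
        if not math.isclose(got, expected, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Harmless floating-point noise passes; a real change in the output does not.
ok = figure_data_mismatches([0.010, 0.042, 0.118 + 1e-12, 0.203, 0.151], REFERENCE)
drifted = figure_data_mismatches([0.010, 0.042, 0.130, 0.203, 0.151], REFERENCE)
print(ok)       # []
print(drifted)  # [(2, 0.13, 0.118)]
```

In a test suite, a line such as `assert not figure_data_mismatches(run_small_config(), REFERENCE)` would fail the build whenever the output drifts. Here `run_small_config` is a hypothetical stand-in for whatever reduced simulation a lab can afford to run in CI. Comparing with a tolerance rather than exact equality keeps the test from failing on harmless floating-point differences across platforms.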

Cloud compute credits, often included in grants, typically expire before the code is finalized. One lab director estimated that her team forfeited $15,000 in AWS credits: the simulations had finished, but the credits expired before the code was containerized, and they could not be rolled over. The infrastructure for reproducible science is not just underfunded; it is misaligned with the academic calendar.

Some institutions have started to address this. The University of Washington, for example, offers a "reproducibility voucher" program that provides small grants, typically around $5,000, to help labs containerize code and set up continuous integration. Early results are promising: labs that used the vouchers reported a roughly 40% reduction in code-related publication delays. But such programs are rare and often poorly publicized. The majority of researchers still bear the infrastructure cost individually, and many simply cannot afford it.

How the Block Was Finally Broken

After two years, the senior author decided to rewrite the script from scratch. The new version used argparse for command-line arguments, included error handling, and pinned all dependencies in a requirements.txt file. The simulation environment was containerized using Docker, with a Dockerfile that installed exact versions of every library.
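The pattern the rewrite adopted, with command-line arguments in place of hardcoded paths plus explicit error handling, can be sketched as follows. This is an illustrative skeleton under stated assumptions, not the lab's actual script. The flags `--data-dir`, `--out-dir`, and `--pattern` are hypothetical, and so is the `inputs.txt` manifest that stands in for the real figure-making step.

```python
import argparse
import sys
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Turn simulation outputs into figure data (illustrative sketch)."
    )
    # Input and output locations are arguments, not paths baked into the source.
    parser.add_argument("--data-dir", type=Path, required=True,
                        help="directory containing simulation output files")
    parser.add_argument("--out-dir", type=Path, default=Path("figures"),
                        help="where to write results (default: ./figures)")
    parser.add_argument("--pattern", default="*.csv",
                        help="glob for output files (default: *.csv)")
    return parser.parse_args(argv)


def main(argv=None):
    """Return a process exit code: 0 on success, 2 on bad input."""
    args = parse_args(argv)
    if not args.data_dir.is_dir():
        print(f"error: --data-dir {args.data_dir} is not a directory",
              file=sys.stderr)
        return 2
    inputs = sorted(args.data_dir.glob(args.pattern))
    if not inputs:
        print(f"error: no files matching {args.pattern!r} in {args.data_dir}",
              file=sys.stderr)
        return 2
    args.out_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder for the real figure-making step: record which inputs were used.
    manifest = args.out_dir / "inputs.txt"
    manifest.write_text("\n".join(path.name for path in inputs) + "\n")
    return 0
```

To run it from the command line, add `sys.exit(main())` under the usual `if __name__ == "__main__":` guard and invoke it as, for example, `python make_figures.py --data-dir ./sim_output`. The file name is hypothetical. Because `main` accepts an argument list and returns an exit code instead of exiting, it can also be exercised directly in tests.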

The third reviewer, who had previously given up, was able to pull the Docker image and run the script in under 30 minutes. The figures matched the paper exactly. Two weeks after resubmission, the paper was accepted. The senior author later said, "The rewrite took three weeks. We should have done it before the first submission."

The experience also led the lab to adopt a reproducible workflow for all future projects. Every new script now includes a README and a requirements.txt. But the senior author acknowledges that this change was driven by the trauma of the two-year delay, not by any institutional incentive.
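A requirements.txt only helps if the environment actually matches it. As a sketch, assuming the simple `name==version` pin format, a script can compare its pins against what is installed before producing any figures. It does this with the standard library's `importlib.metadata`. The pinned versions below are illustrative, chosen to match the numpy 1.19 and matplotlib 3.3 era of the original script. The helper names are hypothetical.

```python
from importlib import metadata


def parse_pins(text):
    """Map package name -> pinned version for simple 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_environment(pins):
    """Return human-readable differences between the pins and what is installed."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (pinned {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, pinned {wanted}")
    return problems


# Illustrative requirements.txt contents, not the lab's actual file.
REQUIREMENTS = """\
numpy==1.19.5
matplotlib==3.3.4  # plotting
"""

for problem in check_environment(parse_pins(REQUIREMENTS)):
    print("warning:", problem)
```

This compares version strings exactly and ignores extras, environment markers, and package-name normalization. It is a guard, not a substitute for installing from the pinned file (for instance with `pip install -r requirements.txt`) inside a container.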

The lab's transformation is not unique. A 2023 study found that, of 50 labs that had experienced a major reproducibility failure, 80% adopted at least one new practice, such as version control or containerization, within a year. But the adoption was reactive, not proactive. The same study found that labs that had not experienced a failure were significantly less likely to adopt such practices. This suggests that the academic system is not learning from its mistakes at a collective level.

What the Field Must Change to Prevent Repeats

This script's story is not an outlier. Similar failures recur across the computational sciences: an unversioned climate model parameter produced a 3°C spread in 2100 temperature projections, and a tiny star tracker error led astronomers to attribute an exoplanet to the wrong star. The common thread is that the infrastructure for reproducibility is treated as optional.

Journals could mandate executable code for any paper that makes a computational claim. A few have started: the Journal of Computational Science now requires a Dockerfile or equivalent. But most still accept a statement that code is "available upon request." Reviewers need access to sandboxed run environments where they can test code without installing dependencies. Services like Binder and Code Ocean exist, but only a handful of journals have built them into their peer-review workflows.

Funding agencies must budget for software engineering. A grant that includes a postdoc for three years should also include a line item for one month of code cleanup. Training programs need to teach reproducible workflows as a core skill, not an afterthought. Some universities now offer workshops on Docker and continuous integration, but attendance is low because students see it as extra work.

The two-year delay cost the first author a tenure-track offer. That is a human cost that no citation metric captures. The field cannot afford to let one unanalyzable script block a paper—and a career—again.

But change will not come easily. There are legitimate concerns about over-regulation. Some researchers worry that requiring code execution for every paper would slow down the review process even further, especially in fields where simulations take days or weeks to run. Others point out that code is only one part of reproducibility; data provenance, hardware dependencies, and random seeds also matter. A narrow focus on code could create a false sense of security.
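Random seeds show why code alone is not enough. A stochastic simulation that draws from an unseeded generator produces a different epidemic curve on every run, so its published figures cannot be regenerated exactly. The minimal sketch below assumes a toy susceptible-infected model, not the paper's network simulation. The function name and parameters are hypothetical.

```python
import random


def simulate_outbreak(n_days, p_transmit, seed, initial=5, population=1000):
    """Toy susceptible-infected model (no recovery), for illustration only.

    Each day, every infected person contacts one person chosen uniformly at
    random; a susceptible contact becomes infected with probability p_transmit.
    Returns the number infected at the start and after each day.
    """
    rng = random.Random(seed)  # private generator, isolated from global state
    infected = set(range(initial))
    curve = [len(infected)]
    for _day in range(n_days):
        new_cases = set()
        for _attempt in range(len(infected)):
            contact = rng.randrange(population)
            if contact not in infected and rng.random() < p_transmit:
                new_cases.add(contact)
        infected |= new_cases
        curve.append(len(infected))
    return curve


run_a = simulate_outbreak(30, 0.3, seed=2021)
run_b = simulate_outbreak(30, 0.3, seed=2021)
print(run_a == run_b)  # True: same seed, identical curve
print(run_a[-1])       # final count for this seed
```

A dedicated `random.Random(seed)` instance keeps the simulation's randomness independent of any other code that draws random numbers, which seeding the global module does not. The seed can then be recorded next to each output file.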

Nevertheless, the current situation is untenable. The story of the two-year block is a symptom of a deeper misalignment between the values of science and the incentives of the system. The solution will require coordinated action from journals, funders, and universities—not just individual heroics. Until then, many more papers will languish, and many more careers will be damaged, by code that no one else can run.
