Assessing the Transparency and Reproducibility of Peer Review in Software Engineering Research

The Imperative of Transparency in Software Engineering Peer Review

Peer review remains the primary mechanism for quality control in scholarly publishing. In software engineering, a discipline that both studies and produces complex, often verifiable systems, the stakes are particularly high. When the review process is opaque, systemic bias, inconsistent standards, and even reviewer incompetence can go unnoticed. Transparent peer review, in which review reports, editor decisions, and sometimes reviewer identities are published alongside the article, directly addresses these risks. Journals such as the International Journal of Software Engineering and Knowledge Engineering have experimented with open reports, finding that authors receive more thoughtful feedback when reviewers know their reports will be visible to the community. For software engineering specifically, transparency also means letting the community inspect how reproducibility claims were handled during review: if a reviewer rejects a paper because an experiment "cannot be replicated," readers should be able to see exactly what the reviewer asked for and whether the author addressed those concerns.

Reducing Bias Through Open Identities

Double-blind review, in which both author and reviewer identities are hidden, was designed to protect against bias based on gender, institution, or reputation. However, evidence from fields like biomedicine suggests that blinding does not eliminate bias; it merely masks it. In software engineering, where research communities are small enough that many reviewers can guess the authors anyway, the pretense of anonymity is thin. Open peer review, in which reviewer names are published, increases accountability: a reviewer is less likely to dismiss a paper from a little-known university if their name will be attached to the review. Several top-tier software engineering conferences, including ICSE, have piloted open review tracks with positive feedback from both authors and reviewers.

Clear Criteria and Checklists

Transparency also demands that reviewers know exactly what standards they are expected to apply. Many software engineering venues now provide structured review forms with specific questions about methodology, data availability, and artifact quality. For example, the ACM Transactions on Software Engineering and Methodology (TOSEM) uses a detailed review checklist that asks reviewers to evaluate the completeness of the reproducibility package. Making these checklists public before submission helps authors prepare their work and aligns expectations across all stakeholders.

Reproducibility: The Core Challenge for Software Engineering Research

Reproducibility, the ability to obtain consistent results using the same input data, computational steps, methods, and code, is the cornerstone of scientific evidence. In software engineering, the challenge is acute. Papers often present new algorithms, tools, or empirical studies that rely on custom code, specific hardware configurations, or proprietary datasets. When authors do not share their implementation, subsequent researchers cannot verify claims, build upon the work, or compare results. A widely cited 2016 Nature survey found that more than 70% of researchers had tried and failed to reproduce another scientist's experiments. Within software engineering, the figure may well be higher, given the additional complications of compiler versions, operating systems, and library dependencies.

Common Barriers to Reproducibility

Closed-source or unshared tools: Many research prototypes are never released, or are released only as binaries without source code. This makes it impossible to inspect or modify the implementation to test boundary conditions.
Incomplete methodological descriptions: Papers often omit details about parameter tuning, random seeds, data preprocessing steps, or the exact procedure for running experiments. Even when the code is shared, missing instructions prevent reproduction.
Unavailable datasets: Datasets may be proprietary (e.g., industrial logs), too large to host on academic repositories, or collected under licenses that forbid redistribution. Authors sometimes promise to share “upon request” but fail to respond.
Evolving dependencies: A tool that worked with Python 2.7 and a specific version of a library may break when run on a modern system. Without a container or virtual environment specification, reproducibility decays rapidly.
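
Two of these barriers, unrecorded random seeds and drifting dependencies, can be partly addressed by saving a description of the environment next to the results. The following is a minimal sketch, assuming a Python-based experiment; `capture_environment`, `run_experiment`, and the package names are hypothetical placeholders and do not come from any venue's guidelines.

```python
import json
import platform
import random
from importlib import metadata


def capture_environment(seed, packages):
    """Return a dictionary describing the environment an experiment ran in."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence instead of failing, so the gap stays visible.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "packages": versions,
    }


def run_experiment(seed):
    """Placeholder experiment: draws five numbers from a seeded generator."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    return [rng.random() for _ in range(5)]


if __name__ == "__main__":
    # The package names below are illustrative assumptions.
    manifest = capture_environment(seed=1234, packages=["pip", "numpy"])
    print(json.dumps(manifest, indent=2))
```

A manifest like this does not make an experiment reproducible on its own, but it tells a later reader which interpreter, platform, and library versions produced the reported numbers.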

Current Practices in Software Engineering Peer Review

The state of peer review across software engineering venues is highly heterogeneous. Major conferences like ICSE, FSE, ASE, and ISSTA have adopted artifact evaluation tracks, where submitted papers are judged not only on their written contribution but also on the quality and reproducibility of accompanying artifacts. These tracks have raised awareness but are not yet mandatory for all papers. Journals often lag behind; many still rely on traditional anonymous review with no reproducibility requirement. A 2022 survey of 50 top software engineering journals found that fewer than 20% had any formal reproducibility policy. This inconsistency sends mixed signals to authors about what constitutes acceptable research practice.

Variability in Review Quality

Even within a single conference, review quality can vary dramatically. Some reviewers write detailed, constructive feedback while others provide a few generic sentences. Without transparency, poor reviews go uncorrected. One proposed remedy is to publish review reports alongside papers so the community can assess the thoroughness of the evaluation. This would create a natural incentive for reviewers to invest effort, knowing their work is visible to their peers and to future authors. Several software engineering venues already do this for accepted papers, and the practice is spreading.

Actionable Strategies for Greater Transparency

Open Peer Review Models

Adopting open peer review means making reviewer comments and author responses publicly available. This can be done for both accepted and rejected manuscripts (with the author’s consent). The Software Engineering Journal Club has shown that open discussions of reviews improve the overall quality of feedback. For the field of software engineering, where many papers present novel methods that require careful evaluation, open review allows the broader community to see how conclusions were shaped by the review process. It also enables meta-research on review quality and bias.

Structured Review Criteria and Templates

Providing reviewers with clear, structured forms that ask specific questions about transparency and reproducibility leads to more consistent evaluations. For example, reviewers might be asked: "Is the dataset publicly available? If not, is there a clear justification?" or "Are the experimental steps sufficiently detailed to allow independent reproduction?" Some conferences now use a multi-stage review process in which a separate artifact evaluation committee (AEC) evaluates the reproducibility package before the technical committee reviews the paper. This separation of concerns helps ensure that reproducibility is judged by people who actually run the artifacts.

Encouraging Full Methodological Disclosure

Authors should be required to submit a "reproducibility statement" at the time of manuscript submission, detailing what materials (code, data, containers, instructions) are available and under what license. This statement becomes part of the review process. Journals like ACM TOSEM now mandate such statements. To further incentivize openness, conferences can award "reproducibility badges" (e.g., ACM's artifact badging) that appear directly on the published paper.
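
A reproducibility statement is easier to review if it is also machine-checkable. The sketch below assumes a hypothetical dictionary layout with one entry per material (code, data, container, instructions); the field names `available`, `license`, and `reason` are illustrative assumptions, not the format used by ACM or any journal.

```python
REQUIRED_MATERIALS = ("code", "data", "container", "instructions")


def validate_statement(statement):
    """Return a list of human-readable problems found in a statement."""
    issues = []
    for material in REQUIRED_MATERIALS:
        entry = statement.get(material)
        if entry is None:
            issues.append(f"{material}: not mentioned")
        elif entry.get("available") and not entry.get("license"):
            issues.append(f"{material}: available but no license given")
        elif not entry.get("available") and not entry.get("reason"):
            issues.append(f"{material}: unavailable without a stated reason")
    return issues


if __name__ == "__main__":
    # Made-up example statement, for illustration only.
    statement = {
        "code": {"available": True, "license": "MIT"},
        "data": {"available": False, "reason": "industrial logs under NDA"},
        "container": {"available": True, "license": ""},
    }
    for issue in validate_statement(statement):
        print(issue)
```

A check like this only confirms that the statement is complete; whether the listed materials actually work is still a question for the reviewers.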

Strategies for Enhancing Reproducibility in Software Engineering

Open-Source Tools and Platforms

Using open-source tools (e.g., Python with freely available libraries instead of proprietary statistical packages) promotes reproducibility because anyone can inspect and run the same environment. However, open-source itself is not a silver bullet: dependencies must be pinned, and the exact versions used should be documented. Containers (e.g., Docker) or virtual machine images that encapsulate the entire computational environment are becoming standard for artifact submissions. Platforms like Zenodo provide DOI assignment for code and datasets, ensuring stable archival.
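
Whether dependencies are pinned can be checked mechanically. The sketch below assumes the dependencies are listed in pip's `name==version` requirements style; `find_unpinned` is a hypothetical helper, and the parsing is deliberately simplified (it ignores options, URLs, and environment markers).

```python
def find_unpinned(requirements_text):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Illustrative requirements list, not taken from a real artifact.
    example = """
    numpy==1.26.4
    pandas>=2.0   # a lower bound is not a pin
    scipy
    """
    print(find_unpinned(example))
```

Even a fully pinned list covers only the packages it names, so transitive dependencies still need to be listed explicitly or captured in a container image.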

Mandatory Reproducibility Packages

A growing number of software engineering conferences now require authors to submit a reproducibility package at the same time as the paper. This package typically includes the source code, a list of dependencies with versions, a script to run all experiments, and the raw results (or a script to regenerate them). The package is reviewed by the AEC before the paper is accepted. This practice is credited with substantially increasing the share of published results that can be reproduced. For example, at the International Symposium on Software Testing and Analysis (ISSTA), artifact evaluation has been a requirement since 2020, and the percentage of artifacts that pass independent replication has risen steadily.
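
The final step of such a package, comparing regenerated results against the reported ones, can be sketched as follows. This assumes results are stored as a mapping from metric name to a number; `compare_results`, the tolerance, and the metric values are hypothetical and chosen only for illustration.

```python
import math


def compare_results(reported, regenerated, rel_tol=1e-6):
    """Return names of metrics that are missing or differ beyond the tolerance."""
    mismatches = []
    for name, expected in reported.items():
        actual = regenerated.get(name)
        if actual is None or not math.isclose(expected, actual, rel_tol=rel_tol):
            mismatches.append(name)
    return sorted(mismatches)


if __name__ == "__main__":
    # Made-up numbers standing in for a paper's table and a fresh rerun.
    reported = {"precision": 0.91, "recall": 0.84}
    regenerated = {"precision": 0.9100000001, "recall": 0.80}
    print(compare_results(reported, regenerated))  # prints ['recall']
```

A relative tolerance is used because floating-point results often differ slightly across hardware and library versions; the appropriate threshold depends on the experiment and should be stated in the package.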

Community Standards for Documentation

Beyond individual conferences, the whole field would benefit from a community-wide standard for documenting research workflows. Initiatives such as the Research Compendium provide templates for structuring code, data, and documentation. Such standards make it easier for reviewers to navigate and test artifacts. They also lower the barrier for authors who may be unsure what to include. The long-term goal is to make reproducibility packages as routine as writing the paper itself.

Case Studies: Successes and Ongoing Challenges

Positive Example: The Mining Software Repositories (MSR) Conference

MSR was one of the first conferences in software engineering to require artifact submissions for all accepted papers. Authors must provide a Docker container with their data and scripts, and the AEC runs the entire pipeline to verify that the reported results can be obtained. The result has been a marked improvement in trust among researchers: papers that pass artifact evaluation are more likely to be cited and built upon. The MSR conference publishes the names of reviewers who served on the AEC, adding an extra layer of accountability.

Ongoing Challenge: Proprietary Industrial Datasets

Not all software engineering research can be conducted using public data. Studies involving industrial logs, proprietary codebases, or confidential user data pose unique reproducibility challenges. Some researchers have begun to use synthetic data generators that mimic the statistical properties of the original data, allowing reproduction of analyses without exposing sensitive information. Others provide detailed "public use" versions of their datasets after anonymization. However, these solutions are not yet standard, and reviewers often struggle to evaluate whether the surrogate data truly reflects the original context. The field needs clearer guidelines for how to handle cases where full disclosure is not possible.
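
To make the synthetic-data idea concrete, the sketch below fits a mean and standard deviation to one numeric column and samples new values from a normal distribution with a fixed seed. It is a deliberately simple assumption-laden illustration: `fit_normal` and `synthesize` are hypothetical names, the input values are made up, and a realistic generator would also need to preserve correlations between columns and offer an explicit privacy argument.

```python
import random
import statistics


def fit_normal(values):
    """Estimate the mean and sample standard deviation of a numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to `values`."""
    mean, stdev = fit_normal(values)
    rng = random.Random(seed)  # fixed seed so the surrogate data is repeatable
    return [rng.gauss(mean, stdev) for _ in range(n)]


if __name__ == "__main__":
    # Made-up response times in milliseconds, standing in for confidential logs.
    original = [120.0, 95.0, 130.0, 110.0, 105.0]
    surrogate = synthesize(original, n=5000, seed=42)
    print(fit_normal(original))
    print(fit_normal(surrogate))
```

Publishing the fitted parameters and the seed lets a reviewer regenerate the same surrogate data, although it cannot show how faithfully that data reflects the original context.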

Conclusion

Improving the transparency and reproducibility of peer review in software engineering is not merely a bureaucratic exercise; it is a fundamental step toward strengthening the credibility and impact of the research. Transparent review processes reduce bias, raise the standard of feedback, and allow the community to judge the quality of the reviewing itself. Reproducibility ensures that claims can be verified, built upon, and translated into practice. Significant challenges remain, particularly around proprietary data and the diversity of review practices, but the adoption of open models, structured checklists, and mandatory artifact evaluation shows that change is possible. Researchers, conference organizers, and journal editors each have a role to play. By continuing to push for transparency and reproducibility, the software engineering community can set an example for other fields and produce research that truly advances the state of the art.