In 2016, Nature surveyed more than 1,500 scientists and found that more than 70% of them had tried and failed to reproduce another scientist’s published experiments. More than half couldn’t even reproduce their own work. A study accepted at one of AI’s largest conferences in August analyzed 30 AI research papers and found that the authors largely held back key details of how their algorithms were trained and calibrated, making it difficult for other researchers to recreate the original results.
We trust in science because we can verify the accuracy of its claims. We test that accuracy by repeating a scientist’s original experiments.
What happens when those tests fail, particularly in a field with the potential to generate billions of dollars in revenue?
Because of the massive industry built around the technology, AI labs today are incentivized to publish state-of-the-art results, even ones that might be difficult to replicate. Research that delivers higher accuracy, a new capability, or even greater efficiency can earn a lab’s parent company millions of dollars in cloud-service revenue, as well as a reputation that makes it easier to recruit top talent.
Joelle Pineau, an associate professor at McGill University and head of Facebook’s AI research lab in Montreal, is pushing back against irreproducible AI research through a challenge for students, coordinated with professors at five other universities around the world. Students have been tasked with reproducing papers accepted by the 2018 International Conference on Learning Representations, one of AI’s biggest gatherings. The papers are published anonymously months in advance of the conference, and the publishing system allows comments on those accepted papers, so students can add their findings below each one.
“If you’re doing science, then there’s a process through which science gets done,” Pineau says. “If you build these systems that no one else can build, what you’re doing is producing a scientific artifact, which can advance our knowledge and understanding, but it’s a different standard than producing a scientific result.”
The research that students will be tasked with reproducing comes from the world’s top AI labs, from universities to tech giants like Google, DeepMind, Facebook, Microsoft, and Amazon.
Babatunde Olorisade, a Ph.D. student at Keele University who authored the study analyzing the 30 AI research papers, says that proprietary data and information used by large technology companies in their research, but withheld from their papers, are holding the field back.
He makes the point that the software a computer runs when reproducing an algorithmic experiment, the configuration of that software, and the data used are comparable to gravity and temperature in the physical world. These elements provide the context for an experiment, and they need to be replicated to understand how and why the experiment works.
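Put in concrete terms, that means a paper needs to report the exact software versions, configuration settings, and data it used, much as a physics paper would report temperature and pressure. Below is a minimal sketch of what capturing that context might look like in Python; the file name, random seed, and hyperparameters are illustrative placeholders, not details from any particular study.

```python
# A minimal, hypothetical sketch of recording an experiment's "context":
# the software, configuration, and data Olorisade compares to gravity and
# temperature. The file name, seed, and hyperparameters are placeholders.
import hashlib
import json
import platform
import random
import sys

SEED = 42  # fixing the seed makes "random" choices repeatable
HYPERPARAMETERS = {"learning_rate": 0.01, "batch_size": 32, "epochs": 10}

def data_fingerprint(path):
    """Hash the training data so readers can confirm they have the same file."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha.update(chunk)
    return sha.hexdigest()

def experiment_context(data_path):
    """Collect the software, configuration, and data details a paper would report."""
    return {
        "python_version": sys.version,
        "platform": platform.platform(),
        "random_seed": SEED,
        "hyperparameters": HYPERPARAMETERS,
        "data_sha256": data_fingerprint(data_path),
    }

if __name__ == "__main__":
    random.seed(SEED)
    # "train.csv" is a stand-in for whatever data the experiment actually used.
    print(json.dumps(experiment_context("train.csv"), indent=2))
```

Publishing a record like this alongside the code and data is one way a lab could let outsiders recreate the same conditions, rather than guess at them.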
“Verifiable knowledge is the foundation of science,” Olorisade says. “It’s about understanding. If you verify the claims, you will have a better insight into where to grow from there; you can grow branches from that knowledge if it’s accurate and sound.”
Ideally, Pineau’s reproducibility challenge will run every year. This continuity could start a virtuous cycle in the AI industry, in which students learn to audit research, and then carry the importance of creating reproducible research into their careers in academia or industry.
“I expect authors will be more on their toes, in terms of their results and the claims,” Pineau says. “I expect some authors will think more about how to make their code available, and how to incorporate the public release of code as a part of their scientific process.”