How To Train Your Robot: Best Practices For Automated Screening

DistillerAI made its global debut on Feb 11, 2018 in the latest release of the DistillerSR systematic review software platform. Since then, researchers around the world have been testing it and applying it to their literature reviews. In the process, they’ve discovered a few best practices for training DistillerAI to assist with the screening process. Here are their top tips on optimizing your training set.

First, a primer on how DistillerAI learns. When you’re screening, you’re training DistillerAI without even knowing it. As you include and exclude references during your normal screening process, the software is monitoring your decisions and learning from them. This way, training happens organically and doesn’t add overhead to what you’re already doing. Once it has enough data to make confident decisions, DistillerAI lets you know that it’s ready.

One important note about your training set: you want to train DistillerAI on the most accurate data possible. If you train it on data with screening errors, it will learn to replicate those errors. So, here are a few tips to keep in mind:

Clean your data before using it to train

Before you hand your screening data over to DistillerAI for training, make sure that you have reviewed it for conflicts between your human screeners and resolved any inclusion/exclusion discrepancies. If we humans can't agree on what's right, it's not really fair to ask your robot to arbitrate while it's still learning!
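To make that check concrete, here is a minimal Python sketch of the kind of conflict check we mean, run against screening decisions exported from your project. The `votes` and `resolutions` dictionaries and the `unresolved_conflicts` function are hypothetical illustrations, not part of DistillerSR; inside the application you would find and resolve conflicts with its own tools.

```python
# Hypothetical sketch: flag references whose screeners disagree and that
# have not yet been given a final, adjudicated decision.


def unresolved_conflicts(votes, resolutions):
    """Return sorted ref IDs with disagreeing votes and no resolution.

    votes:       dict mapping ref_id -> list of reviewer decisions
                 ("include" or "exclude")
    resolutions: dict mapping ref_id -> adjudicated final decision
    """
    return sorted(
        ref_id
        for ref_id, decisions in votes.items()
        if len(set(decisions)) > 1 and ref_id not in resolutions
    )


votes = {
    101: ["include", "include"],  # reviewers agree: fine to train on
    102: ["include", "exclude"],  # conflict, not yet resolved
    103: ["exclude", "include"],  # conflict, already resolved below
}
resolutions = {103: "exclude"}

print(unresolved_conflicts(votes, resolutions))  # [102]
```

Anything this returns should be settled by your team before the data is used for training.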

Minimum training set size

We have found that 10 or more included references and 40 or more excluded references should be enough to train DistillerAI to recognize inclusion and exclusion characteristics. While you can run the AI Toolkit at any time to test DistillerAI, the AI Reviewer tab on your dashboard won't appear until you have 100 screened references. Please note that a reference counts toward the training set only once it has been screened by the required number of reviewers.
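If you want to sanity-check these thresholds against your own exported screening data, the Python sketch below counts only fully screened references and compares them with the 10-include and 40-exclude minimums above. The `Reference` class, its fields, and the `training_counts` and `meets_minimum` functions are hypothetical names for illustration; DistillerAI does this counting for you, and this is not a description of how it does so internally.

```python
from dataclasses import dataclass

# Minimums quoted in this post; all names below are hypothetical.
MIN_INCLUDED = 10
MIN_EXCLUDED = 40


@dataclass
class Reference:
    ref_id: int
    decision: str        # final "include" or "exclude" after conflict resolution
    reviewer_count: int  # number of reviewers who have screened this reference


def training_counts(refs, required_reviewers):
    """Count includes and excludes among fully screened references only."""
    counted = [r for r in refs if r.reviewer_count >= required_reviewers]
    included = sum(1 for r in counted if r.decision == "include")
    excluded = sum(1 for r in counted if r.decision == "exclude")
    return included, excluded


def meets_minimum(refs, required_reviewers):
    """True when the fully screened set reaches both minimums."""
    included, excluded = training_counts(refs, required_reviewers)
    return included >= MIN_INCLUDED and excluded >= MIN_EXCLUDED


# 10 includes and 40 excludes, each screened by two reviewers.
refs = [Reference(i, "include", 2) for i in range(10)]
refs += [Reference(100 + i, "exclude", 2) for i in range(40)]
print(meets_minimum(refs, required_reviewers=2))  # True

# If one include has been seen by only one reviewer, it does not count yet.
refs[0].reviewer_count = 1
print(meets_minimum(refs, required_reviewers=2))  # False
```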

Maximum training set size

We’ve observed diminishing returns from training sets larger than 300 references. With that in mind, adjust your training set percentage (the percentage of human-screened references to train on) so that the set contains no more than 300 references.
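The arithmetic is simple: divide 300 by the number of human-screened references. With 1,200 screened references, for example, a 25% training set gives exactly 300. The Python sketch below does this calculation. The function names are hypothetical, and it assumes the training set size is the screened count multiplied by the percentage and rounded down, which may not match exactly how DistillerAI rounds.

```python
# Hypothetical helpers; assumes set size = screened * percent / 100, rounded down.
MAX_TRAINING_SET = 300  # size beyond which we've seen diminishing returns


def training_percentage(screened, cap=MAX_TRAINING_SET):
    """Largest whole-number percent p with screened * p / 100 <= cap."""
    if screened <= 0:
        raise ValueError("need at least one human-screened reference")
    if screened <= cap:
        return 100  # every screened reference fits under the cap
    return cap * 100 // screened  # integer math avoids float rounding surprises


def training_set_size(screened, percent):
    """Training set size under the round-down assumption above."""
    return screened * percent // 100


for screened in (250, 900, 1200, 5000):
    pct = training_percentage(screened)
    print(screened, pct, training_set_size(screened, pct))
# 250 100 250
# 900 33 297
# 1200 25 300
# 5000 6 300
```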

Please note that if you use a training set larger than 1,000 references, the training process may fail (the progress bar will just disappear... we know, and we're working on that). If that happens, reduce your training set size and try again.

Use multiple screening levels to train

We humans almost always include more references than we should at the first level, or stage, of a systematic review. This is deliberate: it is a best practice for avoiding false exclusions. However, if you train DistillerAI on Level 1 screening results, you will be training your robot on falsely included references, and it will learn to make the same mistakes as your human screeners.

To avoid this issue, start your training set at Level 1 and end it at the highest level at which you have screened references. For example, if you include Levels 1 to 3, DistillerAI will look at all references excluded at or before Level 3, but only at references included at Level 3. This should dramatically reduce the number of false inclusions in your training set and make your robot smarter.
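Here is one way to express that selection rule as a minimal Python sketch. The `ScreeningHistory` record and `build_training_set` function are hypothetical stand-ins for your exported screening data, not DistillerSR's internals. The sketch assumes each reference records the level at which it was excluded (if any) and the highest level at which it was included; references that passed an earlier level but have not yet been screened at your top level are simply left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreeningHistory:
    ref_id: int
    excluded_at: Optional[int]  # level at which it was excluded, or None
    included_through: int       # highest level at which it was included (0 if never)


def build_training_set(histories, top_level):
    """Return (positives, negatives) using screening Levels 1..top_level.

    Negatives: references excluded at or before top_level.
    Positives: references included at top_level itself.
    References still pending between levels are left out.
    """
    positives, negatives = [], []
    for h in histories:
        if h.excluded_at is not None and h.excluded_at <= top_level:
            negatives.append(h.ref_id)
        elif h.included_through >= top_level:
            positives.append(h.ref_id)
    return positives, negatives


histories = [
    ScreeningHistory(1, None, 3),  # included all the way through Level 3
    ScreeningHistory(2, 1, 0),     # excluded at Level 1
    ScreeningHistory(3, 3, 2),     # passed Levels 1-2, excluded at Level 3
    ScreeningHistory(4, None, 1),  # passed Level 1, not yet screened at Level 2
    ScreeningHistory(5, 4, 3),     # included at Level 3, excluded later at Level 4
]

print(build_training_set(histories, top_level=3))  # ([1, 5], [2, 3])
print(build_training_set(histories, top_level=1))  # ([1, 3, 4, 5], [2])
```

Notice that reference 3, which was excluded at Level 3, counts as an include if you train on Level 1 alone; that is exactly the kind of false inclusion the multi-level approach filters out.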

The time it takes to train an artificial intelligence tool has often been cited as one of the roadblocks to broader adoption of these technologies. DistillerAI has been designed to learn from you as you work, dramatically reducing the time required to get it up to speed. Keep these training tips in mind and your robot assistant will benefit from the education!



Peter O'Blenis

Peter O’Blenis is a co-founder of Evidence Partners and has assembled a collection of best practices and methodologies for using web-based software to streamline clinical research. He believes that well-written, web-enabled software can solve real-world problems and has presented globally on the topic.