Whether you call it “predictive coding” or “technology-assisted search,” the time is nigh when we will leave much of the heavy lifting of search to machines trained to find responsive documents. These tools won’t be heuristic marvels like the HAL 9000 envisioned by Arthur C. Clarke, but they probably won’t try to kill us either.
We’ll train these tools by presenting them with examples of patently responsive documents culled by flesh-and-blood reviewers from key custodians’ ESI. Using sophisticated algorithms that analyze these “seed sets” and identify patterns, the tools will ferret out other documents like the examples. Because the tools can be trained to find similar ESI using any documents, we won’t be limited to seed sets derived from actual documents. We can train the tools with contrived documents: fabrications modeled on the genuine counterparts we hope to find. I call this “imagining the evidence,” and it’s not nearly as crazy as it sounds.
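To make the idea concrete, here is a toy sketch in Python of ranking a collection against a seed set. It is only an illustration under heavy assumptions: real predictive coding tools use far more sophisticated models, while this one merely compares word counts using cosine similarity. The function names (`term_counts`, `cosine`, `rank_by_seed`) and the sample messages are hypothetical, invented for this example. Note that the lone seed is a contrived document, not one drawn from any custodian’s ESI.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_seed(seed_docs, collection):
    """Score each document by its best match to any seed, highest first."""
    seeds = [term_counts(s) for s in seed_docs]
    scored = [
        (max(cosine(term_counts(doc), s) for s in seeds), doc)
        for doc in collection
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical seed: a contrived message, imagined rather than collected.
seed_set = ["Please delete the pricing emails before the audit begins."]

# Hypothetical collection standing in for a custodian's ESI.
collection = [
    "Lunch menu for Friday: tacos and salad.",
    "Make sure the pricing emails are deleted before the auditors arrive.",
    "Quarterly parking pass renewal reminder.",
]

ranked = rank_by_seed(seed_set, collection)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

The message about deleting pricing emails rises to the top because it shares vocabulary with the imagined seed, while the other two share no words with it and score zero. A crude word-count comparison like this would miss paraphrases and be swayed by common words, which is exactly why production tools rely on richer pattern analysis.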