Three weeks ago, skulking around the mummies in a small but fine museum on the University of Sydney campus, I learnt that mystery writer Agatha Christie was married to archaeologist Max Mallowan and that she’d assisted him on his digs in Syria. Dame Agatha even used her cold cream and knitting needles to clean rare ivory artifacts. The experience found its way into her work. An exhibit of Christie-cleaned carvings included a quote from the author’s fictional detective Hercule Poirot in Death on the Nile (1937):

Once I went professionally to an archaeological expedition–and I learnt something there.  In the course of an excavation, when something comes up out of the ground, everything is cleared away very carefully all around it.  You take away the loose earth, and you scrape here and there with a knife until finally your object is there, all alone, ready to be drawn and photographed with no extraneous matter confusing it. That is what I have been seeking to do–clear away the extraneous matter so that we can see the truth–the naked shining truth.

This naturally got me thinking about the way we approach search in electronic discovery. Most lawyers use keywords to find documents responsive to discovery requests, despite the propensity of keywords to sweep up too much chaff. Keywords do turn up many of the documents we seek; unfortunately, the results come caked with the loose earth of documents that contain the keywords but have no connection to the case. Testing bears this out, with keyword results running at a ratio of about 20% responsive matter to 80% extraneous. That’s a lot of loose earth!

The current industry practice is to subject keyword-culled documents to horrifically expensive brute-force review, i.e., bored lawyers reading each page. Such spirit-crushing linear review accounts for anywhere from 50% to 90% of the total cost of e-discovery; consequently, when you reduce lawyer review time, you slash the biggest contributor to cost…and waste. If most of the material culled by keyword search is extraneous, any technique that pulls away chaff without grabbing wheat translates into significant savings of time and money. It also improves quality, because every extraneous document removed is one fewer document that a reviewer can mischaracterize.
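To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It borrows the 20/80 split from above; the 100,000-document hit count, and the assumption that a cleanup pass clears half of the extraneous documents while leaving every responsive one untouched, are hypothetical figures chosen only for illustration, not test results.

```python
def review_after_negative_pass(
    keyword_hits: int, responsive_share: float, extraneous_removed: float
) -> tuple[int, float]:
    """Return (documents left for review, percent reduction in review volume).

    Assumes the cleanup pass removes only extraneous documents, never responsive ones.
    """
    responsive = round(keyword_hits * responsive_share)
    extraneous = keyword_hits - responsive
    removed = round(extraneous * extraneous_removed)
    remaining = keyword_hits - removed
    return remaining, 100 * removed / keyword_hits


# Hypothetical inputs: 100,000 keyword hits, 20% responsive, half the extraneous cleared.
remaining, cut = review_after_negative_pass(100_000, 0.20, 0.50)
print(f"{remaining:,} documents left for review ({cut:.0f}% fewer)")
# prints: 60,000 documents left for review (40% fewer)
```

Under those assumptions, the review set shrinks from 100,000 to 60,000 documents, a 40% cut in the biggest cost driver. Actual results depend entirely on the collection and the terms used.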

So perhaps we should consider the value of a second, distinct keyword pass before review that, like Agatha Christie’s knitting needle or the archaeologist’s knife, clears away the loose earth. This pass doesn’t look for responsive documents. It uses keywords to find documents that are NOT likely to be responsive; that is, it’s calculated to clear away the extraneous matter so we can see the naked shining truth.
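Here’s a minimal sketch of that two-pass idea in Python, assuming plain case-insensitive substring matching; real review platforms run these searches against their own indexes and query syntax. The function names, document IDs and search terms are hypothetical. The second pass only sets documents aside, so they remain available for sampling.

```python
from typing import Iterable


def contains_any(text: str, terms: Iterable[str]) -> bool:
    """True if any term appears in text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def two_pass_cull(
    docs: dict[str, str], keywords: list[str], negative_terms: list[str]
) -> tuple[list[str], list[str]]:
    """Pass 1 keeps keyword hits; pass 2 sets aside hits that contain a negative term."""
    hits = [doc_id for doc_id, text in docs.items() if contains_any(text, keywords)]
    set_aside = [doc_id for doc_id in hits if contains_any(docs[doc_id], negative_terms)]
    excluded = set(set_aside)
    for_review = [doc_id for doc_id in hits if doc_id not in excluded]
    return for_review, set_aside


# Hypothetical mini-collection and terms, for illustration only.
docs = {
    "DOC-001": "Q3 pricing memo for the Acme contract",
    "DOC-002": "Acme team fantasy football draft is Friday",
    "DOC-003": "Lunch order for Friday",
}
for_review, set_aside = two_pass_cull(
    docs,
    keywords=["acme"],
    negative_terms=["birthday cake", "fantasy football", "bridal shower"],
)
print("review:", for_review)     # review: ['DOC-001']
print("set aside:", set_aside)   # set aside: ['DOC-002']
```

DOC-003 never reaches either pile because it fails the first keyword pass; DOC-002 is a keyword hit, but the second pass clears it away before anyone is paid to read it.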

This is “negative search.” The notion of negative search isn’t original with me, but neither is it much used by anyone else. Though similar in certain respects, negative search is not the same as using Boolean constructs to exclude noise hits. Boolean constructs are quite effective when artfully composed, but they can be challenging to frame and tricky to execute. Negative search doesn’t restrict queries the way Boolean constructs do. Instead, it finds all documents containing terms deemed highly unlikely to occur within responsive documents, like “birthday cake,” “fantasy football” or “bridal shower.” Those documents are then excluded from review.

Clearly, negative search terms must be chosen wisely and tested carefully against representative samples of the collection before broad deployment. Like the NIST list of known software files used to cull system and application files from collections, negative search terms, once compiled, can be reused in subsequent cases, again with testing to guard against unexpected outcomes. So consider whether there’s a role for negative search in your next e-discovery effort, and know that almost any collection holds a corpus of extraneous data that can be cost-effectively culled by negative search.
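Testing negative terms before deployment can be as simple as running each one against a sample that reviewers have already coded and counting how many responsive documents it would have thrown away. The Python sketch below assumes a hand-coded sample, substring matching and a zero-tolerance threshold (any term that would exclude even one responsive document is dropped); the sample documents and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleDoc:
    text: str
    responsive: bool  # coding assigned by a reviewer during sampling


def audit_negative_terms(sample: list[SampleDoc], terms: list[str]) -> dict[str, tuple[int, int]]:
    """Map each term to (sample docs it would exclude, responsive docs among them)."""
    report: dict[str, tuple[int, int]] = {}
    for term in terms:
        needle = term.lower()
        excluded = [doc for doc in sample if needle in doc.text.lower()]
        responsive_lost = sum(1 for doc in excluded if doc.responsive)
        report[term] = (len(excluded), responsive_lost)
    return report


def safe_terms(report: dict[str, tuple[int, int]], max_responsive_lost: int = 0) -> list[str]:
    """Keep only terms that exclude no more responsive docs than the tolerance allows."""
    return [term for term, (_, lost) in report.items() if lost <= max_responsive_lost]


# Hypothetical coded sample, for illustration only.
sample = [
    SampleDoc("Birthday cake in the break room at 3pm", responsive=False),
    SampleDoc("Fantasy football league standings", responsive=False),
    SampleDoc("Bridal shower RSVP", responsive=False),
    SampleDoc("Skipping the bridal shower to finish the merger draft", responsive=True),
]
report = audit_negative_terms(sample, ["birthday cake", "fantasy football", "bridal shower"])
print(report)
print(safe_terms(report))  # ['birthday cake', 'fantasy football']
```

Here “bridal shower” fails the audit because it turns up in a responsive draft, exactly the kind of unexpected outcome that sampling is meant to catch before a term is trusted across the whole collection.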
