In my last post, I addressed why search terms used to cull data sets in discovery should not be protected as attorney work product. Today, I want to distinguish an attorney’s “investigative queries” (for case assessment, to hone searches or to identify privileged content) from “culling queries” (to generate data sets meeting a legal obligation, whether conceived by an attorney, client, vendor or expert). I contend culling queries warrant no work product protection from disclosure.
Let’s assume a producing party has a sizable collection of potentially responsive electronic information. The producing party concludes that it would be too costly, slow or unreliable to segregate the ESI by reading everything and, instead, decides to examine just those items that contain particular words or phrases. Keyword queries thus serve to divide the ESI into two piles: one that will be reviewed by counsel and another that no one and nothing will qualitatively review. The latter is the “discard pile.” Culling queries may be applied iteratively, first to collect data from the enterprise and later to cull the collection for review. The reductive process may entail the successive use of a client’s local and enterprise search capabilities and/or a law firm’s or vendor’s search tools.
The common thread is that each lexical search mechanism serves to exclude ESI lacking certain terms from substantive review. No one ever assesses the discards for relevance or responsiveness.
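To make the two-pile mechanics concrete, here is a minimal Python sketch of a keyword cull. It is an illustration only, not any vendor's or party's actual tooling: the `cull` function, the sample documents and the single search term are all hypothetical, and real search platforms add stemming, indexing and other behavior that this ignores.

```python
import re

def cull(documents, terms):
    """Split documents into (review_pile, discard_pile) by whole-word,
    case-insensitive keyword match. Hypothetical helper for illustration."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    review, discard = [], []
    for doc in documents:
        # A document goes to review only if it contains at least one term;
        # everything else lands in the discard pile, unread.
        (review if pattern.search(doc) else discard).append(doc)
    return review, discard

# Hypothetical sample data
docs = [
    "The car skidded on the wet road.",
    "Brake recall notice for the Taurus sedan.",
    "Lunch menu for Friday.",
]
review_pile, discard_pile = cull(docs, ["car"])
```

In this toy run, the brake notice for the Taurus lands in the discard pile alongside the lunch menu, because the only term searched was "car." Nothing in the output reveals that a likely responsive item was left behind; only the query does.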
Now, if we could be confident that keyword culling worked reasonably well and that the persons who came up with keywords were lexical magicians, there’d be no need to worry over the discard pile. We could trust that what we don’t know doesn’t hurt us.
But we do know that a hefty slug of responsive items ends up in that discard pile. We know this because studies and experience have established that keyword search is a crude, mechanical filter. It leaves most of what we seek behind.
Whether we are leaving behind an endurable or unendurable volume of responsive items depends on just how poorly those keywords performed. To gauge that, we’ve got to know what queries were run.
When we demand disclosure of queries used to collect or cull, our opponents say, “I won’t tell you because if you know the words I looked for, you’ll know how I think. You’ll know my mental impressions, my work product. Plus, if my client suggested keywords to me, you’ll know the contents of privileged attorney-client communications.”
Opposing counsel may also argue that, while it’s important to know just how well or poorly keywords performed, “That’s not something anyone but me needs to know.” It’s somehow work product. Opponents are just supposed to trust one another to make sure the keywords were sufficient and that no more than an endurable volume of responsive material will go unscrutinized and unproduced. The customary justification for this is, “that’s how we did it in the good, ol’ paper days.”
Not so. In the “old days,” we did not exclude great swaths of responsive data from review by haphazard guesses about lexical content. We managed paper records, and that management afforded us greater confidence that things would be found where they were kept. It was a very different world.
Now, let’s put the shoe on the other foot. Rewind to 30 years ago, and the producing party invites you to their document repository and says, “Put a Post-It® on what you want us to copy.” Could you reply, “No, that reveals my mental impressions”? Fast forward to 2013 and the producing party insists that you propose search terms. Can you say, “Okay, but I don’t have to reveal them to you. You must run my searches without knowing what they are because if you know my preferred search terms, you know what I think is important”?
See how the ‘keywords as work product’ notion falls apart like a two-bit suitcase in the rain?
In the fog of war, we must not forget that the law favors disclosure. A core principle of Anglo-American jurisprudence is that the public has a right to every man’s evidence. Privileges, explained the great evidence scholar John Henry Wigmore, are “distinctly exceptional.” And the U.S. Supreme Court put it masterfully in United States v. Nixon: “[E]xceptions to the demand for every man’s evidence are not lightly created nor expansively construed, for they are in derogation of the search for truth.”
Contrary to what some lawyers presume, privileges are to be narrowly construed, and there must be a compelling public good demonstrated to justify concealing the search terms run through machines to cull ESI. Moreover, the burden of proof falls squarely on the proponent of privilege. So it’s not for the requesting party to show a particularized need to know—it’s for the responding party to show a compelling justification to suppress.
It’s long been clear that existing documents do not become protected as work product by virtue of being examined by counsel. “An attorney may not bring a document within the scope of the work product rule simply by reviewing it if it was not originally prepared in anticipation of litigation.” Brown v. Hart, Schaffner & Marx, 96 F.R.D. 64, 68 (N.D. Ill. 1982). If the entire document is not protected, how can a few search terms it contains be protected?
Moreover, disclosure of search terms need not reveal who chose the terms or why. An attorney can protect his or her mental impressions by simply keeping his or her own counsel as to whether the terms sprang from the attorney’s noggin or (my preferred method) descended from Heaven on wings of white doves. Absent voluntary disclosure of origins, the queries are just a bag of words.
Finally, work product cannot be interposed to protect from disclosure the underlying facts. What queries were run to exclude potentially responsive ESI from review is an inquiry into underlying facts. Why those queries were selected is a far different question, and one that may be out of bounds.
This is where a crucial distinction should be made between keyword searches executed by counsel for case assessment, what I’ve dubbed “investigative queries,” in contrast to keyword culling for collection, review or production, which I’ve called “culling queries.”
If counsel runs keyword searches against a collection for the purpose of better understanding the case, formulating strategies or refining searches, it seems to me that counsel can (and should) do so without obligation to disclose the details of that effort absent exceptional circumstances (such as, e.g., when counsel’s competency, honesty or diligence are in issue). Investigative queries are benign when they do not operate to expand or contract the corpus of the collection subject to review.
But when queries are used to collect, filter and cull the collection to exclude information from collection, review or production, such culling queries should be scrutinized and freely discoverable, warranting no privilege or work product protection. Otherwise, undisclosed queries may be so insufficient, whether by design or innocent flaw, that they operate to exclude responsive information from all further inspection, i.e., they operate in derogation of the search for truth.
No keyword cull is perfect. None is even close to perfect. And “perfect” is not the standard. But when an opponent hands you a little pile of data and says, “That’s all there is;” shouldn’t you be able to ask, “Did you just search for car or did you search for car, auto, automobile, vehicle, sedan, Ford, Taurus and Tawrus?”
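The car-versus-Tawrus question can be sketched in a few lines of Python. This is a toy under stated assumptions: the five documents, their hand-assigned responsiveness labels and the `hits` and `recall` helpers are hypothetical, and whole-word, case-insensitive matching stands in for whatever a real search tool does.

```python
import re

def hits(documents, terms):
    """Return the indexes of documents containing any term as a whole word."""
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
        re.IGNORECASE,
    )
    return {i for i, doc in enumerate(documents) if pattern.search(doc)}

def recall(documents, responsive_ids, terms):
    """Fraction of the known-responsive documents that the terms retrieve."""
    found = hits(documents, terms) & responsive_ids
    return len(found) / len(responsive_ids)

# Hypothetical collection; the first four items are labeled responsive by hand
docs = [
    "The car lost braking power.",
    "Customer says the automobile stalled.",
    "Ford Tawrus warranty claim",          # note the misspelling
    "Vehicle inspection report",
    "Holiday party schedule",
]
responsive = {0, 1, 2, 3}

narrow = ["car"]
broad = ["car", "auto", "automobile", "vehicle", "sedan", "Ford", "Taurus", "Tawrus"]

narrow_recall = recall(docs, responsive, narrow)   # 0.25 on this toy set
broad_recall = recall(docs, responsive, broad)     # 1.0 on this toy set
```

Both searches return a pile of documents that looks complete from the outside. The difference in what was left behind is visible only to someone who knows which list of terms was run.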
The litmus test for me is this: Are search terms deployed so as to exclude potentially responsive ESI from review and production? If so, the search terms are not protected as work product and may be discovered. If the search terms were purely investigative and did not serve to cull or exclude ESI from a collection, those search terms may be protected from disclosure if they can be shown to be attorney work product or otherwise privileged.
This sort of dichotomy is supported by the leading cases addressing whether an attorney’s selection of documents is protected from discovery as work product. See, e.g., In re San Juan Dupont Plaza Hotel Fire Litigation, 859 F.2d 1007 (1st Cir. 1988) and Sporck v. Peil, 759 F.2d 312 (3rd Cir. 1985), cert. denied, 474 U.S. 903 (1985). Courts generally deny disclosure of which documents were chosen by counsel in the assembly of documents for a client’s review but decline to protect selections of documents for review by others. That is, you can discover the selections, though you may not be able to establish that counsel made the selections. “Like requiring pleadings, answers to contention interrogatories, pretrial exhibit and witness lists, and trial memoranda, the district court’s [order requiring disclosure of the documents that may be used at deposition] merely adjusts the timing of disclosure. The situation is not remotely analogous to the situation where a party seeks an attorney’s personal notes and memoranda which contain his confidential assessments of the testimony of prospective witnesses.” In re San Juan Dupont Plaza Hotel Fire Litigation, at 1017.
Much of the fight about keywords and work product strikes me as more about lawyer hubris than substance. While requesting parties certainly care about the precision of searches lest they get data dumps, requesting parties tend to care more that the search terms prompt high recall of responsive documents. Accordingly, requesting parties want disclosure of search terms to ensure that responding parties have not failed to include a term or variant they think likely to strike gold. The responding party who says, “I’ll disclose the queries we used, but there are a couple I need to keep to myself” is likely to meet with a shrug and a nod from requesting counsel because the savviest requesters care more about the searches the other side overlooked than the ones piled on.
I’m trying to get to the right formulation here, so don’t hesitate to draw my attention to any cases on point or otherwise mix it up. I’m all ears, and I won’t bite. Leastwise, not hard enough to break the skin.
bmschulman said:
Search terms generally do not “cull the collection for review” or “filter and cull the collection to exclude information from collection, review or production” or “exclude great swaths of responsive data” or “serve to cull or exclude ESI from a collection.” They are inclusive, designed to target likely responsive materials. I agree that if we are using keywords like “Amazon.com” to exclude junk, there’s not much to protect. But keywords tend to be inclusive, intended to mimic what would otherwise be the contents of a memo instructing reviewers to look for documents falling into topical categories. I understand that searching and culling are two sides of the same coin, but your assumption seems to be that the search terms are being used to identify materials to be left behind rather than to identify responsive materials. Because they are more commonly used for the latter inclusive approach, search terms may reflect an attorney’s thoughts and work product.
Brendan Schulman
craigball said:
I couldn’t disagree more. Search terms used as culling queries are indeed deployed to leave data behind so as to avoid the burden of reviewing everything. It’s the perception (and fact) that so much is NOT responsive that prompts the use of electronic search.
One key difference between search terms and a memo instructing reviewers is that we assume that reviewers look qualitatively at every document and include or exclude items by bringing a measure of judgment to every document. For good or ill, we protect those subjective assessments as exercises of professional judgment.
Whether a filter filters “in” or filters “out” is simply a matter of perspective. The outcome is exactly the same. Keyword filtering serves to leave responsive data behind–data that will not be afforded any subjective assessment for responsiveness. There’s the rub.
If culling queries were work product, the protection afforded them would not hinge on whether they were employed inclusively or exclusively. An attorney’s “judgment” that documents with the term “Amazon.com” are not responsive isn’t any less revealing of mental impressions than the judgment that documents with the word “Amazon.com” are responsive. Neither is work product in my estimation; but, since when does your work product rationale depend upon whether the lawyer’s impressions are wise?
When you use a magnet to separate needles from haystacks, is it because you value needles or because you don’t want needles in your feed? The magnet doesn’t tell me that.
John Tredennick said:
Craig:
One point in your interesting post made me smile. I fondly remember the days of paper discovery and yellow stickies. I also recall the standard in-house instruction to “make two copies of everything counsel marked.” The goal was to make sure we knew what our opponent found interesting.
So, standard practice was to mark the whole box for copying rather than tag a particular document. This wasn’t as bad as it sounded because the good stuff usually came clumped together in the same box. But, our strategy was to be as opaque as possible about which documents we might find interesting.
We are in a different world today, mostly because the volumes of electronic information are orders of magnitude larger than what I faced in the paper world. But the thinking isn’t much different.
Indeed, I spoke with a partner in a very large law firm who absolutely refused to consider using TAR techniques because of the fear that the court would require disclosure of methods and that this would reveal thoughts and strategies. I believe it is likely an issue for more than a few lawyers out there. Or maybe it is a convenient reason to keep the status quo.
Thanks for writing.
craigball said:
Thanks, John. When it comes to e-discovery, I’ve yet to see a surfeit of thought or strategy within the bar. If there were more of same, I’d probably be more disposed to protect it. Right now, lawyers and their work product claims are Geraldo Rivera hyping the contents of Al Capone’s vault.
bmschulman said:
You are questioning whether search terms can replace human judgment. I understand that concern, but we have an obligation to conduct a reasonable search in good faith. In, say, an antitrust case, the search terms I choose can reveal a fair amount about what I feel is relevant to an allegation of collusion even if there is no such collusion. Mandating disclosure of the search terms invites the requesting party into that thought process in order to second-guess whether I have chosen the appropriate terms and topics. I do think it is analogous to a memo instructing first-pass reviewers about what to look for. If the memo said “don’t bother with Company X” and you disagreed about the relevance of Company X, you would probably want to know that too. Disagreements are inevitable either way. But it’s my job to figure out how to respond in good faith to the document request. (I am not disputing the value of an alternative cooperative approach, only the absolute assertion that this process never implicates work product.)
craigball said:
I wanted very much to give you the last word, if only because I appreciate the thoughtful way you have approached the issue despite our strong disagreement. But here I go nonetheless.
I don’t question whether search terms can replace human judgment. The only human judgment keyword search replaces is the judgment whether the search term is there or not. It’s typically far superior to human judgment on that score (unless you’re dealing with very bad OCR).
I have not made the absolute assertion that you suggest. I’ve done my utmost to leave room for the exceptional circumstance that might engender queries so unduly revealing as to prompt genuine work product concerns. In that event, I say, “claim it and prove it.” From where I stand, those arguing most strenuously for protection aren’t as worried about revealing their ‘strategy’ as they are about being taken to task for errors made. When work product is used as a means to conceal incompetence, that’s obstruction. I certainly don’t direct that at you, Brendan. I’m just making a broader point about motivation.
I think the better approach would be to deal with those issues through the mechanism of investigative queries rather than culling queries; but, I sense you give no quarter to that distinction, and I don’t proffer same as an end run around transparency as much as a recognition that there is a place for protected attorney inquiry.
bmschulman said:
I’ll end with a question: why doesn’t your concern about concealed incompetence apply to manual review? If someone sends a document review to far off lands at the cheapest price it compels no such disclosure or scrutiny.
craigball said:
Brendan, why would you assume that the topic of this post defines the bounds of all that concerns me in law practice or e-discovery? If I ran the world, I would change a lot of things with respect to requisite competence and transparency. I would not, however, be so xenophobic as to assume that outsourcing review to foreign lawyers equates to diminished competence. My sense is that a top graduate of an Indian law school who values the work might be a more capable reviewer than an American contract lawyer with a middling legal education and one eye on the want ads. On the question of disclosure, I have little to say because the law seems clear: if a U.S. lawyer wants to conceal such delegation, it’s hard to pierce that veil. I don’t have to like it, but I spare readers from my ranting about everything I don’t like. That would eat into my time yelling at neighbor kids to get off my lawn.
James Keuning said:
“The common thread is that each lexical search mechanism serves to exclude ESI lacking certain terms from substantive review. No one ever assesses the discards for relevance or responsiveness.”
What!? Sampling the null set is absolutely critical, especially when many electronic documents have no text, faulty text, or embedded text that does not match the face of the document. If there is an agreement between parties to cull the review set based on agreed-upon keywords, I can see where you might get away with ignoring the null set, but otherwise sampling that null set is critical. Disclosing keywords does little good if the null set is not sampled. You do not mention sampling the null set in either of these posts. If the null set does not contain significant responsive documents, then what do lawyers or judges expect to gain by looking at the keywords that got us there? Isn’t that where angels fear to tread?
“Whether a filter filters ‘in’ or filters ‘out’ is simply a matter of perspective.”
No way. These are two very different things and the difference is more than perspective. (I agree with Brendan Schulman here.) You use the word “exclude” a lot in these two blog posts. As far as I can tell, the authors of the No Disclosure article did not use that term at all. Exclusive versus inclusive keyword searching, date culling, extension filtering, foldering, custodian selection, etc. yield very different results, and the steps taken to achieve those results are just as different. The QC steps taken at the end to ensure a proper and successful inclusive or exclusive cull are also very different. It is not just perspective.
craigball said:
Yes, I considered the issue of sampling the null set. If it occurs in even 5% of cases where search terms are run, I will sing “Mack the Knife” at LegalTech wearing a tutu and a turban.
From the standpoint of the requesting party, there’s value in knowing the filters employed (including search terms) whether the null set is sampled or not. “Where angels fear to tread” referred to the temerity of lawyers or judges lacking expertise in search presuming to opine on the relative efficacy of particular queries. Judge Facciola didn’t decide that lawyers couldn’t acquire expertise; only that they customarily lacked the expertise of statisticians, linguists and computer scientists. While we are on the subject of Judge Facciola, he had no qualms ordering the disclosure of search terms in Newman v. Borders, Inc., 257 F.R.D. 1, 7 (2009).
You misapprehend what I mean by “perspective.” I am not addressing exclusionary or negative search. I’ve written about that in a prior post, so I’m not just now pretending to appreciate the difference. I meant that whether the magnet is filtering in or filtering out depends upon whether you look at the process from the perspective of the needle or the hay.
Or let’s say it this way: Put ten marbles in a circle. Move three out of the circle. Have you selected seven marbles and excluded three, or have you excluded seven and selected three? That’s what I meant by “perspective.” What you don’t collect by search receives lesser scrutiny (more often, no scrutiny). Anyone who has studied keyword search must concede that keyword search (as customarily employed in e-discovery today) performs poorly in terms of recall and precision relative to any optimal method (of which, it might fairly be argued, there are none).