For the last week, I’ve been in Australia’s capital, Canberra, delivering the keynote speech at the first-ever X-Ways Forensics Users Conference and conducting a forensic witness skills workshop for the Australian Federal Police. I flew to Australia from New Orleans, where I’d delivered three presentations in a day for the Louisiana State Bar Association. It’s been quite the busy week; so, after a picturesque drive to Sydney this morning and bidding goodbye to my top bloke and host, Zoran Iliev, I was glad for a few moments to catch my breath in this incomparable city of bridge, bay and soaring Opera House.
I love the challenge—the chance to mix it up with skilled interrogators, defend my opinions and help the decision makers hear what the electronic evidence tells us. There is a compelling human drama being played out in those bits and bytes, and computer forensic examiners are the fortunate few who get to tell the story. It’s our privilege to help the finders of fact understand the digital evidence.
This post is written for computer forensic examiners and outlines ways to become a more effective witness and avoid common pitfalls. But the advice offered applies as well to almost anyone who takes the stand.
Today was ostensibly the last day for public comment on the proposed amendments to the Federal Rules of Civil Procedure. The good news for other procrastinators is that the submission deadline has been extended to accommodate scheduled website maintenance. The new deadline for submitting public comments is 11:59 PM ET on Tuesday, February 18, 2014. Over 1,600 comments have been submitted, and I’ve been trying to wade through them, unsurprised at the deep division between plaintiffs and corporate interests. I can’t recall another time when so much has been spent by corporate lobbyists to influence the civil rulemaking process. Clearly, corporate America expects a bigger payoff from these proposed amendments than I do.
Notwithstanding their strengths, there are aspects of the proposed amendments that should go back to the drawing board. Many commentators focus on problems with proposed Rule 26 and its efforts to narrow the scope of discovery. Some are incensed that proposed Rule 37(e) offers insufficient immunity from sanctions for spoliation, choosing to ignore the fact that the incidence of spoliation sanctions in federal court is historically less than the national incidence of death by lightning strike. Ironically, those grousing the loudest are the same white shoe-types who play golf in a thunderstorm.
I finally threw my comment on the pyre, I mean pile, or, at least, I tried to do so; but the submission web page was indeed shut down for website maintenance. That gave me time to solicit your input, dear reader, while there’s still a chance to tweak my comments if you find I’ve made a mess of it. Here’s what I’m planning to submit:
On Wednesday, February 5, 2014 at 9:00am, I’m moderating a plenary session at LegalTech New York where the panelists are a veritable Mount Olympus of e-discovery leaders from the federal bench: John Facciola, James Francis, Andrew Peck, Lee Rosenthal and Shira Scheindlin. I can hardly imagine a more quintessential quintet of rare knowledge and eloquence! Kudos to ALM educational coordinator, Judy Kelly, for deftly getting them all to commit.
The judges will be discussing some of what you might expect, e.g., proposed Rules amendments, predictive coding, Rule 502 and expectations of lawyer technical competence. We will also be exploring a few fresh issues, like the impact all those little screens are having on everyone in and out of court.
There’s still time to add topics and questions of interest to you to the program; so, if you have questions you’d pose or topics you’d explore, please share them here as a comment (or e-mail them to me: craig at ball dot net), and I’ll try to work them in. Hope to see you in New York!
Sorry to take your time asking for help, so I’ll be quick about it.
But first, thank you. Thanks to you, dear reader, this blog and its 85 posts reached 100,000 views a few days ago. That’s nothing compared to the millions of page views others see, but it’s very gratifying to me because I launched this blog without saying a word to anyone. Somehow, you just found it. Ball in Your Court is an outlet born of frustration with the two-month publication lag attendant to my former print column and the sudden shuttering of an American Lawyer Media blog where I’d previously posted. I wanted a place where no one could pull the plug but you or me. This blog is a very personal connection to you.
The favor I ask is this: if you like the content here or find it of some value, please share it with someone you think might be interested. If you have a blog or site with a blogroll, please consider adding Ball in Your Court to your blogroll. I will try to earn my place on your page and in your day. Thanks.
“Forms that function.” Forms of production that work.
Ever since the demanding class, “Architecture for Non-Architects,” at Rice University, I’ve been a wannabe architect, and the battle cry, “form follows function,” has been my mantra. It’s ascribed to Louis Sullivan, legendary American architect and Father of the Skyscraper. “Form follows function” fairly defines what we think of as “modern,” and it’s a credo at the heart of the clearest idea I’ve had in a while: that we should produce e-mail in forms that can be made to function in common e-mail client programs like Microsoft Outlook.
I don’t point to Outlook because I think it a suitable review platform for ESI (I don’t, though many use it that way). I point to Outlook because it’s ubiquitous and, if a message is produced in a form that can be imported into Outlook, it’s a form likely to be searchable, sortable, utile and complete. More, it’s a form that anyone can assimilate into whatever review platform they wish at lowest cost.
The criterion, “Will the form produced function in an e-mail client?” enables parties to explore a broad range of functional native and near-native forms, not just PSTs. It’s an objective “acid test” to determine whether e-mail will be produced in a reasonably usable form; that is, a form not too far degraded from the way the data is used by the parties and witnesses in the ordinary course.
Forms that Function retain essential features like Fielded Data, allowing users to reliably sort messages by date, sender, recipients and subject, as well as Message IDs, supporting the threading of messages into coherent conversations. Forms that Function supply the UTC Offset Data within e-mails that allows messages originating from different time zones and using different Daylight Savings Time settings to be normalized across an accurate timeline. Forms that Function don’t disrupt the Family Relationships between messages and attachments. Forms that Function are inherently electronically searchable.
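To make those features concrete, here is a minimal Python sketch using only the standard library’s email module. The message, addresses and IDs below are all invented for illustration; the point is that a message produced in a functional form (here, plain RFC 5322 text, as in an .eml file) keeps its fielded data, Message-IDs and UTC offset intact:

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A hypothetical message in a "form that functions"; every address
# and ID below is invented for illustration.
raw = """\
Message-ID: <example-123@mail.example.com>
In-Reply-To: <example-122@mail.example.com>
Date: Tue, 04 Feb 2014 09:15:00 -0600
From: sender@example.com
To: recipient@example.com
Subject: Draft agreement

Please see the attached draft.
"""

msg = message_from_string(raw)

# Fielded data: sender, recipients, subject and date remain discrete,
# sortable fields rather than pixels on a page image.
print(msg["From"], msg["Subject"])

# UTC offset data: the Date header carries the offset needed to
# normalize messages from different time zones onto one timeline.
sent = parsedate_to_datetime(msg["Date"])
print(sent.isoformat())  # 2014-02-04T09:15:00-06:00

# Message IDs: Message-ID and In-Reply-To let software thread the
# message into a coherent conversation.
print(msg["Message-ID"], msg["In-Reply-To"])
```

Convert that same message to a TIFF image and every one of those fields becomes mere text on a picture, recoverable only by re-keying or error-prone extraction.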
Best of all, producing Forms that Function means that all parties receive data in a form that anyone can use in any way they choose, visiting the costs of converting to alternate forms on the parties who want those alternate forms and not saddling parties with forms so degraded that they are functionally fractured and broken.
If you are a requesting party, don’t be bamboozled by an alphabet soup of file extensions when it comes to e-mail production (PST, OST, MSG, EML, DBX, NSF, MHTML, TIFF, PDF, RTF, TXT, DAT, XML). Instead, tell the other side, “I want Forms that Function. If it can be imported into Microsoft Outlook and work, that form will be fine by me.”
If the other side says, “We will pull all that information out of the messages and give it to you in a load file,” say, “No thanks, leave it where it lays, and give it to me in a Form that Functions!”
I once wrote a column titled “Page Equivalency and Other Fables.” It lambasted lawyers who larded their burden arguments with bogus page equivalencies like, “everyone knows a gigabyte of data equates to a pile of printed pages that would reach from Uranus to Earth.” We still see wacky page equivalencies, and “from Uranus” still aptly describes their provenance.
Back in 2007, I wrote, “It’s comforting to quantify electronically stored information as some number of pieces of paper or bankers’ boxes. Paper and lawyers are old friends. But you can’t reliably equate a volume of data with a number of pages unless you know the composition of the data. Even then, it’s a leap of faith.”
So, I’m happy to point you to some notable work by my friend, John Tredennick. I’ve known John since the emerging technology was fire and watched with awe and admiration as John transitioned from old-school trial lawyer to visionary forensic technology entrepreneur running e-discovery service provider, Catalyst. John is as close to a Renaissance man as anyone I know in e-discovery, and when John speaks, I listen.
Recently, John shared some revealing metrics on the Catalyst blog looking at the relationship between data and document volumes, an update to his 2011 article, “How Many Documents in a Gigabyte?” John again examines document volumes seen in the data that Catalyst receives and processes for its customers and, crucially, parses the data by file type. As the results bear out, the form of the data still makes an enormous difference in data volume. Even between documents we think of as being “the same” (like Word .doc and .docx formats), the differences are striking.
For example, John’s data suggests that there are almost 60% more documents in a gigabyte of Word files in the .docx format (7,085) than in a gigabyte of files stored in the predecessor .doc format (4,472). This makes sense because the newer .docx format incorporates zip compression, and text is highly compressible data.
[One exercise I require of the law students in my e-discovery class is to look at the file header of a Word .docx file to note its binary signature, PK, characteristic of a zip-compressed file and the initials of Phil Katz, creator of the ZIP file format. For grins, you can change the file extension of a .docx file to .zip and open it to see what a Word document really looks like under the hood. Hint: it’s in XML.]
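For readers who would rather not rename files, the same experiment takes a few lines of Python. The snippet below builds a minimal in-memory stand-in for a .docx (a real one contains word/document.xml among many other parts) and confirms the PK signature:

```python
import io
import zipfile

# Build a tiny zip archive in memory standing in for a .docx; a real
# .docx holds word/document.xml (plus styles, relationships, etc.).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("word/document.xml", "<w:document>Hello</w:document>")

# The first two bytes of any zip archive -- and thus any .docx -- are
# "PK", the initials of Phil Katz.
print(buf.getvalue()[:2])  # b'PK'

# Reopen it and list the XML parts inside.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as z:
    print(z.namelist())  # ['word/document.xml']
```

Because the container is zip-compressed, the highly redundant XML inside shrinks dramatically, which is exactly why more .docx files fit in a gigabyte than .doc files.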
John reports a similar discrepancy between new and old Excel spreadsheet formats (1,883 .xlsx files per gigabyte versus 1,307 for .xls). Here again, the .xlsx format builds in zip compression.
But the results are reversed when it comes to PowerPoint presentations, with John finding that there are marginally fewer of the newer .pptx files in a gigabyte (505) than the older .ppt format files (580). This makes sense to me because Microsoft phased out the .ppt format ten years ago. Since then, presenters have gotten better about adding visual enhancements to deadly-dull PowerPoints, and they tend to add ‘fatter’ components like video clips. The biggest factor is that pictures are highly incompressible: common image formats (e.g., .jpg) are already compressed, and compressing data that’s already compressed tends to increase, not decrease, its size.
Wisely, John speaks only of document volumes and makes no effort to project page equivalencies, not even by extrapolating some postulated ‘average pages per file type.’ Anything like that would be as insupportable today as it was when I wrote about it in 2007. Also, when you look at John’s post, note that there is no data supplied concerning TIFF images. I’m not sure why, but I can promise you this: TIFF images are MUCH fatter files, costing far more in terms of storage space and ingestion costs than their native counterparts. Had John added TIFF to the mix, I’m confident his weighted averages would have been much different and far less useful, much like TIFF images as a form of production. ;-)
T’was the night before Christmas
at Ball in Your Court.
Not a syllable’s stirring.
We’re sipping mulled port!
The chestnuts are roasting, the wassailing’s started;
Don’t look for a posting ‘til Santa’s departed.
Au revoir data hash, and adieu data mapping.
I really must dash– I’ve got to get wrapping!
Thank you, dear reader, for all the perusing.
I hope it’s been helpful (and sometimes amusing).
And thank you, dear reader, for sharing your comments.
I cherish them deeply, those kudos and laments.
In my last post, I addressed why search terms used to cull data sets in discovery should not be protected as attorney work product. Today, I want to distinguish an attorney’s “investigative queries” (for case assessment, to hone searches or to identify privileged content) from “culling queries” (to generate data sets meeting a legal obligation, whether conceived by an attorney, client, vendor or expert). I contend culling queries warrant no work product protection from disclosure.
Let’s assume a producing party has a sizable collection of potentially responsive electronic information. The producing party concludes that it would be too costly, slow or unreliable to segregate the ESI by reading everything and, instead, decides to examine just those items that contain particular words or phrases. Keyword queries thus serve to divide the ESI into two piles: one that will be reviewed by counsel and another that no one and nothing will qualitatively review. The latter is the “discard pile.” Culling queries may be applied iteratively, first to collect data from the enterprise and later to cull the collection for review. The reductive process may entail the successive use of a client’s local and enterprise search capabilities and/or a law firm’s or vendor’s search tools.
The common thread is that each lexical search mechanism serves to exclude ESI lacking certain terms from substantive review. No one ever assesses the discards for relevance or responsiveness.
Now, if we could be confident that keyword culling worked reasonably well and that the persons who came up with keywords were lexical magicians, there’d be no need to worry over the discard pile. We could trust that what we don’t know doesn’t hurt us.
But we do know that a hefty slug of responsive items ends up in that discard pile. We know this because studies and experience have established that keyword search is a crude, mechanical filter. It leaves most of what we seek behind.
Whether we are leaving behind an endurable or unendurable volume of responsive items depends on just how poorly those keywords performed. To gauge that, we’ve got to know what queries were run.
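To illustrate why the queries matter, here is a toy Python sketch of a keyword cull; every document, keyword and responsiveness label below is invented for illustration. Recall measures the share of responsive items the cull actually captured:

```python
# Toy corpus: document id -> (text, is_responsive). A keyword cull
# splits these into a review pile and a discard pile.
documents = {
    "d1": ("Please wire the payment to the usual account", True),
    "d2": ("Lunch at noon?", False),
    "d3": ("Move the funds quietly; avoid the auditors", True),
    "d4": ("Quarterly payment schedule attached", True),
    "d5": ("Happy birthday!", False),
}
keywords = {"payment", "invoice"}

review_pile = {d for d, (text, _) in documents.items()
               if any(k in text.lower() for k in keywords)}
responsive = {d for d, (_, label) in documents.items() if label}

# Recall: responsive items captured / all responsive items.
recall = len(review_pile & responsive) / len(responsive)
print(f"Recall: {recall:.0%}")  # Recall: 67%
# d3 ("move the funds quietly") lands in the discard pile, unreviewed.
```

Nothing about the cull itself reveals how much slipped through; only by knowing the queries (and testing them) can the requesting party gauge what the discard pile likely holds.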
I’m rarely moved to criticize the work of other commentators because, even when I don’t share their views, I applaud the airing of the issues their efforts bring. But sometimes a proposition is just so blatantly ill-advised, so prone to unfairly tilt the litigation playing field, that any reader and every writer should stop and say, “Wait a second….” One such article, currently running in the New York Law Journal and called “No Disclosure: Why Search Terms Are Worthy of Court’s Protection,” charges that judges who require disclosure of search terms “discount or misunderstand” what the authors term the “protected nature of key aspects of the e-discovery process,” namely filtering of data by use of search terms. The authors think that disclosure of search terms used to exclude data from disclosure compromises the work product privilege and argue that judges should “recognize that a search term is more than a collection of words, rather, the culmination of an attorney’s interaction with the facts of the case.”
Espousing the sanctity of work product privilege to an audience of litigators is like saying, “I support our troops.” It’s mom, baseball and apple pie. It’s also popular to paint judges as addled abusers of discretion. But let’s not let jingoism displace judgment. Search terms are precisely what the authors claim they are not: search terms are a collection of words. They are lexical filters. Nothing more.
Search terms deserve no more protection from disclosure than date ranges, file types and other mechanical means employed to exclude data from scrutiny. Search terms strip out information that will never see the light of day nor benefit from the application of lawyer judgment as to its relevance. In that sense, search terms are anathema to the core principles of work product and warrant more, not less, scrutiny.