Category: Processing

De-Mystifying De-Duplication

I’m pleased to announce that I’ve had two articles published about de-duplication – the imposing-sounding term that flies over the head of many folks. Hopefully, these two articles will help clear the confusion.

First, “Dispelling Doubts about De-Duplication” is the July 2008 InsideTech Column on InsideCounsel.com.

Second, “Review: Equivio: Near-Duplicate Ediscovery Technology” was the TechnoFeature Article from TechnoLawyer on July 22, 2008.

Panic with Pages, or Deal with Documents

Tom O’Connor authors the cover story for the May 2008 issue of Law Technology News entitled “Defining Documents.” Tom makes the excellent case that litigators in the digital age must move away from a page-centric attitude and focus instead on the whole document.

Electronic files do not have pages. Pages only happen when you reduce an electronically stored file to paper. The only reason that Microsoft Word or Adobe Acrobat offers a page view is that their makers know you’re eventually going to print the document. Web pages, Excel spreadsheets, and e-mail messages don’t have pages when you view them on your computer screen.

Most lawyers, however, cannot conceive of a document without considering the page count. If you insist on knowing the page count of an e-mail message, then the message must be processed to a TIFF image, which is the electronic equivalent of printing the file. Not only is this an unnecessary step, but Tom points out that it’s also expensive and time-consuming.

To shed a little more light on the topic, the editor of Law Technology News, Monica Bay, interviewed Tom on her Law Technology Now podcast (playable online here from the Legal Talk Network). Tom explains that attorneys are “very wed” to the traditional Bates numbers that have managed documents for years in the legal world. The equivalent of a Bates number for electronic files is a hash value: a string of characters, computed from a file’s contents by a hash function, that is for practical purposes unique to each individual, all-inclusive file. For more information on this concept, visit Ralph Losey’s blog post “The Days of the Bates Stamp are Numbered.”
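
For readers who have never seen a hash value, here is a minimal sketch of the idea in Python. It assumes SHA-256 as the algorithm (the article and podcast do not name one; MD5 and SHA-1 are also common in e-discovery tools), and the function name and sample messages are made up for illustration.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a file's raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents, standing in for bytes read from disk.
original = b"Please review the attached contract before Friday."
exact_copy = b"Please review the attached contract before Friday."
edited = b"Please review the attached contract before Monday."

# Identical contents always produce the identical hash value.
print(file_fingerprint(original) == file_fingerprint(exact_copy))  # True

# Changing a single word produces a different hash value.
print(file_fingerprint(original) == file_fingerprint(edited))  # False
```

The hash is computed from the entire file, not from any page of it, which is why it can identify a native file without anyone having to print or image the document first.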

Tom’s article provides a couple of options for getting around our page-centric thinking, but it mainly comes down to finding a comfort level with reviewing documents in their native format. The default should not be to have the documents automatically printed, or scanned, or processed to TIFF. Instead, we must consider keeping the documents in their native format and being open to new methods for referencing each document, moving away from the Bates stamp.

Tom states it well:

Attorneys and clients who focus on a document-based system will save time and money and can conduct native file review. In today’s world of vast quantities of electronic documents, the days of the Bates stamp are numbered.

Link to article.

Real-Life E-Discovery Examples – Good as Gold

A story from Maryland’s Daily Record entitled "How law firms are coping in the era of e-discovery" elucidates some of the practical aspects of how law firms are dealing with e-discovery projects.

One spotlight in the story shines on Bowie & Jensen LLC litigator Matthew Hjortsberg who is obviously very comfortable using the Internet. He admits to using the SEC’s EDGAR database, a Greek singles Web site, and the Wayback Machine in past matters for reconnaissance on the opposing party. He accurately states:

“Somebody said something someplace that appears in writing … on the Internet”

Next the story actually *gasp* quotes the Director of Litigation Technology at Bowie & Jensen – Tina Gentle. Don’t get me wrong, I have nothing against hearing the recycled warnings and same dry quotes about e-discovery from litigation attorneys, I’m just thrilled to see an actual litigation support professional included in a story about e-discovery.

Yes, of course, it is the attorney who is ultimately responsible for the success and security of an electronic document database, but it is the litigation support professionals who sweat through the essential tasks of creating and maintaining a workable document database, constantly training attorneys who neglect (and/or refuse) to learn how to adequately use the review application, keeping the document collection accurate when subsequent load sets appear, and interpreting abstract and inconsistent coding practices into conclusive production sets.

Ms. Gentle explains that the firm uses CT Summation iBlaze to host their litigation databases. The firm currently has 5 licenses, but has considered upgrading to the much more expensive CT Summation Enterprise suite to handle their growing needs.

Mr. Hjortsberg compares iBlaze to a Google search specifically for evidentiary documents and provides the golden quote:

"You load your documents into Summation and then you can do ‘bullion’ [sic] searches."

(Obviously, I don’t hold Mr. Hjortsberg responsible for the loss in translation.)

Another spotlight in the story goes to Abby Rosenbloom, who is the director of business development for the Baltimore office of the attorney-staffing agency Special Counsel Inc.

Special Counsel is one of the many companies that can supply temporary contract attorneys for a large document review project. Ms. Rosenbloom explains that many of her clients now require contract attorneys to be experienced in tools like CT Summation, Concordance, or Ringtail.

Mr. Hjortsberg continues with some insightful comments:

“There’s a huge distinction between the economic haves and the economic have-nots in this e-discovery world. If you listen to people at a large law firm talk about e-discovery, it’s a completely different process than when you’re representing smaller clients. Before I suppose it was the ability to invest a lot of [associates] into a case; now it’s more like I’m investing in the technology tool.”

Maryland’s Chief U.S. Magistrate Judge Paul Grimm offers the fitting conclusion to the story:

“It’s not about the tools, it’s about the end result of the tools.”

Link to story.

How to Pick a Lousy Keyword

This is the title I wish Brian Larsen had given his article on Law.com. Instead, he went with the tamer title of “Filtering Responsive Data in EDD,” where he provides some excellent guidelines for the filtering phase of an e-discovery project.

The second paragraph gets right to the point:

“…the attorneys requested production of all documents containing the word “buy.” Despite being cautioned against this broad search, they were reluctant to heed the warnings, and many unrelated documents were incorrectly deemed responsive.”

One of the most difficult aspects of an e-discovery project is crafting an effective culling or filtering plan for large volumes of ESI. From a legal perspective, there is always an overwhelming fear that “we’ll miss something” if we don’t cast as broad a net as possible over all the data. Therefore we come up with some “lowest common denominator” words or phrases that will grab everything remotely related to a litigation matter.

Brian provides an example in his article where an attorney wanted to use the keyword “gas,” presumably for a litigation matter involving something like a gas pipeline. The only problem was that one of the parties was an oil and gas company, so the “gas” keyword returned every e-mail message because the word appeared in every e-mail signature.

In my opinion, the best solution to this problem is to communicate with the key custodians involved in a litigation matter. Brian addresses this about midway through the article:

“Custodians will often know more about the situation than almost anyone else involved. Attorneys should take advantage of custodians, if possible, as a source of ideas for keywords since they can positively identify specific terms, documents and files related to the matter. Custodians may also offer some insight into the unique vocabulary of the company, industry or subject matter, abbreviations or slang used as a reference to common organizational or project terminology.”

And while it makes total sense to talk to the people closest to the matter, I don’t find a lot of attorneys willing to take the time to do this. Is it because they’re afraid to bother the custodians? Do they not trust the custodians to give them accurate information? Do attorneys believe they can glean more information about the matter by reading and reviewing the documents themselves?

It comes down to communication. As consultants, we need to better communicate why a keyword like “gas” might be ridiculously broad, and be better prepared to offer alternate methods for retrieving accurate results. Attorneys also must understand that filtering ESI is not as simple as running a Google or Lexis search – it takes thoughtful conversations with key custodians and experienced vendors to effectively narrow down the body of data while still ensuring that no relevant results get lost in the shuffle.
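
To make the “gas” problem concrete, here is a small Python sketch of why the keyword hits every message and what one alternate method might look like. It is an illustration under stated assumptions, not anything taken from Brian’s article: the messages are invented, and it assumes signatures are set off by the conventional `-- ` delimiter line, which real-world mail often does not use.

```python
import re

# Assumption: signatures begin after a line containing only "-- ".
SIGNATURE_MARKER = "\n-- \n"


def body_without_signature(message: str) -> str:
    """Return the message text that appears before the signature delimiter."""
    return message.split(SIGNATURE_MARKER, 1)[0]


def hits(keyword: str, messages: list[str], strip_signature: bool) -> list[int]:
    """Return the indexes of messages containing the keyword as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    matched = []
    for index, message in enumerate(messages):
        text = body_without_signature(message) if strip_signature else message
        if pattern.search(text):
            matched.append(index)
    return matched


# Hypothetical messages from an oil and gas company.
messages = [
    "Lunch on Thursday?\n-- \nPat Doe\nExample Oil and Gas Co.",
    "The gas pipeline inspection report is attached.\n-- \nPat Doe\nExample Oil and Gas Co.",
    "Budget numbers for Q3 are final.\n-- \nLee Roe\nExample Oil and Gas Co.",
]

print(hits("gas", messages, strip_signature=False))  # [0, 1, 2]
print(hits("gas", messages, strip_signature=True))   # [1]
```

Searching the whole message returns everything; searching only the body returns the one message that is actually about a pipeline. Knowing that the company name sits in every signature is exactly the kind of detail a custodian can tell you in a five-minute conversation.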

Brian Larsen’s article further provides some good tips on excluding items like file types and file locations.

Link to article.

“Hot Tips for Effective e-Discovery Review”

Rich Wersinger is a Technical Trainer at Fios, Inc. and authors a short article entitled “Hot Tips for Effective e-Discovery Review” (via In Re Discovery).

Rich offers three tips, but the first one, called “Bulk Categorizing for Smart, Efficient Document Review,” is worth the whole article. He describes a scenario where a major custodian in a product liability suit subscribes to a daily business newswire e-mail message. That means his e-mail file will have a ton of messages that will ultimately be irrelevant to the present matter. Note, however, that with a simple keyword search, many of these messages may get returned as relevant.

Rich suggests taking a “proactive approach to identifying patterns of documents that are clearly not relevant to your matter.”

For me, the tip emphasizes the importance of actually interviewing major custodians or “key players.” It’s not enough to simply grab their entire e-mail file; you need to obtain a big-picture idea of how they use e-mail and understand some of their e-mail habits.

This requires a more hands-on approach to a collection project. This won’t make the custodians happy because they have to answer more questions, but it is absolutely necessary for reducing costs.

If you can eliminate and filter out huge groups of e-mail messages from a single source like the daily news bulletins, stock quotes, weather updates, or fantasy football scores, then you could drastically cull down the e-mail file before having it processed by your vendor. This will save a lot of money both on the processing side as well as the review time.
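
As a rough illustration of that kind of bulk culling, the Python sketch below counts messages per sender, flags high-volume senders as candidates, and sets their messages aside. The mailbox, the addresses, and the threshold are all hypothetical, and the function names are mine rather than anything from the Fios article.

```python
from collections import Counter


def bulk_senders(messages: list[dict], min_count: int) -> set[str]:
    """Return senders who account for at least min_count messages."""
    counts = Counter(message["from"] for message in messages)
    return {sender for sender, count in counts.items() if count >= min_count}


def split_for_review(messages: list[dict], excluded_senders: set[str]):
    """Separate messages to review from messages set aside by sender."""
    keep = [m for m in messages if m["from"] not in excluded_senders]
    set_aside = [m for m in messages if m["from"] in excluded_senders]
    return keep, set_aside


# Hypothetical mailbox: five newswire bulletins and two substantive messages.
mailbox = [
    {"from": "newswire@example.com", "subject": f"Daily business news #{i}"}
    for i in range(5)
] + [
    {"from": "engineer@example.com", "subject": "Test results for the valve"},
    {"from": "counsel@example.com", "subject": "Call re: product complaint"},
]

candidates = bulk_senders(mailbox, min_count=5)
keep, set_aside = split_for_review(mailbox, candidates)

print(sorted(candidates))         # ['newswire@example.com']
print(len(keep), len(set_aside))  # 2 5
```

A sender count only produces candidates. A busy colleague can send as many messages as a newswire, so the list still needs to be confirmed with the custodian before anything is excluded.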

Link to “Hot Tips.”

“The Data Explosion”

Forbes.com runs a story entitled “The Data Explosion” (via The Datakos Blawg) which profiles H5 and reports some amazing dollar figures associated with e-discovery projects today. My favorite quote is from the second paragraph:

“Corporations are evidence machines, generating terabytes of electronic documents, e-mails and digitally recorded phone calls each year. Lawyers try to sift through all this dross in search of the smoking gun that can determine the outcome of a case. But, so say studies by library scientists and others, the lawyers aren’t very good at sifting. Worn down by the anesthetizing process of flipping through thousands of digital images a day, they miss as much as they find.”

The story touches on a couple of well-known e-discovery stories and profiles H5 as having 250 employees and taking in over $60 million in contracts during the fiscal year that ended June 30. The Forbes story provides an interesting peek inside an e-discovery company and reports that H5 charges an average of $10 million per case.

Granted, H5 is working on some of the larger e-discovery projects out there, but that’s still a lot of money per project.

Link to story.

Craig Ball “Explodes Page Equivalency Myth”

I have a lot of respect for Craig Ball’s insights into e-discovery in his monthly column in Law Technology News. It gets reprinted in online sibling Law.com and this month it’s a good one as Craig discusses the vast discrepancies in page counts you’ll find when looking at electronic media.

Craig tackles one of the most difficult topics in the industry. Difficult because it’s something that lawyers accept without question and vendors promulgate because it commonly means more money.

I don’t fault the vendors as much as the lawyers, because it’s the lawyers that demand a tangible page count so they can come up with a tangible number in tangible dollars. Unfortunately, when it comes to electronic data, lawyers can’t be bothered by the legal mantra “it depends.”

As Craig points out, it’s ludicrous to make a blanket statement like 1 GB = 500,000 typewritten pages, but that’s exactly what attorneys want to hear. Attorneys do not want to hear “well it depends on what resolution the documents were scanned with, and if they contained images, and if they were scanned in color or black & white;” or “it depends on if the documents were converted to TIF/PDF electronically, or if the Excel spreadsheets contained hidden columns, or if the Word documents contained images or tables.”

But all of these questions can significantly impact the file size of the data and should be thoroughly considered when providing a quote for an e-discovery project. A common practice today is to take a representative sliver of the data you will be processing and “sample” it through the system you’ll be using. This is really becoming one of the most accurate ways we have today for “guessing” what a project will cost.
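
The arithmetic behind sampling is simple, as this Python sketch shows. Every number in it is invented for illustration (none comes from Craig’s column), and it assumes the sample is truly representative of the full data set, which is the hard part in practice.

```python
def estimate_pages(sample_gb: float, sample_pages: int, total_gb: float) -> float:
    """Scale the page count observed in a sample up to the full data set."""
    pages_per_gb = sample_pages / sample_gb
    return pages_per_gb * total_gb


# Two hypothetical 0.5 GB samples, each drawn from a different 20 GB collection.
# Text-heavy e-mail yields many pages per gigabyte.
email_estimate = estimate_pages(sample_gb=0.5, sample_pages=40_000, total_gb=20)
# Color scans yield far fewer pages per gigabyte.
scan_estimate = estimate_pages(sample_gb=0.5, sample_pages=5_000, total_gb=20)

print(round(email_estimate))  # 1600000
print(round(scan_estimate))   # 200000
```

Two collections of the same size come out eight times apart in page count, which is why a single pages-per-gigabyte figure makes a poor basis for a quote.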

“Covering the Bases of Electronic Discovery”

A truly excellent article outlining the interplay between inside and outside counsel regarding e-discovery. Michael Gold (partner) and Ryan Mauck (associate) at Jeffer, Mangels, Butler & Marmaro author a superb overview of how an e-discovery project should flow and who has what responsibilities on both sides of a litigation matter.

I especially appreciate the first paragraph, which does a fine job of listing the responsibilities of today’s lawyers regarding e-discovery:

“Effective compliance with the new rules requires:

  1. a fundamental understanding of how digital technology works,
  2. sufficient skill to manage the identification, harvesting and production of electronic information,
  3. an ability to communicate with:
    a) the court,
    b) opposing counsel,
    c) outside litigation counsel,
    d) and the client”

Other key quotes from the article include:

One of the most critical skills is the ability to understand every dimension of the e-discovery process, from the creation of discoverable information to the business use of such information and — this is where many lawyers will falter — to the effective addressing of e-discovery issues with the client and the coordination of e-discovery between in-house and outside litigation counsel.

One of the most important steps in e-discovery takes place when outside counsel develops an effective working relationship with the client’s IT staff. In “traditional” litigation, you have the usual roster of players. With e-discovery, there are several more — the client’s IT staff, your outside e-discovery expert (who may be retained by in-house counsel or outside counsel), and, of course, the opposing party’s IT staff and outside e-discovery expert.