E-Discovery Made Simple (Part 2 of 3): Features


In the first post in this series, we reviewed the core capabilities every e-discovery solution must have in order to be effective. Beyond those core capabilities, there is a host of features that may or may not apply to your firm’s specific needs.

This second part in the series will focus on those features and, hopefully, give you a better understanding of what may be valuable to your practice.

Recursive Data Parsing

First, and perhaps foremost, e-discovery software should be able to read the produced data in its original format (PDF, native, or TIFF). This is no small task. A diverse native production could include deeply nested subdirectories, various types of files that contain other files, password-protected files, files whose extension has been tampered with, corrupted files, and even files infected with malware.
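To illustrate the tampered-extension case, here is a minimal Python sketch that compares a file's leading bytes against a few well-known format signatures. The function name `extension_mismatch` and the short signature table are assumptions made for this example; a production parser would recognize far more formats.

```python
import os

# Leading bytes ("magic numbers") for a few common formats, paired with
# the extensions a file with that signature would normally carry.
SIGNATURES = [
    (b"%PDF-", {".pdf"}),
    (b"PK\x03\x04", {".zip", ".docx", ".xlsx", ".pptx"}),  # zip-based containers
    (b"II*\x00", {".tif", ".tiff"}),  # little-endian TIFF
    (b"MM\x00*", {".tif", ".tiff"}),  # big-endian TIFF
]


def extension_mismatch(filename: str, header: bytes) -> bool:
    """Return True when the header matches a known signature but the
    filename's extension is not one normally used for that format."""
    ext = os.path.splitext(filename)[1].lower()
    for magic, expected in SIGNATURES:
        if header.startswith(magic):
            return ext not in expected
    # Unknown signature: this sketch offers no opinion.
    return False
```

A file named notes.txt whose first bytes are `%PDF-` would be flagged, while report.pdf with the same bytes would not.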

When the parser encounters any kind of file, its job is to extract the text content and metadata so that it can be searched and viewed by the user. The parser must also be recursive, meaning it can handle various nested data container files. For instance, it should be able to accommodate the contents of a zip file that contains numerous other files and file types, including other zip files.
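The recursive idea can be sketched in a few lines of Python using the standard `zipfile` module. This is only an illustration, not how any particular product works: the function name `walk_archive` is hypothetical, it handles only zip containers, and it ignores passwords, corruption, and depth limits that a real parser would have to deal with.

```python
import io
import zipfile


def walk_archive(data: bytes, prefix: str = "") -> list[tuple[str, bytes]]:
    """Return (path, content) pairs for every non-zip file in a zip,
    descending into any zip files nested inside it."""
    found = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            content = archive.read(info)
            path = f"{prefix}{info.filename}"
            # Decide by content rather than by extension, since an
            # extension can be wrong or deliberately changed.
            if zipfile.is_zipfile(io.BytesIO(content)):
                found.extend(walk_archive(content, prefix=path + "/"))
            else:
                found.append((path, content))
    return found
```

Each file comes back with a path that records the containers it was found in, so a reviewer can still see where in the production it originated.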

Search Index & Interface

The parser works to provide clean, normalized, and structured data that can be used to build a search index. The search index is a bit like the index at the end of a book: a compact listing of every word (or “token” in computer science terminology) in the original data and every location in which the word appears. A search index allows the user to obtain almost instantaneous results when performing full-text search for various words or phrases.
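For readers who like to see the idea in code, the book-index analogy can be sketched as a simple "inverted index" in Python. The names `build_index` and `search` are made up for this example, and the tokenizer (lowercase runs of word characters) is a simplifying assumption; commercial platforms use far more sophisticated indexing.

```python
import re
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each token to every (document id, word position) where it occurs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, token in enumerate(re.findall(r"\w+", text.lower())):
            index[token].append((doc_id, position))
    return dict(index)


def search(index: dict[str, list[tuple[str, int]]], word: str) -> list[str]:
    """Return the sorted ids of documents containing the word."""
    return sorted({doc_id for doc_id, _ in index.get(word.lower(), [])})
```

Because the lookup is a single dictionary access rather than a scan of every document, results come back quickly no matter how large the collection is.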

Some e-discovery applications also let the user configure the search index to include or omit so-called “stop words” such as articles and pronouns. Some are also optimized to allow for foreign language content or unusual characters, like emojis.
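As a rough sketch of what a configurable stop-word list looks like, the Python below drops a small set of articles and pronouns while splitting text into tokens. The `tokenize` function and the short `STOP_WORDS` list are assumptions for illustration only; real platforms ship much longer, language-specific lists.

```python
import re

# A deliberately tiny stop-word list for illustration.
STOP_WORDS = frozenset({"a", "an", "the", "he", "she", "it", "they"})


def tokenize(text: str, stop_words: frozenset[str] = STOP_WORDS) -> list[str]:
    """Split text into lowercase tokens, omitting any configured stop words."""
    # \w matches Unicode letters, so accented words stay in one piece.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stop_words]
```

Passing an empty set (`stop_words=frozenset()`) keeps every word, which matters when a phrase such as "the Company" must be searchable exactly as written.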

Generally, e-discovery review platforms allow you to perform powerful searches using a combination of four techniques: Boolean (terms combined with logical operators such as AND, OR, and NOT), Proximity (words within a set distance of one another), Stemming (words that share a common root, such as “sign,” “signed,” and “signing”), and Fuzzy (“like” words that may be misspellings).
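To make the four techniques concrete, here is a simplified Python sketch of each one applied to a single document. All function names are hypothetical. The stemmer is a crude suffix-stripper, and the fuzzy match relies on the standard library's `difflib` similarity ratio, neither of which should be taken as the method a real review platform uses.

```python
import difflib
import re


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def naive_stem(word: str) -> str:
    """Crude suffix stripping; real platforms use proper stemmers."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def boolean_and(doc: list[str], a: str, b: str) -> bool:
    """Boolean: both terms must appear somewhere in the document."""
    return a in doc and b in doc


def within(doc: list[str], a: str, b: str, distance: int) -> bool:
    """Proximity: the terms appear no more than `distance` words apart."""
    pos_a = [i for i, t in enumerate(doc) if t == a]
    pos_b = [i for i, t in enumerate(doc) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


def stem_match(doc: list[str], word: str) -> bool:
    """Stemming: match any word that reduces to the same root."""
    target = naive_stem(word)
    return any(naive_stem(t) == target for t in doc)


def fuzzy_match(doc: list[str], word: str, cutoff: float = 0.8) -> bool:
    """Fuzzy: match words that are similar enough, e.g. misspellings."""
    return bool(difflib.get_close_matches(word, doc, n=1, cutoff=cutoff))
```

In a document that reads "The parties signed the agreement after reviewing the paymnt schedule", a stemmed search for "signing" finds "signed", and a fuzzy search for "payment" finds the misspelled "paymnt".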

Search indexes also enable so-called “fielded” searches that filter or search using common properties (metadata) of the documents. Take an email, for instance. A fielded search could easily search for send dates, sender addresses, recipient addresses, and more. Fielded search is one of the most powerful features for handling big review jobs efficiently, so be sure that any platform you’re investigating includes it.
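A fielded search over email metadata might look something like the following Python sketch. The `EmailRecord` fields and the `fielded_search` parameters are assumptions chosen to mirror the example above (send date, sender, recipients); actual platforms expose many more fields.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class EmailRecord:
    doc_id: str
    sender: str
    recipients: tuple[str, ...]
    sent: date


def fielded_search(
    records: Iterable[EmailRecord],
    sender: Optional[str] = None,
    recipient: Optional[str] = None,
    sent_on_or_after: Optional[date] = None,
    sent_on_or_before: Optional[date] = None,
) -> list[str]:
    """Return ids of emails matching every field filter that was supplied."""
    matches = []
    for record in records:
        if sender is not None and record.sender != sender:
            continue
        if recipient is not None and recipient not in record.recipients:
            continue
        if sent_on_or_after is not None and record.sent < sent_on_or_after:
            continue
        if sent_on_or_before is not None and record.sent > sent_on_or_before:
            continue
        matches.append(record.doc_id)
    return matches
```

Any filter left unset is simply ignored, so the same function can answer "everything from this sender" or "everything this sender sent to that recipient in March."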

Tagging & Organizing Documents

Another essential feature of e-discovery software is the ability to organize the documents in your case with labels or tags. Attorneys can create simple tags, such as “Relevant,” “Privileged,” and “Hot” – or more complex tags that correspond to the elements of the claims and defenses.

With tags, subsets of documents can be filtered to generate reports, export desired files in bulk, or convert files for submission as exhibits (as described below).
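Under the hood, tagging can be as simple as keeping a set of document ids for each label. The Python below is a minimal sketch under that assumption; the `TagStore` class and its method names are invented for this example.

```python
from collections import defaultdict


class TagStore:
    """Keeps, for each tag, the set of document ids carrying that tag."""

    def __init__(self) -> None:
        self._docs_by_tag: dict[str, set[str]] = defaultdict(set)

    def tag(self, doc_id: str, *labels: str) -> None:
        """Apply one or more tags to a document."""
        for label in labels:
            self._docs_by_tag[label].add(doc_id)

    def with_all(self, *labels: str) -> set[str]:
        """Return the documents that carry every one of the given tags."""
        if not labels:
            return set()
        groups = [self._docs_by_tag.get(label, set()) for label in labels]
        return set.intersection(*groups)
```

Filtering on "Relevant" and "Hot" together then yields exactly the subset you would want to export or convert first.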

Conversion to Other Formats

Frequently, it is useful, if not essential, to be able to convert a native format document to another format so that it can be used in litigation. For example, most electronic filing systems only permit PDF documents to be uploaded, so documents in any other format must first be converted. Most review tools offer the ability to convert documents from any supported format to PDF.

Document Viewing Options

When you open a document in your e-discovery tool, you may see it as a chunk of plain text, or it may be displayed much as it would appear if opened in its original application. When reviewing documents, being able to view a styled version without opening the original document is often useful. Remember, opening the original document is rarely a good idea: you could inadvertently change the file, and you risk infecting your computer with a virus or other malware.

Audio Transcription

Electronic document productions often contain a variety of media, but they can only be searched for matching text. As a result, a file from which text cannot be extracted is invisible to the search engine. Fortunately, some e-discovery applications offer the ability to create text transcriptions from audio or video files, permitting those files to be searched with text-based queries. If audio transcription is not available, some review tools will flag audio, video, and other files that lack text content so that they can be reviewed manually.

Optical Character Recognition (OCR)

OCR is to images what text transcription is to audio – they both locate text in non-textual media.

Sometimes discovery sets include images taken from a smartphone or camera, and these images could be pictures of documents or even screenshots. OCR is necessary for the information in these images to be visible to the search index.

Email Threading

Emails are some of the most common sources of electronic evidence, and they can also be the most revealing. Having adequate email review functionality is critical for any document review project. Each email in the review tool should be grouped with its attachments and with other emails in the “chain” to which the email belongs.

For example, if you are viewing an email that is a reply to an original message and that was subsequently replied to, there should be links to the original message and the later replies.
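One common way to build those links is from the standard Message-ID and In-Reply-To email headers. The Python sketch below assumes those two values have already been extracted for each email; the `Message` class and the `link_thread` and `thread_root` functions are hypothetical names, and real threading also has to cope with missing or inconsistent headers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    message_id: str
    in_reply_to: Optional[str]  # None for the first message in a chain
    subject: str


def link_thread(messages: list[Message]) -> tuple[dict, dict]:
    """Return (parent_of, replies_to) lookup tables keyed by message id."""
    parent_of = {m.message_id: m.in_reply_to for m in messages}
    replies_to = defaultdict(list)
    for m in messages:
        if m.in_reply_to is not None:
            replies_to[m.in_reply_to].append(m.message_id)
    return parent_of, dict(replies_to)


def thread_root(parent_of: dict, message_id: str) -> str:
    """Follow parent links up to the earliest message we actually have."""
    seen = {message_id}
    while True:
        parent = parent_of.get(message_id)
        # Stop at the top of the chain, at a parent that was not produced,
        # or if malformed headers would send us around in a loop.
        if parent is None or parent not in parent_of or parent in seen:
            return message_id
        seen.add(parent)
        message_id = parent
```

With these two tables, a viewer can show both the message an email replied to and every later reply to it.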

Data Visualization

Data visualization in e-discovery runs the gamut from simple pie charts showing the relative frequency of various file types in the source data to complex diagrams showing the volume of email traffic between different persons of interest in the case over time. A well-designed data visualization can help litigators see the forest and not just the individual trees, but visualizations can also be gimmicky and, like any tool, counterproductive if misused. It’s important to decide what you’d like to see depicted first, and then to assess whether a tool meets that specific need.
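The numbers behind the simplest of these charts are easy to picture. As a small illustration, the Python below computes each file type's share of a production from its filenames; the function name `file_type_shares` is an assumption for this example, and it trusts file extensions, which a real tool should not do blindly.

```python
import os
from collections import Counter


def file_type_shares(filenames: list[str]) -> dict[str, float]:
    """Return each file extension's fraction of the total file count."""
    counts = Counter(
        os.path.splitext(name)[1].lower() or "(none)" for name in filenames
    )
    total = sum(counts.values())
    return {ext: count / total for ext, count in counts.items()}
```

Feeding these fractions to any charting tool produces the familiar file-type pie chart.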

There are, of course, many other features, and it pays to familiarize yourself with them, but these are the most common and essential. If you are unsure what features you may need or what other features may be available, just ask your Hilltop Consultants representative. We will be happy to help.

Check out the third part of this 3-part series on Pricing Models.

Author: James Beard, Business Development Manager at Hilltop Consultants