By analyzing word patterns within and between documents, concept searching goes beyond merely telling you whether a document contains one or more words or phrases; it discerns what the document is about. As a result, it goes a long way towards eliminating two of the frustrations of keyword searching: false positives (search hits that are not responsive) and false negatives (responsive documents that were not retrieved in the search).
Concept searching can help you to quickly find the most important documents in a large population. Reviewers can now be given a body of thematically related documents, rather than the mix of unrelated documents normally encountered in a date-ordered linear review. As part of a wide-ranging set of tools and processes, concept searching now makes possible a new approach to first-pass responsiveness review: computer-assisted coding that requires much less time and effort than a traditional linear, doc-to-doc review requires.
The technology that facilitates concept searching has been rigorously tested and proven to be robust and reliable.
The disadvantages of traditional linear review
Traditional linear review is time-consuming and expensive. Document reviewers, generally lawyers, are human; they get tired and make mistakes. Furthermore, even if reviewers' assessments were consistently reliable and accurate, today's massive data volumes make linear review impractical. But linear review was never reliable and accurate.
As studies have shown,2 human reviewers' assessments are often inaccurate and unreliable. When used properly, computer-assisted review that combines machine learning,3 sampling and human review and correction is more reliable and more accurate than a traditional lawyer document-to-document review.
The limitations of keyword searching
Keyword searching often fails because the terms were poorly chosen. More generally, any word-based search suffers from two core limitations. Polysemy (multiple meanings for a single word) causes search terms to return irrelevant documents as "hits", while synonymy (multiple words and phrases for a single meaning) makes it impossible to construct a list of words or phrases that will find all relevant documents. In other words, polysemy causes false positives and synonymy causes false negatives. Consequently, keyword-based searches will always pull in irrelevant documents and leave out relevant ones. Studies have shown that concept searching is better at finding responsive documents and leaving out non-responsive documents than traditional keyword-based search technologies (see footnote 2).
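The two failure modes can be made concrete with a small sketch. The four documents and their responsiveness labels below are invented for illustration, and the function names are hypothetical; the point is only to show how a single search term produces both false positives and false negatives, measured as precision and recall.

```python
# Toy collection: (text, is_responsive). Texts and labels are invented
# for illustration and do not come from any real matter.
DOCS = [
    ("Our stock portfolio lost value after the merger", True),
    ("Add chicken stock to the soup and simmer", False),
    ("The shares and equities were sold at a loss", True),
    ("Warehouse stock levels are reported weekly", False),
]


def keyword_search(docs, term):
    """Return the indices of documents that contain the term as a whole word."""
    term = term.lower()
    return [i for i, (text, _) in enumerate(docs) if term in text.lower().split()]


def precision_recall(docs, hits):
    """Precision: share of hits that are responsive.
    Recall: share of responsive documents that were hit."""
    relevant = {i for i, (_, responsive) in enumerate(docs) if responsive}
    retrieved = set(hits)
    true_positives = len(relevant & retrieved)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


hits = keyword_search(DOCS, "stock")
precision, recall = precision_recall(DOCS, hits)
print(f"hits={hits} precision={precision:.2f} recall={recall:.2f}")
```

Here the term "stock" retrieves the soup recipe and the warehouse report (polysemy, hence false positives) and misses the document that speaks of "shares and equities" (synonymy, hence a false negative).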
What is concept searching?
Concept search tools use an array of mathematical techniques and computer technologies, but they all aim to identify semantic (meaning-related) patterns within and across documents. They recognize conceptual similarities between documents by noticing how words relate to each other, how often they appear together, how far apart they are, and how often they appear in other documents with similar characteristics.
Concept searching does not require that a particular word appear in a document (the threshold requirement for any traditional keyword search). In fact, the most recent, most advanced technologies are "language-independent" or "language-agnostic"; they do not rely on dictionaries, taxonomies or thesauruses and do not need to "know" the meanings of specific words.
Focusing on patterns rather than specific words helps to eliminate false positives and false negatives. For example, documents discussing investments will often contain the word "stock"; but to a concept search engine, the word "stock" on its own is not enough to warrant inclusion in the investment cluster of identified documents. If a document contains "stock" along with "soup" and "chicken", it will be assigned to a different cluster.
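The "stock" example can be sketched in a few lines. This is a deliberately simplified illustration, not a description of how any commercial concept search engine works: it compares plain word counts using cosine similarity, whereas real tools rely on more sophisticated statistical techniques. The two cluster profiles and the function names are assumptions made up for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a bag of lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical cluster profiles: words that tend to appear together.
CLUSTERS = {
    "investment": vectorize("stock shares market investor dividend portfolio"),
    "cooking": vectorize("stock soup chicken simmer broth recipe"),
}


def assign(text):
    """Assign a document to the cluster whose overall word pattern it most resembles."""
    vec = vectorize(text)
    return max(CLUSTERS, key=lambda name: cosine(vec, CLUSTERS[name]))


print(assign("the investor sold stock before the dividend"))
print(assign("simmer the chicken in stock for the soup"))
```

Both documents contain "stock", but the surrounding words pull the first towards the investment cluster and the second towards the cooking cluster.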
Having identified the semantic patterns in and among documents, the search tool will group documents with similar semantic content so that they can be reviewed together. Once grouped by conceptual content, the documents can also be relevance-coded in bulk.
How can concept searching be used?
Some concept search applications can be used as a preliminary topic-grouping tool. The software is given free rein to assess the concept clusters in the population. Reviewers who know the case or the subject of the review can quickly see interesting clusters. They can then review these documents using whatever methods they prefer.4
The value of this simple grouping by topic is not to be underestimated. The mental fatigue caused by having to switch between thematically dissimilar documents can be greatly reduced by giving reviewers groups of conceptually similar documents.
Free-form clustering can also identify concept clusters that reviewers would otherwise not have known about.
Computer-reviewer collaboration: machine learning
The best way to use concept-search tools is to have the technology assist the human and the human assist the technology. This approach takes advantage of the "learning" capabilities of (most) concept search programs. Someone who understands the issues in the case reviews a statistical sample of documents and makes the desired document decision(s). The software identifies shared characteristics in the documents selected, looks for the same characteristics in the remaining dataset, finds similar documents and presents them for review. The reviewer goes through these results and corrects the work of the software, confirming which documents are wanted and which are not. These new decisions are fed back to the software and, in further rounds, the computer refines its search, finding more true positives and leaving out more false positives.
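The review-and-correct cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the scoring rule (similarity to documents already marked as wanted, minus similarity to those marked as unwanted) is a stand-in chosen for brevity, not the algorithm of any particular product, and the documents, labels and function names are hypothetical. The human reviewer is simulated by a function that looks up a known answer.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a bag of lower-cased word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum word counts across vectors to get a combined profile."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total


def review_rounds(docs, reviewer, seed_ids, rounds=3, batch=2):
    """Alternate between software suggestions and reviewer corrections.

    Returns the reviewer's decisions and the order in which the
    software proposed documents for review.
    """
    vectors = [vectorize(text) for text in docs]
    decisions = {i: reviewer(i) for i in seed_ids}  # initial sample coded by a person
    proposed = []
    for _ in range(rounds):
        wanted = centroid(vectors[i] for i, ok in decisions.items() if ok)
        unwanted = centroid(vectors[i] for i, ok in decisions.items() if not ok)
        unreviewed = [i for i in range(len(docs)) if i not in decisions]
        if not unreviewed:
            break
        # Rank remaining documents: close to wanted, far from unwanted.
        ranked = sorted(
            unreviewed,
            key=lambda i: cosine(vectors[i], wanted) - cosine(vectors[i], unwanted),
            reverse=True,
        )
        for i in ranked[:batch]:
            decisions[i] = reviewer(i)  # the reviewer confirms or corrects
            proposed.append(i)
    return decisions, proposed


# Invented documents and ground-truth labels, for illustration only.
DOCS = [
    "quarterly accounting loss on genesis project",
    "annual accounting reports filed on time",
    "genesis project loss was hidden from auditors",
    "lunch menu for the annual picnic",
    "auditors questioned the genesis loss figures",
    "annual reports archive index",
]
LABELS = [True, False, True, False, True, False]

decisions, proposed = review_rounds(DOCS, lambda i: LABELS[i], seed_ids=[0, 1])
print("order proposed for review:", proposed)
```

In this toy run the software, having seen one wanted and one unwanted document, puts the two remaining relevant documents at the front of the queue, which is the behaviour the collaborative approach relies on.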
An experienced lawyer can use these tools to extend his or her decisions on a subset of documents across the entire population of documents in the case.
Starting with a teaching set
A variation on the approach just described involves submitting to the software documents already known to be relevant. After analyzing this "teaching set" (or "seed set"), the software finds in the larger population documents that have similar semantic content. The review team then proceeds with the correction-and-learning phases described above.
Using concept searching for first pass review
Using the above approaches, alone or in combination, and incorporating careful quality-control measures (see below), a review team can now, responsibly and defensibly, do away with a traditional linear review at the first-pass stage. Important documents can be found quickly using keyword and/or concept searches and the coding decisions applied to these documents can be transmitted to the remaining documents in the population.
Concept searching can also help to guard against inadvertent disclosure of privileged documents by finding them even where attorney names and email domains are not present. It can likewise scour the population for hidden subsets of potentially relevant documents.
Effective, timely and thoughtful quality review is essential to any eDiscovery project. Even with the use of advanced tools, human involvement raises the risk of error and inconsistency. A quality review program should include:
- Reviewers who are knowledgeable about the case, its issues and documents
- Metrics that capture reviewer coding decisions and relationships between those decisions
- Timely quality review throughout the process
- Steps to identify false positives and false negatives
- Ongoing feedback to ensure that the front-line reviewers benefit from observations and enhancements
- Documentation of all quality-control steps undertaken
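One common way to check for false negatives is to draw a simple random sample from the documents the process set aside as non-responsive, have reviewers code the sample, and estimate how many responsive documents were missed. The sketch below is an illustration under stated assumptions: the population size, sample size and number of responsive documents found are invented figures, the function names are hypothetical, and the Wilson score interval is one of several reasonable choices for putting a margin of error around a sampled proportion.

```python
import math
import random


def draw_sample(doc_ids, size, seed=0):
    """Draw a simple random sample without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), size)


def wilson_interval(found, sample_size, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% confidence."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = found / sample_size
    denom = 1 + z * z / sample_size
    center = (p + z * z / (2 * sample_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size * sample_size)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical figures: 10,000 documents coded non-responsive, 400 sampled,
# and reviewers find 8 of the sampled documents to be responsive after all.
set_aside = range(10_000)
sample = draw_sample(set_aside, 400)
found_responsive = 8

low, high = wilson_interval(found_responsive, len(sample))
print(f"estimated miss rate: {found_responsive / len(sample):.1%}")
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

Recording the sample size, the seed and the resulting interval also serves the documentation step listed above.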
Part of an integrated set of tools and processes
A carefully designed process that uses advanced computer technology and involves knowledgeable reviewers at key stages can now save significant amounts of time and money. The costs, benefits and enhanced efficiencies have been demonstrated in studies conducted with the most rigorous evaluation methods.5 There is no promise of flawless accuracy, as every tool and every approach has its inherent weaknesses.
However, the defensibility of concept searching combined with sampling as an approach to legal document review has been confirmed by The Sedona Canada Principles, which, themselves, have been adopted by legislators and judges across Canada.6
Law firms and their clients should now feel comfortable adopting these methods as part of a comprehensive and well-designed discovery plan. They can do so, not simply to save on time and expense, but also to secure the very real benefits, in both accuracy and consistency, that this technology provides.
1 See Dominic Jaar & David Sharpe, "Making Document Review Faster, Cheaper and More Accurate: How Concept Searching Can Change the Way Your Legal Teams Handle First Pass Review" available at https://www.kpmg.com/Ca/en/WhatWeDo/Advisory/RiskCompliance/KPMG-Forensic/Documents/Making%20Document%20review%20FINAL%20for%20WEB.pdf.
2 See Doug Stewart, "Application of Simple Random Sampling (SRS) in eDiscovery," April 20, 2011, available at http://www.umiacs.umd.edu/~oard/desi4/papers/stewart2.pdf.
3 "Machine learning" is a process whereby the concept searching software develops algorithms, and then adjusts those algorithms, as it identifies the characteristics of documents that are relevant to the specific matter (as further discussed below).
4 For example, the software may sort documents into two superficially similar groups: one it identifies as "accounting, loss, genesis," the other "accounting, reports, annual." Someone who knows the issues in the case can quickly see that the first will be relevant, while the second will not.
5 Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII RICH. J.L. & TECH. 11 (2011), http://jolt.richmond.edu/v17i3/article11.pdf. See also Bruce Hedin et al., Overview of the TREC 2009 Legal Track, in NIST Special Publication: SP 500-278, The Eighteenth Text Retrieval Conference (TREC 2009) Proceedings 16 & tbl.5 (2009), available at http://trec-legal.umiacs.umd.edu/LegalOverview09.pdf; Douglas W. Oard et al., Overview of the TREC 2008 Legal Track, in NIST Special Publication: SP 500-277, The Seventeenth Text Retrieval Conference (TREC 2008) Proceedings 8 (2008), available at http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf.
6 See, generally, Sedona Canada Principles; Sedona Canada Working Group, The Sedona Canada Commentary on Practical Approaches for Cost Containment (June 2011), available at http://lexum.org/e-discovery/documents/SedonaCanadaCostContainment.pdf; Sedona Canada Working Group, The Sedona Canada Commentary on Proportionality in Electronic Disclosure & Discovery (October 2010), available at http://www.lexum.com/e-discovery/documents/WG7CommentaryonProportionality-for-public-comment.pdf.
Dominic Jaar is a Partner and the National Leader of KPMG's Information Management (IM) and eDiscovery Services group, where he focuses on supporting clients' IM and eDiscovery requirements. He has over 10 years of experience in information, records and knowledge management, and legal technology, as well as 6 years of international experience in eDiscovery. Dominic has been involved in the development of many international standards and leading practices regarding IM and eDiscovery and has supported a number of multinational organizations, including many legal departments and law firms, in the assessment of their capacities in these areas.
Contact: firstname.lastname@example.org or (514) 840-2262
David Sharpe is responsible for the operation of the Forensic Technology and eDiscovery production lab in KPMG's Toronto office. After obtaining his law degree from the University of Toronto and clerking at the Supreme Court of Canada, he worked for 12 years as an attorney and eDiscovery expert in New York City, including as a senior client-facing project manager for two of the leading providers of eDiscovery and hosted document review solutions in the US. As Manager of eDiscovery Services in KPMG's Toronto office, he combines a broad understanding of the legal and business dimensions of eDiscovery with a solid grasp of the technical and operational aspects of how evidence is collected, processed, reviewed and produced.
Contact: email@example.com or (416) 777-3738