Sunday, February 17, 2013

eDiscovery Economics: Part 2

In Part 1 we looked at the economic theory in play during litigation and applied some sample data to analyze the dynamics.  Now that we have some of the basics of the economic models for litigation, let's look at the dynamics during the eDiscovery process, as specified by the Electronic Discovery Reference Model (EDRM).  The EDRM is shown graphically below (courtesy of

The phases (or stages) of this model are:
  • Information Management
  • Identification
  • Preservation
  • Collection
  • Processing
  • Analysis
  • Review
  • Production
  • Presentation

Information management refers less to a distinct eDiscovery phase and more to the ongoing record-keeping rules and procedures.  As such, it does not have a large role in eDiscovery dynamics, except perhaps that good records management may help to reduce the overall cost of the identification, preservation and collection phases.

The identification phase is where legal counsel and their agents identify custodians and relevant electronically stored information (ESI).  This phase is important, since incorrectly executing it, by missing custodians or relevant topics, may result in having to re-do a lot of the work in subsequent phases.  I think of this phase as the design phase.  If executing the EDRM is like building a house, this phase is where the blueprint and specifications for the house are created.

Preservation and collection are sometimes combined; however, I see a distinct difference between these phases in some situations.  Preservation can simply mean implementing rules that prevent relevant ESI from being deleted.  Preservation can also mean making a weekly, bi-weekly or monthly copy of relevant ESI.  Preservation is for securing the relevant ESI.  Collection is the process of gathering up the relevant ESI specifically for moving through the rest of the EDRM.  Sometimes these are two distinct phases, sometimes they are not.

Processing is where de-NIST (removing system and other known files), de-duplication, filtering (removing files by date range, custodian, file type, etc.) and searching (removing files by keyword or keyword combinations) happen.  This is also where technology-aided review (TAR), like predictive coding, begins and continues into the review phase.  The purpose of this phase is to reduce the amount of ESI that goes into the analysis and review phases.
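To make the culling steps concrete, here is a minimal Python sketch of a processing pass.  It is an illustration only, not how any particular eDiscovery tool works: the `Doc` record, the `cull` function and the sample documents are all hypothetical, the known-file list is assumed to be a set of SHA-256 digests, and de-duplication is done globally across custodians on the document text alone.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """Hypothetical stand-in for one item of ESI."""
    custodian: str
    sent: date
    text: str


def cull(docs, known_hashes, start, end, keywords):
    """Apply de-NIST, de-duplication, date filtering and keyword search."""
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if digest in known_hashes:   # de-NIST: drop known system files
            continue
        if digest in seen:           # de-duplication: drop exact repeats
            continue
        seen.add(digest)
        if not (start <= doc.sent <= end):   # filter by date range
            continue
        lowered = doc.text.lower()
        if not any(k.lower() in lowered for k in keywords):   # keyword search
            continue
        kept.append(doc)
    return kept


docs = [
    Doc("alice", date(2012, 3, 1), "Contract renewal terms attached"),
    Doc("bob", date(2012, 3, 1), "Contract renewal terms attached"),  # duplicate
    Doc("alice", date(2010, 1, 5), "Old contract draft"),             # out of range
    Doc("bob", date(2012, 4, 2), "Lunch on Friday?"),                 # no keyword hit
]
kept = cull(docs, set(), date(2012, 1, 1), date(2012, 12, 31), ["contract"])
print(len(kept), "of", len(docs), "documents go on to analysis and review")
```

Each step only removes documents, which is the point of the phase: fewer documents reach review, where the per-document cost is highest.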

Analysis and review involve looking at the ESI to see what is there.  During analysis, counsel may realize that some important known documents got left behind, so the keywords need to be revisited.  Perhaps one custodian worded a relevant issue or item one way, while another worded it another way.  Both need to be captured.

Review is normally the most costly part of the EDRM.  Estimates put average review costs at around $1 per document [1].  This is the phase where ESI that made it to this point is evaluated and classified as relevant, privileged, etc.

Production involves taking the relevant ESI and producing it in the agreed-upon format (native, image files, load files, etc.) to opposing counsel or for your own legal counsel team.

Presentation of the ESI takes place in a deposition, hearing or trial.  It is simply the ESI presented and explained before an audience.

It's important to note that the EDRM is iterative, meaning that at almost any point legal counsel may go back to revisit a previous phase, then move forward, then go back again.  This is actually one of the ways that I recommend handling ESI, since the eDiscovery team (counsel, client, consultants, etc.) often learns more as the EDRM progresses and must apply this new information to the process.  This can often mean adding new custodians, changing search keywords or adjusting review criteria.

eDiscovery Dynamics
Now let's look at some of the dynamics during the eDiscovery process, in particular, the costs and the probabilities.  Recall equation (5) from Part 1,

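$$P_p J - C_p \le S \le P_d J + C_d \qquad (5)$$

which gives the range of settlement amounts S that both parties would accept.  Here P_p and P_d are the plaintiff's and the defendant's estimates of the probability that the plaintiff wins, J is the amount at stake, and C_p and C_d are the plaintiff's and the defendant's costs.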
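As a small illustration, equation (5) can be written as a function that returns the settlement range, or nothing when the range is empty.  This is a sketch under my own naming (`settlement_range` and its parameters are not from Part 1); the sample numbers are the Part 1 example values of $50,000 at stake and $10,000 in costs for each side.

```python
def settlement_range(p_plaintiff, p_defendant, judgment, cost_plaintiff, cost_defendant):
    """Return (lower, upper) bounds on S from equation (5), or None if empty."""
    lower = p_plaintiff * judgment - cost_plaintiff   # least the plaintiff accepts
    upper = p_defendant * judgment + cost_defendant   # most the defendant pays
    return (lower, upper) if lower <= upper else None


# Both sides see a 50% chance that the plaintiff wins.
print(settlement_range(0.5, 0.5, 50_000, 10_000, 10_000))  # (15000.0, 35000.0)

# The plaintiff is far more optimistic than the defendant: no overlap.
print(settlement_range(0.9, 0.1, 50_000, 10_000, 10_000))  # None
```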

Keeping this in mind, and focusing on the costs for a minute, most estimates place the highest costs in the processing and review phases of the EDRM.  Processing can make up anywhere from 15% to 40% of the total cost, and review can take 50% to 80% of the total.  A recent RAND study [2] reports the following normalized percentages per phase cost based on empirical data:
  • Collection: 8%
  • Processing: 19%
  • Review: 73%
David Degnan, in his 2011 Minnesota Journal of Law, Science & Technology article [3], cites a table with the following phase cost percentages:
  • Collection: 4%
  • Processing: 36%
  • Review: 58%
  • Production: 2%
So a representative cumulative cost curve of the Degnan breakdowns might look like:

The RAND cost curve shown below is similar, except missing the production piece.
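The cumulative figures behind both curves can be reproduced by keeping a running total of the phase percentages listed above.  This is only a sketch; `cumulative_shares` is my own helper name, and it assumes the phases happen strictly in the order listed, which real matters do not always follow.

```python
from itertools import accumulate

# Phase cost percentages as quoted above.
degnan = {"Collection": 4, "Processing": 36, "Review": 58, "Production": 2}
rand = {"Collection": 8, "Processing": 19, "Review": 73}


def cumulative_shares(shares):
    """Running total of phase percentages, in phase order."""
    return dict(zip(shares, accumulate(shares.values())))


print(cumulative_shares(degnan))
# {'Collection': 4, 'Processing': 40, 'Review': 98, 'Production': 100}
print(cumulative_shares(rand))
# {'Collection': 8, 'Processing': 27, 'Review': 100}
```

In both breakdowns the big jump comes at review, which is what gives the curves their steep middle section.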

These curves look like they could be approximated with the sigmoid or "S-curve" function.

The curve starts with a low cost and builds over time, with an acceleration during review, then decelerates as it approaches the total cost.
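One common S-curve is the logistic function, and a cost curve of the shape described above can be sketched with it.  The 100-day timeline, the midpoint at day 50 and the steepness of 0.1 are assumptions picked purely for illustration; they are not fitted to the RAND or Degnan data.

```python
import math


def cumulative_cost(day, total_cost=10_000.0, midpoint=50.0, steepness=0.1):
    """Logistic approximation of cumulative eDiscovery spend by a given day."""
    return total_cost / (1.0 + math.exp(-steepness * (day - midpoint)))


for day in (0, 25, 50, 75, 100):
    print(f"day {day:3d}: ${cumulative_cost(day):,.0f}")
```

With these parameters exactly half of the total has been spent at the midpoint, and the curve flattens out just below the total cost at the end of the timeline.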

So let's apply this to the eDiscovery process and revisit our settlement functions.  Recall the example from Part 1, in which two parties were involved in a lawsuit where the amount at stake (J) was $50,000, each party's costs were $10,000 and we let the probabilities vary.

Suppose we apply the S-curve eDiscovery cost function above and assume that the plaintiff's and defendant's cost functions are the same (both total costs approach $10,000) and that both probabilities are 50%.  The settlement curves, plotted by day as the eDiscovery process progresses and the costs go up, should look something like

Recall from Part 1 that a settlement is possible in the area below the defendant's curve (S_d) and above the plaintiff's curve (S_p).

But if the defendant's costs are higher than the plaintiff's (growing towards $20,000 instead of $10,000), then we see a wider gap between the curves, a steeper, faster climb of the defendant's settlement curve and a much better chance for a settlement, as shown below.
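The equal-cost and higher-defendant-cost cases can be compared numerically with a short sketch.  It assumes the logistic cost curve with an arbitrary 100-day timeline (midpoint at day 50, steepness 0.1), and it reads the costs in equation (5) as the cumulative spend to date, which is how the curves here are described; all function and parameter names are mine.

```python
import math

J = 50_000.0  # amount at stake in the running example


def cost(day, total, midpoint=50.0, steepness=0.1):
    """Assumed logistic (S-curve) cumulative cost by a given day."""
    return total / (1.0 + math.exp(-steepness * (day - midpoint)))


def settlement_curves(day, p_p=0.5, p_d=0.5, total_p=10_000.0, total_d=10_000.0):
    """Return (S_p, S_d): the plaintiff's floor and the defendant's ceiling."""
    s_p = p_p * J - cost(day, total_p)
    s_d = p_d * J + cost(day, total_d)
    return s_p, s_d


for label, total_d in (("equal costs", 10_000.0), ("defendant at $20,000", 20_000.0)):
    print(label)
    for day in (0, 50, 100):
        s_p, s_d = settlement_curves(day, total_d=total_d)
        print(f"  day {day:3d}: S_p = {s_p:8.0f}  S_d = {s_d:8.0f}  gap = {s_d - s_p:8.0f}")
```

At day 50 the gap between the curves is $10,000 with equal costs and $15,000 when the defendant's costs grow towards $20,000, which is the widening described above.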

Making the total final costs even again at $10,000, we can look at two extremes.  When both parties put the plaintiff's chance of winning at 10%, we get the graph below, where a settlement is always possible at under $15,000.

When both parties believe there is a 90% chance that the plaintiff will win, we get the curves below, which show the settlement curves shifted up and a settlement always possible under $55,000.
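As a quick check on the two dollar figures, the bounds from equation (5) can be evaluated at the end of discovery.  This assumes the running example values (J = $50,000, final costs of $10,000 per side) and that both parties share the same probability P; S_p and S_d are the plaintiff's and defendant's curves.

```latex
\begin{aligned}
P = 0.1:&\quad S_p = 0.1(50{,}000) - 10{,}000 = -5{,}000, &\quad S_d = 0.1(50{,}000) + 10{,}000 = 15{,}000 \\
P = 0.9:&\quad S_p = 0.9(50{,}000) - 10{,}000 = 35{,}000, &\quad S_d = 0.9(50{,}000) + 10{,}000 = 55{,}000
\end{aligned}
```

The upper bounds match the $15,000 and $55,000 ceilings.  Under these assumptions the plaintiff's floor at 10% ends up negative, so any positive offer would clear it by the end of discovery.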

Finally, the case where there is vast disagreement about the probabilities is shown below.  Here, a settlement cannot be made at any point, since at no point is the plaintiff's curve below the defendant's curve.
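How much disagreement is too much can be read off equation (5) by requiring the plaintiff's floor to sit at or below the defendant's ceiling.  The numbers plugged in are the running example values; the exact probabilities behind the graph are not restated here, so this is only a bound.

```latex
\begin{aligned}
P_p J - C_p \le P_d J + C_d
\;&\Longleftrightarrow\; (P_p - P_d)\,J \le C_p + C_d \\
\text{with } J = 50{,}000,\ C_p + C_d \le 20{,}000:
\;&\quad P_p - P_d \le \frac{20{,}000}{50{,}000} = 0.4
\end{aligned}
```

So with combined costs that never exceed $20,000, a plaintiff whose estimate is more than 40 percentage points above the defendant's leaves no room for a settlement at any point in discovery.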

But it's probably not realistic to assume the probabilities stay the same throughout discovery.  Disclosure of information should bring the probabilities together.  The graph below shows the settlement curves as well as the probability curves, where the probabilities start apart and converge at 50% halfway through discovery.  A logarithmic scale is used so both sets of curves can be shown on one graph.

We can see that in the beginning a settlement is not possible.  But as the probabilities converge toward the 50% mark, a settlement becomes possible.
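The point at which a settlement first becomes possible can be located with a short sketch.  Everything specific in it is an assumption for illustration: starting estimates of 90% (plaintiff) and 10% (defendant), a straight-line drift to 50% at day 50 of a 100-day timeline, the logistic cost curve with total costs of $10,000 per side, and costs read as cumulative spend to date.

```python
import math

J = 50_000.0          # amount at stake in the running example
TOTAL_COST = 10_000.0  # each side's total cost
DAYS = 100             # assumed length of discovery


def cost(day, total=TOTAL_COST, midpoint=50.0, steepness=0.1):
    """Assumed logistic (S-curve) cumulative cost by a given day."""
    return total / (1.0 + math.exp(-steepness * (day - midpoint)))


def belief(day, start, converge_day=DAYS / 2, target=0.5):
    """Linear drift from `start` to `target`, flat once they meet."""
    frac = min(day / converge_day, 1.0)
    return start + (target - start) * frac


def feasible(day, start_p=0.9, start_d=0.1):
    """True when the plaintiff's curve is at or below the defendant's."""
    s_p = belief(day, start_p) * J - cost(day)
    s_d = belief(day, start_d) * J + cost(day)
    return s_p <= s_d


first_day = next((d for d in range(DAYS + 1) if feasible(d)), None)
print("first day a settlement is possible:", first_day)
```

With these particular assumptions the window opens a little before the probabilities actually meet, because the rising costs on both sides close part of the gap on their own.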

Hopefully the dynamics shown here have shed some light on how parties are expected to behave during discovery, especially considering the costs and dynamics of eDiscovery.  If one wants a settlement, discovery is potentially the best time to press for it, since that's when a lot of the costs and probabilities shift.  The conclusion based on what we've seen here is that the best way to a settlement is:

(1) Make sure your opponent is spending a lot (at least more than you), and

(2) Use discovery to change the opposing side's probability by sharing information as soon as you get it to keep your own costs down.

Another dynamic which a lawyer friend of mine suggested was the theory that once a party is already in for a certain amount, they might as well stay in and see the matter through.  This is similar to a game theory premise that poker players face: If you're already playing at a loss, what's a little more money to play out the game and see where it goes?  I have not represented that dynamic here, but it may be an interesting one to consider.

Next time we'll look at some empirical studies to try to tackle eDiscovery cost estimation.

1. Palazzolo, J., "Why Hire a Lawyer? Computers Are Cheaper",

2. Pace, Nicholas M., Laura Zakaras, "Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery",

3.  Degnan, D., "Accounting for the Costs of Electronic Discovery", Minnesota Journal of Law, Science & Technology. 2011;12(1):151-190.