UK court approves use of predictive coding for e-disclosure

A UK court has approved for the first time the use of predictive coding as a basis for determining which electronic documents are relevant to a dispute. The High Court in London sanctioned its use in a multi-million pound dispute where more than 3 million electronic documents have to be assessed to determine whether they are …

  1. Pascal Monett Silver badge

    "until an acceptable level of accuracy is reached"

    More like an acceptable level of loss.

    Yeah, 3 million documents - and that's not pages - is a lot to review. Maybe we need to pare down the discovery process to allow for a restricted number of document submissions, instead of trying to find a way to skip analysing every one of them?

    Because if the 3 million are relevant, then what?

    1. Bc1609

      Re: "until an acceptable level of accuracy is reached"

      You don't really do "loss" in predictive coding (bloody stupid name). To use the needle-in-the-haystack analogy, PC isn't for pulling out the needles and only the needles, but rather for discarding as much hay as possible without losing a single needle. To use the standard classification terms, it treats having 100% recall as an absolute necessity, and then tries to bring precision up as a secondary consideration. Even if you only shed 20% of the documents you're saving hundreds of billable hours on a big case.
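
      A rough sketch of that "keep every needle, shed as much hay as you can" idea, in Python. The scores and labels are invented for illustration, and the helper names (`cutoff_for_full_recall`, `summarise`) are hypothetical, not taken from any real e-disclosure product. It picks the highest score cutoff that still keeps every relevant document in a hand-labelled sample, then reports the precision and the share of documents discarded at that cutoff.

```python
# Hypothetical relevance scores (0..1) from some classifier, paired with a
# human reviewer's verdict on a small labelled sample. All values are made up.
sample = [
    (0.95, True), (0.80, True), (0.62, False), (0.55, True),
    (0.40, False), (0.31, True), (0.22, False), (0.10, False),
    (0.05, False), (0.02, False),
]


def cutoff_for_full_recall(scored):
    """Highest cutoff that still keeps every relevant document in the sample."""
    return min(score for score, relevant in scored if relevant)


def summarise(scored, cutoff):
    """Return (recall, precision, fraction discarded) for a given cutoff."""
    kept = [(s, r) for s, r in scored if s >= cutoff]
    true_pos = sum(1 for _, r in kept if r)
    total_relevant = sum(1 for _, r in scored if r)
    recall = true_pos / total_relevant
    precision = true_pos / len(kept)
    discarded = 1 - len(kept) / len(scored)
    return recall, precision, discarded


cutoff = cutoff_for_full_recall(sample)
recall, precision, discarded = summarise(sample, cutoff)
print(f"cutoff={cutoff:.2f} recall={recall:.0%} "
      f"precision={precision:.0%} discarded={discarded:.0%}")
```

      Note that a cutoff chosen this way only guarantees full recall on the labelled sample; how well that carries over to the unreviewed documents depends on how representative the sample is.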

      I'm not sure how you'd go about limiting discovery by quantity - what if there's only one email out of the 3m documents that's relevant? And if all 3m are relevant, that's great - pick some representative ones and use those. If they get challenged you have plenty of back-up material. The judge and jury don't need to read every single document.

      1. TRT Silver badge

        Re: "until an acceptable level of accuracy is reached"

        Swamping the other legal team during disclosure is an unfortunate and ever more common strategy now. As we get more and more litigious, they'll be looking for smaller and smaller needles e.g. that one internal Samsung email in amongst 20,000,000 that says "Hey! Have you seen the new iDevice? Rounded corners look cool, don't you think?"...

        It's an increasingly sad world we live in.

        We'll see what comes of it.

        1. Bc1609

          Re: "until an acceptable level of accuracy is reached"

          @TRT "Swamping the other legal team during disclosure is an unfortunate and ever more common strategy now."

          Well, in the US, yes. It's much less common in the UK - in fact we're fairly good at cooperation during disclosure, simply because it reduces the costs so much for both sides. I don't know exactly why - it could be that without the massive penalty figures introduced into compensation the incentives are different (fewer enormous payouts means that baseline costs are proportionally higher, and so it makes more sense to reduce them), or it could be because the UK system is much less adversarial in general (no objections, for example - you're supposed to behave properly even if your opponent isn't keeping you in check).

      2. Anonymous Coward

        Re: "until an acceptable level of accuracy is reached"

        @Bc1609

        > pick some representative ones and use those. If they get challenged you have plenty of back-up material. The judge and jury don't need to read every single document.

        The Crown Prosecution Service would have received enough evidence from either the police or the serious fraud office, to convince them that a crime had been committed and that there was a reasonable chance of conviction based on that evidence. Everything else is just distraction which works in the defendant's favour.

        1. Bc1609

          Re: "until an acceptable level of accuracy is reached"

          @Philip Clarke

          1) While I'm sure it will be useful in some white-collar crime cases, this kind of mass discovery stuff is really for civil suits - for exactly the reasons you describe (if the CPS is prosecuting it almost certainly has the evidence already and so has no need for more discovery).

          2) This isn't mandatory. Discovery is requested, not enforced - if the CPS has what it needs then it simply won't ask for 3m+ documents. So I'm not really sure why you're worrying about it proving a "distraction".

          1. This post has been deleted by its author

        2. Anonymous Coward

          Re: "until an acceptable level of accuracy is reached"

          > The Crown Prosecution Service would have received enough evidence from either the police or the serious fraud office, to convince them that a crime had been committed and that there was a reasonable chance of conviction based on that evidence.

          Unless, as is happening to a friend at the moment, the police somehow 'forgot' to include a witness statement that strongly supports him and also 'forgot' to even interview another key witness (who also strongly supports him) so that the CPS get a heavily biased report and decide to go ahead with the prosecution.

          The police don't seem to care about whether an offence has actually been committed: they've made their mind up that he's guilty and are happy to stand back and see him financially ruined. There's no comeback on them whatever happens.

          They even went back to the first witness weeks later - after the submission to the CPS - and tried to persuade her to change her statement. If I did that it would be called perverting the course of justice.

          Anon because the trial is about to start as I write.

          1. Ian 55

            "'Forgot' to even interview another key witness"

            It's not the job of the police to do the defence's work for them.

            From another perspective, if there is a key witness who can clear the defendant, so much the better if only the defence have talked to them.

            1. Anonymous Coward

              Re: "'Forgot' to even interview another key witness"

              > It's not the job of the police to do the defence's work for them.

              But it is the Police's job to investigate. And that involves interviewing everyone who could reasonably be expected to have pertinent information, whether for or against.

            2. Intractable Potsherd

              Re: Police to do defence's job

              Actually, since the Crown Prosecution Service took over the role of prosecution from the police, investigating the crime in an even-handed manner is exactly what the police should be doing. Evidence for either side should be put forward. The fact that the police are still allowed to go on thinking that they work for the prosecution is part of the huge bundle of problems we have with the police.

    2. Trigonoceps occipitalis

      Re: "until an acceptable level of accuracy is reached"

      Do you know the plot of Class Action?

    3. Michael Wojcik Silver badge

      Re: "until an acceptable level of accuracy is reached"

      > More like an acceptable level of loss.

      Term of art. This is a binary classification problem, and in such problems "accuracy" is usually taken to mean some function of precision and recall. So both false positives and false negatives are included in the metric, and there's no ground for complaining that one of them is being neglected.

      The appropriate relative weights can be debated, but that's a rather more nuanced discussion.
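
      One common "function of precision and recall" is the F-beta score, where beta sets the relative weight. The sketch below is only an illustration: the counts are invented, the function name `f_beta` is my own, and nothing in the ruling says this particular metric was the one used.

```python
def f_beta(true_pos, false_pos, false_neg, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily (missing a relevant document hurts
    more); beta < 1 weights precision more heavily.
    """
    retrieved = true_pos + false_pos
    relevant = true_pos + false_neg
    precision = true_pos / retrieved if retrieved else 0.0
    recall = true_pos / relevant if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Invented counts: 90 relevant documents found, 60 irrelevant ones let
# through, 10 relevant ones missed (precision 0.6, recall 0.9).
f1 = f_beta(90, 60, 10)
f2 = f_beta(90, 60, 10, beta=2.0)    # recall-weighted
f05 = f_beta(90, 60, 10, beta=0.5)   # precision-weighted
print(f"F1={f1:.3f} F2={f2:.3f} F0.5={f05:.3f}")
```

      With recall higher than precision, as in these made-up counts, the recall-weighted F2 comes out above F1 and the precision-weighted F0.5 below it, which is the "relative weights" debate in miniature.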

  2. Anonymous Coward

    It was Professor Plum in the dining room with the lead pipe.

    Don't need to submit every book in the library because it's adjacent or teach a computer to read Plum's research papers.

    1. Paul Crawford Silver badge
      Gimp

      Re: It was Professor Plum in the dining room with the lead pipe.

      No! It was Miss Scarlet, in the basement, with a strapon...

      1. TRT Silver badge

        Re: It was Professor Plum in the dining room with the lead pipe.

        I thought she was in the bedroom with a candlestick.

  3. Anonymous Coward

    Just one question ..

    Who verifies the quality of the code to do this work?

    Sorry to be picky here, but if the outcome is determined on the basis of what this software selects after training, I would like to be damn sure that that is the only training the code gets. If this is about millions, "investing" a sizeable amount in programmer "encouragement" to slant the selection would not be a surprising move IMHO.

    1. The Mole

      Re: Just one question ..

      Who verifies the quality of the paralegals/interns who are given the task to read through 3 million documents? (The expensive lawyers won't be looking at them all). Given how dull and boring many of these documents are I wouldn't be surprised if there is a very high error rate from humans doing the filtering.

      1. Anonymous Coward

        Re: Just one question ..

        > Who verifies the quality of the paralegals/interns who are given the task to read through 3 million documents? (The expensive lawyers won't be looking at them all). Given how dull and boring many of these documents are I wouldn't be surprised if there is a very high error rate from humans doing the filtering.

        Hang on, let me get this straight.

        Are you offering the "intern error" as an argument not to bother with verifying that the software is actually performing as designed? So we should not check if there is (a) improvement and (b) bias because the current method is shown to be deficient?

        I have a small problem parsing that one, so show me where I got that wrong.

  4. Jason Bloomberg Silver badge

    I'm obviously missing something...

    > The Master also said the English civil procedure rules do not prohibit the use of predictive coding software.

    So why is (cue Dr Who theme) The Master making a ruling to allow it?

    If those undertaking discovery insist on an automated mechanism and end up missing things and weakening their case, that is their problem. Is it that those having discovery carried out against them were hoping to force the other side to search through millions of documents in the hope that they might miss a few important ones?

    1. The Mole

      Re: I'm obviously missing something...

      I presume the ruling was that English civil procedure rules do not prohibit the use of predictive coding software. Presumably before the ruling nobody knew if it was legal.

  5. Duncan Macdonald

    Why did the Master need to rule in this case?

    As both sides had agreed to use this method to reduce the costs, where was the necessity for him to rule? I thought that in most civil cases, any issue that the two sides agreed upon did not need further approval.

  6. Doctor_Wibble
    Boffin

    TLDR: Court Allows Grep, etc

    In the tradition of 'TLDR' being gross oversimplification but from the baillliii etc link :

    "the term 'predictive coding' is used interchangeably with 'technology assisted review', 'computer assisted review', or 'assisted review'. It means that the review of the documents concerned is being undertaken by proprietary computer software rather than human beings. The software analyses documents and 'scores' them for relevance to the issues in the case."

    And also on the basis that I spent the first several paragraphs of the article wondering WTF this newfangled predictive coding was, if it wasn't some improvement for the benefit of the stenographers (second time I've had that word in less than a week, dangerous times indeed) in software-related cases.

    1. Ian 55

      Re: TLDR: Court Allows Grep, etc

      I think what's actually proposed is training some Bayesian filters, as in a decent spam system. They can be very quick to get to several 'nines' accuracy and are much better than coming up with loads of regular expressions for grep to find - if some shifty documents are found in the sample, finding similarly written ones that use none of the same unusual words is quite possible.

      But given that..

      "The costs of using predictive coding software would depend on various factors, including importantly whether the number of documents is reduced by keyword searches, but the estimates given in this case vary between £181,988 plus monthly hosting costs of £15,717, to £469,049 plus monthly hosting costs of £20,820."

      ... that seems quite high. How do I get some of it?
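
      For anyone wondering what the spam-style Bayesian filter mentioned earlier in this comment looks like in practice, here is a toy naive Bayes classifier in plain Python. It is a minimal sketch under loud assumptions: the class name `NaiveBayes`, the whitespace tokenizer and the four training snippets are all invented, and nothing here is claimed to be what the vendors in this case actually run.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude whitespace tokenizer; real systems do far more than this.
    return text.lower().split()


class NaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}         # label -> Counter of word frequencies
        self.doc_counts = Counter()   # label -> number of training documents
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior: share of training documents carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            # Add-one smoothing so an unseen word does not zero the score.
            score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def classify(self, text):
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


nb = NaiveBayes()
nb.train("payment delayed under the supply contract", "relevant")
nb.train("breach of the supply contract terms", "relevant")
nb.train("lunch menu for friday", "irrelevant")
nb.train("office party on friday", "irrelevant")

print(nb.classify("contract payment terms"))
print(nb.classify("friday lunch"))
```

      This scores each document purely on individual word frequencies, so it only sees the words it was trained on.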

      1. Michael Wojcik Silver badge

        Re: TLDR: Court Allows Grep, etc

        > I think what's actually proposed is training some Bayesian filters, as in a decent spam system.

        From what I've seen in academic papers and commercial announcements on the subject, legal document classification is rarely done with naive Bayes. Some researchers use ANN-based methods (perceptron networks seem to be popular); others use one or more types of more-sophisticated classifier such as SVMs or kNN; a few are promoting fancier approaches based on lattice algebras or P-algebras.

        It's possible someone might use a variation of LSA, probably pLSA, to attack the problem, though LSA's bag-of-words model and opacity would make me mighty nervous about using it in this application.

        But spam-filter-style naive Bayes, even built into a Maximum-Entropy Markov Model or similar, is not likely to do a good job with this sort of classification. You need to extract higher-order features.

  7. Otto is a bear.

    A new war

    You can just see a bunch of lawyers getting together to see if they can write documents that won't show up in a predictive coding exercise. At the very least to favour some documents over others.

    The saving grace is that most UK individuals can't write in english anyway, init blood.

  8. Gordon 10
    IT Angle

    Is this true machine learning?

    Or some shoddy con by a vendor on non-techie lawyers?

    1. Anonymous Coward

      Re: Is this true machine learning?

      "Is that a real poncho . . . I mean

      Is that a Mexican poncho or is that a Sears poncho?

      Hmmm . . . no foolin' . . . " F. Zappa

    2. Michael Wojcik Silver badge

      Re: Is this true machine learning?

      Natural Language Processing in the Legal field is already big business, and has been for years. This is the English courts catching up to what legal firms have been doing for a long time. The software is quite sophisticated.

      I can't say whether it's "true machine learning", because that phrase doesn't mean anything useful.

      The economic forces behind discrete and probabilistic document classification in legal work are tremendous. Court systems are by nature conservative, but they'll gradually be dragged into allowing it, if only because it's already widespread elsewhere in the legal industry.

      Soon we'll be seeing more of it in other industries too. Some years back I was talking with some medical researchers about doing it for Cochrane meta-studies. Just a matter of time.

  9. allthecoolshortnamesweretaken

    So it's basically a search engine?

    1. Michael Wojcik Silver badge

      Basically, no. Also sophisticatedly, no.
