Can predictive coding be used to classify records?

Recommind has produced a short booklet titled ‘Predictive Coding for Dummies’, available as a free 36-page PDF. The Dummies Guide notes that ‘Real living, breathing legal experts are essential to predictive coding. These experts use built-in search and analytical tools — including keyword, Boolean and concept search, category grouping, and more than 40 other automatically populated filters — collectively referred to as predictive analytics — to identify documents that need to be reviewed and coded.’ Replace ‘legal experts’ with ‘records managers’ and the role of the records manager is clear. Of course, finding all the correct documents within a classification is only one part of the requirement. The classification also needs to be persistent and connected with other record-keeping requirements, including retention management.

I recently had the chance to sit with one of the predictive coding vendors to discuss these issues and my concerns about the effectiveness of the technology to classify records correctly. The technology looks appealing as a tool that can support digital record-keeping. What struck me most about the technology was the way it presents the results to a user. If you didn’t know it was an advanced search and categorisation engine, you might be forgiven for thinking that the screen was actually from an EDRMS.

On the left-hand side was the classification scheme, which I could browse.

When I clicked on one of the activities, I was taken to the subject.

The list of results shows all or most of the same basic metadata you would expect in your EDRMS; more if it was added when the record was saved.

From the results listed I could find similar documents, see similar or related search results, and add public or private tags.

The technology doesn’t figure out the classification by itself; it has to be trained, and who better to train the system than records managers? Start with the business classification scheme, find 100 records that match a classification, ask the system to find 1000 more, and confirm the results (managing any exceptions). Repeat this until the technology has classified all the digital records you have allowed it to search (including network drives, email, etc.).
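The train, suggest and confirm cycle described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how any vendor's product works: the term-overlap scoring, the function names (train, suggest, review_round), the 0.3 threshold and the sample classification terms are all my own assumptions, and a real product would use far more sophisticated analytics.

```python
from collections import Counter
import math
import re


def tokens(text):
    """Lowercase word tokens; a crude stand-in for real feature extraction."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(seed):
    """Build one term-count profile per classification from confirmed records."""
    profiles = {}
    for text, label in seed:
        profiles.setdefault(label, Counter()).update(tokens(text))
    return profiles


def suggest(profiles, text):
    """Return the best-matching classification and its similarity score."""
    vec = Counter(tokens(text))
    return max(((label, cosine(vec, p)) for label, p in profiles.items()), key=lambda x: x[1])


def review_round(seed, unclassified, confirm, threshold=0.3):
    """One train/suggest/confirm cycle: returns the enlarged seed set and the exceptions."""
    seed = list(seed)  # do not modify the caller's list
    profiles = train(seed)
    exceptions = []
    for text in unclassified:
        label, score = suggest(profiles, text)
        # confirm() stands in for the records manager accepting or rejecting a suggestion
        if score >= threshold and confirm(text, label):
            seed.append((text, label))
        else:
            exceptions.append(text)
    return seed, exceptions


# Hypothetical seed records and classification terms, for illustration only
seed = [
    ("invoice payment supplier account", "Finance - Payments"),
    ("recruitment interview candidate position", "Personnel - Recruitment"),
]
unclassified = [
    "supplier invoice overdue payment reminder",
    "interview schedule for candidate",
    "lunch menu friday",
]
seed, exceptions = review_round(seed, unclassified, confirm=lambda text, label: True)
print(len(seed), exceptions)
```

Each round enlarges the confirmed set, so the next round trains on more examples; anything the system cannot place with enough confidence, or that the records manager rejects, is left as an exception for manual handling.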

The link with disposal is there too. The technology classifies all the digital records you point it at. If your classification system is linked to your retention schedules, you are presented with all the information that can then be made subject to a retention rule. Keeping the metadata about the records that you destroy is then a matter of capturing the metadata found in the search and applying additional metadata about the disposal action. (Incidentally, the same concept is used to put records on legal hold to prevent their disposal.)
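To make the link between classification, retention and legal holds concrete, here is a minimal Python sketch. Everything in it is assumed for illustration: the Record fields, the retention periods, the rule that retention runs from the creation date, and the shape of the disposal metadata are hypothetical and would in practice come from your own retention schedule and metadata standard.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class Record:
    title: str
    classification: str
    created: date
    legal_hold: bool = False


# Hypothetical retention schedule: classification term -> years to keep
RETENTION_YEARS = {"Finance - Payments": 7, "Personnel - Recruitment": 2}


def expiry(created, years):
    """Date on which the retention period ends (assumed to run from creation)."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return created.replace(year=created.year + years, day=28)


def due_for_disposal(records, today):
    """Records whose retention period has elapsed and that are not on legal hold."""
    due = []
    for r in records:
        years = RETENTION_YEARS.get(r.classification)
        if years is None or r.legal_hold:
            continue  # no rule for this classification, or held: do not dispose
        if expiry(r.created, years) <= today:
            due.append(r)
    return due


def disposal_stub(record, today, authorised_by):
    """Metadata kept after destruction: the record's metadata plus the disposal action."""
    stub = asdict(record)
    stub.update(action="destroyed", disposed_on=today.isoformat(), authorised_by=authorised_by)
    return stub


records = [
    Record("Invoice 1001", "Finance - Payments", date(2004, 3, 1)),
    Record("Invoice 1002", "Finance - Payments", date(2004, 3, 1), legal_hold=True),
    Record("Interview notes", "Personnel - Recruitment", date(2012, 6, 1)),
]
today = date(2012, 11, 30)
due = due_for_disposal(records, today)
stubs = [disposal_stub(r, today, authorised_by="Records Manager") for r in due]
print([s["title"] for s in stubs])
```

The second invoice is just as old as the first, but the legal hold keeps it out of the disposal list, which is the behaviour described in the paragraph above.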

Options for using technology like this might include:

Applying it against legacy digital stores, to clean up old digital records.

Applying it in conjunction with an EDRMS (or SharePoint), so that the technology assigns the correct classification regardless of where the user saves a record (or files the record according to that classification).

Applying it against legacy and active digital stores, instead of using an EDRMS.

The online legal site law.com published an article on 31 October 2012, ‘Pitting Computers Against Humans in Document Review’. It concludes that ‘one should not jump to discredit the usefulness of computer assisted methods; technological solutions depend on the expertise of the people who use it.’

Craig Ball offered an interesting reply to this article leading with the comment: ‘Whoever challenges our assumptions and forces us to defend them is performing a valuable service, no matter what their motives’.

Ball noted the ‘sad fact … that human reviewers perform poorly in a consistent fashion.’ He added ‘the fact is that the errors human reviewers make are rife, even when well trained and -motivated (if we are candid, an all-too-rare and exceptional circumstance)’.

I think it’s possible to replace ‘human reviewers’ in this context with ‘end users’ who don’t understand classification terms (or, generally, record-keeping). My own view is that predictive coding technology has the ability to support digital record-keeping, with the active involvement of records managers, with or without an EDRMS. It has the ability (once trained) to aggregate records by BCS terms and to apply retention rules to those records. - AW