Predictive Analytics – Driving effectiveness in decision-making

By Vanessa Douglas-Savage

Predictive analytics is the practice of extracting patterns from data to forecast future trends and outcomes. Typically used as a decision-making guide, predictive analytics is steadily changing the way governments design and deliver public services to their citizens.

The success of predictive analytics tools in public service delivery hinges on overcoming some key big data analytics challenges.

Firstly, they must consolidate and integrate data from a range of sources, including complex legacy systems, siloed information stores, and external sources that hold citizen information. With so many sources, the volume of data is overwhelming.

To derive analyses and value from these systems, the key questions to ask are: What information is important? How do the pieces of information fit together? 

Historically, big data tools have been poor at answering these questions, and humans are still relied upon to reach decisions, set direction and separate the real message from the noise. The key to navigating this effectively is recognising patterns in the data, and this is where machines can assist.

Structured and unstructured data

Around 80% of the data landscape is unstructured. That is, the data is contained in documents, emails, images or text-based social media posts rather than in structured databases. Where structured databases provide consistency in format and metadata, unstructured data requires a level of organisation before any analysis can take place. The aim is to establish context and connections to structured data sources. This e-discovery activity can be complex and costly, particularly in the review phase, where traditionally every piece of data would have to be read by a human, sometimes involving armies of reviewers.

One step towards breaking through this challenge is the practice of predictive coding. This involves a level of machine intelligence: systems are taught by human subject matter experts to gather data, perform analysis and decide what is relevant. In large, complex information environments, machines are taught to do the heavy lifting, which in the long run can cut the cost of e-discovery significantly. The downside of predictive coding is that media files such as video, images and audio cannot be read.
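The learning loop behind predictive coding can be sketched with a simple text classifier: a human expert labels a seed set of documents, and the machine generalises from those labels to score new documents for relevance. The sketch below uses a minimal Naive Bayes classifier; the labels and sample documents are hypothetical, and a production e-discovery system would be far more sophisticated.

```python
from collections import Counter
import math

def train(labeled_docs):
    """Learn per-label word counts from human-reviewed (text, label) pairs."""
    counts = {"relevant": Counter(), "irrelevant": Counter()}
    totals = Counter()
    for text, label in labeled_docs:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the more likely label (multinomial Naive Bayes, Laplace smoothing)."""
    vocab = set(counts["relevant"]) | set(counts["irrelevant"])
    best, best_score = None, float("-inf")
    for label in counts:
        # log prior: share of training documents carrying this label
        score = math.log(totals[label] / sum(totals.values()))
        n = sum(counts[label].values())
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical seed set reviewed by a subject matter expert
seed = [
    ("contract breach penalty clause dispute", "relevant"),
    ("invoice payment dispute contract terms", "relevant"),
    ("office party friday cake in kitchen", "irrelevant"),
    ("weekly newsletter staff birthdays", "irrelevant"),
]
counts, totals = train(seed)
print(classify("dispute over contract penalty", counts, totals))  # relevant
print(classify("cake for the staff party", counts, totals))       # irrelevant
```

Once trained, the classifier can triage thousands of unseen documents, leaving human reviewers to check only those the machine flags as relevant or scores with low confidence.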

Another method of organising unstructured content is automated metadata tagging for classification, description, indexing and management. In this case, a machine is taught, via rules, suggestion-based tuning and previous experience, to apply tags to content.
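The rule-driven side of automated tagging can be illustrated in a few lines: each tag is associated with trigger terms, and any document containing a trigger receives the corresponding tag. The tag names and trigger terms below are invented for illustration; real systems refine such rules with suggestion-based tuning and feedback from past tagging decisions.

```python
# Hypothetical tagging rules: a tag applies when any of its trigger terms appears
RULES = {
    "finance":   {"invoice", "payment", "budget"},
    "legal":     {"contract", "clause", "litigation"},
    "personnel": {"recruitment", "leave", "payroll"},
}

def tag(text, rules=RULES):
    """Return the set of metadata tags whose trigger terms appear in the text."""
    words = set(text.lower().split())
    return {label for label, triggers in rules.items() if words & triggers}

print(sorted(tag("Please approve the invoice attached to the contract")))
# ['finance', 'legal']
```

A document can match several rules at once, which is exactly what makes tagged content easier to classify, index and retrieve later.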

However, predictive coding and automated tagging highlight the third big data analytics challenge: imperfect data. As with all learning experiences, mistakes will be made by both machine and human. Humans must learn how to teach the machine, and the machine must go through the learning curve of understanding the subject matter and reaching relevant conclusions.

But more significantly, the challenge of garbage in, garbage out remains. Because the machine is learning, if human input is incorrect, or there are errors in the source data, the machine will arrive at incorrect conclusions. As a result, predictive coding and automated tagging are not yet fully trusted to deliver the same outcomes as human review.

At the same time, care must be taken to ensure that information gathering does not infringe on individuals’ privacy, and that due measures are taken to protect civil liberties.

A significant challenge in this area is the de-identification of data so that the identity of a person cannot be determined. This activity is often underestimated, as conclusions are drawn about the importance and use of the data prior to release. A recent example was in New York City, where taxi trip logs were made publicly available. Within a short space of time, it was found that despite some anonymisation of the data it could easily be determined who drove which vehicle, a driver's gross income and where they lived. There is ongoing debate about whether de-identification works and whether a dataset can ever be truly anonymised.
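The New York taxi release is widely reported to have failed because identifiers were simply replaced with their hashes, and the space of possible identifiers was small enough to enumerate. The sketch below reproduces that pitfall with a hypothetical four-digit medallion number: an attacker who knows the identifier format can hash every candidate and reverse the "anonymisation".

```python
import hashlib

def pseudonymise(plate):
    """Naive de-identification: replace an identifier with its MD5 hash."""
    return hashlib.md5(plate.encode()).hexdigest()

# A record released with a hashed (hypothetical) 4-digit medallion number
released = pseudonymise("7351")

# An attacker enumerates the whole identifier space and builds a reverse lookup
reverse = {pseudonymise(f"{n:04d}"): f"{n:04d}" for n in range(10000)}
print(reverse[released])  # 7351 -- the record is re-identified
```

Because the identifier space is tiny, the attack takes a fraction of a second; even salting or a slower hash only raises the cost, which is why the debate centres on whether true anonymisation of such datasets is possible at all.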

Finally, for any predictive analytics effort to be effective, there needs to be consistent, ongoing training for officers. This training needs to help them deploy analytics tools and technologies effectively and stay abreast of changes and improvements in the industry. Training activities can also be an effective feedback mechanism for improving the analytics process and refining predictions.

Law and order – predictive analytics 

The practice of predictive analytics is finding wide application in areas such as law and order. It is putting reliable intelligence in officers' hands, equipping them to make effective everyday operational decisions and to choose the best strategies to fight crime, detect fraud and prevent terrorism.

Further, given today’s increasingly complex information environment, predictive analytics helps police forces draw coherent and meaningful patterns from information while sifting through multiple sources, including social media, CCTV footage and geographic profiling systems. Traditionally, such exercises demanded huge budgets, but combining analytics tools with digital technologies is helping police forces gain key intelligence insights without depending on a large department of intelligence analysts.

As budget allocations shrink and resources decline, governments will continue to explore predictive analytics as an effective route to the delivery of quality public services. As always, the key to success will be the quality of the data, the pattern recognition, and the assumptions that underpin the analysis.

Dr Vanessa Douglas-Savage is a Senior Consultant with Glentworth.