Five email migration headaches

Tuesday, September 2, 2014 - 21:44

By Eddie Sheehy

The vast majority of Australian businesses now use some form of cloud computing, and cloud-based email is a common choice for local organisations. Since the start of this year, retailer Woolworths and publishers News Corp and Fairfax have announced plans to switch their on-premises email systems to Google Apps, while Qantas and the Queensland Government said they were shifting to Microsoft’s Office 365.

When selecting the next-generation email platform they want to migrate to, organisations go through an extensive process of issuing RFPs, requesting on-site vendor meetings, completing thorough proofs of concept of the technologies and checking references.

Yet few pay much attention to another factor that can affect the success or failure of their transition just as much: the choice of migration technology. In particular, organisations need to be aware that migrating data from an old email archive to the cloud can be a time-consuming and frustrating task.

Five major migration headaches

Ageing archives are often filled with redundant, obsolete and trivial information that does nothing but add to storage costs. Worse, years of use can corrupt legacy archive indexes, making their contents difficult to search and sometimes impossible to extract.

These issues are exactly what is driving many businesses to invest in newer archive software or move to the cloud. However, problems in the original archive often create headaches for organisations trying to move their data. Here are five of the most common obstacles they run up against:

Slow APIs

Most migration technologies available today use the legacy archive’s built-in application programming interface (API) to extract and migrate data. Unfortunately, these APIs typically weren’t designed to handle large volumes of data; in fact, many can process only one item at a time.

At this pace, migrating even relatively small volumes of data can take months or years. And the delay will only grow as the volume of data created, sent and stored continues to double every two years.
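To see how a one-item-at-a-time API turns into months of work, here is a back-of-envelope model. The archive size, the one-item-per-second rate and the 32-worker comparison are hypothetical figures, not measurements, and the linear-scaling assumption is an idealisation; `volume_after` simply applies the two-year doubling rule quoted above.

```python
SECONDS_PER_DAY = 86_400

def migration_days(item_count: int, items_per_second: float, workers: int = 1) -> float:
    """Days to move item_count items at a fixed per-worker rate.

    Assumes throughput scales linearly with the number of independent
    workers. That is an idealisation: storage, network and the target
    platform all impose limits long before scaling is perfect.
    """
    if items_per_second <= 0 or workers < 1:
        raise ValueError("items_per_second must be > 0 and workers >= 1")
    return item_count / (items_per_second * workers) / SECONDS_PER_DAY

def volume_after(current: float, years: float, doubling_period_years: float = 2.0) -> float:
    """Projected volume if it doubles every doubling_period_years."""
    return current * 2 ** (years / doubling_period_years)

# Hypothetical archive: 20 million items through an API that handles one
# item at a time at roughly one item per second.
items = 20_000_000
print(f"serial:     {migration_days(items, 1):.0f} days")             # 231 days
print(f"32 workers: {migration_days(items, 1, workers=32):.1f} days")  # 7.2 days
# Under the two-year doubling assumption, the same archive two years on:
print(f"after 2 years: {volume_after(items, 2):,.0f} items")          # 40,000,000 items
```

The point is the ratio rather than the absolute numbers: any approach that processes items independently and in parallel divides the serial time by roughly the number of workers, up to whatever bottleneck appears first.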

Corrupt indexes

API-based migration relies on the legacy archive’s internal index holding accurate records of which messages and attachments are stored where. After years of accumulating terabytes of data, many archives’ indexes have become corrupted. Extracting data through a corrupt index can leave some items behind, or scramble data during extraction even when it is healthy at the storage level. This can make migration difficult or cause it to fail completely.
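One way to see how a corrupt index produces these failures is to compare what the index says against what storage actually holds. The sketch below is a hypothetical illustration, not how any particular archive or migration product works: it assumes each index entry records a storage location and a SHA-256 checksum, and that raw items can be read directly from storage.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    item_id: str
    location: str  # hypothetical: where the index says the item is stored
    sha256: str    # hypothetical: checksum the index recorded at ingest

def reconcile(index: list, storage: dict) -> dict:
    """Compare index records with items read directly from storage.

    storage maps each location to the raw bytes found there.
    """
    indexed = {e.location: e for e in index}
    return {
        # In storage but unknown to the index: an index-driven
        # extraction would silently leave these behind.
        "orphaned": set(storage) - set(indexed),
        # Index entries pointing at nothing: extraction fails here.
        "dangling": set(indexed) - set(storage),
        # Content no longer matches the recorded checksum: either the
        # record or the stored item is wrong, so extraction would
        # deliver the wrong content for this item.
        "mismatched": {
            loc for loc, e in indexed.items()
            if loc in storage and hashlib.sha256(storage[loc]).hexdigest() != e.sha256
        },
    }

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

storage = {"vol1/a": b"hello", "vol1/b": b"world", "vol1/c": b"lost mail"}
index = [
    IndexEntry("m1", "vol1/a", digest(b"hello")),
    IndexEntry("m2", "vol1/b", digest(b"other item")),  # stale or scrambled record
    IndexEntry("m3", "vol1/z", digest(b"missing")),
]
report = reconcile(index, storage)
print(report)
```

Reading storage directly, rather than trusting the index, is what makes the "orphaned" set visible at all; an API that only walks the index never sees those items.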

Delayed search and discovery

Gaining access to more responsive—or at least functional—search tools is often a major impetus for migrating data. However, the new platform’s search won’t return accurate results until all the legacy data has been successfully migrated.

As noted above, it can take months or years before organisations fully migrate and gain access to the search capabilities that were the initial drawcard for adopting newer storage and cloud solutions. This delay effectively voids the anticipated return on investment (ROI) that justified the migration in the first place.

Hidden risks

Data accumulated over a number of years can conceal business risks involving sensitive private or financial information. This information is often buried deep within terabytes of poorly organised and old data. What’s more, most legacy archives lack advanced search capabilities to identify these risks.
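As an illustration of the kind of search that can surface buried financial data, the sketch below flags candidate payment card numbers by combining a loose regular expression with the Luhn checksum, which valid card numbers satisfy and which weeds out many random digit runs. This is a generic technique chosen for illustration, not a description of any product; a real review would also cover other identifiers and would still produce some false positives.

```python
import re

# Loose pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_candidates(text: str) -> list:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits

print(find_card_candidates("Card 4111 1111 1111 1111 exp 12/25"))
# ['4111111111111111']  (a standard test number, not a real card)
```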

Many organisations struggle to find important data: if they urgently need a piece of information for litigation or other reasons, it can take weeks or months to locate.

This has serious implications for Australian businesses that will soon be subject to enhanced national privacy laws. They must be able to audit the information they hold to make sure they are not retaining personal information they no longer need, as the Privacy Act requires.

The difficult question for businesses is this: if their new platform isn’t yet fully migrated and searchable, and their legacy archive contains large amounts of unsearchable, unstructured data, how will they comply with the new laws?

No way to leave out the junk

Traditional migration approaches have no efficient way of distinguishing low-value space-filler from important data. They cannot selectively leave data behind by date or by other criteria that determine its value. Consequently, organisations have no option but to ‘pump and dump’ the entire archive, with all its bloat and drag on performance, into the new platform. An organisation can then end up paying as much to store data in the new platform as it did in the legacy archive.

Cloud platforms and new hardware may be more efficient, but migrating an entire legacy archive onto them doesn’t solve the original problems. If the old archive’s problems are simply carried into the new one, organisations will keep struggling with poor search, poor performance and irrelevant data.

A smarter way forward in migration

In 2012, Nuix worked with a large US-based global financial institution to index the contents of its legacy email archives so it could run eDiscovery searches related to major litigation. After we worked through 330 terabytes of legacy data (around 3 billion emails) in just 45 days, the bank asked for help with a related challenge: data migration.
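The figures quoted above translate into throughput as follows. The calculation assumes decimal units (1 TB = 10^12 bytes), which the source doesn’t specify, and "around 3 billion" makes every result approximate. The one-item-per-second serial comparison is a hypothetical rate, not a measured one.

```python
# Figures quoted above; "around 3 billion" means these are approximate.
TERABYTES = 330
EMAILS = 3_000_000_000
DAYS = 45
SECONDS_PER_DAY = 86_400

tb_per_day = TERABYTES / DAYS
emails_per_second = EMAILS / (DAYS * SECONDS_PER_DAY)
# Decimal units assumed (1 TB = 10**12 bytes, 1 KB = 1,000 bytes).
avg_email_kb = TERABYTES * 10**12 / EMAILS / 1_000

# Hypothetical comparison: the same volume through an API that moves
# one item per second.
serial_years = EMAILS / SECONDS_PER_DAY / 365.25

print(f"{tb_per_day:.2f} TB/day, {emails_per_second:,.0f} emails/s, "
      f"~{avg_email_kb:.0f} KB/email, serial: ~{serial_years:.0f} years")
# 7.33 TB/day, 772 emails/s, ~110 KB/email, serial: ~95 years
```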

This was the beginning of our Intelligent Migration business, which uses the Nuix Engine’s parallel processing framework to accelerate the data migration process from years to months or even weeks.

It also helps avoid migration headaches by letting organisations index previously unsearchable data, making it searchable before they migrate to new systems. The technology bypasses the archive’s API and indexes and extracts data directly from archive storage.

Once the data is indexed and fully searchable, organisations can make informed judgments about risk and value. Before migrating, they can pinpoint business risks associated with private and financial data for remediation. They can also identify data to leave behind in the legacy archive, including data past its retention date, very large and infrequently accessed files, duplicate messages such as company-wide email memos, and trivial content containing keywords such as ‘lunch’ or ‘kitten’. Organisations can also prioritise which information migrates first, such as data on legal hold or belonging to executives.
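The triage described above can be sketched as a set of simple rules. Everything here is hypothetical: the `Message` fields, the cutoff dates, the 25 MiB threshold and the executive addresses are placeholders, and this is not how the Nuix Engine implements it. The keyword rule in particular is crude (it would also catch a legitimate client lunch), so in practice flagged items would be reviewed rather than discarded automatically.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Message:
    message_id: str
    owner: str
    sent: date
    last_accessed: date
    size_bytes: int
    body: str
    on_legal_hold: bool = False

# Hypothetical policy values; every organisation would set its own.
RETENTION_CUTOFF = date(2007, 1, 1)      # sent earlier = past retention
STALE_ACCESS_CUTOFF = date(2012, 1, 1)   # not opened since = rarely accessed
LARGE_BYTES = 25 * 1024 * 1024
TRIVIAL_KEYWORDS = {"lunch", "kitten"}
EXECUTIVES = {"ceo@example.com", "cfo@example.com"}

def leave_behind_reason(m: Message, seen: set) -> Optional[str]:
    """Return why m can stay in the legacy archive, or None to migrate it."""
    if m.sent < RETENTION_CUTOFF:
        return "past retention"
    if m.size_bytes > LARGE_BYTES and m.last_accessed < STALE_ACCESS_CUTOFF:
        return "large and rarely accessed"
    # Fingerprint the whitespace-normalised, lower-cased body so copies of the
    # same company-wide memo in many mailboxes collapse to one.
    fingerprint = hashlib.sha256(" ".join(m.body.split()).lower().encode()).hexdigest()
    if fingerprint in seen:
        return "duplicate"
    seen.add(fingerprint)
    if TRIVIAL_KEYWORDS & set(re.findall(r"[a-z]+", m.body.lower())):
        return "trivial"
    return None

def triage(messages):
    """Split messages into (to_migrate_in_priority_order, [(message, reason), ...])."""
    seen: set = set()
    migrate, leave = [], []
    for m in messages:
        # Items on legal hold always migrate, whatever the other rules say.
        reason = None if m.on_legal_hold else leave_behind_reason(m, seen)
        if reason is None:
            migrate.append(m)
        else:
            leave.append((m, reason))
    # Legal-hold items first, then executives' mail, then everything else;
    # sort() is stable, so input order is kept within each group.
    migrate.sort(key=lambda m: (not m.on_legal_hold, m.owner not in EXECUTIVES))
    return migrate, leave
```

Checking legal hold before any leave-behind rule means a retention cutoff can never drop held material, and the first copy of a duplicated memo (in input order) is the one that migrates.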

The ability to streamline and reduce years’ worth of data in this way makes migration drastically faster, and it also brings major cost savings. For instance, with a shorter migration it is no longer necessary to keep maintaining the legacy archive through a long, drawn-out transfer to the new solution. Our approach also reduces future storage costs, because the new archive holds less of the unnecessary data that might otherwise have clogged the system.

Searchable data opens up endless possibilities

Our approach allows organisations to start afresh and take charge of their new archive by introducing and enforcing data retention policies. They are also likely to see gains in staff productivity, because it becomes easier for people to find and use data that has current business value. This also opens up possibilities for analytics across email communications, which may give them a competitive edge over the many companies that will continue to struggle through masses of unstructured data. Finally, Australian organisations will be better positioned to stay ahead of new privacy laws that require them to prove they hold only the personal information necessary to run their business.

Eddie Sheehy is CEO of Nuix, an Australian technology company that enables people to make fact-based decisions from unstructured data.
