The Need For Policies To Manage Your Unstructured Data

by Randy Hopkins

Unstructured data management policies ensure that data is always stored in the appropriate environment according to its usage, age, value and business priority.

For instance, an electric car manufacturer may want to understand how its vehicles perform under different climate conditions. It could create a data management policy that pulls trace files from cars at regular intervals into data lakes for analysis. Once the study is complete, the policy is retired and the collected data can be deleted or moved to deep archive storage.

A hospital may have a policy to retain medical images for the life of the patient and the policy could dictate where and when those images move to cold storage. Managing policies manually is no longer a viable option given the scope of data stored in enterprises today.

With data growing at an unprecedented rate and its storage consuming 30% or more of the overall IT budget, now is the time to get serious about automating unstructured data management policies. A systematic way to create, execute and manage data policies lets you:

  • Align data strategy with business goals;
  • Simplify data management by reducing manual effort and ad hoc decision-making;
  • Maximize cost savings by continuously tiering cold data to less expensive storage;
  • Ensure compliance with industry regulations;
  • Add ransomware protection by copying data from primary storage into object lock storage where it cannot be compromised;
  • Automatically feed data pipelines into data lakes and tools for analytics and AI programs.

The notion of data management policies isn’t new, but historically, this activity took place within storage vendor technology. A storage vendor-centric approach was all well and good before data volumes reached today’s petabyte-and-growing scale and before organizations were using multiple storage vendors and clouds to manage their data.

But now, the storage-centric approach to policy management creates vendor lock-in and silos, making it onerous to cost-effectively manage data and move it expediently to different storage technologies and services as needed to support users, big data analytics initiatives and cost-saving mandates.

Considerations for Unstructured Data Management Policies

  • Access anywhere: Distributed workforces now require instant access to data—regardless of where it’s stored—with a transparent user experience.
  • Automate as much as you can: Many organizations still rely on IT managers and spreadsheets to create and track policies. The worst part of this bespoke manual effort is searching for files with certain attributes and then moving or deleting them. These efforts are inefficient and incomplete, and they impede the goals of having policies: maintaining them is painful, and IT professionals have too many competing priorities. Plus, this approach limits the potential of using policies to continuously curate and move data to data lakes for strategic AI and ML projects. Instead, look for a solution with an intuitive interface for building policies and executing them on a schedule, and which runs in the background without human intervention.
  • Measure outcomes and refine: Any data management policy should be mapped to specific goals, such as cost savings on storage and backups. It should measure those outcomes and report status so that if the goals are not being met, you can change the plan accordingly. This is akin to a smoke detector that continually checks its own battery and alerts you when it’s time to change it. For instance, if you have a data management plan that tiers data into object storage in the cloud once it reaches one year of age, you’ll expect a certain percentage of savings. However, if this cold data ends up being frequently pulled back into local applications and storage, you face high egress fees that counteract those savings. At that point, you would want to consider a different tiering model. Better yet, a data management solution can recognize the trend and apply the appropriate action to right-place the data.
  • Align staff roles: Data management policies should be managed by a team within the organization that defines how policies are created and used and that aligns with business units to ensure retention and protection considerations are consistent. The team is also responsible for managing, enforcing and refining policies and communicating them to employees with a need to know. Large enterprises should consider including top executives who contribute to discussions concerning data governance, protection and monetization.
  • Metadata management: Look for a solution that simplifies searches across all file metadata from a unified global file index and also enables actions to copy, move, archive, tier and report on unstructured data files.
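
To make the automation and measurement points above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's implementation: the `FileRecord` index entries, the `TieringPolicy` class, the use of last-access date as the measure of age, and all per-GB prices are hypothetical. It selects cold files from a file index according to an age-based policy, then checks whether egress fees from recalled data cancel out the expected storage savings.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FileRecord:
    """One entry in a hypothetical global file index."""
    path: str
    size_gb: float
    last_accessed: date


@dataclass(frozen=True)
class TieringPolicy:
    """Tier files that have not been accessed for `min_age_days`."""
    name: str
    min_age_days: int

    def select(self, index, today):
        # Files last accessed on or before the cutoff are considered cold.
        cutoff = today - timedelta(days=self.min_age_days)
        return [f for f in index if f.last_accessed <= cutoff]


def net_monthly_savings(tiered_gb, recalled_gb,
                        primary_cost, cold_cost, egress_cost):
    """Storage savings from tiering minus egress fees (all costs in $/GB)."""
    storage_savings = tiered_gb * (primary_cost - cold_cost)
    egress_fees = recalled_gb * egress_cost
    return storage_savings - egress_fees


# Hypothetical index contents; paths and sizes are made up for illustration.
index = [
    FileRecord("/projects/a/scan001.dcm", 200.0, date(2022, 1, 10)),
    FileRecord("/projects/b/trace.log", 50.0, date(2024, 5, 1)),
    FileRecord("/projects/c/render.mov", 800.0, date(2021, 7, 4)),
]

policy = TieringPolicy("tier-after-one-year", min_age_days=365)
cold = policy.select(index, today=date(2024, 6, 1))
tiered_gb = sum(f.size_gb for f in cold)

# Illustrative prices only: $0.10/GB primary, $0.02/GB cold, $0.09/GB egress.
for recalled_gb in (100.0, 1000.0):
    net = net_monthly_savings(tiered_gb, recalled_gb, 0.10, 0.02, 0.09)
    status = "on track" if net > 0 else "review tiering model"
    print(f"{policy.name}: recalled {recalled_gb:.0f} GB -> {status}")
```

With these made-up prices, a modest amount of recalled data leaves the policy comfortably ahead, while recalling the full tiered volume in a month turns the savings negative. That is the signal that should prompt a different tiering model.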

In closing, enterprise data is not owned by any individual or business unit; it is owned by the enterprise and needs to be managed holistically and strategically to meet stakeholder needs and broad organizational objectives. Data should be accessible to users no matter where it resides. Ultimately, a data management policy should guide your organization’s philosophy toward managing data as a valued enterprise asset.

Randy Hopkins is VP, Global Systems Engineering & Enablement at Komprise.