Cracking the AI Code: Why the data companies keep, and how much of it, matters

By Robert Yang, Vice President Asia Pacific, Seagate Technology

AI has garnered a lot of attention in Australia and globally in recent years as businesses increasingly explore the potential to anchor their operational strategies in artificial intelligence. Big tech companies have taken note: Microsoft has stated that generative AI could add $40 billion to its top line, and the generative AI market could drive an almost $7 trillion increase in global GDP. The bigger news is that 75% of companies expect to adopt AI technologies over the next five years, and Deloitte forecasts that Australian businesses will soon be investing seven times as much in AI annually as they do today.

But the best AI deployments are useless without one key ingredient: data.

Companies need volumes of data to train AI models to find insights and value from previously untapped information. Because tomorrow’s AI tools will be able to derive yet-unimagined insights from yesterday’s data, it is vital that organisations keep as much data as possible.

Chatbots and image and video AI generators will also create more data for companies to manage, and their inferences will need to be kept to inform future algorithms. Gartner expects generative AI to account for 10% of all data produced by 2025, up from less than 1% today. By cross-referencing this projection with IDC’s Global DataSphere Forecast, we can expect generative AI technology like ChatGPT, DALL-E, Bard, and DeepBrain AI to produce zettabytes of data over the next five years.
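As a rough back-of-envelope illustration of that scale (the 175 ZB figure is IDC’s widely cited DataSphere forecast for 2025, and the 10% share is Gartner’s projection above, so treat the result as an order-of-magnitude sketch, not a measurement):

    # Back-of-envelope estimate of generative AI data volume by 2025.
    # Assumptions: IDC's ~175 ZB Global DataSphere forecast for 2025 and
    # Gartner's projection that generative AI will account for 10% of it.
    global_datasphere_zb = 175     # zettabytes, IDC forecast for 2025
    genai_share = 0.10             # Gartner's projected generative AI share

    genai_data_zb = global_datasphere_zb * genai_share
    print(f"Estimated generative AI data in 2025: ~{genai_data_zb:.1f} ZB")
    # Prints: Estimated generative AI data in 2025: ~17.5 ZB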

Organisations can only take advantage of AI applications if their data storage strategy allows for simple and cost-effective ways to train and deploy these tools at scale. Massive data sets need mass-capacity storage. The time to save data is now, if not yesterday.

Why AI Needs Data

According to IDC, 84% of enterprise data created in 2022 was useful for analysis, but only 24% of it was actually analysed or fed into AI or ML algorithms. In other words, companies are tapping less than a third of their analysis-ready data, and the rest equates to lost business value. Think of an electric car: without a charged battery, the car won’t take you to your destination. Similarly, if data isn’t stored, even the most intelligent AI tools won’t be of any assistance.

As companies begin to train AI models, they will need robust mass-capacity storage strategies that accommodate both raw and generated data. The cloud will host some of their AI workloads and storage, but they will also store and process some data on premises.

Keeping raw data even after it’s processed is essential too. Intellectual property disputes will arise over some AI-generated content, and industry inquiries or litigation may raise questions about the basis for AI insights. “Showing your work” with stored data will help demonstrate both ownership and the soundness of conclusions.

Data quality also affects the reliability of insights. To help ensure better data quality, enterprises should use methods that include data preprocessing, data labelling, data augmentation, monitoring of data-quality metrics, data governance, and subject-matter expert review.
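As a minimal sketch of what automated data-quality monitoring can look like in practice (the input file name and the missing-value threshold below are illustrative assumptions, not a prescribed toolchain):

    import pandas as pd

    # Minimal data-quality checks to run before feeding records to a
    # training pipeline. File name and threshold are illustrative only.
    def quality_report(df: pd.DataFrame, max_missing: float = 0.05) -> dict:
        missing = df.isna().mean().to_dict()   # missing-value ratio per column
        return {
            "rows": len(df),
            "duplicate_rows": int(df.duplicated().sum()),
            "missing_ratio_by_column": missing,
            # Columns whose missing-value ratio exceeds the tolerance.
            "columns_over_threshold": [
                col for col, ratio in missing.items() if ratio > max_missing
            ],
        }

    df = pd.read_csv("training_data.csv")   # hypothetical input file
    print(quality_report(df))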

Organisations Must Prepare

Understandably, data retention costs sometimes cause companies to delete data. Companies need to balance these costs against the need for AI insights, which drive business value.

To reduce data costs, leading organisations deploy cloud cost comparison and estimation tools. For on-premises storage, they should look into TCO-optimised storage systems built on hard drives, which are not only cost-effective but also durable and reliable for massive data sets. These systems can store the vast data needed to feed AI models for continuous training. Organisations should also monitor data and workload patterns over time and automate workflows where possible.
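A simple cost-estimation sketch along those lines appears below; every figure in it is a placeholder assumption, and a real TCO comparison should use vendors’ own calculators and account for power, networking, egress, and staffing:

    # Rough monthly storage-cost comparison: cloud object storage vs.
    # on-premises hard drives. All figures are placeholder assumptions.
    CLOUD_USD_PER_TB_MONTH = 20.0   # hypothetical cloud storage rate
    HDD_USD_PER_TB = 15.0           # hypothetical raw drive cost
    HDD_LIFESPAN_MONTHS = 60        # assume drives amortised over 5 years
    ONPREM_OVERHEAD = 3.0           # multiplier for power, space, redundancy, staff

    def monthly_cost(capacity_tb: float) -> dict:
        cloud = capacity_tb * CLOUD_USD_PER_TB_MONTH
        on_prem = (capacity_tb * HDD_USD_PER_TB * ONPREM_OVERHEAD
                   / HDD_LIFESPAN_MONTHS)
        return {"cloud_usd": round(cloud, 2), "on_prem_usd": round(on_prem, 2)}

    print(monthly_cost(capacity_tb=1000))   # e.g. a 1 PB training data set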

Comprehensive data classification will also be essential to identify the data needed to train AI models. Part of this means ensuring that sensitive data, for instance personally identifiable or financial information, is handled in compliance with regulations. Robust data security is a must as well. Many organisations encrypt data for safekeeping, but AI algorithms generally can’t learn from encrypted data, so companies need a process to securely decrypt their data for training and re-encrypt it for storage.
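A minimal sketch of that decrypt-train-re-encrypt loop, using the symmetric Fernet scheme from Python’s cryptography package (key management and the training step are stubbed out as assumptions):

    from cryptography.fernet import Fernet

    # Minimal decrypt -> train -> re-encrypt loop. In a real pipeline the
    # key would come from a secrets manager and the plaintext would be
    # handed to a model trainer rather than a placeholder variable.
    key = Fernet.generate_key()        # assumption: loaded from a secrets manager
    fernet = Fernet(key)

    record = b"customer_id,spend\n42,199.95"
    stored = fernet.encrypt(record)    # data at rest stays encrypted

    plaintext = fernet.decrypt(stored)        # decrypt only for training
    training_batch = plaintext.splitlines()   # stand-in for the training step

    stored = fernet.encrypt(plaintext)        # re-encrypt before writing back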

To ensure AI analysis success, businesses should:

  1. Get used to storing more data, because in the age of AI data is more valuable. Keep both raw data and derived insights. Don’t limit what can be stored; instead, limit what can be deleted
  2. Put processes in place that improve data quality
  3. Deploy proven methods of minimising data costs
  4. Apply robust data classification and compliance
  5. Maintain data security

Without these actions, the best generative AI models will be of little use.

Even before the emergence of generative AI, data was the key to unlocking innovation. Companies most adept at managing their multicloud storage are 5.3× more likely than their peers to beat revenue goals, and generative AI could significantly widen the innovation gap between winners and losers.

So while the buzz around generative AI has rightly focused on its innovative potential, smart business leaders will also look closely at how their data storage and management strategies can make or break their AI success.