Data Prep Bottleneck Drives Unstructured-Microsoft Deal

Data preparation, one of the biggest barriers to moving AI from pilot to production, is the focus of an expanded collaboration between Unstructured and Microsoft.

Under the arrangement, Unstructured's cloud-native ETL platform is integrated with Microsoft Azure services to prepare complex enterprise content for AI workloads. The platform parses, chunks and enriches data from more than 64 file types. The output feeds large language models, retrieval-augmented generation (RAG) pipelines, copilots and agentic workflows.
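As a rough illustration of the chunk-and-enrich steps described above, the following is a minimal sketch in plain Python. It is not Unstructured's API: the names `Chunk`, `chunk_paragraphs`, `enrich` and `prepare` are hypothetical, the input is assumed to be text already extracted from a file, and the parsing of formats such as PDFs or presentations is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One retrievable piece of a document plus descriptive metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_paragraphs(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs left by extra blank lines
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def enrich(pieces: list[str], source: str) -> list[Chunk]:
    """Attach simple metadata (hypothetical fields) to each chunk."""
    return [
        Chunk(text=p, metadata={"source": source, "position": i, "char_count": len(p)})
        for i, p in enumerate(pieces)
    ]


def prepare(text: str, source: str, max_chars: int = 200) -> list[Chunk]:
    """Chunk already-extracted text, then enrich each chunk."""
    return enrich(chunk_paragraphs(text, max_chars), source)


if __name__ == "__main__":
    doc = "Alpha paragraph.\n\nBeta paragraph.\n\nGamma paragraph."
    for chunk in prepare(doc, source="report.txt", max_chars=35):
        print(chunk.metadata, repr(chunk.text))
```

In a RAG pipeline, records like these would typically be embedded and written to a search index; the metadata lets an application trace an answer back to its source file.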

Native connectors ingest data from Azure services such as Azure Blob Storage. The prepared data can then be indexed in Azure AI Search and used with Microsoft Foundry to support production AI applications.

The collaboration targets highly regulated industries, including financial services, healthcare, insurance, pharmaceuticals and government. Unstructured can run within customer Azure environments, allowing organisations to retain their own security, compliance and data governance controls rather than sending content to an external service.

Enterprises can also procure the platform through Microsoft Marketplace, aligning purchases with existing Azure spending commitments.

Unstructured supports more than 30 content connectors, including Microsoft OneDrive, SharePoint and Azure Blob Storage.

The deal underlines how data preparation has become a competitive battleground for enterprise AI. Most enterprise information sits in documents, PDFs, presentations, emails and content systems that AI applications cannot use without preprocessing.

“Enterprise AI is only as effective as the data that powers it,” said Brian Raymond, CEO of Unstructured.

“Most enterprise data remains unstructured and inaccessible to AI systems. By working with Microsoft, we're helping organizations unlock that data and accelerate the path from raw information to production-ready AI applications on Microsoft Azure.”

Unstructured claims its platform is used by 87 per cent of the Fortune 1000.

https://unstructured.io