SAP Targets Data Fragmentation With Dremio Acquisition

SAP has agreed to acquire Dremio, an open-source data lakehouse platform, in a bid to solve the data fragmentation problem that the enterprise software giant says is the primary reason AI initiatives fail in large organisations. Financial terms were not disclosed.

The acquisition targets a structural weakness common across organisations running complex, mixed-vendor technology estates: enterprise data that is spread across SAP and non-SAP systems, stored in incompatible formats, and stripped of the business context AI agents need to function reliably.

"Enterprise AI doesn't stall because the models aren't good enough; it stalls because the data isn't ready for AI agents," said Philipp Herzig, CTO of SAP SE. "Dremio eliminates that bottleneck. Combined with SAP Business Data Cloud, we can now take customers from raw, fragmented data to governed, AI-ready intelligence on a single open platform."

Dremio is built on Apache Iceberg, the open table format that has become the de facto standard for large-scale analytical data lakes. Its central capability is federated querying: the ability to run queries across data stored in different systems without first moving or reformatting it.

For organisations with legacy SAP deployments alongside acquired systems, departmental lakes and regional data repositories, this eliminates the extract-transform-load (ETL) overhead that has historically delayed AI and analytics projects.
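To make the idea of federated querying concrete, here is a minimal sketch in Python. It is an analogy only, and it does not use Dremio's API or engine. It uses the standard-library sqlite3 module, with two separately attached in-memory databases standing in for an ERP system and a data lake. The names erp, lake, orders, customers and revenue_by_region are hypothetical, chosen purely for illustration.

```python
import sqlite3

# One connection plays the role of the query engine. The two attached
# databases are hypothetical stand-ins for separate systems (an ERP store
# and a data lake); a real federated engine would talk to remote sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS erp")
conn.execute("ATTACH DATABASE ':memory:' AS lake")

# Each "system" keeps its own table in its own database.
conn.execute("CREATE TABLE erp.orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO erp.orders VALUES (?, ?)",
    [(1, 120.0), (2, 80.0), (1, 40.0)],
)
conn.execute("CREATE TABLE lake.customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO lake.customers VALUES (?, ?)",
    [(1, "EMEA"), (2, "APAC")],
)


def revenue_by_region(connection):
    """Join across both stores in a single query, with no copy step first."""
    return connection.execute(
        """
        SELECT c.region, SUM(o.amount)
        FROM erp.orders AS o
        JOIN lake.customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()


result = revenue_by_region(conn)
print(result)
```

The point of the sketch is that the join is expressed once, against both stores, rather than after an export-and-load step into a single warehouse. Production federated engines face problems this toy ignores, such as differing file formats, network latency and access control.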

The Dremio acquisition positions SAP more directly against Databricks and Snowflake in the enterprise data platform market, as well as against cloud-native offerings from Microsoft (Azure Synapse), Amazon Web Services (Redshift, Athena) and Google (BigQuery).

With Dremio integrated, SAP Business Data Cloud will become an Apache Iceberg-native enterprise lakehouse. SAP and non-SAP data will coexist on the same open foundation. Dremio's serverless, elastic architecture scales automatically with demand, removing fixed capacity constraints.

SAP says it will deliver a universal open catalogue built on Apache Polaris and the Apache Iceberg REST Catalog API, covering data meaning, relationships, access rights and lineage across both SAP and non-SAP sources. This catalogue will form the foundation of the SAP Knowledge Graph, embedding business relationships, organisational hierarchies, regulatory classifications and cross-system lineage as native data properties.

The announcement specifically cites "compliance risk when organisations cannot explain how an AI-driven decision was reached" as one of the costs of data fragmentation the acquisition addresses.

Independent analyst Shashi Upadhyay argued SAP should have built or acquired Apache Iceberg capability earlier, describing the Dremio purchase as "rectifying a mistake." The analyst also flagged a governance risk for customers: while SAP has committed to maintaining the open-source status of Apache Iceberg, Polaris and Arrow, the durability of that commitment post-acquisition is not contractually enforceable.

https://www.sap.com