History Is Repeating: Why AI Needs a New Kind of Data Strategy

July 19, 2025

The Past: Rise of Central Data Teams

In the early 2000s, data belonged to the domains. Finance had its spreadsheets, sales had its CRMs, marketing had its campaign folders. Each team made decisions based on what they owned. But then something seismic happened—big data and the cloud arrived.

Suddenly, executives realized that growth came not just from better tools but from better insights—cross-domain insights. Enterprises rushed to unify data and break down silos. Data lakes were born. Warehouses were modernized. And from this, a new role emerged: the central data team—the gatekeepers of analytics, governance, and compliance.

They had one mission: turn data into business.

The New Frontier: Unstructured Data + GenAI

Fast forward two decades. AI assistants like ChatGPT, Claude, and Gemini are rewriting how people interact with technology. For the first time, non-technical users can ask questions in plain language and get insights from raw data, with no dashboards, no SQL, and no training.

This is revolutionary. But here’s the catch:

💡 By commonly cited industry estimates, around 80% of enterprise data is unstructured.

That’s documents, emails, chats, PDFs, contracts, proposals, call notes, internal wikis. And most of it is still trapped in silos.

Worse, the ownership problem is back. HR owns the resumes and offer letters. Legal owns the contracts. Sales has the decks. Marketing has the positioning docs. And every department guards its data, not just because of policy, but because of fear of exposing something sensitive.

We’re Watching History Repeat Itself

Remember when structured data was too fragmented to be useful? That’s where unstructured data is today. Everyone’s trying to solve their own problems in isolation:

  1. HR wants a resume copilot.

  2. Legal wants a contract summarizer.

  3. Sales wants a proposal generator.

  4. IT wants to know where the sensitive files live.

But nobody is stitching together the bigger picture.

And that’s a problem. Because the real magic of GenAI happens when data works across boundaries.

  1. When sales proposals are cross-checked with legal clauses

  2. When support tickets are enriched with engineering notes

  3. When compliance policies are mapped to employee communications

That’s what leads to new insights, faster decisions, and competitive advantage.

But right now, GenAI can’t get there.

What’s Blocking AI in the Enterprise?

Most enterprises are hitting the same wall:

  1. Fragmented ownership of unstructured data (domain-level silos)

  2. No semantic context around documents (just filenames and folders)

  3. Security & privacy risks in enabling open access (CISO anxiety)

  4. RAG (retrieval-augmented generation) systems that hallucinate without proper grounding and governance

Some orgs try using Copilot. But Copilot is limited—5 files at a time isn’t enterprise-ready.

Others attempt DIY projects using LangChain, GPT-4 APIs, and custom ETL pipelines. But that leads to fragile workflows, compliance headaches, and endless engineering overhead.

And let’s be honest: searching through 5 million documents to find everything about gender-related topics, and extracting it securely and meaningfully? That’s not a weekend side project. That’s a whole infrastructure challenge.

Why Context—and Control—Are Everything

Here’s the core truth: Unstructured data isn’t governed by rows and tables. It’s governed by context.

You can’t protect a contract just by its file name. You need to know that paragraph 3.1 mentions GDPR, and that this paragraph must be masked from an intern but visible to the compliance team.

You can’t summarize a proposal without understanding that section 5 refers to pricing and must be redacted for external contractors.

This means:

  1. Access control must go beyond folders—it must go down to paragraphs

  2. Classification must be semantic—not just by keyword, but by meaning

  3. Retrieval must be policy-aware—so copilots never leak sensitive data

  4. Lineage must be provable—for every chunk retrieved, redacted, or exposed

In short, the future of AI in the enterprise depends on building an infrastructure that treats context as a first-class citizen.
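
To make the requirements above a little more concrete, here is a minimal Python sketch of paragraph-level, policy-aware retrieval. Everything in it is an assumption made for illustration: the `Chunk` type, the `gdpr` and `pricing` labels, the role names, and the `POLICY` table are hypothetical, and the labels are assumed to come from an upstream semantic classifier that is not shown. Treat it as one possible shape for the idea, not a reference implementation.

```python
from dataclasses import dataclass

# All names below (Chunk, POLICY, the labels and roles) are hypothetical,
# chosen only to illustrate paragraph-level masking.


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    paragraph: str      # paragraph identifier inside the document, e.g. "3.1"
    text: str
    labels: frozenset   # semantic labels, assumed to come from a classifier


# Policy table: label -> roles allowed to read chunks carrying that label.
POLICY = {
    "gdpr": {"compliance"},
    "pricing": {"compliance", "sales"},
}

REDACTED = "[REDACTED]"


def visible(chunk, role):
    """True if `role` is allowed for every label on the chunk.

    Unlabeled chunks are treated as public. A label missing from POLICY
    maps to an empty role set, so unknown labels are denied by default.
    """
    return all(role in POLICY.get(label, set()) for label in chunk.labels)


def retrieve(chunks, role):
    """Return (paragraph, text) pairs, masking paragraphs the role may not see."""
    return [
        (c.paragraph, c.text if visible(c, role) else REDACTED)
        for c in chunks
    ]


contract = [
    Chunk("contract-001", "1.0", "This agreement begins on signing.", frozenset()),
    Chunk("contract-001", "3.1", "Personal data is processed under GDPR.",
          frozenset({"gdpr"})),
    Chunk("contract-001", "5.0", "The fee is set out in the pricing schedule.",
          frozenset({"pricing"})),
]

if __name__ == "__main__":
    for role in ("intern", "sales", "compliance"):
        print(role, retrieve(contract, role))
```

In this sketch the intern sees paragraphs 3.1 and 5.0 masked, the sales role sees pricing but not the GDPR paragraph, and the compliance role sees all three. The masking happens at retrieval time, before any text reaches a copilot's prompt, which is the point of making retrieval policy-aware.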

Central Data Teams Still Matter—But Must Evolve

We’re not saying decentralize everything.

Central data teams are critical. They understand policy. They manage tools. They own the data strategy. But they can’t be the bottleneck. Today, everyone—from software engineers to product managers—needs access to governed data.

  1. Data scientists want experimentation-ready corpora.

  2. Sales engineers want LLMs that understand product + legal docs.

  3. Security teams want traceability across Slack, SharePoint, and S3.

These users can’t wait weeks. They need insight in seconds.

And CIOs and CISOs can’t accept blind spots. They need auditability by default.

We need to stop treating these goals as tradeoffs. Productivity and privacy are not opposites.

What Enterprises Need to Win with GenAI

To unlock the value of GenAI, you don’t just need better models. You need:

  1. Complete visibility into unstructured data — across sources like SharePoint, S3, Slack, and Salesforce

  2. Contextual enrichment — break content into semantic chunks and classify them

  3. Granular access control — paragraph-level masking, redaction, and approvals

  4. No data duplication — read-only overlays, not brittle pipelines

  5. Role-aware APIs — power copilots, dashboards, and RAG systems without rewriting ACLs

  6. Immutable lineage and audit — prove what was accessed, when, and why

This is how we move from copilot experimentation to copilot scale.

This is how we go from data silos to data leverage.

This is how we finally make AI safe, trusted, and useful in the enterprise.
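
Of the six requirements listed above, lineage and audit is perhaps the easiest to illustrate in a few lines. The sketch below is a hash-chained access log in Python: each entry stores the hash of the previous one, so editing any earlier entry breaks verification. This makes the log tamper-evident rather than truly immutable, and it is only one way to approach the requirement. The field names (`user`, `chunk_id`, `action`, `reason`, `timestamp`) and the helper functions are assumptions for illustration, not part of any particular product.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry in the chain.
GENESIS = "0" * 64


def _digest(body):
    """SHA-256 over a canonical (sorted-key) JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(log, user, chunk_id, action, reason, timestamp):
    """Append an access event that is chained to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {
        "user": user,            # who accessed the chunk
        "chunk_id": chunk_id,    # what was accessed
        "action": action,        # e.g. "retrieved", "redacted", "exposed"
        "reason": reason,        # why (for example, the policy that applied)
        "timestamp": timestamp,  # when, supplied by the caller
        "prev_hash": prev_hash,
    }
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify(log):
    """Recompute every hash and check each link; False if anything was altered."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

A caller would record one event per chunk retrieved, redacted, or exposed, and run `verify` during an audit. In a real deployment the log would also need durable, access-controlled storage, which this sketch leaves out.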

Let’s Connect

Drop a note, we reply fast.
