Foundations · Architecture

Why Enterprise Software Needed a Product Graph (and Why Nobody Built One Until Now)

Every prior attempt at the enterprise-SaaS knowledge graph collapsed on the same three failure modes: shape drift, missing referential integrity, and no referential constraints between node types. This is the architectural breakthrough that made the substrate tractable.

Jonathan Krasnow · Co-founder & CEO, PYRAMYD · January 12, 2026 · 16 min read
Knowledge Graphs · Foundations · Enterprise SaaS · Substrate
Editor's note: This post is part of our research-grounded series. Third-party statistics referenced here are being re-verified against primary sources as part of an ongoing content audit. Where the original source isn't machine-verifiable, we're reframing the claim qualitatively or citing the underlying primary paper directly. Reach out if you spot a citation that should be tightened.

Knowledge graphs are not new. Google launched theirs in 2012,[3] Amazon shipped Product Knowledge Graph for recommendation around the same time, LinkedIn ran the engineering arc to 500M entities by 2021,[10] and the academic literature on typed heterogeneous graphs has been mature since Hogan et al. published the canonical survey in 2021.[1] Despite all of that institutional knowledge, enterprise software · the category itself · still doesn't have a unified product graph. There is no Bloomberg Terminal for SaaS. There is no PitchBook for product. Forrester's 2024 research found that 73% of enterprises rate this gap as critical or high-priority, but only 9% report a graph in production.[9]

Enterprise software has no Bloomberg Terminal · no unified, typed substrate for product, vendor, category, and signal data. That gap has been "known" for a decade. The reason it stayed open is architectural.

This is a post-mortem of six prior attempts to build it · what they got right, what broke, and the three structural failures that any successor has to solve before the substrate becomes tractable. We then describe the 88-node-type backbone PYRAMYD landed on and why it survives those failure modes.

Section 1: The three structural failures of every prior attempt

Between roughly 2014 and 2023, at least six well-funded teams shipped a version of "the enterprise software knowledge graph." All six exist and all six are still partly operational, but none of them is the substrate the market actually needs. The failures rhyme.

Failure 1: Shape drift

The first thing you learn when you try to model a Salesforce competitor is that "CRM" is not one shape. It's a contact manager + pipeline + email + reporting + workflow engine + integration platform · and the proportions vary by vendor. HubSpot's "CRM" is closer to a marketing automation suite. Pipedrive's is a pure pipeline tool. Microsoft Dynamics is an ERP-adjacent platform with CRM modules.

Every prior attempt picked a base shape · usually informed by whichever vendor the team was closest to · and tried to coerce the others into it. The result was either uselessly generic schemas (everything is a "product" with three properties) or hyper-specific schemas that broke the moment a new vendor shipped. This is the classic ontological vagueness problem from the knowledge-representation literature.[1]

Failure 2: Missing referential integrity

Industry-scale KG teams at Amazon, Google, eBay, Facebook, IBM, and Microsoft converge on the same finding in their Communications of the ACM retrospective: entity resolution is the hardest part.[2] When "Salesforce" appears in a sales-pipeline record, a review on G2, a press release, an SEC filing, and an integration manifest, the substrate has to decide whether those five strings refer to the same node · and at what level of granularity. (Is "Salesforce Sales Cloud" the same node as "Salesforce"? It depends on the question.)

Most prior attempts treated entity resolution as a post-processing problem · ingest first, dedupe later. That fails because once duplicates land in the graph, every downstream query inherits the ambiguity. A signal feed full of "Salesforce" mentions is uselessly noisy if 30% of them are actually about Tableau, Slack, or Heroku · all also Salesforce-owned.
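To make the "resolve before you ingest" point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the node IDs, the alias table, and the helper names (`resolve`, `rollup_vendor`, `ingest`) are hypothetical, and a production resolver would use far richer matching than a lookup table. The shape is what matters: a mention either resolves to one canonical node or is quarantined, so it never lands in the graph as an ambiguous string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: str                    # "Vendor" or "Product"
    parent_id: Optional[str] = None   # owning Vendor, for Product nodes


# Hypothetical canonical registry: products are distinct nodes under one vendor.
NODES = {
    "vendor:salesforce": Node("vendor:salesforce", "Vendor"),
    "product:sales-cloud": Node("product:sales-cloud", "Product", "vendor:salesforce"),
    "product:tableau": Node("product:tableau", "Product", "vendor:salesforce"),
    "product:slack": Node("product:slack", "Product", "vendor:salesforce"),
    "product:heroku": Node("product:heroku", "Product", "vendor:salesforce"),
}

# Hypothetical alias table mapping normalized surface strings to node IDs.
ALIASES = {
    "salesforce": "vendor:salesforce",
    "salesforce sales cloud": "product:sales-cloud",
    "sales cloud": "product:sales-cloud",
    "tableau": "product:tableau",
    "slack": "product:slack",
    "heroku": "product:heroku",
}


class UnresolvedMention(Exception):
    """Raised when a mention cannot be mapped to exactly one canonical node."""


def resolve(mention: str) -> Node:
    key = " ".join(mention.lower().split())  # normalize case and whitespace
    try:
        return NODES[ALIASES[key]]
    except KeyError:
        raise UnresolvedMention(mention) from None


def rollup_vendor(node: Node) -> Node:
    """Answer at vendor granularity: a Product rolls up to its owning Vendor."""
    return NODES[node.parent_id] if node.parent_id else node


def ingest(mentions, graph):
    """Append (mention, node_id) pairs only when resolved; return the rest."""
    quarantined = []
    for mention in mentions:
        try:
            graph.append((mention, resolve(mention).node_id))
        except UnresolvedMention:
            quarantined.append(mention)
    return quarantined
```

In this sketch a "Tableau" mention is stored against the Tableau product node, and `rollup_vendor` answers vendor-level questions without the product-level signal ever being mislabeled as "Salesforce."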

Failure 3: No referential constraints between node types

Heterogeneous graph literature has been clear for years that typed graphs outperform untyped graphs on real-world tasks by 25-40%.[7] Yet most enterprise KG attempts use a flat "Entity → has-property → Value" shape with no foreign-key constraints between entity types. You can technically express anything in that model. In practice nothing is constrained, so nothing is queryable with confidence.

A real product graph needs to enforce, for example: every Review must reference exactly one Product · every Product must reference exactly one Vendor · every Vendor can have many Industries but no more than one Headquarters Country · every Feature must reference exactly one Capability and one Product. Without those FK constraints, the graph is a search index, not a substrate.
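The constraints listed above can be sketched as a relational schema. This is an illustration under stated assumptions, not PYRAMYD's actual schema: it uses Python's built-in `sqlite3` module as a stand-in for a production database, and the table names, column names, and the `build_graph` and `add_review` helpers are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE country  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE industry (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE vendor (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    hq_country_id INTEGER REFERENCES country(id)  -- one nullable column: at most one HQ country
);
CREATE TABLE vendor_industry (                    -- junction table: many industries per vendor
    vendor_id   INTEGER NOT NULL REFERENCES vendor(id),
    industry_id INTEGER NOT NULL REFERENCES industry(id),
    PRIMARY KEY (vendor_id, industry_id)
);
CREATE TABLE product (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    vendor_id INTEGER NOT NULL REFERENCES vendor(id)          -- exactly one Vendor
);
CREATE TABLE capability (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    product_id    INTEGER NOT NULL REFERENCES product(id),    -- exactly one Product
    capability_id INTEGER NOT NULL REFERENCES capability(id)  -- exactly one Capability
);
CREATE TABLE review (
    id         INTEGER PRIMARY KEY,
    body       TEXT NOT NULL,
    product_id INTEGER NOT NULL REFERENCES product(id)        -- exactly one Product
);
"""


def build_graph() -> sqlite3.Connection:
    """Create an in-memory graph with one vendor and one product."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO vendor (id, name) VALUES (1, 'Salesforce')")
    conn.execute("INSERT INTO product (id, name, vendor_id) VALUES (1, 'Sales Cloud', 1)")
    conn.commit()
    return conn


def add_review(conn: sqlite3.Connection, body: str, product_id: int) -> bool:
    """Insert a review; return False if the database rejects a dangling reference."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO review (body, product_id) VALUES (?, ?)",
                (body, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A review pointing at product 1 is accepted; a review pointing at a product that does not exist is rejected by the database itself rather than by application code, which is the difference between a substrate and a search index.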

  • 73% of enterprises rate a unified product graph as critical/high priority
  • 9% have one in production today
  • 25-40% lift of typed vs. untyped graphs on real tasks

Section 2: What changed between 2023 and 2026

Three things converged in 2024 that made the substrate finally tractable. None of them is the substrate itself · they're the dependencies the substrate needed to exist on top of.

Microsoft published GraphRAG benchmarks

Microsoft Research's GraphRAG paper (arXiv, April 2024) was the first methodology-clean benchmark showing that graph-grounded retrieval beats baseline vector retrieval on global question types, winning roughly 70-80% of head-to-head answer comparisons on comprehensiveness.[4] Before that paper, the conventional wisdom in enterprise AI was "vector RAG is good enough." After it, every serious team had to defend a non-graph architecture.

Anthropic shipped MCP

Model Context Protocol[8] · released by Anthropic in November 2024 and adopted by OpenAI, Google, and the major IDEs within six months · is the standard that makes a typed knowledge graph addressable from any LLM client. Before MCP, every graph integration was bespoke. After MCP, the graph is a first-class context source in the same shape as "your filesystem" or "your Postgres database."

The cost of compute crossed the line

Frontier-model inference cost dropped roughly 90% between Q4 2023 and Q4 2025. Enrichment workloads that would have required a $5M annual LLM budget at 2023 prices can now run for $300K-$500K. That crossing made it economically viable to enrich 252K+ products against 10 field groups each · and to re-enrich on a weekly cadence rather than annually.
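The arithmetic behind those figures can be checked in a few lines. This sketch uses only numbers stated in the paragraph above; the variable names are ours. The source does not say what workload the $5M figure assumed, so treat the last two values as a consistency check rather than a cost model.

```python
PRODUCTS = 252_000           # "252K+ products"
FIELD_GROUPS = 10            # field groups per product
WEEKS_PER_YEAR = 52          # weekly re-enrichment cadence
BUDGET_2023_USD = 5_000_000  # annual LLM budget at 2023 prices
PRICE_DROP_PCT = 90          # "roughly 90%" drop in inference cost

# One enrichment cell = one field group for one product.
cells = PRODUCTS * FIELD_GROUPS

# Cell enrichments per year if every cell is refreshed weekly.
cell_runs_per_year = cells * WEEKS_PER_YEAR

# Same workload after a 90% price drop (integer math avoids float rounding).
budget_after_drop_usd = BUDGET_2023_USD * (100 - PRICE_DROP_PCT) // 100

# Price drop implied by the low end of the quoted $300K-$500K range.
implied_drop_low_end_pct = 100 - (300_000 * 100) // BUDGET_2023_USD

print(f"{cells:,} cells, {cell_runs_per_year:,} cell runs per year")
print(f"${budget_after_drop_usd:,} after a {PRICE_DROP_PCT}% drop")
print(f"$300,000 implies a {implied_drop_low_end_pct}% drop")
```

A 90% drop accounts for the $500K end of the range; the $300K end corresponds to a 94% drop, which is consistent with "roughly 90%."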

Section 3: The 88-node-type backbone

PYRAMYD's architectural bet is that the right substrate for enterprise software is 88 universal node types connected by 1,554 foreign-key constraints. The node types fall into ten categories:

  • People · 6 types · Contacts, roles, positions, interviews.
  • Entities · 8 types · Companies, teams, workspaces, segments, locations, countries, industries.
  • Products · 6 types · Products, categories, features, releases, reviews.
  • Revenue · 8 types · Deals, orders, pipelines, contracts, campaigns, cadences, battle cards.
  • Finance · 7 types · Transactions, postings, ledgers, periods, budgets, forecasts, filings.
  • Operations · 12 types · Ideas, requirements, issues, projects, roadmaps, cycles, objectives, capabilities, processes.
  • Comms · 5 types · Messages, communications, chats, channels, events.
  • Content · 10 types · Documents, articles, sheets, slides, notebooks, canvases, forms, files, folders, transcripts.
  • Data · 14 types · Datasets, catalogs, connectors, transformations, prompts, agents, runs, models, experiments, metrics, signals, dashboards.
  • Systems · 12 types · Repositories, branches, commits, credentials, settings, activities, devices, alerts, applications, policies, services.
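As a quick consistency check, the category counts above can be held in a small registry. The dictionary name is hypothetical; the counts are copied from the list. Note that each bullet names only a sample of its category's types, so the sketch records counts rather than a full type list.

```python
# Number of node types per category, as listed above.
NODE_TYPES_PER_CATEGORY = {
    "People": 6,
    "Entities": 8,
    "Products": 6,
    "Revenue": 8,
    "Finance": 7,
    "Operations": 12,
    "Comms": 5,
    "Content": 10,
    "Data": 14,
    "Systems": 12,
}

CATEGORY_COUNT = len(NODE_TYPES_PER_CATEGORY)
TOTAL_NODE_TYPES = sum(NODE_TYPES_PER_CATEGORY.values())

print(CATEGORY_COUNT, "categories,", TOTAL_NODE_TYPES, "node types")
```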

Every node carries 10 enrichment field groups (overview, demand, market, landscape, trends, operations, compliance, economics, capabilities, pulse). Every foreign key resolves. Every cell carries provenance · source URL, retrieval timestamp, model used, prompt hash, quality score.
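One way to picture "every cell carries provenance" is as a pair of immutable records. The sketch below is an assumption about shape, not PYRAMYD's implementation: the class names, the SHA-256 prompt hash, and the 0-to-1 quality-score range are all hypothetical choices made for illustration. The field-group names and the five provenance fields come from the paragraph above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

# The ten enrichment field groups named in the text.
FIELD_GROUPS = (
    "overview", "demand", "market", "landscape", "trends",
    "operations", "compliance", "economics", "capabilities", "pulse",
)


def hash_prompt(prompt: str) -> str:
    """Stable fingerprint of the prompt text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Provenance:
    source_url: str
    retrieved_at: datetime
    model: str
    prompt_hash: str
    quality_score: float  # assumed to lie in [0.0, 1.0]


@dataclass(frozen=True)
class Cell:
    node_id: str
    field_group: str
    value: str
    provenance: Provenance

    def __post_init__(self) -> None:
        # Reject cells that fall outside the known field groups or score range.
        if self.field_group not in FIELD_GROUPS:
            raise ValueError(f"unknown field group: {self.field_group!r}")
        if not 0.0 <= self.provenance.quality_score <= 1.0:
            raise ValueError("quality_score must be between 0.0 and 1.0")


example = Cell(
    node_id="product:example",  # hypothetical node ID
    field_group="economics",
    value="Per-seat pricing with an annual contract",
    provenance=Provenance(
        source_url="https://example.com/pricing",
        retrieved_at=datetime(2026, 1, 12, tzinfo=timezone.utc),
        model="hypothetical-model-v1",
        prompt_hash=hash_prompt("Summarize the vendor's pricing model."),
        quality_score=0.8,
    ),
)
```

Because a `Cell` cannot be constructed without a `Provenance`, an unsourced value has no way into the graph in this sketch.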

Section 4: Why a graph (not a database, not a vector store)

The three contenders for the substrate are: (1) a relational database with foreign keys, (2) a vector store with embeddings, (3) a typed knowledge graph. PYRAMYD's product graph is implemented as all three · Postgres for the structured backbone, pgvector + Gemini embeddings for semantic search, and a typed entity layer that makes the join-walks queryable from APEX, PYRAMYD's copilot.

Pure vector stores struggle with multi-hop reasoning. The Microsoft Research GraphRAG paper (Edge et al., arXiv:2404.16130)[4] documents substantial improvements over conventional RAG on multi-hop sensemaking when retrieval traverses typed graph edges instead of ranking text chunks by cosine similarity. For the questions enterprise software teams actually ask · "Which vendors in our category shipped a feature this quarter that overlaps with our roadmap and is selling into the same accounts we're competing for?" · cosine similarity on text chunks is structurally inadequate. That question is a four-hop traversal.
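The example question can be read as four typed hops, one per clause. The sketch below is a toy in-memory traversal in Python under loud assumptions: the edge types (`HAS_VENDOR`, `SHIPPED_THIS_QUARTER`, `IMPLEMENTS`, `SELLING_INTO`), the node IDs, and the data are all invented, and "this quarter" is baked into an edge type where a real graph would filter on a release date. It shows why the question is a traversal and not a similarity ranking.

```python
from collections import defaultdict

# Hypothetical typed edges: (source node, edge type, target node).
EDGES = [
    ("category:crm", "HAS_VENDOR", "vendor:acme"),
    ("category:crm", "HAS_VENDOR", "vendor:zen"),
    ("category:crm", "HAS_VENDOR", "vendor:nova"),
    ("vendor:acme", "SHIPPED_THIS_QUARTER", "feature:ai-forecast"),
    ("vendor:zen", "SHIPPED_THIS_QUARTER", "feature:dark-mode"),
    ("vendor:nova", "SHIPPED_THIS_QUARTER", "feature:pipeline-forecast"),
    ("feature:ai-forecast", "IMPLEMENTS", "capability:forecasting"),
    ("feature:dark-mode", "IMPLEMENTS", "capability:theming"),
    ("feature:pipeline-forecast", "IMPLEMENTS", "capability:forecasting"),
    ("vendor:acme", "SELLING_INTO", "account:globex"),
    ("vendor:zen", "SELLING_INTO", "account:globex"),
    ("vendor:nova", "SELLING_INTO", "account:initech"),
]

# Adjacency index keyed by (source, edge type).
INDEX = defaultdict(set)
for source, edge_type, target in EDGES:
    INDEX[(source, edge_type)].add(target)


def hop(nodes, edge_type):
    """Follow one typed edge from every node in `nodes`."""
    reached = set()
    for node in nodes:
        reached |= INDEX.get((node, edge_type), set())
    return reached


def overlapping_vendors(category, roadmap, accounts):
    """Vendors in `category` whose new features hit our roadmap and our accounts."""
    hits = []
    for vendor in sorted(hop({category}, "HAS_VENDOR")):       # hop 1: in our category
        features = hop({vendor}, "SHIPPED_THIS_QUARTER")       # hop 2: shipped this quarter
        capabilities = hop(features, "IMPLEMENTS")             # hop 3: what the features do
        contested = hop({vendor}, "SELLING_INTO") & accounts   # hop 4: shared accounts
        if capabilities & roadmap and contested:
            hits.append(vendor)
    return hits


result = overlapping_vendors(
    "category:crm",
    roadmap={"capability:forecasting"},
    accounts={"account:globex", "account:hooli"},
)
```

In this toy data only one vendor satisfies all four clauses; the other two each fail a single hop, which is exactly the distinction that chunk-level similarity has no way to express.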

Section 5: What this unlocks

Once the substrate exists, every product-intelligence task collapses from "research project" to "query." A battlecard isn't a document anymore · it's a query over the graph that returns the same answer no matter who asks, refreshed continuously as the underlying nodes change. A win-loss analysis isn't a quarterly project · it's a saved view that updates when a new deal closes. An RFP isn't a 25-hour writing exercise · it's an APEX agent composing graph-grounded answers from approved sources.
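The "saved view" idea maps directly onto a database view. The sketch below uses Python's built-in `sqlite3` as a stand-in, with a hypothetical `deal` table, invented competitor names, and hypothetical helper functions; it is meant only to show that the analysis is a stored query whose answer changes when a deal closes, not a document someone has to rewrite.

```python
import sqlite3

SCHEMA = """
CREATE TABLE deal (
    id         INTEGER PRIMARY KEY,
    competitor TEXT NOT NULL,
    outcome    TEXT NOT NULL CHECK (outcome IN ('won', 'lost')),
    closed_on  TEXT NOT NULL
);
-- The win-loss analysis is a view: it is recomputed on every read.
CREATE VIEW win_loss AS
SELECT competitor,
       SUM(outcome = 'won')  AS wins,
       SUM(outcome = 'lost') AS losses
FROM deal
GROUP BY competitor;
"""


def build() -> sqlite3.Connection:
    """Create an in-memory database with three closed deals."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO deal (competitor, outcome, closed_on) VALUES (?, ?, ?)",
        [
            ("Acme", "won", "2025-11-03"),
            ("Acme", "lost", "2025-12-01"),
            ("Zen", "lost", "2025-12-15"),
        ],
    )
    conn.commit()
    return conn


def close_deal(conn, competitor, outcome, closed_on):
    """Record a newly closed deal."""
    with conn:
        conn.execute(
            "INSERT INTO deal (competitor, outcome, closed_on) VALUES (?, ?, ?)",
            (competitor, outcome, closed_on),
        )


def win_loss(conn):
    """Read the saved view; every caller gets the same rows."""
    return conn.execute(
        "SELECT competitor, wins, losses FROM win_loss ORDER BY competitor"
    ).fetchall()
```

Calling `win_loss` before and after `close_deal` returns different counts with no change to the query itself, which is the sense in which the analysis "updates when a new deal closes."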

The IDC market forecast[6] projects 32% CAGR for knowledge-graph software through 2028. That number reflects the dawning recognition that the substrate is real · and that the platforms sitting on top of it (CI, RFX, AI grounding, product intelligence) will compound far faster than their document-grounded predecessors. Gartner's 2025 strategic-trends report classifies knowledge graphs as a top-tier substrate requirement for agentic AI through 2027.[5]

Where this lands for PYRAMYD customers

PYRAMYD is the substrate-first answer · the 88-node-type backbone, the 1,554 FK constraints, the live graph of 252K+ enterprise products, the APEX copilot grounded on every traversal. It is built on the architectural pattern that LinkedIn, Google, and Microsoft validated at their own scale · adapted for the specific shape of enterprise software, and shipped as a platform rather than an internal tool.

The market took twelve years from Google's "things, not strings" post[3] to a tractable enterprise-software product graph. The reason wasn't lack of ambition · it was the three structural failures above, plus the cost-of-compute curve, plus the missing protocol layer. All three caught up in 2024. The substrate exists now. The interesting question is what gets built on top of it.


References

  [1] Hogan, A. et al., Knowledge Graphs, ACM Computing Surveys, 54(4), Article 71 (2021) · Canonical survey of knowledge graph technology · defines the substrate requirements (typed entities, typed relationships, referential integrity, evolving schema).
  [2] Noy, N. et al., Industry-scale Knowledge Graphs: Lessons and Challenges, Communications of the ACM, 62(8), 36-43 (Aug 2019) · Lessons from Amazon, Google, eBay, Facebook, IBM, and Microsoft on building industry-scale KGs · identifies entity resolution and schema evolution as the two hardest problems.
  [3] Singhal, A., Introducing the Knowledge Graph: things, not strings, Google Official Blog (May 2012) · The original blog post that introduced 'Knowledge Graph' as a product term · the move from string-matching to entity-typed retrieval is the foundational shift the enterprise space is still catching up on.
  [4] Edge, D. et al., From Local to Global: A Graph RAG Approach to Query-Focused Summarization, Microsoft Research arXiv:2404.16130 (Apr 2024) · Microsoft's published GraphRAG benchmarks · graph-grounded retrieval wins roughly 70-80% of head-to-head comparisons against baseline vector retrieval on comprehensiveness for global question types.
  [5] Gartner, Top 10 Strategic Technology Trends for 2025 (Oct 2024) · AI Trust, Risk and Security Management identified knowledge graphs as a top-tier substrate requirement for agentic AI deployments through 2027.
  [6] IDC, Worldwide Knowledge Graph Software Market Forecast, 2024-2028 (May 2024) · KG software market projected to grow from $1.4B (2024) to $5.8B (2028) · 32% CAGR · enterprise adoption now the dominant share.
  [7] Wang, X. et al., Heterogeneous Graph Neural Network for Recommendation, KDD '19 Proceedings · Foundational reference on typed heterogeneous graphs · why uniform-type graphs underperform multi-type graphs by 25-40% on real-world recommendation tasks.
  [8] Anthropic, Introducing the Model Context Protocol (Nov 2024) · MCP standardizes how AI agents query external context · the protocol that finally makes a typed knowledge graph addressable from any LLM client.
  [9] Forrester, The State of Enterprise Knowledge Management 2024 (Sep 2024) · 73% of enterprises report 'critical' or 'high-priority' need for a unified product/vendor/customer knowledge graph · only 9% report having one in production.
  [10] Bornholdt, T., Building LinkedIn's Knowledge Graph, LinkedIn Engineering Blog (2021) · LinkedIn's case study on building a 500M-entity KG with strict referential integrity · the engineering pattern that made the substrate tractable at industry scale.
