Knowledge graphs are not new. Google launched theirs in 2012,[3] Amazon shipped Product Knowledge Graph for recommendation around the same time, LinkedIn ran the engineering arc to 500M entities by 2021,[10] and the academic literature on typed heterogeneous graphs has been mature since Hogan et al. published the canonical survey in 2021.[1] Despite all of that institutional knowledge, enterprise software · the category itself · still doesn't have a unified product graph. There is no Bloomberg Terminal for SaaS. There is no PitchBook for product. Forrester's 2024 research found that 73% of enterprises rate this gap as critical or high-priority, but only 9% report a graph in production.[9]
Enterprise software has no Bloomberg Terminal · no unified, typed substrate for product, vendor, category, and signal data. That gap has been "known" for a decade. The reason it stayed open is architectural.
This is a post-mortem of six prior attempts to build it · what they got right, what broke, and the three structural failures that any successor has to solve before the substrate becomes tractable. We then describe the 88-node-type backbone PYRAMYD landed on and why it survives those failure modes.
Section 1: The three structural failures of every prior attempt
Between roughly 2014 and 2023, at least six well-funded teams shipped a version of "the enterprise software knowledge graph." All six exist and all six are still partly operational, but none of them is the substrate the market actually needs. The failures rhyme.
Failure 1: Shape drift
The first thing you learn when you try to model a Salesforce competitor is that "CRM" is not one shape. It's a contact manager + pipeline + email + reporting + workflow engine + integration platform · and the proportions vary by vendor. HubSpot's "CRM" is closer to a marketing automation suite. Pipedrive's is a pure pipeline tool. Microsoft Dynamics is an ERP-adjacent platform with CRM modules.
Every prior attempt picked a base shape · usually informed by whichever vendor the team was closest to · and tried to coerce the others into it. The result was either uselessly generic schemas (everything is a "product" with three properties) or hyper-specific schemas that broke the moment a new vendor shipped. This is the classic ontological vagueness problem from the knowledge-representation literature.[1]
Failure 2: Entity resolution treated as post-processing
Industry-scale KG teams at Amazon, Google, eBay, Facebook, IBM, and Microsoft converge on the same finding in their Communications of the ACM retrospective: entity resolution is the hardest part.[2] When "Salesforce" appears in a sales-pipeline record, a review on G2, a press release, a SEC filing, and an integration manifest, the substrate has to decide whether those five strings refer to the same node · and at what level of granularity. (Is "Salesforce Sales Cloud" the same node as "Salesforce"? It depends on the question.)
Most prior attempts treated entity resolution as a post-processing problem · ingest first, dedupe later. That fails because once duplicates land in the graph, every downstream query inherits the ambiguity. A signal feed full of "Salesforce" mentions is uselessly noisy if 30% of them are actually about Tableau, Slack, or Heroku · all also Salesforce-owned.
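A minimal sketch of the resolve-at-ingest alternative, assuming a hand-maintained alias table. The names `ALIASES`, `Mention`, `resolve`, and `ingest`, and every node ID below, are hypothetical illustrations, not PYRAMYD's implementation. The point is only that an unresolved string never reaches the graph.

```python
from dataclasses import dataclass

# Hypothetical alias table: normalized surface string -> canonical node ID.
ALIASES = {
    "salesforce": "vendor:salesforce",
    "salesforce sales cloud": "product:salesforce-sales-cloud",
    "tableau": "product:tableau",
    "slack": "product:slack",
    "heroku": "product:heroku",
}


@dataclass(frozen=True)
class Mention:
    text: str    # surface string as it appeared in the source
    source: str  # where it was seen, e.g. "g2-review" or "sec-filing"


def resolve(mention):
    """Return the canonical node ID, or None if the string is unknown."""
    return ALIASES.get(mention.text.strip().lower())


def ingest(mentions):
    """Resolve before writing: unresolved mentions are quarantined, not stored."""
    graph, quarantine = {}, []
    for mention in mentions:
        node_id = resolve(mention)
        if node_id is None:
            quarantine.append(mention)
        else:
            graph.setdefault(node_id, []).append(mention.source)
    return graph, quarantine
```

In this sketch "Salesforce" and "Salesforce Sales Cloud" resolve to different nodes, so the granularity decision is made once, at write time, rather than inherited by every downstream query.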
Failure 3: No referential constraints between node types
Heterogeneous graph literature has been clear for years that typed graphs outperform untyped graphs on real-world tasks by 25-40%.[7] Yet most enterprise KG attempts use a flat "Entity → has-property → Value" shape with no foreign-key constraints between entity types. You can technically express anything in that model. In practice nothing is constrained, so nothing is queryable with confidence.
A real product graph needs to enforce, for example: every Review must reference exactly one Product · every Product must reference exactly one Vendor · every Vendor can have many Industries but no more than one Headquarters Country · every Feature must reference exactly one Capability and one Product. Without those FK constraints, the graph is a search index, not a substrate.
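The first two constraints above can be sketched with Python's standard `sqlite3` module. This is an illustration under assumptions: the table and column names are invented for the example, and SQLite stands in for whatever store actually backs the graph. `NOT NULL` plus `REFERENCES` is what turns "may reference" into "must reference exactly one."

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ships with foreign-key enforcement off; it must be enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vendor (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE product (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    vendor_id INTEGER NOT NULL REFERENCES vendor(id)   -- exactly one Vendor
);
CREATE TABLE review (
    id         INTEGER PRIMARY KEY,
    body       TEXT NOT NULL,
    product_id INTEGER NOT NULL REFERENCES product(id) -- exactly one Product
);
""")

conn.execute("INSERT INTO vendor (id, name) VALUES (1, 'Salesforce')")
conn.execute("INSERT INTO product (id, name, vendor_id) VALUES (1, 'Sales Cloud', 1)")
conn.execute("INSERT INTO review (body, product_id) VALUES ('Solid pipeline tooling', 1)")


def insert_orphan_review(connection):
    """Try to insert a review for a product that does not exist.

    Returns True if the database rejected the row.
    """
    try:
        connection.execute(
            "INSERT INTO review (body, product_id) VALUES ('orphan', 999)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

A "no more than one" rule such as Headquarters Country would be the same pattern with a nullable foreign-key column instead of a `NOT NULL` one.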
73% of enterprises rate a unified product graph as critical/high priority
9% have one in production today
25-40% lift of typed vs. untyped graphs on real tasks
Section 2: What changed between 2023 and 2026
Three things converged in 2024 that made the substrate finally tractable. None of them is the substrate itself · they're the dependencies the substrate needed to exist on top of.
Microsoft published GraphRAG benchmarks
Microsoft Research's GraphRAG paper (April 2024, published as an arXiv preprint) was the first methodology-clean benchmark showing that graph-grounded retrieval beats baseline vector retrieval on global question types, with reported win rates in roughly the 70-80% range.[4] Before that paper, the conventional wisdom in enterprise AI was "vector RAG is good enough." After it, every serious team had to defend a non-graph architecture.
Anthropic shipped MCP
Model Context Protocol[8] · released by Anthropic in November 2024 and adopted by OpenAI, Google, and the major IDEs within six months · is the standard that makes a typed knowledge graph addressable from any LLM client. Before MCP, every graph integration was bespoke. After MCP, the graph is a first-class context source in the same shape as "your filesystem" or "your Postgres database."
The cost of compute crossed the line
Frontier-model inference cost dropped roughly 90% between Q4 2023 and Q4 2025. Enrichment workloads that would have required a $5M annual LLM budget at 2023 prices can now run for $300K-$500K. That crossing made it economically viable to enrich 252K+ products against 10 field groups each · and to re-enrich on a weekly cadence rather than annually.
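The arithmetic behind those figures, as a back-of-envelope sketch. All inputs are the round numbers quoted above; "252K+" is treated as a floor, and the variable names are just labels for this example.

```python
PRICE_DROP_PCT = 90          # "roughly 90%" drop, Q4 2023 to Q4 2025
BUDGET_2023_USD = 5_000_000  # annual LLM budget at 2023 prices
PRODUCTS = 252_000           # "252K+" products, treated as a lower bound
FIELD_GROUPS = 10            # enrichment field groups per product
WEEKS_PER_YEAR = 52          # weekly re-enrichment cadence

# Integer arithmetic keeps the dollar figure exact.
budget_now_usd = BUDGET_2023_USD * (100 - PRICE_DROP_PCT) // 100

cells_per_pass = PRODUCTS * FIELD_GROUPS          # one full enrichment pass
cells_per_year = cells_per_pass * WEEKS_PER_YEAR  # weekly cadence, one year

print(f"budget at current prices: ${budget_now_usd:,}")
print(f"cells per pass: {cells_per_pass:,}")
print(f"cells per year: {cells_per_year:,}")
```

A flat 90% drop lands at $500K, the top of the quoted $300K-$500K range; the $300K end corresponds to a drop of about 94%. Moving from annual to weekly re-enrichment multiplies the 2.52M cells per pass by 52.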
Section 3: The 88-node-type backbone
PYRAMYD's architectural bet is that the right substrate for enterprise software is 88 universal node types connected by 1,554 foreign-key constraints. The node types fall into ten categories:
- People · 6 types · Contacts, roles, positions, interviews.
- Entities · 8 types · Companies, teams, workspaces, segments, locations, countries, industries.
- Products · 6 types · Products, categories, features, releases, reviews.
- Revenue · 8 types · Deals, orders, pipelines, contracts, campaigns, cadences, battle cards.
- Finance · 7 types · Transactions, postings, ledgers, periods, budgets, forecasts, filings.
- Operations · 12 types · Ideas, requirements, issues, projects, roadmaps, cycles, objectives, capabilities, processes.
- Comms · 5 types · Messages, communications, chats, channels, events.
- Content · 10 types · Documents, articles, sheets, slides, notebooks, canvases, forms, files, folders, transcripts.
- Data · 14 types · Datasets, catalogs, connectors, transformations, prompts, agents, runs, models, experiments, metrics, signals, dashboards.
- Systems · 12 types · Repositories, branches, commits, credentials, settings, activities, devices, alerts, applications, policies, services.
Every node carries 10 enrichment field groups (overview, demand, market, landscape, trends, operations, compliance, economics, capabilities, pulse). Every foreign key resolves. Every cell carries provenance · source URL, retrieval timestamp, model used, prompt hash, quality score.
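As a sanity check and a shape sketch: the ten category counts sum to 88, and a provenance-carrying cell can be modeled as a small immutable record. The class and field names below (`Provenance`, `Cell`, `source_url`, and so on) are assumptions chosen to mirror the prose, not PYRAMYD's actual schema, and the quality-score scale is not stated in the text.

```python
from dataclasses import dataclass
from datetime import datetime

# Category -> number of node types, as listed above.
NODE_TYPES_PER_CATEGORY = {
    "People": 6, "Entities": 8, "Products": 6, "Revenue": 8, "Finance": 7,
    "Operations": 12, "Comms": 5, "Content": 10, "Data": 14, "Systems": 12,
}

FIELD_GROUPS = (
    "overview", "demand", "market", "landscape", "trends",
    "operations", "compliance", "economics", "capabilities", "pulse",
)


@dataclass(frozen=True)
class Provenance:
    source_url: str
    retrieved_at: datetime
    model: str
    prompt_hash: str
    quality_score: float  # scale is an assumption; the text does not define it


@dataclass(frozen=True)
class Cell:
    node_id: str
    field_group: str
    value: str
    provenance: Provenance  # required: a cell cannot exist without provenance

    def __post_init__(self):
        # Reject field groups outside the ten named in the text.
        if self.field_group not in FIELD_GROUPS:
            raise ValueError(f"unknown field group: {self.field_group}")
```

Making `provenance` a required constructor argument is the design choice that matters here: a value with no source cannot be constructed in the first place.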
Section 4: Why a graph (not a database, not a vector store)
The three contenders for the substrate are: (1) a relational database with foreign keys, (2) a vector store with embeddings, (3) a typed knowledge graph. PYRAMYD's product graph is implemented as all three · Postgres for the structured backbone, pgvector + Gemini embeddings for semantic search, and a typed entity layer that makes the join-walks queryable from APEX, PYRAMYD's copilot.
Pure vector stores struggle with multi-hop reasoning. The Microsoft Research GraphRAG paper (Edge et al., arXiv:2404.16130)[4] documents substantial improvements over conventional RAG on multi-hop sensemaking when retrieval traverses typed graph edges instead of ranking text chunks by cosine similarity. For the questions enterprise software teams actually ask · "Which vendors in our category shipped a feature this quarter that overlaps with our roadmap and is selling into the same accounts we're competing for?" · cosine similarity on text chunks is structurally inadequate. That question is a four-hop traversal.
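One plausible decomposition of that question into four typed hops, sketched over a toy in-memory graph. The source does not enumerate the hops, so the edge types, the data, and the function name `overlapping_vendors` are all invented for illustration.

```python
# Toy typed graph. Every ID, vendor, and edge below is made up.
PRODUCTS_IN_CATEGORY = {"crm": ["p1", "p2", "p3"]}
VENDOR_OF_PRODUCT = {"p1": "VendorA", "p2": "VendorB", "p3": "VendorC"}
FEATURES_OF_PRODUCT = {  # product -> [(feature, quarter shipped, capability)]
    "p1": [("ai-forecast", "2025Q4", "forecasting")],
    "p2": [("dark-mode", "2025Q4", "theming")],
    "p3": [("lead-scoring", "2025Q3", "scoring")],
}
ACCOUNTS_OF_VENDOR = {
    "VendorA": {"acme", "globex"},
    "VendorB": {"acme"},
    "VendorC": {"initech"},
}


def overlapping_vendors(category, quarter, roadmap_capabilities, our_accounts):
    """Vendors in `category` that shipped a roadmap-overlapping feature in
    `quarter` and sell into at least one of `our_accounts`."""
    hits = set()
    for product in PRODUCTS_IN_CATEGORY.get(category, []):  # hop 1: Category -> Product
        vendor = VENDOR_OF_PRODUCT[product]                 # hop 2: Product -> Vendor
        for _name, shipped, capability in FEATURES_OF_PRODUCT.get(product, []):
            # hop 3: Product -> Feature, filtered by quarter and roadmap overlap
            if shipped != quarter or capability not in roadmap_capabilities:
                continue
            # hop 4: Vendor -> Account, intersected with our competitive accounts
            if ACCOUNTS_OF_VENDOR.get(vendor, set()) & our_accounts:
                hits.add(vendor)
    return sorted(hits)
```

Each hop is an exact join on a typed edge. A similarity ranking over text chunks has no equivalent of "same account" or "this quarter," which is why the question does not reduce to nearest-neighbor search.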
Section 5: What this unlocks
Once the substrate exists, every product-intelligence task collapses from "research project" to "query." A battlecard isn't a document anymore · it's a query over the graph that returns the same answer no matter who asks, refreshed continuously as the underlying nodes change. A win-loss analysis isn't a quarterly project · it's a saved view that updates when a new deal closes. An RFP isn't a 25-hour writing exercise · it's an APEX agent composing graph-grounded answers from approved sources.
The IDC market forecast[6] projects 32% CAGR for knowledge-graph software through 2028. That number reflects the dawning recognition that the substrate is real · and that the platforms sitting on top of it (CI, RFX, AI grounding, product intelligence) will compound far faster than their document-grounded predecessors. Gartner's 2025 strategic-trends report classifies knowledge graphs as a top-tier substrate requirement for agentic AI through 2027.[5]
Where this lands for PYRAMYD customers
PYRAMYD is the substrate-first answer · the 88-node-type backbone, the 1,554 FK constraints, the live graph of 252K+ enterprise products, the APEX copilot grounded on every traversal. Built on the architectural pattern that LinkedIn, Google, and Microsoft validated at their own scale · adapted for the specific shape of enterprise software, and shipped as a platform rather than an internal tool.
The market took twelve years from Google's "things, not strings" post[3] to a tractable enterprise-software product graph. The reason wasn't lack of ambition · it was the three structural failures above, compounded by the cost-of-compute curve and the missing protocol layer. The missing dependencies caught up in 2024. The substrate exists now. The interesting question is what gets built on top of it.
References
- [1] Hogan, A. et al., Knowledge Graphs, ACM Computing Surveys, 54(4), Article 71 (2021) · Canonical survey of knowledge graph technology · defines the substrate requirements (typed entities, typed relationships, referential integrity, evolving schema).
- [2] Noy, N. et al., Industry-scale Knowledge Graphs: Lessons and Challenges, Communications of the ACM, 62(8), 36-43 (Aug 2019) · Lessons from Amazon, Google, eBay, Facebook, IBM, and Microsoft on building industry-scale KGs · identifies entity resolution and schema evolution as the two hardest problems.
- [3] Singhal, A., Introducing the Knowledge Graph: things, not strings, Google Official Blog (May 2012) · The original blog post that introduced 'Knowledge Graph' as a product term · the move from string-matching to entity-typed retrieval is the foundational shift the enterprise space is still catching up on.
- [4] Edge, D. et al., From Local to Global: A Graph RAG Approach to Query-Focused Summarization, Microsoft Research arXiv:2404.16130 (Apr 2024) · Microsoft's published GraphRAG benchmarks · graph-grounded retrieval outperforms baseline vector retrieval by 70-80% on global question types.
- [5] Gartner, Top 10 Strategic Technology Trends for 2025 (Oct 2024) · AI Trust, Risk and Security Management identified knowledge graphs as a top-tier substrate requirement for agentic AI deployments through 2027.
- [6] IDC, Worldwide Knowledge Graph Software Market Forecast, 2024-2028 (May 2024) · KG software market projected to grow from $1.4B (2024) to $5.8B (2028) · 32% CAGR · enterprise adoption now the dominant share.
- [7] Wang, X. et al., Heterogeneous Graph Neural Network for Recommendation, KDD '19 Proceedings · Foundational reference on typed heterogeneous graphs · why uniform-type graphs underperform multi-type graphs by 25-40% on real-world recommendation tasks.
- [8] Anthropic, Introducing the Model Context Protocol (Nov 2024) · MCP standardizes how AI agents query external context · the protocol that finally makes a typed knowledge graph addressable from any LLM client.
- [9] Forrester, The State of Enterprise Knowledge Management 2024 (Sep 2024) · 73% of enterprises report 'critical' or 'high-priority' need for a unified product/vendor/customer knowledge graph · only 9% report having one in production.
- [10] Bornholdt, T., Building LinkedIn's Knowledge Graph, LinkedIn Engineering Blog (2021) · LinkedIn's case study on building a 500M-entity KG with strict referential integrity · the engineering pattern that made the substrate tractable at industry scale.
