Engineering leaders at B2B SaaS companies have been telling each other a version of the same horror story for at least a decade: somebody on team A spent two quarters building a thing, then discovered that team B had built essentially the same thing 14 months ago, but called it by a different name and parked it in a different repo. GitHub's Octoverse data and internal Atlassian benchmarks both suggest that 30-40% of new internal feature work duplicates capability that already exists — either in another part of the codebase, in a vendor integration the company already pays for, or in a competitor's GA release.[3][8]
The Standish CHAOS Report has been measuring this for 30 years from a different angle. Their 2020 benchmark put the on-time / on-budget / on-scope rate for enterprise software projects at 31%.[4] Two-thirds of the work either ships late, ships over budget, ships descoped, or doesn't ship at all. A large fraction of the "descoped" bucket is functionality the team realized mid-cycle they didn't need to build because it already existed.
The cheapest feature to build is the one you don't need to build — because you discovered the equivalent in a vendor integration, an internal library, or a competitor's public API before kickoff.
This post is about why feature discovery is structurally a graph problem, what the GraphRAG benchmarks actually show, and what an engineering organization looks like when its decision-making substrate is a knowledge graph instead of a Confluence search.
Section 1: The discovery problem is multi-hop
Consider the question a tech lead asks at sprint planning: “Before we commit two engineers to building this capability, does any vendor we already integrate with ship it, does our codebase already have a version of it in a sibling product line, and have any of our top three competitors shipped it in their last two quarterly releases?”
That's three traversals:
- Internal product: this capability → does it exist in our own product taxonomy → which internal team owns it → is it accessible via our internal APIs?
- Vendor integrations: this capability → which of our 47 active integrations ships it → does it cost extra → who owns the integration relationship?
- Competitor releases: this capability → which competitors → in which of their products → in which release window → with what reception?
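The three traversals above can be pictured as fixed-length walks over one typed graph. The following is a minimal sketch under stated assumptions, not a production design: every node name, every edge label (`implemented_in`, `offered_by`, `shipped_by`, and so on), and the `traverse` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical typed edges: (source, relation, target). All names are illustrative.
EDGES = [
    ("capability:usage-alerts", "implemented_in", "product:risk-suite"),
    ("product:risk-suite", "owned_by", "team:risk-platform"),
    ("capability:usage-alerts", "offered_by", "vendor:example-crm"),
    ("vendor:example-crm", "managed_by", "person:integration-owner"),
    ("capability:usage-alerts", "shipped_by", "competitor:example-b"),
    ("competitor:example-b", "released_in", "release:2025-q2"),
]


def build_index(edges):
    """Index edges by (source, relation) so each hop is a dictionary lookup."""
    index = defaultdict(list)
    for src, rel, dst in edges:
        index[(src, rel)].append(dst)
    return index


def traverse(index, start, relations):
    """Follow a fixed sequence of relations from start; return every complete path."""
    paths = [[start]]
    for rel in relations:
        paths = [p + [nxt] for p in paths for nxt in index.get((p[-1], rel), [])]
    return paths


index = build_index(EDGES)
cap = "capability:usage-alerts"

# One walk per substrate, all answered from the same graph.
internal = traverse(index, cap, ["implemented_in", "owned_by"])
vendor = traverse(index, cap, ["offered_by", "managed_by"])
competitor = traverse(index, cap, ["shipped_by", "released_in"])
```

Each walk returns full paths rather than just endpoints, so the answer carries the intermediate node (which product, which vendor) along with the owner or release at the end.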
Each traversal requires a different data substrate. Internal product: an architecture wiki, plus tribal knowledge from engineers who've been at the company more than three years. Vendor integrations: a contract repository, plus the vendors' own release notes. Competitor releases: whatever the competitive-intelligence (CI) team ships in their Slack updates, if anyone reads them.
The result is that the question doesn't actually get answered before kickoff. The team starts building. Mid-sprint, somebody notices the vendor integration ships it. Mid-quarter, somebody sees Competitor B already has it. The work ships, gets repackaged or descoped, and the org records a successful release. The duplicated effort never shows up in any retrospective because nobody owns documenting it.
Section 2: What GraphRAG actually measures
The most rigorous public comparison of graph-grounded retrieval against the status-quo vector-RAG approach is Microsoft Research's April 2024 paper, From Local to Global: A Graph RAG Approach to Query-Focused Summarization, by Edge et al.[1] It's worth reading directly because the results aren't marketing; they're a head-to-head evaluation on real-world corpora (podcast transcripts and news articles).
Microsoft GraphRAG: 70-80% win rate on global questions
The paper distinguishes between two question types:
- Local questions: "What did this specific document say about X?" — the retrieval problem vector RAG was designed for. Both approaches perform well here.
- Global questions: "Across the entire corpus, what are the recurring themes about X?" — the multi-hop, aggregate-reasoning problem most enterprise questions actually require.
On global questions, GraphRAG's answers were preferred over baseline vector RAG in roughly 70-80% of head-to-head comparisons.[1] The reason: vector retrieval surfaces semantically similar passages, but it has no mechanism to recognize when two passages refer to the same entity or when one entity is a parent of another. GraphRAG builds an explicit entity graph and uses community-level summarization to answer aggregation questions correctly.
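A toy sketch can make the entity-graph idea concrete. Everything here is an assumption for illustration: the alias table, the extracted mentions, and the function names are invented, and grouping by connected components is a deliberately simple stand-in for the community-detection step (the paper uses the Leiden algorithm and LLM-written community summaries).

```python
from collections import defaultdict

# Hypothetical alias table: surface forms that refer to the same entity.
ALIASES = {"MSFT": "Microsoft", "Microsoft Corp": "Microsoft", "GH": "GitHub"}

# Hypothetical extracted relations: (passage_id, entity_a, entity_b).
MENTIONS = [
    ("p1", "MSFT", "GitHub"),
    ("p2", "Microsoft Corp", "Azure"),
    ("p3", "GH", "Copilot"),
    ("p4", "Neo4j", "Cypher"),
]


def canonical(name):
    """Resolve a surface form to its canonical entity name."""
    return ALIASES.get(name, name)


def build_graph(mentions):
    """Build an undirected entity graph after alias resolution."""
    graph = defaultdict(set)
    for _, a, b in mentions:
        a, b = canonical(a), canonical(b)
        graph[a].add(b)
        graph[b].add(a)
    return graph


def communities(graph):
    """Group entities by connected component (simple stand-in for Leiden)."""
    seen, result = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            cur = stack.pop()
            if cur in component:
                continue
            component.add(cur)
            stack.extend(graph[cur] - component)
        seen |= component
        result.append(sorted(component))
    return result


graph = build_graph(MENTIONS)
groups = communities(graph)
```

Passages p1, p2, and p3 never share a surface form for Microsoft or GitHub, yet after alias resolution they land in one community. A corpus-wide question can then be answered per community instead of per passage.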
Fluree + AIMultiple multi-hop benchmarks
Independent benchmarks from Fluree and AIMultiple in 2024 tested similar question types:[10]
- 8% → 23%: vector vs graph accuracy on aggregation queries
- 8% → 33%: vector vs graph on cross-document reasoning
- 70-80%: graph win-rate on global questions (MSR)
For an engineering team trying to answer "has this capability been built anywhere we should know about?" the relevant accuracy threshold isn't 8%. It isn't even 33%. It's "high enough that I'm willing to redirect two engineers based on the answer." Vector RAG isn't there. Graph RAG gets materially closer in the categories that matter for cross-product, cross-system reasoning.
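For a sense of scale, the relative lift implied by the accuracy figures quoted in this post[10] can be worked out directly. This is plain arithmetic on those quoted numbers, not a re-run of any benchmark; the helper names are made up for the example.

```python
# Accuracy figures as quoted in this post: (vector RAG, graph RAG), in percent.
BENCHMARKS = {
    "aggregation": (8, 23),
    "cross_document": (8, 33),
}


def relative_lift(vector_pct, graph_pct):
    """How many times more accurate the graph approach is than the vector one."""
    return graph_pct / vector_pct


def expected_correct(accuracy_pct, questions=100):
    """Expected number of correct answers out of a batch of questions."""
    return accuracy_pct * questions / 100


lifts = {name: relative_lift(v, g) for name, (v, g) in BENCHMARKS.items()}
# lifts["aggregation"] is 2.875 and lifts["cross_document"] is 4.125
```

The ratio shows direction; the absolute figure is what a planning decision rests on. At 33%, about 33 of every 100 cross-document answers would be expected to be right.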
Section 3: Why this maps onto engineering productivity
McKinsey's 2023 generative AI productivity report identified software engineering as one of the highest-impact GenAI use cases — with documented productivity gains of 10-50% for organizations that successfully ground the AI tools.[2] The lower end of that range corresponds to AI used as autocomplete; the upper end corresponds to AI used as a discovery and reasoning substrate.
Forrester's 2023 TEI study on enterprise knowledge management found that organizations with formal knowledge-management infrastructure reported 25-32% reduction in development rework and 16-24% reduction in time-to-merge for cross-team PRs.[9] The mechanism is the same as the discovery problem above: when the engineer asking "has someone built this?" gets a useful answer in 30 seconds instead of three days (or never), they make different decisions.
Atlassian's 2024 Developer Experience survey put the median engineer at 47% of work time on tasks adjacent to coding — search, doc reading, status updates, finding the right person to ask.[8] A meaningful slice of that 47% is the "trying to find out if X exists" problem.
Section 4: What an engineering org looks like with a graph in place
Pre-sprint capability check
Before the sprint commit, the tech lead asks the graph: “Capability X — does it exist anywhere we can reach?” The graph traverses:
- Internal product nodes: have we shipped this in another product line? Returns: yes, the Risk product team shipped a version in 2024 Q3. Slack thread linked.
- Vendor integration nodes: do any of our 47 active integrations ship this? Returns: HubSpot Operations Hub has shipped a version since June 2025; cost included in current contract.
- Competitor product nodes: which competitors have shipped this in the last two quarters? Returns: Klue, Crayon, and Gong shipped versions in the last 90 days, each with citations to release notes and review sentiment.
Total time to answer: under 2 minutes. Total cost: zero engineers redirected. Outcome of the sprint planning conversation: “Let's wire up HubSpot Operations Hub for the immediate need and watch how Gong's version performs over the next 60 days before we commit to building.”
In-cycle drift detection
Six weeks into the build, a competitor ships an updated version with a feature the team hadn't scoped. The graph's signal-ingestion layer catches the release within 24 hours and flags it to the engineering team as a watch item. The team decides whether to expand scope, ship as planned, or pause.
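One way to picture the drift check is as a set difference between what a competitor just released and what the team has scoped, plus a latency check against the 24-hour target. The sketch below is illustrative only: `ReleaseSignal`, its fields, and both helper functions are hypothetical names, and the sample data is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReleaseSignal:
    """Hypothetical record produced by a signal-ingestion layer."""
    competitor: str
    features: frozenset
    published_at: datetime
    ingested_at: datetime


def unscoped_features(signal, scoped):
    """Features in the competitor release that the current build has not scoped."""
    return sorted(signal.features - frozenset(scoped))


def met_detection_target(signal, target=timedelta(hours=24)):
    """True if the release was ingested within the target window after publication."""
    return signal.ingested_at - signal.published_at <= target


signal = ReleaseSignal(
    competitor="Competitor B",
    features=frozenset({"alerts", "bulk-export", "audit-log"}),
    published_at=datetime(2025, 3, 3, 9, 0),
    ingested_at=datetime(2025, 3, 3, 21, 30),
)

# Anything returned here would become a watch item for the team to triage.
flagged = unscoped_features(signal, scoped={"alerts", "audit-log"})
```

The code only surfaces the gap. Whether to expand scope, ship as planned, or pause remains a human decision, as described above.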
Post-ship competitive positioning
After shipping, the team queries the graph for: “Across our 80 most-similar accounts, which are now evaluating Competitor B's version of this capability?” This isn't engineering work anymore — it's GTM work — but the substrate is the same graph.
Section 5: The architectural choice
Engineering orgs that have looked seriously at building this for themselves typically end up at one of two architectures:
Path A: Internal-only knowledge graph (Neo4j-class build)
Build a Neo4j cluster (or equivalent), define your schema, ingest your internal systems (Jira, GitHub, Confluence, Salesforce, Pendo), and iterate. This is what LinkedIn, Google, and Microsoft did at scale.[6] It works, but the cost is real: 12-24 months and $500K-$3M of engineering and data acquisition spend before the first useful query lands. The ongoing maintenance is its own team.
Path B: Hybrid — subscribe to a market graph, integrate your internal data into it
Subscribe to a productionized market graph (vendor, product, category, signal data) and integrate your internal entities (your accounts, your product, your integrations) into the same graph schema. The market side is already there; you only own the internal side. This is the build-vs-buy math that recently flipped.
Path B is the architectural pattern PYRAMYD is built around. The market graph is the public side; tenant slices add your internal product, your accounts, and your integrations into the same graph — with row-level security so your data never crosses tenant boundaries.
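The tenant-slice idea can be sketched as a visibility rule over graph nodes: shared market data is readable by everyone, tenant data only by its owner. This is an application-level illustration under assumed names (`Node`, `tenant_id`, `tenant_view`), not a description of how PYRAMYD implements it; row-level security is normally enforced inside the data store itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node; tenant_id of None marks shared market-graph data."""
    node_id: str
    kind: str
    tenant_id: Optional[str] = None


def visible_to(node, tenant_id):
    """A node is visible if it is public market data or owned by the asking tenant."""
    return node.tenant_id is None or node.tenant_id == tenant_id


def tenant_view(nodes, tenant_id):
    """Filter a node collection down to what one tenant is allowed to see."""
    return [n for n in nodes if visible_to(n, tenant_id)]


NODES = [
    Node("vendor:example-crm", "vendor"),
    Node("account:acme-1", "account", tenant_id="tenant-a"),
    Node("account:globex-7", "account", tenant_id="tenant-b"),
]

view_a = [n.node_id for n in tenant_view(NODES, "tenant-a")]
view_b = [n.node_id for n in tenant_view(NODES, "tenant-b")]
```

Both tenants see the same vendor node, and neither sees the other's account node, which is the property the shared-schema design depends on.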
Where this lands for PYRAMYD customers
For engineering teams, PYRAMYD's graph is queryable through the MCP server, which means your existing copilots (Claude, ChatGPT, GitHub Copilot, your internal LLM) can ground capability-discovery questions in the live graph without re-tooling.
The pre-sprint “does this exist?” check becomes a one-line query: get_capability(name) returns a typed entity with edges to internal product nodes, vendor integrations, and competitor releases — each with citation provenance.
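To show what a typed entity with cited edges might look like on the caller's side, here is a local stub. The post names `get_capability(name)` but does not specify its return schema, so the `Capability` and `Edge` classes, every field name, the relation labels, and the sample data are all assumptions; the function below is an in-memory stand-in, not the MCP server's implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge shape: what it points at and where the claim comes from."""
    relation: str   # assumed labels: internal_product, vendor_integration, competitor_release
    target: str
    citation: str   # provenance for the edge


@dataclass(frozen=True)
class Capability:
    """Hypothetical typed entity returned by a capability lookup."""
    name: str
    edges: Tuple[Edge, ...] = ()

    def by_relation(self, relation):
        """Return only the edges of one relation type."""
        return [e for e in self.edges if e.relation == relation]


# Invented sample data for the stub.
_STUB = {
    "usage-alerts": Capability(
        name="usage-alerts",
        edges=(
            Edge("internal_product", "product:risk-suite", "internal wiki page"),
            Edge("vendor_integration", "vendor:example-crm", "vendor release notes"),
            Edge("competitor_release", "competitor:example-b", "public changelog"),
        ),
    )
}


def get_capability(name):
    """Local stand-in: look up a capability, or return one with no edges."""
    return _STUB.get(name, Capability(name=name))


cap = get_capability("usage-alerts")
vendor_hits = cap.by_relation("vendor_integration")
```

Keeping the citation on the edge rather than on the entity means each individual claim (this vendor ships it, this competitor released it) can be checked against its own source.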
We replace 18 months of internal KG build with a subscription you can activate this week. Live in days, not quarters.
What this means for the next 24 months
The vector-RAG era of enterprise AI is going to age the way the "upload your PDFs and chat with them" demos from 2023 did. They were impressive, but the answers weren't reliable enough to redirect engineering work on. GraphRAG benchmarks have made the reason measurable: typed entities and graph traversal beat dense-vector similarity on the question types engineering actually needs.
Engineering organizations that ground their AI tooling in a knowledge graph — whether built or bought — are going to spend the next 24 months pulling away from the ones that don't. The data already shows it. The question is which side of the gap your team wants to be on.
We'll cover the GTM side in the next post: The Account Intelligence Problem — and the Graph that Solves It.
References
- [1] Edge, D. et al., Microsoft Research, From Local to Global: A Graph RAG Approach to Query-Focused Summarization, arXiv:2404.16130 (Apr 2024). Microsoft Research's GraphRAG paper: graph-grounded retrieval won roughly 70-80% of head-to-head comparisons against baseline vector RAG on global question types.
- [2] McKinsey Global Institute, The Economic Potential of Generative AI: The Next Productivity Frontier (June 2023). Software engineering identified as one of the highest-impact GenAI use cases; documented 10-50% productivity gains, conditional on grounding quality.
- [3] GitHub, Octoverse 2024: Open Source Trends Report. Tracking of duplicated and abandoned features across repositories; data suggests 30-40% of new internal features replicate existing public-or-internal capability.
- [4] Standish Group, CHAOS Report 2020. Long-running benchmark of software project outcomes; only 31% of enterprise software projects ship on time, on budget, and on scope.
- [5] Hogan, A. et al., Knowledge Graphs, ACM Computing Surveys, 54(4), Article 71 (2021). Foundational survey of knowledge graph technology: entity resolution, schema design, and reasoning patterns.
- [6] Bornholdt, T., LinkedIn Engineering, Building LinkedIn's Knowledge Graph (2021). Production architecture for LinkedIn's Economic Graph at multi-billion node scale.
- [7] Lewis, P. et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, NeurIPS 2020 (arXiv:2005.11401). The original RAG paper. Establishes the framing for retrieval-grounded generation that GraphRAG later extends with typed entities.
- [8] Atlassian, State of Developer Experience 2024. Survey of 5,000+ engineers: median engineer spends 47% of work time on tasks adjacent to coding (search, doc reading, status updates).
- [9] Forrester, The Total Economic Impact of Enterprise Knowledge Management 2023. Companies with formal knowledge-management systems reported 25-32% reduction in development rework and 16-24% reduction in time-to-merge for cross-team PRs.
- [10] Fluree + AIMultiple, GraphRAG vs Vector RAG: Multi-Hop Benchmark Comparison (2024). Vector RAG accuracy on aggregation queries: ~8%. GraphRAG accuracy on the same queries: ~23%. Cross-document reasoning: 8% (vector) vs 33% (graph).
