InsightFinder's $15M Bet: AI Agents Need Full-Stack Monitoring
InsightFinder's $15M Series A targets the hidden crisis in enterprise AI: diagnosing failures across the entire tech stack when AI agents are involved. This is a bet that the monitoring market needs a fundamental reset.
- What happened: InsightFinder raised $15M to build AI-specific observability that monitors not just models but the entire tech stack when AI agents are running.
- Why it matters: Current APM tools like Datadog and New Relic were built for deterministic software—they can't trace failures when an AI agent's bad output is caused by a database slowdown or a prompt injection.
- The key tension: The industry is racing to deploy AI agents without the operational infrastructure to understand why they fail, creating a massive blind spot that InsightFinder aims to exploit.
Why Is Full-Stack Observability Suddenly Critical for AI Agents?
CEO Helen Gu put it bluntly in the TechCrunch interview: "The biggest problem facing the industry today is not just monitoring and diagnosing where AI models go wrong—it's also diagnosing how the entire tech stack operates now that AI is part of it." This is a fundamental shift in thinking. For the past two years, the AI industry has been obsessed with model accuracy, RAG pipelines, and prompt engineering. But Gu is pointing out that an AI agent can fail because a database query timed out, because a GPU cluster had a memory leak, or because a downstream API changed its schema—and none of those are model problems.
According to a 2025 Gartner report cited in the article, 40% of AI agent failures in production are infrastructure-related, not model-related. The true share may be higher, because most teams lack the tooling to distinguish between the two. InsightFinder's pitch is that it can correlate model-level metrics (token usage, response latency, hallucination scores) with infrastructure-level metrics (CPU, memory, network, database query performance) in a single view.
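To make the idea of correlating model-level and infrastructure-level metrics concrete, here is a minimal sketch in Python. It is not InsightFinder's method, which the article does not describe; it simply ranks a few infrastructure series by their Pearson correlation with an agent latency series. The metric names and every sample value are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_infra_signals(model_series, infra_signals):
    """Rank infrastructure metrics by |r| against one model-level metric."""
    scores = {name: pearson(model_series, s) for name, s in infra_signals.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical per-minute samples (made-up numbers, not real telemetry).
agent_latency_ms = [820, 810, 835, 1900, 2400, 2350, 840, 825]
infra = {
    "db_query_ms": [40, 42, 39, 610, 900, 880, 41, 43],
    "gpu_mem_pct": [71, 70, 72, 71, 70, 72, 71, 70],
}

ranked = rank_infra_signals(agent_latency_ms, infra)
for name, r in ranked:
    print(f"{name}: r={r:.2f}")
```

In this toy data the database series tracks the latency spike and the GPU memory series does not, so the database ranks first. A high correlation is only a lead for investigation, not proof of cause.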
What Does This Mean for Incumbents Like Datadog and New Relic?
This is where it gets interesting. Datadog and New Relic dominate the application performance monitoring (APM) market, but their tools were designed for a world where software behavior is deterministic: you write code, it runs, you monitor it. AI agents are non-deterministic. They can produce different outputs from the same inputs, and those outputs can trigger cascading effects across the stack. Traditional APM tools can tell you a database query was slow or failed, but they can't tell you that the cause was an inefficient or malformed SQL statement generated by the AI agent.
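One way to picture closing that attribution gap is to tag every database call with the agent step that produced it. The sketch below uses Python's standard `sqlite3` module. The `QuerySpan` record, the `run_traced` helper, and the `agent_step_id` values are hypothetical names made up for this example, not part of any vendor's API.

```python
import sqlite3
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuerySpan:
    agent_step_id: str      # hypothetical ID of the agent step that wrote the SQL
    sql: str
    duration_ms: float
    error: Optional[str]    # database error message, if the statement failed

def run_traced(conn, agent_step_id, sql):
    """Execute agent-generated SQL and record who generated it and how it went."""
    start = time.perf_counter()
    error = None
    try:
        conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        error = str(exc)
    duration_ms = (time.perf_counter() - start) * 1000
    return QuerySpan(agent_step_id, sql, duration_ms, error)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

spans = [
    run_traced(conn, "step-1", "SELECT * FROM orders"),
    run_traced(conn, "step-2", "SELEC * FROM orders"),  # malformed on purpose
]

for span in spans:
    status = "ok" if span.error is None else f"failed: {span.error}"
    print(span.agent_step_id, status)
```

With the step ID attached, a failed or slow query can be traced back to the agent output that caused it rather than showing up as an anonymous database problem.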
InsightFinder is specifically targeting this gap. Their platform ingests logs, traces, and metrics from both the AI layer and the infrastructure layer, then uses machine learning to correlate anomalies across both. The $15M raise suggests investors believe this is a $1B+ market opportunity that the incumbents are currently ignoring. I expect Datadog to acquire a startup in this space within 12 months—they cannot afford to cede this category entirely.
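The article does not say which models InsightFinder uses, so the following is only a rough Python sketch of cross-layer anomaly correlation, with a plain z-score test standing in for the machine learning. It flags anomalies in an AI-layer series and in each infrastructure series, then pairs the ones that occur in the same or the preceding sample. All series names, thresholds, and values are assumptions for illustration.

```python
from statistics import mean, stdev

def anomalous_indices(series, baseline_len=5, threshold=3.0):
    """Indices whose |z-score| against the first `baseline_len` samples exceeds threshold."""
    base = series[:baseline_len]
    mu, sigma = mean(base), stdev(base)
    if sigma == 0:
        return [i for i, v in enumerate(series) if v != mu]
    return [i for i, v in enumerate(series) if abs(v - mu) / sigma > threshold]

def co_occurring(ai_anoms, infra_anoms, window=1):
    """Pair each AI-layer anomaly with infra anomalies at the same time or up to `window` samples earlier."""
    pairs = []
    for t in ai_anoms:
        for name, idxs in infra_anoms.items():
            for u in idxs:
                if 0 <= t - u <= window:
                    pairs.append((t, name, u))
    return pairs

# Hypothetical per-minute samples (made-up numbers).
hallucination_score = [0.05, 0.04, 0.06, 0.05, 0.05, 0.05, 0.41, 0.44, 0.06]
infra = {
    "db_p95_ms": [40, 42, 39, 41, 40, 650, 700, 680, 41],
    "cpu_pct": [55, 57, 54, 56, 55, 56, 55, 57, 54],
}

ai_anoms = anomalous_indices(hallucination_score)
infra_anoms = {name: anomalous_indices(s) for name, s in infra.items()}
pairs = co_occurring(ai_anoms, infra_anoms)
print(pairs)
```

In this toy data the database latency anomaly starts one sample before the hallucination score jumps, so it is the only infrastructure signal paired with the AI-layer anomalies. A production system would need far more robust detection than a fixed baseline and threshold.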

Who Actually Benefits From This Deal Beyond InsightFinder?
The immediate winners are enterprises deploying AI agents in production—think financial services, healthcare, and e-commerce companies that cannot afford unexplained failures. If you're a bank running an AI-powered trading agent, you need to know whether a bad trade was caused by a model hallucination or a network latency issue. InsightFinder's tooling provides that root-cause analysis.
The losers are the legacy APM vendors who will have to scramble to add AI-specific features. New Relic has already started adding LLM monitoring, but it's a bolt-on feature, not a rearchitecture. The deeper loser might be the open-source monitoring ecosystem—tools like Prometheus and Grafana are powerful but require significant customization to handle AI workloads. InsightFinder's value proposition is that it works out of the box for AI agents.
There's also a subtle winner here: cloud providers like AWS and Azure. If enterprises can reliably monitor AI agents, they'll deploy more of them on cloud infrastructure. Better observability removes a key adoption barrier.
| Capability | InsightFinder | Datadog | New Relic |
|---|---|---|---|
| AI model-level metrics | Native (token usage, hallucination scores) | Bolt-on via LLM Observability | Bolt-on via AI Monitoring |
| Infrastructure correlation | Built-in from day one | Requires custom dashboards | Requires custom dashboards |
| Root cause analysis for AI failures | ML-driven correlation across layers | Manual investigation | Manual investigation |
| Non-deterministic behavior support | Designed for it | Not designed for it | Not designed for it |
| Pricing model | AI workload-based | Host/metric-based | Host/metric-based |
| Verdict | Winner for AI-native monitoring | Requires integration work | Requires integration work |
My thesis is simple: InsightFinder is not just building a monitoring tool—they are building the operating system for the AI era, and the incumbents are asleep at the wheel.
Let me be direct: the $15M raise is small compared to what Datadog and New Relic spend on R&D, but that's precisely the point. InsightFinder is attacking a greenfield problem that the incumbents have not prioritized because their existing customers aren't screaming for it yet. By the time those customers start screaming, InsightFinder will have a two-year head start on data and integrations.
In the short term (next 12 months), InsightFinder will win early adopters among AI-forward enterprises—companies like JPMorgan, Uber, and Shopify that are already deploying AI agents in production. These companies will discover that their existing monitoring tools are insufficient and will pay a premium for InsightFinder's solution.
In the long term (24-36 months), the question is whether InsightFinder can scale its sales and marketing to compete with the incumbents' distribution advantages. Datadog has thousands of enterprise sales reps; InsightFinder has a few dozen. They will need to either partner with cloud providers or get acquired to achieve scale.
The biggest winner here is the enterprise AI market as a whole. Better observability reduces the risk of deploying AI agents, which accelerates adoption. The biggest loser is any company that thought they could just bolt AI monitoring onto their existing APM stack—they will be caught flat-footed when their AI agents fail in production and they can't explain why.
I predict that by Q3 2027, Datadog will acquire InsightFinder for at least $500M, because they will realize that the AI observability category is too important to leave to a startup. The alternative—building it themselves—would take 18 months and leave them vulnerable.
My Predictions
- Datadog will acquire InsightFinder by Q3 2027 for $500M-$1B because they cannot build AI-native observability fast enough internally and will pay a premium for market leadership.
- New Relic will lose 15-20% of its enterprise AI customer base within 18 months as companies realize its bolt-on AI features cannot match InsightFinder's integrated approach.
- AWS will launch a competing AI observability service by Q2 2027 to capture the cloud-native monitoring market, potentially partnering with or acquiring a startup in the space.
- April 2026: InsightFinder raises $15M Series A
  Company announces funding to build full-stack observability for AI agents, targeting the gap between model monitoring and infrastructure monitoring.
- 2025: Gartner reports 40% of AI agent failures are infrastructure-related
  Research highlights that most AI failures are not model problems but infrastructure problems, validating InsightFinder's thesis.
- 2024-2025: Incumbents bolt on AI monitoring features
  Datadog and New Relic add LLM monitoring features, but as bolt-ons rather than native capabilities, creating an opening for startups.
- InsightFinder's $15M raise validates that AI agent failures are increasingly infrastructure failures, not model failures—a blind spot most enterprises haven't addressed.
- Legacy APM vendors like Datadog and New Relic built their tools for deterministic software and will struggle to adapt to non-deterministic AI agents without fundamental rearchitecture.
- The real winner is the enterprise AI adoption curve—better observability removes a key barrier to deploying AI agents in production.
- Expect consolidation in the AI observability space within 24 months as cloud providers and legacy vendors scramble to acquire startups like InsightFinder.
- Companies deploying AI agents today should immediately audit whether their monitoring tools can trace failures across both model and infrastructure layers—most cannot.
Source and attribution
TechCrunch AI
InsightFinder raises $15M to help companies figure out where AI agents go wrong