LLMs in the Enterprise: What Actually Works, What Doesn't, and What's Overhyped

After integrating LLMs into three production products in 2024, here's our honest view of where they shine and where we'd still reach for more traditional tools.

Where LLMs Fit Nicely

In our experience, LLMs work best when they're helping humans make sense of unstructured information, not replacing them. Some patterns that consistently deliver value:

  • Summarisation and briefing — turning long documents, tickets, or call transcripts into a one-page brief that gives a human a head start.
  • Copilots for internal tools — where the model can suggest next steps or generate queries, but the system still enforces permissions and business rules.
  • Natural language search — especially over FAQs, policy docs, or product manuals.
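The summarisation-and-briefing pattern benefits from keeping the prompt bounded and structured, so the model produces a predictable one-page brief rather than an open-ended answer. A minimal sketch, where `call_llm` is a placeholder for whichever provider client you use and the section names and character budget are illustrative assumptions:

```python
MAX_CHARS = 12_000  # crude context budget (assumption); tune for your model's window


def build_brief_prompt(transcript: str, audience: str = "support lead") -> str:
    """Build a prompt asking for a one-page brief with fixed sections."""
    clipped = transcript[:MAX_CHARS]  # hard cap so very long inputs stay predictable
    return (
        f"Summarise the following for a {audience}.\n"
        "Use exactly these sections: Context, Key points, Open questions, "
        "Suggested next step. Keep it under 300 words.\n\n"
        f"---\n{clipped}\n---"
    )


def call_llm(prompt: str) -> str:
    """Placeholder: swap in your provider's client here."""
    raise NotImplementedError


prompt = build_brief_prompt("Customer reported intermittent timeouts since Tuesday...")
```

Fixing the sections in the prompt makes the output easy to render in a UI and easy to spot-check for drift.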

Where We've Pulled LLMs Back

We've also run experiments that we quietly turned off. Common failure modes:

  • Trying to have the LLM “decide” critical business outcomes (e.g. approve/decline transactions) without a deterministic layer underneath.
  • Overloading the model with poorly structured context, so responses became slower and less predictable.
  • Exposing free-form prompts directly to end users without meaningful guardrails, leading to support load when answers were misinterpreted.
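The "deterministic layer underneath" point can be sketched as: the model may recommend, but hard business rules always run first and have the final say. This is an illustrative sketch, not our production code; the thresholds and the stubbed `llm_recommendation` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    account_verified: bool


def llm_recommendation(txn: Transaction) -> str:
    """Placeholder for a model call that returns 'approve' or 'decline'."""
    return "approve"


def decide(txn: Transaction) -> str:
    # Deterministic rules run first and cannot be overridden by the model.
    if not txn.account_verified:
        return "decline"
    if txn.amount > 10_000:  # illustrative limit
        return "escalate"  # humans, not the model, handle the edge cases
    # Only inside this safe envelope does the model's suggestion matter.
    return llm_recommendation(txn)
```

The key property is that a wrong model answer can never widen the outcome space beyond what the rules already permit.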

What We've Learned About Architecture

The LLM layer is just one part of a production system. The projects that aged well shared a few traits:

  • Clear separation between the retrieval layer (vector search, filters, permissions) and the generation layer (the model itself).
  • Strong observability: logging prompts, responses, and basic quality metrics so we can tune or fall back when behaviour drifts.
  • Fallbacks to simpler, deterministic flows when confidence is low, or for actions that must be exact.
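The observability and fallback traits above can be combined in a thin wrapper: every call is logged with latency and a quality signal, and low-confidence generations fall back to a deterministic path. A minimal sketch, assuming the model client exposes some confidence score; the `generate` stub and the 0.7 threshold are placeholders, not values from our systems.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm")

CONFIDENCE_THRESHOLD = 0.7  # assumption; tune against your own quality metrics


def generate(prompt: str) -> tuple[str, float]:
    """Placeholder model call returning (text, confidence score)."""
    return "stub answer", 0.5


def deterministic_fallback(prompt: str) -> str:
    # A simpler, exact flow: e.g. keyword search over the FAQ index.
    return "Sorry, I couldn't answer that confidently. Here are matching FAQ links."


def answer(prompt: str) -> str:
    start = time.monotonic()
    text, confidence = generate(prompt)
    # Log prompt, confidence, and latency so drift shows up in dashboards.
    log.info(
        "prompt=%r confidence=%.2f latency_ms=%.0f",
        prompt[:80], confidence, (time.monotonic() - start) * 1000,
    )
    if confidence < CONFIDENCE_THRESHOLD:
        return deterministic_fallback(prompt)
    return text
```

Because the stub returns 0.5 confidence, this sketch always takes the fallback path; in practice the threshold is where the tuning effort goes.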

How to Decide If It's Worth It

We now ask three questions before proposing LLMs in a brief:

  • Is there enough unstructured text where summarisation or rewriting would meaningfully save time?
  • Can we bound the problem so that mistakes are recoverable and don't damage trust?
  • Do we have (or can we collect) the telemetry to learn whether it's actually helping end users?

When the answer is yes, LLMs can feel like a natural extension of the product. When it's no, we're comfortable saying “not yet” and focusing on a more traditional path first.