Agentic AI in Financial Services: The Model Was Never the Hard Part

Multi-agent systems have moved out of research labs and into production at major banks. The frontier models change every few weeks now. The thing that decides whether a bank ships agentic AI to production has almost nothing to do with which model it picked.

Views are my own and do not represent IBM. This piece reflects personal analysis of public information; nothing here references confidential client work.

The short version

Why banks are betting on agents now

Every Tier-1 bank I've spoken with in the past year has moved past the pilot stage. The technology underneath (tool use, chain-of-thought reasoning) has been around for a while. What changed is the evidence: the results are now public and specific. A large Dutch financial institution reports cutting KYC onboarding time by roughly 90 percent and analyst workload by about 30 percent. A bank in Singapore took onboarding from three days to minutes. Trade-surveillance teams report review workload down by around 60 percent. When the case studies stop being projections and start being reported results, the board conversation changes.

But financial services is not a typical deployment environment. You cannot iterate quickly when you are handling PII, processing live trades, or generating regulatory filings. The constraints are not blockers. They are requirements. The teams that treat them that way are the ones shipping to production. The teams that treat them as friction are still running pilots.

The pattern we've landed on

The systems I design at IBM follow a structure that balances autonomy with auditability. It has four layers, and each one matters equally.

[Figure: a central Orchestrator (Azure OpenAI) connected to four specialist agents: Document Extract (KYC docs, contracts), Watchlist Screen (OFAC, sanctions), Risk Score (model-risk gated), and Audit Trail (append-only log), followed by a human-in-the-loop approval gate.]
Fig. 1: The orchestrator dispatches typed tasks to four specialist agents (one per cycle). Each connection lights up as the task flows; the destination card briefly highlights on arrival; the human approval gate completes the loop.

Orchestrator agent

A central reasoning agent that breaks tasks apart, selects tools, and tracks workflow state. In our deployments this runs on a current-generation frontier model through Azure OpenAI, behind structured output schemas. Note the deliberate vagueness about which model. Seven frontier models shipped in the first quarter of 2026 alone, and the model that wins a benchmark this month is rarely the one in production next quarter. The architecture is built so the orchestrator model is a swappable component. The schemas, not the model, enforce deterministic routing, and deterministic routing is the difference between a demo and something you can explain to a regulator.
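
To make the "schemas, not the model" point concrete, here is a minimal sketch of schema-enforced routing. Everything in it is an assumption for illustration: the task names, the RoutedTask shape, the handler table, and the fake_model stand-in are hypothetical, and it does not show the Azure OpenAI API. The idea is only that the model's output is parsed against a closed schema, and routing happens through a fixed lookup table rather than through free text.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class TaskType(str, Enum):
    # Hypothetical closed set of routable tasks.
    DOCUMENT_EXTRACTION = "document_extraction"
    WATCHLIST_SCREENING = "watchlist_screening"
    RISK_SCORING = "risk_scoring"


@dataclass(frozen=True)
class RoutedTask:
    task_type: TaskType
    case_id: str


def parse_routed_task(raw: str) -> RoutedTask:
    """Accept model output only if it matches the schema exactly."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"task_type", "case_id"}:
        raise ValueError("output does not match the routing schema")
    if not isinstance(data["case_id"], str) or not data["case_id"]:
        raise ValueError("case_id must be a non-empty string")
    # An unknown task_type raises ValueError from the Enum lookup.
    return RoutedTask(TaskType(data["task_type"]), data["case_id"])


# Hypothetical specialist handlers; real ones would call the tool agents.
HANDLERS: Dict[TaskType, Callable[[RoutedTask], str]] = {
    TaskType.DOCUMENT_EXTRACTION: lambda t: f"extract:{t.case_id}",
    TaskType.WATCHLIST_SCREENING: lambda t: f"screen:{t.case_id}",
    TaskType.RISK_SCORING: lambda t: f"score:{t.case_id}",
}


def dispatch(model: Callable[[str], str], request: str) -> str:
    """The model is just a callable, so it can be swapped without touching routing."""
    task = parse_routed_task(model(request))
    return HANDLERS[task.task_type](task)


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call; returns schema-conformant JSON.
    return json.dumps({"task_type": "watchlist_screening", "case_id": "case-001"})


print(dispatch(fake_model, "screen this customer"))
```

Swapping the orchestrator model means passing a different callable to dispatch; the schema and the handler table stay fixed, which is what keeps the routing explainable.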

Specialist tool agents

Each one handles a single domain task: document extraction, entity resolution, watchlist screening, or risk scoring. We enforce three rules on each specialist:

Human-in-the-loop checkpoints

Every workflow has mandatory review gates. This isn't a trust problem; it's a regulatory one. The OCC is not yet ready to accept fully autonomous decisions on risk assessments. So the agent does 80% of the work (extraction, cross-referencing, drafting), and the analyst owns 100% of the decision. That split is deliberate.
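
One way to express "the analyst owns the decision" in code is a gate that refuses to release anything without a recorded human decision. This is a minimal sketch under my own assumptions: the DraftAssessment shape, the Decision values, and the function names are hypothetical, not a description of any specific deployment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DraftAssessment:
    # Produced by the agent: extraction, cross-referencing, drafting.
    case_id: str
    agent_summary: str
    # Filled in only by a human analyst.
    decision: Optional[Decision] = None
    decided_by: Optional[str] = None


def record_decision(draft: DraftAssessment, analyst_id: str,
                    decision: Decision) -> DraftAssessment:
    """Attach the analyst's decision; a draft can be decided only once."""
    if not analyst_id:
        raise ValueError("a named analyst must own the decision")
    if draft.decision is not None:
        raise RuntimeError("decision already recorded")
    draft.decision = decision
    draft.decided_by = analyst_id
    return draft


def release(draft: DraftAssessment) -> str:
    """Mandatory gate: nothing leaves the workflow without human sign-off."""
    if draft.decision is None:
        raise PermissionError("human review required before release")
    return f"{draft.case_id}:{draft.decision.value}:{draft.decided_by}"


draft = DraftAssessment("case-001", "No sanctions hits; two documents verified.")
record_decision(draft, "analyst-42", Decision.APPROVED)
print(release(draft))
```

The agent can fill in agent_summary as often as it likes, but release has no code path that skips the analyst.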

Audit and lineage layer

Every agent action, tool invocation, and reasoning step writes to an immutable log. When a regulator asks "why was this customer flagged?", you need the full decision chain, including which model version produced it. This layer is non-negotiable.
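
As an illustration of what such a log might look like, here is a small append-only log where each entry carries the hash of the previous one, so any later edit is detectable. The hash chain is my assumption for the sketch, not something the pattern above prescribes, and the field names are hypothetical. In practice, true immutability would come from the storage layer; this only shows the tamper-evidence idea and the habit of recording the model version with every step.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    _entries: List[dict] = field(default_factory=list)

    def append(self, case_id: str, agent: str, action: str,
               model_version: str, detail: dict) -> str:
        """Add one entry linked to the previous entry's hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {
            "case_id": case_id,
            "agent": agent,
            "action": action,
            "model_version": model_version,
            "detail": detail,
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": _digest(body)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every hash; False if any entry was altered or reordered."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def decision_chain(self, case_id: str) -> List[dict]:
        """Everything recorded for one case, in order."""
        return [e for e in self._entries if e["case_id"] == case_id]


log = AuditLog()
log.append("case-001", "watchlist_screen", "tool_call", "model-2026-01", {"list": "OFAC"})
log.append("case-001", "risk_score", "flagged", "model-2026-01", {"reason": "name match"})
print(log.verify(), len(log.decision_chain("case-001")))
```

Answering "why was this customer flagged?" then becomes a call to decision_chain, with the model version attached to every step.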

Key insight

The best agentic systems in financial services aren't the most autonomous. They're the most auditable. Design for explainability first; automation follows naturally.

Three use cases running in production

KYC/AML document processing

This is where the ROI is clearest. Agents pull entities from identity documents, cross-check against sanctions lists (OFAC, EU, UN), and generate structured risk profiles. On the system we deployed, manual review time dropped by 60% while maintaining a false-negative rate below 0.1%. The key was keeping the human sign-off in the loop; the agent does the legwork, not the judgment call.
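
A toy version of the cross-check step might look like the sketch below. It uses Python's standard difflib for fuzzy name matching purely as an assumption for illustration; production screening relies on far more sophisticated matching, and the names, threshold, and profile fields here are hypothetical. The point it carries over from the text is that a hit routes to an analyst rather than producing a decision.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    # Lowercase and collapse whitespace before comparing.
    return " ".join(name.lower().split())


def screen(name: str, watchlist: List[str],
           threshold: float = 0.85) -> List[Tuple[str, float]]:
    """Return watchlist entries whose similarity to the name meets the threshold."""
    candidate = normalize(name)
    hits = []
    for entry in watchlist:
        score = SequenceMatcher(None, candidate, normalize(entry)).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 2)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def risk_profile(name: str, watchlist: List[str]) -> Dict[str, object]:
    """Structured output for the analyst; the agent never closes the case itself."""
    hits = screen(name, watchlist)
    return {"name": name, "hits": hits, "needs_analyst_review": bool(hits)}


# Hypothetical watchlist entries for illustration only.
WATCHLIST = ["John Doe", "Alice Smith"]
print(risk_profile("  JOHN   Doe ", WATCHLIST))
```

Lowering the threshold trades more analyst review for fewer missed matches, which is the lever behind any false-negative target.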

Trade surveillance

Traditional rule-based surveillance generates thousands of false positives. Agents change that equation by contextualizing alerts against market conditions, news, and historical trader behavior. We've seen alert-to-investigation ratios improve from roughly 50:1 down to 8:1. The agents aren't replacing the surveillance team; they're filtering the noise so investigators can focus on the signals that matter.
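
To put the ratio in workload terms, here is the arithmetic as a short sketch. It assumes "50:1" means fifty alerts reviewed for every one that becomes an investigation; that reading is my assumption, and the function name is hypothetical.

```python
def investigation_rate(alerts_per_investigation: float) -> float:
    """Share of alerts that turn into an investigation."""
    return 1.0 / alerts_per_investigation


before = investigation_rate(50)   # 1 in 50 alerts
after = investigation_rate(8)     # 1 in 8 alerts

# Alerts an investigator reads per investigation falls from 50 to 8.
reduction = 1 - 8 / 50

print(f"before: {before:.1%}, after: {after:.1%}, fewer alerts per case: {reduction:.0%}")
```

Under that reading, the share of alerts worth investigating rises from 2 percent to 12.5 percent, and the alerts read per investigation fall by 84 percent.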

Regulatory reporting copilots

These agents help compliance teams draft regulatory filings. They pull data from source systems, populate templates, run validation checks, and flag inconsistencies. The compliance officer still owns the filing. What the agent eliminates is the hours of manual data gathering that used to precede each submission.
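
The populate, validate, and flag steps can be sketched in a few lines. This is a simplified illustration under my own assumptions: the field names, the sum-check rule, and the Finding type are hypothetical, and real filings have far richer validation rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Finding:
    field_name: str
    message: str


def populate(template_fields: List[str],
             source: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Fill the template from source data; missing values stay None."""
    return {name: source.get(name) for name in template_fields}


def validate(filing: Dict[str, Optional[float]],
             totals: Dict[str, List[str]]) -> List[Finding]:
    """Flag missing fields and totals that do not match their components."""
    findings = [Finding(name, "missing in source data")
                for name, value in filing.items() if value is None]
    for total_field, parts in totals.items():
        values = [filing.get(name) for name in [total_field, *parts]]
        if any(v is None for v in values):
            continue  # already reported as missing
        if abs(values[0] - sum(values[1:])) > 0.005:
            findings.append(Finding(total_field, "does not equal sum of components"))
    return findings


# Hypothetical template and source extract.
fields = ["reporting_date", "retail_exposure", "corporate_exposure", "total_exposure"]
source = {"retail_exposure": 120.0, "corporate_exposure": 80.0, "total_exposure": 210.0}
filing = populate(fields, source)
for finding in validate(filing, {"total_exposure": ["retail_exposure", "corporate_exposure"]}):
    print(finding.field_name, "->", finding.message)
```

The output is a list of findings for the compliance officer to resolve, not a corrected filing; ownership of the submission stays with the human.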

Compliance is the engineering problem

Here's what gets lost at AI conferences: building the agent is maybe 30% of the effort. The remaining 70% is compliance engineering:

"Regulation doesn't slow AI adoption down. It forces better engineering. The banks that treat compliance as a design constraint, not an afterthought, deploy faster."

The stack: Databricks + Azure OpenAI

We've standardized on a reference architecture at IBM's One Microsoft Practice that addresses both the AI problem and the governance problem at once:

This stack works because it pairs strong AI capabilities with a governance layer that regulators can actually inspect. Neither one matters without the other.

What I'm watching next

Three developments that could shift the field significantly:

  1. Agent-to-agent protocols. Standards for agents at different institutions to exchange information securely. Think interbank KYC data sharing through AI-mediated APIs. Early, but promising.
  2. Continuous evaluation. The industry is moving from periodic model validation to real-time drift monitoring. If your agent's behavior changes between quarterly reviews, you should know immediately.
  3. Regulatory sandboxes. Singapore and the UK are already creating safe-harbor environments for testing autonomous financial AI. Expect the US to follow, slowly.
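
For the continuous-evaluation item, one common way to quantify drift is the population stability index (PSI), which compares the distribution of an agent's outputs in a recent window against a baseline. The sketch below is an assumption about how such a check could look, not a description of any particular monitoring product; the bucket counts are made up, and the 0.25 alert threshold is a widely used rule of thumb rather than a regulatory figure.

```python
import math
from typing import Sequence


def psi(expected_counts: Sequence[float], actual_counts: Sequence[float],
        eps: float = 1e-6) -> float:
    """Population stability index between two bucketed distributions."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions need the same buckets")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    if e_total <= 0 or a_total <= 0:
        raise ValueError("counts must sum to a positive number")
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = max(e / e_total, eps)  # eps avoids log(0) on empty buckets
        a_share = max(a / a_total, eps)
        score += (a_share - e_share) * math.log(a_share / e_share)
    return score


def drifted(expected_counts: Sequence[float], actual_counts: Sequence[float],
            threshold: float = 0.25) -> bool:
    return psi(expected_counts, actual_counts) > threshold


# Hypothetical risk-tier counts (low, medium, high): baseline vs. this week.
baseline = [50, 30, 20]
this_week = [20, 30, 50]
print(round(psi(baseline, this_week), 3), drifted(baseline, this_week))
```

Run on every batch of agent decisions rather than once a quarter, a check like this is what turns periodic validation into monitoring.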

Reminder: This reflects my personal analysis and opinions. It does not represent the views, strategy, or endorsement of IBM, Microsoft, Databricks, or any other organization. All trademarks belong to their respective owners.

Want to talk architecture?

I work with financial services teams building and deploying these systems. Happy to compare notes.
