Do Agentic AI Workflows Actually Work in Kenya?
Ambition is high. Infrastructure is catching up. The gap is where engineering lives.
Last month I was troubleshooting a deployment for a Nairobi-based fintech team. They had built what looked like the perfect agentic workflow: an autonomous system that took a customer complaint about a stuck M-Pesa transfer, reasoned through the steps, called the payment API, checked transaction logs, drafted a response, and, if the rules matched, triggered a reversal. On office fiber during the demo it was flawless. The agent planned, looped, used tools, and closed the ticket in under ninety seconds. Everyone in the room smiled.
Then they flipped the switch to real traffic.
Evening peak hit. Mobile broadband congestion spiked. One undersea cable hiccup later, the LLM call timed out mid-reasoning. The agent retried, burned tokens, and eventually gave up. Meanwhile, a brief grid dip in the co-location facility killed the proxy they were using for outbound calls. The workflow died halfway through a refund action. The customer received two partial messages and no resolution. Support tickets doubled instead of dropping. The founder looked at me and said, quietly, "It worked so well in the lab."
That is the real question with agentic AI in Kenya right now. Not whether the idea is sound. Multi-step reasoning, tool calling to APIs and databases, iterative loops that keep going until the goal is met -- all of that is real and it works. The question is whether the whole stack survives the environment it actually runs in.
I have spent the last few years building data pipelines, production ML systems, and real-time orchestration layers for clients across continents, many through Turing and Upwork engagements. Some of those environments give you redundant power, sub-50ms latency to the nearest cloud region, and budgets that treat token spend as rounding error. Others do not. Kenya sits in the interesting middle. The pieces are there. The momentum is genuine. But the friction is constant, and if you are building here it will teach you things about reliability that no textbook covers.
Why Kenya Is a Strong Fit on Paper
The economy is mobile-first and API-native in a way that most markets are not. M-Pesa is not just a payment rail. It is the connective tissue for salaries, school fees, goods, and services. Logistics APIs, government portals, telco services -- they all expose endpoints that an agent can actually call and act on. The developer community in Nairobi is deep and pragmatic. I have paired with engineers here who ship clean, observable code faster than some teams I have worked with in more heavily resourced markets.
The National AI Strategy 2025 to 2030 is not decorative. It puts real weight behind compute infrastructure, GPU access, broadband expansion, and green-energy data centers, with explicit lanes for finance, agriculture, health, and public services. That creates both supply -- talent and early pilots -- and demand: government use cases that genuinely need automation.
So the promise lands. An agent that monitors a farmer's supply chain, pulls weather data and market prices when connected, reasons about delays, and coordinates transport without a human in the loop? Plausible. A customer support agent that handles the majority of routine M-Pesa queries, escalates the complicated ones, and logs everything for compliance? Already happening in pockets. The ambition is not misplaced.
The Environment Pushes Back
Internet reliability is the first teacher. 4G coverage is strong and 5G is spreading, but peak-hour congestion and occasional cable-cut outages are simply facts of life. An agent deep in a ten-step reasoning trace -- calling an external API, parsing the response, deciding the next tool -- does not gracefully degrade when round-trip latency jumps from 120 milliseconds to 800. It either times out or keeps retrying while the bill climbs.
I have watched the same pattern in other constrained environments. What makes it particularly instructive here is how quickly the economics surface the problem. When data bundle costs are real and visible, a runaway retry loop is not just a technical failure. It is a financial one.
Power is the second teacher. Grid incidents are not catastrophic but they are frequent enough that any long-running autonomous loop needs to assume interruption. A cloud-only agent that holds state in memory or depends on a single long-lived session simply dies when the power blinks. I learned this building real-time ETL pipelines in places with similar reliability profiles. The discipline it forces is useful: design the system to fail fast, checkpoint often, and resume from the last known good state. An agent that cannot answer the question "where was I before the outage" has no business running in production here.
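A minimal sketch of that checkpoint discipline, assuming file-backed state and hypothetical `save_checkpoint` / `load_checkpoint` helpers -- any durable store works just as well:

```python
import json
import os

CHECKPOINT_DIR = "/var/agent/checkpoints"  # hypothetical path; any durable store works

def save_checkpoint(workflow_id: str, step: int, state: dict) -> None:
    """Persist the agent's position after every completed step."""
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    path = os.path.join(CHECKPOINT_DIR, f"{workflow_id}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)  # atomic rename: a power dip never leaves a half-written checkpoint

def load_checkpoint(workflow_id: str) -> dict:
    """Answer 'where was I before the outage?' -- or start from the beginning."""
    try:
        with open(os.path.join(CHECKPOINT_DIR, f"{workflow_id}.json")) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "state": {}}

def run_workflow(workflow_id: str, steps: list) -> dict:
    """Resume from the last completed step instead of replaying the whole trace."""
    ckpt = load_checkpoint(workflow_id)
    state = ckpt["state"]
    for i in range(ckpt["step"], len(steps)):
        state = steps[i](state)  # each step takes and returns serializable state
        save_checkpoint(workflow_id, i + 1, state)
    return state
```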
Cost stops being a line item and becomes a first-class engineering constraint. Agentic loops are token-hungry by nature. Each reasoning step, each tool call, each verification pass accumulates. When you factor in local data-bundle costs for operators monitoring the observability dashboard or for end users interacting via USSD or WhatsApp, the unit economics shift fast. Founders who lit up during a demo go quiet when the monthly projection lands. Kenyan businesses often love the demo but hesitate at scaling -- "we'll circle back next quarter" is a phrase you hear a lot, and it is a rational response to cost uncertainty, not indifference.
The reflex is to reach for cheaper models or smaller context windows, but that trades capability for fragility in ways that tend to surface at the worst moments. The smarter approach is to treat cost as a runtime signal.
Cap spend per workflow. Route trivial classification steps to a local lightweight model. Only escalate to the heavy cloud LLM when the agent's own confidence score justifies it. Build cost observability into the system from the first commit, not as a retrofit.
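A sketch of what cost-as-a-runtime-signal can look like; the thresholds and the `local_model` / `cloud_llm` client interfaces are illustrative assumptions, not a fixed recipe:

```python
# Illustrative thresholds and client interfaces -- tune per workflow, per budget.
MAX_TOKENS_PER_WORKFLOW = 20_000   # hard cap on spend for a single workflow
ESCALATION_CONFIDENCE = 0.75       # below this, the local model's answer is not trusted

def route_step(task: str, budget: dict, local_model, cloud_llm) -> str:
    """Try the cheap local model first; pay for the cloud LLM only when needed."""
    if budget["tokens_used"] >= MAX_TOKENS_PER_WORKFLOW:
        raise RuntimeError("budget exhausted: stop spending and escalate to a human")

    answer, confidence, tokens = local_model.classify(task)  # assumed interface
    budget["tokens_used"] += tokens
    if confidence >= ESCALATION_CONFIDENCE:
        return answer

    answer, tokens = cloud_llm.plan(task)                    # assumed interface
    budget["tokens_used"] += tokens
    return answer
```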
Reliability as Core Agent Architecture
Reliability becomes core agent architecture rather than a DevOps checkbox you fill in later. You need retries with jittered exponential backoff so a flaky connection does not hammer the same endpoint in a tight loop. You need timeouts that force the agent to simplify its plan rather than stall indefinitely. You need idempotency everywhere -- especially for payment actions -- because a network blip should never result in two transfers or two debits.
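A minimal sketch of those two patterns together -- jittered backoff plus a derived idempotency key -- with a hypothetical `payments_api` client standing in for the real integration:

```python
import random
import time

def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 0.5, timeout: float = 10.0):
    """Retry a flaky call with full-jitter exponential backoff instead of a tight loop."""
    for attempt in range(max_attempts):
        try:
            return fn(timeout=timeout)
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise
            # full jitter: sleep a random duration up to the exponential cap
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))

def reverse_transfer(payments_api, transaction_id: str):
    """The idempotency key is derived from the transaction, not random, so every
    retry presents the same key and the API can deduplicate: one reversal, ever."""
    key = f"reversal-{transaction_id}"
    return call_with_backoff(
        lambda timeout: payments_api.reverse(
            transaction_id, idempotency_key=key, timeout=timeout
        )
    )
```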
The agent's internal trace -- thoughts, tool inputs, tool outputs, decisions -- must be logged with the same rigor you would apply to a distributed financial transaction. Otherwise you are debugging in the dark when something flakes, and something always flakes.
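In practice that can be as simple as one structured, append-only log line per event; the schema here is an illustrative assumption:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
trace = logging.getLogger("agent.trace")

def log_event(workflow_id: str, step: int, kind: str, payload: dict) -> None:
    """One append-only structured line per thought, tool input, tool output,
    and decision -- greppable during an incident, replayable after one."""
    trace.info(json.dumps({
        "ts": time.time(),
        "workflow_id": workflow_id,
        "step": step,
        "kind": kind,  # e.g. "thought", "tool_input", "tool_output", "decision"
        "payload": payload,
    }))
```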
The patterns that survive here are hybrid by necessity. Cloud LLMs handle the heavy reasoning and planning layers. Local or edge models handle quick classification and offline decisions. Queue-based orchestration means a long workflow can survive a power dip: the agent emits a next-step message, the queue persists it, and a separate worker picks it up when connectivity returns.
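A sketch of that hand-off, assuming redis-py and a Redis instance with persistence enabled; any durable broker plays the same role:

```python
import json
import redis  # assumes redis-py and a Redis instance with persistence (AOF) enabled

r = redis.Redis()
QUEUE = "agent:next-steps"  # hypothetical queue name

def emit_next_step(workflow_id: str, step: int, state: dict) -> None:
    """The agent's only job is to persist its intent; it can die right afterwards."""
    r.lpush(QUEUE, json.dumps({"workflow_id": workflow_id, "step": step, "state": state}))

def worker_loop(execute_step) -> None:
    """A separate worker drains the queue when power and connectivity return."""
    while True:
        _, raw = r.brpop(QUEUE)        # blocks until a message is available
        msg = json.loads(raw)
        next_state = execute_step(msg["step"], msg["state"])
        if next_state is not None:     # workflow not finished: enqueue the next step
            emit_next_step(msg["workflow_id"], msg["step"] + 1, next_state)
```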
Human-in-the-loop becomes a deliberate engineering decision rather than an afterthought. High-value or high-risk actions route to a human queue after a defined number of failed attempts or when the agent's uncertainty exceeds an explicitly chosen threshold, not an arbitrary one. Offline-first design also has a place where it makes sense: cache common reference data, let the agent propose actions that sync later, and build systems that stay useful even when the connection is degraded.
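The escalation policy itself can be a few lines of explicit, reviewable code; every threshold below is an illustrative assumption, not a recommendation:

```python
# Hypothetical escalation policy: all thresholds are business decisions, set deliberately.
MAX_FAILED_ATTEMPTS = 3
UNCERTAINTY_THRESHOLD = 0.4  # agent self-reported uncertainty above this goes to a human
HIGH_RISK_ACTIONS = {"reverse_transfer", "debit_account"}

def should_escalate(action: str, failed_attempts: int, uncertainty: float, amount_kes: float) -> bool:
    """Route to the human queue on risk, repeated failure, or low confidence --
    never silently retry forever."""
    if action in HIGH_RISK_ACTIONS and amount_kes > 10_000:  # illustrative cutoff
        return True
    if failed_attempts >= MAX_FAILED_ATTEMPTS:
        return True
    return uncertainty > UNCERTAINTY_THRESHOLD
```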
Where It Is Already Working
In practice, hybrid patterns are already working in three places.
Fintech is where the wins are clearest. Agents that triage transaction disputes, pull status from the M-Pesa API, and either resolve or escalate have short enough loops that latency spikes are manageable. The ROI is immediate and measurable: fewer support hours, faster resolution, audit trails that the compliance team actually wants.
In agriculture, a supply-chain monitoring agent that tracks aggregator contracts, cross-checks transport availability, and surfaces exceptions has survived in production because it runs on a schedule rather than in real time and tolerates delayed execution without breaking.
Government services are beginning to experiment with form-prepopulation and status-checking agents. Success there depends on tight scoping, predictable inputs, and audit-ready logs from day one -- which aligns well with what the draft AI Bill 2026 is signaling.
That bill is worth taking seriously as an engineering input, not just a legal concern. Risk-based requirements around transparency, impact assessments, and record-keeping are not yet law, but they are clearly coming, layered on top of the Data Protection Act, 2019. For engineers this is not red tape. It is a forcing function to build the observability and audit trails you should have had anyway. Version your prompts. Log your tool calls. Make the agent's reasoning inspectable. It makes the system better, not just compliant.
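Prompt versioning, for instance, can be as lightweight as pinning each production prompt to a content hash that travels with every logged decision; the names and values in this sketch are placeholders:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    """Pin every production prompt to a content hash so any logged decision
    can be traced back to the exact instructions the agent was running."""
    name: str
    version: str
    text: str

    @property
    def content_hash(self) -> str:
        return hashlib.sha256(self.text.encode()).hexdigest()[:12]

# Illustrative example -- the name, version, and text are placeholders.
TRIAGE_PROMPT = PromptVersion(
    name="dispute-triage",
    version="v3",
    text="You are a payments support agent. Triage the dispute...",
)
# Logging TRIAGE_PROMPT.content_hash alongside every tool call goes a long way
# toward the record-keeping the draft bill signals.
```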
What Works Today and What Does Not
Contained, high-ROI workflows in urban areas with reliable fiber or strong 5G coverage are viable right now. So are short-to-medium reasoning loops with clear success criteria and well-defined tool interfaces, and systems built with cost controls, retry policies, idempotent actions, and observable traces from the first deployment.
What does not work yet is fully hands-off, long-horizon autonomy running on consumer-grade mobile connections without fallbacks. The fully autonomous supply-chain optimizer that runs for hours across variable rural links, reasons about exceptions in real time, and takes consequential actions without checkpoints is still more science project than production system. That will change. The infrastructure trajectory is genuinely positive. But it is not today.
If I Had to Ship Something Tomorrow
I would start narrow. One painful, repetitive workflow: post-transaction customer support in fintech, or exception handling in agri-logistics, or status lookups for a government service. I would design cost caps, retry policies, and idempotent actions into the first commit, not as a cleanup pass later.
The stack would be hybrid: local model for intent classification, cloud LLM for complex planning, queue for durability across interruptions. I would instrument everything so the team can watch the agent reason in real time, catch the failure modes early, and build confidence in the system before expanding scope. I would add a human escalation path that feels like a feature rather than a fallback. And I would measure token spend and resolution time as aggressively as uptime.
Then iterate from there.
The infrastructure is improving. The talent is real. The government strategy, whatever its execution gaps, points in the right direction. But the distance between the ambition on paper and the bits on the wire is precisely where the interesting engineering lives.
Agentic workflows do work in Kenya. They work when you treat the environment as a first-class design constraint rather than an inconvenience to abstract away. The teams that ship here will not be the ones with the most sophisticated prompts. They will be the ones who built for the outages, the latency spikes, the cost curves, and the power dips, and still delivered measurable automation on the other side.
The demo is easy. Surviving the real world is what separates the prototypes from the platforms. Kenya is an excellent place to learn that lesson quickly, because the environment does not let you pretend otherwise.