Building AI for Environments That Don't Fit the Defaults

When your users, budgets, and infrastructure don't look like Silicon Valley.

Apr 2025·11 min read·Architecture

The meeting was going well until someone opened a spreadsheet.

A small retail distribution company outside Nairobi had been evaluating an AI-powered inventory management system. The vendor had done a polished demo: a natural language interface that could answer stock questions, predict reorder points, and flag anomalies in incoming supplier invoices. The team was enthusiastic. Then the vendor's account executive walked through the pricing model: API calls billed per token, cloud hosting fees, an integration fee for their existing inventory software, and a monthly support retainer. The finance director added it up and divided by the number of orders the business processed per month.

The cost per transaction was more than their margin on several product lines.

The meeting ended politely. The vendor flew back. The business went back to spreadsheets and WhatsApp groups.

This story has variations, but the structure is consistent. The technology works. The interest is genuine. The mismatch is in the assumptions the system was built on.

Most AI systems that are production-ready today were designed for a specific kind of environment. Not because their designers were careless or exclusionary, but because the environment they know best is the one they designed for. That environment has some reliable characteristics: network latency measured in single-digit milliseconds, compute budgets that treat a few hundred dollars a month as rounding error, datasets that are clean enough to work with after light preprocessing, and users who are comfortable with complexity and willing to tolerate a rough interface in exchange for capability.

Those assumptions compound. They compound in how libraries are chosen: built for low-latency cloud environments where a retry is a millisecond inconvenience. In how prompts are structured: verbose and context-heavy because tokens are cheap. In how error handling is implemented: retry immediately because the network will be fine. And in how interfaces are designed: dense and feature-rich, with the expectation that someone will read the documentation before touching the product.

None of this is wrong for the environment it was designed for. It becomes wrong when you take that system somewhere the assumptions do not hold. And in practice, the gap shows up in three places: cost, infrastructure, and users. They are related, but they each require different design responses.

Cost is the most legible problem. If you are building an AI feature that makes ten API calls per user session and each call costs a fraction of a cent, the cost is invisible at small scale. At 10,000 sessions per day it becomes a real budget line. At 100,000 it becomes a constraint on whether the feature can exist at all. For a well-funded startup in San Francisco this is a growth problem, which is a good problem to have. For a logistics startup operating on thin margins with no external funding, it can kill the product before it ships.
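
To make the scaling concrete, here is a rough sketch of the arithmetic. The ten calls per session come from the example above; the per-call price of $0.002 is an assumed placeholder for "a fraction of a cent", not a real vendor rate.

```python
def daily_cost(sessions_per_day: int, calls_per_session: int, cost_per_call: float) -> float:
    """Total daily inference spend, in the same currency unit as cost_per_call."""
    return sessions_per_day * calls_per_session * cost_per_call


# Assumed values for illustration only.
COST_PER_CALL = 0.002      # hypothetical price per API call, in dollars
CALLS_PER_SESSION = 10     # from the example in the text

for sessions in (100, 10_000, 100_000):
    per_day = daily_cost(sessions, CALLS_PER_SESSION, COST_PER_CALL)
    print(f"{sessions:>7} sessions/day: ${per_day:,.2f} per day, ${per_day * 30:,.2f} per 30 days")
```

The same function works in reverse as a sanity check: divide the monthly figure by the number of orders processed and compare it with the margin per order, which is the calculation the finance director did in the opening story.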

The instinct when cost becomes a constraint is to reach for a smaller model. That instinct is right directionally but incomplete. Smaller models are cheaper per token but may require more engineering to reach acceptable quality: more careful prompting, retrieval-augmented generation to supplement limited context, fine-tuning on domain-specific data. All of those add upfront cost and ongoing maintenance. The net economics may not favor the smaller model as clearly as the per-token price suggests.

What I have found more useful is thinking about where in the workflow AI is actually necessary. Most AI-powered systems I have reviewed do things that do not need AI. They use a language model to parse a date out of a string. They call an embedding API to extract a field that could be read with a regex. They generate summaries of documents nobody reads. Stripping those uses out, routing them to cheaper alternatives, and reserving the model for genuinely hard decisions can cut inference costs significantly without touching the parts users actually care about. It is not glamorous engineering. It is the kind of cost awareness that should have been in the design from the start but rarely is, because it was never a constraint.
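
One way to picture that routing is a cheap deterministic path that runs first, with the model reserved for whatever it cannot handle. This is a minimal sketch, not a prescribed design: `parse_date_cheaply`, `extract_date`, and the `model_fallback` callable are hypothetical names, and the only date format handled cheaply here is an assumed `YYYY-MM-DD`.

```python
import re
from datetime import date
from typing import Callable, Optional, Tuple

# Assumption: dates usually arrive as YYYY-MM-DD somewhere in the string.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def parse_date_cheaply(text: str) -> Optional[date]:
    """Try to read a date with a regex; return None if that is not possible."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # The pattern matched but the values are not a real calendar date.
        return None


def extract_date(
    text: str, model_fallback: Callable[[str], Optional[date]]
) -> Tuple[Optional[date], str]:
    """Return (date, route). The model is only called when the cheap path fails."""
    cheap = parse_date_cheaply(text)
    if cheap is not None:
        return cheap, "regex"
    return model_fallback(text), "model"
```

Returning the route alongside the result makes it easy to count how often the expensive path is actually taken, which is the number that matters for the budget.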

Infrastructure is less legible because it fails silently. A system built to assume low-latency, always-on APIs behaves strangely when those assumptions are violated. It does not usually fail cleanly. It hangs, retries in loops, times out in confusing ways, and produces errors that are hard to diagnose without knowing what the system was expecting in the first place.

I worked on a pipeline for a client whose infrastructure spanned an on-premises warehouse management system and a cloud analytics layer. On paper the integration was straightforward. In practice the on-premises system was behind a connection that could go from excellent to unusable in the space of an hour, for reasons nobody could fully explain and that seemed to correlate with peak hours and the building's power stability. The cloud tooling had sensible retry logic for transient errors. It had no concept of a four-hour connectivity gap followed by a burst of backlogged data.

Every design assumption about latency tolerance, write ordering, and data freshness had to be revisited. Not because the tooling was bad, but because it was built for a different kind of unreliable: the kind where things are occasionally slow, not the kind where things disappear for hours and then come back all at once. In environments like this, the first engineering priority is making the system useful when conditions are not ideal, not making it maximally capable when they are. That reordering of priorities produces a different architecture. Local processing before cloud sync. Durable queues that survive power events. Fallback outputs that are less rich but always available. The system that handles degradation gracefully is worth more than the system that performs brilliantly under ideal conditions and fails confusingly when conditions change.
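
A durable local queue is one way to express "local processing before cloud sync". The sketch below is an assumption about how such a queue could look, not the pipeline described above: it uses SQLite from the Python standard library as the on-disk store, and `DurableQueue`, the `outbox` table, and the `send` callable are hypothetical names.

```python
import json
import sqlite3
from typing import Callable


class DurableQueue:
    """Outbox stored in SQLite so queued records outlive a process restart."""

    def __init__(self, path: str) -> None:
        # Pass a file path for real use; ":memory:" is only suitable for tests.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.conn.commit()

    def enqueue(self, record: dict) -> None:
        """Write locally first; this never touches the network."""
        self.conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (json.dumps(record),)
        )
        self.conn.commit()

    def pending(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def drain(self, send: Callable[[dict], None], batch_size: int = 100) -> int:
        """Send oldest-first. Stop at the first network failure and keep the rest."""
        rows = self.conn.execute(
            "SELECT id, payload FROM outbox ORDER BY id LIMIT ?", (batch_size,)
        ).fetchall()
        sent = 0
        for row_id, payload in rows:
            try:
                send(json.loads(payload))
            except OSError:  # includes ConnectionError and TimeoutError
                break
            # Delete only after a successful send, so a crash causes a resend, not a loss.
            self.conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.conn.commit()
            sent += 1
        return sent
```

Because a record is deleted only after `send` succeeds, delivery is at-least-once: the receiving side has to tolerate an occasional duplicate. Draining in bounded batches is also what keeps a four-hour backlog from arriving as one uncontrolled burst.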

The user problem is the subtlest and the most consequential. AI systems tend to be designed with an implicit user in mind: someone relatively comfortable with technology, who has enough context to know what to expect from the system, and who will read an error message and make a reasonable inference about what to do next. That user profile is real and common in the environments where most AI systems are first tested. It is not universal.

In many small and medium-sized enterprise (SME) contexts, the person interacting with an AI-powered tool is not a technologist. They are a business owner who adopted mobile money years before most of their counterparts in other markets, who runs a sophisticated informal operation with no ERP and no data team, and who has a very low tolerance for systems that are hard to understand or that behave unpredictably. This is not a capability gap. It is a different set of expectations and a different relationship with technology, shaped by a history of being sold tools that did not quite work as advertised.

The design implication is not "make it simpler in a condescending way." It is "make it transparent and trustworthy in a way that earns adoption." An AI system that gives a recommendation without explaining why it made that recommendation asks the user to extend trust they have not yet built. An AI system that says "based on your last three months of sales data, here are the three products most likely to run out before the end of the month, and here is why I think so" gives the user something they can verify against their own knowledge and choose whether to act on.
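
As an illustration of a recommendation that carries its own reasoning, the sketch below ranks products by how soon they are likely to run out and attaches a plain-language reason the user can check against their own records. It assumes a deliberately simple heuristic (average daily sales over the last 90 days); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StockItem:
    name: str
    units_on_hand: int
    units_sold_last_90_days: int


@dataclass
class Recommendation:
    name: str
    days_of_stock_left: float
    reason: str


def likely_stockouts(
    items: List[StockItem], days_remaining: int, top_n: int = 3
) -> List[Recommendation]:
    """Products expected to run out within days_remaining, soonest first."""
    recs = []
    for item in items:
        daily_rate = item.units_sold_last_90_days / 90
        if daily_rate == 0:
            continue  # no recent sales, so no basis for a stockout estimate
        days_left = item.units_on_hand / daily_rate
        if days_left < days_remaining:
            reason = (
                f"You sold {item.units_sold_last_90_days} units of {item.name} "
                f"in the last 90 days (about {daily_rate:.1f} per day). "
                f"With {item.units_on_hand} on hand, stock lasts about "
                f"{days_left:.0f} days, and {days_remaining} days remain this month."
            )
            recs.append(Recommendation(item.name, days_left, reason))
    recs.sort(key=lambda rec: rec.days_of_stock_left)
    return recs[:top_n]
```

Every number in the reason string is one the business owner already knows or can look up, which is what makes the recommendation verifiable rather than something to be taken on faith.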

Human-in-the-loop is often framed as a reliability feature, a way to catch AI errors before they propagate. In environments where trust has not yet been established, it is also a trust-building feature. The human review step is not a concession to the system's limitations. It is how the system earns the right to eventually operate with more autonomy.

The technology adoption hesitation that shows up among Kenyan SMEs is frequently misread as conservatism or lack of sophistication. In my experience it is neither. It is a rational response to a specific set of conditions. These are businesses that have been pitched tools that did not work as advertised, that required expertise they did not have, that broke in ways they could not diagnose or fix, and that came with support contracts more expensive than the problem being solved.

The hesitation is not "we do not believe AI can help." It is "we have seen this pattern before and we need to see it work before we commit to it." That is not an irrational position. It is the correct update given the evidence available. What breaks through that hesitation, in my observation, is not a better demo. It is a narrow, working deployment with a measurable result.

A logistics company that sees a route optimization tool save a specific, verifiable amount in fuel costs over three months is far more likely to expand usage than one that saw an impressive demo and was asked to take the ROI on faith. The path to adoption is through proof, not pitch. This means the first deployment has to be designed for demonstrability as much as functionality. It has to work in the actual environment, with the actual data quality, on the actual hardware, operated by the actual people who will use it. Not in a sandbox with clean data and a fast connection and an engineer on standby.

In practice this means the first use case should be narrow, high-repetition, and clearly measurable. Invoice matching. Anomaly detection on a specific data feed. Automated responses to a well-defined category of customer query. Not "AI-powered decision making across the business." Scoping down feels like giving up ambition. It is actually the only path to building enough trust to eventually expand the scope.

The data problem deserves its own attention because it is consistently underestimated. Systems built for well-resourced environments tend to assume reasonably clean data. Not perfect, but structured, mostly complete, and stored somewhere queryable. In many SME environments the data is in WhatsApp messages, handwritten logs, Excel sheets shared over email, and the memories of the person who has been doing the job for eight years and has not documented any of it because there was never a reason to.

Running an AI system against data like this is not straightforward. Before the AI question comes a data engineering question: how do you get this information into a form the model can work with, and how do you do it in a way that does not require a full-time data engineer to maintain? The answer is usually more validation, more preprocessing, more explicit handling of missing values, and more tolerance for incomplete inputs than a system designed for cleaner environments would need.
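
A minimal sketch of what "explicit handling of missing values" can mean in practice, assuming rows arrive as loosely typed dictionaries from a spreadsheet export. The field names `product` and `quantity`, and the `CleanRow` type, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CleanRow:
    product: str
    quantity: Optional[int]  # None means "missing"; it is never silently turned into 0
    issues: List[str] = field(default_factory=list)


def clean_row(raw: Dict[str, Optional[str]]) -> Optional[CleanRow]:
    """Normalise one raw row. Returns None only when there is no product to attach it to."""
    product = (raw.get("product") or "").strip()
    if not product:
        return None

    issues: List[str] = []
    quantity: Optional[int] = None
    qty_text = (raw.get("quantity") or "").strip().replace(",", "")
    if not qty_text:
        issues.append("quantity missing")
    else:
        try:
            quantity = int(qty_text)
        except ValueError:
            issues.append(f"quantity is not a whole number: {qty_text!r}")
        else:
            if quantity < 0:
                issues.append("quantity is negative")

    return CleanRow(product.lower(), quantity, issues)
```

Keeping the row and recording what is wrong with it, instead of dropping it or guessing a value, is the "tolerance for incomplete inputs" part: downstream code can decide what to do with a flagged row, and a person can see why it was flagged.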

This is not an insurmountable problem. It is an engineering problem that requires being named clearly, scoped honestly, and addressed before the AI layer is built. The mistake I have seen repeatedly is teams trying to solve the AI problem and the data quality problem simultaneously. That tends to produce a system that handles neither well, and a client who concludes that AI does not work rather than that the sequencing was wrong.

When I think about what I would design differently today if I were building specifically for SME environments in markets like Kenya, a few things stand out with more clarity than they did before I encountered these constraints directly.

Cost awareness would be a first-class design constraint from the first line of code, not a retrofit after launch. I would profile every inference call, every API request, every scheduled job for its marginal cost at the scale the client can actually reach. I would design around the assumption that the budget is fixed and the volume is variable, not the other way around. That changes which architecture decisions feel conservative and which feel reckless.
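
Treating the budget as fixed and the volume as variable can be sketched as a guard that every paid call passes through. This is an illustrative assumption about one possible shape, not a recommended library; `BudgetGuard` and `try_spend` are hypothetical names, and per-call costs are assumed to be known or estimated before the call is made.

```python
class BudgetGuard:
    """Tracks spend against a fixed budget and refuses calls that would exceed it."""

    def __init__(self, monthly_budget: float) -> None:
        self.monthly_budget = monthly_budget
        self.spent = 0.0

    def remaining(self) -> float:
        return self.monthly_budget - self.spent

    def try_spend(self, cost: float) -> bool:
        """Record the cost and return True if it fits; otherwise change nothing."""
        if self.spent + cost > self.monthly_budget:
            return False
        self.spent += cost
        return True
```

A caller would check `try_spend(estimated_cost)` before each inference request and take a cheaper path when it returns `False`, so running out of budget becomes a planned behavior instead of a surprise invoice.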

I would invest heavily in offline tolerance and graceful degradation. Not because the infrastructure is broken, but because the cost of a system that behaves unexpectedly during an outage is much higher in an environment with limited technical support. A system that degrades predictably is a system people learn to work around. A system that fails confusingly is a system that gets abandoned, usually at the worst possible moment.
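
Predictable degradation can be as small as a wrapper that tries the rich path and falls back to a simpler one on a network failure, while telling the caller which one it got. A minimal sketch under those assumptions; `answer_with_fallback`, `primary`, and `fallback` are hypothetical names.

```python
from typing import Callable, Tuple


def answer_with_fallback(
    primary: Callable[[], str], fallback: Callable[[], str]
) -> Tuple[str, bool]:
    """Return (answer, degraded). degraded is True when the fallback was used."""
    try:
        return primary(), False
    except (TimeoutError, ConnectionError):
        # Network trouble is expected here, so it is handled rather than surfaced raw.
        return fallback(), True
```

The `degraded` flag matters as much as the fallback itself: it lets the interface say plainly that the user is seeing a simpler answer because the connection is down, which is what makes the behavior something people can learn to work around.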

I would spend more time on explainability than on raw capability. A model that is 80% accurate and explains its reasoning in terms the user can verify is more useful in these environments than a model that is 93% accurate and produces outputs that feel like magic. The 93% model will be trusted less and therefore used less, so its extra accuracy delivers less value in deployment. Transparency is not a feature you bolt on after the core system is working. In environments where trust has to be earned, it is the core system.

And I would build incrementally and visibly. Small deployments with clear metrics, expanded only when those metrics demonstrate value. Not because the ambition should be small, but because the path to large-scale adoption in environments where trust has to be earned is through a sequence of small, verifiable wins rather than a single large bet that may or may not land.

The systems that work in constrained environments are not inferior versions of systems built for well-resourced ones. They are different systems, designed to different priorities, reflecting a different understanding of what the user actually needs and what the environment will actually support.

The interesting engineering challenge is not "how do we bring the defaults to these environments." It is "what does a well-designed system actually look like when the defaults do not apply." That question tends to produce more careful architecture, more honest tradeoffs, and more durable deployments than the alternative: taking a system built for one environment and hoping it survives contact with another. It usually does not. And the lesson, when it finally lands, is that the mismatch was not a surprise. It was designed in from the beginning, quietly, through every assumption that went unexamined.
