The paradox of May 2026

Here’s the situation businesses find themselves in right now.

Skip AI and you’ll most likely fall behind on some tasks. Clients are getting used to the response speed that AI-equipped competitors offer: while you reply to an inquiry in four hours, someone replies in fifteen minutes — and some of those leads are gone before you even call. Hiring gets harder: specialists already used to working with AI tools are reluctant to join companies that don’t have them. Over a long stretch this turns into an accumulated gap that’s hard to close in one go.

Adopt it head-on and you’ll almost certainly burn money. By various 2025–2026 estimates, somewhere between 75% and 95% of corporate AI initiatives either never reach production or deliver no measurable financial impact (MIT NANDA “GenAI Divide,” August 2025; Prosigns; and others). The numbers are contested — critics fairly note that “no measurable P&L” ≠ “failure,” some authors are affiliated with vendors, and the methodology draws from small samples. But the overall signal is consistent: most corporate projects don’t pay off. Not because the technology is bad. Because it gets bolted onto old processes without anyone thinking through the economics first.

It’s worth separating, right away, two different phenomena that this statistic tends to blur. Individual use of AI by employees (ChatGPT for a draft email, transcribing a call, finding an Excel formula) is widespread and often useful. A corporate initiative with a budget, a project office, and contractors fails on a massive scale. These are different things. From here on, I’m mostly talking about the second.

So you get a fork: you can’t afford not to spend, but almost no one gets a fast, obvious return on the big projects. The question, then, isn’t “adopt or don’t.” The question is how to move step by step: understanding what we’re doing at each step, what interim goal we’re setting, which tools we add gradually, and what culture we grow around it all.

What follows is my breakdown of the main decisions and mistakes along that path. It’s not a consultant’s playbook, and it’s not a rollout plan with a guaranteed result. It’s an attempt to gather, in one place, the forks where I regularly watch companies lose money.

How this article is built

  1. Part 1 — what AI actually does for a business. The simplest filter for ideas.
  2. Part 2 — a company's maturity stages with AI. From "employees try it themselves" to "AI at the core of the product."
  3. Part 3 — how to tell whether it paid off. The longest and most important part.
  4. Part 4 — how to pick the first process, build a sandbox, and who to put on it internally. This expands on Stage 1 from Part 2.
  5. Part 5 — the cultural changes without which even a technically successful pilot rolls back within six months.
  6. Part 6 — cloud or your own infrastructure. A fork that comes later, once the first pilots are running.
  7. Part 7 — the pitfalls that most often actually sink projects.

Part 1. What AI does for a business

If we simplify, AI usually does one of two things.

  1. Cuts costs. It replaces a human on routine work or speeds them up dramatically. Fewer people, or the same people doing more. Add to that: fewer errors and fewer redos.
  2. Grows revenue. It does what, without AI, is either impossible or too slow. Reply to a client faster — higher conversion. Recommend more accurately — bigger average order. Analyze data more deeply — better decisions.

There are borderline cases too — risk reduction (compliance, security, anomaly detection), quality improvement (fewer redos, higher NPS) — that don’t land in the P&L directly. They’re harder to reason about at the start, so it’s sensible to pick first pilots that fit one of the two categories above. The more complex effects are worth tackling once you have a couple of working projects under your belt.

If you can’t say, reasonably clearly, which of these two categories your idea belongs to — it’s worth pausing to think more. This filter screens out a big share of “let’s just try it.”

“Let’s give everyone ChatGPT” is not a category. It’s buying a tool without understanding what to do with it. Spending exactly like this is what so often lands in the “never paid off” pile.

A useful formulation sounds roughly like this:

“Our support team answers N similar emails a day, spending X hours of each person’s time on it. If AI takes on most of that, we free up so many person-hours a week.” That’s cost reduction.

Or:

“A manager processes an inquiry in hours. If AI qualifies the lead and prepares an initial offer in minutes, we’ll reply ahead of competitors.” That’s about revenue.

You don’t have to know the exact numbers on day one. Ideally you can name them by the end of the first week. If you can’t measure the process now, with no AI at all, then putting AI on it is premature. First you have to learn to measure it.

Part 2. Maturity stages

Adopting AI isn’t a single event — it’s several sequential stages. Each has its own purpose, its own interim goal, its own toolset, and its own cultural shift. You can skip stages in your head, but not in practice — trying to jump straight to Stage 4 from zero is exactly what produces that wave of failed investments.

Stage 0. Individual curiosity

What happens: individual employees try ChatGPT/Claude/local LLMs on their own tasks. No system, no budget, no approvals.
Purpose of the stage: permit and encourage. Make it so this stops being “guerrilla” use.
Tooling: whatever employees already use.
Cultural shift: leadership says out loud — “trying things is fine, sharing what you find is good, fearing replacement is not the point.”
Interim outcome: you know who in the company already knows how to work with AI. That’s your future talent pool for the test team.
How long it takes: a few weeks of observation.

A contextual note. By 2026, individual AI use among senior leaders and a large share of employees is no longer “leading-edge practice” but the market’s baseline state. Surveys from late 2025 and early 2026 show that nearly all C-level executives use generative AI for work tasks in some form, and most employees save a few hours a week this way. Which means Stage 0 in your company is most likely already underway on its own — the only question is whether you notice it and use that knowledge. And one more thing: a successful Stage 0 doesn’t mean the company is ready for Stage 1. Individual benefit and corporate payback are different things (see the paradox at the start of the article).

Stage 1. The first pilot

What happens: we pick one process, assemble a test team, build a sandbox, take it all the way to launch in one department. Details on this phase are in Part 4.
Purpose of the stage: prove, on a single case, that AI can deliver a measurable result in your company. Not “in the industry,” not “at some big bank,” but at your place, on your data, in your routine.
Tooling: one or two AI-model subscriptions, one low-code automation platform, a data export into the sandbox.
Cultural shift: a ritual appears — “before doing a new process by hand, ask whether it can be automated.” Someone is now responsible for this.
Interim outcome: one working process with a before/after measurement, one trained person inside, an understanding of the real cost and timeline for your company.
How long it takes: from a few weeks (a ready cloud tool on a standard task) to 2–3 months (a custom solution with integration).

Stage 2. The second and third processes. Accumulating experience

What happens: using the first pilot’s experience, we launch 2–3 parallel processes in different departments. Each one is shorter and cheaper than the first.
Purpose of the stage: check whether the success repeats. One working process can be luck. Three is a method.
Tooling: it expands. API access appears, a shared “prompt library,” internal documentation on “how we do an AI project.”
Cultural shift: employees from different departments come to you with ideas of their own: “we’ve got this routine over here, can we take a look?” Once there’s a queue from below, Stage 2 is basically done.
Interim outcome: 3–4 working processes, an accumulated library of standard solutions, several people who know how to work with this.

Stage 3. Systematization

What happens: AI stops being a separate “experiment” and becomes an ordinary tool. Standards appear: how to assess new initiatives, how to calculate payback, how to ensure data security, who approves access to models.
Purpose of the stage: remove the chaos. So an AI project doesn’t depend on one person’s enthusiasm but runs through a clear process.
Tooling: an internal registry of AI initiatives (what’s in progress, what’s queued, what was rejected and why), a standard payback-calculation method (Part 3), a security policy.
Cultural shift: “AI project” stops being a special phrase. It’s just a project. With its own stages, budget, and KPIs.
Interim outcome: a steady flow of new pilots, low dependence on specific individuals, an understanding of the AI program’s overall economics.
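One way to picture the registry mentioned under Tooling is a small record per initiative. This is only a sketch: the field names, the `Status` values, and the sample entries are assumptions made for illustration, and a shared spreadsheet with the same columns would do the same job.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    QUEUED = "queued"
    IN_PROGRESS = "in progress"
    REJECTED = "rejected"

@dataclass
class Initiative:
    name: str
    owner: str
    status: Status
    main_metric: str
    rejection_reason: str = ""

    def __post_init__(self) -> None:
        # "Rejected and why": a rejection without a reason is not accepted.
        if self.status is Status.REJECTED and not self.rejection_reason:
            raise ValueError("a rejected initiative needs a recorded reason")

def by_status(registry: list[Initiative], status: Status) -> list[str]:
    """Names of all initiatives in the given status."""
    return [item.name for item in registry if item.status is status]

# Hypothetical entries.
registry = [
    Initiative("support reply drafts", "support lead", Status.IN_PROGRESS,
               "average handling time"),
    Initiative("call summaries", "sales lead", Status.QUEUED,
               "minutes per call"),
    Initiative("contract review", "legal", Status.REJECTED,
               "hours per contract", rejection_reason="error risk too high"),
]

print(by_status(registry, Status.REJECTED))
```

Forcing a reason on every rejected entry keeps the "rejected and why" column from going empty over time.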

Stage 4. AI at the core of product and management

What happens: the company starts using AI not just for internal processes but in the product, in decision-making, in strategy. Only a minority reach this stage, and the ones that do get there honestly, with Stages 1–3 behind them.
Purpose of the stage: build an advantage that’s hard to replicate without AI.
Tooling: possibly your own models, fine-tuning, local infrastructure (Part 6), integration with key business systems.
Cultural shift: AI becomes part of how the company thinks about itself.

Over a two-to-three-year horizon, most companies make it to Stage 2–3. That’s normal. At Stage 3 the real economics are already in play. Jumping over stages is the main way to lose money.

Part 3. How to measure effectiveness

This is the most important part, because this is where everything else usually breaks. You can pick the right process, assemble the right team, build the sandbox — and still, six months later, have no idea whether the pilot delivered any effect at all. Simply because you never measured how it was BEFORE.

One important fork up front, because it determines everything downstream.

Two different scenarios — two different economics

Scenario A. A ready cloud tool on a standard task. A ChatGPT/Claude/local-LLM subscription, or a ready service like a call transcriber, or AI features already built into your CRM. Adoption is “give employees access + agree on how to use it + measure.” No development, no integrations, no infrastructure of your own. Cost starts at tens of dollars a month. Launch in days or weeks. The payoff can come fast.

Scenario B. A custom solution with integration. A pipeline in Make/n8n wired into the CRM, or your own tool on a model’s API with scaffolding, or a local deployment. Here you add time for development, testing, run-in, and training people. Cost runs to tens or hundreds of thousands of rubles up front plus monthly upkeep. Launch in weeks and months. The payoff, accordingly, takes longer.

These are two different kinds of project with different economics. Measure them with the same ruler and either the first looks suspiciously fast (“that can’t be real”) or the second looks like a failure (“where’s the result after a month”). When we get to timelines and thresholds, we’ll keep them separate.

When the payoff can come fast

Before we get into the measurement method, it’s worth being clear about this upfront: there are scenarios where a short payback period is the norm, not a reason to doubt the math. If you don’t say so out loud, a lot of pilots stall at “too good to be true.”

  • A ready cloud tool on a standard task. A call transcriber that slotted into the sales team’s workflow. ChatGPT for draft support replies. An AI feature in the CRM that’s already there and just got switched on. Cost: tens of dollars a month per user. If the task is high-volume and regular, payback can land in the first weeks of use.
  • A high-volume, repeatable process. The higher the volume, the faster the savings pile up in absolute money at the same tool cost. A thousand transcriptions a month saving 10 minutes each is roughly 167 hours freed up, which is already serious economics, all the more so with inference costing next to nothing.
  • A pilot with no integration into production systems. If at the start you don’t integrate the AI into the CRM / billing / accounting, but just drop drafts into email or a shared folder, you save weeks and months on development. That changes the economics of the pilot. Integration is the next stage, once it’s clear the tool is useful.
  • Cheapening inference. Token costs at major providers have fallen hundredfold over the last two or three years and keep falling — roughly 5–10x a year for frontier models, faster still for some task classes. If you ran a “won’t pay off” calculation a year ago, redo it at today’s prices; the answer may have changed.
  • Tools that remove a specific bottleneck. If inquiries sit unprocessed in your sales funnel for 4 hours while competitors reply in 15 minutes, even a simple AI helper that immediately sends a first reply and a qualification will pay for itself with the first few rescued deals.

The general logic: the less development and integration in the project, the higher the volume, and the more critical the speed, the shorter the payback. And if the calculation comes out implausibly fast, check not the payback figure but the completeness of the costs: did you count the team’s time for launch, training, upkeep, and periodic retuning?

Step 1. A baseline measurement before the pilot

The most common mistake is starting a pilot with no baseline. Two months in, there’s nothing to show the improvement against. “Feels like it got better” is not an argument for a budget.

What to measure BEFORE AI enters the process:

  • Time per operation. Stopwatch it. The employee logs it in a simple table: date, type of inquiry, start time, end time. A plain Google Sheet does the job; no automation needed yet.
  • Volume over a period. How many requests/documents/calls pass through the process per day and per week. If volume swings a lot, calculate the mean and the spread.
  • Error and redo rate. What counts as an error in this process, and how many there are per week. If no one counts them now, then the first task isn’t AI — it’s introducing error tracking for at least a month.
  • The fully loaded cost of an employee-hour. Not “take-home pay” but loaded: salary + taxes + insurance + workspace rent + software + equipment depreciation. In Russia this runs 1.5–2x take-home pay. If accounting doesn’t have a ready figure, calculate it yourself — don’t skip this step. Without it, the time saved never turns into money.
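
The four measurements above can be pulled out of the same simple table. A minimal sketch, assuming the log has been exported as rows of date, inquiry type, start time, and end time. All rows, the take-home pay of 100,000, the 1.75 load factor, and the 160 working hours a month are made-up illustration values, not benchmarks.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, stdev

# Hypothetical rows from the timing sheet: date, inquiry type, start, end.
LOG = [
    ("2026-05-04", "refund", "09:00", "09:25"),
    ("2026-05-04", "delivery", "09:40", "10:00"),
    ("2026-05-05", "refund", "11:10", "11:40"),
    ("2026-05-05", "delivery", "14:00", "14:25"),
]

def minutes_spent(start: str, end: str) -> float:
    """Length of one operation in minutes (start and end on the same day)."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

def loaded_hourly_cost(take_home: float, load_factor: float, hours: float) -> float:
    """Monthly take-home pay scaled by a load factor, per working hour."""
    return take_home * load_factor / hours

durations = [minutes_spent(start, end) for _, _, start, end in LOG]
per_day = Counter(date for date, _, _, _ in LOG)

print(f"mean time per operation: {mean(durations):.1f} min")
print(f"spread (std dev): {stdev(durations):.1f} min")
print(f"operations per day: {dict(per_day)}")
print(f"loaded hourly cost: {loaded_hourly_cost(100_000, 1.75, 160):.2f}")
```

The same `durations` list, collected again after launch, is what the main metric gets compared against.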

How long the measurement lasts. It depends on the process volume.

  • If 50+ operations pass through per day, two weeks is plenty.
  • If 5–10 operations a day, two weeks is enough to get the basic picture.
  • If there are few operations (a handful a week), don’t stretch the measurement over months. Better to log the timing of 10–15 typical cases and move on, knowing the statistics are weak.

The main rule isn’t the number of cases but how representative it is: the measurement should include all the main operation types, including the rare and complex ones. Seasonality and peak loads are a separate conversation — best accounted for, or at least recorded as context.

While the measurement is running — no changes to the process. Don’t change the procedure, don’t introduce new tools, don’t reshuffle people. Otherwise, a month later, you won’t be able to tell what affected what.

Step 2. Name one main metric

A pilot must have one headline number on which the “continue or close” decision is made. Not three, not five. One.

Examples:

  • Average handling time per inquiry dropped by X%
  • The share of inquiries answered within 15 minutes rose from Y% to Z%
  • The cost of processing one document fell from N rubles to M rubles
  • Inquiry-to-deal conversion rose by P points

This metric must be:

  • Tied to money (either directly or through an obvious chain: time → money, response speed → conversion → revenue)
  • Measurable in the same units as the baseline (you can’t have “it was in hours, now it’s in gold stars”)
  • Paired with a predefined success threshold. For example: “we count the pilot a success if handling time drops by at least 20%, otherwise we close it.”

Fix the threshold BEFORE launch, not after the fact. Otherwise any number, in hindsight, will look like success.
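
To make "fixed before launch" concrete, the threshold can be written down as data before the pilot starts and only evaluated afterward. A small sketch under assumptions: the `PilotGoal` name, the 25-minute baseline, and the measured values are illustrative; the 20% threshold mirrors the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the goal cannot be edited after it is set
class PilotGoal:
    metric: str
    baseline: float
    min_improvement: float  # 0.20 means "at least 20% below the baseline"

    def improvement(self, measured: float) -> float:
        """Relative drop of the measured value against the baseline."""
        return (self.baseline - measured) / self.baseline

    def decide(self, measured: float) -> str:
        if self.improvement(measured) >= self.min_improvement:
            return "continue"
        return "close"

# Written down before launch (hypothetical numbers).
goal = PilotGoal("average handling time, minutes",
                 baseline=25.0, min_improvement=0.20)

print(goal.decide(19.0))  # 24% drop
print(goal.decide(21.0))  # 16% drop
```

This version assumes a metric where lower is better. For a metric that should rise, such as conversion, the sign of `improvement` would flip.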

Plus 2–3 diagnostic metrics, to understand what’s going on:

  • Answer quality (e.g., the percentage of answers the employee accepted without edits)
  • Real usage — are employees actually using the tool, and on what share of the work
  • Team satisfaction (a short weekly survey, scored 1–5)

A separate thought on “real usage.” People often cite the percentage of employees actively working with AI. That number by itself means little. If 5 of 20 people use it actively but they handle 80% of the volume, that’s not a failure; it’s an ordinary skew, where a few people carry most of the work. Don’t look at the percentage of people, look at the percentage of work that went through the AI. If that percentage is small, it’s worth figuring out why: it doesn’t fit the task, it’s inconvenient, they don’t trust it, they weren’t trained. That’s a diagnostic signal, not a death sentence for the pilot.
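
The difference between the two percentages is easy to see in numbers. A sketch with invented weekly counts for five employees. The rule that someone counts as an "active user" when at least half of their own operations go through the AI is an assumption made only for this example.

```python
# Hypothetical week: employee -> (operations handled, of which went through AI).
weekly = {
    "a": (300, 270),
    "b": (250, 230),
    "c": (60, 0),
    "d": (50, 5),
    "e": (40, 0),
}

def share_of_people(data: dict[str, tuple[int, int]], active_from: float = 0.5) -> float:
    """Fraction of employees who route at least `active_from` of their work via AI."""
    active = sum(1 for total, via_ai in data.values() if via_ai / total >= active_from)
    return active / len(data)

def share_of_work(data: dict[str, tuple[int, int]]) -> float:
    """Fraction of all operations that went through the AI."""
    total = sum(t for t, _ in data.values())
    via_ai = sum(a for _, a in data.values())
    return via_ai / total

print(f"people using AI actively: {share_of_people(weekly):.0%}")
print(f"work going through AI:    {share_of_work(weekly):.0%}")
```

In this made-up week only two of five people are active, yet most of the volume passes through the tool, which is the case the paragraph above describes.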

Step 3. A pilot with a control group — if you can

This is the step that gets skipped most. Without a control group, it’s harder afterward to prove that the improvement is the AI’s doing and not something else (the season, a new manager, a traffic shift).

The simplest version: a 10-person department. Five work with AI help, five keep working the old way. After a month you compare their metrics against each other.

More advanced versions:

  • A/B test: incoming inquiries are split randomly in half. Half go through AI, half through the regular process.
  • Staged rollout: the first branch/department this month, the second next month. You compare them against each other, and each against itself over time.
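
The A/B version can be sketched in a few lines: split the inquiries at random, then compare average handling time between the halves. The inquiry IDs, the fixed seed, and the timings below are placeholders. A real comparison would also need enough cases for the difference to mean something.

```python
import random
from statistics import mean

def assign_groups(inquiry_ids, seed: int = 42):
    """Randomly split inquiries into (control, ai) halves."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    ids = list(inquiry_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])

def relative_change(control_minutes, ai_minutes) -> float:
    """How much lower the AI group's mean time is, as a share of control."""
    control_avg = mean(control_minutes)
    ai_avg = mean(ai_minutes)
    return (control_avg - ai_avg) / control_avg

control, ai = assign_groups(range(1, 101))
print(len(control), len(ai))

# Hypothetical handling times in minutes, collected after the pilot month.
change = relative_change([25, 30, 20], [10, 15, 11])
print(f"AI group is {change:.0%} faster than control")
```

Random assignment is the point of the exercise: if managers choose which inquiries go to the AI, the easy ones tend to end up there and the comparison stops being fair.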

If a control group is impossible (a two-person department, an indivisible inquiry flow, or simply no resources for parallel tracking) — that doesn’t mean you shouldn’t launch AI. It’s a reason to honestly record the measurement’s limitations and compensate with something else: log the external context (what else was going on in the company in parallel — any staff changes, new products, load shifts), compare several before-and-after periods, look at the dynamics within the pilot (week one vs. week four). The measurement will be less clean — that’s normal for a small business. Not perfect, but you can still see a workable result.

Step 4. Measurement rhythm and converting to money

The rhythm depends on which scenario you’re in.

For Scenario A (a ready cloud tool). The timeline is shorter.

  • End of week 2: you can already see whether people use it and whether it helps. If, on a standard task, no one has saved any time after two weeks of use, it’s most likely the wrong task or the wrong tool. You can close it or switch.
  • Month 1: a first estimate of savings in hours and an attempt to convert them to money.
  • Month 3: the final “scale / close / change approach” decision.

For Scenario B (a custom solution). The cycle is longer.

  • Month 1: measure whether people use it (see the diagnostics in Step 2), the percentage of AI answers that had to be redone (quality), the number of errors in the output. At this stage there may be no savings yet — people are still learning.
  • Month 3: the main metric against the baseline. Compare with the control group or with yourself before the pilot. By now you can see whether it works.
  • Month 6: calculate the real economics. Not “potential” but the actual hours saved × fully loaded hourly cost, minus all the real expenses. This is the first point where you can say “it paid off” or “it didn’t.”
  • Month 12: compare actual ROI against the plan. The recorded gap (if any) is the most valuable knowledge for the next pilots.

These timelines are guidelines, not dogma. If the process is high-volume and the tool is ready, you can run all the cycles faster. If volume is low, you need to wait for the statistics to accumulate.

The base formula for converting to money:

Monthly savings = (Minutes saved per operation ÷ 60) × Fully loaded hourly cost × Operations per month

An example (numbers are illustrative, to show the mechanics):

  • Before the pilot: 25 minutes to process one inquiry
  • After: 10 minutes on the same inquiry
  • Savings: 15 minutes per inquiry
  • Volume: 800 inquiries a month
  • Savings in hours: 15 × 800 / 60 = 200 hours a month
  • Savings in money: 200 × your fully loaded hourly cost
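
The same example as a small function, following the base formula above. The loaded hourly cost of 1,000 is a placeholder so the code prints a number; substitute your own figure.

```python
def monthly_savings(minutes_saved_per_op: float,
                    loaded_hourly_cost: float,
                    ops_per_month: float) -> float:
    """(Minutes saved per operation / 60) x loaded hourly cost x operations per month."""
    return minutes_saved_per_op / 60 * loaded_hourly_cost * ops_per_month

minutes_saved = 25 - 10                 # before minus after, per inquiry
hours_saved = minutes_saved * 800 / 60  # 800 inquiries a month

HOURLY_COST = 1_000  # placeholder, not a benchmark
savings = monthly_savings(minutes_saved, HOURLY_COST, 800)

print(f"hours saved per month: {hours_saved:.0f}")
print(f"money saved per month: {savings:,.0f}")
```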

From there, by the same logic, you add savings on errors (number of errors prevented × average cost of one error) and, if any, the revenue uplift from speed. From that you subtract the costs: subscription / monthly API cost + amortized development cost (spread the one-time cost over 12 or 24 months) + upkeep (team hours for support).

Net monthly benefit = savings − costs.

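Payback period = one-time launch cost ÷ net monthly benefit. For this division, take the net monthly benefit before subtracting the amortized development cost; otherwise the one-time cost is counted twice.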
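
A sketch of the last two formulas with invented numbers. One assumption here goes beyond the text: when computing the payback period, the one-time cost is kept out of the monthly costs so that it is not counted twice. The amortized view is shown separately for the monthly picture.

```python
def net_monthly_benefit(savings: float, subscription: float, upkeep: float,
                        amortized_dev: float = 0.0) -> float:
    """Monthly savings minus recurring costs (and amortized development, if given)."""
    return savings - subscription - upkeep - amortized_dev

def payback_months(one_time_cost: float, net_benefit: float):
    """Months to recover the launch cost, or None if the pilot never pays back."""
    if net_benefit <= 0:
        return None
    return one_time_cost / net_benefit

# Hypothetical figures, all in the same currency.
SAVINGS, SUBSCRIPTION, UPKEEP, LAUNCH = 200_000, 20_000, 30_000, 300_000

cash_view = net_monthly_benefit(SAVINGS, SUBSCRIPTION, UPKEEP)
amortized_view = net_monthly_benefit(SAVINGS, SUBSCRIPTION, UPKEEP,
                                     amortized_dev=LAUNCH / 12)

print(f"net benefit, launch cost excluded:        {cash_view:,.0f}")
print(f"net benefit, launch spread over 12 months: {amortized_view:,.0f}")
print(f"payback: {payback_months(LAUNCH, cash_view):.1f} months")
```

Returning `None` for a non-positive net benefit makes the "never pays back" case explicit instead of producing a negative number of months.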

Tracing it into the P&L

Any savings figure must point to a specific line in a financial statement or an operational metric that changed. If you say “we saved 200 hours a month,” there has to be an answer to: what happened to those hours?

Direct financial effects (visible in the P&L right away):

  • “Six months in, we didn’t make the two planned hires.” That is a reduction in planned headcount costs.
  • “We cut one position in the department” — a direct payroll reduction.
  • “We redirected the freed-up time to work we’d never had time for, and it produced revenue uplift X” — a revenue increase.

Indirect effects (visible in operational metrics; they show up in the P&L a quarter or two later):

  • More clients handled per manager, the department’s revenue grows. Here you need to wait for it to accumulate and verify the growth is really tied to the freed-up time and not the season or marketing.
  • Reduced load, employees burn out less and quit less. The effect is real, but through payroll it’s only visible once the cost of hiring and onboarding replacements falls.

What you’re better off not counting:

  • “Employees got more done, efficiency went up.” If “more” didn’t turn into revenue, a payroll reduction, or a higher number of handled inquiries, that’s not an effect; it’s a feeling.

This discipline is unpleasant, but it’s exactly what separates companies where AI pays off from those where it “sort of does something.”

When not to start

  • If the process can’t be measured without AI. First learn to count.
  • If a baseline can’t be fixed because the process “keeps changing.” First stabilize the process.
  • If the economics on paper come out razor-thin. In practice they’ll sag — in reality there are always costs you didn’t account for: training, upkeep, prompt tweaks, downtime during run-in, periodic quality reviews.
  • If it’s unclear what to do with people’s freed-up time. Without an answer to that, savings turn into idle time.
  • If there’s no one to own the measurements. Without a metrics owner, they don’t get collected.

Benchmarks from practice

A few numbers that show up in cases regularly enough to lean on:

  • Realistic time savings on standard text tasks (classifying emails, drafting replies, summarizing calls) — from 30% to 60% of the time per operation. On high-volume, well-described tasks, once the tool is built into the routine (support, handling standard inquiries, call transcription), the upper bound shifts noticeably higher — you see steady figures of 50–80%, and at major support vendors the share of inquiries closed by AI with no human involvement reaches tens of percent. If your calculation comes out at “almost everything” on complex, non-standard tasks, you’re most likely overestimating, and you’ll be disappointed later.
  • Error rates drop 3–5x on classification and checking tasks versus manual work.
  • Hidden costs (training, process changes, security, legal review) inflate the budget noticeably — build in a buffer from the start. (On cheapening inference, see the “when the payoff can come fast” section above.)

Part 4. The first pilot in practice

This part expands on what Part 2 called Stage 1. Here’s how to pick the process, build the sandbox, and who to staff it with inside the company.

4.1. Where to start

The most common mistake is starting with the most important thing. Better to start with the most boring.

Pick the first process using five criteria. The fifth one matters most.

5. Describable. It’s listed last, but it isn’t the least important point: skipping it is the number-one cause of failure. You need to understand exactly what the person currently doing this process does. Which steps, in what order, by what signals they make decisions. If, asked “how do you do this?”, the employee answers “well, by feel” or “by experience,” that’s a signal. It doesn’t mean you can’t automate it. It means the first stage isn’t AI; it’s describing the process.

It often turns out that half the “process” lives in one person’s head: they look at an email and in half a second decide who to forward it to. Ask why and they can’t explain. AI can’t reproduce that “can’t explain” until you’ve broken it down into signals yourself. So before the sandbox, a short step helps: sit the process owner down and work through 20–30 typical cases out loud — what they see, what they decide on, what they do.

A side effect: even if the AI pilot ultimately doesn’t take off, you’re left with a documented process. That’s value in itself — the company stops depending on one person, a new hire is easier to onboard, and any future automation attempt has the description ready.

The other four criteria:

  1. Repeatable. Done every day or every week to a more-or-less identical script. Not one-off analytics; AI won’t cover that anyway.
  2. Measurable. It’s clear how many hours it takes and what volume passes through. If it’s unclear, it’s not a process yet — it’s a feeling.
  3. Low error risk. If the AI makes a mistake, the loss isn’t catastrophic. Not financial reporting, not medical diagnoses, not legal opinions for court. But draft emails, inquiry classification, call transcription, initial résumé screening.
  4. A baseline exists. It’s known how the process works now and how long it takes. Without this you’ll have nothing to prove the AI improved anything.

Most often a good fit for a first launch:

  • Handling standard client inquiries (support, FAQ)
  • Transcribing and summarizing managers’ calls
  • Initial résumé screening and drafting interview questions
  • Drafting commercial proposals from a standard template
  • Building reports from the CRM and spreadsheets in natural language

Most often not worth taking first:

  • Finance and accounting
  • Legal documents
  • Strategic planning
  • Creative marketing (brand communications)
  • Anything that requires context across the whole company at once

An illustrative case (anonymized). A services company, ~30 employees; the sales team complains there’s “no time.” The measurement: an average manager spends ~40 minutes a day transcribing and summarizing calls and another ~50 minutes on draft proposals for standard requests. They rolled out a ready transcription service + ChatGPT with pre-tuned prompts for the proposal draft. After three weeks of use, savings of about an hour a day per manager — five working hours a week each. No development, no CRM integration at the start — just two ready tools and documented rules for when to use them. CRM integration came four months later, once it was clear the tool had stuck. This is the classic Scenario A pilot profile: a cheap start, a fast payoff, complexity added as the habit settles in.

4.2. The sandbox

A sandbox is a small fenced-off space where you try AI on real tasks but without the risk of breaking anything in production systems.

The minimum set:

  1. One paid account: ChatGPT Plus, Claude Pro, or a premium plan for a local LLM.
  2. An account on a no-code automation platform: Make, n8n (self-hostable for free), Lindy, Albato.
  3. API access to one of the language models with a small starter deposit.
  4. A test copy of the data — a CRM export to Google Sheets, a month’s worth of inquiries, document templates. No production integrations yet.

The procedure:

  • Week 1. The test team takes one process and runs it through ChatGPT/Claude by hand. No automation. They just copy the email, ask, get an answer, assess the quality. They do this a few dozen times across different cases.
  • Week 2. They gather statistics. In how many cases was the answer immediately usable? In how many did it need a rewrite? In how many was it rejected? If fewer than half are usable, they tune the prompt, give examples, add company context.
  • Weeks 3–4. Once quality is stable, they assemble it into a simple pipeline via Make or n8n. An email comes in — goes to the AI — the answer lands in drafts, a human reviews and sends.
  • Month 2. Launch into live work for one or two employees. Not the whole department. Watch for a month, measure the savings, log the bugs.
  • Month 3. If it works, scale to the whole department. If not, honestly close the experiment. That’s normal. Some pilots will close this way.
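
The Week 2 tally can be as simple as counting verdicts. A sketch with invented results for 40 hand-run cases. The three verdict labels follow the questions above. Treating "half or more usable" as the signal to move on to the pipeline is a simplification, since the text only says what to do below that mark.

```python
from collections import Counter

VERDICTS = ("usable", "rewrite", "rejected")

def review_stats(verdicts: list[str]) -> dict[str, float]:
    """Share of answers in each verdict category."""
    counts = Counter(verdicts)
    total = len(verdicts)
    return {name: counts[name] / total for name in VERDICTS}

def next_step(stats: dict[str, float]) -> str:
    # Fewer than half immediately usable: keep tuning before automating.
    if stats["usable"] < 0.5:
        return "tune the prompt, add examples and company context"
    return "quality looks workable, move on to the pipeline"

# Hypothetical log of 40 cases run by hand in Week 1.
log = ["usable"] * 18 + ["rewrite"] * 14 + ["rejected"] * 8

stats = review_stats(log)
print({name: f"{share:.0%}" for name, share in stats.items()})
print(next_step(stats))
```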

A sandbox isn’t “a separate innovation team off tinkering somewhere with no contact with the business.” It works on the real tasks of a real department, just without access to production systems. Otherwise it turns into a research club that ships nothing.

4.3. Who to assemble at the start

A common mistake is handing AI experiments to the IT department or hiring a separate “Head of AI.”

Why not the IT folks. They’ll start with infrastructure — which model to deploy, which vector search, which database. They’ll spend months on the platform. They usually don’t know and don’t love the business process. The output is a pretty thing nobody uses.

Why not a dedicated “Head of AI.” They’ll immediately start building a grand vision, doing strategy, hiring a team. The effect comes in a year, if you’re lucky. The money is needed right now.

Who you need at the start: 2–3 people inside the company. This isn’t a “team” in the sense of dedicated headcount; it’s a small working group where each person spends about 20–30% of their time on it:

  1. One employee from the department whose process you’re automating. Not the manager — a frontline person who does this work by hand every day. They know all the nuances, the exceptions, the “oh, and there’s also this.” Without them you can’t tune the AI.
  2. One person with a technical bent — not necessarily a programmer. A marketer good with Excel formulas and Google Sheets, or a manager who’s built integrations themselves. The key thing: they’re not afraid to dig into Make/n8n and an API.
  3. One person who makes decisions. Someone from leadership who can say, in a single conversation, “yes, keep going” or “no, we’re closing it.” Without this the group gets stuck in approvals.

Where to find them internally:

  • Who among the employees already uses ChatGPT for work (the very people from Stage 0).
  • Who complains about routine the most. They’re motivated to kill it.
  • Who picks up new tools fast.

Hiring externally at the first stage usually makes no sense — an outside person doesn’t know the business and will spend the first months adapting.

Part 5. Cultural changes

AI inside a company isn’t only technology — it’s a new habit too. Install the technology but don’t grow the habit, and within six months it all rolls back. What’s worth growing alongside the pilots:

  • The habit of measuring. Before AI, a lot of processes just ran “somehow.” Nobody counted the hours. With AI that becomes impossible — you need a baseline measurement, or it’s unclear whether there was any effect. This skill stays in the company even after the specific project.
  • The habit of describing. In parallel with the habit of measuring. Before AI, a lot lived in specific people’s heads — and that wasn’t a problem. With AI it becomes important to be able to break down “how I do this” into steps and signals. It’s a useful skill in its own right: dependence on irreplaceable people goes away, new hires are easier to bring up to speed.
  • The habit of trying and closing. Before: “we started a project, we’re obligated to finish it.” With AI: no. Some pilots get closed because the numbers didn’t add up. That’s not a failure, it’s normal hygiene. Without this culture, a company starts dragging a dozen dead AI projects along “because we invested.”
  • The habit of sharing internally. If one department finds a good prompt or builds a working pipeline, it should quickly become available to the rest. A shared channel, a prompt library, regular short demos of “here’s what’s new.”
  • An attitude toward errors. AI makes mistakes. Not “can make”: it is guaranteed to make them in some share of cases. You have to live with that and design processes accordingly, not pretend AI is “the right answer out of a machine.” The employee’s role isn’t “hand the question to the AI,” it’s “review and make the decision.”
  • An open conversation about layoffs. If people fear AI will replace them, they sabotage. If they know the freed-up time will go to work they never had time for, they help. The conversation has to be sincere. If you’re planning layoffs, better to tell the truth in advance and help people retool.
  • A clear right to AI. There must be a clear answer to “am I allowed to use this?”, “what can I send, what can’t I?”, “who do I go to with an idea?” Without clear rules, half the people try nothing, just in case.

Part 6. Cloud or on-premises

This fork doesn’t appear right away. At Stage 1, the cloud is usually enough. A company reaches the “maybe it’s time to run our own?” question later — usually at Stage 3, once there are several working processes, a clear volume, and real API costs.

Path 1: Cloud

You pay per API request (OpenAI, Anthropic, domestic providers) or subscribe to a ready-made service.

  • Pros: fast to launch, no infrastructure needed, always the latest model, you pay for actual use.
  • Cons: expensive at high volume, data goes to someone else’s server, dependence on the provider.
  • When to choose it: at the start, when testing hypotheses, at moderate volumes, when the data isn’t critical.

Path 2: A local solution

Your own server with an open model (Llama, Qwen, DeepSeek, open models from domestic providers).

  • Pros: data never leaves the perimeter, much cheaper at high volume, no dependence on external providers.
  • Cons: significant up-front hardware investment, you need an engineer for upkeep, open models are weaker than commercial ones on hard tasks, launch takes months.
  • When to choose it: when volume is steadily high, or when the data is critical (medicine, law, banking, defense), or when independence from external services is strategically important.

A simple rule. Estimate the monthly cloud cost at your volumes and multiply it by 18 months. Compare that with the cost of your own server plus an engineer’s salary over the same 18 months. If the cloud comes out much more expensive over that horizon, it’s time to think about local. If not, stay in the cloud.
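The rule above is plain arithmetic, so it fits in a few lines. Below is a minimal sketch in Python. Every number in the example is made up for illustration, and the 1.5× threshold for “much more expensive” is my assumption, not a figure from the rule itself; pick your own.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    cloud_monthly: float      # estimated monthly cloud/API bill at your volumes
    server_upfront: float     # one-time hardware purchase for your own server
    engineer_monthly: float   # monthly cost of the engineer who maintains it
    horizon_months: int = 18  # comparison window from the rule


def compare_costs(c: CostInputs) -> dict:
    """Total cost of each path over the same horizon."""
    cloud_total = c.cloud_monthly * c.horizon_months
    local_total = c.server_upfront + c.engineer_monthly * c.horizon_months
    return {
        "cloud_total": cloud_total,
        "local_total": local_total,
        "ratio": cloud_total / local_total,
    }


def recommend(c: CostInputs, threshold: float = 1.5) -> str:
    """threshold is an assumed cut-off for 'much more expensive'."""
    ratio = compare_costs(c)["ratio"]
    return "consider local" if ratio >= threshold else "stay in cloud"


# Illustrative numbers only, in any one currency.
small = CostInputs(cloud_monthly=2_000, server_upfront=30_000, engineer_monthly=5_000)
print(compare_costs(small), recommend(small))
```

With these made-up inputs the cloud costs 36,000 over 18 months against 120,000 for the local path, so the sketch says to stay in the cloud. The engineer’s salary usually dominates the local total, which is why the cloud bill has to be large before the comparison flips.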

Most small and mid-sized businesses never reach the volumes at which local pays off. That’s normal.

Part 7. Pitfalls

I’m not claiming completeness. This is what, in my practice, most often actually sinks projects.

  1. Sabotage out of fear. If people fear replacement, they don’t give honest feedback, they don’t test, they deliberately hunt for bad examples. Cured by an open conversation (see Part 5) and reinforced by action.
  2. Data quality. AI runs on your data — if the CRM is garbage, the output is garbage. Before adopting, it’s worth checking the state of the data for the process you’re automating. If there’s no data, or it’s a total mess — get the data in order first, AI second.
  3. A contractor selling strategy instead of tactics. The typical story: a company that doesn’t yet have a single working pilot gets pitched a “year-long AI strategy” — several million rubles, decks, roadmaps, nothing working at the end. A useful filter: if a contractor offers a “strategy” before “one specific process we’ll launch in two months,” it’s probably the wrong contractor. Strategy is Stage 3, when there’s something to generalize. Before that, you need tactical pilots.
  4. Bus-factor of one. Often the whole AI project rests on one enthusiast who mastered Make/n8n and writes the prompts. They leave and the project stalls within a month. Cured by keeping the pipelines and prompts in a shared, accessible place and documented in plain language from the very start, with at least two people on the team who understand how it works.
  5. Tech debt from “built it fast and forgot it.” The pipeline was thrown together to test a hypothesis faster, and that’s fine for the sandbox. But if that same pipeline goes into production without a recheck and documentation, then six months later no one remembers why that particular prompt is there, what that node in Make does, or why this extra check is needed. When something breaks, there’s no one to fix it. Cured by separating “rough pilot” from “production version”: once the sandbox shows a result, the working pipeline is rebuilt cleanly and with documentation.
  6. Hype instead of a task. The team gets carried away with the technology and forgets the task. “Let’s try a multi-agent system,” “let’s deploy our own model,” “let’s build a RAG.” Bring it back to the question — what specific business task are we solving right now? If a beautiful technology doesn’t solve the task, shelve it.
  7. Data security. When employees start mass-sending client databases, contracts, and financial data into ChatGPT, that’s a leak with real consequences. Cured by explicit rules on what can go to the cloud and what can’t. Whatever can’t go is either anonymized first or run through a local model.
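To make the “anonymize it” option from the last point concrete, here is a deliberately simple sketch in Python that masks email addresses and phone numbers before a text goes to a cloud model. The patterns and the `redact` function are illustrative assumptions, not a complete solution: regular expressions like these won’t catch names, addresses, or contract numbers, and real anonymization needs a proper review of what your data contains.

```python
import re

# Simplified patterns for illustration; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


message = "Write to ivan@example.com or call +1 (555) 123-4567 today"
print(redact(message))
```

The point of the sketch is where the step sits in the process: redaction happens before the request leaves the company, so the rule “what can’t go to the cloud” is enforced by the pipeline and not left to each employee’s memory.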

Instead of a conclusion

The point isn’t to “adopt AI” but to learn to measure processes, describe them, and make decisions by the numbers. AI is just a good excuse to learn this earlier than your competitors do.

Every other rule in this article follows from that one idea. The maturity stages, the sandbox, the control group, tracing into the P&L, describing the process before automating — it’s all about one thing: stop going by feel and start seeing what actually works in the company.

And one more thing, about money. You can’t afford not to spend, but that doesn’t mean you have to spend a lot. A first pilot with a ready-made cloud tool on a standard task can cost tens of dollars a month and pay off in weeks. A custom solution with integrations costs noticeably more and pays off more slowly. If a contractor quotes big numbers and multi-month timelines for a task you could close with a ready-made tool, they’re selling you consulting, not AI. Those are different products.