I let a small bot run every thirty minutes for a few weeks.

It was not trading real money. It was a paper trading system called Nova Trader, built to watch a small basket of crypto assets, pull market data, read news and Fear and Greed data, score signals, and keep a fake portfolio in SQLite.

Five assets. A $10,000 paper account. Small position sizes. A scheduler. A watchdog. A few databases. Nothing glamorous.

After 851 scheduled cycles, I audited the whole run.

The account was basically flat. It ended at $9,984.82, down 0.15 percent. The passive, equal-weight basket of the same five assets was down 11.78 percent over the same window. Buy-and-hold UNI was down 23.60 percent. ARB was down 13.66 percent. WETH was down 11.56 percent.

So the bot did something useful. It avoided most of a bad market.

That was not the interesting part.

The first failure was not the strategy

Out of 851 scheduled runs, 848 were fine. Three failed.

All three were timeouts.

The watchdog script hit a 120-second limit and the scheduler killed it. A manual run worked. The tests passed. The market data was still updating. The trading system was not dead.

The wrapper was impatient.

That is the kind of failure that does not show up in a demo. In a demo, the system runs once while everyone is watching. In operation, the system has to wake up again and again, wait on slow APIs, write state, report its own health, and keep enough evidence that you can reconstruct what happened later.

A lot of automation breaks in that boring middle space. Not because the model is stupid. Not because the strategy is wrong. Because the job wrapper, timeout, retry logic, state model, or reporting path is weaker than the thing it is supposed to supervise.

The bot did not fail first. The operating layer did.
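The fix for this kind of failure usually belongs in the wrapper, not the bot. Below is a minimal Python sketch of a cycle runner with three jobs. It times each step against an overall budget. It records errors instead of crashing. It reports a partial result instead of waiting to be killed mid-run. This is not Nova Trader’s code. The step names are hypothetical, and so is the 100-second budget, which was chosen to sit under the 120-second scheduler limit.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    name: str
    status: str            # "ok", "error", or "skipped"
    seconds: float = 0.0
    detail: str = ""


@dataclass
class CycleReport:
    steps: list[StepResult] = field(default_factory=list)

    @property
    def status(self) -> str:
        statuses = {s.status for s in self.steps}
        if statuses == {"ok"}:
            return "ok"
        if "ok" in statuses:
            return "partial"   # some work landed: report it, don't look dead
        return "failed"


def run_cycle(steps: list[tuple[str, Callable[[], None]]],
              budget_s: float,
              clock: Callable[[], float] = time.monotonic) -> CycleReport:
    """Run named steps in order inside an overall time budget.

    A step that is already running is not interrupted. The wrapper times each
    step, records errors instead of crashing, and stops starting new steps
    once the budget is spent, so it can report before a hard outer limit fires.
    """
    report = CycleReport()
    start = clock()
    for name, fn in steps:
        if clock() - start >= budget_s:
            report.steps.append(StepResult(name, "skipped", detail="budget exhausted"))
            continue
        t0 = clock()
        try:
            fn()
            report.steps.append(StepResult(name, "ok", clock() - t0))
        except Exception as exc:
            report.steps.append(StepResult(name, "error", clock() - t0, repr(exc)))
    return report


def slow_news() -> None:
    raise TimeoutError("news API did not answer")   # stand-in for a slow dependency


report = run_cycle(
    [("prices", lambda: None), ("news", slow_news), ("score", lambda: None)],
    budget_s=100,   # hypothetical: leaves headroom under a 120 s scheduler limit
)
print(report.status)  # partial
```

Reporting "partial" is the design choice that matters. A slow news API then shows up as one degraded step in an otherwise healthy cycle. Without it, the same slowness becomes a timeout that looks exactly like a dead system.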

Flat is not the same as healthy

The headline number looked decent. Being down 0.15 percent while the basket fell almost 12 percent is not bad.

But the trade distribution was ugly.

The backtest ended at $9,929.94, down 0.70 percent. It had a 44.44 percent win rate on closed trades. That alone does not bother me. A strategy can work with a low win rate if the winners are large enough.

These were not.

Average win: $6.57.

Average loss: $45.79.

That is the smell.

The system was good at avoiding some exposure, but bad at extracting clean upside once it entered a trade. The open positions were green at the audit point, which helped the paper account look better. The closed trades told a less flattering story.
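The arithmetic behind that smell is worth writing down. The sketch below uses only the three reported figures, with the average loss treated as a positive amount. It computes the expected result per closed trade and the win rate needed just to break even. This is arithmetic on the averages, not a recomputation from the trade log.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected P&L per closed trade. avg_loss is given as a positive amount."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss


def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which expectancy is exactly zero for this win/loss size."""
    return avg_loss / (avg_win + avg_loss)


# Figures reported by the audit
win_rate, avg_win, avg_loss = 0.4444, 6.57, 45.79

print(f"expectancy per closed trade: {expectancy(win_rate, avg_win, avg_loss):+.2f} USD")  # -22.52 USD
print(f"break-even win rate: {breakeven_win_rate(avg_win, avg_loss):.1%}")                 # 87.5%
print(f"payoff ratio: {avg_win / avg_loss:.2f}")                                          # 0.14
```

With this payoff shape, the strategy would have to win almost nine closed trades in ten just to break even. That is why the fix points at exits rather than entries.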

This is why I like boring audits. They stop you from worshipping the equity curve.

A strategy can beat passive exposure and still have bad trade mechanics. A system can look resilient because the market window was favorable to its restraint. A nearly flat account can hide the fact that the exits are doing too much damage.

If I had only checked the latest balance, I would have learned almost nothing.

The paper book and the backtest disagreed

The paper trading database showed 31 trades.

The backtest showed 23.

That gap matters more than the performance number.

A backtest is supposed to be a clean replay of historical signals. Paper trading is the messier thing that actually ran: processed signal logs, stop behavior, position state, duplicate handling, and whatever happened when the scheduler woke up every thirty minutes.

An eight-trade difference means the system does not yet have a single version of the truth.

Maybe the paper trader includes stop exits that the backtester does not model. Maybe some signals were processed differently. Maybe the backtest is too clean. Maybe the paper execution path is doing something the strategy description does not admit.

I do not want a theory here. I want a reconciliation file.

Same signals. Same account assumptions. Same execution rules. Every trade explained. If two ledgers disagree, the system should tell me exactly why.
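A reconciliation file does not need to be elaborate to be useful. Below is a minimal sketch. It assumes each trade in both ledgers can be keyed by asset, side, and the timestamp of the signal that triggered it. That key, the Trade fields, and the reason labels are hypothetical, not Nova Trader’s schema. Every difference becomes one row: a duplicate, a trade found in only one ledger, or a matched trade whose fill differs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    asset: str
    side: str          # "BUY" or "SELL"
    signal_ts: str     # timestamp of the signal that produced the trade
    price: float
    qty: float
    reason: str = ""   # e.g. "signal" or "stop" (hypothetical labels)


def key(t: Trade) -> tuple[str, str, str]:
    return (t.asset, t.side, t.signal_ts)


def index(trades: list[Trade], ledger: str):
    """Map trade key -> trade, collecting duplicates instead of silently dropping them."""
    by_key, dupes = {}, []
    for t in trades:
        if key(t) in by_key:
            dupes.append((key(t), f"duplicate_in_{ledger}", t.reason))
        else:
            by_key[key(t)] = t
    return by_key, dupes


def reconcile(paper: list[Trade], backtest: list[Trade], price_tol: float = 0.005):
    """Return one row per unexplained difference; an empty list means the ledgers agree."""
    p, p_dupes = index(paper, "paper")
    b, b_dupes = index(backtest, "backtest")
    rows = p_dupes + b_dupes
    for k in sorted(p.keys() | b.keys()):
        pt, bt = p.get(k), b.get(k)
        if bt is None:
            rows.append((k, "only_in_paper", pt.reason))
        elif pt is None:
            rows.append((k, "only_in_backtest", bt.reason))
        elif (abs(pt.price - bt.price) > price_tol * bt.price
              or not math.isclose(pt.qty, bt.qty)):
            rows.append((k, "fill_mismatch",
                         f"paper {pt.price} x {pt.qty} vs backtest {bt.price} x {bt.qty}"))
    return rows
```

Suppose the paper trader does take stop exits that the backtester never models. Those would surface as only_in_paper rows tagged "stop", which is exactly the kind of explanation the gap needs. If the 31 and 23 counts refer to the same kind of record, this run could not come back empty.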

This is not just a trading problem. It is an operating problem.

Every business with automation eventually runs into some version of this. The CRM says one thing. The spreadsheet says another. The payment processor says a third. The dashboard says everything is fine because it is looking at the wrong surface.

The question is not whether the company has data. The question is whether the data agrees with itself when it matters.

Most signals were nothing, which was good

Across 4,270 signal rows, most were HOLD.

That restraint probably helped. The system did not try to trade every wiggle. BUY and SELL events were comparatively rare and gated by confluence.

In this run, the latest Fear and Greed reading was 23, which falls in the index’s extreme fear range. Nova’s logic leans contrarian. It waits for enough agreement among technicals, sentiment, and market regime before acting.

That is probably why it avoided a lot of passive drawdown.
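“Gated by confluence” is easy to say and easy to get subtly wrong. Here is one hypothetical way to express it. The audit does not spell out Nova’s scoring, so several pieces are assumptions:

- the component names
- the +1/0/-1 votes
- the extreme-fear and extreme-greed cutoffs of 25 and 75
- the min_agree parameter

The contrarian part is the sentiment vote: extreme fear counts as a bullish vote, and extreme greed as a bearish one.

```python
from dataclasses import dataclass


@dataclass
class Inputs:
    technical: int    # +1 bullish, -1 bearish, 0 neutral (hypothetical scoring)
    fear_greed: int   # 0-100 index reading
    regime: int       # +1 risk-on, -1 risk-off, 0 unclear


def sentiment_vote(fear_greed: int, fear_below: int = 25, greed_above: int = 75) -> int:
    """Contrarian read: extreme fear is a bullish vote, extreme greed a bearish one."""
    if fear_greed < fear_below:
        return 1
    if fear_greed > greed_above:
        return -1
    return 0


def decide(x: Inputs, min_agree: int = 3) -> str:
    """Act only when at least min_agree of the three votes point the same way.

    With three votes, a min_agree of 2 or 3 keeps BUY and SELL mutually exclusive.
    """
    votes = [x.technical, sentiment_vote(x.fear_greed), x.regime]
    if sum(v == 1 for v in votes) >= min_agree:
        return "BUY"
    if sum(v == -1 for v in votes) >= min_agree:
        return "SELL"
    return "HOLD"   # the default; most cycles should end here
```

With a reading of 23, the sentiment vote is bullish. On its own, though, it cannot move the decision off HOLD unless the other inputs agree. Raising min_agree produces more HOLD rows and fewer trades, which is the restraint described above.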

But restraint is not a full strategy. It is only one part of one.

If the system waits well but exits badly, the next fix is not more clever entry logic. It is trade hygiene. Position rules. Exit thresholds. Stop behavior. A better explanation for why realized losses are so much larger than realized wins.

The audit made that obvious.
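Trade hygiene gets easier to reason about when exit rules are explicit and checked in one place. Below is a minimal sketch for a long position with a hard stop, a take-profit, and a trailing stop. The class, the function names, and the percentages are hypothetical placeholders, not Nova’s settings and not recommendations. The worst-case loss and the target gain are chosen together, so their ratio is a decision rather than an accident.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExitRules:
    stop_pct: float = 0.03         # hypothetical: exit 3% below entry
    take_profit_pct: float = 0.06  # hypothetical: exit 6% above entry
    trail_pct: float = 0.02        # hypothetical: exit 2% below the peak


def check_exit(entry: float, price: float, peak: float, rules: ExitRules) -> Optional[str]:
    """Return an exit reason for a long position, or None to keep holding.

    `peak` is the highest price seen since entry. The trailing stop only
    applies once the position has traded above its entry price.
    """
    if price <= entry * (1 - rules.stop_pct):
        return "stop"
    if price >= entry * (1 + rules.take_profit_pct):
        return "take_profit"
    if peak > entry and price <= peak * (1 - rules.trail_pct):
        return "trailing_stop"
    return None


def reward_to_risk(rules: ExitRules) -> float:
    """Target gain per unit of planned loss, if exits fill at their levels."""
    return rules.take_profit_pct / rules.stop_pct


rules = ExitRules()
print(reward_to_risk(rules))                   # 2.0
print(check_exit(100.0, 102.5, 105.0, rules))  # trailing_stop
```

One caveat is worth checking in a system like this. If exits are only evaluated once per scheduled cycle, a fast move between thirty-minute checks can carry the exit well past the stop level. That is one possible reason realized losses end up much larger than the stop implies, and the audit should rule it in or out.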

This is the part most automation projects skip

The seductive version of automation is the first successful run.

A bot posts the update. A script moves the file. A model writes the summary. A workflow sends the email. Everyone sees the thing happen once and starts talking as if the system exists.

It does not really exist yet.

A system starts to exist when it can run without attention and still leave enough evidence behind for someone to inspect it later.

Did it run?

Did it run on time?

Did it fail cleanly?

Did the data advance?

Did the output match the internal state?

Did the backtest agree with the paper book?

Did the health check measure health, or did it only measure whether one script finished before an arbitrary timeout?

That is the boring work. It is also where most of the value is.
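Most of those questions can be answered from a small run log, as long as every cycle writes a row whether it succeeded or not. Below is a minimal sketch using Python’s built-in sqlite3 module, since the system already keeps its state in SQLite. The table, the columns, the status values, and the 1.5 times grace period on the thirty-minute interval are hypothetical. The check looks at schedule and data freshness, not only at whether the last script exited.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    started_at    TEXT NOT NULL,  -- ISO 8601, UTC
    finished_at   TEXT,
    status        TEXT NOT NULL,  -- 'ok', 'partial', or 'failed'
    latest_candle TEXT            -- newest market-data timestamp seen in this run
)
"""


def record_run(db, started, finished, status, latest_candle):
    """Write one row per cycle, whatever the outcome."""
    db.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?)",
        (started.isoformat(),
         finished.isoformat() if finished else None,
         status,
         latest_candle.isoformat() if latest_candle else None),
    )
    db.commit()


def health(db, now, interval=timedelta(minutes=30)):
    """Return a list of problems; an empty list means healthy."""
    rows = db.execute(
        "SELECT started_at, status, latest_candle FROM runs "
        "ORDER BY started_at DESC LIMIT 2"
    ).fetchall()
    if not rows:
        return ["no runs recorded"]
    problems = []
    last_started, last_status, last_candle = rows[0]
    # Did it run on time? A run killed from outside never writes its row,
    # so a missing row shows up here as an overdue run.
    if now - datetime.fromisoformat(last_started) > interval * 1.5:
        problems.append("last run is overdue")
    # Did it finish cleanly?
    if last_status != "ok":
        problems.append(f"last run status: {last_status}")
    # Did the data advance?
    if len(rows) == 2 and last_candle is not None and last_candle == rows[1][2]:
        problems.append("market data did not advance between the last two runs")
    return problems


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
now = datetime.now(timezone.utc)
stale = now - timedelta(hours=1)
record_run(db, now - timedelta(minutes=31), now - timedelta(minutes=30), "ok", stale)
record_run(db, now - timedelta(minutes=1), now, "ok", stale)
print(health(db, now))  # ['market data did not advance between the last two runs']
```

Two of the questions are out of reach for this check: whether the output matched internal state, and whether the backtest agreed with the paper book. Those need reconciliation between records rather than a health check on one table.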

This is why I am skeptical of automation projects that jump straight to the visible action. The visible action is usually the easiest part. The hard part is ownership of state, exception handling, reconciliation, and knowing when the system is lying to you.

The Japan business version of the same problem

This is the same pattern I see in Japan SMEs, just with less crypto and more folders.

A company adds a tool. Then another tool. Then a spreadsheet survives because it is familiar. A customer record lives partly in email, partly in LINE, partly in a shared drive, and partly in someone’s memory. A dashboard exists, but nobody can explain which source of truth it reflects. The process works because competent people keep repairing it by hand.

From the outside, the system appears to run.

Under audit, the gaps show up.

The issue is rarely that the company lacks effort. Usually the opposite. People are working hard enough to hide the weakness of the system. Politeness, memory, and manual repair keep things moving until the business asks for something harder: succession, foreign customers, automation, investor reporting, acquisition diligence, or AI.

Then the old system has to prove what happened.

That is where it breaks.

This is why I keep coming back to infrastructure before automation. If the business cannot explain its own state, automation only makes the confusion faster.

What I would fix before going further

First, the scheduler timeout. The watchdog should not report a false failure because an API call was slow. Either the timeout needs to be longer, or the job needs better internal timing and partial failure reporting.

Second, the ledger mismatch. Paper trading and backtesting need to agree trade by trade, or at least explain every difference.

Third, the exits. The system does not need to become more adventurous. It needs to stop accepting $45 losses in exchange for $6 wins.

Fourth, the baseline report should be automatic. Every run should compare the strategy against buy-and-hold WETH, WBTC, LINK, ARB, and UNI, and against the equal-weight basket. Without that, it is too easy to confuse absolute performance with relative performance.
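The baseline report is only a few lines once every return is computed the same way. Below is a minimal sketch with hypothetical function names. It uses the figures from this run and leaves out WBTC and LINK, because this post does not quote their returns. One detail is worth making explicit. If the equal-weight basket is bought once and never rebalanced, its return is simply the average of its members’ returns, and that is what equal_weight_return computes. Whether Nova’s basket is rebalanced is an assumption either way.

```python
def pct_return(start: float, end: float) -> float:
    """Simple buy-and-hold return in percent."""
    return (end / start - 1) * 100


def equal_weight_return(asset_returns_pct: dict[str, float]) -> float:
    """Equal money in each asset at the start, never rebalanced: the mean return."""
    return sum(asset_returns_pct.values()) / len(asset_returns_pct)


def baseline_report(strategy_pct: float, baselines_pct: dict[str, float]) -> list[str]:
    """One line per baseline, with the strategy's excess return in percentage points."""
    lines = [f"strategy: {strategy_pct:+.2f}%"]
    for name, ret in baselines_pct.items():
        lines.append(f"vs {name:<8} {ret:+7.2f}%  excess {strategy_pct - ret:+7.2f} pp")
    return lines


strategy = pct_return(10_000, 9_984.82)   # about -0.15%
baselines = {"basket": -11.78, "UNI": -23.60, "ARB": -13.66, "WETH": -11.56}
print("\n".join(baseline_report(strategy, baselines)))
```

Printing the excess column next to the absolute number is the whole point. A -0.15 percent result reads very differently once it sits beside an 11.63-point lead over the basket.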

None of these fixes are glamorous. That is the point.

The honest result

Nova Trader did better than I expected and worse than I would want.

It did not blow up. It did not make money. It avoided most of a nasty market drawdown, which is encouraging. It also exposed weak exits, a scheduler timeout, and a disagreement between the paper book and the backtest.

That is a good paper trading result.

Not because it proves the strategy works. It does not.

Because it made the next questions obvious.

This is what small autonomous systems are good for when you treat them seriously. They create a controlled place to find out where your assumptions stop matching reality.

The useful lesson was not that a bot can trade.

The useful lesson was that after 851 cycles, the system had finally produced enough evidence to audit itself.

That is when automation starts to become operational infrastructure.

If your company is adding automation, AI workflows, dashboards, or internal tools, the same question applies: can the system explain what happened after nobody was watching?

If it cannot, you do not have automation yet. You have a demo with a memory problem.

For teams trying to turn fragile workflows into systems that can survive audits, handoffs, and growth, that is the work I care about: making the operating layer legible before asking it to move faster. Start with a Stack Audit or a plain conversation about where your current system stops being trustworthy.