Layer 5 · The signal correlation graph

Racing has always run on whispers.
We built the mathematics of the whisper.

The most original thing inside EachWay.ai is not a prediction model. It's a probabilistic temporal graph — a living map of who knows whom in racing, who tips whom, whose money moves first, and how much any of it should be believed. This page is the thinking behind it, in full.

Everything below operates on public information only — published tips, public market moves, declared bookings, public records. The insight is statistical, not stolen.

5 · entity types in the network
18 · distinct relationship types tracked
180 days · belief half-life (the graph forgets by design)
≤ 15% · maximum influence on any consensus price

The premise

Insider knowledge in racing isn't a conspiracy theory. It's published economics.

Economists have measured information asymmetry in British racing markets for decades — Crafts (1985) found evidence of insider knowledge in betting patterns; Shin (1993) quantified its incidence from the structure of bookmaker odds; Kyle (1985) described how informed traders move any market. Connections — trainers, owners, jockeys, stable staff — genuinely know things the market doesn't: the gallop report, the breathing operation, the stable's quiet confidence.

The leak

Private knowledge escapes

It leaks two ways: through betting patterns, when informed money takes a price; and through tipsters with privileged access, whose "form analysis" is sometimes better described as proximity.

The problem

Each whisper is invisible alone

One tipster being right about one trainer's horse means nothing — luck does that daily. The signal only exists across many events, many relationships, much time. No human can hold that pattern in their head.

The answer

A spreadsheet sees rows. A graph sees relationships.

So we model racing as a network: entities as nodes, relationships as weighted edges, every edge carrying a probability that the connection is informed — updated after every race, decayed with every passing day.
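To make "entities as nodes, relationships as weighted edges" concrete, here is a minimal sketch of such a structure. It assumes each edge stores hypothetical Beta-posterior counts (`alpha`, `beta`) whose mean is the probability that the connection is informed. The class names and fields are illustrative, not EachWay.ai's schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    # The five node types described on this page.
    TIPSTER = "tipster"
    TRAINER = "trainer"
    JOCKEY = "jockey"
    STABLE = "stable"
    OWNER = "owner"


@dataclass(frozen=True)
class Node:
    entity_type: EntityType
    entity_id: str


@dataclass
class Edge:
    # Directed, typed edge. `alpha` and `beta` are hypothetical Beta-posterior
    # counts; their mean is the belief that the connection is informed.
    source: Node
    target: Node
    edge_type: str  # e.g. "TIPS_FOR", "ACCURATE_FOR", "TIPS_BEFORE_BOOKING"
    alpha: float = 1.0
    beta: float = 1.0

    @property
    def p_informed(self) -> float:
        return self.alpha / (self.alpha + self.beta)


@dataclass
class SignalGraph:
    # Keyed by (source, target, edge_type), so one pair can carry several edge types.
    edges: dict = field(default_factory=dict)

    def upsert(self, edge: Edge) -> None:
        self.edges[(edge.source, edge.target, edge.edge_type)] = edge

    def out_edges(self, node: Node):
        return [e for (s, _, _), e in self.edges.items() if s == node]
```

Keying edges by type as well as by endpoints lets the same tipster-trainer pair hold a cheap activity edge and a separate accuracy edge, each with its own belief.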

"The market hears the whisper eventually.
The graph's job is to hear it structurally."— why Layer 5 exists

The cast

Five kinds of node. One question: who actually knows?

The graph is heterogeneous — five entity types, each with its own behaviour, its own statistical personality, and its own reason to be watched.

📣

Tipsters

The public voices. Most recycle the form book. A few are systematically early about specific yards — and "early" is the tell.

🎩

Trainers

The centre of racing's information economy. Nobody knows a horse's wellbeing better — and intent shows in entries, bookings and patterns.

🏇

Jockeys

The most volatile node type. Bookings churn week to week — which is exactly why a surprising booking is information.

🏠

Stables

Where the daily truth lives — the gallops, the appetite, the limp nobody mentions. Staff proximity is a documented leak path.

💼

Owners

The least visible and most interesting: one owner's horses can span many trainers, which lets information travel routes no trainer-level view can see.

The wiring

18 kinds of relationship — and the ones that matter most are about timing.

Edges are weighted, directional, and typed. The 18 types fall into four families, in ascending order of cunning:

Family one · Activity

Who pays attention to whom

Edges like TIPS_FOR and CO_MOVEMENT record raw association — this tipster keeps tipping this yard's horses; these two entities keep moving together. Attention is cheap, so these edges prove little alone. They tell the graph where to look.

Family two · Accuracy

Who is right about whom

ACCURATE_FOR edges record predictive success per relationship — not "is this tipster good?" but "is this tipster good about this specific trainer?" Information is local. A tipster with one golden source looks mediocre on average and brilliant on exactly one edge.

Family three · Structural

The industry's actual wiring

BOOKS_JOCKEY, TRAINS_AT, OWNER_SYNDICATE_LINK — the real-world relationships through which information can physically travel. These edges are the map of possible leak paths, against which everything else is read.

Family four · Temporal

Who moves before the news

The crown jewels: TIPS_BEFORE_BOOKING — a tipster who repeatedly tips a horse before the strong jockey booking becomes public. PRECEDED_DRIFT — tips that keep landing just before the market moves. STABLE_STAFF_PROXY — accuracy patterns consistent with yard proximity. Accuracy can be skill. Being early, repeatedly, about one yard — that's access. No accuracy statistic alone can see it; the timing edges exist precisely because of it.
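A timing edge needs a timing statistic. The sketch below scores one assumed version: the share of a tipster's tips that were published before a jockey booking became public, within a fixed look-back window. The function name, the 72-hour window and the data shapes are hypothetical.

```python
from datetime import datetime, timedelta


def early_tip_rate(tips, bookings, window=timedelta(hours=72)):
    """Share of a tipster's tips that preceded the public booking.

    tips: list of (horse_id, tip_time)
    bookings: dict horse_id -> time the jockey booking became public
    A tip counts as "early" if it was published before the booking was public
    and no more than `window` before it. Tips on horses with no recorded
    booking are skipped.
    """
    scored = early = 0
    for horse_id, tip_time in tips:
        booked_at = bookings.get(horse_id)
        if booked_at is None:
            continue
        scored += 1
        if booked_at - window <= tip_time < booked_at:
            early += 1
    return early / scored if scored else 0.0
```

On its own the rate proves nothing; it still has to clear the same sceptical prior and multiple-testing control as any other edge.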

The level-crossing insight

Information doesn't respect organisational charts. An owner with horses across four trainers can be the common source behind four signals that each look too weak to matter at trainer level — the pattern only becomes visible when you aggregate at the owner level. So the graph hunts for affinity at all four entity levels simultaneously — trainer, jockey, stable, owner — because a strong signal at one level can fragment into statistical dust at another. This single design decision is why a graph, and not a per-entity scoreboard, was the right architecture.
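The arithmetic behind the level-crossing insight is ordinary pooling. The sketch assumes a hypothetical owner whose horses run for four trainers, with five tips per trainer and two winners each, against an assumed 20% chance of a tip winning by luck. It computes the exact one-sided binomial tail with the standard library.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): exact one-sided tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


BASE_RATE = 0.20  # hypothetical chance a tip wins by luck alone
# Hypothetical (tips, winners) per trainer, all horses sharing one owner.
by_trainer = {"trainer_a": (5, 2), "trainer_b": (5, 2), "trainer_c": (5, 2), "trainer_d": (5, 2)}

trainer_p = {t: binom_tail(w, n, BASE_RATE) for t, (n, w) in by_trainer.items()}
owner_n = sum(n for n, _ in by_trainer.values())
owner_w = sum(w for _, w in by_trainer.values())
owner_p = binom_tail(owner_w, owner_n, BASE_RATE)

print({t: round(p, 3) for t, p in trainer_p.items()})  # each about 0.263
print(round(owner_p, 3))  # about 0.032
```

No trainer-level slice is remarkable (p ≈ 0.26 each), but the owner-level aggregate is (p ≈ 0.03). That pooled result is still only one test among many and must survive false-discovery-rate control.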

The belief engine

Every edge starts a sceptic. And every edge is slowly forgetting.

An edge's weight is not a tally — it's a Bayesian posterior: the probability, given everything observed so far, that this connection is genuinely informed. Three design choices keep that belief honest:

Informative priors — extraordinary claims need evidence

Every edge begins anchored to the population base rate: the graph's default belief about any tipster–trainer pair is "as informed as average, which is to say not at all". A handful of lucky hits barely moves it. Only sustained, repeated success drags the posterior away from scepticism — and owner edges start even more sceptical, because small samples lie loudest.
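A minimal sketch of a sceptical prior. It assumes each edge's hit rate is modelled as Beta-Binomial, with the prior centred on a population base rate. The base rate and both prior strengths are illustrative assumptions; the page does not publish its actual values.

```python
BASE_RATE = 0.20              # hypothetical population hit rate for any tipster-trainer pair
PRIOR_STRENGTH = 50.0         # hypothetical pseudo-observations anchoring the prior
OWNER_PRIOR_STRENGTH = 100.0  # owner edges start more sceptical (value assumed)


def posterior_mean(hits, misses, strength=PRIOR_STRENGTH, base_rate=BASE_RATE):
    """Beta-Binomial posterior mean with a prior centred on the base rate."""
    alpha0 = base_rate * strength
    beta0 = (1 - base_rate) * strength
    return (alpha0 + hits) / (alpha0 + beta0 + hits + misses)


lucky = posterior_mean(hits=4, misses=1)         # raw 80%, posterior about 0.255
sustained = posterior_mean(hits=80, misses=120)  # raw 40%, posterior 0.36
owner_lucky = posterior_mean(4, 1, strength=OWNER_PRIOR_STRENGTH)  # about 0.229
```

Four hits from five tips barely move the edge off its 0.20 anchor. Two hundred tips at 40% move it much further, and the same lucky streak moves an owner edge even less.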

Temporal decay — the graph forgets by design

Racing relationships go stale: jockeys switch yards, staff move on, a tipster's source dries up. So every observation's influence decays exponentially with a 180-day half-life — and jockey edges decay faster (120 days) because booking relationships churn fastest. Evidence from last season counts a fraction of evidence from last week. A graph that never forgot would be a museum of dead relationships.
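Exponential decay with a half-life is one line of arithmetic. The half-lives below are the ones stated on this page. Feeding the decayed weights into the edge's counts as fractional observations is an assumption of this sketch, and `decayed_counts` is a hypothetical helper.

```python
HALF_LIFE_DAYS = {"default": 180.0, "jockey": 120.0}  # values stated on this page


def decay_weight(age_days, edge_family="default"):
    """Exponential decay: an observation's weight halves every half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS[edge_family])


def decayed_counts(observations, edge_family="default"):
    """observations: iterable of (age_days, hit). Returns weighted (hits, misses)."""
    hits = misses = 0.0
    for age_days, hit in observations:
        w = decay_weight(age_days, edge_family)
        if hit:
            hits += w
        else:
            misses += w
    return hits, misses
```

Under these settings an observation from a year ago carries roughly a quarter of the weight of one from today on a default edge, and less on a jockey edge.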

Effective sample size — decay-aware honesty about how much we really know

Decay creates a subtle trap: an edge with 50 observations, mostly old and faded, is worth far fewer than 50. We compute the effective sample size (Kish's formula) and demand a real evidential floor, N_eff ≥ 20, before an edge can even be promoted from candidate to active. No edge gets believed on a technicality.
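Kish's effective sample size is (Σw)² / Σw², which equals the raw count only when every weight is equal. The sketch applies it to decay weights with the page's floor of 20. The example edge is hypothetical: 10 fresh observations plus 40 faded ones at weight 0.05 each.

```python
NEFF_FLOOR = 20  # promotion threshold stated on this page


def kish_neff(weights):
    """Kish's effective sample size: (sum of w)^2 / sum of w^2."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq if total_sq > 0 else 0.0


def can_promote(weights):
    """Candidate edges become active only once N_eff clears the floor."""
    return kish_neff(weights) >= NEFF_FLOOR


faded_edge = [1.0] * 10 + [0.05] * 40  # 50 observations on paper
print(round(kish_neff(faded_edge), 1))  # about 14.3, below the floor
```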

The lift trap — measured against the price, not the strike rate

Here's the trap most "tipster ratings" fall into: a tipster who only tips a top stable's short-priced favourites has a glorious strike rate and zero information. So the graph measures lift against the market-implied probability at the time of the tip — did following this edge beat the price actually available? Information is only information if the market hadn't already priced it.
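A sketch of lift measured against the price. It assumes decimal odds are recorded at tip time and removes the bookmaker's overround by simple proportional normalisation; Shin's (1993) model is a more careful alternative. Function names and example numbers are hypothetical.

```python
def implied_probabilities(decimal_odds):
    """Market-implied win probabilities with the overround removed proportionally."""
    raw = [1.0 / o for o in decimal_odds]
    book = sum(raw)  # greater than 1 when the market carries a margin
    return [r / book for r in raw]


def lift(tips):
    """tips: list of (market_implied_prob_at_tip_time, won).

    Actual strike rate minus the average probability the market already
    assigned. Positive lift means the edge beat the price, not just the field."""
    n = len(tips)
    strike = sum(1 for _, won in tips if won) / n
    expected = sum(p for p, _ in tips) / n
    return strike - expected


favourites = [(0.5, i < 45) for i in range(100)]  # 45% strike rate at evens
outsiders = [(0.1, i < 15) for i in range(100)]   # 15% strike rate at 9/1
```

The favourites-only tipster's 45% strike rate is a lift of about -0.05, because the market already implied 50%. The outsider tipster's modest 15% is a lift of about +0.05.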

The referee

A graph this curious will find ghosts. So we built the exorcist in.

Test thousands of entity pairs and some will look brilliant by pure chance — that's not a risk, it's a statistical certainty. The graph's findings only survive if they pass three layers of formal scrutiny:

Communities

Leiden community detection

Before judging individual edges, the graph asks who clusters with whom — Leiden community detection (Traag et al., 2019) surfaces groups of entities that behave like information-sharing networks: the syndicate, the yard-and-its-voices, the owner circle. Communities give context that single edges can't.

Multiple testing

False-discovery-rate control

Every pairwise affinity claim across every entity type is corrected jointly with the Benjamini–Hochberg procedure — and deeper, three-entity hypotheses are only tested conditional on the simpler ones surviving. The graph is forced to pay the statistical price of every question it asked, not just the ones it liked the answers to.
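The Benjamini–Hochberg step-up procedure fits in a few lines. The sketch covers the flat correction only; the conditional gating of three-entity hypotheses is not shown.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at false discovery rate q.

    Step-up procedure: sort p-values ascending, find the largest rank k with
    p_(k) <= (k / m) * q, and reject the k smallest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])
```

With p-values of 0.01, 0.02, 0.03 and 0.04, all four survive at q = 0.05. A Bonferroni cut (0.05 / 4 = 0.0125) would keep only the first. Benjamini–Hochberg controls the expected share of false findings rather than the chance of any single one.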

Confounders

The trainer-or-jockey problem

A tipster looks informed about a trainer. But that trainer always books the same jockey — so is it the trainer connection or the jockey connection? The Cochran–Mantel–Haenszel test stratifies one entity across the other to isolate which relationship actually carries the signal. Attribution matters, because the jockey will eventually ride for someone else.
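A sketch of the Cochran–Mantel–Haenszel test without continuity correction. The table layout is an assumption: within each jockey stratum, rows split tips on this trainer's horses from tips on other trainers' horses, and columns split winners from losers. The counts are invented. Pooled, the trainer looks significant; stratified by jockey, the association vanishes, which points to the jockey as the carrier.

```python
from math import erfc, sqrt


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test across 2x2 strata (no continuity correction).

    tables: list of ((a, b), (c, d)), one per stratum:
      rows = tip on this trainer's horse yes / no, columns = won / lost.
    Returns (chi-square statistic with 1 df, p-value)."""
    num = 0.0  # sum over strata of observed minus expected a
    var = 0.0  # sum of hypergeometric variances of a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        num += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    stat = num * num / var
    return stat, erfc(sqrt(stat / 2))  # survival function of chi-square(1)


with_jockey = ((16, 4), (4, 1))      # 80% win rate either way
without_jockey = ((1, 4), (4, 16))   # 20% win rate either way
pooled = ((17, 8), (8, 17))          # the same tips, ignoring the jockey
```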

Honesty about time: the relationship that existed then

Racing's relationships are a moving target — jockeys switch yards, horses change trainers, syndicates restructure. The graph keeps full point-in-time history (the data-warehouse discipline of Type 2 slowly changing dimensions, or SCD-2), so every signal is credited to the relationship that existed when the prediction was made, never the one that exists today. Without this, a graph quietly rewrites history and flatters itself; with it, hindsight is structurally impossible.
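In SCD-2 style, each version of a relationship is a row with a validity interval, and every lookup is "as of" a date. A minimal sketch follows. The `TRAINED_BY` relation name and the helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RelationshipVersion:
    """One SCD-2 row: valid from `valid_from` until `valid_to` (exclusive).
    `valid_to=None` marks the current version."""
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None


def as_of(history, subject, relation, when):
    """Return the object of `relation` for `subject` as it stood on `when`."""
    for row in history:
        if (row.subject == subject and row.relation == relation
                and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.obj
    return None


history = [
    RelationshipVersion("horse_42", "TRAINED_BY", "trainer_a", date(2023, 1, 1), date(2024, 3, 1)),
    RelationshipVersion("horse_42", "TRAINED_BY", "trainer_b", date(2024, 3, 1)),
]
```

A tip made on 15 February 2024 is credited to `trainer_a`, even though a query about today would return `trainer_b`.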

The primary signal

The best-informed people in racing never publish a tip.

They bet quietly. Which is why the graph's primary signal source isn't tipsters at all — it's the market's microstructure. The entity graph then does what it does best: explain why the quiet money might know something.

Anomalies, not opinions

The system watches exchange prices move and asks a cold question: is this move unusual for this time-of-day, for this field size, against 90 days of history? Moves are scored as statistical anomalies — z-scores against the market's own normal behaviour — not as "steamers" in a tout's newsletter. Most moves are recreational noise and score near zero.
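A sketch of scoring a move against the market's own normal behaviour. It assumes price moves are bucketed by a hypothetical key such as (time-of-day band, field-size band), with each bucket holding the trailing 90 days of moves. The anomaly threshold itself is not stated on the page and is not shown.

```python
from statistics import mean, stdev


def move_zscore(history, bucket, move):
    """z-score of a price move against past moves in the same bucket.

    history: dict mapping bucket -> list of past moves (e.g. % change in the
    exchange price over a fixed interval) from the trailing 90 days.
    Returns 0.0 when there is too little history to judge."""
    past = history.get(bucket, [])
    if len(past) < 2:
        return 0.0
    sd = stdev(past)
    if sd == 0:
        return 0.0
    return (move - mean(past)) / sd


history = {("evening", "8-12"): [-1.0, 0.0, 1.0, 0.0, -1.0, 1.0]}
```

In this toy bucket a 6% move scores about 6.7 standard deviations from normal, while a 0.5% move scores near zero, like most recreational noise.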

Corroboration is the magic

An anomalous move alone is a curiosity. An anomalous move on a horse whose owner-level edge has been quietly strengthening, in a community the graph already flagged, shortly after a tipster with a TIPS_BEFORE_BOOKING history published — that's a story told by independent witnesses. The graph's job is to notice when the witnesses agree.
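One simple way to let witnesses agree, swapped in here for illustration rather than taken from the page, is naive-Bayes combination. Each signal becomes a likelihood ratio, and the log-ratios are added to the prior log-odds. This is only valid if the witnesses really are conditionally independent. The prior and all likelihood ratios below are invented.

```python
from math import exp, log


def combine(prior_prob, likelihood_ratios):
    """Posterior probability: prior odds times the product of likelihood ratios."""
    log_odds = log(prior_prob / (1 - prior_prob))
    log_odds += sum(log(lr) for lr in likelihood_ratios)
    return 1 / (1 + exp(-log_odds))


PRIOR = 0.10  # hypothetical prior that a move is informed
move_only = combine(PRIOR, [2.0])  # about 0.18
# Hypothetical ratios: anomalous move, owner edge, flagged community, early tipster.
corroborated = combine(PRIOR, [2.0, 1.8, 1.5, 2.5])  # 0.6
```

If the witnesses share a common cause, this arithmetic double-counts them. That is why the graph's confounder checks matter before any such combination.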

One of three self-improving loops

The graph doesn't just learn about racing. It learns how to learn.

EachWay.ai contains three self-improving elements: the analysts evolve their prompts, the strategies breed (arriving soon), and — the loop on this page — the graph tunes its own machinery. Thirty-one parameters govern how it detects, decays, clusters and believes; an optimiser searches that space continuously, under statistical control.

Discrete choices

Multi-armed bandits

Three Thompson Sampling bandits run live experiments on the graph's discrete decisions — how strict the anomaly threshold, which community-detection algorithm, how sceptical the priors. Each arm's track record decays over time, so the bandits keep re-asking old questions as the market changes.
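A sketch of Thompson Sampling whose track record decays, assuming Bernoulli rewards and a Beta(1, 1) prior per arm. Each update shrinks every arm's accumulated evidence by a discount factor `gamma`, a common "discounted Thompson Sampling" variant. The arm names, reward definition and `gamma` value are hypothetical.

```python
import random


class DiscountedThompson:
    """Beta-Bernoulli Thompson Sampling with evidence that decays toward the prior."""

    def __init__(self, arms, gamma=0.99, seed=None):
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.stats = {arm: [0.0, 0.0] for arm in arms}  # [successes, failures]

    def choose(self):
        # Draw a plausible success rate per arm from Beta(1 + s, 1 + f); play the best draw.
        draws = {arm: self.rng.betavariate(1 + s, 1 + f) for arm, (s, f) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # Fade every arm's history, then record the new result for the played arm.
        for counts in self.stats.values():
            counts[0] *= self.gamma
            counts[1] *= self.gamma
        self.stats[arm][0 if reward else 1] += 1.0


bandit = DiscountedThompson(["strict_threshold", "loose_threshold"], seed=7)
for _ in range(50):
    bandit.update("strict_threshold", True)
    bandit.update("loose_threshold", False)
```

Because old results fade, an arm that fell behind months ago regains enough uncertainty to be sampled again.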

Continuous dials

Bayesian optimisation

Continuous parameters — community resolution, decay half-lives — are searched with Bayesian optimisation against a composite objective dominated by market-beating skill (50%), with returns (30%) and calibration (20%) alongside. The optimiser is rewarded for being right, not busy.
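The composite objective is a weighted sum using the page's 50/30/20 split. How each component is scaled is not stated. This sketch assumes each has already been mapped to [0, 1], and the example configurations are invented.

```python
WEIGHTS = {"skill": 0.5, "returns": 0.3, "calibration": 0.2}  # split stated on this page


def composite_objective(skill, returns, calibration):
    """Score for the optimiser to maximise; inputs assumed scaled to [0, 1]."""
    return (WEIGHTS["skill"] * skill
            + WEIGHTS["returns"] * returns
            + WEIGHTS["calibration"] * calibration)


right = composite_objective(skill=0.6, returns=0.2, calibration=0.9)  # 0.54
busy = composite_objective(skill=0.2, returns=0.8, calibration=0.5)   # 0.44
```

A configuration that churns out returns with little market-beating skill scores below one that is simply right more often.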

Regime change

Drift detection

A Page–Hinkley drift detector watches the graph's own performance for structural breaks. When the market's behaviour shifts, the system enters an exploration burst — widening its search — instead of confidently applying yesterday's settings to tomorrow's market.
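A minimal Page–Hinkley detector for an upward shift in a monitored loss, such as a hypothetical daily Brier score where higher means worse. The tolerance `delta` and the alarm `threshold` values are illustrative assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated drift per observation
        self.threshold = threshold  # alarm level
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation from the running mean
        self.min_cum = 0.0  # lowest cumulative deviation seen so far

    def update(self, x):
        """Add one observation; return True when drift is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


detector = PageHinkley()
```

An alarm here would be the trigger for an exploration burst rather than a silent retune.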

Self-improvement on a leash

The loop activates progressively, through six data gates — shadow scoring first, bandits only after enough signals, full Bayesian optimisation only after months of evidence. Every proposed configuration must beat the incumbent as a challenger against the champion, judged by day-level permutation tests with multiple-challenger corrections. An optimiser that could promote itself on noise would just be overfitting with a press release; the gates are what make "self-improving" an engineering claim rather than a marketing one.
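A sketch of a day-level challenger test: a one-sided sign-flip permutation test on daily score differences (challenger minus champion). The page does not name its multiple-challenger correction. Bonferroni is used here purely as a stand-in, and all names and values are hypothetical.

```python
import random


def sign_flip_pvalue(daily_diffs, n_perm=2000, seed=0):
    """One-sided paired permutation test on day-level score differences.

    Under the null of no difference, each day's sign is exchangeable, so signs
    are flipped at random and we count permuted means at least as large as the
    observed one."""
    rng = random.Random(seed)
    n = len(daily_diffs)
    observed = sum(daily_diffs) / n
    hits = 0
    for _ in range(n_perm):
        permuted = sum(d if rng.random() < 0.5 else -d for d in daily_diffs) / n
        if permuted >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def promote(challenger_diffs, n_challengers, alpha=0.05):
    """Bonferroni stand-in: split alpha across all challengers tested."""
    return sign_flip_pvalue(challenger_diffs) <= alpha / n_challengers


steady_winner = [0.01 + 0.001 * (i % 5) for i in range(30)]  # better every day
coin_flip = [0.01, -0.01] * 15                                # no real edge
```

Testing at the day level, rather than per bet, respects the fact that races on the same day share market conditions.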

Knowing its place

After all that cleverness, the graph gets one capped vote.

The deepest design decision is restraint. However compelling its signals, the graph is the seventh voice in the consensus — capped at 15% of the total weight, with its execution-time adjustment hard-limited to ±3 percentage points, and the six analysts' pricing never overwritten.
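The two limits can be expressed as a pair of clamps. The 15% cap and the ±3 percentage points come from this page. Treating the consensus as a linear blend of the analysts' price and the graph's estimate is an assumption made only for illustration.

```python
GRAPH_WEIGHT_CAP = 0.15  # graph's maximum share of consensus weight (from the page)
MAX_ADJUSTMENT = 0.03    # +/- 3 percentage points at execution time (from the page)


def graph_adjusted_price(analyst_consensus, graph_estimate, graph_weight):
    """Return (analyst_consensus, adjusted_probability).

    The analysts' price is returned untouched; the graph only contributes a
    separate, bounded adjustment. Linear blending is an illustrative assumption."""
    w = min(max(graph_weight, 0.0), GRAPH_WEIGHT_CAP)
    blended = (1 - w) * analyst_consensus + w * graph_estimate
    delta = max(-MAX_ADJUSTMENT, min(MAX_ADJUSTMENT, blended - analyst_consensus))
    return analyst_consensus, analyst_consensus + delta
```

Even a very confident graph (weight 0.40, estimate 0.60 against an analyst price of 0.20) moves the execution price only to 0.23.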

Capped influence

The cap exists because graph signals are the most exciting and therefore the most dangerous component — the one a lesser system would let run the show. Ours is structurally prevented from doing so.

A state machine, not tenure

Signals live in a lifecycle: ACTIVE → DAMPENED → DISABLED. When data goes stale or rolling skill dips, signals are dampened. And a standing kill switch ends the experiment wholesale: if the signal layer reaches 500 scored signals without positive skill against the market, it is disabled. It keeps learning in shadow mode, but it stops touching prices.
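A sketch of the lifecycle as a transition function. The 500-signal kill switch is from this page. Reading "rolling skill dips" as negative rolling skill, and the absence of any recovery path back to ACTIVE, are assumptions of this sketch.

```python
from enum import Enum


class SignalState(Enum):
    ACTIVE = "active"
    DAMPENED = "dampened"
    DISABLED = "disabled"


KILL_SWITCH_SIGNALS = 500  # from the page


def next_state(state, data_is_stale, rolling_skill, layer_signals_scored, layer_skill):
    """Advance one signal's lifecycle.

    Skill values are measured against the market (positive means beating it).
    The kill switch looks at the whole signal layer; dampening looks at this
    signal. Thresholds other than the 500-signal count are assumed."""
    if state is SignalState.DISABLED:
        return state  # disabled signals only keep learning in shadow mode
    if layer_signals_scored >= KILL_SWITCH_SIGNALS and layer_skill <= 0:
        return SignalState.DISABLED
    if data_is_stale or rolling_skill < 0:
        return SignalState.DAMPENED
    return state
```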

Scored like everything else

Every signal's contribution is measured with the same Brier Skill Score discipline as the analysts — against the market, in public. The whisper network has to beat the market to keep its seat. No component of EachWay.ai is grandfathered in. Not even the clever one.
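The Brier Skill Score compares a forecast's Brier score with a reference forecast's, here the market: BSS = 1 − BS_model / BS_market. A minimal sketch follows, with invented probabilities and outcomes.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(model_probs, market_probs, outcomes):
    """1 - BS_model / BS_market: positive beats the market, zero matches it,
    negative is worse than simply taking the market's price."""
    return 1 - brier_score(model_probs, outcomes) / brier_score(market_probs, outcomes)


outcomes = [1, 0, 0, 1]
market = [0.5, 0.5, 0.5, 0.5]
model = [0.7, 0.3, 0.3, 0.7]
```

In this toy case the model scores a BSS of 0.64 against the market. A component that merely copies the market scores exactly zero and earns no seat.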

"We gave the most novel layer the least power.
That's not timidity — that's the design."
— the Layer 5 governing principle

You've seen how we listen to the whisper network.
Now watch what it hears.

Informed-money alerts from this graph are part of The Owners' Enclosure tier — and the methodology above ships in full in the Partner governance pack.
