An AI-generated marathon training plan is, by 2026, what a coach used to be: the default first stop for the millions of recreational runners building towards a spring or autumn 26.2-mile race. A new study published in the British Medical Bulletin from Oxford Academic has now run a structured audit of the plans the leading large language models actually produce when asked to draft a six-month marathon programme — and the answer is that they get the broad structure right, mostly hit the basics of intensity distribution, but still stumble on the parts of training that hurt runners when they go wrong.
The descriptive study, "Artificial intelligence-generated marathon training programs: reliable tools in exercise prescription for athletic performance?", asked four widely used AI systems to produce six-month marathon plans for three athlete profiles — beginner, intermediate and advanced. The plans were then scored against a current evidence-based scaffold drawn from sports-medicine literature: weekly mileage progression, polarised intensity distribution with more than 80 per cent of training volume at low intensity, recovery weeks every third or fourth week, a progressive long run, an in-built taper of two to three weeks, and load-management cues responsive to pain, fatigue or illness. Each of the AI systems identified those structural pillars in some form. None of them got every pillar right.
The most consistent failure was the taper. Three of the four systems compressed the taper into the final fortnight before race day, two failed to cut weekly volume by the recommended 30-50 per cent in the final 14 days, and one tapered only the long run rather than overall volume. The other recurring issue was the pace of mileage progression. Several plans pushed beginner runners past the so-called 10 per cent rule and into 15 to 20 per cent week-on-week jumps, a pattern that the underlying systematic-review evidence ties to a higher overuse-injury risk. The advanced-runner plans, by contrast, tended to be too conservative, capping peak weekly mileage well below what an experienced marathoner would need to break 2:50.
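A back-of-the-envelope sketch shows how fast those two progression rates diverge. The starting mileage (30 km a week) and the eight-week window are assumptions for illustration, not figures from the study:

```python
def progress(start_km: float, rate: float, weeks: int) -> list[float]:
    """Weekly mileage under a fixed week-on-week percentage increase."""
    return [round(start_km * (1 + rate) ** w, 1) for w in range(weeks)]

# Inside the 10 per cent rule: eight weeks from 30 km reaches ~58 km.
print(progress(30, 0.10, 8)[-1])  # → 58.5

# The 20 per cent jumps flagged in some beginner plans: ~107 km by week 8.
print(progress(30, 0.20, 8)[-1])  # → 107.5
```

Compounding is the point: a rate that looks only twice as aggressive week to week nearly doubles the load again by the end of a two-month block.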
The other pattern the BMB authors flag is the gap between what the plans say and what they leave the runner to figure out. AI-generated programmes were good at giving daily session prescriptions and broadly sensible at periodising volume across the six months. They were less good at the in-context judgement calls that human coaches make routinely — how to swap a session in a heatwave, when to abort a long run because of a niggle, what to do with the back end of the plan after a head cold. The systems generally fell back on generic disclaimers ("listen to your body", "consult a medical professional") rather than producing structured load-management guidance.
The takeaway from the study is not that AI training plans are dangerous, but that they need a layer of human review before they are followed end-to-end. The authors recommend that runners using an AI-generated plan check three things before printing it: that the taper drops volume by at least 30 per cent over two to three weeks, that beginner mileage progression sits inside the 10 per cent rule, and that the programme has weeks specifically designated as recovery rather than compressing rest days into a busy training block. With AI-built plans now used by more recreational marathoners than any commercial coaching app, those checks may be the single highest-leverage 30 minutes a runner can spend on the entire build-up.
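Those three checks are mechanical enough to express as a short script. This is a minimal sketch, not the study's scoring instrument: the plan representation (a list of weekly kilometres plus a set of zero-indexed recovery-week numbers) and the exact thresholds are illustrative assumptions:

```python
def check_plan(weekly_km: list[float], recovery_weeks: set[int],
               taper_weeks: int = 3, beginner: bool = True) -> list[str]:
    """Return a list of red flags; an empty list means the plan passes."""
    issues = []

    # 1. Taper: race week should sit at least 30 per cent below peak volume.
    peak = max(weekly_km[:-taper_weeks])
    if weekly_km[-1] > 0.7 * peak:
        issues.append("taper cuts volume by less than 30 per cent")

    # 2. Beginner progression should stay inside the 10 per cent rule
    #    (jumps back up after a designated recovery week are exempt).
    if beginner:
        for w in range(1, len(weekly_km)):
            if (w - 1) not in recovery_weeks and \
                    weekly_km[w] > 1.10 * weekly_km[w - 1]:
                issues.append(f"week {w + 1} jumps more than 10 per cent")

    # 3. Recovery must be designated weeks, not rest days squeezed
    #    into a loaded block.
    if not recovery_weeks:
        issues.append("no designated recovery weeks")

    return issues
```

Run against a plan that cuts back every fourth week and tapers properly, the checker returns an empty list; fed a plan with no recovery weeks and 20-30 per cent jumps, it flags both.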
