Medical terms are a recognised problem in doctor–patient consultations. By contrast, the language difficulties of online healthcare documents are underestimated, even though patients are often encouraged to go to the internet for information. Literacy levels in the community vary, and for patients, carers and health workers with limited reading skills (including first- and second-language users of English), the language of web-based health documents may be challenging or impenetrable. Online delivery of health information is inherently problematic because it cannot provide two-way discussion, and amid the range of health documents on the web, the intended readership (whether general or specialist) is rarely indicated up front. In this study, we focus on the language and readability of web-based cancer documents, using lexicostatistical methods to profile the vocabularies in two large test databases of breast cancer information, one consisting of material designed for health professionals, the other of material for the general public. The two datasets yielded significantly different word frequency rankings and keyness values, broadly correlating with their different readerships: scientifically literate readers for the professional dataset, and non-specialist readers for the public dataset. The higher type/token ratio in the professional dataset confirms its greater lexical demands, with no concessions to the variable language and literacy skills among second-language health workers. Such language needs can, however, be addressed by a new online multilingual termbank of breast cancer vocabulary, HealthTermFinder, designed to sit alongside health documents on the internet and provide post-consultation help for patients and carers at their point of need.
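The two lexicostatistical measures named above, type/token ratio and keyness, can be illustrated with a minimal Python sketch. It is not the study's own procedure: the abstract does not say which keyness statistic or tokenisation was used, so the log-likelihood (G2) statistic, the simple regular-expression tokeniser, and the function names `tokenize`, `type_token_ratio` and `keyness` are all assumptions made for illustration, and the two sample sentences are invented rather than taken from the datasets.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed, simplistic tokeniser)."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by number of running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def keyness(word, target, reference):
    """Log-likelihood (G2) keyness of `word` in a target corpus against a reference corpus.

    `target` and `reference` are Counters of word frequencies. G2 is one common
    keyness statistic; it is assumed here, not confirmed by the source.
    """
    a, b = target[word], reference[word]
    c, d = sum(target.values()), sum(reference.values())
    if a + b == 0 or c == 0 or d == 0:
        return 0.0
    expected_a = c * (a + b) / (c + d)  # expected count in target if usage were uniform
    expected_b = d * (a + b) / (c + d)  # expected count in reference
    g2 = 0.0
    if a:
        g2 += a * math.log(a / expected_a)
    if b:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


if __name__ == "__main__":
    # Invented toy sentences standing in for the professional and public datasets.
    professional = tokenize("Adjuvant chemotherapy reduced recurrence in node-positive carcinoma.")
    public = tokenize("The doctor may give you medicine after the operation. The medicine helps.")

    print("TTR professional:", type_token_ratio(professional))
    print("TTR public:", type_token_ratio(public))
    print("Keyness of 'medicine':", keyness("medicine", Counter(public), Counter(professional)))
```

A raw type/token ratio falls as a text grows longer, so a comparison of this kind is only meaningful between samples of similar size or with a length-normalised variant.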