Bruce, Bertram C.; Rubin, Ann D.; & Starr, Kathleen S. (1981).
Why readability formulas fail. IEEE
Transactions on Professional Communication, PC-24, 50-52.
Also as Reading Education Report No. 28 (1981, August). Urbana, IL:
University of Illinois, Center for the Study of Reading, and
BBN Report No. 4715 (1981). Cambridge, MA: Bolt Beranek and Newman.
Why Readability Formulas Fail
Bertram C. Bruce, Ann D. Rubin, and Kathleen S. Starr
This research was supported by the National Institute of Education
under Contract No. MS-NIE-C-400-76-0116.
Being able to measure the readability of a text with a simple formula
is an attractive prospect, and
many groups have been using readability formulas in a variety
of situations where estimates of text
complexity are thought to be necessary.
The most obvious and explicit use of readability formulas is by
publishers designing basal
and remedial reading texts;
some states, in fact, will consider using a basal series only if it
meets certain readability formula criteria.
Likewise, documents such as insurance policies, tax forms,
contracts, and jury instructions must meet criteria stated
in terms of readability formulas.
Unfortunately, readability formulas just don't fulfill their promise.
This failure can be attributed to three
weaknesses in the formulas.
First, from a theoretical point of view, they ignore or violate much of
what is known about reading and the reading process.
Second, their statistical bases are shaky,
being at once poorly supported mathematically and
difficult to generalize.
Finally, as practical tools either for matching
children and texts or for providing
guidelines for writers, they are totally inappropriate.
Criticisms such as these have been leveled at
readability formulas from many quarters
(Gilliland, 1972; Redish, 1979; Kintsch & Vipond, 1977), but
the formulas' uses have expanded in spite of the growing number of
papers discussing their weaknesses.
We attempt here to categorize and summarize some of the
problems with readability formulas and their use.
Factors Not in the Formulas
The first category of problem involves the discrepancy
between the characteristics of texts which readability formulas
measure and those which we know to influence text readability.
Because most of the formulas include only sentence length and word
difficulty as factors,
they can account only indirectly for other factors
that make a particular text difficult, such as
degree of discourse cohesion,
number of inferences required,
number of items to remember,
complexity of ideas,
rhetorical structure, dialect,
and background knowledge required.
Further, because the formulas are measurements based
on a text isolated from the context of its use, they
cannot reflect such reader-specific
factors as motivation, interest, competitiveness,
values, and purpose.
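To make concrete how narrow these two inputs are, here is a minimal Python sketch of one widely used formula, the Flesch Reading Ease score (206.835 - 1.015 x words per sentence - 84.6 x syllables per word). The syllable counter is a crude vowel-group heuristic assumed only for this illustration, and the function names are hypothetical. Because nothing but counts enters the computation, reordering the sentences so that the pronoun loses its referent leaves the score unchanged.

```python
import re

def count_syllables(word):
    # Crude heuristic (an assumption of this sketch): one syllable per vowel group.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    # Only sentence length and word length (in syllables) are measured.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Same sentences in two orders; in the second, "It" no longer has a clear referent.
coherent = "The dog ran. It was fast. The cat hid."
scrambled = "It was fast. The cat hid. The dog ran."
print(flesch_reading_ease(coherent) == flesch_reading_ease(scrambled))  # True
```

The equality holds by construction: both passages contain the same words and the same number of sentences, which is all the formula looks at.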
Readability formulas fail to account for differences in readers'
dialect and cultural backgrounds.
For example, a passage in Black Vernacular from
the Bridge series (Simpkins, 1977), a
reading program, starts:
Willie went and got hisself a lightweight gig.
The gig wasn't saying too much.
It wasn't paying nothing but chump change.
Readers familiar with this form of Black Vernacular find the passage
easy to understand.
Others can infer the meanings of individual words only with difficulty.
Because they view texts so narrowly, readability formulas also fail
to measure the effect of the context in which a passage is read.
A health information sheet describing the concept and treatment of
a medical condition, for
example, may communicate quite effectively if a patient has enough time
to read it
and feels comfortable asking a physician for clarification.
In a rushed, brusque encounter, however,
the document would be much less comprehensible.
Lack of Statistical Basis
Despite the shortcomings of readability formulas on
theoretical grounds, strong empirical evidence of their predictive
power might justify their use for some tasks.
Unfortunately, when such evidence is examined, the second major problem
with readability formulas becomes apparent: their lack of a solid
statistical basis.
Many of the hundreds of formulas in existence were
validated only in terms of earlier formulas.
The early formulas, in turn,
were validated using the McCall-Crabbs Standard Test
Lessons in Reading (McCall & Crabbs, 1950, 1961).
But the McCall-Crabbs lessons were intended only
as practice exercises in reading,
never as measures of comprehension or text comprehensibility;
nor were they intended to be general indicators
of reading ability across age,
class, or cultural groups.
Nevertheless, the most respected
formulas have all used the McCall-Crabbs lessons as the criterion of
difficulty (Stevens, 1980).
Spache (1978), a readability formula designer, stated
the problem succinctly:
The reading level given by the formula should mean
that a child with that level of reading ability could read the book
with adequate comprehension and a reasonable
number of oral reading errors.
This assumption has seldom if ever been
tested in the development of this and other readability formulas.
While validation studies were generally not performed in the
course of developing readability formulas, a fair number were
done after the fact.
In a comprehensive review of such studies,
Klare (1976) noted that 39 of 65 studies demonstrated a positive
correlation between readability formula estimates of difficulty
and reader performance on independent criteria such
as reading speed or comprehension.
However, even this unconvincing performance is undercut by
his observation that positive results are more likely
to be reported in journals than negative ones
and by the fact that when
comprehension, rather than reading speed, is used as the independent
measure of text difficulty, only half of the studies indicated positive
correlations with readability formula estimates.
Lockman (1957) computed Flesch Reading Ease scores for nine
sets of instructions for psychological tests,
then had 171 naval cadets rate them on understandability.
The rank-order correlation between the two sets of measurements was
strong, but in the wrong direction.
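For readers unfamiliar with the statistic, a rank-order (Spearman) correlation can be sketched in a few lines of Python. The numbers below are invented for illustration and are not Lockman's data; the function names are likewise hypothetical. The sketch assumes that higher formula scores mean "easier" and higher ratings mean "more understandable", so a formula that worked would yield a positive value.

```python
def ranks(values):
    # Assign 1-based ranks, giving tied values the average of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    # Spearman's coefficient is the Pearson correlation of the two rank lists.
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values: formula scores and reader ratings for four documents.
formula_scores = [80, 70, 60, 50]
reader_ratings = [1, 2, 3, 4]
print(spearman(formula_scores, reader_ratings))  # -1.0
```

A negative coefficient of this kind means that the documents the formula ranks as easiest are the ones readers rate as hardest to understand.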
Common sense also leads us to wonder how generalizable readability
estimates are beyond the precise situation in which they were validated.
Spache (1978) developed a revised version of his 1953 formula,
arguing that if a readability formula is to continue to reflect
the difficulty of today's
books, it, too, must change.
That is, a formula validated with one group
of students and one type of texts
is found to be invalid for the same types of students and texts as
conditions change over
a 25-year period.
The effects on validity of the formula for readers
having different cultural backgrounds or dialects must be considerably
greater.
This leads us to the
third general failing of the readability formulas: Their use is
inappropriate in two of the contexts in which
they seem most valuable.
The first of these is the selection
of an appropriate text for a child in school.
Even if we assume the formulas have some limited validity and even if we are
working with appropriate groups of texts and
readers, we can never assume that the formula will correctly
predict how a particular reader will interact with a particular book.
For example, the book Don't Forget the Bacon (Hutchins, 1976)
is a children's book that scores at grade level 2.7 using the Spache
formula.
It has mostly easy, one-syllable words
and short, simple sentences, e.g., "a pile of chairs?".
Nevertheless, some children in fourth grade find it difficult to
understand because the higher-level structure of the story is complex.
The main character is a small boy given a verbal grocery list by his
mother.
Understanding the story depends on distinguishing between times the boy
is rehearsing the list in order to remember it
and times he is repeating the same list in order to figure out what
was actually on it.
Because of this twist, the book may be more complex than its low score
suggests.
Relying on the formulas either to gauge the book's
readability or a child's reading level could be worse than useless.
A second major use for readability
formulas has been as guidelines for the simplification of existing
texts and documents.
Here, too, using these formulas is inappropriate.
Although they may, in certain cases, assign reasonable numerical values
to texts, they by no means justify modifications of an
existing text.
Yet, in cases where readability formulas are used, writers naturally
tend to write to the formulas.
Such prescriptive use magnifies the inaccuracies inherent in
the formulas.
Several studies have investigated the effect of using readability
formulas to guide text revision.
An exercise in rewriting jury instructions
demonstrated that the score of revised instructions on a readability
measure had little to do
with how well they were understood by jurors (Charrow &
Charrow, 1979).
A study by Davison, Kantor, Hannah, Hermon, Lutz, and Salzillo (1980)
showed that adapting texts in the Science Research Associates
Skillbuilders series to fit the formulas was not
only unhelpful but, in many cases, actually increased the
difficulty of the text.
For example, in a passage about trees, the
sentence
If given a chance before another fire comes, the tree
will heal its own wounds by
growing new bark over the burned part.
was changed to
If given a chance before another fire comes, the tree
will heal its own wounds.
It will grow new bark over the burned part.
The modified text contains shorter sentences, so
according to most readability formulas it
should be easier to read.
However, the reader must now make the inference that
the new bark is the mechanism by which the tree
heals its wounds without an
explicit statement of this fact.
Thus, the adapted text may actually be more
difficult than the original.
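The arithmetic behind this revision can be sketched in Python. The sketch assumes that the revised first sentence ends "will heal its own wounds." as in the original sentence; the function name is hypothetical. It shows that the revision nearly halves the average sentence length, the quantity most formulas reward, while removing the connective "by" that stated how the tree heals.

```python
import re

def average_sentence_length(text):
    # Words per sentence: the sentence-level factor most formulas rely on.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sentences)

original = ("If given a chance before another fire comes, the tree will heal "
            "its own wounds by growing new bark over the burned part.")
# Assumed completion of the revised passage (see the note above).
revised = ("If given a chance before another fire comes, the tree will heal "
           "its own wounds. It will grow new bark over the burned part.")

print(average_sentence_length(original))  # 23.0
print(average_sentence_length(revised))   # 12.0

# The explicit link between healing and new bark is gone from the revision.
print(" by " in original, " by " in revised)  # True False
```

The lower number is exactly what a formula-driven editor aims for, yet the relation the reader needs is no longer stated.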
Criteria for Applicability
The preceding examples illustrate various ways in which
readability formulas yield faulty predictions, or
even lead to the writing of passages which are harder to read.
As a series of separate examples, they do not show
why readability formulas fail nor do they distinguish among different
situations in which the formulas might be more or less appropriate.
In each case, however, we can point to an assumption about the use of
the formulas which has been violated.
On the basis of these examples of readability formula failure, then, we
are led to the conclusion that the formulas are valid only if
certain conditions hold.
Interestingly, similar lists of conditions have been put forth by
developers of the formulas themselves.
It is becoming increasingly clear that readability formulas should be
used only where the following criteria are met:
Material may be freely read. Material like
captioning for the deaf,
which appears on the screen and then disappears after a certain
time, cannot be freely read.
The time spent on it is limited by external factors, not by choice of
the reader.
Text is honestly written. The formulas assume that
material is not written
to satisfy the readability formulas, but rather to satisfy some other
purpose.
Higher-level text structures are irrelevant. The
formulas assume that
organizational material, information about intentions, goals, etc.
need not be specifically taken into account.
Purpose in reading is irrelevant. Skimming,
test-taking, reading for
pleasure, and so on are all taken to be equivalent in determining the
readability of a passage.
Statistical averages are meaningful in individual cases.
Use of the
formulas implies that statistical averages regarding both texts and
readers can provide useful information regarding the appropriateness
of an individual text for an individual person.
Readers of interest are the same as the readers on whom the
readability formula was validated. Any attempt to expand the
use of the formula to evaluate materials for readers whose
background, dialect, purpose in reading, etc. differ from those of
the readers used in validation is likely to lead to difficulties.
Unfortunately, it appears that not only some, but nearly all, uses of
readability formulas
violate the basic assumptions on their applicability.
Rigorous adherence to these assumptions effectively prevents
use of readability formulas for TV captioning, adaptation,
selection of texts for readers of different cultural backgrounds,
designing special texts for children, selection of text passages,
choosing trade books, or designing remedial readers, and restricts
readability formula use to trivial cases of little import for
educational or social policy.
We are left with a question: Are there any areas in which
the assumptions about the readability formulas are satisfied and
the formulas improve on intuitive estimates of the readability
of a text?
We think not.
The real factors that affect readability are elements such as the
background knowledge of the reader relative to the knowledge
presumed by the writer, the purpose of the reader relative to
the purpose of the writer, and the purpose of the person who is
presenting the text to the reader.
These factors cannot be captured in a simple formula and ignoring them
may do more harm than good.
References
Charrow, R., & Charrow, V. Making legal
language understandable: A psycholinguistic study of jury
instructions.
Columbia Law Review, 1979, 79.
Davison, A., Kantor, R., Hannah, J., Hermon, G., Lutz, R., & Salzillo, R.
Limitations of readability
formulas in guiding adaptations of texts (Tech.
Rep. No. 162). Urbana: University of Illinois Center for the Study of
Reading, March 1980.
(ERIC Document Reproduction Service No. ED 184 090)
Gilliland, J. Readability.
London: University of London Press Ltd., 1972.
Hutchins, P. Don't forget the
bacon.
New York: Greenwillow Books, 1976.
Kintsch, W., & Vipond, D. Reading
comprehension and readability in educational practice and psychological
theory.
In Lars-Goran Nilsson (Ed.), Proceedings of the Conference on
Hillsdale, N.J.: Erlbaum, 1977.
Klare, G. R. A second look at the validity of readability formulas.
Journal of Reading Behavior,
1976, 8, 129-152.
Lockman, R. F. A note on measuring "understandability".
Journal of Applied Psychology,
1957, 40, 195-196.
McCall, W. A., & Crabbs, L. M. Standard
test lessons in reading.
N.Y.: Teachers College Press, 1950, 1961.
Redish, J. Readability.
In D. A. McDonald (Ed.), Drafting documents in plain language.
New York: Practicing Law Institute, 1979.
Simpkins, G., Holt, G., & Simpkins,
C. Bridge - A
Cross-Culture Reading Program.
Boston: Houghton Mifflin, 1977.
Spache, G. D. Good reading for poor readers (rev.
ed.).
Champaign, IL: Garrard, 1978.
Stevens, K. C. Readability formulae and McCall-Crabbs standard test
lessons in reading.
The Reading Teacher, January 1980, 413-415.