Bruce, Bertram C.; Rubin, Ann D.; & Starr, Kathleen S. (1981). Why readability formulas fail. IEEE Transactions on Professional Communication, PC-24, 50-52. Also as Reading Education Report No. 28 (1981, August). Urbana, IL: University of Illinois, Center for the Study of Reading, and BBN Report No. 4715 (1981). Cambridge, MA: Bolt Beranek and Newman.

Why Readability Formulas Fail

Bertram Bruce
Andee Rubin
Kathleen S. Starr

This research was supported by the National Institute of Education under Contract No. MS-NIE-C-400-76-0116.

Being able to measure the readability of a text with a simple formula is an attractive prospect, and many groups have been using readability formulas in a variety of situations where estimates of text complexity are thought to be necessary. The most obvious and explicit use of readability formulas is by educational publishers designing basal and remedial reading texts; some states, in fact, will consider using a basal series only if it fits certain readability formula criteria. Increasingly, public documents such as insurance policies, tax forms, contracts, and jury instructions must meet criteria stated in terms of readability formulas.

Unfortunately, readability formulas just don't fulfill their promise. This failure can be attributed to three weaknesses in the formulas. First, from a theoretical point of view, they ignore or violate much of current knowledge about reading and the reading process. Second, their statistical bases are shaky, being at once poorly supported mathematically and difficult to generalize. Finally, as practical tools either for matching children and texts or for providing guidelines for writers, they are totally inappropriate. Criticisms such as these have been leveled at readability formulas from many quarters (Gilliland, 1972; Redish, 1979; Kintsch & Vipond, 1977), but the formulas' uses have expanded in spite of the growing number of papers discussing their weaknesses. We attempt here to categorize and summarize some of the problems with readability formulas and their use.

Factors Not in the Formulas

The first category of problem involves the discrepancy between the characteristics of texts which readability formulas measure and those which we know to influence text comprehensibility. Because most of the formulas include only sentence length and word difficulty as factors, they can account only indirectly for other factors that make a particular text difficult, such as degree of discourse cohesion, number of inferences required, number of items to remember, complexity of ideas, rhetorical structure, dialect, and background knowledge required. Further, because the formulas are measurements based on a text isolated from the context of its use, they cannot reflect such reader-specific factors as motivation, interest, competitiveness, values, and purpose.
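
To make concrete how little such a formula sees, the following sketch computes the Flesch Reading Ease score, one widely used formula, from nothing more than average sentence length and average syllables per word. The function names are our own, the syllable counter is a rough vowel-group heuristic assumed for illustration, and the two sample sentences are invented for this example.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (an assumed heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores mean the formula rates the text easier."""
    # Sentences are approximated by splitting on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Invented sample: the same words in a sensible and a scrambled order.
sensible = "The tree will heal its own wounds."
scrambled = "Wounds own its heal will tree the."

# Both calls return the same score, because word order never enters the formula.
print(flesch_reading_ease(sensible) == flesch_reading_ease(scrambled))
```

Because the score depends only on counts, any rearrangement of the same words into the same number of sentences receives the same value, however incoherent the result. Cohesion, required inferences, and background knowledge have no place in the computation.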

Readability formulas fail to account for differences in readers' dialect and cultural backgrounds. For example, a passage in Black Vernacular from the Bridge series (Simpkins, Holt, & Simpkins, 1977), a cross-cultural reading program, starts:
Willie went and got hisself a lightweight gig. The gig wasn't saying too much. It wasn't paying nothing but chump change.
Readers familiar with this form of Black Vernacular find the passage relatively simple. Others can infer the meanings of individual words only with difficulty.

Because they view texts so narrowly, readability formulas also fail to measure the effect of the context in which a passage is read. A health information sheet describing the concept and treatment of hypertension, for example, may communicate quite effectively if a patient has enough time to read it and feels comfortable asking a physician for clarification. In a rushed, brusque encounter, however, the document would be much less comprehensible.

Lack of Statistical Basis

Despite the shortcomings of readability formulas on theoretical grounds, strong empirical evidence of their predictive value might justify their use for some tasks. Unfortunately, when such evidence is examined, the second major problem with readability formulas--their lack of solid statistical grounding--becomes apparent. Many of the hundreds of formulas in existence were validated only in terms of earlier formulas. The early formulas, in turn, were validated using the McCall-Crabbs Standard Test Lessons in Reading (McCall & Crabbs, 1950, 1961). But the McCall-Crabbs lessons were intended only as practice exercises in reading, never as measures of comprehension or text comprehensibility; nor were they intended to be general indicators of reading ability across age, class, or cultural groups. Nevertheless, the most respected formulas have all used the McCall-Crabbs lessons as the criterion of difficulty (Stevens, 1980).

Spache (1978), a readability formula designer, stated the problem succinctly:
The reading level given by the formula should mean that a child with that level of reading ability could read the book with adequate comprehension and a reasonable number of oral reading errors. This assumption has seldom if ever been tested in the development of this and other readability formulas (emphasis added).
While validation studies were generally not performed in the course of developing readability formulas, a fair number were done after the fact. In a comprehensive review of such studies, Klare (1976) noted that 39 of 65 studies demonstrated a positive correlation between readability formula estimates of difficulty and reader performance on independent criteria such as reading speed or comprehension. However, even this unconvincing performance is undercut by his observation that positive results are more likely to be reported in journals than negative ones and by the fact that when comprehension, rather than reading speed, is used as the independent measure of text difficulty, only half of the studies indicated positive correlations with readability formula estimates. Lockman (1957) computed Flesch Reading Ease scores for nine sets of instructions for psychological tests, then had 171 naval cadets rate them on "understandability." The rank-order correlation between the two sets of measurements was -0.65, a strong correlation but in the wrong direction.
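
For readers unfamiliar with rank-order correlation, the following sketch shows how such a coefficient is computed and what a negative value means. It uses Spearman's formula for untied ranks. The function names are our own and the five data pairs are hypothetical, chosen only to illustrate the calculation; they are not Lockman's data.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank-order correlation for paired data without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical illustration only (NOT Lockman's data): five texts, each with
# a formula score and a reader rating, where a higher value on either scale
# is meant to indicate an easier text.
formula_scores = [30, 40, 50, 60, 70]
reader_ratings = [5, 4, 3, 2, 1]

print(spearman_rho(formula_scores, reader_ratings))  # -1.0
```

A coefficient near +1 would mean that the formula and the readers order the texts the same way. A negative coefficient, as in this invented extreme case, means that the texts the formula ranks as easier are the ones readers rank as harder.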

Common sense also leads us to wonder how generalizable readability formula estimates are beyond the precise situation in which they were validated. Spache (1978) developed a revised version of his 1953 formula, saying,
If a readability formula is to continue to reflect accurate estimates of the difficulty of today's books, it, too, must change.
That is, a formula validated with one group of students and one type of texts is found to be invalid for the same types of students and texts as conditions change over a 25-year period. The effects on validity of the formula for readers having different cultural backgrounds or dialects must be considerably greater.

Inappropriate Use

This leads us to the third general failing of the readability formulas: Their use is inappropriate in two of the contexts in which they seem most valuable. The first of these is the selection of an appropriate text for a child in school. Even if we assume the formulas have some limited validity and even if we are working with appropriate groups of texts and readers, we can never assume that the formula will correctly predict how a particular reader will interact with a particular book.

For example, the book Don't Forget the Bacon (Hutchins, 1976) is a children's book that scores at grade level 2.7 using the Spache (1978) formula. It has mostly easy, one-syllable words and short, simple sentences, e.g., "a pile of chairs?". Nevertheless, some children in fourth grade find it difficult to understand because the higher-level structure of the story is complex and subtle. The main character is a small boy given a verbal grocery list by his mother. Understanding the story depends on distinguishing between times the boy is rehearsing the list in order to remember it and times he is repeating the same list in order to figure out what went wrong. Because of this twist, the book may be more complex than its low score implies. Relying on the formulas to gauge either the book's readability or a child's reading level could be worse than useless.

A second major use for readability formulas has been as guidelines for the simplification of existing texts and documents. Here, too, using these formulas is inappropriate. Although they may, in certain cases, assign reasonable numerical values to texts, they by no means justify modifications of an existing text. Yet, in cases where readability formulas are used, writers naturally tend to write to the formulas. Such prescriptive use magnifies the inaccuracies inherent in the formulas.

Several studies have investigated the effect of using readability formulas to guide text revision. An exercise in rewriting jury instructions demonstrated that the score of revised instructions on a readability measure had little to do with how well they were understood by jurors (Charrow & Charrow, 1979).

A study by Davison, Kantor, Hannah, Hermon, Lutz, and Salzillo (1980) showed that adapting texts in the Science Research Associates Skillbuilders series to fit the formulas was not only ineffective, but, in many cases, actually increased the difficulty of the text. For example, in a passage about trees, the sentence
If given a chance before another fire comes, the tree will heal its own wounds by growing new bark over the burned part.
was changed to
If given a chance before another fire comes, the tree will heal its own wounds. It will grow new bark over the burned part.
The modified text contains shorter sentences, so according to most readability formulas it should be easier to read. However, the reader must now make the inference that the new bark is the mechanism by which the tree heals its wounds without an explicit statement of this fact. Thus, the adapted text may actually be more difficult than the original.
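
The effect of this adaptation on the sentence-length factor can be checked directly. The sketch below, with a helper name of our own choosing and a simple punctuation-based sentence splitter assumed for illustration, computes the average number of words per sentence for the two versions quoted above.

```python
import re


def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence, the length factor most formulas use."""
    # Sentences are approximated by splitting on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sentences)


original = ("If given a chance before another fire comes, the tree will heal "
            "its own wounds by growing new bark over the burned part.")
adapted = ("If given a chance before another fire comes, the tree will heal "
           "its own wounds. It will grow new bark over the burned part.")

print(average_sentence_length(original))  # 23.0
print(average_sentence_length(adapted))   # 12.0
```

The adaptation roughly halves the average sentence length, so any formula that penalizes long sentences will rate it as easier. The causal link that the word "by" made explicit in the original does not enter the calculation at all.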

Criteria for Applicability

The preceding examples illustrate various ways in which readability formulas yield faulty predictions, or even lead to the writing of passages which are harder to read. As a series of separate examples, they do not show why readability formulas fail nor do they distinguish among different situations in which the formulas might be more or less appropriate. In each case, however, we can point to an assumption about the use of the formulas which has been violated. On the basis of these examples of readability formula failure, then, we are led to the conclusion that the formulas are valid only if certain conditions hold. Interestingly, similar lists of conditions have been put forth by designers of the formulas themselves. It is becoming increasingly clear that readability formulas should be used only where the following criteria are met:

Unfortunately, it appears that not only some, but nearly all, uses of readability formulas violate the basic assumptions on their applicability. Rigorous adherence to these assumptions effectively prevents use of readability formulas for TV captioning, adaptation, selection of texts for readers of different cultural backgrounds, designing special texts for children, selection of text passages, choosing trade books, or designing remedial readers, and restricts readability formula use to trivial cases of little import for educational or social policy.

We are left with a question: Are there any areas in which the assumptions about the readability formulas are satisfied and the formulas improve on intuitive estimates of the readability of a text? We think not. The real factors that affect readability are elements such as the background knowledge of the reader relative to the knowledge presumed by the writer, the purpose of the reader relative to the purpose of the writer, and the purpose of the person who is presenting the text to the reader. These factors cannot be captured in a simple formula and ignoring them may do more harm than good.

References

Charrow, R., & Charrow, V. Making legal language understandable: A psycholinguistic study of jury instructions. Columbia Law Review, 1979, 79, 1306-1374.

Davison, A., Kantor, R., Hannah, J., Hermon, G., Lutz, R., & Salzillo, R. Limitations of readability formulas in guiding adaptations of texts (Tech. Rep. No. 162). Urbana: University of Illinois Center for the Study of Reading, March 1980. (ERIC Document Reproduction Service No. ED 184 090)

Gilliland, J. Readability. London: University of London Press Ltd., 1972.

Hutchins, P. Don't forget the bacon. New York: Greenwillow Books, 1976.

Kintsch, W., & Vipond, D. Reading comprehension and readability in educational practice and psychological theory. In Lars-Goran Nilsson (Ed.), Proceedings of the Conference on Memory. Hillsdale, N.J.: Erlbaum, 1977.

Klare, G. R. A second look at the validity of readability formulas. Journal of Reading Behavior, 1976, 8, 129-152.

Lockman, R. F. A note on measuring "understandability". Journal of Applied Psychology, 1957, 40, 195-196.

McCall, W. A., & Crabbs, L. M. Standard test lessons in reading. New York: Teachers College Press, 1950, 1961.

Redish, J. Readability. In D. A. McDonald (Ed.), Drafting documents in plain language. New York: Practicing Law Institute, 1979.

Simpkins, G., Holt, G., & Simpkins, C. Bridge - A Cross-Culture Reading Program. Boston: Houghton Mifflin, 1977.

Spache, G. D. Good reading for poor readers (rev. 10th ed.). Champaign, IL: Garrard, 1978.

Stevens, K. C. Readability formulae and McCall-Crabbs standard test lessons in reading. The Reading Teacher, January 1980, 413-415.