Vol. XX OCTOBER 16, 1922 No. 7

[Entered as second-class matter December 11, 1912, at the post office at Urbana, Illinois, under the
Act of August 24, 1912. Accepted for mailing at the special rate of postage provided for in
Section 1103, Act of October 3, 1917, authorized July 31, 1918.]




WALTER S. MONROE, Director




The Bureau of Educational Research was established by act
of the Board of Trustees June 1, 1918. It is the purpose of the
Bureau to conduct original investigations in the field of education,
to summarize and bring to the attention of school people the results
of research elsewhere, and to be of service to the schools of the
state in other ways.

The results of original investigations carried on by the Bureau
of Educational Research are published in the form of bulletins. A
complete list of these publications is given on the back cover of
this bulletin. At the present time five or six original investigations
are reported each year. The accounts of research conducted else-
where and other communications to the school men of the state
are published in the form of educational research circulars. From
ten to fifteen of these are issued each year.

The Bureau is a department of the College of Education. Its
immediate direction is vested in a Director, who is also an instructor
in the College of Education. Under his supervision research is
carried on by other members of the Bureau staff and also by grad-
uate students who are working on theses. From this point of view
the Bureau of Educational Research is a research laboratory for the
College of Education.


College of Education
University of Illinois, Urbana














INTRODUCTION: Basis of a rational estimate of the value of written examinations

1. Examinations yield inaccurate measures of achievement
   A. Marking of examination papers subjective
   B. Questions of examination not equal in difficulty, and weighting by teachers subjective
   C. Content of examinations not in agreement with educational objectives
   D. Rate of work neglected
   E. Adequate opportunity for a pupil to demonstrate his ability not offered by single examination
   F. Marks assigned to examination papers imply subjective norms
2. Undesirable mental processes stimulated by examinations
3. Examinations tend to become educational objectives
4. Examinations injurious to health of students
5. Time devoted to marking of examination papers might be more profitably employed

1. Measurement of abilities of students necessary to high degree of school efficiency
2. Substitutes for written examinations
   A. Standardized educational tests versus examinations
   B. Teachers' estimates versus examination marks
   C. Daily "grades" versus examination marks
3. Inaccuracy of examination marks
   A. Neglect of the rate of work not necessary
   B. Unequal difficulty of questions not a serious defect
   C. Inaccuracy of single examination
4. Examinations force students to review and organize content of course
5. Examinations furnish effective motive
6. Proper use of examinations
7. Examinations as objectives
8. Effect of examinations upon health of students
9. Time devoted to examinations a profitable investment

1. Increasing objectivity in marking of examination papers
   A. Increasing objectivity of "grades" by improving examination questions
      (1) General methods
      (2) New examinations: use of questions permitting only one correct answer
          a. True-false exercises
          b. "Yes" and "no" exercises
          c. Recognition exercises
          d. Completion exercises
          e. Other advantages of "new examination"
          f. Limitations of "new examination"
   B. Rules for marking examinations
2. Increasing objectivity of norms for translating examination scores into school marks
3. Securing closer agreement of examinations with educational objectives

1. True-false examination in physiology
2. True-false examination in history and civil government
3. True-false examination in geography
4. Completion examination in American government
5. Recognition examination in algebra
6. Traditional examinations


During the past twenty years there have been many controversies
concerning the value and place of written examinations. There have also
been a number of investigations of examinations and examination "grades."
Since standardized educational tests have become widely used a number
of superintendents and teachers have proposed that they replace the
written examinations set by teachers and other school officials. More
recently some attention has been given to the improvement of written ex-
aminations by the application of certain principles of test construction.

Because of the importance of the written examination and also because
a number of inquiries have been addressed to the Bureau of Educational
Research, it has seemed wise to organize and publish a summary of the
important ideas relating to both the criticism and the defense of examina-
tions. To this there have been added a number of suggestions for the im-
provement of examinations. It is hoped that this bulletin may foster
intelligent thinking relative to written examinations and their use in our schools.

Although this bulletin is largely the product of the labors of the
Director of the Bureau of Educational Research it is only just that the
contributions of other members of the staff should receive recognition.
Both Mrs. Charles H. Johnston and Mr. Lloyd B. Souders have made
substantial contributions.




Basis of a Rational Estimate of the Value of Written Examinations.

Until recently, examinations occupied a regular place in the work of
the school. Students expected them as a matter of course, and the ac-
curacy of the marks placed upon examination papers was not seriously
questioned. However, for a number of years written examinations set by
teachers and by other school officials have been subjected to criticism.
During this period the defects and the limitations of examinations have been
thoroughly canvassed. Many prominent educators have advised that they
be abolished entirely, and in a number of school systems this has act-
ually been done. The friends of examinations, however, have urged
their merits and have insisted that the abolition of them would cause our
educational system to deteriorate. The controversy has not been without
prejudice on both sides. The marking of examination papers involves
much drudgery for instructors. Students dislike examinations partly
because they require a type of intensive mental activity which many of
them prefer to avoid and partly because it is fashionable in many schools
to oppose them. Conservatives, naturally, have resented any proposal
to change a system of education which they credited with producing the
educated men of the present generation. Some, at least, have expressed
the belief that examinations have been largely responsible for the quality
of the output of our public schools and colleges.

In evaluating the criticisms and the defense of written examinations
it is imperative that one keep in mind the fact that they have more than
one function. Written examinations are not merely measuring instru-
ments, although this function is probably most prominent in the thinking of
many persons. The written examination is used as an instrument for
measuring the achievements of students, but it also affords a unique type
of opportunity for learning. Under rather well defined conditions, certain
tasks are set for the pupil and he is required to demonstrate within a limited
time what he is able to do. He is thrown upon his own resources and forced
to work under pressure. In the actual writing of his answers to the ques-
tions of the examination the pupil has an opportunity to learn. Ideas
tend to become more definite as a result of expression in written form.
Frequently the pupil gains new ideas as a result of the reflective thinking
he does in answering the questions. It is true that all pupils do not always
learn in taking an examination, but it is also true that all pupils do
not take advantage of all other educational opportunities which are offered
them. In addition to the actual taking of the examination, the pupil
frequently, as a preparation for it, engages in review; and, because he knows
that later he must take the examination, he has a stronger motive for this

Not only is it important that we recognize the existence of functions
other than the one of measurement, but it is also imperative that we bear
in mind two distinctions. First, we must distinguish between criticisms
of examinations and criticisms of certain kinds of examinations. The
fact that some teachers set poor examinations does not furnish an adequate
basis for concluding that all examinations should be abolished. In the
second place, we should distinguish carefully between criticisms of ex-
aminations and criticisms of the ways in which they are used. Good ex-
aminations may be used for wrong purposes. For example, a good ex-
amination might be given to a pupil or a group of pupils merely as a punish-
ment for some misbehavior. If we believe that such use is not justified
it does not follow that the examination itself is subject to adverse criticism
or that all examinations should be abolished.



The arguments advanced for and against examinations have dealt
with various phases. Some of the criticisms have emphasized the effective-
ness of the examination as a measuring instrument; others have had to do
with the purposes for which examinations are used by teachers and by
other school officials. Some criticisms are based upon facts, while others
merely represent opinions. In the following pages the most significant
criticisms have been summarized and grouped under a few major heads.
In presenting these criticisms there will be no attempt to point out their
limitations or to present the arguments in favor of written examinations.
This will be reserved until the second chapter.

1. Examinations yield inaccurate measures of achievement. A
number of criticisms of written examinations set by teachers and by other
school officials have referred to their effectiveness as instruments for meas-
uring the achievements of students. These criticisms may be summarized
under six heads.

A. Marking of examination papers subjective. Scientific investi-
gation has proved that the marking of examination papers is subjective,
i.e., different teachers, when working independently, tend to assign widely
varying marks to the same paper. An investigation by Starch and Elliot 1
is typical of many that have been made. These investigators selected a
final examination paper in geometry, written by a student in one of the
largest high schools in Wisconsin. An exact reproduction of this paper
and a set of the questions were sent to one hundred and eighty high schools
in the North Central Association. It was requested that this paper be
graded according to the practise and standards of the school by the princi-
pal teacher of mathematics. One hundred and sixteen acceptable replies
were received. The papers showed evidence of having been marked with
unusual care and attention. In seventy-three schools where the passing
grade was 75 the lowest mark given was 39 and the highest 88. The mode
was 75, with twelve teachers giving this mark. Of the one hundred and
sixteen marks assigned to this paper, two were above 90 and one was below
30. Twenty were 80 or above and twenty other marks were below 60.
Forty-seven teachers assigned a mark passing or above, but sixty-nine
teachers thought this paper not worthy of a passing mark.

1 Starch, Daniel, and Elliot, E. C. "Reliability of grading high school work in math-
ematics," School Review, 21: 254-59, 1913.
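
The counts reported for the geometry paper can be turned into proportions with a few lines of arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative and not from the study, and only the counts quoted in the text (116 marks, 47 passing or above, 69 below passing) are used.

```python
# Counts quoted in the text for the geometry paper (Starch and Elliot).
total_marks = 116
passing_or_above = 47
below_passing = 69

# The two groups should account for every mark received.
assert passing_or_above + below_passing == total_marks


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


share_passing = share(passing_or_above, total_marks)
share_failing = share(below_passing, total_marks)

print(f"Passing or above: {share_passing}%")
print(f"Below passing:    {share_failing}%")
```

On these figures roughly three readers in five judged the same paper to be below passing.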

Robert L. Morton 2 reports an investigation of the reliability of the
marking of examination papers written by teachers applying for a license
to teach. In 1904, the Ohio Legislature provided for uniform questions
for the teachers' examination. These questions were to be prepared in
the office of the State Superintendent of Public Instruction and sent to
the eighty-eight county boards of examiners. Special examiners were
appointed in each county to rate the papers. Morton selected an arith-
metic paper from the files of one board of examiners. The paper was
mimeographed, care being taken to produce exactly the language, spelling,
and punctuation of the original paper. A copy of this paper, together with
the questions, was sent to each of the eighty-eight county superintendents
in Ohio with the request that it be graded by the special examiner for arith-
metic. Replies were received from fifty-five counties. The lowest mark
given to the paper was 60 and the highest 99. In marking the answer
given to one question on this paper five examiners rated it at zero, twenty-
one at 10, and the other twenty-nine assigned marks between these
extremes. If each answer had been rated in the county assigning the low-
est mark to it, the total "grade" for the paper would have been 28. On
the other hand, if the highest marks assigned to the answers of the various
questions had been used to make up a "grade", a mark of 100 would have
been given to the paper. Morton investigated in a similar way the mark-
ing of a paper in the theory and practise of teaching and also of one in
geography. Similar variations in the marks were found.
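
The composite figures Morton reports (28 and 100) come from taking, for each question, the lowest or the highest mark any county gave it and then adding these across questions. A minimal sketch of that calculation follows. The marks in `marks_by_examiner` are invented for illustration and are not Morton's data, and a ten-point scale per question is an assumption.

```python
# Hypothetical marks: one row per examiner, one column per question.
# These numbers are invented for illustration; they are not Morton's data.
marks_by_examiner = [
    [10, 4, 7, 0],
    [8, 10, 5, 6],
    [6, 7, 10, 3],
]

# Regroup the marks so that each tuple holds every examiner's mark
# for a single question.
marks_by_question = list(zip(*marks_by_examiner))

# Total actually awarded by each examiner.
examiner_totals = [sum(row) for row in marks_by_examiner]

# Composite totals built from the harshest and the most lenient mark
# given to each question, as in the comparison described above.
lowest_composite = sum(min(q) for q in marks_by_question)
highest_composite = sum(max(q) for q in marks_by_question)

print(examiner_totals)
print(lowest_composite, highest_composite)
```

The composite extremes can never fall inside the range of totals given by individual examiners, which is why Morton's composites of 28 and 100 lie beyond the actual marks of 60 and 99.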

A striking illustration of the subjectivity of the marking of exami-
nation papers by college instructors is cited by a recent writer 3. One of
the group of expert readers assigned to the marking of examination papers
in history, after scoring a few papers, wrote out for his own convenience
what he considered model answers to the questions. By some mischance
this "model" examination paper fell into the hands of another expert
reader who graded it as a paper written by a student. The mark he
assigned to it was below passing and, in accordance with the custom, this
"model" was rated by a number of other expert readers in order to insure
that it was properly marked. The marks assigned to it by these readers
varied from 40 to 90.

2 Morton, Robert L. "The examination method of licensing teachers," Educational
Administration and Supervision, 6: 421, November, 1920.

3 Wood, Ben D. "Measurement of college work," Educational Administration and
Supervision, 7: 301-34, September, 1921.


Scientific investigation of the marking of examination papers has
been sufficiently extensive to prove that, except in a very few instances,
the process is subjective. Except for accidental errors, different teachers
should assign the same mark to an examination paper in spelling. The
marking should also be highly objective in arithmetic unless there is an
attempt to allow partial credit for examples and problems partially right
or for correct principle when the answer is not correct. The marking of
the answers to questions which call for specific facts, such as dates, names
of places, or persons, should approach objectivity. With the exception of
these cases, the marking as it is ordinarily done is highly subjective, and
hence the "grades" are inaccurate measures of achievement.

As might be expected, the degree of subjectivity varies with differ-
ent school subjects. It is, however, sometimes found to be high where the
nature of the subject matter leads one to expect that the marking will be
relatively objective. For example, Starch and Elliot found that the mark-
ing of an examination paper in geometry was just as subjective as one in
English or history. Kelly 4 found that the rating of examination papers
in algebra was considerably more objective than in physics.

B. Questions of an examination not equal in difficulty, and weighting
by teachers subjective. There is abundant evidence that the questions
of an examination are generally not equal in difficulty. Frequently, in
this respect, they vary widely. When the questions are submitted to a
large number of pupils, some will be answered correctly by a large percent
of the pupils, others by only a small percent. To give as much credit for
answering an easy question as for a difficult one would appear to introduce
serious errors into the marks assigned to the papers. Because it is recog-
nized that the questions which make up an examination are generally un-
equal in difficulty, teachers frequently attempt to assign appropriate
weights. For example, one question may be assigned a credit of 15 points
while an easy one is given a credit of only 4 points. One investigation 5
has shown that teachers' estimates of the difficulty of questions are highly
subjective. Twenty teachers were asked to arrange twenty-three prob-
lems in arithmetic in the order of their difficulty. A very wide variation in
these rankings was found. One problem was considered the easiest by
one teacher and ranked twenty-first in difficulty by another. The results
of this investigation seem representative. That being the case, any weighting
of questions by teachers must be considered highly subjective, and hence
not a satisfactory corrective for the unequal difficulty of questions.

4 Kelly, F. J. "Teachers' marking," Teachers College, Columbia University, Con-
tributions to Education, No. 66, 1914.

5 Comin, Robert. "Teachers' estimates of the ability of pupils," School and Society,
3: 67-70, January 8, 1916.
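
One way to make "difficulty" concrete, following the description in this section, is the percent of pupils answering each question correctly. A minimal sketch under that assumption follows; the response data are invented for illustration, and the function name `percent_correct` is hypothetical.

```python
# Hypothetical results: one row per pupil, one column per question;
# 1 means a correct answer, 0 an incorrect one. Invented for illustration.
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]


def percent_correct(rows):
    """Return, for each question, the percent of pupils answering correctly."""
    n_pupils = len(rows)
    return [100 * sum(col) / n_pupils for col in zip(*rows)]


difficulty = percent_correct(responses)

# Order the questions from easiest (highest percent correct) to hardest.
easiest_first = sorted(
    range(len(difficulty)), key=lambda q: difficulty[q], reverse=True
)

print(difficulty)
print([q + 1 for q in easiest_first])
```

An ordering obtained this way rests on pupil results rather than on any one teacher's estimate.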

C. Content of examinations not in agreement with educational ob-
jectives. The criticism is frequently made that teachers, in formulating
examination questions, tend to ask for unimportant details and to neglect
the minimum essentials of a subject, and that, therefore, a pupil's per-
formance on an examination can not be a truthful index of the extent
to which he has achieved the educational objectives set for him. Some
questions are described as "catch questions." By this, it is usually meant
that such questions call for some unimportant detail or that they are am-
biguous in some way. There appears to have been no scientific investi-
gation of the character of the examination questions asked of pupils.
However, it is doubtless true that this criticism has justification in some
cases because frequently teachers give relatively little time to the prepar-
ation of their questions, and these often reflect any hobbies or prejudices
which the teachers may have. Experience in the construction of stand-
ardized educational tests has shown that it is difficult to eliminate all am-
biguity and indefiniteness in questions. Hence, it is likely true that many
questions are not well stated, and for this reason are not properly under-
stood by those taking the examination. When this is the case, the "grades"
tend to be inaccurate measures of achievement.

When an examination is set by some person other than the teacher
of the class it not infrequently happens that many of the questions pertain
to topics which have received little or no attention during the instruction
periods. In many schools it seems to be the custom for the superintendent
or the principal, without consultation with the teacher in charge, to make
out the questions for the final examination on which the pupils' semester
grades are largely based. For example, in a fifth grade geography class
in an Illinois city, four of the five questions of the examination concerned
current conditions about which the children, instructed only in their texts,
knew little. A few pupils, fortunate enough to have heard these matters
discussed in their own homes, received a passing grade. The majority of
the class failed. This examination, interesting and in itself not subject to
criticism, should not have been used, however, as a means for measuring
the achievements of that particular class. It was not in agreement with
the educational objectives toward which the teacher had directed their
efforts. Such examinations are "hard" in the sense that capable students
will answer only a relatively small percent of the questions correctly, and
are rightly criticized as being unjust because the students are not given
an opportunity to demonstrate their achievements.


D. Rate of work neglected. The usual plan is to set an examination
which practically all pupils can finish in the time allowed. No record is
kept of the time which the pupil has spent in writing his answers. If two
pupils write papers which are considered equivalent in quality but one has
completed the examination in forty minutes and the other in ninety min-
utes, it is not customary to distinguish between their performances. Both
will receive the same "grade." This means that the rate of work is neg-
lected. Since the rate at which a pupil is able to answer questions is one
index of his ability, the ordinary examination fails in this respect to secure
a truthful measure of his ability.
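
The point can be illustrated with the two pupils described above. In this minimal sketch the common quality score of 80 is a hypothetical value chosen for illustration, and "points per minute" is only one possible way to express rate of work; the text does not prescribe a formula.

```python
# Two pupils whose papers are judged equal in quality.
# The score of 80 is hypothetical; the times are those given in the text.
pupils = {
    "pupil_a": {"score": 80, "minutes": 40},
    "pupil_b": {"score": 80, "minutes": 90},
}


def points_per_minute(score: float, minutes: float) -> float:
    """One possible rate-of-work index: points earned per minute spent."""
    return score / minutes


rates = {
    name: points_per_minute(p["score"], p["minutes"])
    for name, p in pupils.items()
}

for name, rate in rates.items():
    print(f"{name}: grade {pupils[name]['score']}, rate {rate:.2f} points per minute")
```

Both pupils receive the same "grade", yet on this index the first works at more than twice the rate of the second.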

E. A single examination does not offer an adequate opportunity for
a pupil to demonstrate his ability. Some critics urge that a single exami-
nation, even when carefully prepared and graded, will not in general yield
a reliable measure of a student's ability. McAndrew 6, in reviewing the
work of the New York City high schools, says, "New York City high
schools use them (examinations) in deciding the promotion to the training
school for teachers. We have every year some students whom their
teachers have complimented regularly but who fail of graduation because
of a three hour test which nullifies the work of four years. I cannot see
how a pupil writing for three hours can be tested for what he has done for
a year or more." Courtis 7 expresses much the same thought in the
following statement: "The best examination is not that represented by
the score of a single performance in a single day. Human effort is variable
and human skill too easily upset to make it fair to have promotion based
upon chance scores."

Thorndike 8 has summarized a number of investigations carried on at
Columbia University in order to determine the reliability of the "grades"
made on college entrance examinations as a basis for predicting the type
of work which the student will do in college. He states that we cannot
estimate the success of the student in college from his grades on entrance
examinations with "enough accuracy to make the entrance examinations
worth while and to prevent gross injustice being done to any individual.
The record of eleven or more entrance examinations gives a less accurate
prophecy of what a student will do in the latter half of his college course
than does his high school record." Similar results have been obtained by
other investigators. 9 Studies of this type do not necessarily prove that
examination "grades" are inaccurate measures of achievement. Because
of other elements which enter into college life, a student having made a
satisfactory record in his secondary school may not carry on successfully
his work in college.

6 McAndrew, Wm. "Our old friend the examination," Proceedings of National Edu-
cational Association, 1916, pp. 527-33.

7 Courtis, S. A. "Standardizing of teachers' examinations," Proceedings of National
Educational Association, 1916, pp. 1078-86.

8 Thorndike, E. L. "The future of the college entrance examination board," Educa-
tional Review, 31: 470-83, May, 1906.

F. Marks assigned to examination papers imply subjective norms.
This criticism has to do with errors in interpreting measures of achievement

