Daniel Starch.

Educational psychology online


logical measurements, in which the units depend upon the discrimi-
nation of judges, has been to regard a difference in amount of the
thing in question which can be distinguished correctly by 75% of
the judges or judgments concerning it as the smallest psychological
unit that can be used with reasonable certainty. For example, if
we take ten shades of blue and ask 100 judges to arrange them in
the order of blueness from left to right, we would regard that dif-
ference in blueness between any two successive shades which
75 of the judges agree in perceiving the one bluer than the other as
the least that can be distinguished with a fair degree of confidence.
The 75% point is chosen because it is midway between pure
chance and absolute certainty. If only one of two possible judg-
ments may be made, that is, if a given shade is either more or less
blue than another, then 50% of the judgments would be correct
by pure chance guesses. If 100% of the judgments are correct,
it means that the difference is so large that it can be recognized
correctly every time, and the amount of difference may range
anywhere from just large enough to be always recognized up to
an infinite difference.

According to this principle, how large would the steps be on
the marking scale? For example, if we took two papers, a and b,
in a given subject which differed in quality just enough so that
three-fourths of the examiners or teachers would consider b better
than a, how large would this difference be, say on the usual 100
percentage scale? Data for answering this question with approx-
imate accuracy are found in Figures 87 to 90. The probable
error or median deviation of the marks given by the teachers to
the papers represented in these four figures are 4.0, 4.8, 7.8 and 7.1
respectively, with an average of 6.4. By definition, two times the
probable error includes the middle half of the measures or marks.
For example, in Figure 88 the median is 80.2 and the probable
error is 4.8, that is, the middle 71 of the 142 marks lie between 75
and 85. Obviously one-fourth or 35 of the marks lie above 85.
Consequently so far as this particular paper is concerned the next
better paper would have to be 4.8 points better so that three-
fourths of the examiners would consider it better.
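
The arithmetic behind the 75% point can be checked with a short sketch. It assumes, for illustration only, that the marks given to the Figure 88 paper follow a normal curve with the median (80.2) and probable error (4.8) quoted above; the function name is hypothetical and not part of the original text.

```python
from statistics import NormalDist

# One probable error (P.E.) is the distance from the median that encloses
# the middle half of a normal distribution: about 0.6745 standard deviations.
PE_IN_SIGMAS = NormalDist().inv_cdf(0.75)


def share_of_marks_below(mark, median, probable_error):
    """Fraction of examiners' marks expected to fall below `mark`.

    Assumption: the marks are normally distributed around `median`.
    """
    sigma = probable_error / PE_IN_SIGMAS
    return NormalDist(median, sigma).cdf(mark)


# Figure 88 paper: median 80.2, probable error 4.8 (values from the text).
median, pe = 80.2, 4.8
below_upper = share_of_marks_below(median + pe, median, pe)
below_lower = share_of_marks_below(median - pe, median, pe)
middle_half = below_upper - below_lower
```

Under that assumption, three-fourths of the marks fall below a point one probable error above the median, and one-half fall within one probable error on either side, which is the relation the argument relies on.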

Now the average probable error of the four sets of marks is 6.4.
Hence the difference between two papers in general must be approx-
imately 6.4 points so that three-fourths of the examiners would
consider one better than another. On this principle then the step
on the 100 percentage scale, with 70 as the usual passing grade,
turns out to be approximately 7 points. This would produce a
scale of steps as follows: 70-76, 77-84, 85-92, and 93-100. That
is, the marking scale would have five steps, failure and four passing
steps above 70, which may be designated as excellent, good, fair,
poor, and failure, or perhaps preferably by the symbols A, B, C, D,
and E. Such would be the size of the steps so that three-fourths
of the examiners of a given set of papers would agree in distinguish-
ing between the qualities of the papers.
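
The five-step scale just described can be written out as a small lookup. This is only a sketch of the step boundaries given in the text (70-76, 77-84, 85-92, 93-100, with failure below 70); the names `STEPS` and `letter_for` are hypothetical.

```python
# Lower bound of each passing step on the 100 percentage scale, highest first.
STEPS = [(93, "A"), (85, "B"), (77, "C"), (70, "D")]


def letter_for(mark):
    """Return the five-step symbol for a mark on the 100 percentage scale."""
    if not 0 <= mark <= 100:
        raise ValueError("mark must lie between 0 and 100")
    for lower_bound, letter in STEPS:
        if mark >= lower_bound:
            return letter
    return "E"  # anything below the passing grade of 70 is a failure
```

For instance, marks of 76 and 77 fall on different steps (D and C), although they differ by far less than the probable error of the marking.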

However, any individual teacher agrees with himself more
closely in re-grading a set of papers than he agrees with other
teachers, as indicated in Table 137. This table shows that the
probable error or median deviation of a given teacher's marks in
re-grading his own papers is approximately 2 points. By the same
reasoning the amount of difference in quality between two papers
would have to be 2 points in order that an individual teacher would
consider one paper better than another in three out of four in-
dependent markings. Hence the marking scale for an individual
teacher, who grades papers from his own viewpoint and compares
them only with his own judgments, could have each step in a five-
step scale subdivided into three smaller steps of about 2 points each
by using the plus and minus. That is, 70 to 76 would become
70-72 or D-, 73-74 or D, and 75-76 or D+, and so on.

Whether a fine marking scale such as the 100 percentage scale or
a coarser five-step scale should be used is largely a matter of con-
venience and personal habit. The advantage of a coarse scale is
perhaps that it avoids giving the pupil the impression that the
evaluation of a piece of work is more accurate than it actually is.
The advantage of a fine scale is that it probably encourages the
examiner in making as fine distinctions as possible. In practice
a fine scale can probably be used as readily and as quickly as a
coarse one if the teacher is accustomed to using a fine scale. A
person may use as fine a scale as he wishes provided he recognizes
the amount of the probable error in terms of the units of that par-
ticular scale. In terms of a 100 percentage scale the probable error
is about 6 or 7 points; in terms of the five-step scale it is about
one step which is 6 or 7 times as large as a point on the percentage
scale. The absolute amount of variation is substantially the same
on the two scales.

How Should Marks be Distributed to Groups of Pupils? If a
five-step scale is used, what percentages of pupils should in the
long run receive each of the five marks? The answer to this ques-
tion that I advocate is that the marks of large numbers of unse-
lected pupils should be distributed approximately in conformity
with the normal distribution or probability curve. Three lines of
evidence for this position may be presented, the last two of which
are fundamentally based upon the first:

First, mental and physical traits, when measured in large num-
bers of individuals, are distributed in a manner which yields a
distribution surface very nearly identical with that of the proba-
bility curve. Concrete evidence for this statement has been pre-
sented in Chapter III, Figures 7 to 10, to which the reader should
turn. It seems reasonable to infer that abilities in school subjects
are very probably distributed in the same manner as other mental
traits.

In the second place, when abilities in school subjects are meas-
ured by objective methods, they are found to be distributed in
very close conformity to the probability curve. Concrete evidence
for this is presented in Figures 16 to 27, Chapter III.

Thus, for example, the scores of 662 seventh grade pupils in the
author's geography test show the following distribution when the
total range of the base line is divided into five equal sections:

Scores         0-27   28-54   55-81   82-108   109-135
% of pupils      6      24      37      24        9

This is a remarkably close conformity to the theoretical dis-
tribution proposed on the following pages.
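
How close this conformity is can be shown with a brief comparison. The sketch below sets the geography percentages beside the theoretical percentages for five equal divisions of the base line (7, 24, 38, 24, 7, the figures the chapter gives for the five-step scale); the variable names are hypothetical.

```python
# Percentages of the 662 pupils in each fifth of the score range (from the text).
observed = [6, 24, 37, 24, 9]

# Theoretical percentages for five equal divisions of the base line.
theoretical = [7, 24, 38, 24, 7]

# Signed difference, in percentage points, for each of the five sections.
differences = [obs - theo for obs, theo in zip(observed, theoretical)]

# Largest departure from the theoretical figure in any one section.
largest_gap = max(abs(d) for d in differences)
```

No section departs from the theoretical figure by more than two percentage points.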

In the third place, the distribution of marks assigned by many
teachers to large numbers of students conforms fairly closely to
the normal distribution curve. When the marks of many teachers
are combined, the idiosyncrasies of individual teachers tend to be
counterbalanced. Tables 138 to 142, and Figures 91, 92, 93,
and 94, show the distributions of marks in various institutions and
the extent to which they differ from the theoretical probability
curve.

Distribution of grades in the College of Letters and Science, University of
Wisconsin, for the years 1907, 1910 to 1915. From the reports of Presi-
dent E. A. Birge.


                     Incomplete  Con. & Failed  Poor   Fair   Good   Excellent  Number of Grades
Elementary Courses       3.6          9.3       15.3   33.2   29.4      9.2          42,557
Advanced Courses         3.2          3.5        7.9   30.9   41.8     12.7          39,302


Distribution of grades at Cornell University for the years 1902, 1903 and 1911.
Adapted from Finkelstein ('13, p. 22), to give the distributions for a five-
point scale, 60 being the passing grade.

0-59   60-69   70-79   80-89   90-100   Number of Grades
 9.2    22.5    30.0    27.2     11.1        20,348


Distribution of all grades for two academic years at Harvard College. After
Foster ('11, p. 262)

                        E%   D%   C%   B%   A%   Total Grades
Elementary Courses       7   21   42   20    7       8,969
Intermediate Courses     4   13   37   28   12       2,456
Advanced Courses         2    2   13   38   36         476



Distribution of grades at the University of Missouri. After Foster, p. 289

                         Delayed     E       D       C       B       A    Number of Grades
Aug. 1908                  3.5     15.6     8.7     41     23.3     7.7
Feb. 1909                  ...      ...    13.7     47     20.7     4.6
June 1909                  3...     ...    13.8     48     21.0     4.6
Feb. 1910                  3...     ...    14.4     49     21.3     4.7
Averages                   3.7      9.5    12.7    46.8    21.6     5.4        24,979
First year after new
system went into effect             9.0    14.5     50     21.7     4.9        11,342


Average percentages for Cornell, Missouri, and the elementary courses for
Harvard and Wisconsin. These percentages do not total 100 because the
incomplete grades for Wisconsin and Missouri are not included.

  E      D      C      B      A     Number of Grades
 8.7   17.9   38.0   24.5    8.2         96,853
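
The averages in this table can be retraced from the four preceding tables. The sketch below assumes that they are simple, unweighted means of the four institutional rows, and that the Wisconsin headings (failed, poor, fair, good, excellent) correspond to E, D, C, B, A; both points are inferences from the figures and are not stated in the text. The variable names are hypothetical.

```python
# Percentages in the order E, D, C, B, A, followed by the number of grades.
rows = {
    "Cornell": ([9.2, 22.5, 30.0, 27.2, 11.1], 20_348),
    "Missouri (averages)": ([9.5, 12.7, 46.8, 21.6, 5.4], 24_979),
    "Harvard (elementary)": ([7, 21, 42, 20, 7], 8_969),
    "Wisconsin (elementary)": ([9.3, 15.3, 33.2, 29.4, 9.2], 42_557),
}

reported = [8.7, 17.9, 38.0, 24.5, 8.2]

# Unweighted mean of each grade column across the four rows.
columns = zip(*(percentages for percentages, _ in rows.values()))
means = [sum(column) / len(rows) for column in columns]

# Total number of grades behind the combined table.
total_grades = sum(count for _, count in rows.values())
```

On these assumptions the means agree with the printed averages to within rounding, and the four counts add up to the 96,853 grades reported.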

If we grant that marks in the long run should be assigned ac-
cording to the normal distribution curve, what percentage of
pupils should receive each of the five steps of the marking scale?
If the base line of the probability curve in Figure 15, Chapter III,
is divided into five equal divisions, then the area above the various
divisions would comprise the following percentages of the total
area:¹

A, Excellent, or 93-100 =  7%
B, Good,      or 85-92  = 24%
C, Fair,      or 77-84  = 38%
D, Poor,      or 70-76  = 24%
E, Failure,   or  0-69  =  7%
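
These percentages can be reproduced approximately from the normal curve. The sketch below rests on two assumptions that are not settled by the text: the base line is cut off at about 3.65 probable errors on each side of the median (the reading taken here of the footnote's figure), and the small areas beyond the cut-off are counted with the two end steps. The function name is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

# One probable error expressed in standard deviations (about 0.6745).
PE_IN_SIGMAS = STANDARD_NORMAL.inv_cdf(0.75)


def five_step_shares(cutoff_in_pe=3.65):
    """Percentage of the normal curve lying over each of five equal divisions.

    Assumption: the base line runs from -cutoff to +cutoff (in P.E. units),
    and the tails beyond the cut-off are added to the E and A steps.
    """
    half_width = cutoff_in_pe * PE_IN_SIGMAS
    step = 2 * half_width / 5
    # Four interior boundaries between the five divisions, lowest first.
    boundaries = [-half_width + step * k for k in range(1, 5)]
    cumulative = [0.0] + [STANDARD_NORMAL.cdf(b) for b in boundaries] + [1.0]
    shares = [100 * (high - low) for low, high in zip(cumulative, cumulative[1:])]
    return dict(zip("EDCBA", shares))


shares = five_step_shares()
rounded = {grade: round(share) for grade, share in shares.items()}
```

Rounded to whole numbers, the shares come to 7, 24, 38, 24 and 7 for E, D, C, B and A.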

Figures 91 to 94 indicate how closely distributions of the marks
at Wisconsin, Cornell, Harvard, and Missouri run parallel to the
theoretical curve. The only difference is a slight skewing to the
right. Not quite as many D's are assigned and very slightly more
E's and A's are assigned than the theoretical distribution would
demand. Thus the marks as actually assigned by hundreds of
teachers to thousands of students furnish impressive support for
the theory of the probability distribution of grades.

¹ The ends of the probability curve would reach the base line only at infinity. Hence
an arbitrary point of termination must be selected. This has been placed at a point
3.65 P. E. values from the median. This point has been selected because it yields 7%
for the E and A surfaces which is approximately the percentage of pupils receiving these
grades in many institutions.

Certain objections, however, both of a theoretical and a practi-
cal kind, must be considered. In the first place, the soundness of
the theory rests on the supposition that the pupils are unselected,
chance specimens of mankind as a whole. This supposition, of
course, never obtains absolutely for any group of human beings
brought together anywhere. The very reason that brings any
group together at the same time selects them. Pupils in school are
not random samplings of human beings of their respective ages,
the less so as one goes up the educational ladder. The tendency
is that every rung of the ladder selects on the whole slightly better
and better specimens. The fact, however, seems to be that the
selection which does take place is not of the sort that materially
modifies the form of the distribution curve but rather tends to
contract its base. The selection that does take place is not an
abrupt cutting off, but a gradual slicing off along a large share of
the distribution surface.

Fig. 91. Distribution of 42,557 grades, broken line, in elementary courses
in the College of Letters and Science of the University of Wisconsin. The con-
tinuous line is the theoretical distribution. After a report by President E. A.
Birge.

Fig. 92. Distribution of 20,348 grades at Cornell University. After Fin-
kelstein ('13).

Fig. 93. Distribution of 8,969 grades in elementary courses at Harvard Uni-
versity. After Foster ('11).

Fig. 94. Distribution of 24,979 grades at the University of Missouri. After
Meyer ('08).

The writer undertook to ascertain the actual elimination of
university students as it really takes place on the basis of the records
of 476 freshmen tabulated by Dearborn. It was found that the
following percentages of students dropped out of the University
in the various grades of scholarship at the end of the freshman and
sophomore years:



                                          Con. & Failed   Poor   Fair   Good   Excellent
Percentage of students of each grade
  dropped during freshman year                 100         52    ...    ...      ...
Percentage of those remaining in each
  grade, dropped during sophomore year                     45    ...    ...      ...




This table reads that all students whose average grade was
"conditioned" or "failed" dropped out during the freshman year;
52% of those whose grade was "poor" dropped out during the
freshman year and 45% of those remaining whose grade was
"poor" dropped out during the sophomore year, etc. It is obvious,
therefore, that there is elimination from all classes of scholarship
with the exception of the highest from which there is very little or
no loss. The general effect of the actual elimination upon the
distribution curve is to shift the left end of the curve toward the
right and to change the general form of it only slightly as indicated
in Figure 95.

Fig. 95. The continuous line shows the theoretical distribution of the marks
of students. The upper broken line represents the change in this curve due to
the dropping out of students during the freshman year. The lower broken line
represents the change in the curve due to the elimination during the sophomore
year. After Starch ('13).

The outcome of this evidence is that the distribution of the
grades for the freshman year of the college as well as of the high
school should conform quite closely to the theoretical distribution
curve and that slight shifts to the right should be made for the
successive four years. It may seem curious to recommend that
after the elimination of the successive years of the high school, the
distribution to be followed in the freshman year of the college
should be approximately normal again. The explanation is that
the standards of the college are somewhat higher than those of the
high school; so that, even if the high school should eliminate all of
the poorest 7% of its pupils, the next poorest 7%, who are able to
complete the high school, are likely to be unable to meet the de-
mands of the college.

The second objection urged by teachers against the adoption of
the theoretical distribution of grades here recommended is that it
would be unfair to lay down a rule that 7% of the pupils should
be failed. How do we know; possibly by good teaching all pupils
may reach a sufficiently high attainment to be passed in the course.
The answer to this statement is that the effects even of the best
teaching will so rarely raise the attainments of pupils sufficiently
high so that none of the pupils would fall below the passing grade;
and furthermore, in the interests of reasonably high standards of
scholarship the attainments of approximately 7% of large num-
bers of pupils will very probably not merit a passing grade. There
should be doubly good reasons for passing all students or for failing
considerably less than 7%. Many of the cases of "good teaching"
or "unusual classes" prove to be spurious when it is possible to
check them up by outside means.

A third point is not so much an objection as a question of practi-
cal use of the principle of the distribution of grades; namely, in
how large classes or groups of pupils should we expect fairly close
conformity, and how close conformity should be expected? The
answer to the question which I shall give, on the basis of experience
in attempting to observe the principle in the assignment of grades,
is that for groups of students of 100 or more quite close conformity
should be expected. By quite close conformity I mean a deviation
of not more than about 25% above or below the number of grades
that theoretically should fall on a given step of the five-division
scale. For example, the theoretical distribution demands that
7% of the pupils should receive the grade of A or Excellent. For
groups of 100 or more pupils, this percentage should ordinarily
not be lower than 5 nor higher than 9; the percentage of B's should
ordinarily not run lower than 18 nor higher than 30; the percentage
of C's should not run higher than 48, nor lower than 28, etc. The
larger the number of pupils concerned, the closer the conformity
should be. For groups smaller than 100 a wider latitude should be
permissible whenever there is genuine reason for wider deviation. I
advocate conformity to the theoretical distribution within the limits
of common sense with as much deviation as may seem permissible
for good cause. However, really genuine reasons for large devia-
tions, even with classes as small as 25 pupils, unless obviously
selected by special cause, are much rarer than teachers ordinarily
believe.
In support of this contention, the author ('15) reported an
experiment in which twenty-four compositions written by sixth
and seventh grade pupils were graded by 23 teachers according to
the usual percentage method with 70 as a passing grade. After
the papers had thus been graded, the teachers were requested to
grade them according to a five point scale and give the grade of E
to two papers, D to from four to six papers, C to from eight to ten
papers, B to from four to six papers, and A to two papers. Even
if the teachers felt, for example, that there were no papers good
enough to receive the grade of A, they were to select the two best
ones and call them A. The outcome was that those teachers who
in their original grades differed most from the combined judgment
of all the teachers were forced to comply more closely to the actual
average marks as given in the first grading. One teacher marked
the highest paper 85 in the original grading, and objected to giving
it a grade of A in the forced distribution on the ground that no
paper in the lot was good enough to receive so high a grade, and
yet the average of the marks given by all the teachers to this
paper was 92.9, the best paper in the entire group.

The theory of the probability distribution of marks should be
observed with sense and reason and not in a purely mechanical
manner. A blind, unintelligent observance of the principle is
bound to lead to injustice, particularly with small classes. In one
such case which came to the author's attention it led to the giving
of a mediocre grade to a pupil of very high ability.

A fourth point frequently raised by teachers to justify unusually
high or low marks is that the particular class in question is an
unusually good one or poor one. Such a claim ought to be allowed
only if it can be justified by good evidence. There are, of course,
differences in classes, but these are almost never as great as we are
inclined to believe. Large differences between successive classes
in the same subject are for the most part illusory for the reason
that the judgment of an individual teacher is more likely to deviate
from a correct estimate than the average ability of a group devi-
ates from the average of other groups. The teacher who says to
each succeeding class that this is the best class he has ever had in
this subject would possess, if this judgment were correct, a magic
power for elevating the intellectual level of human beings.

The feeling on the part of teachers that a given class is an un-
usually good or poor class is quite often due to one or two unusu-
ally good or poor individuals whose impression upon the mind of
the teacher is outstanding, rather than to a higher or lower level
of the class as a whole.

As concrete evidence of the extent to which a teacher may err
in such opinions and of the manner in which the opinion of the
teacher may be checked up, the curves in Figure 96 are presented.
The continuous curve shows the distribution of the grades of a
teacher in Latin and German. When her attention was called to
the predominance of high marks, she claimed that her pupils were
exceptionally good. The broken curve shows the extent to which
her claim was unfounded since it shows that, according to their
abilities in other subjects, they were an average group.

Fig. 96. The continuous line shows the distribution of the marks of a teacher
of Latin and German in a high school. The broken line shows the distribution
of the marks of the same pupils in their other subjects. After an unpublished
report of Supt. J. F. Waddell, Evansville, Wisconsin.

How May Variation in the Assignment of Grades be Reduced?

1. By a common sense compliance in the distribution of marks
with the normal distribution or probability curve.

(a) To this end the administrator of a school should tabulate
at stated intervals the marks assigned by each teacher and exhibit
the tabulation to the teachers. This in itself will usually lead
without request or compulsion to a very considerable correction
of aberrations on the part of those teachers who deviate most
widely. At the University of Missouri the adoption of a plan of
distribution in conformity with the normal curve reduced the
irregularity in grading in the ratio of five to two.

(b) The teacher himself will find it useful to tabulate at frequent
intervals the distribution of his grades. In making out the marks
of a set of papers and particularly in making out the final grades
for a course, the author has followed for several years the practice
of plotting a distribution of the grades as tentatively made out.
If the assignment is decidedly abnormal in having considerably
too many or too few of the different grades, a shift is made of
borderline cases, unless there is an obvious reason to the contrary,
to obtain a reasonably normal distribution. Every teacher feels
that there is a considerable number of cases concerning which he
is in doubt as to whether they should have the one or the other
grade. For example, if the tentative list of grades contains too
many or too few A's, the lowest A's may be shifted to B's or the
highest B's may be shifted to A's.
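
The tabulation described in (b) might be sketched as follows. The class list is invented for illustration, the 25% tolerance is borrowed from the earlier discussion of close conformity, and the function name is hypothetical; the sketch only flags which steps look over- or under-supplied and leaves the shifting of borderline cases to the teacher's judgment.

```python
from collections import Counter

# Theoretical percentages for the five-step scale.
THEORETICAL = {"A": 7, "B": 24, "C": 38, "D": 24, "E": 7}


def review_distribution(grades, tolerance=0.25):
    """Flag each step as 'too many', 'too few' or 'ok' against the theory."""
    if not grades:
        raise ValueError("no grades to tabulate")
    counts = Counter(grades)
    total = len(grades)
    report = {}
    for grade, expected in THEORETICAL.items():
        actual = 100 * counts[grade] / total
        if actual > expected * (1 + tolerance):
            report[grade] = "too many"
        elif actual < expected * (1 - tolerance):
            report[grade] = "too few"
        else:
            report[grade] = "ok"
    return report


# Hypothetical tentative grades for a class of 100 pupils.
tentative = ["A"] * 14 + ["B"] * 26 + ["C"] * 38 + ["D"] * 16 + ["E"] * 6
report = review_distribution(tentative)
```

In this invented case the tabulation shows too many A's and too few D's, so the lowest A's and the doubtful cases between C and D would be the ones to reconsider.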

2. Variability and uncertainty in grades may be reduced by
adopting, particularly in departments containing several teachers,
a plan of giving certain weights or penalties for certain types of
errors or defects. This should be done by departmental conference
so as to secure a consensus of judgments on the various types of
errors and amounts of penalties. Much could be done in this
direction toward greater uniformity in methods of grading. If
organizations of teachers would take this matter up, much of the
chaos which is now present in methods of grading could be re-
duced to order.

At the present time A's or B's obtained from different teachers
often mean quite different things. By observing the points here
suggested they would mean more nearly the same thing. Evalua-
tion of achievement in terms of judgment depends obviously upon
the judge. Marks as such will at best depend upon the examiner.
They will probably always have to be used. More impersonal and
objective methods for determining achievement in school work
are being developed at the present time. To what extent these
educational measuring devices will be able to replace the usual
examinations and grades will depend upon their future development.



Abbott, E. E.: 1909. On the Analysis of the Memory Consciousness in
    Orthography. Psychological Review Monograph, 11: No. 1, 127-
Arai, T.: 1912. Mental Fatigue. Columbia University Contributions
    to Education, No. 54.
Ash, I. E.: 1914. Fatigue and Its Effects upon Control. Archives of
    Psychology, No. 31.
Ayres, L. P.: 1912. A Scale for Measuring the Quality of Handwriting
    of School Children. Russell Sage Foundation, New York City.
Ayres, L. P.: 1913. The Spelling Vocabularies of Personal and Business
    Letters. Russell Sage Foundation, New York City.
Ayres, L. P.: 1915. A Measuring Scale for Ability in Spelling. Russell
    Sage Foundation, New York City.
Bagley, W. C.: 1911. Educational Values.
Bagley, W. C.: 1915. The Determination of Minimum Essentials in
    Elementary Geography and History. Fourteenth Yearbook of the
    National Society for the Study of Education, 131-146.
Bagley, W. C., and Rugg, H. O.: 1916. The Content of American
    History as taught in the Seventh and Eighth Grades: An Analysis of
    Typical School Text-books. University of Illinois Bulletin.
