
Assessment
Excerpted from Chapter 12 of Biehler/Snowman, PSYCHOLOGY
APPLIED TO TEACHING, 8/e, Houghton Mifflin Co., 1997.
The Role of Assessment in Teaching (pp. 439-442)
What is Assessment?
Why Should We Assess Students' Learning?
Ways to Measure Student Learning (pp. 442-446)
Written Tests
Performance Tests
Ways to Evaluate Student Learning (pp. 450-453)
Norm-Referenced Grading
Criterion-Referenced Grading
Suggestions for Teaching in Your Classroom: Effective Assessment Techniques (p. 464)
Resources for Further Investigation (p. 473)
The Role of Assessment in Teaching
Assessing student learning is something that every
teacher has to do, usually quite frequently. Written tests,
book reports, research papers, homework exercises, oral
presentations, question-and-answer sessions, science
projects, and artwork of various sorts are just some of the
ways in which teachers measure student learning, with
written tests accounting for about 45 percent of a typical
student's course grade (Green & Stager, 1986/1987). It
is no surprise, then, that the typical teacher can spend
between one-third and one-half of her class time engaged in
one or another type of measurement activity (Stiggins,
1994). Yet despite the amount of time teachers spend
assessing student learning, it is a task that most of them
dislike and that few do well. One reason is that many
teachers have little or no in-depth knowledge of assessment
principles (Crooks, 1988; Hills, 1991; Stiggins, Griswold,
& Wikelund, 1989). Another reason is that the role of
assessor is seen as being inconsistent with the role of
teacher (or helper). Since teachers with more training in
assessment use more appropriate assessment practices than do
teachers with less training (Green & Stager, 1986/1987),
a basic goal of this chapter is to help you understand how
such knowledge can be used to reinforce, rather than work
against, your role as teacher. Toward that end, we will
begin by defining what we mean by the term assessment
and by two key elements of this process, measurement and
evaluation.
What is Assessment?
Broadly conceived, classroom assessment involves two
major types of activities: collecting information about how
much knowledge and skill students have learned (measurement)
and making judgments about the adequacy or acceptability of
each student's level of learning (evaluation). Both the
measurement and evaluation aspects of classroom assessment
can be accomplished in a number of ways. To determine how
much learning has occurred, teachers can, for example, have
students take exams, respond to oral questions, do homework
exercises, write papers, solve problems, and make oral
presentations. Teachers can then evaluate the scores from
those activities by comparing them either to one another or
to an absolute standard (such as 90 percent correct for an
A). Throughout much of this chapter we will explain
and illustrate the various ways in which you can measure and
evaluate student learning.
Measurement
Measurement is the assignment of numbers to certain
attributes of objects, events, or people according to a
rule-governed system. For our purposes, we will limit the
discussion to attributes of people. For example, we can
measure someone's level of typing proficiency by counting
the number of words the person accurately types per minute
or someone's level of mathematical reasoning by counting the
number of problems correctly solved. In a classroom or other
group situation, the rules that are used to assign the
numbers will ordinarily create a ranking that reflects how
much of the attribute different people possess (Linn &
Gronlund, 1995).
Evaluation
Evaluation involves using a rule-governed system to make
judgments about the value or worth of a set of measures
(Linn & Gronlund, 1995). What does it mean, for example,
to say that a student answered eighty out of one hundred
earth science questions correctly? Depending on the rules
that are used, it could mean that the student has learned
that body of knowledge exceedingly well and is ready to
progress to the next unit of instruction or, conversely,
that the student has significant knowledge gaps and requires
additional instruction.
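The dependence of evaluation on the rule being applied can be sketched in a few lines of Python. The function name evaluate and the two cutoffs (75 and 90 percent) are illustrative assumptions, not values from the text; the sketch shows only that the same measurement, eighty of one hundred correct, can lead to opposite judgments under different rules.

```python
def evaluate(num_correct, num_items, mastery_cutoff):
    """Judge a raw score against a mastery cutoff given as a percentage.

    The cutoff is a hypothetical rule chosen by the teacher.
    """
    percent = 100 * num_correct / num_items
    if percent >= mastery_cutoff:
        return "ready for the next unit"
    return "needs additional instruction"


# Measurement: the student answered 80 of 100 questions correctly.
# Evaluation: the judgment depends on which rule is applied.
lenient = evaluate(80, 100, mastery_cutoff=75)  # assumed cutoff
strict = evaluate(80, 100, mastery_cutoff=90)   # assumed cutoff
print("75 percent rule:", lenient)
print("90 percent rule:", strict)
```

The measurement step produces the number; the evaluation step is the rule that turns the number into a judgment.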
Why Should We Assess Students' Learning?
This question has several answers. We will use this
section to address four of the most common reasons for
assessment: to provide summaries of learning, to provide
information on learning progress, to diagnose specific
strengths and weaknesses in an individual's learning, and to
motivate further learning.
Summative Evaluation
The first, and probably most obvious, reason for
assessment is to provide to all interested parties a clear,
meaningful, and useful summary or accounting of how well a
student has met the teacher's objectives. When testing is
done for the purpose of assigning a letter or numerical
grade, it is often called summative evaluation since
its primary purpose is to sum up how well a student has
performed over time and at a variety of tasks.
Formative Evaluation
A second reason for assessing students is to monitor
their progress. The main thing that teachers want to know
from time to time is whether students are keeping up with
the pace of instruction and are understanding all of the
material that has been covered so far. For students whose
pace of learning is either slower or faster than average or
whose understanding of certain ideas is faulty, you can
introduce supplementary instruction (a workbook or a
computer-based tutorial program), remedial instruction
(which may also be computer based), or in-class ability
grouping (recall that we discussed the benefits of this
arrangement in Chapter 6). Because the purpose of such
assessment is to facilitate or form learning and not to
assign a grade, it is usually called formative
evaluation.
Diagnosis
A third reason follows from the second. If you discover a
student who is having difficulty keeping up with the rest of
the class, you will probably want to know why in order to
determine the most appropriate course of action. This
purpose may lead you to construct an assessment (or to look
for one that has already been made up) that will provide you
with specific diagnostic information.
Effects on Learning
A fourth reason for assessment of student performance is
that it has potentially positive effects on various aspects
of learning and instruction. As Terence Crooks points out,
classroom assessment guides students' "judgment of what is
important to learn, affects their motivation and
self-perceptions of competence, structures their approaches
to and timing of personal study (e.g., spaced practice),
consolidates learning, and affects the development of
enduring learning strategies and skills. It appears to be
one of the most potent forces influencing education" (1988,
p. 467).
Ways to Measure Student Learning
Just as measurement can play several roles in the
classroom, teachers have several ways to measure what
students have learned. Which type of measure you choose will
depend, of course, on the objectives you have stated. For
the purposes of this discussion, objectives can be
classified in terms of two broad categories: knowing about
something (for example, that knots are used to secure
objects, that dance is a form of social expression, that
microscopes are used to study things too small to be seen by
the naked eye) and knowing how to do something (for example,
tie a square knot, dance the waltz, operate a microscope).
Measures that attempt to assess the range and accuracy of
someone's knowledge are usually called written tests. And
measures that attempt to assess how well somebody can do
something are often referred to as performance tests. Again,
keep in mind that both types have a legitimate place in a
teacher's assessment arsenal. Which type is used, and to
what extent, will depend on the purpose or purposes you have
for assessing students. In the next two sections, we will
briefly examine the nature of both types.
Written Tests
Teachers spend a substantial part of each day assessing
student learning, and much of this assessment activity
involves giving and scoring some type of written test. Most
written tests are composed of one or more of the following
item types: selected response (multiple choice, true-false,
and matching, for example), short answer, and essay. They
are designed to measure how much people know about a
particular subject. In all likelihood, you have taken
hundreds of these types of tests in your school career thus
far. In the next couple of pages, we will briefly describe
the main features, advantages, and disadvantages of each
type of test.
Selected-Response Tests
Characteristics
Selected-response tests are so named because the student
reads a relatively brief opening statement (called a
stem) and selects one of the provided alternatives as
the correct answer. Selected-response tests are typically
made up of multiple-choice, true-false, or matching items.
Quite often all three item types are used in a single test.
Selected-response tests are sometimes called "objective"
tests because they have a simple and set scoring system. If
alternative (b) of a multiple-choice item is keyed as the
correct response and the student chose alternative (d), the
student is marked wrong, regardless of how much the teacher
wanted the student to be right. But that doesn't mean
selected-response items are totally free of subjective
influences. After all, whoever created the test had to make
subjective judgments about which areas to emphasize, how to
word items, and which items to include in the final version.
Finally, selected-response tests are typically used when the
primary goal is to assess what might be called foundational
knowledge. This is the basic factual information and
cognitive skills that students need in order to do such
high-level tasks as solve problems and create products
(Stiggins, 1994).
Advantages
A major advantage of selected-response tests is
efficiency -- a teacher can ask many questions in a short
period of time. Another advantage is ease and reliability of
scoring. With the aid of a scoring template (such as a
multiple-choice answer sheet that has holes punched out
where the correct answer is located), many tests can be
quickly and uniformly scored.
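The "simple and set scoring system" of a selected-response test amounts to comparing each response with a keyed answer. The following is a minimal sketch of that idea, assuming a hypothetical five-item multiple-choice test; the function name, the key, and the responses are invented for illustration.

```python
def score_selected_response(answer_key, responses):
    """Count the responses that match the keyed answer, item by item."""
    if len(answer_key) != len(responses):
        raise ValueError("key and responses must cover the same items")
    return sum(1 for keyed, chosen in zip(answer_key, responses)
               if keyed == chosen)


# Hypothetical answer key and one student's responses.
answer_key = ["b", "a", "d", "c", "b"]
student_responses = ["b", "a", "c", "c", "d"]

raw_score = score_selected_response(answer_key, student_responses)
print(raw_score, "of", len(answer_key), "correct")
```

Because the comparison is mechanical, any two scorers using the same key arrive at the same raw score, which is the sense in which such tests are called "objective."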
Disadvantages
Because items that reflect the lowest level of Bloom's
Taxonomy (verbatim knowledge) are the easiest to write, most
teacher-made tests are composed almost entirely of
knowledge-level items (a point we made initially in Chapter
7). As a result, students focus on verbatim memorization
rather than on meaningful learning. Another disadvantage is
that, while we get some indication of what students know,
such tests tell us nothing about what students can do with
that knowledge.
Short-Answer Tests
Characteristics
Instead of selecting from one or more alternatives, the
student is asked to supply a brief answer consisting of a
name, word, phrase, or symbol. Like selected-response tests,
short-answer tests can be scored quickly, accurately, and
consistently, thereby giving them an aura of objectivity.
They are primarily used for measuring foundational
knowledge.
Advantages
Short-answer items are relatively easy to write, so a
test, or part of one, can be constructed fairly quickly.
They allow for either broad or in-depth assessment of
foundational knowledge since students can respond to many
items within a short space of time. Since students have to
supply an answer, they have to recall, rather than
recognize, information.
Disadvantages
This item type has the same basic disadvantages as the
selected-response items. Because these items ask only for
short verbatim answers, students are likely to limit their
processing to that level, and these items provide no
information about how well students can use what they have
learned. In addition, unexpected but plausible answers may
be difficult to score.
Essay Tests
Characteristics
The student is given a somewhat general directive to
discuss one or more related ideas according to certain
criteria. One example of an essay question is "Compare
operant conditioning theory and information-processing
theory in terms of basic assumptions, typical research
findings, and classroom applications."
Advantages
Essay tests reveal how well students can recall,
organize, and clearly communicate previously learned
information. When well written, essay tests call on such
higher-level abilities as analysis, synthesis, and
evaluation. Because of these demands, students are more
likely to try to meaningfully learn the material over which
they are tested.
Disadvantages
Consistency of grading is likely to be a problem. Two
students may have essentially similar responses, yet receive
different letter or numerical grades. These test items are
also very time consuming to grade. And because it takes time
for students to formulate and write responses, only a few
questions at most can be given.
Performance Tests
In recent years many teachers and measurement experts
have argued that the typical written test should be used far
less often because it reveals little or nothing of the depth
of students' knowledge and how students use their knowledge
to work through questions, problems, and tasks. The solution
that these experts have proposed is to use one or more of
what are called performance tests.
Performance tests attempt to assess how well students use
foundational knowledge to perform complex tasks under more
or less realistic conditions. At the low end of the realism
spectrum, students may be asked to construct a map,
interpret a graph, or write an essay under highly
standardized conditions. That is, everyone completes the
same task in the same amount of time and under the same
conditions. At the high end of the spectrum, students may be
asked to conduct a science experiment, produce a painting,
or write an essay under conditions that are similar to those
of real life. For example, students may be told to produce a
compare-and-contrast essay on a particular topic by a
certain date, but the resources students choose to use, the
number of revisions they make, and when they work on the
essay are left unspecified. As we noted in Chapter 5, when
performance testing is conducted under such realistic
conditions, it is also called authentic assessment
(Meyer, 1992). Another term that is often used to encompass
both performance testing and authentic assessment, and to
distinguish them from traditional written tests, is
alternative assessment. In this section we will first
define the four different types of performance tests and
then look at their most important characteristics.
Types of Performance Tests
Currently, there are four ways in which the performance
capabilities of students are typically assessed: direct
writing assessments, portfolios, exhibitions, and
demonstrations.
Direct Writing Assessments
These tests ask students to write about a specific topic
("Describe the person whom you admire the most, and explain
why you admire that person.") under a standard set of
conditions. Each essay is then scored by two or more people
according to a set of defined criteria.
Portfolios
A portfolio may contain one or more pieces of a student's
work, some of which demonstrate different stages of
completion. For example, a student's writing portfolio may
contain business letters; pieces of fiction; poetry; and an
outline, rough draft, and final draft of a research paper.
Through the inclusion of various stages of a research paper,
both the process and the end product can be assessed.
Portfolios can also be constructed for math and science as
well as for projects that combine two or more subject areas.
Often the student is involved in the selection of what is
included in his portfolio. The portfolio is sometimes used
as a showcase to illustrate exemplary pieces, but it also
works well as a collection of pieces that represent a
student's typical performances. In its best and truest
sense, the portfolio functions not just as a housing for
these performances but also as a means of self-expression,
self-reflection, and self-analysis for an individual student
(Templeton, 1995).
Exhibitions
Exhibitions involve just what the label suggests -- a
showing of such products as paintings, drawings,
photographs, sculptures, videotapes, and models. As with
direct writing assessments and portfolios, the products a
student chooses to exhibit are evaluated according to a
predetermined set of criteria.
Demonstrations
In this type of performance testing, students are
required to show how well they can use previously learned
knowledge or skills to solve a somewhat unique problem (such
as conducting a scientific inquiry to answer a question or
diagnosing the cause of a malfunctioning engine and
describing the best procedure for fixing it) or perform a
task (such as reciting a poem, performing a dance, or
playing a piece of music).
Ways to Evaluate Student Learning
Once you have collected all the measures you intend to
collect -- for example, test scores, quiz scores, homework
assignments, special projects, and laboratory experiments --
you will have to give the numbers some sort of value (the
essence of evaluation). As you probably know, this is most
often done by using an A to F grading scale. Typically, a
grade of A indicates superior performance; a B,
above-average performance; a C, average performance; a D,
below-average performance; and an F, failure. There are two
general ways to approach this task. One approach involves
comparisons among students. Such forms of evaluation are
called norm-referenced since students are identified
as average (or normal), above average, or below average. An
alternative approach is called criterion-referenced
because performance is interpreted in terms of defined
criteria. Although both approaches can be used, we favor
criterion-referenced grading for reasons we will mention
shortly.
Norm-Referenced Grading
A norm-referenced grading system assumes that classroom
achievement will naturally vary among a group of
heterogeneous students because of differences in such
characteristics as prior knowledge, learning skills,
motivation, and aptitude. Under ideal circumstances
(hundreds of scores from a diverse group of students), this
variation produces a bell-shaped, or "normal," distribution
of scores that ranges from low to high, has few tied scores,
and has only a very few low scores and only a very few high
scores. For this reason, norm-referenced grading procedures
are also referred to as "grading on the curve."
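One way to picture grading on the curve is to express each score as a distance from the class average, in standard deviation units, and to assign letters by band. The sketch below is only an illustration: the function name curve_grade and the band boundaries (0.5 and 1.5 standard deviations) are assumptions, not a procedure prescribed in this chapter.

```python
from statistics import mean, pstdev


def curve_grade(score, class_scores):
    """Assign a letter grade from a score's position relative to the class.

    The band boundaries are assumed values chosen for illustration.
    """
    average = mean(class_scores)
    spread = pstdev(class_scores)
    if spread == 0:
        return "C"  # every score is identical, so every score is average
    z = (score - average) / spread
    if z >= 1.5:
        return "A"
    if z >= 0.5:
        return "B"
    if z > -0.5:
        return "C"
    if z > -1.5:
        return "D"
    return "F"


# The same score of 80 earns different grades in two hypothetical classes.
weaker_class = [50, 60, 70, 80, 90]
stronger_class = [80, 85, 90, 95, 100]
print(curve_grade(80, weaker_class))
print(curve_grade(80, stronger_class))
```

The example makes the defining feature of the approach visible: a student's grade depends on how classmates performed, not only on the student's own score.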
Criterion-Referenced Grading
A criterion-referenced grading system permits students to
benefit from mistakes and to improve their level of
understanding and performance. Furthermore, it establishes
an individual (and sometimes cooperative) reward structure,
which fosters motivation to learn to a greater extent than
other systems.
Under a criterion-referenced system, grades are
determined through comparison of the extent to which each
student has attained a defined standard (or criterion) of
achievement or performance. Whether the rest of the students
in the class are successful or unsuccessful in meeting that
criterion is irrelevant. Thus, any distribution of grades is
possible. Every student may get an A or an F, or no student
may receive these grades. For reasons we will discuss
shortly, very low or failing grades tend to occur less
frequently under a criterion-referenced system.
A common version of criterion-referenced grading assigns
letter grades on the basis of the percentage of test items
answered correctly. For example, you may decide to award an
A to anyone who correctly answers at least 85 percent of a
set of test questions, a B to anyone who correctly answers
75 to 84 percent, and so on down to the lowest grade. To use
this type of grading system fairly, which means specifying
realistic criterion levels, you would need to have some
prior knowledge of the levels at which students typically
perform. You would thus be using normative information to
establish absolute or fixed standards of performance.
However, although norm-referenced and criterion-referenced
grading systems both spring from a normative database (that
is, from comparisons among students), only the former system
uses those comparisons to directly determine grades.
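The percentage version of criterion-referenced grading can be sketched as a lookup against fixed cutoffs. The A and B cutoffs below come from the example above; the C and D cutoffs, the function name, and the class scores are assumptions added only to complete the illustration.

```python
# Cutoffs for A (85) and B (75) follow the example in the text.
# The C (65) and D (55) cutoffs are assumed to complete the scale.
CUTOFFS = [(85, "A"), (75, "B"), (65, "C"), (55, "D")]


def criterion_grade(percent_correct):
    """Return the letter grade for a percentage, ignoring classmates' scores."""
    for minimum, letter in CUTOFFS:
        if percent_correct >= minimum:
            return letter
    return "F"


# Hypothetical class in which every student meets the criterion for an A.
class_percents = [92, 88, 86, 85]
grades = [criterion_grade(p) for p in class_percents]
print(grades)
```

Because each score is compared only with the standard, any distribution of grades is possible, including one in which every student earns an A.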
Criterion-referenced grading systems (and
criterion-referenced tests) have become increasingly popular
in recent years primarily because of three factors. First,
educators and parents complained that norm-referenced tests
and grading systems provided too little specific information
about student strengths and weaknesses. Second, educators
have come to believe that clearly stated, specific
objectives constitute performance standards, or criteria,
that are best assessed with criterion-referenced measures.
Third, and perhaps most important, contemporary theories of
school learning claim that most, if not all, students can
master most school objectives under the right circumstances.
If this assertion is even close to being true, then
norm-referenced testing and grading procedures, which depend
on variability in performance, will lose much of their
appeal.
Suggestions for Teaching in Your Classroom: Effective
Assessment Techniques
1. As early as possible in a report period,
decide when and how often to give tests and other
assignments that will count toward a grade, and announce
tests and assignments well in advance.
2. Prepare a content outline and/or a table of
specifications of the objectives to be covered on each exam,
or otherwise take care to obtain a systematic sample of the
knowledge and skill acquired by your students.
3. Consider the purpose of each test or measurement
exercise in light of the developmental characteristics of
the students in your classes and the nature of the
curriculum for your grade level.
4. Decide whether a written test or a performance test is
most appropriate.
5. Make up and use a detailed answer key.
a. Evaluate each answer by comparing it to the key.
b. Be willing and prepared to defend the evaluations you
make.
6. During and after the grading process, analyze
questions and answers in order to improve future exams.
Resources for Further Investigation
Suggestions for Constructing Written and Performance Tests
For specific suggestions on ways to write different types
of items for paper-and-pencil tests of knowledge and on
methods for constructing and using rating scales and
checklists to measure products, performances, and
procedures, consult one or more of the following books:
Measurement and Evaluation in Teaching (7th ed.,
1995), by Robert Linn and Norman Gronlund; How to Make
Achievement Tests and Assessments (5th ed., 1993), by
Norman Gronlund; Classroom Assessment: What Teachers Need
to Know (1995), by W. James Popham; Student-Centered
Classroom Assessment (1994), by Richard Stiggins;
Classroom Assessment (2d ed., 1994), by Peter
Airasian; and Practical Aspects of Authentic
Assessment (1994), by Bonnie Campbell Hill and Cynthia
Ruptic.
The Learning Resources Development Center (LRDC) at the
University of Pittsburgh publishes a large number of briefs,
articles, and reviews related to assessment and learning,
particularly emphasizing cognitive-based approaches. An
online resource of the LRDC can be found at
http://www.lrdc.pitt.edu/publications.html. The most
extensive online database of assessment information is the
ERIC/AE Test Locator, which is found at
www.cua.edu/www/eric_ae/testcol.html. It includes numerous
topics, reviews of tests, suggestions and digests relating
to alternative assessment, and broader standards and
policy-making information as it relates to evaluation and
assessment of students.
Writing Higher-Level Questions
As Benjamin Bloom and others point out, teachers have a
disappointing tendency to write test items that reflect the
lowest level of the taxonomy -- knowledge. To avoid this
failing, carefully read Part 2 of Taxonomy of Educational
Objectives: The Classification of Educational Goals,
Handbook I: Cognitive Domain (1956), edited by Benjamin
Bloom, Max Engelhart, Edward Furst, Walker Hill, and David
Krathwohl. Each level of the taxonomy is clearly explained
and followed by several pages of illustrative test items.
Analyzing Test Items
Norman Gronlund briefly discusses item-analysis
procedures for norm-referenced and criterion-referenced
tests in Chapter 6 of How to Make Achievement Tests and
Assessments (5th ed., 1993). For norm-referenced
multiple-choice tests, these include procedures for
assessing the difficulty of each item, the discriminating
power of each item, and the effectiveness of each
alternative answer. For criterion-referenced tests, they
include a measure for assessing the effects of instruction.
More detailed discussions of item-analysis procedures can be
found in Chapter 8 of Educational Testing and
Measurement: Classroom Application and Practice (4th
ed., 1993), by Tom Kubiszyn and Gary Borich.
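Two of the item-analysis ideas mentioned above, item difficulty and discriminating power, can be illustrated with simple proportions. The sketch below uses one common formulation (proportion correct, and the difference in proportion correct between the highest and lowest scorers); it is not necessarily the exact procedure given in the books cited, and the data are invented.

```python
def item_difficulty(item_results):
    """Proportion of students who answered the item correctly (0.0 to 1.0)."""
    return sum(item_results) / len(item_results)


def item_discrimination(upper_group, lower_group):
    """Proportion correct among high scorers minus that among low scorers."""
    return item_difficulty(upper_group) - item_difficulty(lower_group)


# Hypothetical results on one item: 1 = correct, 0 = incorrect.
upper = [1, 1, 1, 1, 0]  # the five highest scorers on the whole test
lower = [1, 0, 0, 1, 0]  # the five lowest scorers on the whole test

difficulty = item_difficulty(upper + lower)
discrimination = item_discrimination(upper, lower)
print("difficulty:", round(difficulty, 2))
print("discrimination:", round(discrimination, 2))
```

Under this formulation, a positive discrimination value means that students who did well on the test as a whole were more likely to answer the item correctly than students who did poorly.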
Also, Question Mark Software, based in Britain, produces
a software program that can help teachers generate quality
test items. Information on the software can be found at
http://www.qmark.com or by calling the U.S. distributor at
800-863-3950.
For more information on assessment --
especially on how to construct items -- see Orlich et al.,
TEACHING STRATEGIES, 5/e, Houghton Mifflin Co., 1998,
Chapter 8, "Small Group Discussions and Cooperative
Learning."
For more information on assessment in
the Grabes' INTEGRATING TECHNOLOGY FOR MEANINGFUL LEARNING,
2/e, Houghton Mifflin Co., 1998, see the "Spotlight on
Assessment" sections on pages 7, 52, 171, 316, and
357.
For more information on assessment in
Gage/Berliner, EDUCATIONAL PSYCHOLOGY, 6/e, 1998, see
Chapter 13, "Basic Concepts in Assessment and the
Interpretation of Standardized Testing," and Chapter 14,
"The Teacher's Assessment of Student Learning."