Excerpted from Snowman/Biehler, PSYCHOLOGY APPLIED TO TEACHING, © Houghton Mifflin Company. All rights reserved.
The Role of Assessment in Teaching
What is Assessment?
Why Should We Assess Students' Learning?
Ways to Measure Student Learning
Ways to Evaluate Student Learning
Suggestions for Teaching in Your Classroom:
Effective Assessment Techniques
Resources for Further Investigation
Applying Assessment to Project-Based Instruction
The Role of Assessment in Teaching
Assessing student learning is something that every teacher has to do, usually quite frequently. Written tests, book reports, research papers, homework exercises, oral presentations, question-and-answer sessions, science projects, and artwork of various sorts are just some of the ways in which teachers measure student learning, with written tests accounting for about 45 percent of a typical student's course grade (Green & Stager, 1986/1987). It is no surprise, then, that the typical teacher can spend between one-third and one-half of her class time engaged in one or another type of measurement activity (Stiggins, 1994). Yet despite the amount of time teachers spend assessing student learning, it is a task that most of them dislike and that few do well. One reason is that many teachers have little or no in-depth knowledge of assessment principles (Crooks, 1988; Hills, 1991; Stiggins, Griswold, & Wikelund, 1989). Another reason is that the role of assessor is seen as being inconsistent with the role of teacher (or helper). Since teachers with more training in assessment use more appropriate assessment practices than do teachers with less training (Green & Stager, 1986/1987), a basic goal of this chapter is to help you understand how such knowledge can be used to reinforce, rather than work against, your role as teacher. Toward that end, we will begin by defining what we mean by the term assessment and by two key elements of this process, measurement and evaluation.
What is Assessment?
Broadly conceived, classroom assessment involves two major types of activities: collecting information about how much knowledge and skill students have learned (measurement) and making judgments about the adequacy or acceptability of each student's level of learning (evaluation). Both the measurement and evaluation aspects of classroom assessment can be accomplished in a number of ways. To determine how much learning has occurred, teachers can, for example, have students take exams, respond to oral questions, do homework exercises, write papers, solve problems, and make oral presentations. Teachers can then evaluate the scores from those activities by comparing them either to one another or to an absolute standard (such as an A equals 90 percent correct). Throughout much of this chapter we will explain and illustrate the various ways in which you can measure and evaluate student learning.
Measurement is the assignment of numbers to certain attributes of objects, events, or people according to a rule-governed system. For our purposes, we will limit the discussion to attributes of people. For example, we can measure someone's level of typing proficiency by counting the number of words the person accurately types per minute or someone's level of mathematical reasoning by counting the number of problems correctly solved. In a classroom or other group situation, the rules that are used to assign the numbers will ordinarily create a ranking that reflects how much of the attribute different people possess (Linn & Gronlund, 1995).
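To make the idea of a rule-governed measurement concrete, here is a minimal sketch in Python of the typing example above: count the accurately typed words and divide by elapsed minutes. The matching rule and the sample data are our own illustrative assumptions, not part of the text.

    def words_per_minute(typed_words, reference_words, minutes):
        # Measurement rule (illustrative assumption): a typed word counts
        # as accurate only if it matches the reference text at the same
        # position.
        correct = sum(1 for typed, ref in zip(typed_words, reference_words)
                      if typed == ref)
        return correct / minutes

    # Hypothetical data: 3 of 4 words typed accurately in half a minute.
    typed = "the quick brwon fox".split()
    reference = "the quick brown fox".split()
    print(words_per_minute(typed, reference, 0.5))  # 6.0 words per minute

Applied to a whole class, a rule like this assigns each student a number, and those numbers create the ranking just described.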
Evaluation involves using a rule-governed system to make judgments about the value or worth of a set of measures (Linn & Gronlund, 1995). What does it mean, for example, to say that a student answered eighty out of one hundred earth science questions correctly? Depending on the rules that are used, it could mean that the student has learned that body of knowledge exceedingly well and is ready to progress to the next unit of instruction or, conversely, that the student has significant knowledge gaps and requires additional instruction.
Why Should We Assess Students' Learning?
This question has several answers. We will use this
section to address four of the most common reasons for
assessment: to provide summaries of learning, to provide
information on learning progress, to diagnose specific
strengths and weaknesses in an individual's learning, and to
motivate further learning.
The first, and probably most obvious, reason for
assessment is to provide to all interested parties a clear,
meaningful, and useful summary or accounting of how well a
student has met the teacher's objectives. When testing is
done for the purpose of assigning a letter or numerical
grade, it is often called summative evaluation since
its primary purpose is to sum up how well a student has
performed over time and at a variety of tasks.
A second reason for assessing students is to monitor
their progress. The main thing that teachers want to know
from time to time is whether students are keeping up with
the pace of instruction and are understanding all of the
material that has been covered so far. For students whose
pace of learning is either slower or faster than average or
whose understanding of certain ideas is faulty, you can
introduce supplementary instruction (a workbook or a
computer-based tutorial program), remedial instruction
(which may also be computer based), or in-class ability
grouping (recall that we discussed the benefits of this
arrangement in Chapter 6). Because the purpose of such
assessment is to facilitate or form learning and not to
assign a grade, it is usually called formative evaluation.
A third reason follows from the second. If you discover a
student who is having difficulty keeping up with the rest of
the class, you will probably want to know why in order to
determine the most appropriate course of action. This
purpose may lead you to construct an assessment (or to look
for one that has already been made up) that will provide you
with specific diagnostic information.
Effects on Learning
A fourth reason for assessment of student performance is
that it has potentially positive effects on various aspects
of learning and instruction. As Terence Crooks points out,
classroom assessment guides students' "judgment of what is
important to learn, affects their motivation and
self-perceptions of competence, structures their approaches
to and timing of personal study (e.g., spaced practice),
consolidates learning, and affects the development of
enduring learning strategies and skills. It appears to be
one of the most potent forces influencing education" (1988).
Ways to Measure Student Learning
Just as measurement can play several roles in the
classroom, teachers have several ways to measure what
students have learned. Which type of measure you choose will
depend, of course, on the objectives you have stated. For
the purposes of this discussion, objectives can be
classified in terms of two broad categories: knowing about
something (for example, that knots are used to secure
objects, that dance is a form of social expression, that
microscopes are used to study things too small to be seen by
the naked eye) and knowing how to do something (for example,
tie a square knot, dance the waltz, operate a microscope).
Measures that attempt to assess the range and accuracy of
someone's knowledge are usually called written tests. And
measures that attempt to assess how well somebody can do
something are often referred to as performance tests. Again,
keep in mind that both types have a legitimate place in a
teacher's assessment arsenal. Which type is used, and to
what extent, will depend on the purpose or purposes you have
for assessing students. In the next two sections, we will
briefly examine the nature of both types.
Written Tests
Teachers spend a substantial part of each day assessing
student learning, and much of this assessment activity
involves giving and scoring some type of written test. Most
written tests are composed of one or more of the following
item types: selected response (multiple choice, true-false,
and matching, for example), short answer, and essay. They
are designed to measure how much people know about a
particular subject. In all likelihood, you have taken
hundreds of these types of tests in your school career thus
far. In the next couple of pages, we will briefly describe
the main features, advantages, and disadvantages of each item type.
Selected-Response Tests
Selected-response tests are so named because the student
reads a relatively brief opening statement (called a
stem) and selects one of the provided alternatives as
the correct answer. Selected-response tests are typically
made up of multiple-choice, true-false, or matching items.
Quite often all three item types are used in a single test.
Selected-response tests are sometimes called "objective"
tests because they have a simple and set scoring system. If
alternative (b) of a multiple-choice item is keyed as the
correct response and the student chose alternative (d), the
student is marked wrong, regardless of how much the teacher
wanted the student to be right. But that doesn't mean
selected-response items are totally free of subjective
influences. After all, whoever created the test had to make
subjective judgments about which areas to emphasize, how to
word items, and which items to include in the final version.
Finally, selected-response tests are typically used when the
primary goal is to assess what might be called foundational
knowledge. This is the basic factual information and
cognitive skills that students need in order to do such
high-level tasks as solve problems and create products.
A major advantage of selected-response tests is
efficiency -- a teacher can ask many questions in a short
period of time. Another advantage is ease and reliability of
scoring. With the aid of a scoring template (such as a
multiple-choice answer sheet that has holes punched out
where the correct answer is located), many tests can be
quickly and uniformly scored.
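The scoring-template idea can be expressed as a short rule. The following Python sketch, with a hypothetical answer key and hypothetical student responses, shows the "objective" scoring described above: an answer is right only if it matches the keyed alternative.

    # Hypothetical four-item answer key for a multiple-choice test.
    ANSWER_KEY = {1: "b", 2: "d", 3: "a", 4: "c"}

    def score_test(responses):
        # An item is marked correct only if it matches the keyed
        # alternative -- the same simple, set rule a punched template
        # applies.
        return sum(1 for item, keyed in ANSWER_KEY.items()
                   if responses.get(item) == keyed)

    student = {1: "b", 2: "d", 3: "c", 4: "c"}
    print(score_test(student), "of", len(ANSWER_KEY))  # 3 of 4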
Because items that reflect the lowest level of Bloom's Taxonomy (verbatim knowledge) are the easiest to write, most teacher-made tests are composed almost entirely of knowledge-level items. As a result, students focus on verbatim memorization rather than on meaningful learning. Another disadvantage is that, while we get some indication of what students know, such tests tell us nothing about what students can do with that knowledge.
Short-Answer Tests
Instead of selecting from one or more alternatives, the
student is asked to supply a brief answer consisting of a
name, word, phrase, or symbol. Like selected-response tests,
short-answer tests can be scored quickly, accurately, and
consistently, thereby giving them an aura of objectivity.
They are primarily used for measuring foundational knowledge.
Short-answer items are relatively easy to write, so a
test, or part of one, can be constructed fairly quickly.
They allow for either broad or in-depth assessment of
foundational knowledge since students can respond to many
items within a short space of time. Since students have to
supply an answer, they have to recall, rather than simply
recognize, the information.
This item type has the same basic disadvantages as the
selected-response items. Because these items ask only for
short verbatim answers, students are likely to limit their
processing to that level, and these items provide no
information about how well students can use what they have
learned. In addition, unexpected but plausible answers may
be difficult to score.
Essay Tests
The student is given a somewhat general directive to
discuss one or more related ideas according to certain
criteria. One example of an essay question is "Compare
operant conditioning theory and information-processing
theory in terms of basic assumptions, typical research
findings, and classroom applications."
Essay tests reveal how well students can recall,
organize, and clearly communicate previously learned
information. When well written, essay tests call on such
higher-level abilities as analysis, synthesis, and
evaluation. Because of these demands, students are more
likely to try to meaningfully learn the material over which
they are tested.
Consistency of grading is likely to be a problem. Two
students may have essentially similar responses, yet receive
different letter or numerical grades. These test items are
also very time consuming to grade. And because it takes time
for students to formulate and write responses, only a few
questions at most can be given.
In recent years many teachers and measurement experts
have argued that the typical written test should be used far
less often because it reveals little or nothing of the depth
of students' knowledge and how students use their knowledge
to work through questions, problems, and tasks. The solution
that these experts have proposed is to use one or more of
what are called performance tests.
Performance Tests
Performance tests attempt to assess how well students use
foundational knowledge to perform complex tasks under more
or less realistic conditions. At the low end of the realism
spectrum, students may be asked to construct a map,
interpret a graph, or write an essay under highly
standardized conditions. That is, everyone completes the
same task in the same amount of time and under the same
conditions. At the high end of the spectrum, students may be
asked to conduct a science experiment, produce a painting,
or write an essay under conditions that are similar to those
of real life. For example, students may be told to produce a
compare-and-contrast essay on a particular topic by a
certain date, but the resources students choose to use, the
number of revisions they make, and when they work on the
essay are left unspecified. As we noted in Chapter 5, when
performance testing is conducted under such realistic
conditions, it is also called authentic assessment
(Meyer, 1992). Another term that is often used to encompass
both performance testing and authentic assessment, and to
distinguish them from traditional written tests, is
alternative assessment. In this section we will first
define the four different types of performance tests and
then look at their most important characteristics.
Types of Performance Tests
Currently, there are four ways in which the performance
capabilities of students are typically assessed: direct
writing assessments, portfolios, exhibitions, and demonstrations.
Direct Writing Assessments
These tests ask students to write about a specific topic
("Describe the person whom you admire the most, and explain
why you admire that person.") under a standard set of
conditions. Each essay is then scored by two or more people
according to a set of defined criteria.
Portfolios
A portfolio may contain one or more pieces of a student's
work, some of which demonstrate different stages of
completion. For example, a student's writing portfolio may
contain business letters; pieces of fiction; poetry; and an
outline, rough draft, and final draft of a research paper.
Through the inclusion of various stages of a research paper,
both the process and the end product can be assessed.
Portfolios can also be constructed for math and science as
well as for projects that combine two or more subject areas.
Often the student is involved in the selection of what is
included in his portfolio. The portfolio is sometimes used
as a showcase to illustrate exemplary pieces, but it also
works well as a collection of pieces that represent a
student's typical performances. In its best and truest
sense, the portfolio functions not just as a housing for
these performances but also as a means of self-expression,
self-reflection, and self-analysis for an individual student.
Exhibitions
Exhibitions involve just what the label suggests -- a
showing of such products as paintings, drawings,
photographs, sculptures, videotapes, and models. As with
direct writing assessments and portfolios, the products a
student chooses to exhibit are evaluated according to a
predetermined set of criteria.
Demonstrations
In this type of performance testing, students are
required to show how well they can use previously learned
knowledge or skills to solve a somewhat unique problem (such
as conducting a scientific inquiry to answer a question or
diagnosing the cause of a malfunctioning engine and
describing the best procedure for fixing it) or perform a
task (such as reciting a poem, performing a dance, or
playing a piece of music).
Ways to Evaluate Student Learning
Once you have collected all the measures you intend to
collect -- for example, test scores, quiz scores, homework
assignments, special projects, and laboratory experiments --
you will have to give the numbers some sort of value (the
essence of evaluation). As you probably know, this is most
often done by using an A to F grading scale. Typically, a
grade of A indicates superior performance; a B,
above-average performance; a C, average performance; a D,
below-average performance; and an F, failure. There are two
general ways to approach this task. One approach involves
comparisons among students. Such forms of evaluation are
called norm-referenced since students are identified
as average (or normal), above average, or below average. An
alternative approach is called criterion-referenced
because performance is interpreted in terms of defined
criteria. Although both approaches can be used, we favor
criterion-referenced grading for reasons we will mention shortly.
Norm-Referenced Grading
A norm-referenced grading system assumes that classroom
achievement will naturally vary among a group of
heterogeneous students because of differences in such
characteristics as prior knowledge, learning skills,
motivation, and aptitude. Under ideal circumstances
(hundreds of scores from a diverse group of students), this
variation produces a bell-shaped, or "normal," distribution
of scores that ranges from low to high, has few tied scores,
and has only a very few low scores and only a very few high
scores. For this reason, norm-referenced grading procedures
are also referred to as "grading on the curve."
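As a sketch of what "grading on the curve" means in practice, the following Python fragment assigns letter grades by relative standing. The grade quotas (top 15 percent receive an A, and so on) and the class scores are hypothetical; actual quotas vary from teacher to teacher.

    def grade_on_curve(scores):
        # scores: student -> raw score. Grades depend only on rank within
        # the group, not on how much was actually learned.
        # Hypothetical quotas: top 15% A, next 20% B, middle 30% C,
        # next 20% D, bottom 15% F. Ties are broken arbitrarily here.
        quotas = [(0.15, "A"), (0.35, "B"), (0.65, "C"),
                  (0.85, "D"), (1.01, "F")]
        ranked = sorted(scores, key=scores.get, reverse=True)
        grades = {}
        for rank, student in enumerate(ranked):
            fraction = rank / len(ranked)  # 0.0 for the top student
            grades[student] = next(g for cut, g in quotas if fraction < cut)
        return grades

    class_scores = {"Ana": 92, "Ben": 85, "Cho": 78, "Dee": 70, "Eli": 55}
    print(grade_on_curve(class_scores))
    # {'Ana': 'A', 'Ben': 'B', 'Cho': 'C', 'Dee': 'D', 'Eli': 'F'}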
Criterion-Referenced Grading
A criterion-referenced grading system permits students to
benefit from mistakes and to improve their level of
understanding and performance. Furthermore, it establishes
an individual (and sometimes cooperative) reward structure,
which fosters motivation to learn to a greater extent than
does the competitive reward structure of norm-referenced grading.
Under a criterion-referenced system, grades are
determined through comparison of the extent to which each
student has attained a defined standard (or criterion) of
achievement or performance. Whether the rest of the students
in the class are successful or unsuccessful in meeting that
criterion is irrelevant. Thus, any distribution of grades is
possible. Every student may get an A or an F, or no student
may receive these grades. For reasons we will discuss
shortly, very low or failing grades tend to occur less
frequently under a criterion-referenced system.
A common version of criterion-referenced grading assigns
letter grades on the basis of the percentage of test items
answered correctly. For example, you may decide to award an
A to anyone who correctly answers at least 85 percent of a
set of test questions, a B to anyone who correctly answers
75 to 84 percent, and so on down to the lowest grade. To use
this type of grading system fairly, which means specifying
realistic criterion levels, you would need to have some
prior knowledge of the levels at which students typically
perform. You would thus be using normative information to
establish absolute or fixed standards of performance.
However, although norm-referenced and criterion-referenced
grading systems both spring from a normative database (that
is, from comparisons among students), only the former system
uses those comparisons to directly determine grades.
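A criterion-referenced rule, by contrast, looks only at each student's own percentage. Here is a minimal Python sketch using the cutoffs from the example above (85 percent for an A, 75 to 84 percent for a B); the cutoffs for C and D are hypothetical extensions of the same pattern.

    def criterion_grade(correct, total):
        # The grade depends only on this student's percentage correct;
        # classmates' scores are irrelevant.
        percent = 100 * correct / total
        if percent >= 85:
            return "A"   # cutoff from the example in the text
        elif percent >= 75:
            return "B"   # cutoff from the example in the text
        elif percent >= 65:
            return "C"   # hypothetical cutoff
        elif percent >= 55:
            return "D"   # hypothetical cutoff
        return "F"

    print(criterion_grade(80, 100))  # 'B': 80 percent falls in the 75-84 band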
Criterion-referenced grading systems (and
criterion-referenced tests) have become increasingly popular
in recent years primarily because of three factors. First,
educators and parents complained that norm-referenced tests
and grading systems provided too little specific information
about student strengths and weaknesses. Second, educators
have come to believe that clearly stated, specific
objectives constitute performance standards, or criteria,
that are best assessed with criterion-referenced measures.
Third, and perhaps most important, contemporary theories of
school learning claim that most, if not all, students can
master most school objectives under the right circumstances.
If this assertion is even close to being true, then
norm-referenced testing and grading procedures, which depend
on variability in performance, will lose much of their rationale.
Suggestions for Teaching in Your Classroom: Effective Assessment Techniques
1. As early as possible in a report period,
decide when and how often to give tests and other
assignments that will count toward a grade, and announce
tests and assignments well in advance.
2. Prepare a content outline and/or a table of
specifications of the objectives to be covered on each exam,
or otherwise take care to obtain a systematic sample of the
knowledge and skill acquired by your students.
3. Consider the purpose of each test or measurement
exercise in light of the developmental characteristics of
the students in your classes and the nature of the
curriculum for your grade level.
4. Decide whether a written test or a performance test is most appropriate.
5. Make up and use a detailed answer key.
a. Evaluate each answer by comparing it to the key.
b. Be willing and prepared to defend the evaluations you make.
6. During and after the grading process, analyze
questions and answers in order to improve future exams.
Resources for Further Investigation
Suggestions for Constructing Written and Performance Tests
For specific suggestions on ways to write different types
of items for paper-and-pencil tests of knowledge and on
methods for constructing and using rating scales and
checklists to measure products, performances, and
procedures, consult one or more of the following books:
Measurement and Evaluation in Teaching (7th ed.,
1995), by Robert Linn and Norman Gronlund; How to Make
Achievement Tests and Assessments (5th ed., 1993), by
Norman Gronlund; Classroom Assessment: What Teachers Need
to Know (1995), by W. James Popham; Student-Centered
Classroom Assessment (1994), by Richard Stiggins;
Classroom Assessment (2d ed., 1994), by Peter
Airasian; and Practical Aspects of Authentic
Assessment (1994), by Bonnie Campbell Hill and Cynthia Ruptic.
The Learning Resources Development Center (LRDC) at the
University of Pittsburgh publishes a large number of briefs,
articles, and reviews related to assessment and learning,
particularly emphasizing cognitive-based approaches. An
online resource of the LRDC can be found at
http://www.lrdc.pitt.edu/publications.html. The most
extensive online database of assessment information is the
ERIC/AE Test Locater, which is found at
www.cua.edu/www/eric_ae/testcol.html. It includes numerous
topics, reviews of tests, suggestions and digests relating
to alternative assessment, and broader standards and
policy-making information as it relates to evaluation and
assessment of students.
Writing Higher-Level Questions
As Benjamin Bloom and others point out, teachers have a
disappointing tendency to write test items that reflect the
lowest level of the taxonomy: knowledge. To avoid this
failing, carefully read Part 2 of Taxonomy of Educational
Objectives: The Classification of Educational Goals,
Handbook I: Cognitive Domain (1956), edited by Benjamin
Bloom, Max Engelhart, Edward Furst, Walker Hill, and David
Krathwohl. Each level of the taxonomy is clearly explained
and followed by several pages of illustrative test items.
Applying Assessment to Project-Based Instruction
Assessment of project-based learning can be a challenge. Because learners are constructing artifacts that represent their learning, it is important to provide feedback that is constructive and authentic to the objectives of the assignment. Traditional assessments such as multiple-choice and true-false tests may be inappropriate for judging the quality of learning that has occurred. Some options include portfolios and rubrics. Portfolios offer the opportunity to employ multiple forms of assessment through different types of works and allow the learner some choice over which items will be included (Levstik & Barton, 2001). In addition, portfolios maintained over extended periods demonstrate progress to learners, parents, and teachers. However, the disadvantages of using portfolios are that they can be time-consuming to grade (Zvacek, 1999) and can be somewhat subjective.
Rubrics, on the other hand, allow assessment to be more objective and reliable across learners. When a rubric is created prior to the project, teachers can communicate their expectations for the project through it, and students are more aware of how their work will be evaluated. Rubrics typically include three constructs: knowledge, reasoning, and communication, with levels of proficiency for each. Such a rubric might be based on national history standards, for example, but could easily be adapted to fit other domains such as literature.
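To show how the three constructs and their proficiency levels might fit together, here is a minimal Python sketch of a rubric. The descriptors, the three-point scale, and the sample ratings are hypothetical illustrations, not a published instrument.

    # Hypothetical three-level descriptors for the three constructs
    # named above.
    RUBRIC = {
        "knowledge": {1: "facts inaccurate", 2: "mostly accurate",
                      3: "accurate and thorough"},
        "reasoning": {1: "claims unsupported", 2: "some evidence used",
                      3: "well-supported argument"},
        "communication": {1: "hard to follow", 2: "mostly clear",
                          3: "clear and well organized"},
    }

    def score_project(ratings):
        # ratings: construct -> proficiency level (1-3). Returns the
        # total, the maximum possible, and the matching descriptors.
        total = sum(ratings.values())
        maximum = 3 * len(RUBRIC)
        detail = {c: RUBRIC[c][level] for c, level in ratings.items()}
        return total, maximum, detail

    total, maximum, detail = score_project(
        {"knowledge": 3, "reasoning": 2, "communication": 3})
    print(f"{total}/{maximum}", detail)  # 8/9 plus per-construct feedback

Because the descriptors are written out in advance, two graders applying this rubric to the same project should arrive at similar scores, which is the reliability advantage noted above.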
For more information on assessment -- especially on how to construct items -- see Orlich et al., TEACHING STRATEGIES, and Cooper, CLASSROOM TEACHING SKILLS, Houghton Mifflin Co.
For more information on assessment practices involving technology -- see Grabe/Grabe, INTEGRATING TECHNOLOGY FOR MEANINGFUL LEARNING, Houghton Mifflin Co.