Excerpted from Chapter 12 of Biehler/Snowman, PSYCHOLOGY APPLIED TO TEACHING, 8/e, Houghton Mifflin Co., 1997.


The Role of Assessment in Teaching
    What Is Assessment?
    Why Should We Assess Students' Learning?
Ways to Measure Student Learning
    Written Tests
    Performance Tests
Ways to Evaluate Student Learning
    Norm-Referenced Grading
    Criterion-Referenced Grading
Suggestions for Teaching in Your Classroom: Effective Assessment Techniques
Resources for Further Investigation


The Role of Assessment in Teaching

Assessing student learning is something that every teacher has to do, usually quite frequently. Written tests, book reports, research papers, homework exercises, oral presentations, question-and-answer sessions, science projects, and artwork of various sorts are just some of the ways in which teachers measure student learning, with written tests accounting for about 45 percent of a typical student's course grade (Green & Stager, 1986/1987). It is no surprise, then, that the typical teacher can spend between one-third and one-half of her class time engaged in one or another type of measurement activity (Stiggins, 1994). Yet despite the amount of time teachers spend assessing student learning, it is a task that most of them dislike and that few do well. One reason is that many teachers have little or no in-depth knowledge of assessment principles (Crooks, 1988; Hills, 1991; Stiggins, Griswold, & Wikelund, 1989). Another reason is that the role of assessor is seen as being inconsistent with the role of teacher (or helper). Since teachers with more training in assessment use more appropriate assessment practices than do teachers with less training (Green & Stager, 1986/1987), a basic goal of this chapter is to help you understand how such knowledge can be used to reinforce, rather than work against, your role as teacher. Toward that end, we will begin by defining what we mean by the term assessment and by two key elements of this process, measurement and evaluation.


What Is Assessment?

Broadly conceived, classroom assessment involves two major types of activities: collecting information about how much knowledge and skill students have learned (measurement) and making judgments about the adequacy or acceptability of each student's level of learning (evaluation). Both the measurement and evaluation aspects of classroom assessment can be accomplished in a number of ways. To determine how much learning has occurred, teachers can, for example, have students take exams, respond to oral questions, do homework exercises, write papers, solve problems, and make oral presentations. Teachers can then evaluate the scores from those activities by comparing them either to one another or to an absolute standard (such as an A equals 90 percent correct). Throughout much of this chapter we will explain and illustrate the various ways in which you can measure and evaluate student learning.



Measurement is the assignment of numbers to certain attributes of objects, events, or people according to a rule-governed system. For our purposes, we will limit the discussion to attributes of people. For example, we can measure someone's level of typing proficiency by counting the number of words the person accurately types per minute or someone's level of mathematical reasoning by counting the number of problems correctly solved. In a classroom or other group situation, the rules that are used to assign the numbers will ordinarily create a ranking that reflects how much of the attribute different people possess (Linn & Gronlund, 1995).



Evaluation involves using a rule-governed system to make judgments about the value or worth of a set of measures (Linn & Gronlund, 1995). What does it mean, for example, to say that a student answered eighty out of one hundred earth science questions correctly? Depending on the rules that are used, it could mean that the student has learned that body of knowledge exceedingly well and is ready to progress to the next unit of instruction or, conversely, that the student has significant knowledge gaps and requires additional instruction.


Why Should We Assess Students' Learning?

This question has several answers. We will use this section to address four of the most common reasons for assessment: to provide summaries of learning, to provide information on learning progress, to diagnose specific strengths and weaknesses in an individual's learning, and to motivate further learning.

Summative Evaluation

The first, and probably most obvious, reason for assessment is to provide to all interested parties a clear, meaningful, and useful summary or accounting of how well a student has met the teacher's objectives. When testing is done for the purpose of assigning a letter or numerical grade, it is often called summative evaluation since its primary purpose is to sum up how well a student has performed over time and at a variety of tasks.

Formative Evaluation

A second reason for assessing students is to monitor their progress. The main thing that teachers want to know from time to time is whether students are keeping up with the pace of instruction and understanding all of the material that has been covered so far. For students whose pace of learning is either slower or faster than average or whose understanding of certain ideas is faulty, you can introduce supplementary instruction (a workbook or a computer-based tutorial program), remedial instruction (which may also be computer based), or in-class ability grouping (recall that we discussed the benefits of this arrangement in Chapter 6). Because the purpose of such assessment is to facilitate or form learning and not to assign a grade, it is usually called formative evaluation.


Diagnosis

A third reason follows from the second. If you discover a student who is having difficulty keeping up with the rest of the class, you will probably want to know why in order to determine the most appropriate course of action. This purpose may lead you to construct an assessment (or to look for one that has already been made up) that will provide you with specific diagnostic information.


Effects on Learning

A fourth reason for assessment of student performance is that it has potentially positive effects on various aspects of learning and instruction. As Terence Crooks points out, classroom assessment guides students' "judgment of what is important to learn, affects their motivation and self-perceptions of competence, structures their approaches to and timing of personal study (e.g., spaced practice), consolidates learning, and affects the development of enduring learning strategies and skills. It appears to be one of the most potent forces influencing education" (1988, p. 467).


Ways to Measure Student Learning

Just as measurement can play several roles in the classroom, teachers have several ways to measure what students have learned. Which type of measure you choose will depend, of course, on the objectives you have stated. For the purposes of this discussion, objectives can be classified in terms of two broad categories: knowing about something (for example, that knots are used to secure objects, that dance is a form of social expression, that microscopes are used to study things too small to be seen by the naked eye) and knowing how to do something (for example, tie a square knot, dance the waltz, operate a microscope). Measures that attempt to assess the range and accuracy of someone's knowledge are usually called written tests. And measures that attempt to assess how well somebody can do something are often referred to as performance tests. Again, keep in mind that both types have a legitimate place in a teacher's assessment arsenal. Which type is used, and to what extent, will depend on the purpose or purposes you have for assessing students. In the next two sections, we will briefly examine the nature of both types.


Written Tests

Teachers spend a substantial part of each day assessing student learning, and much of this assessment activity involves giving and scoring some type of written test. Most written tests are composed of one or more of the following item types: selected response (multiple choice, true-false, and matching, for example), short answer, and essay. They are designed to measure how much people know about a particular subject. In all likelihood, you have taken hundreds of these types of tests in your school career thus far. In the next couple of pages, we will briefly describe the main features, advantages, and disadvantages of each test.


Selected-Response Tests


Selected-response tests are so named because the student reads a relatively brief opening statement (called a stem) and selects one of the provided alternatives as the correct answer. Selected-response tests are typically made up of multiple-choice, true-false, or matching items. Quite often all three item types are used in a single test. Selected-response tests are sometimes called "objective" tests because they have a simple and set scoring system. If alternative (b) of a multiple-choice item is keyed as the correct response and the student chose alternative (d), the student is marked wrong, regardless of how much the teacher wanted the student to be right. But that doesn't mean selected-response items are totally free of subjective influences. After all, whoever created the test had to make subjective judgments about which areas to emphasize, how to word items, and which items to include in the final version. Finally, selected-response tests are typically used when the primary goal is to assess what might be called foundational knowledge. This is the basic factual information and cognitive skills that students need in order to do such high-level tasks as solve problems and create products (Stiggins, 1994).



A major advantage of selected-response tests is efficiency -- a teacher can ask many questions in a short period of time. Another advantage is ease and reliability of scoring. With the aid of a scoring template (such as a multiple-choice answer sheet that has holes punched out where the correct answer is located), many tests can be quickly and uniformly scored.



A major disadvantage of selected-response tests is that, because items that reflect the lowest level of Bloom's Taxonomy (verbatim knowledge) are the easiest to write, most teacher-made tests are composed almost entirely of knowledge-level items (a point we made initially in Chapter 7). As a result, students focus on verbatim memorization rather than on meaningful learning. Another disadvantage is that, while we get some indication of what students know, such tests tell us nothing about what students can do with that knowledge.


Short-Answer Tests


Instead of selecting from one or more alternatives, the student is asked to supply a brief answer consisting of a name, word, phrase, or symbol. Like selected-response tests, short-answer tests can be scored quickly, accurately, and consistently, thereby giving them an aura of objectivity. They are primarily used for measuring foundational knowledge.



Short-answer items are relatively easy to write, so a test, or part of one, can be constructed fairly quickly. They allow for either broad or in-depth assessment of foundational knowledge since students can respond to many items within a short space of time. Since students have to supply an answer, they have to recall, rather than recognize, information.



This item type has the same basic disadvantages as the selected-response items. Because these items ask only for short verbatim answers, students are likely to limit their processing to that level, and these items provide no information about how well students can use what they have learned. In addition, unexpected but plausible answers may be difficult to score.


Essay Tests


The student is given a somewhat general directive to discuss one or more related ideas according to certain criteria. One example of an essay question is "Compare operant conditioning theory and information-processing theory in terms of basic assumptions, typical research findings, and classroom applications."



Essay tests reveal how well students can recall, organize, and clearly communicate previously learned information. When well written, essay tests call on such higher-level abilities as analysis, synthesis, and evaluation. Because of these demands, students are more likely to try to meaningfully learn the material over which they are tested.



Consistency of grading is likely to be a problem. Two students may have essentially similar responses, yet receive different letter or numerical grades. These test items are also very time consuming to grade. And because it takes time for students to formulate and write responses, only a few questions at most can be given.


Performance Tests

In recent years many teachers and measurement experts have argued that the typical written test should be used far less often because it reveals little or nothing of the depth of students' knowledge and how students use their knowledge to work through questions, problems, and tasks. The solution that these experts have proposed is to use one or more of what are called performance tests.

Performance tests attempt to assess how well students use foundational knowledge to perform complex tasks under more or less realistic conditions. At the low end of the realism spectrum, students may be asked to construct a map, interpret a graph, or write an essay under highly standardized conditions. That is, everyone completes the same task in the same amount of time and under the same conditions. At the high end of the spectrum, students may be asked to conduct a science experiment, produce a painting, or write an essay under conditions that are similar to those of real life. For example, students may be told to produce a compare-and-contrast essay on a particular topic by a certain date, but the resources students choose to use, the number of revisions they make, and when they work on the essay are left unspecified. As we noted in Chapter 5, when performance testing is conducted under such realistic conditions, it is also called authentic assessment (Meyer, 1992). Another term that is often used to encompass both performance testing and authentic assessment, and to distinguish them from traditional written tests, is alternative assessment. In this section we will first define the four different types of performance tests and then look at their most important characteristics.


Types of Performance Tests

Currently, there are four ways in which the performance capabilities of students are typically assessed: direct writing assessments, portfolios, exhibitions, and demonstrations.


Direct Writing Assessments

These tests ask students to write about a specific topic ("Describe the person whom you admire the most, and explain why you admire that person.") under a standard set of conditions. Each essay is then scored by two or more people according to a set of defined criteria.



Portfolios

A portfolio may contain one or more pieces of a student's work, some of which demonstrate different stages of completion. For example, a student's writing portfolio may contain business letters; pieces of fiction; poetry; and an outline, rough draft, and final draft of a research paper. Through the inclusion of various stages of a research paper, both the process and the end product can be assessed. Portfolios can also be constructed for math and science as well as for projects that combine two or more subject areas. Often the student is involved in the selection of what is included in the portfolio. The portfolio is sometimes used as a showcase to illustrate exemplary pieces, but it also works well as a collection of pieces that represent a student's typical performances. In its best and truest sense, the portfolio functions not just as a housing for these performances but also as a means of self-expression, self-reflection, and self-analysis for an individual student (Templeton, 1995).



Exhibitions

Exhibitions involve just what the label suggests -- a showing of such products as paintings, drawings, photographs, sculptures, videotapes, and models. As with direct writing assessments and portfolios, the products a student chooses to exhibit are evaluated according to a predetermined set of criteria.



Demonstrations

In a demonstration, students are required to show how well they can use previously learned knowledge or skills to solve a somewhat unique problem (such as conducting a scientific inquiry to answer a question or diagnosing the cause of a malfunctioning engine and describing the best procedure for fixing it) or perform a task (such as reciting a poem, performing a dance, or playing a piece of music).


Ways to Evaluate Student Learning

Once you have collected all the measures you intend to collect -- for example, test scores, quiz scores, homework assignments, special projects, and laboratory experiments -- you will have to give the numbers some sort of value (the essence of evaluation). As you probably know, this is most often done by using an A to F grading scale. Typically, a grade of A indicates superior performance; a B, above-average performance; a C, average performance; a D, below-average performance; and an F, failure. There are two general ways to approach this task. One approach involves comparisons among students. Such forms of evaluation are called norm-referenced since students are identified as average (or normal), above average, or below average. An alternative approach is called criterion-referenced because performance is interpreted in terms of defined criteria. Although both approaches can be used, we favor criterion-referenced grading for reasons we will mention shortly.



Norm-Referenced Grading

A norm-referenced grading system assumes that classroom achievement will naturally vary among a group of heterogeneous students because of differences in such characteristics as prior knowledge, learning skills, motivation, and aptitude. Under ideal circumstances (hundreds of scores from a diverse group of students), this variation produces a bell-shaped, or "normal," distribution of scores that ranges from low to high, has few tied scores, and has only a very few low scores and only a very few high scores. For this reason, norm-referenced grading procedures are also referred to as "grading on the curve."
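To make the idea of comparing students with one another concrete, here is a minimal sketch of one way a curve could be applied. The grade bands (top 10 percent A, next 20 percent B, middle 40 percent C, next 20 percent D, bottom 10 percent F), the function name, and the sample scores are all assumptions made for illustration; the text does not prescribe any particular quotas.

```python
# Hypothetical bands: each entry is (upper limit on the fraction of
# classmates who scored strictly higher, letter grade). These quotas are
# an assumption for illustration, not taken from the text.
BANDS = [(0.10, "A"), (0.30, "B"), (0.70, "C"), (0.90, "D"), (1.00, "F")]


def curve_grades(scores, bands=BANDS):
    """Assign letter grades from each student's standing within the group.

    scores: dict mapping student name to a numeric score.
    Students with tied scores receive the same grade.
    """
    n = len(scores)
    grades = {}
    for name, score in scores.items():
        # Fraction of the class that outscored this student.
        above = sum(1 for other in scores.values() if other > score)
        fraction_above = above / n
        for limit, letter in bands:
            if fraction_above < limit:
                grades[name] = letter
                break
    return grades


class_scores = {
    "Ana": 92, "Ben": 85, "Cho": 85, "Dev": 78, "Eli": 74,
    "Fay": 70, "Gus": 66, "Hal": 61, "Ida": 55, "Jon": 40,
}
print(curve_grades(class_scores))
```

Notice that a grade here depends only on a student's position in the group. The same raw score could earn a different letter in a stronger or weaker class.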



Criterion-Referenced Grading

A criterion-referenced grading system permits students to benefit from mistakes and to improve their level of understanding and performance. Furthermore, it establishes an individual (and sometimes cooperative) reward structure, which fosters motivation to learn to a greater extent than other systems.

Under a criterion-referenced system, grades are determined through comparison of the extent to which each student has attained a defined standard (or criterion) of achievement or performance. Whether the rest of the students in the class are successful or unsuccessful in meeting that criterion is irrelevant. Thus, any distribution of grades is possible. Every student may get an A or an F, or no student may receive these grades. For reasons we will discuss shortly, very low or failing grades tend to occur less frequently under a criterion-referenced system.

A common version of criterion-referenced grading assigns letter grades on the basis of the percentage of test items answered correctly. For example, you may decide to award an A to anyone who correctly answers at least 85 percent of a set of test questions, a B to anyone who correctly answers 75 to 84 percent, and so on down to the lowest grade. To use this type of grading system fairly, which means specifying realistic criterion levels, you would need to have some prior knowledge of the levels at which students typically perform. You would thus be using normative information to establish absolute or fixed standards of performance. However, although norm-referenced and criterion-referenced grading systems both spring from a normative database (that is, from comparisons among students), only the former system uses those comparisons to directly determine grades.
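The percentage-based version described above can be sketched in a few lines. Only the A (85 percent) and B (75 percent) cutoffs come from the example in the text. The C and D cutoffs, the function name, and the sample scores are assumptions added so the sketch is complete.

```python
# Only the 85 (A) and 75 (B) cutoffs come from the text; the C and D
# cutoffs below are hypothetical values chosen for illustration.
CUTOFFS = [(85, "A"), (75, "B"), (65, "C"), (55, "D")]


def criterion_grade(num_correct, num_items, cutoffs=CUTOFFS):
    """Return a letter grade from the percentage of items answered correctly."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    percent = 100 * num_correct / num_items
    # Cutoffs are listed from highest to lowest; the first one met wins.
    for minimum, letter in cutoffs:
        if percent >= minimum:
            return letter
    return "F"


# Number of items answered correctly on a hypothetical 50-item test.
scores = {"Ana": 44, "Ben": 38, "Cho": 43}
grades = {name: criterion_grade(correct, 50) for name, correct in scores.items()}
print(grades)
```

Because each grade is computed from one student's score alone, any distribution of grades is possible, including every student earning an A.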

Criterion-referenced grading systems (and criterion-referenced tests) have become increasingly popular in recent years primarily because of three factors. First, educators and parents complained that norm-referenced tests and grading systems provided too little specific information about student strengths and weaknesses. Second, educators have come to believe that clearly stated, specific objectives constitute performance standards, or criteria, that are best assessed with criterion-referenced measures. Third, and perhaps most important, contemporary theories of school learning claim that most, if not all, students can master most school objectives under the right circumstances. If this assertion is even close to being true, then norm-referenced testing and grading procedures, which depend on variability in performance, will lose much of their appeal.


Suggestions for Teaching in Your Classroom: Effective Assessment Techniques

1. As early as possible in a report period, decide when and how often to give tests and other assignments that will count toward a grade, and announce tests and assignments well in advance.

2. Prepare a content outline and/or a table of specifications of the objectives to be covered on each exam, or otherwise take care to obtain a systematic sample of the knowledge and skill acquired by your students.

3. Consider the purpose of each test or measurement exercise in light of the developmental characteristics of the students in your classes and the nature of the curriculum for your grade level.

4. Decide whether a written test or a performance test is most appropriate.

5. Make up and use a detailed answer key.

a. Evaluate each answer by comparing it to the key.

b. Be willing and prepared to defend the evaluations you make.

6. During and after the grading process, analyze questions and answers in order to improve future exams.


Resources for Further Investigation

Suggestions for Constructing Written and Performance Tests

For specific suggestions on ways to write different types of items for paper-and-pencil tests of knowledge and on methods for constructing and using rating scales and checklists to measure products, performances, and procedures, consult one or more of the following books: Measurement and Evaluation in Teaching (7th ed., 1995), by Robert Linn and Norman Gronlund; How to Make Achievement Tests and Assessments (5th ed., 1993), by Norman Gronlund; Classroom Assessment: What Teachers Need to Know (1995), by W. James Popham; Student-Centered Classroom Assessment (1994), by Richard Stiggins; Classroom Assessment (2d ed., 1994), by Peter Airasian; and Practical Aspects of Authentic Assessment (1994), by Bonnie Campbell Hill and Cynthia Ruptic.

The Learning Resources Development Center (LRDC) at the University of Pittsburgh publishes a large number of briefs, articles, and reviews related to assessment and learning, particularly emphasizing cognitive-based approaches, and it also maintains an online resource. The most extensive online database of assessment information is the ERIC/AE Test Locator. It includes numerous topics, reviews of tests, suggestions and digests relating to alternative assessment, and broader standards and policy-making information as it relates to evaluation and assessment of students.


Writing Higher-Level Questions

As Benjamin Bloom and others point out, teachers have a disappointing tendency to write test items that reflect the lowest level of the taxonomy: knowledge. To avoid this failing, carefully read Part 2 of Taxonomy of Educational Objectives: The Classification of Educational Goals, Handbook I: Cognitive Domain (1956), edited by Benjamin Bloom, Max Engelhart, Edward Furst, Walker Hill, and David Krathwohl. Each level of the taxonomy is clearly explained and followed by several pages of illustrative test items.


Analyzing Test Items

Norman Gronlund briefly discusses item-analysis procedures for norm-referenced and criterion-referenced tests in Chapter 6 of How to Make Achievement Tests and Assessments (5th ed., 1993). For norm-referenced multiple-choice tests, these include procedures for assessing the difficulty of each item, the discriminating power of each item, and the effectiveness of each alternative answer. For criterion-referenced tests, they include a measure for assessing the effects of instruction. More detailed discussions of item-analysis procedures can be found in Chapter 8 of Educational Testing and Measurement: Classroom Application and Practice (4th ed., 1993), by Tom Kubiszyn and Gary Borich.
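As a rough illustration of the first two procedures, the sketch below computes a difficulty index and a discrimination index for each item on a test scored right or wrong. It uses one common textbook formulation: difficulty is the proportion of students answering the item correctly, and discrimination is the difference between the proportions correct in the upper-scoring and lower-scoring groups. The cited books may define these indices somewhat differently. The function name, the use of upper and lower halves, and the sample data are assumptions. The sketch does not analyze the effectiveness of individual alternatives.

```python
def item_analysis(responses, group_fraction=0.5):
    """Compute (difficulty, discrimination) for each item.

    responses: list with one entry per student, each a list of 0/1 item
    scores (1 = correct). group_fraction sets the size of the upper and
    lower comparison groups; one half is an assumed default.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, int(n_students * group_fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    results = []
    for i in range(n_items):
        # Difficulty: proportion of all students who got the item right.
        difficulty = sum(r[i] for r in responses) / n_students
        # Discrimination: upper-group minus lower-group proportion correct.
        upper_correct = sum(r[i] for r in upper)
        lower_correct = sum(r[i] for r in lower)
        discrimination = (upper_correct - lower_correct) / group_size
        results.append((difficulty, discrimination))
    return results


# Hypothetical data: six students, three items.
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for number, (p, d) in enumerate(item_analysis(sample), start=1):
    print(f"Item {number}: difficulty={p:.2f}, discrimination={d:.2f}")
```

Under this formulation, a discrimination value near zero (or below it) suggests that an item does not separate stronger from weaker students, so it may deserve a closer look before it is reused.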

Also, Question Mark Software, based in Britain, produces a software program that can help teachers generate quality test items. Information on the software is available by calling the U.S. distributor at 800-863-3950.

For more information on assessment -- especially on how to construct items -- see Orlich et al., TEACHING STRATEGIES, 5/e, Houghton Mifflin Co., 1998, Chapter 8, "Small Group Discussions and Cooperative Learning."

For more information on assessment in the Grabes' INTEGRATING TECHNOLOGY FOR MEANINGFUL LEARNING, 2/e, Houghton Mifflin Co., 1998, see the "Spotlight on Assessment" sections on pages 7, 52, 171, 316, and 357.

For more information on assessment in Gage/Berliner, EDUCATIONAL PSYCHOLOGY, 6/e, 1998, see Chapter 13, "Basic Concepts in Assessment and the Interpretation of Standardized Testing," and Chapter 14, "The Teacher's Assessment of Student Learning."



Copyright Houghton Mifflin Company. All Rights Reserved.