Teacher Decision Making in Student Evaluation

Prepared for the SSTA by Patricia Jamison.
SSTA Research Centre Report #02-07: 43 pages, $11

Table of Contents


Part I:  The Nature of Teachers’ Decision Making in Evaluation: A Review of the Literature

Policy and Guiding Principles
Curriculum Goals
Assessment Literacy
Demands for Accountability
Diversity in the Classroom
Multiple Purposes of Evaluation
Professional Development and Teacher Training
Key Questions
Part II: Reconciling Goals, Responsibilities, Beliefs and Values:  Descriptions of Teachers’ Practices in Evaluation
Assessment Literacy
Purposes of Evaluation
Assessment Strategies
Interpretations of Accountability
Departmental Examinations and Teacher Accreditation
Part III: Student Evaluation, Accountability and Professionalism: Examining Paradoxes and Relationships
Examining Incongruencies
Student Evaluation and Teacher Professionalism
Implications for Professional Development and Teacher Training
Recommendations for Further Research
Concluding Remarks

This report is a summary of a Master's thesis by Patricia Jamison, University of Saskatchewan.

This study was intended to improve understanding of the various interpretations, references, and perceptions that inform teachers' decisions and practice in student evaluation. Two overarching questions were addressed in the study. How do teachers make decisions about the ways in which they assess, evaluate, and report student achievement? What factors and considerations are given priority in these decisions, and why?

The teachers in the study used references from three contexts to inform their decisions about student evaluation: a) personal values and priorities, b) professional responsibilities, and c) public expectations.  Assessment literacy and interpretations of accountability were the two predominant categories of factors that influenced the way these teachers conducted evaluation in their classrooms. 

Analysis of the data revealed several paradoxical relationships between instruction and assessment practices.  These incongruencies suggest that the current prevailing definition of accountability may be in conflict with the underlying beliefs and values reflected in Core Curriculum. 

Part I of this report provides a review of the literature from which the essential questions and framework for investigation originated.  Part II presents the observations and findings of this descriptive study.  Part III includes reflections on these findings, additional references to the literature, a discussion of teacher professionalism and accreditation, and considerations for professional development in the area of classroom assessment and student evaluation.

The SSTA Research Centre grants permission to reproduce up to three copies of each report for personal use.
Each copy must acknowledge the author and the SSTA Research Centre as the source. A complete and authorized copy of each report is available from the SSTA Research Centre.
The opinions and recommendations expressed in this report are those of the author and may not be in agreement with SSTA officers or trustees, but are offered as being worthy of consideration by those responsible for making decisions.


Three major concurrent influences in Saskatchewan reflect current trends associated with restructuring and challenge teachers as they decide how to conduct classroom assessments and student evaluation.

Saskatchewan curricula reflect a shift from an emphasis on cognitive objectives to include affective and attitudinal aspects of learning.   Many learning objectives within the foundational objectives of Core Curriculum require subjective and high inference assessments and call for the consideration of "non-achievement" factors such as ability, effort, attitudes, and habits of mind in classroom assessments.

The recognition of a need to address equity issues, to respond to individual and cultural requirements, and to embrace diversity and inclusivity in the classroom leads us to identify questions of fairness and bias related to traditional methods of testing.

Current economic conditions and the trend to globalization have created a public desire for indicators of student achievement and competence, and of students' abilities to compete internationally.  Rising costs and shrinking resources also contribute to the call for accountability.  We tend to look to outcomes or measures of student achievement as the primary indicators of the effectiveness of the system.

Previous studies suggest a need for research that: a) describes current classroom assessment practice, b) examines the conflicting purposes of evaluation, c) investigates the role of teachers' beliefs in how they evaluate, and d) focuses on evaluation issues encountered by teachers in classrooms at the secondary level.

Stiggins, Frisbie, and Griswold (1989) advocate for research meant to help us "understand and begin to disentangle the complex array of myth, tradition, uncertainty and procedures that appear to characterize grading practices" (p. 14).  They observe that much of the literature on grading practices can be classified as statements of values, and that there is a lack of research on day-to-day practices used by teachers to grade student performance.  They suggest that an analysis of underlying assumptions and philosophies and a summary of actual practices used to generate grades would provide a framework for questioning and examining the practicality of recommended practice in the face of the constraints and realities of the classroom.

According to Natriello (1987), few studies have examined the situation in which a single student is confronted with evaluations from multiple teachers, and still fewer have been concerned with systems in which multiple teachers evaluate multiple students -- the two situations most commonly found in high schools, and of the greatest importance to students and teachers. In his criticism of previous studies on evaluation, Natriello  points out that few studies "consider the multiple purposes for evaluations in schools and classrooms," and that "additional research is needed to provide a better descriptive account of how students are currently evaluated in schools and classrooms" (p. 170).

Wilson (1996) identifies the following roles and goals of assessment: a) feedback to students, b) diagnostic information, c) summary data for record keeping, d) evidence for reports, and e) help with curriculum revision. According to Wilson, problems arise because teachers have many roles to perform and each one has different assessment goals attached to it. Attempts to fulfil all these roles at once result in ambiguities involving inappropriate measures and invalid interpretations. The author suggests that if we study each role in detail, clarifications will emerge that could inform more effective assessment practices.

Natriello (1987) advises that "researchers should be sensitive to the purposes of evaluation systems when they examine existing evaluation arrangements, which typically involve compromises among the competing demands of multiple purposes" (p. 171). Stiggins, Frisbie, and Griswold (1989) support this view:

A teacher's judgment about the grading approach to be used should be dictated by the broader educational values (particularly the theory of teaching) that he or she holds. Until the teacher decides what meaning the grades should convey, most other decisions about grades and grading practices cannot be made. (p. 11)

Ryan (1997) examines the conflicts that arise, particularly for secondary teachers, as they weigh the demands of their professional obligement toward students against the demands of their accountability to the world outside of school.

Subsequent research may reveal that most conscientious teachers develop a sense of professional obligement as a way of reconciling the multifarious demands occasioned by their evaluation of students with their own vision of teaching. If so, individuals interested in the development of teachers' prowess in evaluation must be aware of the complexity and profundity of this aspect of teachers' lives and work toward understanding it more fully. (p. 134)

Teachers at the secondary level are acutely aware of the need to balance their responsibility to students with their responsibility to others who use their evaluations in decisions about scholarships, admissions and other postsecondary training and employment opportunities. Grade 12 is the crucible where the morality and soundness of evaluation practices are most severely tested. The study examined the ways in which three grade 12 teachers in a rural Saskatchewan school conduct student evaluation and how they make decisions about measuring and communicating student progress and achievement.

Part I: The Nature of Teachers' Decision Making in Evaluation: A Review of the Literature

Teachers make many decisions throughout all phases of the evaluation process.  First, they decide how to interpret the curriculum goals and objectives and determine what data or criteria provide evidence of the learning.  Next, they must decide what forms of assessment are most appropriate for collecting data and measuring achievement.  How will these assessments be weighted in determining a final grade?  What standards will be used to judge achievement?  How will this information be used?  How will this information be communicated to students and parents?

Throughout the phases of the evaluation process, various demands, interpretations, and beliefs influence each decision.  Wilson (2000) identifies a number of considerations likely to influence teachers' assessment practices:

For example, teachers' concerns for individuals and their growth, not only in learning but also in their social and emotional development; the subjects being taught and the ages of the students taking them; the policies and practices of reporting that are in place; the power of external assessments on teaching, learning and classroom assessment; and the views of teaching and learning held by individual teachers and the practices that flow from them -- all of these and others are likely to affect teachers' practices in assessment enormously. (p. 2)

The preliminary review of the literature and categories that emerged during the data collection phase of the study contributed to the identification of several key factors that influence teachers' decisions about how to evaluate their students' achievements: a) policy and guiding principles, b) curriculum goals, c) diversity in the classroom, d) assessment literacy, e) demands for accountability, and f) multiple purposes of evaluation. This review of the literature concludes with references to professional development in classroom assessment and student evaluation, and identifies several essential questions emerging from the literature which provide structure for the data collection phase of the study.

Policy and Guiding Principles

Student Evaluation: A Teacher Handbook (1991) presents a set of guiding principles which promotes continuous and varied assessments, derived from curriculum objectives and consistent with instructional and learning strategies. The assessments are to be communicated in advance, fair and equitable, and helpful to students. Results of evaluations are to be communicated regularly and in meaningful ways to students and to parents or guardians. Wilson (1996) identifies five key principles from Principles for Fair Student Assessment Practices for Education in Canada (1993), and characteristics of a quality assessment environment are also described by Stiggins (1995). While these three sets of characteristics or principles are organized and worded differently, they are essentially statements of values with much in common, summarized in Student Evaluation: A Teacher Handbook (1991):

The essence of these guiding principles is that student evaluation should be an integral part of good teaching practice. It should be treated as an ongoing and comprehensive process that is pervaded by careful planning and systematic implementation. Evaluation is considered a critical element that influences teacher decision making and guides student learning. (p. 3)

Policies at the provincial department of education, school division and school levels shape teachers' assessment plans, grading practices, and evaluations. In Saskatchewan, the Office of the Registrar administers policies related to: a) credit requirements for secondary level completion; b) grade 12 departmental examinations; c) teacher certification and accreditation; and d) maintenance of a central registry of students at the grade 10, 11, and 12 levels. Directives issued in The Registrar's Handbook for School Administrators, 1999-2000 (Saskatchewan Education, 1999) include guidelines for evaluating students taking provincial examinations and for those being evaluated by accredited teachers. The policies and regulations in the handbook address the blending of marks from teacher and provincial assessments, issues of test security, special provisions, and measures to ensure consistency and comparability.

Fagan and Spurrell (1995) found that 88% of Canadian school boards or divisions studied had a relatively comprehensive set of student evaluation policies in place covering the last three years of high school.  The areas most frequently addressed were: a) purpose of evaluation; b) grading; c) reporting; d) promotion, retention and placement; and e) methods and sources of evaluation.   The school division involved in this study has a policy in which the intentions for evaluation are framed as six goals, each with a set of objectives to guide the evaluation process.  The goals in the policy refer to the variety and quality of evaluation methods, purpose of evaluation, alignment with curriculum and instruction, communication to students and parents, and commitments to professional development.

Wilson (2000) observes that policies at the school level, particularly those related to reporting to parents, "typically result in a well-formulated set of practices determining how assessment should proceed."

Curriculum Goals

Saskatchewan's Core Curriculum includes seven Required Areas of Study, the Common Essential Learnings, the Adaptive Dimension, and Locally-Determined Options. Core Curriculum initiatives also include Gender Equity, Indian and Metis Perspectives, and Resource-based Learning. These initiatives reflect the principles which guide the development of curriculum and classroom instruction. Teachers are expected to incorporate the Common Essential Learnings (C.E.L.s) in an authentic manner when they plan instruction in the various areas of study, with the purposes of enhancing understanding of the particular content and preparing students for future learning in school and beyond grade 12. Decisions about which of the C.E.L.s are to be included in any particular lesson and how they will be incorporated are guided by the needs and abilities of the students and the demands of the content or skills to be learned.

The curriculum document for Mathematics C 30 discusses instructional strategies and approaches that are responsive to the initiatives of Core Curriculum, offers a number of suggestions for incorporating C.E.L.s, such as the development of personal and social values and the development of communication skills, and recognizes that addressing the C.E.L.s through mathematics instruction has implications for assessment and evaluation.

A unit which has focused on developing the C.E.L.s of Communication and Critical and Creative Thinking should also reflect this focus when assessing student learning. Assessment should allow students to demonstrate their understanding of the important concepts in the unit and how these concepts are related to each other or to previous learning. Questions can be structured so that evidence or reasons must accompany student explanations. If students are encouraged to think critically and creatively throughout a unit, then the assessment for the unit should also require students to think critically and creatively. (Saskatchewan Education, Training and Employment, 1996)

The mathematics curriculum guide identifies attributes of learning to be assessed. These include traditional objectives related to knowledge and understanding of concepts and problem solving skills. There are also several non-traditional objectives identified in the document. These include: a) the ability to communicate mathematically; b) the ability to communicate through the use of computers and software; and c) the development of a positive attitude towards mathematics. Templates are included for teachers to use in gathering assessment data indicating: a) ability to gather information and then to evaluate both the information and the sources of information; b) ability to make an informed decision; and c) ability to design and carry out an action plan. Traditionally, these dimensions of learning have not been addressed in evaluating mathematics achievement.

Business Education: A curriculum guide for the secondary level (Saskatchewan Education, Training and Employment, 1994a) suggests that a "broader range of attributes" needs to be evaluated than in the past. Two goals of assessment and evaluation are stated in the curriculum document. The first is to "encourage students to apply their information processing skills continually to analyze real-life problems critically and to prepare solutions efficiently." The second goal is for students to develop self-confidence and self-esteem "by developing a greater awareness of their own expectations, attitudes and perceptions of adapting to an information-based, technological society." The document emphasizes that "Information Processing involves more than just the final product" and states that "How students generated and went about generating the final product is also important." Teachers are encouraged to involve students in the evaluation process by having them participate in setting standards of accomplishment and engage in self- and peer-evaluation.

The curriculum guide for Biology 30 encourages the expansion and refinement of assessment strategies to provide a more comprehensive evaluation of student achievement based on a broader range of learning objectives.  Questioning that is open-ended and that prompts reasoning and analysis is recommended to "promote and challenge higher order thinking."  Teachers are encouraged to include "practical tasks", oral responses, and interpretations of graphs and photographs among their assessments.  The document recommends less emphasis on multiple choice, true-false, and fill-in-the-blank tests, and more emphasis on essays and alternatives such as illustrations or art projects, oral reports, concept maps, projects, and journal writing.  There are also suggestions for evaluating group projects.  The challenge of assessing values is discussed:

Assessing values is the most difficult of all the areas of assessment and evaluation. At one time, values were not considered a part of the school's written curriculum. Parents and society certainly required that students develop acceptable behaviours and attitudes, but these were promoted through the "hidden curriculum"… Now, specific attitude and values are to be openly promoted in students, so the teacher's influence must be directed to these objectives. Accordingly, they must be assessed. (Saskatchewan Education, Training and Employment, 1992)

The foundational and learning objectives in Saskatchewan's newly implemented curriculum include a number of dimensions of learning and outcomes which have not traditionally been measured by classroom assessments. Various sources point to the challenges of addressing attributes such as ability, effort, attitudes, values and habits of mind in classroom assessments (Ryan, 1987; Stiggins, Frisbie and Griswold, 1989; Sizer, 1992; Gaskell, 1995).

To focus on the academic alone would assume some agreement on the social norms that make academic knowledge important. It would assume that students agree on the value and meaning of school assignments and that they come to the classroom happy, motivated, and fed. For many young people, this cannot be assumed. But even in academic classrooms with well-motivated students, teachers want more than good answers from students. They want responsibility, effort, and a balance of independence and concern for others. One challenge all these schools confront is how to extend course content and grading to include the wide set of concerns that confront students and, ultimately, all Canadians in a rapidly changing world. (Gaskell, 1995, p. 85)

Assessment Literacy

Teachers cite colleagues and their own experiences as students as key sources for learning about assessment strategies. They almost never cite teacher training. Most teacher training programs do not require a course in educational measurement, and many do not offer one. Evaluation courses which are offered emphasize developing paper-and-pencil tests, using standardized tests, and understanding the statistical aspects of assessment. Most teacher training courses, when offered, virtually ignore daily assignments, performance assessments, tests that accompany textbooks, and oral questions. Inservice training in assessment is rare. Many teachers say they neither need nor want more training in testing, but report a strong desire for practical help with specific measurement topics that relate to their students and classroom needs (Stiggins, 1988).

Bol, Stephenson, Nunnery, and O'Connell (1998) report that "if teachers felt well prepared to develop traditional assessment methods, they were also more likely to feel well prepared to develop alternative assessment methods" (p. 327) and that teachers who used alternative assessments were just as likely to use traditional methods of assessment in their classrooms.  Their research showed that elementary teachers were significantly more likely to use alternative assessments than high school teachers, and that elementary teachers had more confidence in the accuracy of alternative methods than middle years teachers.  The authors of the study suggest that since high school teachers have greater numbers of students for shorter periods of time than elementary teachers do, "they may be reluctant to use alternative assessments because of time pressures and workload" (p. 329).

Stiggins,  Griswold and Wikelund (1989) cite the findings of previous research in the measurement of thinking skills.  Teachers reported a heavy reliance on items written from knowledge, comprehension and application levels, and an increasing level of discomfort in writing test items at progressively higher levels of thinking skills. One study analyzed 9,000 test items from 6 subject areas to find that over 90% of the items measured recall of facts and information. The study focussed on mathematics, science, social studies, and language arts, and found that the largest proportion of test items tested recall.  With the exception of mathematics, little or no attention was given to analysis, comparison, and evaluation, and there was only limited use of inference.  The analysis of oral questions indicated that teachers focussed about half their questions on recall.  Inference and analysis were fairly common in oral questioning, but little or no attention was given to evaluation and comparison.  The authors of the study offer this observation about high school assessment:

In the middle school and especially in high school, however, one might expect to find greater emphasis on fostering critical thinking and problem solving. Teachers tend to be more specialized in their subject area, and they tend to prefer to develop their own assessment instruments. Under these circumstances, we might expect to see more evidence of a range of thinking skills in their assessments. The lack of some evaluation in this dimension of assessment as one progresses through the grades is a worthy topic of further exploration and discussion. (p. 243)

In a study designed to identify discrepancies between the practices of teachers and recommendations of measurement experts, Stiggins, Frisbie and Griswold (1989) studied the grading practices of 25 high school teachers in four subject areas. Information was gathered on 19 dimensions of grading practices addressing student characteristics considered, measurement methods used, and the manner in which grading data were managed and reported. Descriptions of recommendations for each of the dimensions were gleaned from measurement texts. When these recommendations were compared with a synthesis of actual practice, discrepancies were noted in 11 of the 19 dimensions studied. Three general possible explanations for discrepancies are suggested:
  1. Best practice may be a matter of opinion or philosophical position rather than established fact.
  2. Recommendations of measurement experts may fail to take into account the realities and limitations of the classroom.
  3. Teachers may not be aware of the recommendations or they may lack the expertise to implement them.

Stiggins (1997), Wiggins (1993a), and Wilson (1996) question the practice of using composite scores or combined grades. A common practice is to assign a numerical value for the level of achievement in each of a number of dimensions or criteria, then add or average the scores. Although this approach is efficient, it does not necessarily provide an accurate or clear representation of student achievement. If some of the criteria refer to skills and others to knowledge, then this method of scoring gives no indication of the student's relative performance in these areas. Averages and composite scores do not differentiate between skills, attitudes, or understandings. None of the purposes of grading (feedback, communication, or motivation) is well served when it is unclear what information is represented by a single grade or composite score.
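
The point about composite scores can be illustrated with a small arithmetic sketch. The two students, the two criteria, and every score below are invented for illustration and do not come from any of the studies cited; the sketch only shows that an average can be identical for two quite different profiles.

```python
from statistics import mean

# Hypothetical rubric scores out of 10 (invented for illustration only).
scores = {
    "Student A": {"knowledge": 9, "skills": 5},
    "Student B": {"knowledge": 5, "skills": 9},
}

def composite(profile):
    """Collapse all criterion scores into one average, as a composite grade does."""
    return mean(profile.values())

composites = {name: composite(profile) for name, profile in scores.items()}

for name, profile in scores.items():
    # The per-criterion profiles differ, but the composite values printed here match.
    print(name, profile, "composite:", composites[name])
```

Reported as a single number, the two records are indistinguishable, although one student is stronger in knowledge and the other in skills.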

Wilson (1996) identifies reporting to parents as a major influence on assessment practices.  Most policies and procedures emphasize simplicity, consistency, and standardization.  This means that teachers must accumulate information in ways that respond  to these criteria without taking up too much time from other activities.  Wilson recognizes the usefulness of numbers for the sake of efficiency but adds that "this capacity of numbers to be relatively free of definition in and of themselves allows them to accumulate different meaning to different people" (p. 74).    He cautions that while numbers seem to be neutral, objective, and have the appearance of fairness, the use of numbers to ensure objectivity carries certain responsibilities.  It is possible to manipulate numbers efficiently and correctly, but the validity of these operations depends upon where the numbers came from.   Wilson argues that although quantifying assessment data gives the appearance of objectivity, it is likely that the procedures involved in arriving at the numbers are not objective.

In Wilson's view, most classroom tests merely confirm what teachers expected to find because the expectations are built into the design of the test.  He adds that many teachers believe that the total number of marks awarded is a direct reflection of that component's weight in a composite score.  Consequently, sometimes teachers are actually weighting scores on a different basis than they think.
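
One way the weighting described above can diverge from what a teacher intends is sketched below. It rests on an assumption that is standard in measurement texts but is not spelled out in the passage: when raw marks are simply added, a component's influence on the ordering of students depends on how widely its scores are spread, not only on how many marks it carries. All names and marks are hypothetical.

```python
from statistics import pstdev

# Hypothetical marks for four students (invented for illustration only).
exam = {"P": 82, "Q": 80, "R": 79, "S": 78}        # out of 100, tightly clustered
assignment = {"P": 6, "Q": 10, "R": 15, "S": 19}   # out of 20, widely spread

# Weight the teacher may believe the exam carries: 100 of 120 available marks.
nominal_exam_weight = 100 / (100 + 20)

# Composite formed by adding raw marks.
total = {s: exam[s] + assignment[s] for s in exam}

def ranking(marks):
    """Return student labels ordered from highest to lowest mark."""
    return sorted(marks, key=marks.get, reverse=True)

exam_spread = pstdev(list(exam.values()))
assignment_spread = pstdev(list(assignment.values()))

print("Nominal exam weight:", round(nominal_exam_weight, 2))
print("Spread of exam marks:", round(exam_spread, 2))
print("Spread of assignment marks:", round(assignment_spread, 2))
print("Ranking by exam:      ", ranking(exam))
print("Ranking by assignment:", ranking(assignment))
print("Ranking by total:     ", ranking(total))
```

In this invented case the exam carries most of the available marks, yet the ranking by total follows the assignment, because the assignment marks vary far more than the exam marks do.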

Daniel and King (1998) studied the knowledge and use of testing and measurement literacy of elementary and secondary teachers. They report that participating teachers seemed to "lack a basic understanding of the concepts of validity and reliability and did not understand simple test statistics" (p. 340). They point out that while these deficiencies might not deter teachers in their day-to-day assessment practices, teachers might be led to "place their confidence in various assessment tools without adequately investigating the psychometric merits of such tools" (p. 341). Teachers participating in this study reported that "knowledge of the construction and use of teacher made tests was among the information they found most useful in their assessment practices" but at the same time they also reported that "they were not particularly familiar with more formalized test construction procedures" (p. 342) such as the use of a table of specifications.

Stiggins (1988) discusses the practice of measuring affective traits and personality characteristics such as effort and attitude.  He points out that teachers rarely have training or expertise in the assessment of these traits and that these are usually measured through subjective judgments formed during personal contact with students.  Since these assessments can affect teacher expectations and students' grades, it is important to note that while these characteristics should be taken into account, there are risks of undesirable side effects when too much of the assessment rests on this type of subjective judgment.

Ryan (1987) argues that because of the difficulty of objective evaluation in various dimensions of learning not traditionally evaluated, subjective evaluation by teachers will grow in importance.  Teachers will need to develop understanding and skill in a variety of alternative assessments in order to gain confidence in subjective evaluation and inspire trust in their assessment methods among students, parents, administrators and the public.

Ryan (1997) reports that some teachers view evaluation of process as "soft" and less objective than evaluation of achievement, while other teachers are able to perceive process dimensions as another type of product.  When teachers doubt test results or the quality or value of particular tests, they will rely on professional judgment based on their basic feelings about a student derived from observation.  Wilson (1996) concurs that "when a teacher has developed an expectation for a student and finds that a particular piece of work is not consistent with that expectation, there will be a tendency to discount the example" (p. 17).

Wilson (1996) observes that much of what teachers learn about students, they learn informally.  Often teachers trust these observations more than more formal testing, although testing also shapes teachers' views on how students are progressing.  Through looking at assignments and observing students at work, teachers pick up information that informs what they teach, how they teach, and for what length of time.

Much of the art of teaching is tied into these intricate interactions. It is from these encounters with individuals that teachers make up their own minds about student's progress and what will be necessary for growth to occur. So some of the most important assessments teachers do are not noted, recorded, or perhaps even recognized. They are simply integrated into the social fabric of the class. (p. 5)

Wilson (1996) also points out that while "some teachers report being very confident in their ability to quickly size up a student's ability" (p. 16), acting on those perceptions may result in poor judgements. Interpretations of events may vary greatly from one individual to another because of the expectations that each brings to the situation. Teachers may support the "correctness of their view" (p. 16) by referring to the consistency of a student's work over time. Wilson says, "This finding is persuasive, but only if the assessment of the quality of that work is done independent of previous judgements," which he adds is "an unlikely occurrence" (p. 16). Wilson explains that teachers' value systems are reflected in the way they establish classroom procedures. While teacher preferences may be compatible with some students' expectations, they may conflict with others.

Demands for Accountability

Teachers are faced with many demands and pressures from within and outside their classrooms as they make choices and decide how to evaluate.  They are aware of expectations of parents, postsecondary institutions, employers, taxpayers, and the business community, and of the need to account to these various interests. They feel responsible to meet the needs of individual students and to ensure that curriculum goals are met.

Wiggins (1996a) defines accountability as being responsible for one's work and responsive to its effects.  Attending to results means that we are "properly responsible for our impact or lack of one, not merely our good-faith efforts" (p. 4).  In Wiggins's interpretation, accountability "involves the obligation of teachers to learn from assessment ... and to act on that learning in a timely and effective way" (p. 4).  This definition represents one of two predominant views of accountability.  In this definition, accountability is interchangeable with responsibility.  In the alternative view, accountability means "accounting to" or "accounting for."  Fagan and Spurrell (1995) provide some background for the accounting view of accountability.

Historically, education has been seen as something that is useful in improving the quality of life for the individual, not the means by which the nation could achieve greatness.  Today, however, education has become linked with national well-being as much as it is to personal well-being.  It is now one of the world's most valued commodities, with every country striving to outdo others in order to hold a privileged position in the global economy.  Although education has always been highly valued in this country, its marketability is now interconnected with the business and financial world of buying, trading and selling.  It is this linkage that has forced public accountability onto a profession that previously had been accountable primarily to the profession itself.  (p. 2)
In January, 1996, the Saskatchewan School Trustees Association sponsored Setting Standards in Education:  Saskatchewan Standards Symposium.  The purpose of the symposium was to provide a forum for examining education standards in Saskatchewan and to share perspectives regarding emerging issues and future directions.  In his address at the 1996 Standards Symposium, Dr. Brian Noonan stated, "whether or not it is accurate, or fair, some elements of the public see student achievement as the measurable 'benefit' of educational costs.  As such, student achievement must be measured against explicit and public standards" (Saskatchewan School Trustees Association, 1996, p. 9).  A major theme brought forward from the 1996 Standards Symposium was that:
The concept of a "standard" differs from one person to another.  To some people, standards are numerical test score cutoffs, to others they are descriptions of desired student learnings.  To still others, they are descriptions of different levels of student performance or school programs and policies.  (Saskatchewan School Trustees Association, 1996, p. 3)
Herman, Aschbacher, and Winters (1992) explain the application of standards as part of the unconscious process people use to make judgments:
Experienced teachers commonly compare their students to other groups with whom they're familiar when judging student performance.  They may have a pretty good idea of grade-level performance and typical student behaviour based on past classes, or on comparison with colleagues' classes, or even with the results of state and national assessment data.  (p. 109)
Lewis (1995), in describing the "standards movement," makes the point that there always have been standards in education.  What may be different today is that there is pressure to make standards explicit and public.  A Saskatchewan School Trustees Association poll released in November, 1995, reported that 91.1 percent of those surveyed agreed there should be clearly defined expectations of what is taught and tested in our schools (Saskatchewan School Trustees Association, 1996).  According to the same poll, 76.3 percent agreed that Saskatchewan students should be tested to determine how well they are learning in comparison to the performance of other students in Saskatchewan and Canada.

Fagan and Spurrell (1995) report that since 1977 the number of provinces and territories with provincial examinations has increased from three to nine.  They explain that this method of certification was introduced to "ensure that the principles of comparability and fairness were adhered to" (p. 5) and because it was thought that if there were no common examinations, postsecondary institutions would administer their own entrance examinations which might or might not reflect the curriculum of the schools.  Departmental examinations were also seen as a "convenient selection tool for many education and labour market purposes" (p. 5).  Saskatchewan is named as a province which has "begun to see the results of the examinations as a major source of accountability information, regardless of the initial reasons for introducing them" (p. 6).

Gaskell (1995) reports the strong influence of provincial and standardized testing on perceptions and priorities of administrators, teachers, students, and the public.  "What was clear to the research team is the power of exams to shape directions in, and images of, the school, even when academic achievement only partially reflects the school's mission" (p. 266).

Studies of the effects of provincial examinations on teacher behaviours suggest that the use of externally developed high-stakes testing influences instructional practices and decisions involving content and instructional strategies (Barker, 1997;  Calder, 1997;  Runte, 1998;  Smith, 1991;  Wideen et al., 1997).  Barker's (1997) study of Grade 12 accredited Saskatchewan English teachers refers to tension and ambivalence which she attributes to multiple and conflicting demands in student evaluation.  Wideen et al. (1997) report that sixteen of the eighteen grade 12 teachers interviewed in British Columbia believed that provincial examinations had narrowed the curriculum, created psychological pressures, eroded their ability to teach creatively, and directed them away from opportunities for spontaneity and depth.

The fixed mode of evaluation imposed by the high-stakes examination constrains this curriculum model by creating a type of pincer movement on curriculum autonomy.  The fixed objectives provide one part of the pincer and the final examination the other, leaving teachers with very little perceived manoeuvrability regarding instruction.  (p. 440)
An Alberta study by Runte (1998) claims that provincial exams threaten teachers' professionalism in four ways:  a) by deskilling in the area of testing;  b) by enforcing a centralized curriculum;  c) by removing the professional's right to evaluate their own work;  and d) by suggesting questionable measures of teacher productivity.

Runte (1998) discusses the deskilling effect of removing responsibility from grade 12 teachers for designing the final evaluation of their students and placing it in the hands of presumably more qualified individuals designated by the department of education.  Runte points out that this reduction in professional responsibility implies "a corresponding reduction in professional standing."   He argues that "the right to define and evaluate the product of one's own labour is a key professional privilege" and that in order to be fully professional, teachers must be "responsible for defining acceptable levels of achievement for their students".

Eisner (1977) warns that "the existence of statistically reliable achievement tests too often leads to a conception of achievement that is educationally eviscerated.  Our tools, as useful as they might be initially, often become our masters" (p. 349).  Wiggins (1996a) advocates rethinking accountability and states that "building accountability out of once-a-year and end-of-year generic tests shows that policy makers do not understand accountability" (p. 8).  He hastens to add, however, that the use of performance tests and portfolios is not enough in itself to bring educational improvement and that "merely collecting more interesting student work by itself yields no further learning for anyone" (p. 9).  He offers the idea that:

Real accountability reform may be as simple as beginning to make each teacher publish their tests, assignments, and samples of student work with commentary; ask them to formally self-assess its meaning; and ask them to respond to parental, board member, and college professor comments.  (p. 9)
This version of accountability may be alarming to teachers since it appears to put them in the position of total responsibility for student achievement when we know that there are many conditions for learning over which teachers have little or no control.  Taken at face value, Wiggins's idea seems to put teachers in an impossible and uncomfortable position where their perceived effectiveness and worth, and consequently their job security, depends upon the performances of students.  Wiggins (1996a) argues that "The issue is not firing or demoting people.  Accountability means making teachers obligated to seek feedback and to attend to it, especially when results are worse than intended or expected" (p. 9).

What Wiggins is saying is that before we will see any significant improvements to learning, we need to shift our thinking about accountability from a focus on intentions to a focus on results. This might be the right medicine but, from the teacher's point of view, Wiggins's prescription looks like a bitter pill with possible unpleasant side effects.

Hargreaves and Fink (1998) argue that evaluation embedded in the concept of school outcomes or standards will "undermine more than enhance worthwhile improvement efforts, as long as accountability is regarded more as a process of technical auditing than an interactive conversation between school professional and educational partners outside the school"  (p. 47).  What is missing in Wiggins's proposal is the support for teachers that would come from a trusting and respectful relationship with other stakeholders in the community.  Hargreaves and Fink discuss the "emotional dimension" of improvement and change:

Creating conditions in schools that spark feelings of hope and sense of efficacy among teachers which benefit themselves and their students.  The emotional dimension of change also draws attention to the necessity of avoiding reform strategies, leadership styles, and work conditions which create conditions of hopelessness, feelings of guilt (by being overwhelmed), and of shame (by being blamed for failure or being placed at the bottom of league tables of performance).  These sorts of emotional conditions in schooling reduce teachers' sense of efficacy and their ability to provide quality education for students.  (p. 49)
Bernauer and Cress (1997) argue that "the primary condition for effective assessment is that teachers, principals, and the community -- rather than a district, state, or the federal government -- be at the center of the assessment enterprise" (p. 73).  They go on to say that the "true power of new assessment methods will emanate from agreements at the school level concerning goals and standards and the assessments that best measure them" (p. 73).  They maintain that as an essential condition for success "teachers, principals, and parents must be provided regular and systematic opportunities during the school year to reflect on their goals and expectations for student learning and the relationship of these goals and expectations to emerging national standards" (p. 74).  Further, they contend that "external accountability assessments would be far more effective for improving education if they were built on this recognition of teacher professionalism" (p. 74).

Eisner (1977) offers this rationale for teachers' resistance to the prevailing interpretation of accountability:

The pressures toward accountability defined in terms of specific operational objectives and precise measurement of outcomes are pressures that many teachers dislike.  Their distaste for these pressures is not due to professional laziness, recalcitrance, or stupidity, but is due to the uneasy feeling that as rational as a means-ends concept of accountability appears to be, it doesn't quite fit the educational facts with which they live and work.  (p. 351)

Diversity in the Classroom

The Exemplary Schools Project (Gaskell, 1995) was a major research study of 21 highly diverse, selected secondary schools across Canada.  Two Saskatchewan high schools, Joe Duquette and Balfour Collegiate, were included in the study. The study recognizes the dilemma for high school teachers in making decisions about their grading practices when they take into consideration the cultural diversity and exceptional needs in their student population.

There is unease in schools about grades based only on academic performance.  Students are urged to work not just for grades but for learning and enjoyment.  Deciding on grades, especially for students at risk, raises the question of what counts as success. In the larger context of the social purpose of schooling, grades can be a problem as well as a fundamental measure of success.  (Gaskell, 1995, p. 82)
Sizer (1992) discusses the conflicting goals of respecting student diversity and the use of formal tests in the application of standards.  He raises questions about the larger purpose of schooling: what should count as success, and how this might be measured.
Tests may usefully tell us what a student can display at a given moment, but can they predict for us the promise of a student's disposition to use knowledge effectively when faced with important new situations?  Are precisely similar tests administered in precisely the same manner necessarily fair?  How are the differences among students -- not their position on a single scale of presumed ability, but their richly varied ways of addressing the world -- to be gathered in, accommodated, honoured?    (p. 111)
Noddings (1997) addresses considerations of diversity and equity when she points out that "students will use their educations for very different purposes" and although there may be certain skills that "should be universally possessed at some level of proficiency," it is not clear which ones they should be.  She asserts that "whatever is specified for all is likely to be pathetically puny in contrast to what could be suggested if relevant differences in talents, plans, affiliations, and interests were taken into account" (p. 186).  Noddings cautions that "any set of standards rich enough for a particular student will contain items unnecessary for many, and any set designed realistically for all will, paradoxically, be inadequate for anyone considered individually" (p. 185).

Multiple Purposes of Evaluation

Understanding the purposes of evaluation helps teachers make decisions about the types of assessments and criteria they will use in evaluating student progress.  The purpose of an assessment may be clarified by asking "who is this information for and how will it be used?"

Wiggins (1993b) asserts that "assessment should improve performance, not just audit it" (p. 5).  Costa and Kallick (1997) state that the "whole point of education" is missed if students graduate dependent upon others to "tell them when they are adequate, good or excellent".  In their view the "ultimate purpose of evaluation is to have students be self-evaluative" (IV-2:1).  These writers address the purpose of evaluation from a philosophical platform.  Others examine purpose from a more technical perspective.

Wilson (1996) lists multiple and various goals for assessment including, but not limited to:  a) feedback to students;  b) diagnostic information;  c) summary data for record keeping;  d) evidence for reports;  e) curriculum revision.  Stiggins (1991) reports that teachers use assessments in their classrooms to serve at least three different categories of purposes:  a) to inform decisions related to evaluation, grading, diagnostics and referrals, sorting and grouping;  b)  to support teaching and learning through the communication of expectations, and the involvement of students in self- and peer-evaluation;  and c)  to maintain classroom management or behaviour control.  Natriello (1987) identifies four generic functions thought to be served by evaluation:  a) certification,  b) selection,  c) direction, and  d) motivation.  Wilson (1990, in Wilson 2000) found that "teachers beyond the primary level reported that they used their formal evaluation mainly to generate marks for reporting purposes" (p. 4).   Assessment was used "first and foremost to implement the school's policies on reporting" (p. 4) and other purposes, such as providing feedback to students and informing instruction, were secondary.

Kohn's (1994)  approach to examining the purpose of evaluation begins with a discussion of three levels of inquiry.  The first, or the "how to" level, assumes that what students do must receive grades and that "students ought to be avidly concerned about the ones they get" (p. 37).   Level 2 challenges traditional assumptions and looks for ways to provide a "richer deeper description of student achievement" (p. 38).  The third level challenges the whole enterprise of assessment and looks at "why" as opposed to "how" we evaluate students.

Why do we want students to improve?  This question at first seems as simple and bland as baby food;  only after a moment does it reveal a jalapeno kick:  it leads us into disconcerting question about the purpose of education itself.  (p. 39)
Kohn (1994) describes the distinction between two values systems or paradigms  -- one focusing on what students ought to be able to do, and one focusing on what we can do to support student learning and development.  He calls these "demand" and "support" models.  In the demand model, students are "workers who are obligated to do a better job" (p. 40), and teachers evaluate in order to "report whether students did what they were supposed to do" (p. 40).  The demand model "manifests itself in the view of education as an investment, a way of preparing children to become future workers" (p. 40).

The support model connects with the principles of a learner-centred curriculum, in which "the point is to help students act on their desire to make sense of the world" (p. 40).  Student evaluation, in the support model, "is, in part, a way of determining how effective we have been as educators" (p. 40) in providing the engaging tasks and supportive environment that will encourage student improvement. These models represent perspectives grounded in differing beliefs about the purpose of education.

Gaskell (1995) and Sizer (1992) raise questions about the larger purpose of schooling: what should count as success, how this might be measured, and how evaluation practices connect with beliefs about the purpose of schooling.  Eisner (1991) asserts that "our understanding of the mission of schooling must go beyond the merely measurable to a consideration of more profound purposes" (p. 12).  He suggests that what really counts in education is the promotion of wonder and imagination, multiple forms of literacy, community, diversity, and personal expression, and that "our educational aspirations should not be defined by the current limits of our testing technology" (p. 14).

The process of compromising or reconciling the beliefs and values of the various stakeholders in public education in our pluralistic society results in conflicting opinions and dilemmas in the realm of student evaluation.  Stiggins, Frisbie and Griswold (1989) identify one such conflict between communication and motivation.

Grades represent the communication link between students and decision makers who will influence students' lives for years to come.  In addition, for many, they represent the heart and soul of their attempts to motivate students to learn.  Often, it appears, these purposes of motivation and communication are at odds with one another.  (p. 14)
Natriello (1987) calls for research on classroom assessment and teachers' evaluation practices that takes into consideration explicitly which of the multiple purposes of evaluation can best be served by which particular assessment and grading practices.  He explains:
The design of an evaluation system for the purpose of enhancing student motivation might involve a differentiated task structure in the classroom, a mix of more and less predictable tasks, clearly articulated criteria, challenging yet attainable, self-referenced standards, relatively frequent collection of information on students performance, appraisals that truly reflect student effort and performance, and differentiated and encouraging feedback.  An evaluation system designed for purposes of certification would look quite different.  (p. 171)
Ryan (1997) presents the concept of professional obligement  to describe teachers' "unwritten and often unarticulated beliefs about the demands that being a teacher places on their actions" (p. 121).  He suggests that over time, teachers construct a moral framework, "a prior sense of morality about what teachers ought to do," which enables them to "manage their actions to avoid dilemmas"  (p. 132).  The term, professional obligement, refers to the overarching framework of values within which teachers conduct responsibilities to students, parents, curriculum, business, postsecondary institutions, and their profession.  His study examines the conflicts that arise as teachers weigh the demands of their professional obligement toward students against the demands of their professional obligement toward the world outside of school.  Ryan reports that teachers in his study  "anticipated possible moral shoals and sailed serenely around them, buoyed by confidence in their sense of professional obligement" (p. 132).

Professional Development and Teacher Training

Stiggins, Frisbie and Griswold (1989) refer to the need to address beliefs and values  as a preliminary step in professional development.  Stiggins (1988) also refers to the need for teachers to be aware of the various goals and purposes of assessment.  These two conditions for improving teachers' assessment literacy are strongly connected to the concept of professional obligement.  Based on current understanding of good practice, Stiggins, Frisbie and Griswold make four recommendations for teacher training and professional development.  The first of these is that training should help teachers explore their own philosophical positions and the implication of those positions for the meaning of grades and their grading practices.

Lock (2000) reports two case studies conducted to explore the classroom assessment practices of teachers and to understand how different perspectives and philosophies of learning can influence these practices.  The data indicated that for the two teachers in the study, “perceptions of knowledge and their theories of learning (both conscious and tacit) shaped how they interacted with students and how they structured their teaching and assessment practices” (p. 110).  This study suggests that “the internal consistency with which teachers teach, assess, and interact with students” (p. 12) may need to be taken into account in any effort to change or develop teachers' assessment and evaluation practices.  She adds that “if  teachers' perceptions of knowledge and learning form a core set of beliefs upon which their classroom practices are based, then a change in practice might require a close examination of those core beliefs” (p. 12).

Key Questions

The purpose of this study was to develop an understanding of how teachers make decisions related to student evaluation.  Four essential questions that would contribute to this understanding emerged from the review of the literature:

  1. What do these three teachers believe about the purposes of evaluation?
  2. What are the connections between curriculum goals and evaluation methods?
  3. To whom do these teachers feel accountable?
  4. How do these teachers manage conflicting responsibilities and goals within the process of student evaluation?

Part II: Reconciling Goals, Responsibilities, Beliefs and Values: Descriptions of Teachers' Practices

The purpose of this study was to gain a better understanding of the ways in which teachers perceive and deal with various considerations which affect their decision making as they engage in the process of student evaluation.  A qualitative or descriptive research approach involving three case studies was chosen.  Semi-structured and open interviews were used to collect participants’ descriptions of and reflections on their experience.  The study examined the ways in which three grade 12 teachers in a rural Saskatchewan school conduct student evaluation and how they make decisions about measuring and communicating student progress and achievement.

The three teachers in the study demonstrated passion and enthusiasm for their subject areas and had no difficulty expressing what they hoped would be the most important learnings to come out of their classes.   Although they were committed to new curriculum objectives and willing to expand their repertoires of instructional strategies, they seemed less committed to incorporating new assessments for better alignment between curriculum, instruction, and evaluation.

All three teachers referred to students' lives beyond school when they talked about what they wanted students to take from their classes.  These references varied from becoming productive, employable, and responsible to other less tangible goals more closely associated with personal fulfillment and self-actualization.  Desired outcomes were broad, enduring, and difficult to measure.  They included:  1) an appreciation of the wonder of biology and sufficient understanding of scientific concepts to contribute to enlightened decisions in the future;  2)  developing logic, an appreciation for order, and a love of learning;  and 3) acquiring confidence and comfort with information processing and communications technology.

These goals represent the teachers' personal values, principles and personalities.  They are the essence of teaching and learning in their classrooms, tacitly and explicitly informing decisions and colouring interactions with students and curriculum.  These goals, decisions, and actions reflect teachers' beliefs about knowledge, learning, and purposes of evaluation.  Teachers' beliefs “provide a filter through which new knowledge passes before it can be incorporated into practices” (Wilson, 2000, p. 6).

The teachers in the study used references from three contexts to inform their decisions about student evaluation:  a) personal values and priorities, b) professional responsibilities, and c)  public expectations.  These contexts are not discrete, but connected, overlapping, and in many ways dependent.  I observed that included among these references were a number of assumptions and traditions that informed grading practices and other aspects of evaluation.   Analysis of the data suggested that assessment literacy and interpretations of accountability were the two predominant categories of factors that influenced the way these teachers conducted evaluation in their classrooms.

Assessment Literacy

Assessment literacy is an obvious and important factor that influences teachers' decisions about evaluation.  Teachers' understandings of the purposes of evaluation, and their technical knowledge and skill in choosing, designing and using various assessment strategies strongly influence their choices.

Purposes of Evaluation

The ways in which these teachers chose to assess and evaluate student progress and achievement were influenced by what they understood and believed the purposes of evaluation to be.  All three teachers in the study agreed that a major purpose of evaluation is to provide a record of academic standing which acts as a credential for students applying for admittance to postsecondary educational institutions or seeking employment.  Other purposes of evaluation identified by teachers in the study were:  a) to determine and communicate how much of the curriculum students have mastered;  b) to help teachers monitor their instructional practices and find strategies to meet the needs of individual students;  and c) to support student learning and affirm students in their learning.  There was disagreement about the formative role of evaluation at the grade 12 level.  Two teachers recognized the value of assessment data in planning or improving instruction.  The third teacher, while acknowledging this important role of assessment in earlier grades, did not regard the improvement of instruction to be a primary function in grade 12.  This teacher thought that the purpose of evaluation at the grade 12 level was to help give students a clear picture of their strengths and limitations so that they could consider realistic options in their plans for the future.

Throughout the study, conversations about evaluation were quickly narrowed to conversations about grading.  Although the teachers in the study mentioned that they used assessments to motivate students and monitor instruction, comparatively little attention was given to these uses of classroom assessment.  The purpose of communicating achievement strongly overshadowed any other uses of assessment data.  The teachers in the study were strongly focused on grading and on summative evaluation.  Assessment strategies were selected to serve the goal of determining a final grade.

Assessment Strategies

In all cases, methods for determining students' final grades relied predominantly on quizzes, examinations, and projects or performance assessments.  One teacher reported using observation checklists and one used performance task assessment lists.  The teachers did not report any use of portfolios, rating scales, rubrics, group assessments, journals, or student self- or peer assessments.  Although group work was commonly used by all three teachers as part of instruction, none of them used group assessments to evaluate student achievement.

Two of the teachers used paper and pencil tests extensively in their evaluations of students.  Neither teacher used a table of specifications when designing examinations.  One of these teachers believed that he was including questions on his tests that prompted higher order thinking, but although these questions involved specificity, complexity, and detail, they actually only prompted thinking at the levels of recall, comprehension, some application, and occasional analysis.

Teachers chose to include scores from practices and daily assignments as part of their summative evaluations because they believed that students would be motivated to apply more consistent effort if their work was being scored and included in their grades.  Two of the teachers in the study believed that students were affirmed and motivated by the feedback from assessments.  Feedback was usually in the form of a score or grade, sometimes accompanied by brief anecdotal remarks, usually some form of encouragement or praise.

It was important to all three teachers that students' grades reflect effort and attitude in addition to achievement, and although the soundness and consistency of their methods for doing so may have been questionable, all three found ways to incorporate effort and attitude into their evaluations of students.  Each teacher was able to articulate a set of indicators to identify acceptable effort and desirable attitudes.  These indicators, reflecting teachers' personal values and perceptions, became a part of both informal and formal assessments.

Non-traditional objectives incorporated in the recent renewal of Saskatchewan curricula posed another challenge for the teachers in this study.  As Ryan (1987) points out, the evaluation of dimensions of learning involving process, attitudes, and affective responses requires the ability to use subjective and high inference evaluations.  The teachers in this study struggled with this requirement.  Their reflections revealed some confusion between the notions of subjectivity, reliability, and clarity of objectives and criteria.  They also expressed a lack of confidence and a sense of vulnerability if they were unable to quantify assessment data.  This reaction led them either to discount “subjective” or qualitative data, or to use somewhat shaky methods of including it when grading students.

Teachers identified criteria for choosing among the various measurement tools and assessment strategies.  One teacher in the study referred several times to the need for a match between the focus of the assessment and the instructional objectives.  All three teachers recognized the need for varying assessments to include a range of options for students to demonstrate their learning.  Other criteria involved pragmatic considerations such as the time required for the assessment, the ease of developing and scoring the assessment, and attributes of the assessment such as reliability, validity, and objectivity.  In choosing assessments, teachers in the study said they were looking for ease of use, efficiency, fairness, and alignment between assessments and objectives.

Classroom assessments were organized into evaluation plans or grading schemes according to topics of study, rather than by domains or dimensions of learning as suggested in curriculum documents.  This choice suggests the influence of tradition or a hangover from the textbook approach.

All three teachers in the study indicated that they were comfortable with the assessment strategies they were using, and believed that students and parents were satisfied with them, as well.

Interpretations of Accountability

Teachers are accountable on many levels -- to students, parents, colleagues and the community.  The priorities teachers assign to each area of accountability influence how they evaluate students.  There are two predominant views of accountability:  a) accountability in terms of responsibility, and b) accountability in terms of accounting to or accounting for.  Currently, Saskatchewan teachers are challenged by three themes representing these two views of accountability:  a) alignment of assessment strategies with the expanded goals of the renewed curriculum, b) greater awareness of and response to the diverse needs and interests of students, and c) increased focus on performance standards.

All the teachers in the study identified accountability to students as their first and foremost priority, but they differed in their interpretations of what it means to be accountable to students.  Each teacher's interpretation, however, included the recognition that employers, universities, and trade schools have certain assumptions about qualifications and prerequisite knowledge, making teachers accountable in their evaluations to indicate whether students can meet these expectations.  Ryan (1997) comments:

Saskatchewan teachers take this responsibility for so much of the students' grades very seriously.  In the current era of quotas and restricted access to universities and other institutions of postsecondary education, the pressure on teachers has become intense as students realize the teacher's pivotal role in the determination of students' future paths.  (p. 127)

Dilemmas related to the conflicting purposes of evaluation are closer to the surface for grade 12 teachers than they are for the rest of us.  Departmental exams, the accreditation policy, scholarships, admissions to postsecondary programs, and a shift in curriculum objectives and instructional strategies are forces which bring evaluation issues to a sharper focus at the grade 12 level.  The review of the literature contains numerous references to multiple purposes and conflicting goals of evaluation (Briscoe, 1993 in Wilson, 2000;  Kohn, 1994;  Natriello, 1987;  Stiggins, Frisbie and Griswold, 1989;  Wilson, 1996).  Eisner (1977) describes these conflicts as contradictory forces arising from different paradigms:

Pressing demands for increased accountability directives are combined with pleas for more and better assessment systems that provide rich data about specific student learning.  These seemingly contradictory forces have made assessment reform a somewhat schizophrenic activity.  Can these two paradigms coexist and complement one another, or are they fundamentally different and likely to be constantly in each other's way?  (p. 351)

One would expect the teachers in the study to report numerous pressures and dilemmas arising from the many, and often contradictory, goals and purposes of evaluation related to divergent conceptions of accountability.  Contrary to expectations, the teachers said that they did not feel pressure from parents or students about grading, and they reported very few conflicts or dilemmas.  They also seemed to be unaware of, or comfortable with, the power relationships and moral responsibilities involved in the evaluation of student achievement.  These findings are similar to those observed by Ryan (1997), who offers the following explanation:

These teachers seemed rarely caught on the horns of moral dilemmas or in knots.  They were able to develop a way of conducting formal student evaluation such that they balanced what they saw as their professional obligations with their deeply held, sometimes tacit, moral framework of the responsibility of being a teacher.  (p. 133)

Ryan (1997) concluded that the teachers in his study had “moral frameworks that were not static, waiting to come into conflict with heretofore unconsidered problems,” but that their moral frameworks were actively and purposively “guiding their practice in ways that steered them around situations of moral conflict” (p. 132).  There was some evidence that, in this study also, teachers were using their sense of professional obligement to avoid moral conflicts.  This was particularly apparent in discussions about: a) evaluation of effort, attitude and values, b) conceptions of fairness, and c) accommodating students with special needs or learning disabilities.

Departmental Examinations and Teacher Accreditation

Departmental examinations are seen to be a measure for ensuring accountability. Teachers in this study recognized departmental examinations as: a) model assessments, b) a means for ensuring adherence to and coverage of the curriculum, and c) a way of maintaining a consistent and acceptable standard of achievement.  Inservice training sessions for the implementation of Core Curriculum in Saskatchewan have addressed the goal of strengthening teachers' assessment literacy to assist them in supporting the goals of the renewed curriculum.  This approach may be evidence that prevailing interpretations of accountability have resulted in ideological proletarianization (Runte, 1998) as a compromise between teacher autonomy and deskilling.

Gaskell (1995) comments on the “power of exams to shape directions in, and images of the school” (p. 266) and observes that subjects for which there is no departmental exam are sometimes perceived to have a “lower status” than the “academic” subjects.  Teachers in this study tended to categorize subject areas as “academic” and “non-academic.”  They also tended to use more alternative assessments, rely less on paper-and-pencil tests, and forgo a final examination when teaching courses that were not perceived as “high stakes,” such as accounting, information processing, and modified programs.

Accreditation may be seen as a manifestation of a particular notion of accountability.  Two of the teachers in the study had chosen to become accredited.  They believed that taking full responsibility for the evaluation of their students afforded them greater flexibility and autonomy in interpreting the curriculum.  One teacher suggested that accreditation provides a broader range of options for assessing students with learning disabilities, exceptional needs, or strong preferences or modalities.  The other teacher described accreditation as part of the process of becoming professional or, in his words, “growing up as a teacher.”  The third teacher in the study had elected not to be accredited and expressed concerns about “artificial inflation of marks” and inconsistencies in standards and grades that might interfere with opportunities for students.  Barker (1997) recognizes the tensions and ambivalence that exist for accredited teachers as they attempt to balance autonomy with accountability.

Part III: Student Evaluation, Accountability and Professionalism: Examining Paradoxes and Relationships

Examining Incongruencies

Analysis of the data revealed several paradoxical relationships between instruction and assessment practices.  Teachers in the study who claimed the goal of independence made little or no use of self- and peer-evaluation strategies.  The teachers recognized the value of group work and used this strategy as part of their instructional program, but none of them used group assessments.  Teachers were aware of a broad range of alternative assessments from workshops, curriculum guides, and other professional activities, but they chose to use a fairly limited range of more traditional assessments.  Teachers expressed reservations about “subjective” evaluation strategies.  They recognized quantifiable assessments as more objective and therefore more credible, but they seemed to use grades to affirm their prior judgements of students' achievement based on informal assessments, observations, and teachers' practical knowledge.

These incongruencies between instruction and evaluation suggest that these teachers are working from two distinct value systems or paradigms.  Their instructional practices are grounded in the support model (Kohn, 1994), which is learner-centred and interested in helping students act on their desire to make sense of the world, while their evaluation practices are grounded in the demand model, in which students are “workers who are obligated to do a better job,” and teachers evaluate in order to “report whether students did what they were supposed to do.”  This incongruity or lack of alignment may have a negative effect on professionalism.

Student Evaluation and Teacher Professionalism

Two forces conspire to keep teachers from wholeheartedly engaging in the evaluation of students:  a lack of confidence and a lack of meaning.  The teachers in the study were passionate about their subject areas and committed to providing quality instruction, but this same level of passion and commitment was not evident in their evaluations of students' progress and achievement.  Formal assessments were almost entirely summative in nature and directed to the purpose of determining a final grade.  The prevailing interpretation of accountability strongly influenced teachers' evaluation practices.  One important way in which this interpretation of accountability asserts itself is through the presence of departmental examinations.

Runte (1998) points out that the use of provincial examinations removes responsibility from grade 12 teachers for designing the final evaluation of their students and reasons that this reduction in professional responsibility implies “a corresponding reduction in professional standing” (p. 166).  He adds that teacher autonomy is also threatened by centralized testing because the consequent enforcement of a centralized curriculum “limits the range of skills required in making curricular decisions” (p. 167).  The shift from a student-centred curriculum, in which teachers must exercise considerable expertise and autonomy in addressing the needs of individual students, to a content-centred curriculum, in which teachers require only the technical skills needed to implement the decisions of the central bureaucracy, has the effect of further limiting the teacher's “need for, and claim to, professional autonomy” (p. 167).  The third threat to professionalism posed by departmental exams, according to Runte, is that they “remove the teacher's monopoly over standards” (p. 168).  He argues that “the right to define and evaluate the product of one's own labour is a key professional privilege” and that in order to be fully professional, teachers must be “responsible for defining acceptable levels of achievement for their students” (p. 168).  Ornstein and Cienkus (1995) concur, stating that “teachers need to evaluate students, in terms of performance and progress, otherwise they are surrendering an important role in teaching” (p. 69).

Although there is agreement that evaluation is an essential part of teachers' professionalism, teachers' evaluation practices suggest that they are almost willing to defer to the central authority for the final judgement of their students' achievement.  Many teachers choose not to become accredited, and those who do frequently use the departmental examination as the model for their own teacher-developed assessments.  Ryan (1997) observed:

I anticipated that the notion of being a professional would exert a powerful influence on teachers' student evaluation practices.  After all, this is the area in which teachers' actions are held up for the greatest public scrutiny and in which the presence of a metaphorical white coat of professionalism might be expected to be highly visible.  However, these teachers were not particularly concerned about their public personas.  For them, professional obligement appeared to stem from obligations inherent in being a teacher of particular students, whether or not they regarded themselves as professionals or craftspeople.  (p. 133)

Rather than the “white coat of professionalism,” teachers are wearing camouflage clothing.  In the current political climate, with demands for accountability and strong media and public criticism of public education, evaluation can be risky business.  Our system offers very few assurances to teachers that their evaluations are respected, trusted, or valued, or that their decisions will be supported by their administrations or by the department of education.  Why are teachers capable of evaluating student performance for the first eleven years, but not on the last day of June in grade 12?  What does this say to teachers and to the public about the value, purpose, or quality of all the other assessments that take place day to day throughout the years?  The language used in policy documents implies a lack of confidence in teachers' ability to fulfil the professional responsibility of evaluating students.

Accreditation means “granting” the responsibility of determining the final mark to teachers who have met “requirements” established by the department.  Hawn (1998) says that accreditation “empowers” teachers and is a “professional privilege that must be zealously guarded” (p. 50).  The policy of accreditation actually only restores to teachers a responsibility that is theirs anyway.  It may be argued that this policy, which seeks to accredit certain teachers, in fact serves to discredit the rest.  How did the evaluation function of teaching become a privilege rather than a right and a responsibility, and why do teachers seem prepared to relinquish their mandate?  Perhaps it is because there is some comfort for teachers in having a central authority assume the role of meeting demands for accountability, even if it means the loss of this aspect of professionalism.

Barker (1997) discusses the ambivalence experienced by accredited teachers as they deal with the tension between autonomy and accountability.  She concludes that “above all, teachers must recognize the vital link among intentions, practices and evaluation.  Through that will come meaning” (p. 118).  When curriculum goals and instructional practices come from the support paradigm and evaluation comes from the demand paradigm, the resulting misalignment can make the evaluation process seem less meaningful.  A lopsided emphasis on summative evaluation for the purpose of comparing student achievement is not likely to be highly valued by teachers.

The foundational objectives of Core Curriculum reflect beliefs associated with the support paradigm.  The prevailing narrow definition of accountability drives evaluation into the demand paradigm and disrupts the alignment or congruencies between curriculum, instruction, and evaluation.  This incongruency makes student evaluation less meaningful and undermines teachers' commitment to enhancing their classroom assessment practices.  As the pressures for accountability increase, teachers exercise less autonomy in their evaluation practices.  Research has shown that elementary teachers are significantly more likely than high school teachers to use alternative assessments, and that elementary teachers have more confidence in the accuracy of alternative methods than middle years teachers do (Bol et al., 1998).

As long as teachers are working from one belief system when they teach and from another when they evaluate, it is unlikely that the congruencies can be found that will give their evaluations meaning.  At the same time, pressures for accountability promote beliefs that erode teachers' confidence in their own ability to evaluate.  It is unlikely that teachers will engage wholeheartedly in enhancing classroom assessment or improving student evaluation under these conditions.

Implications for Professional Development and Teacher Training

Professional development and teacher training in student evaluation have traditionally involved building an awareness of a variety of assessment strategies and developing an understanding of how to design and use them.  Teachers' practice in classroom assessments has not changed significantly over the past fifteen years regardless of curriculum changes and professional development programs in student evaluation.  Experience indicates that something more than knowledge and understanding of assessment strategies is needed to enhance classroom assessment practices.  Consultation with the literature and reflections on this study suggest several implications for professional development.

  1. Professional development in student evaluation should address teachers' beliefs about learning and their understandings of the multiple purposes of evaluation.
  2. Teachers learn how to evaluate from other teachers.  Teacher training and professional development should include opportunities to work with colleagues or mentors in activities involving collaborative reflection and values clarification.
  3. Teachers in this study indicated that much of their learning took place by trial and error or through reflection on practice.  This conclusion suggests that strategies for reflection and action planning might be an important part of teacher training and professional development.
  4. Teacher training and professional development programs in student evaluation must recognize the integrated nature of curriculum, instruction, and evaluation.  It is the congruencies between these aspects of teaching and learning that will give evaluation meaning and inspire teachers' confidence to implement a wider range of alternative assessments in their classroom practice.

Recommendations for Further Research

Student evaluation is a comprehensive and complex subject involving many philosophical, pedagogical, and political discussions.  This study has focussed on teacher decision making in classroom assessment practices within the context of the Saskatchewan curriculum.  Analysis of the data suggests several other topics for future research:

  1. Examine power relationships, moral responsibilities, and the influence of personal and community values in the evaluation of student progress and achievement.
  2. Identify, develop, and determine appropriate use of subjective and high inference assessments for the evaluation of dimensions of learning involving process, attitudes, and affective responses.
  3. Survey and analyze recommended and actual practice from a broad range of sources representing various curriculum orientations and beliefs associated with the purposes of evaluation.  These sources should include the recommendations of recognized organizations such as the National Council on Measurement in Education, and standards developed by those organizations primarily interested in the certification of competencies.
  4. Examine the feasibility or ease of use of alternative assessments in the context of contemporary high school classrooms and current pressures and constraints.
  5. Examine the role of students in the evaluation of their progress and achievement.

Concluding Remarks

Student evaluation is a complex and essential aspect of our educational experience.  There is a need for congruency between the goals and purposes of curriculum, instruction and assessment if student evaluation is to be meaningful and fair.  School boards and administrations can promote improvements in student evaluation by establishing environments of support and trust in which teachers can engage in collaborative reflection and professional dialogue that will lead them to examine their purposes, clarify values, and identify and develop the skills and strategies needed for sound assessment and evaluation practices.  School boards and administrations can further assist by developing school division and school policies of assessment and evaluation based on principles that reflect shared beliefs and values about purposes of education and evaluation, and that include recommended practices which are well aligned with the goals of Core Curriculum.


Barker, W.  (1997).  Conversations with grade 12 accredited English teachers:  Towards the meaning of evaluation.  Thesis.  University of Regina.  Regina, SK.

Bernauer, J., & Cress, K.  (1997).  How school communities can help redefine accountability assessment.  Phi Delta Kappan, 79 (1), 71 - 75.

Bol, L., Stephenson, P., Nunnery, J., & O'Connell, A.  (1998).  Influence of experience, grade level, and subject area on teachers' assessment practices.  Journal of Educational Research, 91(6), 323 - 329.

Calder, P. (1997).  Impact of diploma examinations on the teaching-learning process.  A Study Commissioned by the Alberta Teachers' Association.

Costa, A. L., & Kallick, B.  (1997).  Reassessing assessment:  Seven issues facing Renaissance schools.  In R. E. Blum and J. A. Arter.  (Eds.), A Handbook for Student Performance Assessment in an Era of Restructuring.  Alexandria, VA:  ASCD

Daniel, L. G., & King, D. A.  (1998).  Knowledge and use of testing and measurement literacy of elementary and secondary teachers.  Journal of Educational Research, 91 (9), 331 - 344.

Earl, L., & Cousins, B.  (1995).  Classroom assessment:  Changing the face;  facing the change.   Mississauga, ON:  Ontario Public School Teachers' Federation.

Eisner, E.   (1977).  On the uses of educational connoisseurship and criticism for evaluating classroom life.  Teachers College Record, 78 (3),  345 - 358.

Eisner, E.  (1991).  What really counts in schools.  Educational Leadership, 49 (5), 10 - 17.

Fagan, L., & Spurrell, D.  (1995).  Evaluating achievement of senior high school students in Canada.  Toronto:  Canadian Education Association.

Gaskell, J.  (1995).  Secondary schools in Canada:  The national report of the exemplary schools project.  Toronto:  Canadian Education Association.

Glesne, C., & Peshkin, A.  (1992).  Becoming qualitative researchers:  An introduction.   White Plains, NY:  Longman.

Haas, J.  (1995).  Standards, assessments, and students:  Encouraging  both equity and excellence. Bulletin, 79 (573),  95 - 101.

Hargreaves, A., & Fink, D.  (1998).  Effectiveness, improvement and educational change:  A distinctively Canadian approach?  Education Canada, (Summer), 42 - 49.

Hawn, J.  (1998).  Assessment in senior high English language arts:  A reflection.  University of Saskatchewan.  Saskatoon, SK: Unpublished M. Ed. Project.

Herman, J., Aschbacher, P., & Winters, L.  (1992).  A practical guide to alternative assessment.  Alexandria, VA:  ASCD.

High School Review Advisory Committee.  (1994).  High School Review Advisory Committee Final Report.  Saskatoon, SK:  Author.

Joint Advisory Committee.  (1993).  Principles for fair student assessment practices for education in Canada.  Edmonton, AB:  University of Alberta, Centre for Research in Applied Measurement and Evaluation.

Kohn, A.  (1994).  Grading:  The issue is not how but why.  Educational Leadership, 52 (2), 38 - 41.

Lewis, A. C.  (1995).  An overview of the standards movement.  Phi Delta Kappan, 76 (10), 744-750.

Locke, C.  (2000).  The influence of two teachers' perceptions of knowledge and learning on their classroom practices.  Paper presented at the annual meeting of the Canadian Society for the Study of Education, Edmonton, Alberta, May, 2000.

Merriam, S. B.  (1988).  Case study research in education:  A qualitative approach.  San Francisco:  Jossey-Bass.

Natriello, G.  (1987).  The impact of evaluation processes on students.  Educational Psychologist, 22 (2), 155 - 175.

Noddings, N.  (1997).  Thinking about standards.  Phi Delta Kappan,  79 (3), 184 - 189.

Ornstein, A. C., & Cienkus, R.  (1995).  Evaluation of students:  A practitioner's perspective.  The High School Journal, 79 (1), 65 - 71.

Ryan, A.  (1987).   A position statement intended for the guidance of curriculum developers engaged in planning the evaluation of students on objectives not traditionally evaluated.  A paper written under contract as a professional service to Saskatchewan Education.

Ryan, A.  (1997).  Professional Obligement:  A dimension of how teachers evaluate their students.  Journal of Curriculum and Supervision, 12 (2), 118-134.

Saskatchewan Education.  (1989).  Evaluation in Education:  report of the Minister's advisory committee on evaluation and monitoring.  Regina:  Saskatchewan Education.

Saskatchewan Education.  (1991).  Student Evaluation:  A teacher handbook.   Regina:  Saskatchewan Education.

Saskatchewan Education.  (1992).  Science:  A curriculum guide for the secondary level.  Regina, SK:  Author.

Saskatchewan Education.  (1994a).  Business Education:  A curriculum guide for the secondary level.  Regina, SK:  Author.

Saskatchewan Education.  (1994b).  Policy Directions for Secondary Education in Saskatchewan:  Ministers' response to the High School Review Advisory Committee final report.   Regina, SK:  Author.

Saskatchewan Education.  (1996).  Mathematics A30, B30, C30:  A curriculum guide for the secondary level.  Regina, SK:  Author.

Saskatchewan Education.  (1997).  Accreditation (Initial and Renewal):  Policies and procedures.  Regina, SK:  Author.

Saskatchewan Education.  (1998).  English Language Arts:  A curriculum guide for the secondary level.  Regina, SK:  Author.

Saskatchewan Education.  (1999).  Registrar's Handbook for School Administrators 1999 - 2000.  Regina, SK:  Author.

Saskatchewan School Trustees Association.  (1996).  Setting Standards in Education:  Saskatchewan standards symposium.  SSTA Research Centre Report #96-02.  Regina:  Saskatchewan School Trustees Association.

Saskatchewan Teachers' Federation (1996).  Position Paper on Standards in Education.  Saskatoon:  Saskatchewan Teachers' Federation.

Sizer, T. R. (1992).  Horace's School:  Redesigning the American high school.  Boston, MA:  Houghton Mifflin.

Smith, M.  (1991).  Put to the test:  The effects of external testing on teachers.  Educational Researcher, June-July,  8 - 11.

Stiggins, R. J.  (1988).  Revitalizing Classroom Assessment:  The highest instructional priority.  Phi Delta Kappan, 69 (5), 363 - 368.

Stiggins, R. J.  (1991).  Facing the challenges of a new era of educational assessment.  Applied Measurement in Education, 4  (4), 263 - 273.

Stiggins, R. J.  (1997).  Student-centred classroom assessment, 2nd edition.  Toronto:  Prentice-Hall.

Stiggins, R., Frisbie, D., & Griswold, P.  (1989).  Inside high school grading practices:  Building a research agenda.  Educational Measurement:  Issues and Practice, 8 (2), 5 - 14.

Stiggins, R., Griswold, M., & Wikelund, K.  (1989).  Measuring thinking skills through classroom assessment.  Journal of Educational Measurement, 26 (3), 233 - 245.

Supovitz, J. A., & Brennan, R. T.  (1997).  Mirror, mirror on the wall, which is the fairest test of all?  An examination of the equitability of portfolio assessment relative to standardized tests.  Harvard Educational Review, 67 (3), 472 - 502.

Tom, A.  (1984).  Teaching as a moral craft.  NY:  Longman.

Wideen, M. F., O'Shea, T., Pye, I., & Ivany, G.  (1997).  High-stakes testing and the teaching of science.  Canadian Journal of Education, 22 (4), 428 - 444.

Wiggins, G.  (1993a).  Assessing student performance:  Exploring the purpose and limits of testing.  San Francisco, CA:  Jossey-Bass.

Wiggins, G.  (1993b).  Assessment to improve  performance, not just monitor it: assessment reform in the social sciences.  Social Science Record,  (Fall), 5 - 12.

Wiggins, G.  (1994).  The immorality of test security.  Educational Policy, 8 (2), 157 - 182.

Wiggins, G.  (1996a).  Embracing accountability.  New Schools, New Communities, 12 (2), 4 - 10.

Wiggins, G.  (1996b).  Practicing what we preach in designing authentic assessments.  Educational Leadership, 54 (4), 18 - 25.

Wilson, R.  (1996).  Assessing students in classroom and schools.   Toronto, ON:  Allyn and Bacon Canada.

Wilson, R.  (2000).  Toward an integrated model of assessment-in-practice.  Paper presented at the Annual Conference of the Canadian Educational Researchers' Association, Edmonton, May, 2000.
