Defining Educational Standards and Determining Their Reasonableness
 
A summary of a master's thesis by Darryl Hunter - University of Regina
 
SSTA Research Centre Report #99-07: 34 pages, $11
Table of Contents

I. WHY ARE STANDARDS IMPORTANT? 

II. HOW ARE STANDARDS CREATED IN OTHER JURISDICTIONS?  
  Standard-setting in the United States  
  Standard-setting in Canada  
  Standard-setting in Other Provinces  
  Standard-setting in Saskatchewan  
  Issues in Standard-setting  

III. HOW DO SASKATCHEWAN STAKEHOLDERS CREATE STANDARDS?  
  Who set provincial literacy standards in 1996?  
  How are standards set in Saskatchewan?  
  What is the nature of a Saskatchewan Learning Assessment Standard?  

IV. HOW DO WE DETERMINE THE REASONABLENESS OF A STANDARD AND OF A DECISION? 

V. WHAT ARE THE IMPLICATIONS FOR SCHOOL OFFICIALS?  

SELECT BIBLIOGRAPHY 

Appendices

Overview

Trustees, school administrators and teachers are given responsibility for making fair and reliable decisions in a wide range of educational endeavours.  Standards are part of the process of making choices that affect the lives of students, teachers, support staff, parents, and community.  Because of their impact, standards and the decisions made with them must be set consistently, prudently and fairly. 
 

The purposes of this study were threefold: to explore issues in setting educational standards; to analyze the process by which stakeholders define provincial standards in literacy; to describe a legal and ethical framework for determining the reasonableness of a decision involving values. 
 

Part I of this report describes the importance of standards.  Part II surveys the ways in which standards are created in other parts of North America.  Part III outlines the method used in Saskatchewan for setting standards.  Part IV explores the notion of reasonableness in administrative decision-making.  This document concludes with advice for educators in setting reasonable standards and making trustworthy evaluative judgments.

I. WHY ARE STANDARDS IMPORTANT?

    Standards do not drop from the heavens in tablet form.  Rather, they are made by human beings with their feet on earth.  This fact must be kept in mind as questions about standards or expectations for schools, students, educators and school officials become an issue in Saskatchewan, across Canada, and throughout North America.  The popular rhetoric of educational reform is increasingly coloured with the terminology of standards as public policy views shift from an exclusive focus on inputs into schools -- such as grants allocated, curricula produced, pupil-teacher ratios, and teacher qualifications -- to consideration of outcomes, such as student achievement, graduation rates, and drop-out rates.
    At one time, standard-setting – the process of defining points for educational decision-making – was primarily of interest to measurement specialists and psychometricians.  Now on both the national and provincial levels, educational administrators and policy-makers are engaging both educators and noneducators in formal exercises to define acceptable and/or desirable levels of student performance for the school system.  As such, the process of setting educational standards can be seen from a number of perspectives. Standard-setting can be viewed as a(n):

    Because it both represents and reflects diverse values about what we think important in education, a standard is often subject to controversy (Hambleton & Powell, 1983).  Virtually all scholars concur that standard-setting is a judgmental exercise.  A standard therefore can be only as good as the judgments and evaluative processes used in setting it.  Popham (1978a) has argued that serious standard-setting which relies "on decent collateral data, wide-ranging input from concerned parties, and systematic efforts to make sense out of relevant performance and judgmental data is not capriciously arbitrary.  Rather, it represents the efforts of human beings to bring their best analytic powers to bear on important decisions" (p. 169).
    Critics usually adopt one of two lines of argument when questioning the credibility of standards.  One is based on the belief that testing and standards represent a form of standardization that denies the individuality of people and undermines the unique, transactional nature of teaching.  Critics quite correctly assert that weighing a baby does not cause it to grow.  However, periodic measurement of children’s weight and height, attitudes and skills, aptitudes and knowledge, does provide feedback for professionals and parents to enable them to properly foster growth.  Large-scale assessments provide educators and trustees with beacons and external referents to guide successful professional practice, in the same way radar guides a pilot around the airport and safely onto the runway.  No one can say that radar measurements and flight standards detract from the pilot’s skill and harm the passenger, even though they certainly do shape the flight path in positive ways.
    The second criticism is that standards, because they are human creations, are arbitrary.  It is true that all methods are arbitrary in the sense that there is no scientific procedure which simply involves plugging numbers into a formula.  Different methods will produce different standards.  Yet we must not label standards as unacceptable just because they are arbitrary.  The word "arbitrary" can mean either "that which is determinable by a judge or tribunal" or whimsical, that is "selected at random and without reason".  When criticizing standards as arbitrary, critics are clearly employing the second, negative definition, whereas the first definition more accurately reflects serious standard-setting efforts.  Standard-setting exercises are judgmental, but so are most decision situations involving promotion, ranging from the acquisition of a driver’s license to the decision as to whether a newborn is ready to leave the hospital for home.  Even if standard-setting is subjective, Livingston and Zieky (1982) point out, "once a standard has been set, the decisions based on it can be made objectively.  Instead of a separate set of judgments for each test taker, you will have the same set of judgments applied to all test takers.  Standards cannot be objectively determined, but they can be objectively applied".  In short, a standard enables equality of treatment, as much as it permits judgments of educational quality.
    A more serious problem arises when people confuse standards with educational quality.  Elevating test standards, so the popular argument goes, will improve the quality of instruction and make for better schools.  However, as the Quebec Ministry of Education discovered in 1986-1987 after raising the passing scores on its provincial examinations from 50% to 60%, one of the principal effects was to increase the drop-out rate from its schools (Maheu, 1995, p. 60).  Raising the threshold for defining competence does not automatically result in better educational outcomes for all.
    Standards, in short, are crucial because of the way they are used.  Standards can be used for “high-stakes” purposes such as determining whether a student passes an exam, whether a student is awarded a scholarship, whether a student is promoted between grades, whether a student graduates, or whether a student is admitted to a post-secondary educational institution.  In such instances, the Charter of Rights and Freedoms may come into play for ensuring equitable treatment through the exam.  Because the Charter encompasses the principle of fundamental justice, it implies that public authorities must act according to notions similar to due process requirements in the United States.  If legal challenges are eventually mounted against an educational standard, educational tests and their technical procedures may have to meet legal tests.  The degree to which, and grounds upon which, decisions about standards are reasoned may help determine the level of confidence that students, parents, teachers and courts should have in the standard.
    Standards are also part of “low-stakes” tests such as in the many large-scale assessments now being conducted in schools.  For these assessments, results “do not count” on student report cards.  The confidentiality of participating students, schools and school divisions is preserved.  The standard is used, rather, to determine whether the entire school system is functioning well.  Although the pressure is off students, the standards are important in the realm of public accountability.  With “low-stakes” assessments, it is not the law courts but the court of public opinion that has an interest in the standard.  Because they serve to identify overall education system strengths and weaknesses, they are central to the general public’s confidence in the school system, and public officials’ concerns with maintaining a high-quality education system.
    In short, if we see educational standards as constructed through the balanced judgments, extensive expertise, and reasoned decisions of interested educators and citizens, then we need to be able to scrutinize those judgments for their fairness, trustworthiness, and credibility.  The judge in a courtroom will always back her or his ruling with reasons by citing evidence, court precedents, and laws so the public and the plaintiff can subsequently review the decision for its fairness.  Likewise, educators, trustees and the general public should be able to review the procedures and judgments of those who set educational standards to ensure reasonableness in decision-making.

II. HOW ARE STANDARDS CREATED IN OTHER JURISDICTIONS?

Standard-setting in the United States

    Canadian interest in the standards issue has been kindled by both the media in this country, and by an often heated public debate south of the border.  Midway through his first term, President George Bush, who had pledged to be the "education president" in response to poor American results from international assessments, summoned the governors of the fifty states to Charlottesville, Virginia with the aim of elevating education to the top of the national agenda.  There, the chief executives pledged themselves to six national goals for education, ranging from a high school graduation rate of 90% to American pre-eminence in the world in mathematics and science achievement.
    Originally dubbed America 2000 and codified into Goals 2000 by the Clinton administration, the goals have been supported on both sides of Congress.  Legislators have reauthorized the Elementary and Secondary Education Act which makes federal education funding to the states contingent on conformity with a national system of standards and assessments.  A National Goals Report was created in 1991, and updated in 1995, to show state and national progress toward the six national educational goals, including states' performance on the National Assessment of Educational Progress.  So, too, has Goals 2000 spawned a series of panels and bodies, ranging from the 1994 National Education Standards and Improvement Council, to a New Standards Project centred at the University of Pittsburgh, to the National Board for Professional Teaching Standards, to the National Council for Accreditation of Teacher Education, to the National Council of Teachers of Mathematics, in defining a variety of educational standards (Rothman, 1995).  President Clinton recently lent new vibrancy to the movement at a National Education Summit, telling American governors and business leaders that, "We can only do better with tougher standards and better assessments, and you should set the standards" (American Educator, 1996, p. 11).  At present, forty-eight of the fifty states have developed, or are in the process of creating, standards (Willis, 1997).
    In the accompanying debate, supporters claim that standards can improve student achievement by clearly defining subject matter content and specifying desired performance (Taylor, 1994).  Explicit standards lend coherence to the educational system and clarify the work of teachers, curriculum writers, educational institutions, software designers, and test experts.  Moreover, proponents argue that standards establish the principle of equality of opportunity and provide "consumer protection" by supplying accurate information to students and parents (Ravitch, 1995).  Detractors argue that standards are exclusionary and detrimental to the multicultural character of North American society (Aronowitz, 1996).  Furthermore, they undermine local control of education (Gittell, 1996) and promote further secularization of schooling (Berube, 1996).
    Three categories of standards have been identified in this debate.  Content standards describe what teachers are supposed to teach and students are expected to learn, and include an emphasis on learning subject matter through critical-thinking and problem-solving skills.  Opportunity-to-learn or delivery standards define the resources, conditions and desirable processes of learning that the education system is to provide to ensure equality of opportunity to learn (Howe, 1994; Porter, 1995).  Performance or outcome standards define degrees of student mastery or attainment considered to be satisfactory.  If content standards relate to the quality of curriculum inputs, and opportunity standards relate to the processes and conditions in school systems, performance standards describe inadequate, acceptable or outstanding accomplishment in student outcomes (Ravitch, 1995; Lewis, 1995).  While most effort has focused on content standards in Saskatchewan and the United States with the development of curricula, experts believe that performance or outcome standards will increase in importance because of the high cost of remediation (Willis, 1997).

Standard-setting in Canada

    Yet there are significant differences between standard-setting in Canada and the United States, where minimal competency testing became popular as a prerequisite to high school graduation in the 1970s.  In Canada, standards are increasingly required by policy-makers to define public expectations for student performance in programs or institutions.  Rather than making "high-stakes" decisions about individual students and their life chances, such as Grade 12 exit exams, provincial ministries are setting standards for making judgments about systemic rather than individual performance.  The performance standards, along with scoring rubrics, results and exemplars of student performance are subsequently held up for emulation, and not yet generally used in direct application to determine individual student marks or to make program placement decisions.
    In the United States, standards are usually considered narrowly for a specific testing situation, rather than in terms of a larger- or longer-term framework.  Education indicator systems are in vogue in Canada as provincial governments address issues of public accountability.  In Saskatchewan, the Provincial Education Indicators Program annually publishes context, process and outcomes data about the performance of the provincial education system.  In a parallel fashion, on the national level, the Council of Ministers of Education, Canada is developing a Pan-Canadian Education Indicators Program as a comprehensive monitoring system.  It will encompass achievement, student flows, satisfaction measures, citizenship behaviours and a variety of other gauges of school system effectiveness across the country.  Changing standards thus become valuable indicators.  If panelists are drawn from the same constituencies over a series of test cycles, and if one employs the same instrumentation over these cycles, then the standards may be conceived as incarnating or embodying a set of public expectations for student or school performance at given points in time.  In this sense, setting performance standards becomes a sociometric as well as a psychometric technique.
    Likewise, with large-scale assessment programs operating on recurrent cycles, and standards set with each assessment, new conceptions of a performance standard are necessary. When many people hear the word “standard”, they tend to think of something etched in granite.  Rather than remaining a fixed, static and enduring entity, a standard is now an evolving point of comparison that may or should adjust from one test cycle to another, as test circumstances, test populations, and test questions change.  Even the gold standard varies.  Moreover, panelists themselves, even when drawn from similar constituencies, may provide varying judgments as circumstances change.  The role of precedent thus becomes important, not as a way of tempering but of temporally linking judgments to establish continuity between test cycles.
    Although Canadian interest in standards mirrors that south of the border, its origins are home-grown.  In 1987, public officials were alarmed by a Southam News literacy study that purported to show poor educational outcomes from Canadian schools.  This was a sensitive point because the Canada-US free trade agreement and globalization meant that a national economy depended on a highly skilled labour force rather than on tariff barriers.  This explained the Economic Council of Canada’s swan-song report, A Lot to Learn (1992), which called for a more coherent education system linking employers, schools and governments to boost standards and to produce graduates better equipped for a more competitive work world.
    Such was the climate when the CMEC decided in 1992 to conduct annual pan-Canadian assessments to determine 13-year-old and 16-year-old student competencies in basic skills, in both official languages.  The first round of the School Achievement Indicators Program was conducted in mathematics in 1993, and in reading and writing in 1994.  In 1996 the program expanded to encompass science, to parallel an interprovincial accord to develop a national science curriculum framework.  Simultaneously, the CMEC with Statistics Canada has begun to develop a Pan-Canadian Education Indicators Program to collect a wider array of information about the performance of education systems across the country.  The first biennial Report on Education in Canada was released in November 1995, and a Pan-Canadian Education Indicators Report was released in November 1996, as forerunners of a national reporting system.
    Since then, the CMEC has defined criterion-referenced performance expectations for the 1996 SAIP Science, 1997 Mathematics, and 1998 Reading and Writing assessments, and it will establish performance expectations for all future assessments (Council of Ministers of Education, Canada, 1997).  As carried out in each of the 1996, 1997 and 1998 assessments, the exercises have involved approximately 85 educators and noneducators who were empanelled in one of four regional sessions across Canada.  Those participating have answered the question: "What percentage of students should achieve each performance level and above?" for those test components involved in each assessment.  The expectations of individual judges, who were selected from stakeholder groups in every province, were aggregated and equally weighted to derive a median that has become the first national performance standard in three key subject areas.  The Pan-Canadian standard describes expected performance for Canadian 13- and 16-year-old students of science and has been used to clarify the work of Departments of Education across the country.
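
    The aggregation step described above, in which each judge's expectation is weighted equally and a median is taken for each performance level, can be illustrated with a minimal sketch.  The function name, the five-judge panel and all of the percentages below are hypothetical and chosen only for illustration; they are not CMEC data, and the actual SAIP procedure may differ in its details.

```python
from statistics import median

def aggregate_expectations(panel):
    """Return the median expectation for each performance level.

    `panel` holds one dict per judge, mapping a performance level to the
    percentage of students that judge expects to reach that level or above.
    Each judge counts exactly once, which is the equal weighting.
    """
    levels = sorted(panel[0])
    return {level: median(judge[level] for judge in panel) for level in levels}

# Hypothetical panel of five judges (illustrative numbers only).
panel = [
    {1: 95, 2: 80, 3: 55, 4: 20, 5: 5},
    {1: 90, 2: 75, 3: 50, 4: 25, 5: 5},
    {1: 98, 2: 85, 3: 60, 4: 30, 5: 10},
    {1: 92, 2: 70, 3: 45, 4: 15, 5: 3},
    {1: 96, 2: 82, 3: 58, 4: 22, 5: 6},
]

standard = aggregate_expectations(panel)
print(standard)
```

    A median, unlike a mean, is not pulled far from the centre by one unusually high or low judge, which may be one reason to prefer it when panelists come from varied stakeholder groups.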

Standard-setting in Other Provinces

    Even though provincial standard-setting exercises have not been extensively studied in the scholarly literature, pioneering work in four provinces is described in public documents.  British Columbia's learning assessment program began in 1976, and has consistently employed "interpretation panels" of teachers to judge grade level student performance in various dimensions of mathematical, scientific and communications skill, depending on the assessment.  Although the procedure has varied from one assessment to another, constants have included the exclusive use of professionals as judges, preparatory recording of expectations as estimates for provincial performance on individual test items according to "acceptable" and "desirable" categories, and subsequent formal summary and consensual judgments of performance according to 4- to 6-point scales ranging from unsatisfactory to excellent by the empanelled judges.
    Alberta’s educational standards are, unlike those of most other provinces, used for “high-stakes” purposes in multiple grade levels.  Our western neighbour has aimed "to widen the process of setting assessment standards as much as possible over previous years and especially to provide for community input and feedback".  To that end, five committees have been struck, as part of the Provincial Achievement Testing Program, to define two standards in relation to the curriculum being tested.  These committees are composed of curriculum and test developers, educational administrators, teachers from across the province, psychometricians and statisticians, as well as representatives from professional, business and community organizations.  Each committee is challenged to determine what score a student must obtain, or how many questions a student must answer correctly, to be judged as having achieved an acceptable or an excellent standard.  A summary standard is determined by a Final Standards Review Committee, using provisional standards, review commentary, and representatives from the original five committees.
    While the British Columbia and Alberta ministries define standards in relation to large-scale assessment results, the Toronto Board of Education's Benchmarks Program avoided evaluating its student population against external standards associated with a testing program.  Rather, more than 100 Benchmarks for language arts and mathematics have been developed as model activities for teacher emulation in the classroom setting.  Based on provincial and system objectives, developed and field-tested informally by teacher committees, and emphasizing complex but observable tasks, the Benchmarks set out performance levels and criteria, but not standards (Larter, 1991).
    Benchmarks differ from standards by the amount of authority invested in the latter.  While benchmarks describe representative performance for the general purpose of professional guidance, a performance standard has consequences attached to it as a point of educational or administrative decision making.  The benchmark score on a test may be 65%, but if test results fall below the standard of 50%, then a student fails or is assigned to a different program.  Because a decision, action or consequence flows from the user's application of information in relation to a standard, a greater duty or administrative responsibility attaches to it than to a benchmark, which serves largely as a point of professional reference.
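
    The distinction can be shown in a small sketch, using the 50% standard and 65% benchmark from the example above.  The function name and return labels are hypothetical; the point is only that the standard drives a decision, while the benchmark supplies a point of reference with no consequence attached.

```python
def apply_standard(score, standard=50.0, benchmark=65.0):
    """Apply a standard (a decision point) and a benchmark (a reference point).

    Returns a (decision, reference) pair. Only the decision carries a
    consequence for the student; the reference is advisory information.
    """
    decision = "pass" if score >= standard else "fail"
    reference = "at or above benchmark" if score >= benchmark else "below benchmark"
    return decision, reference

# The same rule is applied to every test taker.
for score in (49, 60, 70):
    print(score, apply_standard(score))
```

    Once the two values are fixed, every score is treated by the same rule, which is the sense in which a standard can be objectively applied even though it was set by judgment.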

Standard-setting in Saskatchewan

    The Saskatchewan Department of Education has sponsored several standard-setting sessions since 1993 as part of its large-scale Curriculum Evaluation and Provincial Learning Assessment Programs.  Standards have been set in three curriculum evaluations for the key learnings prescribed in new Core Curricula, by representative panels of teachers following student assessment.  In three-round exercises using modifications of an American method, the teachers have been asked to estimate the percentage of students who would attain each of five levels of performance, considering the number of years the curriculum has been implemented, the difficulty of test questions, and the degree of mastery sought by a curriculum.  By contrast, five Provincial Learning Assessments conducted with reading and writing in 1994 and 1996, mathematics in 1995 and 1997, and listening and speaking in 1998, have empanelled both educators and noneducators in multi-round voting exercises. Panelists have been asked what percentage of students should be expected to attain three or five performance levels.  Whereas the curriculum standards have been set in relation to learning objectives in curriculum guides, the standards associated with learning assessments have been based on broader, foundational objectives for English language arts and numeracy.
    Standard-setting in Saskatchewan stems from the work of a Minister's Advisory Committee that reviewed high school education in the province in 1994.  It identified five types of standards relating to student evaluation, and called for an equitable province-wide assessment process for Grade 12 student outcomes.  Criterion-referenced standards were suggested as the most appropriate type of standard, combined with a benchmarks system that would identify minimal, acceptable performance levels.  Yet significant dissent was expressed within the committee when it made recommendations relating to standards and testing.  A business representative called for universal testing to extend beyond the Grade 12 level, while Aboriginal committee members opposed the use of standardized, paper-and-pencil tests as incongruent with the diverse school situations in the province (High School Review Advisory Committee, 1994).
    A 1996 symposium sponsored by the Saskatchewan School Trustees Association amplified these conflicting perspectives, and showcased the kaleidoscope of opinion in the province about the standards issue (Saskatchewan School Trustees Association, 1996).  An official of the Canadian Federation of Independent Business asserted that "educational standards are important to Saskatchewan business to ensure that minimum competencies, understandings and skills are consistently assured by graduates of our school system as part of a quality labour force" (p. 5).  He was admittedly blunt in reporting that "business people do not want to do some of the 'most basic product recall work' on behalf of our educational factories" (p. 5).  Likewise, a trustee speaking on behalf of the Saskatchewan School Trustees Association advised those in attendance that "we must agree on accountability measures that will tell us how well students are meeting objectives.  If we don't develop such measures, outside pressures will force them upon us [...] Standards will help us answer the question, ‘How do we know we are doing a good job?’" (p. 65).  Similarly, a Saskatchewan Department of Education official asserted that "it is virtually impossible to argue we shouldn't have standards" (p. 42), but emphasized that opportunity to learn and content standards are perhaps more important than focusing on outcomes.
    However, many doubts were expressed in the January forum.  Speaking on behalf of the League of Educational Administrators, Directors and Superintendents, one administrator cautioned that the province must remain "loyal to standards development that stresses the processes of learning" as opposed to only product skills (p. 63).  For a Saskatchewan Teachers' Federation representative at the forum, the call for standards was misguided and contrary to provincial approaches to education.  Likening teacher-student bonds to a farmer's attachment to the land, she asserted that, "The relationship between teachers and students, at its best, is a marvellous, even sacred thing.  It cannot be captured in lists of outcomes, in scope-and-sequence charts, in taxonomies of standards, in rubrics" (p. 60).

Issues in Standard-setting

    In general, both administrators and evaluators must answer five key questions when designing standard-setting exercises.  These issues may be summarized by the journalistic device of asking who, what, how, where and when?
    The most important question is: who should set the standard?  Is an educational standard a professional responsibility, a bureaucratic creation, or a social construction?  Many scholars (Shepard, 1980; Hambleton & Powell, 1983; Jaeger, 1978) suggest that standard-setters be drawn from different constituencies, so that the standard-setting process can systematically represent different value positions and areas of interest.  Yet few specific guidelines have been formulated either for selecting these panelists, or for meaningfully incorporating educational stakeholder groups into a standard-setting process.  The underlying issue revolves around the degree to which judges should have expertise in the subject matter being tested, experience in the curriculum design and instructional policies that prevail in schools, knowledge of the attributes of the population being tested, or an understanding of the maturational possibilities of youth.  Likewise, we do not know whether a panel of classroom teachers will produce more appropriate standards than a mixed panel of educators and non-educators.  Some suggest a standard produced by a blue-ribbon panel may be more credible than that produced by an anonymous jury.  Others argue that the panel should consist of those who have a stake in the decisions that result from the standard that is defined, and not only those who understand student competence or potential.
    Second, what is the nature of the standard?  Should it represent a short-term target, an ideal, or a realistic estimate in terms of the current range of student skill or ability?  The nature of the standard is determined by the wording of the question that standard-setters answer.  In American judgmental processes for establishing minimal competencies, panelists are asked to estimate the percentages of students who "would" answer a test question correctly, as in Angoff's method, or the percentages of students who "should" answer a question correctly, as in Jaeger's method.  A "would" question produces a realistic standard that defines anticipated student achievement in light of the evidence which has been assembled.  A "should" question, on the other hand, asks for a formulation of student potential in optimal circumstances, and thus produces an idealistic standard.  Instead of anticipating performance, a "should" question may ask panelists to provide aspirations rather than estimates.  Originally, the term "desire" meant "to expect from the stars."  As a target to aim for, the "should" standard may become unattainable.  Groucho Marx stated this problem succinctly when he quipped, “I have my standards, and some day I hope to live up to them”.  Of course, the standard should reflect test purposes: a "would" question may be more appropriate for public accountability purposes, whereas a "should" question may be suitable for the purposes of program improvement.  A "would" question yields a descriptive threshold of acceptability, whereas the "should" question produces a prescriptive statement to suggest needed improvements.
    A third issue revolves around the question of how we ensure that standard-setters reflect society’s and educators’ expectations.  This question of generalizability revolves around the size of the panel.  Theoretically, the audience for a public accountability report in Saskatchewan would include almost the entire adult population of over seven hundred thousand people.  A statistically generalizable sample of panelists, with acceptable rates of error, would number approximately one thousand in Saskatchewan.  Yet practical considerations of cost and coordination necessitate smaller panels.  In fact, standard-setters in Canada may better be described as “jurors” rather than “judges”, a term which is used in American psychometric literature.  The label of “judge” suggests specialized expertise and advanced professional preparation in an academic discipline, whereas the public administrative standard-setting in Canada draws on the more general, lay qualities of common sense, ability to approach evidence in an unbiased manner, and good judgment sought in a typical courtroom juror.  A better comparison is with the jury of twelve people drawn from a variety of walks of life, and without legal training, used in the legal system.  If impartial nonexperts are deemed acceptable for making “high-stakes” rulings in criminal and civil actions, then a panel of nonexperts should analogically be sufficient to represent the informed, “low-stakes” judgments of citizens as part of a program evaluation.
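
    The figure of roughly one thousand panelists is consistent with a conventional sample-size calculation, sketched below.  The report does not say which confidence level or margin of error it had in mind, so the 95% confidence level, the three-percentage-point margin, the worst-case proportion of 0.5 and the assumption of simple random sampling are all assumptions made here for illustration, as is the function name.

```python
from math import ceil
from statistics import NormalDist

def panel_size(population, margin=0.03, confidence=0.95, proportion=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    All defaults are illustrative assumptions, not values from the report.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z**2 * proportion * (1 - proportion) / margin**2
    # The correction shrinks n0 only slightly when the population is large.
    return ceil(n0 / (1 + (n0 - 1) / population))

# Adult population of "over seven hundred thousand", as cited in the text.
print(panel_size(700_000))
```

    Under these assumptions the result is a little over one thousand.  Relaxing the margin of error to ten points brings it below one hundred, which suggests how much statistical precision is given up when cost dictates a small panel.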
    The fourth, related issue concerns the means of bringing diverse viewpoints together to yield a trustworthy standard.  A number of procedures have been developed in the United States, all of which involve groups of experts making judgments about test items individually or as groups, or about the competencies of examination candidates, to define a passing score. All procedures aim to foster deliberative reflection among panelists over several rounds of voting, and to eventually produce agreement among them.  Yet consistency is not the same thing as consensus. There may be degrees of engagement within a consensual decision, ranging from apathy to acquiescence to consent to consensus to commitment.  We have all sat on committees where people’s enthusiasm for a decision varies dramatically.  In other words, there is an affective element that may mean that there is a meeting of minds about a decision, but not a wedding of wills.  As such, consensus means not only dissolving contradictory views on acceptable student performance, but also extinguishing individual positions and fostering group resolve.  Extensive and careful preparatory training of judges, provision of extensive evidence about the test and the typical performances of students, statistical averaging or calculating medians and ranges of ratings, and even exclusion of the erratic panelist, have been recommended by many scholars as ways of ensuring uniform, reliable, informed judgments.
    A fifth issue is, when should the standard be set?  On first impulse, many would respond that the standard should be defined before the test.  Surely, they would say, it is a principle of fair evaluation that those being evaluated should know ahead of time what the standard is.  Similarly, people often say that panelists’ judgments should not be influenced by knowledge of test results, out of concern that their expectations will be lowered or elevated because they will know how students actually performed.
    Yet virtually all scholars recommend setting a standard after the test has been administered and the scores obtained, for three reasons.  First, we should not confuse the standard with the scale used for marking student work.  Fair evaluation means that students should know ahead of time the criteria and rules for making judgments, but decisions about the values assigned to information should wait until after all data is collected using the scale.  Second, panelists must have the full range of information about test circumstances available to make fair and fully informed decisions.  A standard-setter in Manitoba or Quebec needs to know if province-wide flooding or an ice storm may have affected the learning or test performance of students in schools.  And third, information about how students actually performed must be considered by panelists to make a fair judgment.  Figure-skating judges do not rate skaters before they’ve stepped on the ice, nor do courtroom jurors render a decision before the plaintiff has become entangled in a dispute.  Thus, totally unrealistic judgments are avoided because panelists have all the information before them.  In some instances, statisticians have had to adjust the standards afterward as a “compensatory technique” because the judgments seemed unreasonable both to educators and to those public officials who must assume responsibility for the standard.
 



III. HOW DO SASKATCHEWAN STAKEHOLDERS CREATE STANDARDS?

    The research project, a quantitative and descriptive case study, addressed the problem: how reasonable are panelists' decisions when setting criterion-referenced performance standards?  The study analyzed the evidence or reasons standard-setters offered when making judgments about the quality of student outcomes.  It explored collaborative decision-making and educational standard-setting for reading and writing outcomes in Saskatchewan. Trustees’, teachers’, business people’s, curriculum writers’ and administrators’ views on the determinants of educational quality were investigated as part of the project.
    Standard setting is the second-last phase, before report-writing, in the Provincial Learning Assessment Program.  The low-stakes, random sample testing program is designed to provide reliable information to the Department of Education, and to the general public, about Grades 5, 8, and 11 student skills in reading and writing.  The Learning Assessment Program's purposes are: to address issues of public accountability; to provide data for program improvement; to enhance the skills of educators in student evaluation; and to determine student achievement at two-year intervals so that a time series of student proficiency in "basic" and "higher order" skills can be assembled.


Who set provincial literacy standards in 1996?

    In fall 1996, twenty-five panelists were nominated by the SSTA, the STF, the Chamber of Commerce, LEADS and the Department of Education’s Curriculum and Instruction Branch to set standards for the 1996 Provincial Language Arts Learning Assessment.  The thirteen STF representatives included seven classroom teachers, two principals, two vice-principals, and two central office program consultants.  One of the three trustees had experience as a classroom teacher.  The three Saskatchewan Education positions were filled by two language arts curriculum writers under secondment from classroom teaching duties, with one curriculum writer doing double duty on the Grade 5 and 8 panels.  LEADS appointed an assistant director, a director and a superintendent of instruction.  The Chamber of Commerce delegates were a personnel officer from a crown corporation, a retired manager from a government department who was currently managing a Chamber office, and a former engineer who was currently operating a consulting firm.  Of the 25 panelists in total, 16 were female and 9 were male.  In terms of parental status, 12 had school-age children and 11 did not; two did not indicate whether they had children or not.


How are standards set in Saskatchewan?

    Actual performance standards were developed in a three-stage multi-round voting process, repeated consecutively for each skill domain of writing and reading under review.  In the first stage, the Department facilitator reviewed the scoring criteria used for student performance, described actual scoring procedures, and provided examples of student work which illustrated each scale point used for categorizing student achievement.  Judges were asked both to describe in their own words the student skill under review, and to rewrite the performance descriptions found in the 5-point scoring rubric for the audiences of the Saskatchewan Education Indicators Program – the general public and public officials – in two to three sentences each.  This activity served simultaneously as a means of having judges learn about the five-point scale used for scoring student work, as a way of stimulating and addressing questions about the scoring of student work, and as a source of useful terminology for describing student performance in subsequent report-writing.  The rewritten performance descriptions were discussed collectively.  Panelists were explicitly advised that student performance falling at the first and second levels was deemed "unacceptable".  Thus, level 3 performance was pre-defined as minimally acceptable performance.
    Panelists were then asked, "In this skill area, what percentage of the regular stream school population should attain each performance level?"  Without consulting others, each panel member was invited to privately write down on the ballot form his or her preliminary estimates of proportions of students who should attain each of the five levels.  Ballot forms were collected, and a mean distribution was calculated by a psychometrician using a laptop computer.  The provisional distribution was visually displayed for all panelists on a liquid crystal monitor in the form of a vertical bar graph, along with the upper and lower estimates that had been offered by panelists.  The psychometrician verbally described the panel's provisional standards, and focused the group's attention on those estimates that were most divergent for each level of performance.
    In the second stage, panelists were invited to individually and orally provide comments on the preliminary mean distribution, and to reveal their individual estimates if they wished.  Standard-setters were asked to focus on the nature or complexity of the task or test questions, the criteria used for scoring student work, the examples of student work presented, and attributes of the school population.  Standard-setters were also invited to comment on other factors which they deemed as important considerations when appraising provincial student performance.  Once every panelist had spoken, a short group discussion was conducted to allow additional viewpoints to be expressed.  Members were then given the opportunity to privately revise their preliminary estimates in light of the insights and comments generated by the panel.  The revised estimates were written down on a ballot form, collected, and averaged to produce a revised mean distribution.
    In contrast to the first two "blind" rounds, the third was an informed review: actual student results were provided.  This was accomplished by graphically displaying to panelists the revised provisional distribution of estimates, and the upper and lower estimates for each of the five performance levels, alongside actual provincial results in parallel vertical bar graphs.  The psychometrician provided a verbal description of the provisional mean standards and actual student achievement, focusing group attention on the upper and lower range estimates.  Then the Department facilitator again invited committee members in turn to comment on the panel's revised mean distribution, and to participate in a short group discussion.  Having heard everyone speak, panelists were allowed another opportunity to privately revise their estimates in light of the comments made and the actual results presented.  Ballots were collected, and a mean distribution of the panel's expectations was calculated to produce a provincial performance standard for the reading or writing skill domain under consideration.
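    The arithmetic behind each round of voting is simple, and can be sketched in a few lines.  The example below is a hedged illustration only: the ballot values are invented, the function names are hypothetical, and the study does not describe the software the psychometrician actually used.

```python
from statistics import mean

LEVELS = (1, 2, 3, 4, 5)

def summarize_round(ballots):
    """Mean, lowest and highest estimate for each performance level.

    Each ballot maps a performance level (1-5) to the percentage of
    students that one panelist believes should attain that level.
    """
    summary = {}
    for level in LEVELS:
        estimates = [ballot[level] for ballot in ballots]
        summary[level] = {
            "mean": mean(estimates),
            "low": min(estimates),
            "high": max(estimates),
        }
    return summary

def most_divergent_level(summary):
    """Level with the widest gap between the highest and lowest estimate."""
    return max(summary, key=lambda lvl: summary[lvl]["high"] - summary[lvl]["low"])

def percent_acceptable(summary):
    """Share expected at level 3 or above, the pre-defined minimum."""
    return sum(summary[lvl]["mean"] for lvl in LEVELS if lvl >= 3)

# Hypothetical ballots from three panelists (invented for illustration)
ballots = [
    {1: 5, 2: 10, 3: 40, 4: 30, 5: 15},
    {1: 0, 2: 10, 3: 50, 4: 30, 5: 10},
    {1: 10, 2: 10, 3: 30, 4: 30, 5: 20},
]
round_one = summarize_round(ballots)
print(round_one)
print(most_divergent_level(round_one), percent_acceptable(round_one))
```

    The same summary would be recomputed after each round of private revision, so that the facilitator can direct discussion toward the level on which panelists disagree most.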


What is the nature of a Saskatchewan Learning Assessment Standard?

    Provincial literacy standards were set for four domains of writing skills and four types of reading skills for each of the three grades involved in the literacy assessment, using a variation of the Angoff (1971) method.  Three modifications were made to Angoff’s method.  First, the question was modified from “What percentage of students would answer…” to “What percentage of students should attain…” so as to yield desired rather than simply anticipated performance.  Second, panelists were asked to identify the percentage of students who should attain each of 5 performance levels, rather than define a single level of competence or incompetence.  Developing a range of expectations provides more sophisticated information for educators than setting a minimal competence standard.  Third, panelists provided global estimates for each of the literacy domains adjudicated, rather than test item-by-item ratings to be aggregated.  The focus was on the five performance levels of criteria, rather than on individual test questions.
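    For contrast with the global estimates used in Saskatchewan, the item-by-item aggregation associated with Angoff's method can be sketched as follows.  This is a minimal illustration of the conventional aggregation rule (sum each panelist's item estimates, then average across panelists); the ratings are invented, the function name is hypothetical, and the report itself does not spell out the arithmetic.

```python
from statistics import mean

def angoff_cut_score(item_ratings):
    """Conventional Angoff aggregation (assumed here, not quoted from the study).

    item_ratings[p][i] is panelist p's estimate, as a proportion from 0 to 1,
    of students who would answer item i correctly.  Each panelist's implied
    passing score is the sum of his or her item estimates; the panel's cut
    score is the mean of those sums.
    """
    panelist_scores = [sum(ratings) for ratings in item_ratings]
    return mean(panelist_scores)

# Invented ratings: two panelists judging a four-item test
ratings = [
    [0.5, 0.75, 0.25, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
print(angoff_cut_score(ratings))
```

    An item-by-item procedure of this kind produces a single passing score, whereas the Saskatchewan variation asks directly for an expected distribution of students across the five performance levels.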
    To investigate the reasoning patterns within the Standards Committee, panelists were asked to identify the types of evidence that influenced their thinking in reaching decisions.  A 22-item, Likert-type scaled questionnaire was used.  Evidence was categorized as direct, contextual or preconceptual, categories that conform roughly to the legal concepts of direct, circumstantial and hearsay evidence. Ratings were collected, tabulated and analyzed using multidimensional scaling techniques to ascertain the panel’s perception of evidential relevance, and to determine whether there were specific gender, parental, occupational or stakeholder patterns in reasoning.  The positions of panelists were mapped in two-dimensional space.  At the same time, actual voting patterns across the exercise were analyzed to ascertain whether the Standards Committee adopted a consensual and collaborative approach in its choices.
    The study found that structured practice enabled panelists to anchor their decisions more squarely in direct assessment evidence.  Equal emphasis was placed on preconceptual sources of information, but decreasing emphasis was placed on contextual evidence. Preparatory training, however, did not enable panelists to adopt a shared perspective on the pertinence of the body of evidence available for their decision-making.
    Panelists based their decisions on evidence that related to their personal and professional knowledge and experience with youth, and with literacy processes.  Decisions were also substantively grounded in actual achievement results and most types of direct assessment information.  Panelists did not base their decisions on information derived from the broad social context. Standards Committee members perceived the evidence as directly related to the large-scale assessment, as related to personal and professional experience and knowledge, and as contextual or societal in its origins (see Appendix A).  Panelists’ occupation and stakeholder affiliation were not related to evidential preferences; nor was parental status or gender (see Appendix B).
    Stakeholder judgments converged when defining standards for overall writing ability, but diverged when adjudicating reading comprehension.  However, stakeholders adopted a shared outlook on that evidence which was pertinent in decision making about provincial literacy standards.  After reviewing the voting patterns of panelists, the types of evidence considered as relevant during the exercise, and the positions of panelists when setting standards, the study concluded that the educational standards set for the 1996 Learning Assessment appeared reasonable.




IV. HOW DO WE DETERMINE THE REASONABLENESS OF A STANDARD AND OF A DECISION?

    The graph below illustrates Grade 8 results for the 1996 Reading and Writing Assessment. Horizontal bars show test outcomes for the various dimensions of student performance assessed, in terms of the percentage of the provincial student population who reached acceptable levels of achievement. The margin of statistical error, stemming from possible measurement and sampling error, shows the interval in which we can have confidence in the results.  The triangles illustrate the Standards Committee’s work, which defined desirable provincial student performance.
 

    The central question, then, is whether the triangles in the graph are fairly and appropriately placed to signify provincial literacy standards for student performance.  Of course, readers will have different answers to this question of reasonableness in expected performance – depending on their values, their experiences, and their views of the education system’s quality.  Because of likely divergence in views, the credibility of a standard must rest on something other than the perspectives of the bystander.  Appeal courts in the Canadian justice system do not consider the opinions of ill-informed observers, but rather examine the ways in which lower courts have reasoned with evidence to make their decisions.  In other words, the types of information considered by panelists, and their reasoning with that information, are key to understanding whether a standard is sound.  A reasonable standard is one that has been carefully grounded in relevant evidence.
    Before answering the question of reasonableness in decision-making, it is useful to consider alternate notions of reasonableness as discussed in ethics – the philosophical study of concepts of the desirable.  The most formal definition is grounded in logic.  The presumption is that one can deductively draw conclusions from evidence which will be compelling or true across all circumstances.  Reasonableness is tied to the rules of logic and strict rules of evidence.  In this definition, a reasonable decision about educational standards would focus on the inferential link between direct test evidence and the articulated performance standard.  It would consider neither the contradictory value positions of panelists nor the social interactions involved in creating the standard.
    More inclusive and flexible notions of reasonableness respect diversity of opinion (Burbules, 1995).  Reasonableness is not found in the quality of the logic, but rather in such human virtues as willingness to compromise, consideration of the context, and processes of deliberation, reflection, discussion and change.  It is in how and when persons change their minds that their reasonableness manifests itself.  Reasonableness is a socially constructed notion which includes approaching problems with an open mind and sensitivity in a pluralistic society, and a willingness and a capacity to adapt to alternative positions.  In these lights, the extent to which standard-setters consider varying cultural perspectives, attend to and accommodate co-panelists’ views, and make prudent and moderate adjustments in light of a variety of discordant evidence, would be criteria for determining the reasonableness of an educational standard.  Procedural fairness ranks high in this notion of reasonableness.
    A third view suggests that reasonableness stems not from a given social context, but rather from our ideals or beliefs about the purposes of education.  For Siegel (1988), a reasonable judgment must appeal to something other than the process through which the judgment was reached.  In his view, a reasonable standard would be substantively grounded in the decision-maker’s vision for education, in curriculum objectives, in knowledge of literacy processes, in test purposes and consequences, and only secondarily in actual test material.  It is not the decision-maker’s disposition to adapt, nor her or his repositioning in light of changing evidence, but the participant’s predispositions which are central.
    Reviewers may also find legal notions of reasonableness useful when considering the decisions of a committee or board.  Certainly, the “low-stakes” purposes of the Provincial Learning Assessment Program and the Standards Committee’s legal status as an advisory body, not an administrative tribunal or quasi-judicial authority, make it unlikely that its decisions would be reviewed by a superior court.  The Minister can choose to endorse or disregard the advice tendered.  Yet judicial reviews of decision making by tribunals and other administrative agencies have defined a largely negative criterion of reasonableness.  The idea of a “patently unreasonable interpretation” has taken shape in a series of Supreme Court of Canada decisions, originating with Canadian Union of Public Employees Local 963 v. New Brunswick Liquor Corporation (1979).  Courts will not interfere with the decisions of administrative bodies unless they are “patently unreasonable.”  Justice Dickson illustrated what he meant by this with examples from Service Employees’ International Union, Local No. 333 v. Nipawin District Staff Nurses Association (1975):

 A pragmatic and functional analysis of the concept of “patently unreasonable” was introduced by the Supreme Court in U.E.S., Local 298 v. Bibeault (1988). Courts will recognize a range or zone of reasonableness in the interpretative decisions of administrative tribunals, rather than simple errors in, or correctness of decisions.  Irrational decisions are ascertained by looking at: the wording of the enactment which confers jurisdiction to the administrative body; the purpose of the enactment creating the tribunal; the reason for the administrative body’s existence; the area of expertise of its members; and the nature of the problem before the tribunal.  Hence, a court review of panelists’ decisions when setting standards would begin not with the question of their reasonableness, but rather with the question of whether there was evidence of patently unreasonable interpretations.  The legal and regulatory framework for the Standards Committee, the purposes of the evaluation program, the uses for the standard, the degree of committee expertise, the evidence which judges considered as relevant in their choices, and the question to which they responded in the exercise would all be considered to determine whether there was “patent unreasonableness”.
 

V. WHAT ARE THE IMPLICATIONS FOR SCHOOL OFFICIALS?

    The ultimate audiences for provincially defined performance standards from the Provincial Learning Assessment Program are principals and teachers.  Section 175 (k) of the Education Act invests the principal with the responsibility for determining school level standards; her or his duties include establishing, in consultation with the teaching staff, "the procedures and standards to be applied in evaluation of the progress of pupils." Thus, it is clearly the responsibility of the school administrator, in concert with her or his staff, to set the educational standards that decide whether individual students pass or fail in “high stakes” situations. Provincial standards are for educators’ professional consideration and guidance in their work.
    Nevertheless, central office administrators and school boards are given responsibility for making decisions in a wide range of educational endeavour.  Those decisions affect the lives of students, teachers, support staff, parents, and community members.  Legal and ethical notions of reasonableness are important because educational and administrative decisions involve various forms of standards.  Standards are, in essence, value-laden choices about what we deem to be desirable or undesirable.  We cannot avoid those choices, but we can ensure that they are prudently made.  Thus, educators and trustees might consider the following seven points when questions about standards and decisions arise:

  1. Standards involve evaluative judgments. Because evaluation involves the application of diverse values about what we deem important in education, it is part of a larger educational process by which we clarify expectations about schooling, and make improvements in education.
  2. Evaluative judgments, and the processes for making them, should be transparent. The Charter of Rights and Freedoms, administrative law and ethical codes all imply that decisions and evaluative procedures must be open and subject to review for fairness.
  3. Standards and the decisions which form them are often subject to public debate. Standards that are collaboratively created with the range of parties who have an interest in them, and with consensus-building, will better ensure that they are acceptable and fair in their application.
  4. Standards are important for ensuring equitable treatment and due process.  They are the decision-making points for making choices about students’ promotion and graduation, about program effectiveness, and about demonstrating public accountability.
  5. Decisions involving standards should be well reasoned, based on multiple sources of evidence, collaborative and consensual in their making, and consistent with both precedent and the legislative authority granted.
  6. Grounding decisions in contextual and preconceptual types of evidence may be reasoned, from the perspective of the decision-maker. It does not follow, however, that the decisions will appear reasonable from the perspective of a court or superior agency, which will be mindful of extraneous matters and irrelevant information.  A decision should receive substantial grounding in evidence that directly relates to the question at hand.
  7. A full and carefully documented explanation of the reasoning behind an administrative or educational decision, with corroborating evidence, may encourage adequate deliberation beforehand by the decision-maker, and inspire confidence in those affected afterward that it has been made fairly.
SELECT BIBLIOGRAPHY

American Educator (1996).  Presidential address to the national education summit, 20 (1), 8-12.

Angoff, W.H. (1971). Scales, norms and equivalent scores. In R.L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). Washington, DC: American Council on Education.

Aronowitz, S. (1996).  National standards would not change our cultural capital.  The Clearinghouse, 69 (3), 144-147.

Berk, R.A. (1986). A consumer's guide to setting performance standards on criterion-referenced tests.  Review of Educational Research, 56 (1), 137-172.

Berube, M. (1996).  The politics of national standards.  The Clearinghouse, 69 (3), 151-153.

Bourque, M.L. & Hambleton, R.K. (1993).  Setting performance standards on the national assessment of educational progress.  Measurement and Evaluation in Counselling and Development, 4 (26), 41-48.

Burbules, N. (1993). Rethinking rationality: On learning to be reasonable. Proceedings of the forty-ninth annual meeting of the Philosophy of Education Society. New Orleans, LA.

Burbules, N. (1995).  Reasonable doubt: Toward a postmodern defense of reason as an educational aim. In Wendy Kohli (Ed.), Critical conversations in philosophy of education (pp.82-102). New York: Routledge.

Canadian Union of Public Employees Local 963 v. New Brunswick Liquor Corp., (1979) 2 S.C.R. 227.

Cizek, G.J. (1996). Standard-setting guidelines.  Educational Measurement: Issues and Practice, 15(1), 13-21.

Council of Ministers of Education, Canada. (1997).  School achievement indicators program: 1996 Report on science assessment. Toronto: Author.

Council of Ministers of Education, Canada. (1997).  1996 SAIP science assessment: Pan-Canadian expectations-setting sessions. Toronto: Author.

Deutsch, M. (1975). Equity, equality and need: What determines which value will be used as the basis of distributive justice?  Journal of Social Issues, 31 (3), 137-149.

Economic Council of Canada. (1992). A lot to learn: Education and training in Canada. Ottawa, ON: Supply and Services Canada.

Eisner, E. N. (1995).  Standards for American schools: Help or hindrance.  Phi Delta Kappan, 76 (10), 758-764.

Gittell, M. (1996).  National standards threaten local vitality.  The Clearinghouse, 69 (3), 148-150.

Glass, G.V. (1978). Standards and criteria.  Journal of Educational Measurement, 15, 237-261.

Gunn, L.D. (1982).  Debra P. v. Turlington: Due process enters the classroom, but how far?  Journal of Law and Education, 11 (4), 573-585.

Hambleton, R.K. & Powell, S. (1983).  A framework for viewing the process of standard-setting.  Evaluation & the Health Professions, 6 (1), 3-24.

Hambleton, R.K. & Eignor, D. (1978a).  A practitioner's guide to criterion-referenced test development, validation, and test score usage.  Laboratory of Psychometric and Evaluative Research Report No. 70. Amherst, MA: University of Massachusetts.

High School Review Advisory Committee. (1994).  Final report. Regina, SK: Saskatchewan Education, Training and Employment.

Howe, K. (1994).  Standards, assessment, and equality of educational opportunity. Educational Researcher, 23 (18), 27-33.

Hunter, D. & Gambell, T. (1996).  Setting standards for a provincial literacy assessment: Premises and procedures.  McGill Journal of Education, 31(2), 195-214.

Jaeger, R.M. (1989).  Certification of student competence. In R.L. Linn (Ed.), Educational measurement (pp. 485-514). London: Collier-Macmillan.
 
Jaeger, R.M. (1991).  Selection of judges for standard-setting.  Educational Measurement: Issues and Practice, 10 (2), 3-10.

Jaeger, R.M. (1995).  Setting standards for complex performances: an iterative, judgmental policy-capturing strategy.  Educational Measurement: Issues and Practice, 14 (4), 16-20.

Jones, R. & Hunter, D. (1996).  Setting achievement standards/expectations for large-scale student assessments.  The Canadian Journal of Program Evaluation, 11(1), 35-61.

Kane, M. (1994).  Validating the performance standards associated with passing scores.  Review of Educational Research, 64 (3), 425-461.

Larter, S. (1991).  Benchmarks: The development of a new approach to student evaluation. Toronto: Toronto Board of Education.

Lewis, A.C. (1995).  Overview of the standards movement.  Phi Delta Kappan, 76 (10), 744-750.

Livingston, S.A. & Zieky, M.J. (1982).  Passing scores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service.

Logar, A. (1984).  Minimum competency testing in schools: Legislative action and judicial review.  Journal of Law and Education, 13 (1), 35-49.

Messick, S. (1989). Validity.  In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Washington, DC: The American Council on Education and the National Council on Measurement in Education.

Messick, S. (1994).  The interplay of evidence and consequences in the validation of performance assessments.  Educational Researcher, 23 (2), 13-23.

Messick, S. (1995).  Standards of validity and the validity of standards in performance assessment.  Educational Measurement: Issues and Practice, 14 (4), 5-8.

Norcini, J.J., Shea, J. & Kanya, D.T. (1988). The effect of various factors on standard-setting. Journal of Educational Measurement, 25 (1), 57-65.

Popham, W.J. (1978a). Criterion-referenced measurement. Englewood Cliffs, NJ: Prentice-Hall.

Popham, W.J. (1978b). Setting performance standards. Los Angeles, CA: Instructional Objectives Exchange.

Porter, A. (1995). Uses and misuses of opportunity-to-learn standards. Educational Researcher, 24 (1), 21-27.
 
Principles for fair student assessment practices for education in Canada. (1993) Edmonton, Alberta: Joint Advisory Committee.

Ravitch, D. (1995). National standards in American education. Washington, DC: Brookings Institution.

Rothman, R. (1995). Measuring up: standards, assessment and school reform. San Francisco: Jossey Bass.

Saskatchewan School Trustees Association. (1996). Setting standards in education: Saskatchewan Standards Symposium. SSTA Research Centre Report # 96-02.

Service Employees International Union, Local No. 333 v. Nipawin District Staff Nurses Association, (1975) 1 S.C.R. 382.

Shepard, L. (1980). Standard-setting issues and methods. Applied Psychological Measurement, 4 (3), 447-467.

Siegel, H. (1988). Educating reason: Rationality, critical thinking, and education. New York: Routledge.

Siegel, H. (1992). Two perspectives on reason as an educational aim: The rationality of reasonableness. Proceedings of the forty-seventh annual meeting of the Philosophy of Education Society. Normal, IL.

U.E.S., Local 298 v. Bibeault, (1988) 2 S.C.R. 1048.

Willis, S. (1997).  National standards: Where do they stand.  Education Update: Association for Supervision and Curriculum Development, 39(2), 1-8.




Appendix A:  Ratings of Evidential Relevance by Provincial Stakeholder

Panel:  Saskatchewan Reading and Writing Learning Assessment

                                                                 DOMAINS
Evidential Type                    Category  Practice  Writing   Holistic  Reading        Reading
                                             Session   Content   Writing   Comprehension  Interpretation
Personal Experience                P         4.00      4.00      3.89      3.59           3.76
Professional Experience            P         4.00      3.91      3.95      3.59           3.67
Vision for Education               P         3.67      3.45      3.74      3.86           3.57
Test Questions/Tasks               D         2.57      3.18      2.37      3.82           3.38
Assessment Procedures              D         2.57      2.73      2.37      3.18           2.81
Scoring Procedures                 D         3.24      3.14      2.89      3.32           3.38
Co-Panelists' Views                C         3.52      3.77      3.37      3.77           3.76
Item Difficulty Statistics         D         1.10      0.86      0.84      2.41           2.76
Examples of Student Work           D         3.29      3.64      3.53      3.45           3.52
Test Results                       D         2.95      3.77      3.47      4.23           4.10
Organizational Standards           C         3.10      2.82      2.63      2.68           2.67
Precedent                          P         2.81      2.45      1.74      1.82           1.71
General Reports on System          C         2.81      2.55      1.68      2.23           1.84
Friends and Colleagues             C         2.71      2.27      2.26      2.09           1.95
Media Reports                      C         1.62      1.41      1.37      1.36           1.33
Personal Sense of Performance      P         4.05      3.91      3.68      3.68           3.81
Curriculum Objectives              P         3.43      2.59      3.42      3.23           3.24
Curriculum Implementation          P         2.95      2.86      3.00      3.00           2.95
Student Population Descriptions    D         3.48      2.82      2.42      2.55           2.43
Knowledge of Literacy Processes    P         3.52      3.86      3.74      3.68           3.57
Varying Cultural Perspectives      P         3.52      3.45      3.37      2.91           2.95
Test Purposes and Consequences     D         2.71      3.14      2.84      3.36           3.43
N                                            22        21        19        22             21
Note:  Preferences were reported on a 5 point scale (1= not influential; 5=very influential)
Evidential categories are coded:  C = contextual/circumstantial; D = direct; P = preconceptual.
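
A reader who wishes to compare the three evidential categories can average the Appendix A ratings by category for each session.  The sketch below does this with the figures reported above.  Two assumptions are made here and not in the study: the five column labels are inferred from the table header, and the category means are simple unweighted averages of the item means.

```python
from statistics import mean

# Column labels are an assumption inferred from the Appendix A header
COLUMNS = ["Practice Session", "Writing Content", "Holistic Writing",
           "Reading Comprehension", "Reading Interpretation"]

# item: (category code, ratings in COLUMNS order), as reported in Appendix A
RATINGS = {
    "Personal Experience": ("P", [4.00, 4.00, 3.89, 3.59, 3.76]),
    "Professional Experience": ("P", [4.00, 3.91, 3.95, 3.59, 3.67]),
    "Vision for Education": ("P", [3.67, 3.45, 3.74, 3.86, 3.57]),
    "Test Questions/Tasks": ("D", [2.57, 3.18, 2.37, 3.82, 3.38]),
    "Assessment Procedures": ("D", [2.57, 2.73, 2.37, 3.18, 2.81]),
    "Scoring Procedures": ("D", [3.24, 3.14, 2.89, 3.32, 3.38]),
    "Co-Panelists' Views": ("C", [3.52, 3.77, 3.37, 3.77, 3.76]),
    "Item Difficulty Statistics": ("D", [1.10, 0.86, 0.84, 2.41, 2.76]),
    "Examples of Student Work": ("D", [3.29, 3.64, 3.53, 3.45, 3.52]),
    "Test Results": ("D", [2.95, 3.77, 3.47, 4.23, 4.10]),
    "Organizational Standards": ("C", [3.10, 2.82, 2.63, 2.68, 2.67]),
    "Precedent": ("P", [2.81, 2.45, 1.74, 1.82, 1.71]),
    "General Reports on System": ("C", [2.81, 2.55, 1.68, 2.23, 1.84]),
    "Friends and Colleagues": ("C", [2.71, 2.27, 2.26, 2.09, 1.95]),
    "Media Reports": ("C", [1.62, 1.41, 1.37, 1.36, 1.33]),
    "Personal Sense of Performance": ("P", [4.05, 3.91, 3.68, 3.68, 3.81]),
    "Curriculum Objectives": ("P", [3.43, 2.59, 3.42, 3.23, 3.24]),
    "Curriculum Implementation": ("P", [2.95, 2.86, 3.00, 3.00, 2.95]),
    "Student Population Descriptions": ("D", [3.48, 2.82, 2.42, 2.55, 2.43]),
    "Knowledge of Literacy Processes": ("P", [3.52, 3.86, 3.74, 3.68, 3.57]),
    "Varying Cultural Perspectives": ("P", [3.52, 3.45, 3.37, 2.91, 2.95]),
    "Test Purposes and Consequences": ("D", [2.71, 3.14, 2.84, 3.36, 3.43]),
}

def category_means(ratings):
    """Unweighted mean rating per category (C, D, P) for each column."""
    grouped = {}
    for category, values in ratings.values():
        grouped.setdefault(category, []).append(values)
    return {
        category: [round(mean(column), 2) for column in zip(*rows)]
        for category, rows in grouped.items()
    }

for category, means in sorted(category_means(RATINGS).items()):
    print(category, dict(zip(COLUMNS, means)))
```

Averages of this kind only summarize the table; they are not a substitute for the multidimensional scaling analysis described in Part III.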




Appendix B:  Stakeholder Evidential Positions:  Overall Writing and Reading Comprehension