Factor 1: Individual Learning Differences
Most current assessments are not designed to accommodate individual differences. Generally, educators have interpreted "fairness" to mean that assessments are uniform in format and administered in a standardized fashion—the same test is given in exactly the same way and under the same conditions for each learner. In some situations, and for some purposes, standardized administration is indeed appropriate, particularly if the format and circumstance of the test exactly match the requirements of a future task. For example, if NASA wants to evaluate aspiring astronauts' ability to react in an emergency, each astronaut under consideration should be presented with the same simulated emergency. In this test, those who can react quickly and perform all of the necessary tasks will truly be the most qualified.
As a counter-example, imagine that you are teaching a middle school science class and are about to administer the textbook-based, multiple-choice test provided in the teacher's edition. You're hoping to find out what each of your students has learned about science over the course of the instructional unit—and by extension, how effective your teaching has been. You duplicate 25 copies of the test and pass them out, announcing to students that they will have 15 minutes to complete the test. Now imagine that among the students about to begin the test are Paula, Patrick, Kamla, Charlie, Jamal, Sophia, and Miguel.
Will the timed, multiple-choice, paper-and-pencil test yield the information you seek: a "fair" determination of what each of these students has learned in your science class? For most of the students we have been following, the likely answer is no. The method of assessment confounds science knowledge with facility in handling the test itself, making it impossible to separate the causes of success or failure. Individual differences interact with the test format and administration method in ways that can significantly skew the results. To see how, let's consider the standard, print-format science test within the framework of the three brain networks.
Individual differences in content recognition. Paula decodes words well but has difficulty comprehending connected text. This difficulty limits her ability to respond accurately to the test items, even when she understands the concepts. In fact, Paula's great interest in and facility with decoding words could actually be a disadvantage on this test, distracting her from focusing on the real task of demonstrating what she has learned about science.
Sophia would have difficulty with the science test for entirely different reasons. Her conceptual knowledge of science is likely to be strong, given her high degrees of engagement and motivation and her ability to obtain meaning from listening. But Sophia's visual deficits would impede her fluent recognition of the printed words. Her desktop magnifier could make the words more recognizable, but the time required to magnify the page could turn the 15-minute time limit into an insurmountable barrier.
Framed in this way, giving everyone the same test seems unfair. Think about how Paula and Sophia might perform if they took the test in an alternate format: Paula with help to keep her focused on the questions and the process of answering, and Sophia with a computer text-to-speech program reading items aloud. As it stands, the standard science test inadvertently measures not only science knowledge but also recognition-based facility with the print medium. These extraneous factors act like the butcher's thumb on the scale.
Would it be fairer to administer the test orally to the whole class, thereby skirting the difficulty arising from Paula's and Sophia's recognition weaknesses? Not really, because other students in the class may have trouble accessing speech. The simple truth is, the natural variety of recognition strengths and weaknesses within a typical classroom prevents any single presentational medium from yielding an unbiased, accurate assessment for the entire class.
Individual differences in strategic expression. Consider how students' variable abilities to plan, execute, and monitor actions and skills might affect the accuracy of this timed textbook test. Jamal, for example, has a physical disability that makes handwriting virtually impossible, and Charlie has trouble at the other end of the strategic spectrum—with planning and self-monitoring.
Jamal would probably fail this test outright, as would any test-taker who could not effectively manipulate a pencil. He would fail regardless of how well he paid attention, how well he studied, how much he really knew, and how well the new instructional approaches worked. Clearly, a physical disability that renders a student incapable of using the required medium of expression can confound assessment accuracy. And although Charlie is physically able to use pencil and paper, his planning and self-monitoring deficits could interfere with his ability to demonstrate his science knowledge on this standardized exam. The test lacks the inherent structure and support Charlie needs to systematically navigate the questions, budget his time, stay on task, and check his work.
The confounding strategic factors Jamal and Charlie present are obvious, and few teachers would seriously conclude that the boys' low scores on this kind of test indicate a lack of science knowledge. But many learners are affected by more subtle issues with modes of knowledge expression. Research is beginning to show how significantly the way students are asked to express what they know affects their performance—and these findings hold true even for students without documented learning difficulties.
Russell and Haney (1997, 2000) investigated the effects of different modes of expression (handwriting versus keyboarding) on standardized test scores of regular education students. They found that scores supposedly based on content alone were strongly influenced by the expressive medium. For example, students with experience using computers got much higher scores if they keyboarded rather than handwrote their responses. This research backs up the common-sense conclusions of our classroom examples. Because individual differences in the skills governed by strategic networks can influence performance in ways that are often unrelated to the skills and knowledge teachers are trying to assess, a single, standard mode of expression definitely is not fair to all students. Rather, it often obscures the true significance of assessment outcomes.
Individual differences in engagement. Students' differing levels of engagement can also influence assessment accuracy. For an assessment to accurately reflect what students know and can do, those students must be giving their best effort. This is partly why educators tend to link assessment to significant extrinsic motivators: rewards and punishments designed to get students to pay attention and work hard. Students often see tests as "high stakes," whether or not that formal designation applies. But making tests all-important is not necessarily the best way to motivate and engage every student. Generally speaking, both very low and very high levels of engagement are counterproductive. We have all felt the disabling effects of anxiety, such as "choking" on the playing field or during a test. Further, the same amount of external pressure, whether positive or negative, affects learners unequally. We each have our own baselines of anxiety and comfort and find different kinds of tests easy or difficult (Goleman, 1995).
Test formats (e.g., multiple choice, essay, short answer) and administration circumstances (e.g., timed/untimed, individual/group administration, in-class/take home) all impact student performance differently, depending on the individual test-taker's affective makeup. Inevitably, a single test, given in a single way, will affect some students positively and some students negatively.
Our example students present a range of affective issues that could confound results on the standardized science test. Sophia, despite her visual deficits, is supremely self-confident and readily confronts a challenge. Assuming she could use her magnifier to see the test items, her enthusiasm and determination could help her to work quickly and perform reasonably well. In Sophia's case, positive affect could promote higher performance than we might expect, given the kinds of challenges she faces. By contrast, Kamla lacks confidence about academics. Tests make her particularly anxious and increase her fears about being thought a poor student. The timed test is especially likely to intimidate Kamla, and it is easy to predict that anxiety might limit her performance.
When we consider individual differences in recognition, strategic, and affective networks, we realize that a common test format and administration method will always favor some students and hurt others, for a variety of complex reasons. Traditional assessments tend to measure things that teachers aren't trying to measure (visual acuity, decoding ability, typing or writing ability, motivation), thus confounding the results and leading us to make inaccurate inferences about students' learning. As a consequence, we risk making off-base instructional decisions—deciding, for example, to re-teach certain content rather than move on to a new challenge or to change our instructional methods when our test design, not our teaching, is contributing to poor scores.