Liberating Assessment from Grading: The Practice, Part II

This article is republished with permission from IntrepidEd News

– Jared Colley

So how do we do it? How do we liberate assessment from traditional grading practices in order to develop capable learners who are empowered to pursue whatever career or goal they desire?

Here are six tips to ponder:

Establish clear criteria for successful academic achievement. Instead of relying on what Guskey and Feldman called normative-based grading, think carefully about how one might establish a criterion-based approach. Thomas Guskey offers three types of criterion-based approaches: Product, Process, or Progress (2015). A product-focused approach assesses competencies and learning outcomes and therefore focuses on what a student knows and can do. Process focuses less on final results and more on how one got there, which means that behaviors like timeliness, effort, or work habits could affect one’s grade alongside academic achievement. Progress looks at how much a student improved in a given amount of time. Joe Feldman names four non-negotiables for grading practices: they must be mathematically accurate, bias-resistant, motivational, and coherent. Of the three approaches, the most bias-resistant is to establish product-based criteria, which avoids the subjective and oftentimes inaccurate judgments of student behaviors. There’s too much implicit bias at play when we try to judge human behavior and assign it a mathematical value.

If you’re not a competency-based school, there is still an opportunity to use a backward design approach like Understanding by Design (UbD) to design clear, explicit outcomes that will be assessed. The important thing is that there is a clear portrait of success and that the assessments we design are shaped by the picture we’ve painted for our students.

Separate academic scoring from the assessment of other behaviors and dispositions. Joe Feldman uses the metaphor of trying to fit “a barrel full of information into a thimble-sized container of a single letter” to describe a practice that threatens the integrity of two non-negotiables: that grades need to be coherent and bias-resistant (2019). When we mix up our criteria (in this case, product and behavior), the omnibus alpha-numerical grade lacks coherence. It’s like looking at the dashboard in your car and having one gauge to report on the overall health of the vehicle. Such a dashboard would be confusing and largely meaningless. Mixing criteria also undermines bias resistance, because it folds judgments of behavior into the academic score. I argue that we should not penalize criterion-based academic scores that are intended to measure learning when students need to improve their work habits and behaviors. There are healthier ways to provide feedback and consequences for such matters.

Reduce the number of grading categories and focus more on narrative and descriptive feedback. Thomas Guskey makes a compelling argument that a 0-100 scale only gives us the illusion of accuracy. After all, what is the difference between a 92.5 and a 94? There’s too much opportunity for error. Joe Feldman adds to this argument, pointing out that the 0-100 scale is biased towards failure: Are there really 70 different ways to fail? At my school, we have four categories: Novice, Emerging, Proficient, and Advanced (a 1-4 numerical scale). Of course, the real value lies not in the proficiency labels but in the actionable, descriptive feedback that comes with them, especially when we use the proficiency scale to measure specific competencies or skills, which leads to our next suggestion.

Grade the competencies, skills, or capabilities one has targeted, not the assignment itself. Imagine a case where two students score an 85% on the same assignment in history class. Student A did well on providing evidence to support claims but needs to improve how he organizes his ideas. Student B, on the other hand, demonstrated proficiency in organization but did not provide enough evidence. Yet they have the same score. I always tell teachers: “You’re not grading a paper or a test; you’re grading the competencies.” One benefit of this approach is that it guards against bias. Elsewhere, I have written about how a student turned in an assignment to me, and something about it bugged me. My gut told me this was less-than-proficient work; perhaps it had something to do with how it was formatted. However, when I looked at the standard that was targeted for the assignment in question, I saw that the student had clearly demonstrated adequate proficiency. Focusing on the competency prevented certain biases from getting in the way of my assessment practice. Grading the discrete standards and learning outcomes also provides an opportunity to multiply the number of gauges on the student’s metaphorical dashboard, thereby making the feedback loop more coherent and useful for the student’s development. The assessment is individualized not to sort students by their differences but to develop what needs work for each unique student.
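To make the two-student scenario concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two-competency rubric, the 50-point maximum per competency, the individual point values, and the proficiency cutoff are all invented, and they are not taken from any real grade book. The sketch only shows how one omnibus percentage can hide two different learning needs.

```python
# Hypothetical rubric: two competencies, each scored out of 50 points.
# The point values below are invented for illustration.
POINTS_PER_COMPETENCY = 50
PROFICIENCY_CUTOFF = 40  # assumed cutoff (out of 50) for "proficient"

SUBMISSIONS = {
    "Student A": {"evidence": 48, "organization": 37},
    "Student B": {"evidence": 37, "organization": 48},
}


def omnibus_percent(scores):
    """Collapse all competency scores into a single percentage."""
    max_points = POINTS_PER_COMPETENCY * len(scores)
    return 100 * sum(scores.values()) / max_points


def needs_work(scores, cutoff=PROFICIENCY_CUTOFF):
    """List the competencies that fall below the assumed cutoff."""
    return sorted(name for name, pts in scores.items() if pts < cutoff)


for student, scores in SUBMISSIONS.items():
    print(student, omnibus_percent(scores), needs_work(scores))
```

Both students come out at 85 percent, yet the per-competency view points Student A towards organization and Student B towards evidence, which is the information the single score throws away.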

Grade less. Assess more. Period. When I ask teachers what the difference is between formative and summative assessments, their answers often focus on the type of task, its size, and when it should happen. In this view, formatives are things like homework, quizzes, or smaller performance tasks, and summatives are larger assessments like exams, papers, etc., that typically are administered in batches at the end of a term. This isn’t necessarily the case; the difference is one of function, not form. Formatives are assessments for learning: they give the teacher feedback on where the student is, whether the instructional methods are working, and what needs to be adjusted. Summatives, by contrast, are assessments of learning: they identify what the student learned and what they can do.

Mark D. Cannon and Amy C. Edmondson claim that organizations that successfully innovate and improve are ones that have “an intelligent process of organizational learning from failure [which] requires proactively identifying and learning from small failures,” but one of the social barriers to this is the fear of the consequences of failure, even when the failures are small. Yet “small failures are often the ‘early warning signs’ which, if detected and addressed, may be the key to avoiding catastrophic failure in the future” (“Failing to Learn and Learning to Fail (Intelligently),” Long Range Planning 38, 2005). The same is true for our students, and to reorient their attitudes towards failure, we need to stop grading formative assessments and focus on giving feedback. That way students feel freer to take risks and show what they can do. With the pressure off, there is less incentive to copy others’ work or to avoid doing the assignment, which leads to honest conversations that help us determine appropriate moments for assessing students summatively: those moments when practice is over and the proverbial grading scoreboard is turned on.

Consider giving more weight to the most recent demonstrations of learning. If we’re grading according to a criterion whose focus is on what students have learned, why let early demonstrations of learning weigh down a student’s score when they’ve developed higher capabilities later in the term? We use a decaying average at my school where the most recent summative score is worth 70% of the overall grade. Now, students have off-days, for sure, and sometimes they don’t perform at the level we know they’re capable of, so it’s equally important to allow opportunities for reassessment. Someone might object that constraints such as how one’s digital grade book is set up prevent one from approaching grading and assessment in this way. I would just urge you to consider the following: The digital grade book shouldn’t tell you the student’s grade; you should tell the grade book your student’s grade, and this can be accomplished through simple “hacks” like having a single grade that one overrides at every reporting period. There are other (better) ways to communicate a student’s academic development to students and parents. After all, a string of numbers or letters in a digital grade book tells us very little about a student’s story.
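For readers who want to see the arithmetic behind a decaying average, the sketch below shows one common way to compute it in Python. It is an assumption, not a description of any particular school's grade book: each new summative score counts for 70% and the running average of everything before it counts for 30%. The function name, the sample scores, and the 1-4 scale are all illustrative.

```python
def decaying_average(scores, recent_weight=0.7):
    """Running average in which each newer score outweighs earlier ones.

    `scores` is ordered from oldest to newest. This is one common
    formulation; a given grade book may compute it differently.
    """
    if not scores:
        raise ValueError("at least one score is required")
    average = scores[0]
    for score in scores[1:]:
        # The newest score gets `recent_weight`; all history shares the rest.
        average = recent_weight * score + (1 - recent_weight) * average
    return average


# Illustrative scores on a 1-4 proficiency scale, oldest first.
history = [1, 2, 4]
simple_mean = sum(history) / len(history)

print(round(decaying_average(history), 2))  # 3.31
print(round(simple_mean, 2))                # 2.33
```

With these sample scores, the simple mean holds the student at about 2.33 because of the early Novice result, while the decaying average lands at about 3.31, closer to what the student can do now.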

Shifting our practices in this way clearly communicates to students that we’re not interested in placing them on a specific track in the hierarchy of achievement. Shifting the conditions in this way emphasizes our belief in the innate ability of every learner to develop to their fullest potential. In this sense, our practice more accurately matches our purpose.

One thought I will leave you with is that transforming one’s grading practices on one’s own can be hard. I’ve been there, and working alone makes it harder to shift mindsets and work habits in a way that’s optimal. With this in mind, administrators and department leaders should consider the following observation: “…[Variances] in grading policies aren’t random; they are the result of each teacher’s unique, thoughtful, and strategic way to overcome the limitations of traditional, omnibus grading. Each teacher builds her own intricate system that flows from her beliefs about learning, [but] when teachers have varying and contradictory approaches to grading, they end up working against each other…” (Feldman, 2019). What Feldman identifies here, John Hattie calls “Collective Teacher Efficacy” (a strategy Hattie claims has the greatest effect size, 1.57, for improving student learning). Teachers need to be on the same page, so to speak, implementing with consistency an ideal, coherent approach to student assessment and development. After all, if we claim to be learner-centered and focused on the maximal development of every student, we must work towards this kind of transformative approach to how we grade, and do so together.