Part 2: Creating Trustworthy Assessments

Assessment design is one of the most important responsibilities instructors take on. A well-crafted assessment gives students the chance to demonstrate genuine learning. A poorly designed one risks confusing, discouraging, or even misrepresenting what students know.

So how do we know whether an assessment is good? Three qualities stand out: validity, reliability, and alignment.

Validity: Measuring What You Intend to Measure

Validity asks the fundamental question: Does this assessment measure what it claims to measure?

If your course outcome is to develop critical thinking, a multiple-choice quiz on terminology will not be a valid measure of it.
If your goal is to assess collaboration, grading only individual exams undermines the purpose.

Validity forces us to match the tool to the task. An invalid assessment may produce data, but it’s data about the wrong thing.

Practical tip: Review each assessment in your course and write down the specific learning outcome it is supposed to measure. If the match feels weak, it’s time to revise.

Reliability: Consistency and Fairness

While validity ensures we are measuring the right thing, reliability ensures we are measuring it consistently, which is what makes the results fair.

Reliability is about consistency: two graders using the same rubric should assign similar scores; a student retaking the assessment under the same conditions should produce similar results.

Reliability matters because without it, grades lose their meaning. Students need to trust that their work is evaluated consistently.

Practical tip: Develop rubrics with clear criteria and performance levels. Pilot test your assessment if possible, or compare scores across graders to see where discrepancies arise.
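As one way to compare scores across graders, here is a minimal Python sketch. It assumes two graders scored the same eight submissions on a 1-4 rubric; the scores are made up for illustration. It reports simple percent agreement and Cohen's kappa, a common chance-corrected agreement statistic. Kappa is just one reasonable choice and is not prescribed above. The function names are hypothetical.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Share of submissions on which both graders gave the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("Both graders must score the same non-empty set of work.")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    levels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in levels) / (n * n)
    if expected == 1.0:
        # Both graders used a single identical score for everything.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(scores_a, scores_b):
    """Indices of submissions where the two graders differ."""
    return [i for i, (a, b) in enumerate(zip(scores_a, scores_b)) if a != b]


# Hypothetical rubric scores (1-4) for the same eight essays.
grader_a = [4, 3, 3, 2, 4, 1, 3, 2]
grader_b = [4, 3, 2, 2, 4, 1, 3, 3]

print(f"Percent agreement: {percent_agreement(grader_a, grader_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(grader_a, grader_b):.2f}")
print(f"Essays to discuss: {disagreements(grader_a, grader_b)}")
```

The list of disagreements is often the most useful output. Rereading those submissions together usually shows which rubric criterion or performance level is being interpreted differently.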

Why Validity and Reliability Must Work Together

An assessment can be valid but unreliable (measuring the right skill inconsistently) or reliable but invalid (consistently measuring the wrong skill). Effective assessments require both.

For example:

A vague essay prompt might tap into higher-order thinking (valid) but yield wildly different grades depending on who evaluates it (unreliable).
A standardized multiple-choice test may produce consistent results (reliable) but fail to capture deeper analysis or creativity (invalid, if those are the skills the course targets).

Alignment: The Glue That Holds It All Together

Alignment connects learning goals, outcomes, objectives, and assessments into a coherent whole.

Goals are broad and aspirational: “Provide students with a foundation in educational psychology.”
Outcomes translate those goals into measurable targets: “By the end of this course, students will be able to analyze psychological theories and apply them to instruction.”
Objectives break outcomes into concrete steps: “Compare and contrast behaviorist and constructivist theories.”

Assessments that align with these objectives ensure students are evaluated on what the course actually intends to teach.

Practical tip: Try the alignment test. For each learning objective, ask: What assessment task would directly demonstrate this? If none exist, you may need to add or revise assessments.
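The alignment test can also be run as a simple bookkeeping exercise. The Python sketch below is illustrative only. Objective O1 comes from the example above; objectives O2 and O3, the assessment names, and the function names are hypothetical placeholders to replace with your own course map.

```python
# Learning objectives, keyed by a short ID.
objectives = {
    "O1": "Compare and contrast behaviorist and constructivist theories",
    "O2": "Apply a psychological theory to a lesson plan",      # hypothetical
    "O3": "Critique the evidence behind an instructional claim",  # hypothetical
}

# Each assessment lists the objective IDs it is meant to demonstrate.
assessments = {
    "Theory comparison essay": ["O1"],
    "Lesson plan project": ["O2"],
    "Terminology quiz": [],  # not yet tied to any objective
}


def unassessed_objectives(objectives, assessments):
    """Objective IDs that no assessment claims to measure."""
    covered = {obj for targets in assessments.values() for obj in targets}
    return [obj_id for obj_id in objectives if obj_id not in covered]


def unaligned_assessments(objectives, assessments):
    """Assessments that point at no known objective."""
    return [
        name
        for name, targets in assessments.items()
        if not any(obj in objectives for obj in targets)
    ]


for obj_id in unassessed_objectives(objectives, assessments):
    print(f"No assessment for {obj_id}: {objectives[obj_id]}")

for name in unaligned_assessments(objectives, assessments):
    print(f"No objective behind: {name}")
```

An objective with no assessment suggests adding or revising a task. An assessment with no objective behind it is a candidate for cutting or rethinking.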

Backward Design: A Practical Framework

One of the most effective methods for ensuring validity, reliability, and alignment is backward design (Wiggins & McTighe, 2005). Instead of starting with content or activities, backward design flips the process:

Identify Desired Results. What should students know, do, or value by the end? (These become outcomes and objectives.)
Determine Acceptable Evidence. What evidence would demonstrate those results? (These become assessments.)
Plan Instruction. What learning experiences, supports, and resources will prepare students for those assessments?

Backward design prevents the all-too-common mistake of teaching first and scrambling later to “fit” assessments into the course.

Putting It Into Practice

To start applying these principles:

Audit your course assessments for validity: Are they truly measuring the stated outcomes?
Review for reliability: Do you have rubrics or guidelines that ensure consistent scoring?
Check alignment: Can you draw a clear line from each outcome to at least one assessment?
Consider backward design: Are you starting with the end in mind, or are assessments an afterthought?

Looking Ahead

Valid, reliable, and aligned assessments are the foundation of fairness and effectiveness, but they’re not the whole story. Next, we’ll explore how to choose the right strategies for your teaching context, because what works in a large STEM lecture might fail in a small humanities seminar, and vice versa.