Integrity feedback without turning everything into surveillance KPIs

OctoProctor team


TL;DR

  • Blindly increasing surveillance in online assessment may lead to more false positives, more appeals, and greater student distrust.
  • Treating your exam design and proctoring configuration with the rigor of quality management can help you co-create integrity with your students.
  • Using our three-layer approach, you can create your own academic integrity system that prioritizes clarity, decision quality, and incident optimization.

Surveillance metrics ≠ integrity metrics

The traditional anti-cheating playbook is all about surveillance. To identify academic integrity breaches, proctoring platforms typically monitor noise, eye movement, mouse clicks, and more. Even sipping water can raise a flag these days!  

Although it’s easy to reach for surveillance-related proctoring metrics to solve today’s cheating crisis, doing so isn’t conducive to fostering integrity in online assessments. In fact, it can have the opposite effect: by adding more hoops to jump through, you may make test takers even more anxious about their performance and more likely to seek out ways to game the system.

[Infographic: AI usage by UK undergraduates. 92% self-reported using AI in some form, 88% used GenAI for assessment help, and the average annual growth of AI-enabled cheating reached 116.5%, as reported by the Guardian.]

Building an integrity feedback loop that improves testing security and experience has little to do with increasing surveillance. Instead, it’s based on listening to test takers and frontline staff, reviewing decisions, and fixing root causes — the way one might run a quality management program.

Stop chasing cheaters and red flags. Start nurturing mutual trust by reassessing your exam integrity process.

Why you shouldn’t be the Big Brother of surveillance KPIs

We get it: creating fair, consistent test-taking environments is hard. Nobody wants to be the naive instructor who leaves the back door open to cheating. But running a Big Brother style exam can actually compromise your underlying objectives.

When you focus on surveillance-first proctoring, you have to track more test taker behaviors, which generates even more flags to process. And more flags often lead to:

  • More false positives, putting an administrative burden on proctors who have to weed through noisy proctoring data (even with automated setups).
  • More appeals, as students contest their flagged or annulled exams.
  • Student anxiety and distrust, especially if they endure a restrictive testing environment only to have their exam annulled.

Another problem with chasing surveillance KPIs alone is Goodhart’s Law: when a measure becomes a target, it ceases to be a good measure. Once you start using a metric to police students, they will try to manipulate the system to hit that specific target — often at the expense of integrity.

Designing integrity systems: A three-layer feedback loop

“We’re not actually facing a cheating crisis. We’re facing a design crisis.” — Marina Detinko, Board member of OctoProctor

So if adding more surveillance metrics won’t strengthen integrity in online assessment… what will?

Often educators lament the cheating crisis without considering the role of exam and proctoring design. Addressing flaws in your academic integrity framework can be harder to do, but more rewarding in the long run.

First layer: test taker trust and clarity signals

Getting lightweight, humane feedback from your test takers should be the first step in your feedback cycle. It shows that you care and can help you troubleshoot issues with your configuration. And remember: one of the strongest factors in preventing academic misconduct is having a high-quality relationship with the instructor (as cited by 89% of students in a study by University College Dublin).

Some ways to gauge clarity signals from your students include:

  • A micro post-exam survey: Ask students to rate their satisfaction with the exam, as well as identify any test environment factors that may have impacted their performance.
  • A learning self-report: Understand students’ learning journeys (e.g. feelings, progress, predictions, etc.) through brief surveys, journal entries, or interviews.
  • Social listening: Whether in class or on a forum, find out what’s top of mind with your test takers post-exam (unclear questions, technical glitches, concerns about privacy, etc.).
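To make these clarity signals actionable, you can roll them up into a simple score plus a ranked list of friction themes. Below is a minimal sketch; the field names (`instructions_clear`, `issues`) are hypothetical and should be adapted to whatever your post-exam survey actually collects.

```python
# Sketch: turn micro-survey responses into a clarity score and friction themes.
# Field names are illustrative assumptions, not a real survey schema.

def clarity_score(responses):
    """Average the 1-5 'instructions were clear' ratings onto a 0-100 scale."""
    ratings = [r["instructions_clear"] for r in responses]
    return 100 * (sum(ratings) / len(ratings) - 1) / 4  # map 1-5 onto 0-100

def friction_themes(responses):
    """Count reported environment issues so recurring themes surface first."""
    counts = {}
    for r in responses:
        for issue in r.get("issues", []):
            counts[issue] = counts.get(issue, 0) + 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

responses = [
    {"instructions_clear": 5, "issues": []},
    {"instructions_clear": 2, "issues": ["webcam check failed"]},
    {"instructions_clear": 3, "issues": ["webcam check failed", "unclear Q7"]},
]
print(clarity_score(responses))    # ≈ 58.3
print(friction_themes(responses))  # most-reported issue first
```

Tracking the same score across exam sessions tells you whether your instruction and configuration fixes are actually landing.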

Second layer: process integrity and decision quality

In soccer, the best referees aren’t those who issue the most red and yellow cards — quite the opposite! A high number of infractions during a match is typically a sign that the referee isn’t respected and has lost control of the game.

Same goes with your testing environment. More flags during an exam may not mean you’re achieving academic integrity. Instead, you may be facing a badly configured system — one that not only leads to false flags, but also fails to inspire “fair play” from your test takers.

To safeguard your exam integrity, you’ll want to carefully design your review processes and decision quality. False flags happen for so many understandable, real-life scenarios. A student may face a technical issue, have an approved accommodation, or experience an unforeseen event.

To get to the bottom of false flags, you’ll want to track these non-surveillance metrics:

  • Overturn rate: How often are your decisions overturned? A high overturn rate may mean your proctoring configuration isn’t meeting the reality of test taker behaviors.
  • Inter-rater agreement: Are your decisions consistent? If reviewers evaluate the same cases with different results, this may indicate that they lack clear criteria and/or consistent judgment.
  • Evidence sufficiency: What happens if there’s a lack of clear evidence for cheating? Define what constitutes clear or sufficient evidence to avoid inconsistency.  
  • Decision turnaround time: How long does it take to reach a decision? Turnaround should be prompt and predictable, out of fairness to students who invest weeks in exam preparation.
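The first two metrics above are straightforward to compute from a review log. Here is a minimal sketch; the record shape (an initial decision, a final decision after appeal, and two independent reviewer verdicts per case) is an assumption for illustration.

```python
# Sketch: two decision-quality metrics computed from a hypothetical review log.

def overturn_rate(reviews):
    """Share of flagged cases whose initial decision was later reversed."""
    overturned = sum(1 for r in reviews if r["initial"] != r["final"])
    return overturned / len(reviews)

def percent_agreement(reviews):
    """Simple inter-rater agreement: how often two reviewers concur.
    (In production, prefer a chance-corrected statistic such as Cohen's kappa.)"""
    agree = sum(1 for r in reviews if r["reviewer_a"] == r["reviewer_b"])
    return agree / len(reviews)

reviews = [
    {"initial": "violation", "final": "cleared",
     "reviewer_a": "violation", "reviewer_b": "cleared"},
    {"initial": "violation", "final": "violation",
     "reviewer_a": "violation", "reviewer_b": "violation"},
    {"initial": "cleared", "final": "cleared",
     "reviewer_a": "cleared", "reviewer_b": "cleared"},
    {"initial": "violation", "final": "cleared",
     "reviewer_a": "cleared", "reviewer_b": "violation"},
]
print(overturn_rate(reviews))      # → 0.5: half the initial calls were reversed
print(percent_agreement(reviews))  # → 0.5: reviewers split on half the cases
```

Numbers like these — half of decisions overturned, reviewers agreeing only half the time — are exactly the signal that your configuration or review criteria, not your students, need attention.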

Third layer: fixing incidents and optimizing configurations

The final stage of the integrity feedback loop is optimizing your proctoring and exam configuration. This is where you get into the nitty-gritty of what actually improves integrity outcomes.

Better instructions

Clarity can go a long way toward improving integrity in online assessment, as sometimes students break the rules without intending to. Revisit your exam instructions, as well as your policies (such as allowed materials) and pre-exam checklist.

UX fixes

A smooth online exam experience starts with strong design. The platform UX should account for potential user problems and questions. For example:

  • If a user gets disconnected, the platform shouldn’t require a full identity re-check.
  • If a user can’t remember which materials are allowed, a built-in lightweight chatbot can give a list when prompted.
  • For a smoother testing experience, the platform should avoid in-testing alerts (e.g. include a running clock on the side instead of time pop-ups).
  • If possible, allowing users to enter the platform in “practice mode” a few days beforehand can help reduce testing anxiety.
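The UX fixes above amount to a handful of session-level options. The sketch below shows what such a configuration could look like; none of these keys are real OctoProctor settings — they simply illustrate the kind of options worth exposing in a proctoring platform.

```python
# Hypothetical exam-session configuration capturing the UX fixes above.
# All key names are illustrative assumptions, not a real API.
session_config = {
    "reconnect": {
        "grace_period_seconds": 120,       # brief drop-outs resume the session
        "require_full_id_recheck": False,  # no repeat identity check on reconnect
    },
    "in_exam_help": {
        "allowed_materials_chatbot": True,  # answers "what can I use?" on demand
    },
    "timing": {
        "show_running_clock": True,  # persistent clock in the sidebar...
        "time_popups": False,        # ...instead of disruptive pop-up alerts
    },
    "practice_mode": {
        "enabled": True,
        "opens_days_before_exam": 3,  # let students rehearse the environment
    },
}
```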

Accommodation tweaks

Online proctoring systems can unfairly flag neurodivergent and disabled learners. Make sure your configurations have flexible settings to allow diverse accommodation tweaks that aren’t just an afterthought, such as:

  • Turn off or reduce camera/noise sensitivity
  • Add pre-approved breaks or extended time
  • Allow access to food, medicine, water, assistive devices, etc.
  • Set up culturally sensitive ID and camera settings (e.g. that don’t flag head coverings)
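One way to make accommodations more than an afterthought is to model them as a per-student profile layered over the default proctoring configuration, rather than one-off manual overrides. A minimal sketch, with illustrative key names only:

```python
# Sketch: merge pre-approved accommodations over default proctoring settings.
# Key names are hypothetical assumptions, not a real API.
DEFAULTS = {
    "camera_sensitivity": "high",
    "noise_sensitivity": "high",
    "time_multiplier": 1.0,
    "scheduled_breaks": [],
    "allowed_items": [],
    "flag_head_coverings": False,  # culturally sensitive ID/camera default
}

def apply_accommodations(defaults, accommodations):
    """Return a session config with approved accommodations applied."""
    config = dict(defaults)
    config.update(accommodations)
    return config

# e.g. a student with approved extra time, a break, and medication access
profile = apply_accommodations(DEFAULTS, {
    "noise_sensitivity": "off",
    "time_multiplier": 1.5,
    "scheduled_breaks": [{"after_minutes": 45, "length_minutes": 10}],
    "allowed_items": ["water", "medication"],
})
```

Because the profile is data rather than ad hoc tweaks, reviewers can also see exactly which accommodations were active when deciding whether a flag was legitimate.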

Exam design strategies

Reduce cheating through assessment design. For example, you might allow for an open-book exam or one A4 note per student. Another tactic is to unblock certain websites, or even opt for continuous assessment throughout the course.

Operational fixes

Handling incidents in a fair, consistent, and timely manner is essential for creating integrity. Check your flag configuration, escalation rules, admin accessibility, and reviewer training to ensure there’s a well-defined workflow for incidents and appeals — especially for high-stakes online exams.

(From here, you can restart the feedback loop for continuous improvement in your education assessment!)

Reality check: how can you track integrity feedback?

[Infographic: “Optimizing online assessment metrics,” with two columns. “Metrics to track” lists clarity score, dispute or appeal rate, overturn rate, inter-rater agreement rate, evidence sufficiency rate, decision turnaround time, accommodation-triggered incident rate, and recurring friction themes. “Metrics to contextualize” lists flag or intervention number, flag timing pattern, repeat flag rate, candidate integrity score, time watched for live proctoring, incident minutes recorded, and session pause, termination, and warning rates.]

Of course, tracking metrics isn’t necessarily an evil to be avoided. It’s useful to measure non-surveillance KPIs that get to the heart of clarity, decision quality, and incident optimization. (See our recommendations above!)

Additionally, it’s worth contextualizing your surveillance-related proctoring metrics to better understand your proctoring configuration. For example:

  • Number of flags or interventions: Are there too many? Are they actually useful in determining a negative incident?
  • Flag timing pattern: Are they happening early (instructions), mid (UX), or end of exam (anxiety)?
  • Repeat flag rate: Are multiple students showing the same flags? Do surveys show student confusion, a need for training, or repeat abuse?
  • Candidate integrity score: Are scores low class-wide? (This may indicate a larger configuration issue.)
  • Time watched (for live proctor) or incident minutes recorded (for AI proctoring): Is the burden on proctors unrealistic? Are incident minutes actually catching cheating?
  • Session pause, termination, and warning rates: Are rules too strict? Are proctors inconsistent?
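The flag timing pattern check above can be automated by bucketing flag timestamps into phases of the exam. A minimal sketch — the 15%/85% phase thresholds are arbitrary assumptions, not an established standard:

```python
# Sketch: bucket flags by when they occur in the exam to hint at root causes.
# Phase cut-offs (15% / 85% of exam duration) are illustrative assumptions.

def flag_timing_pattern(flag_times, exam_minutes):
    """Classify each flag timestamp (minutes into the exam) as early/mid/late."""
    buckets = {"early (instructions?)": 0, "mid (UX?)": 0, "late (anxiety?)": 0}
    for t in flag_times:
        fraction = t / exam_minutes
        if fraction < 0.15:
            buckets["early (instructions?)"] += 1
        elif fraction > 0.85:
            buckets["late (anxiety?)"] += 1
        else:
            buckets["mid (UX?)"] += 1
    return buckets

# Flags clustered in the first minutes suggest unclear instructions
print(flag_timing_pattern([2, 4, 5, 7, 50, 88], exam_minutes=90))
```

A cluster of early flags points back to the first layer (clarity), mid-exam clusters to UX, and late clusters to time pressure and anxiety — each suggesting a different fix.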

Don’t be the Eye of Sauron of online assessment

“If institutions continue to prioritize surveillance alone, they risk undermining the very foundations of openness, flexibility, and autonomy. Conversely, if proctoring is reimagined as part of a broader ecosystem of trust, equity, and ethical assessment design, it can contribute to both academic credibility and meaningful learning.” — Mncedisi Christian Maphalala, Ntombikayise Nkosi


Don’t be the Eye of Sauron that hyper-monitors your students. Instead of pursuing more surveillance, use our three-layer framework as a starting point to design your own integrity system that aligns with your institutional values — and drives more effective, equitable online assessment.

At OctoProctor, we can support you in creating integrity-driven proctoring systems and configurations that go beyond surveillance. Reach out to discuss your online assessment challenges with us. 🐙

Want to start tracking smart?

With some context, OctoProctor can help you pinpoint which tracking metrics will work for your exam and tailor a demo that actually fits your circumstances.

Talk to us!

FAQ

What is integrity in online assessments?

Integrity in online assessments means designing exams, policies, review processes, and proctoring setups so that results reflect a student’s actual performance rather than confusion, loopholes, or avoidable misconduct.

What are integrity systems in education?

Integrity systems are the full set of policies, tools, review practices, and communication methods an institution uses to support fair assessment. In practice, that includes exam design, student guidance and agency, reviewer decision quality, accommodation workflows, and proctoring configuration.

How should institutions measure cheating without over-focusing on surveillance?

Measuring cheating only through surveillance signals can be misleading. A stronger approach is to combine contextualized proctoring metrics with decision-quality indicators like overturn rate, inter-rater agreement, evidence sufficiency, and appeal outcomes.

What are proctoring metrics actually useful for?

Proctoring metrics are most useful when they help institutions spot design or process problems rather than mechanically count suspicious events. On their own, they do not tell you much about integrity outcomes unless they are interpreted alongside review quality, student clarity, and incident context.

How can you reduce cheating through assessment design?

You can reduce cheating through assessment design by writing clearer instructions, using question formats that are harder to outsource, allowing appropriate open-book conditions where relevant, spacing assessment across a course, and aligning rules with what the task is actually meant to measure.

What are some exam design strategies for academic integrity?

Useful exam design strategies for academic integrity include open-book formats, continuous assessment, adaptive testing, clearer task instructions, question variation, customization and localization for your cohort, transparent and regular communication about integrity, and exam environments that reduce confusion without lowering standards.

What are good academic integrity frameworks for online assessment?

Good academic integrity frameworks do not rely solely on surveillance. They balance trust, clarity, review quality, accessibility, and operational consistency to support integrity in online assessments before and after the exam session.

What does continuous improvement in education assessment look like?

Continuous improvement in education assessment means using student feedback, incident reviews, appeal outcomes, and configuration data to keep refining the exam process instead of treating every flagged event as a student-only problem.