Breaking judges fairly – the case for using judge tests as a metric


By Victor Domen

Tabbing is easier than ever. Tabbycat has all sorts of built-in features that allow for efficient and fair judge allocations and breaks based on the imported data. Nevertheless, I am of the opinion that we currently do not use Tabbycat to its full potential. In this short article I will make the case that we should use standardised tests to assess judges and create a fairer and more equal judge break.

The Problem

The decision of which teams break is fairly objective. Teams are scored based on their results, and only in case of a tie does speaker score come into play, which in turn is based on a standardised scale. There are still humans involved in making decisions, but a lot of subjectivity has been removed. Nobody really complains about this: we roughly know what a 75 is, what a 70 is and what an 80 is.

The odd thing is that we do not use similar standardised scales when it comes to judging. Judge feedback forms are increasingly used in the Netherlands, but they do not always use standardised scaling to assess the quality of judging. Moreover, not all CA teams use judge feedback consistently in determining the judge break. Or, at least, that process is currently not standardised and far from transparent. It is therefore hard to call a judge break objective in any way, which leaves it subject to all sorts of biases. The implication is obvious: the best judges do not always break, whereas traditionally good, or well-liked, judges tend to do so more easily.

Considering that breaks should be based on merit, we should find a way to make judge breaks more objective. Just like with team breaks, judge breaks should be based on numbers.

The Solution

In a nutshell: standardised testing combined with standardised feedback scores. Before the tournament, judges take a test. This test determines their initial score and rank in the tab, and a standardised scale, much like the one used for speaker scores, forms the basis of the test. Judges with higher scores have greater priority to break and to chair, while judges who score below a certain threshold do not get to vote on a call. Assigning these initial scores is something the CA team currently does ‘randomly’: that is to say, they give judges scores based on their previous experiences with those judges, with the same problems as those already outlined.
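
To make this concrete, here is a minimal sketch of how a test result could be mapped onto an initial judge rating and a role. It assumes a 1-10 judge scale; the function names and cut-offs are my own illustrative assumptions, not part of Tabbycat or of any existing test.

# Hypothetical sketch: the scale, thresholds and names below are
# illustrative assumptions, not an existing standard.

def initial_judge_rating(test_score, max_test_score=100.0):
    """Scale a raw test result onto a 1-10 judge rating."""
    return 1.0 + 9.0 * (test_score / max_test_score)

def judge_role(rating):
    """Assign a role from the initial rating (example cut-offs)."""
    if rating >= 7.5:
        return "chair"      # highest priority to chair and to break
    elif rating >= 4.0:
        return "panellist"  # sits on a panel and votes on the call
    else:
        return "trainee"    # watches and gives feedback, but does not vote

print(judge_role(initial_judge_rating(82)))  # -> chair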

During the tournament, these initial scores can of course be altered. This happens through the standardised feedback forms released by the Debatbond at the beginning of this academic year. Simply put, judges are rated on a scale from 1-10 and thus receive an average judge score. At the end of the tournament, the judges with the highest scores break. Tabbycat has built-in systems for this exact purpose: it can keep track of submitted and unsubmitted feedback and change a judge's score based on the feedback received.
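
As a rough illustration of how that adjustment could work, the sketch below blends the pre-tournament test score with the running average of the 1-10 feedback scores. The fixed weighting is my own assumption for illustration; Tabbycat offers a comparable, configurable feedback weight, but the exact behaviour depends on the tournament's settings.

def current_rating(base_score, feedback_scores, feedback_weight=0.7):
    """Blend the pre-tournament test score with the feedback average.

    feedback_weight is an assumed constant here; in practice it would
    be a tournament setting.
    """
    if not feedback_scores:
        return base_score
    feedback_avg = sum(feedback_scores) / len(feedback_scores)
    return (1 - feedback_weight) * base_score + feedback_weight * feedback_avg

# Example: a judge who tested at 6.0 but receives strong feedback rises.
print(current_rating(6.0, [8, 9, 8.5]))  # -> 7.75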

For this to work, as much feedback as possible needs to be submitted. It will be up to the tournament staff to determine how this is encouraged or enforced. One possibility is to deny breaks to those who do not submit feedback; another is to hold off on the next round until all feedback is in. Each option has its own positives and negatives.

This system makes it clear which judges are more capable and should chair, and it takes performance during the tournament into account. An experienced judge who does well during the tournament will start as a chair and remain a chair. Novice judges whose skills grow throughout the tournament can also be noticed and rewarded. The system quantifies judging skill in a similar vein to debating skill, making the entire process more objective. Of course this does not eliminate subjectivity, but it does minimise its influence to a greater degree than the way judge breaks are currently decided.

It will take some time to develop the standardised scale that lies at the heart of this proposal. I’ve heard that Maastricht Novice used a judge test; in my opinion the test used there could be the basis of the standardised scale, but your opinion may vary.

Conclusion

Tabbycat should be used at every tournament, because it speeds up the process, reduces human error and is easy to use. With some more effort, it can also tackle one of the biggest problems plaguing debating tournaments. More objectivity is always good, and so a standardised judge test should be developed and more importance given to judge scores when determining the break.


About the author

Victor Domen

Victor is a debater at the Tilburgse Debatvereniging Cicero. He was the society's secretary in 2017-2018.