A national text annotation and data labelling platform for the education sector


Schools, colleges and universities generate, process and store large repositories of text. This brief article explores the opportunities that arise when this large trove of text is organised, labelled and then made available to the wider sector. I will describe three use cases that emerge from a text annotation or labelling service, such as Bolton College's FirstPass platform, and the affordances that these may bring to the wider education sector. Lastly, I will present a case for establishing a national data labelling and text annotation service for the education sector.

What is data labelling?
With regard to the FirstPass platform, data labelling is the process of annotating text about a subject topic with one or more meaningful and informative labels or tags that provide context to that text. These labels or tags enable a computer to organise or classify text according to these attributes, which is especially useful when the computer comes across previously unseen text that is not labelled. The accompanying video provides further information about the data labelling process and how it helps the computer to assign the correct attributes to text that it has not encountered before.
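To make the idea concrete, here is a minimal sketch of how a handful of labelled sentences can let a computer assign a label to text it has not seen before. It uses a simple bag-of-words naive Bayes classifier written with the Python standard library only. This is an illustration and not a description of how FirstPass works internally; the example sentences, the topic labels and the function names (tokenize, train, classify) are all assumptions made for this sketch.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of examples
    for text, label in labelled_examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the most likely label for unseen text (naive Bayes, add-one smoothing)."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the share of training examples that carry this label.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so that unseen words do not zero out the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical annotated text: each sentence carries a subject topic label.
labelled_examples = [
    ("Plants use sunlight to make glucose from carbon dioxide and water", "photosynthesis"),
    ("Chlorophyll in the leaves absorbs light energy", "photosynthesis"),
    ("Cells release energy from glucose using oxygen", "respiration"),
    ("Mitochondria are where aerobic respiration takes place", "respiration"),
]

word_counts, label_counts = train(labelled_examples)

# Previously unseen, unlabelled text.
print(classify("The leaves absorb sunlight", word_counts, label_counts))
print(classify("Oxygen is used in the mitochondria", word_counts, label_counts))
```

With only four labelled sentences the classifier is fragile; adding more annotated examples per label gives it more evidence to draw on, which is the reason a large, shared pool of labelled text is valuable.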

Why is data labelling important?
Once text is annotated or labelled it can be used to ameliorate various problems that are encountered in our schools, colleges and universities. Let's review how labelled data can be used to address these problems through three use cases.

The first use case concerns formative assessment. Teachers face a high volume of work when giving effective feedback, especially when their students respond to open-ended questions using free-form text. This invariably means that students have to wait a few days or more before they receive feedback from their teachers. Bolton College's FirstPass platform has demonstrated that a text annotation platform can support the formative assessment process, specifically by providing real-time feedback to students as they compose free-form text responses to open-ended questions posed by their teachers. One of the key properties of the FirstPass platform is that its subject topic classifiers become more reliable and accurate as they are exposed to additional annotated text sourced from teachers and students as they interact with the platform.

The second use case explores how text annotation can be used to garner feedback from students about their courses or campus services. Educational institutions value the voice of their students and regularly seek their opinions to inform the development of courses and campus services. Current digital data collection tools allow for the rapid, large-scale collection of responses to closed questions. However, the processing of responses to open-ended questions remains slow and protracted. The FirstPass platform is well placed to support this use case by taking advantage of text classification and sentiment analysis. Colleagues and I at Bolton College envisage that text classifiers can be set up, trained and then used to support the delivery of open-ended survey questions. The administrators of the survey would then be able to access a textual and graphical summary of each question, which updates itself as more students respond.
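The self-updating summary described above can be sketched as a running tally. The example below assumes that each response has already been given a topic label and a sentiment label by trained classifiers; that classification step is not shown. The class name QuestionSummary, its methods and the example labels are hypothetical and are not taken from FirstPass.

```python
from collections import Counter


class QuestionSummary:
    """Running summary of the labelled responses to one open-ended survey question."""

    def __init__(self, question):
        self.question = question
        self.topic_counts = Counter()
        self.sentiment_counts = Counter()
        self.total = 0

    def add_response(self, topic, sentiment):
        """Record one response that has already been classified."""
        self.topic_counts[topic] += 1
        self.sentiment_counts[sentiment] += 1
        self.total += 1

    def as_text(self):
        """Build a short textual summary, most frequent topic first."""
        lines = [f"{self.question} ({self.total} responses)"]
        for topic, count in self.topic_counts.most_common():
            share = round(100 * count / self.total)
            lines.append(f"  {topic}: {count} ({share}%)")
        return "\n".join(lines)


# Hypothetical labels that a topic classifier and a sentiment classifier might return.
summary = QuestionSummary("How could campus services be improved?")
summary.add_response("library", "negative")
summary.add_response("library", "positive")
summary.add_response("catering", "negative")

print(summary.as_text())
```

Because the counts are updated on every call to add_response, the summary can be regenerated at any point while the survey is still open. The same counts could feed a chart for the graphical view.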

The third use case explores how text annotation can be used to support students and teachers with end-point assessment preparations. When end-point exams or tests are approaching, teachers face a high volume of work as they mark mock exam papers, especially if these include open-ended questions. Students, for their part, are unable to garner real-time feedback when they respond to open-ended questions as they prepare for these tests and exams. FirstPass will enable teachers and awarding bodies to set up a large bank of subject topic classifiers, which can then be used to support the delivery of mock exam questions to students. Students will also have the opportunity to log in to FirstPass and access a bank of open-ended questions for specific subjects and levels. The platform's ability to deliver real-time feedback to each student will mean that students and teachers can garner valuable insights about knowledge gaps. Once again, the subject topic classifiers within the FirstPass platform perform with increasing reliability as more students engage with each open-ended mock exam question, especially if students from across the UK and further afield engage with FirstPass.

The case for establishing a data labelling and text annotation service for the education sector
New and emerging AIED services such as FirstPass depend on the strengths inherent in a large participatory user base motivated by shared goals. Recognising these strengths will reduce the need for individual education institutions to develop their own ad hoc data labelling services that are isolated from the wider AIED ecosystem. Four questions need to be addressed when proposing a national data labelling and text annotation service.

The first question centres on whether schools, colleges, universities and other organisations within the education sector will consent to supporting a participatory model where they offer anonymised text so that it can be labelled and annotated on a common and open platform. If the leaders in these institutions are provided with a clear rationale for labelling and annotating text, and if the products and services that will emerge from this annotated data are clearly articulated, then a collegiate environment could be fostered to support the endeavour. This can be reinforced further if the products and services that emerge from this labelled and annotated dataset are used to support students and teachers across the UK.

The second question is which trusted organisation will host and manage the data labelling and text annotation platform for participating schools, colleges and universities. The choice of institution is an important one because it will be the guardian of a very large and valuable store of annotated text. At present, Jisc is well placed to take on this role. It already acts as a national advocacy body for education technology for the further and higher education sectors, and it has a rich history of providing services to colleges, universities and other organisations across the UK. One of the roles that this institution will be required to undertake is to foster an environment where new and emerging AIED services can take shape and, more importantly, to nurture an environment that enables these services to come to fruition so that students, teachers and others can make use of them to support their studies, teaching and broader work.

The third question concerns the business model that will underpin an open education resource such as this so that it is sustainable and affordable to operate. A case could be made that part of the revenue derived from the sale of products and services that emerge from the text annotation platform should be used to support and develop its ongoing use.

The fourth question focuses on what types of services can be developed to support the education sector if there is a large and growing library of labelled and annotated text. The services that I have listed thus far represent a small fraction of what could be developed for the education sector.