5 reasons for establishing a national text labelling and annotation platform for the education sector
In this short article, I present five reasons for establishing a national text labelling and annotation service. The list is not meant to be definitive, and the rationale for each reason requires further work.
Reason 1: Schools, colleges and universities generate and hold a very large volume of textual data that can be pooled and labelled.
Schools, colleges and universities generate and hold a large volume of data in the form of text. This includes online course materials on a learning management system, student work, teachers' feedback to their students, comments made by teachers and campus support teams as they support students through their studies, documents on an intranet, complaints made by students, responses to student surveys that record opinions about completed courses or campus services, and much more. At present, the vast majority of this textual data is not labelled or annotated, and it lies dormant on the platforms that store it. If this large body of text is labelled and annotated, value can be derived through the creation of new AI in education (AIED) services that support students at all points of the student life cycle.
Reason 2: Opportunity to establish good ethical practices that can be applied to the collection, annotation and use of training data.
At first sight, the activities that support data labelling and text annotation may appear simple, but there are many issues that need to be addressed if the work is to be conducted ethically. For example, we need to guard against unethical employment practices in the hiring of data labellers to support text annotation projects. Data labellers will be subject specialist teachers or domain experts, so the monetary reward for carrying out data labelling needs to be commensurate with the work that they are undertaking. Some data labelling and text annotation projects may involve the handling of sensitive material, such as text about self-harm, mental health or suicide. In these instances, support will need to be provided to the individuals involved in such projects. Guidance and practice will need to be in place to guard against bias, which may lead to discriminatory practices against individuals or against specific groups of students and teachers. For example, the labelling of postal codes, gender, race, age, learning support needs and more will all need to be addressed. In many cases, red lines will need to be drawn about what is not acceptable practice.
Reason 3: Opportunity to establish good governance arrangements for managing training data across the education sector.
By establishing a national text labelling and annotation service, good governance arrangements could standardise the labelling and annotation processes, both for small-scale AIED projects and for national AIED projects that are designed to support hundreds of thousands of students across the UK. If textual data is supplied by thousands of education institutions across the UK, appropriate governance arrangements will need to be put in place to enable the delivery of AIED services that are trusted by everyone who comes to use them, and by the institutions that supply text to the text labelling and annotation service. The lack of a national governance framework for text labelling and annotation services could lead to a fragmented landscape, with individual organisations and companies designing and implementing their data labelling projects in isolation from their counterparts and with little or no interoperability between one service and another. A national governance framework would also provide advice and guidance on controversial annotation projects, such as those that involve facial recognition, use sensitive datasets or do not align with the needs of the education sector. Recent reports from UNESCO and Jisc begin to address the ethical concerns of designing, using and managing AIED services, but more guidance is still needed on data labelling and text annotation.
Reason 4: A large and growing library of annotated text could support the creation of multiple AIED services that address common problems faced by educational institutions.
One of the factors that contribute to complexity is the growing volume of data that permeates our education institutions. This is especially true if we see every facet of a campus as an entity that produces, manipulates or consumes data. The traditional tools that are used to manage data, such as management information systems, CRM systems, business intelligence systems, learning management systems and data dashboards, all too often fall short. In many cases, they exacerbate the problems that are associated with complexity and urgency. This problem was noted during the 1950s by Norbert Wiener, who stated that whilst communication mechanisms do become more efficient, they are subject to increasing levels of entropy. Wiener also stated that external agents could be introduced to control entropy. The agents that support the day-to-day operations of an AIED service are designed to carry out narrow and well-defined cognitive tasks and, in doing so, they reduce the level of complexity on our campuses. Mark Weiser and John Seely Brown later coined the term calm technology to describe the nature of these services (Hussain, 2019). One can easily envisage how a large pool of labelled and annotated textual data could be used as a basis for creating new AIED services. A pool of annotated data would also reduce the time and costs that are associated with developing these services.
Reason 5: The establishment of a national text labelling and annotation service supports the creation of new AIED services through a shared, participative and collaborative model.
If cognitive computing is present in every digital service that students or teachers touch, and if we see technology as a human activity, then each digital service will itself be shaped and altered by these interactions. This is one of the defining traits of AIED services, and one that sets them apart from legacy EdTech services. The means of production for AIED services depend on leveraging the strengths inherent in large numbers of people who fulfil tasks that were previously carried out by the few. The participatory model lends itself particularly well to the education sector, where the larger group is motivated towards shared goals. If this is the case, AIED services can indeed be developed with a communal spirit. I personally welcome this spirit of cooperation, participation and collaboration (Hussain, 2021).