The issues with data
Jotsna Iyer, 2023-01-04T08:02:48+00:00
This page is referenced by:
Open or closed?
A section for chapter 8
2023-02-09T17:18:47+00:00
Open Educational Resources (OER) and their history
Educational resources refer to any material, nowadays mostly digital, that plays a part in education: textbooks, slides, curricula, exams, and so on. They are open when they can be freely shared with others (a more exact definition is given below).
Even though education has been open in many respects at several moments in history, the actual terms are now better understood. The following definitions of OER and Open License were revised in connection with the UNESCO Recommendation of November 25, 2019 (see note 3):
1. Open educational resources (OER) are learning, teaching, and research materials in any format and medium that reside in the public domain or are under copyright that has been released under an open license and that permit no-cost access, reuse, repurpose, adaptation, and redistribution by others.
2. An open license is a license that respects the intellectual property rights of the copyright owner and provides permissions granting the public the right to access, reuse, repurpose, adapt, and redistribute educational materials.
The terms open content and OER refer to any copyrightable work (traditionally excluding software, which is described by other terms such as open source) that is licensed to grant the following rights (also known as the 5 Rs)1,2:
• to Retain - the right to make, possess, and control copies of the content (e.g., download, reproduce, store, and manage).
• to Reuse - the right to use the content in a variety of ways (e.g., in class, in a study group, on a website, in a video).
• to Revise - the right to adapt, adjust, modify, or alter the content itself (e.g., translate the content into another language).
• to Remix - the right to combine the original or revised content with other material to create something new (e.g., to embed the content).
• to Redistribute - the right to distribute copies of the original content, revisions, or their combination to others.
It should be noted that these rights are non-trivial. For example, the third right is essential for teachers: it allows them to take someone else's learning material and adapt it to their own purpose, to the duration and level of their class, and perhaps to geographic and cultural specificities.
Why AI wants open data
On the other hand, as demonstrated in different parts of this book, and also by the financial investments of the industry, education can be seen as a market. And as machine learning is the principal force driving artificial intelligence, it is fair to deduce that, to thrive, AI for education will need data.
The difference between user data and knowledge data
The data that AI for education needs is of two types.
The first type is data about the users. How do they learn? What triggers good learning? What helps them learn better? As Daphne Koller once put it: ‘Let's make education science into a data science!’
This data can only be produced by the users themselves. It is therefore essential for companies to own platforms with which users are asked to interact. This has been the key to the success of many AI companies and will be the key to success in education.
The second type of data concerns knowledge. In education, courseware represents a large chunk of this knowledge. This data may or may not be shared: in most cases, knowledge creators or collectors know little about licenses, and the material they have produced remains hidden in university repositories, on obscure blogs, or inside specific groups on social networks. Some of this knowledge is of course behind paywalls, and some is on sites whose business model involves offering the knowledge for free, but in a setting in which one has to view adverts and unwanted publicity to get or maintain access.
User data has to be protected
In the first case, the data (the user data) has to be protected, all the more so if it belongs to underage pupils. This means that the school or the teacher should not share this data with platforms unless they are explicitly allowed to do so, even when the platform does offer some interesting service. In a similar way, it is never a good idea to register the names and addresses of one's pupils in order to participate in some activity. Please refer to the video about data and reidentification for details.
The European Union has provided a robust framework to protect its citizens, their privacy and their digital rights. This is called the GDPR. The GDPR protects citizens by giving them rights that have to be granted by the platforms, whether these are for education or not.
Knowledge data should be shared
On the other hand, knowledge can be shared, and should be shared. Obviously, this is only possible when one has the right to do so, which means understanding how licensing works. Creative Commons licenses are usually those that work best for OER.
Once OER are shared, artificial intelligence can be used for many things, such as those present in project X5-GON.
------------------------------------------------------------------------------------------------------
1 Wiley, D., & Hilton, J. (2018). Defining OER-enabled pedagogy. International Review of Research in Open and Distance Learning, 19(4).
2 Wiley, D. (2014). The Access Compromise and the 5th R.
3 UNESCO. (2019). Recommendation on open educational resources (OER).
2023-01-04T08:02:47+00:00
AI Speak : Data based systems Part 1
Decisions in the classroom
As a teacher, you have access to many kinds of data: tangible data, like attendance and performance records, and intangible data, like student body language. Consider some of the decisions you take in your professional life: what data helps you make these decisions?
There are technological applications that can help you visualise or process data. Artificial intelligence systems use data to personalise learning and to make predictions and decisions that might help you teach and manage the classroom. Do you have needs that technology can answer? If so, what data would such a system require to carry out the task?
Educational systems have always generated data: student personal data, academic records, attendance data and more. With digitalisation and AIED applications, more data is recorded and stored: mouse clicks, opened pages, timestamps and keyboard strokes.1 With data-centric thinking becoming the norm in society, it is natural to ask how to crunch all this data to do something pertinent. Could we give more personalised feedback to the learner? Could we design better visualisation and notification tools for the teacher?2
Whatever the technology used, it has to meet a real requirement in the classroom. After the need is identified, we can look at the data available and ask what is relevant to a desired outcome. This involves uncovering the factors that let educators make nuanced decisions. Can these factors be captured using available data? Are data and data-based systems the best way of addressing the need? What could be the unintended consequences of using data this way?3
Machine learning lets us defer many of these questions to the data itself.4 ML applications are trained on data. They work by operating on data. They find patterns and make generalisations and store these as models - data that can be used to answer future questions.4 Their decisions and predictions, and how these affect student learning, are all data too. Thus, knowing how programmers, the machine and the user handle data is an important part of understanding how artificial intelligence works.
About Data
Data is generally about a real-world entity: a person, an object, or an event. Each entity can be described by a number of attributes (also called features or variables).5 For example, name, age and class are some attributes of a student. The set of these attributes is the data we have on the student which, while not in any way close to the real entity, does tell us something about them. Data collected, used and processed in the educational system is called educational data.1
A dataset is the data on a collection of entities, arranged in rows and columns. The attendance record of a class is a dataset. Each row is the record of one student. The columns could record their presence or absence during a particular day or session. Thus each column is an attribute.
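To make the row and column picture concrete, here is a minimal sketch in Python. The student labels, session names and the helper functions `column` and `present_count` are hypothetical and chosen only for illustration; a real school system would store attendance differently.

```python
# Hypothetical attendance record: one row per student (entity),
# one column per session (attribute). All names are made up.
attendance = [
    {"student": "Student A", "Mon": True, "Tue": False, "Wed": True},
    {"student": "Student B", "Mon": True, "Tue": True, "Wed": True},
    {"student": "Student C", "Mon": False, "Tue": True, "Wed": True},
]


def column(dataset, attribute):
    """Return the values of one attribute (one column) for every row."""
    return [row[attribute] for row in dataset]


def present_count(dataset, session):
    """Count how many students were present in the given session."""
    return sum(column(dataset, session))


for row in attendance:
    # Each row is the record of one student.
    print(row["student"], [row[s] for s in ("Mon", "Tue", "Wed")])

print("Present on Monday:", present_count(attendance, "Mon"))
```

Reading across a row tells us about one student; reading down a column tells us about one attribute for the whole class.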
Data is created by choosing attributes and measuring them: every piece of data is the result of human decisions and choices. Thus, data creation is a subjective, partial and messy process prone to technical difficulties.4,5 Further, what we choose to measure, and what we don't, can have a big influence on expected outcomes.
Data traces are records of student activity such as mouse clicks, data on opened pages, the timing of interactions or key presses in a digital system.1 Metadata is data that describes other data.5 Derived data is data calculated or inferred from other data: the individual scores of each student are data, and the class average is derived data. Often, derived data is more useful for gaining insights, finding patterns and making predictions. Machine learning applications can create derived data and link it with metadata and data traces to create detailed learner models, which help in personalising learning.1
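The difference between data and derived data can be sketched in a few lines of Python. The scores, event names and timestamps below are invented for illustration, and the variable names are assumptions rather than part of any real learning platform.

```python
from datetime import datetime
from statistics import mean

# Data: hypothetical individual scores, one per student.
scores = {"Student A": 12, "Student B": 15, "Student C": 9}

# Derived data: the class average is calculated from the scores.
class_average = mean(scores.values())

# More derived data: how far each student is from the average.
deviation = {name: score - class_average for name, score in scores.items()}

# Data trace: hypothetical timestamped events from a digital system.
trace = [
    {"event": "page_opened", "at": "2023-01-04T08:00:00"},
    {"event": "answer_submitted", "at": "2023-01-04T08:02:30"},
]

# Derived data from the trace: time between the first and last event.
start = datetime.fromisoformat(trace[0]["at"])
end = datetime.fromisoformat(trace[-1]["at"])
seconds_on_task = (end - start).total_seconds()

print("Class average:", class_average)
print("Deviation from average:", deviation)
print("Seconds on task:", seconds_on_task)
```

None of the derived values were measured directly; each one depends on choices about which raw data to combine and how, which is why such values should be checked against the educational context.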
For any data based application to be successful, attributes should be carefully chosen and correctly measured. The patterns discovered in them should be checked to see if they make sense in the educational context. When designed and maintained correctly, data driven systems can be very valuable.
This chapter aims to introduce a few basics of data and data-based technology. However, data literacy is a very important skill that merits dedicated training, continuing support and regular updates.1
Legislation you should know about
Because of the drastic drop in the cost of data storage, more data and metadata are saved and retained for a longer time.6 This can lead to privacy breaches and rights violations. Laws like the General Data Protection Regulation (GDPR) discourage such practices and give EU citizens more control over their personal data. They provide legally enforceable data protection rules across all EU member states.
According to the GDPR, personal data is any information relating to an identified or identifiable person (the data subject). Schools, in addition to engaging with companies that handle their data, store huge amounts of personal information about students, parents, staff, management, and suppliers. As data controllers, they are required to store the data they process confidentially and securely, and to have procedures in place for the protection and proper use of all personal data.1
Rights established by the GDPR include:
- The Right to Access, which entitles citizens to find out easily what data is being collected about them
- The citizen’s Right to Be Informed of the usage made of their data
- The Right to Erasure, which allows a citizen whose data has been collected by a platform to ask for that data to be removed from the dataset built by the platform (and which may be sold to others)
- The Right to Explanation, under which citizens should be given an explanation whenever they need clarification on automated decision processes that affect them
Please refer to GDPR for dummies for the analysis done by independent experts from the Civil Liberties Union for Europe (Liberties), which is a watchdog that safeguards the human rights of everyone in the European Union.
------------------------------------------------------------------------------------------------------
1 Ethical guidelines on the use of artificial intelligence and data in teaching and learning for educators, European Commission, October 2022
2 du Boulay, B., Poulovassilis, A., Holmes, W., Mavrikis, M., Artificial Intelligence And Big Data Technologies To Close The Achievement Gap, in Luckin, R., ed. Enhancing Learning and Teaching with Technology, London: UCL Institute of Education Press, pp. 256–285, 2018
3 Hutchinson, B., Smart, A., Hanna, A., Denton, E., Greer, C., Kjartansson, O., Barnes, P., Mitchell, M., Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure, Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, New York, 2021
4 Barocas, S., Hardt, M., Narayanan, A., Fairness and machine learning: Limitations and Opportunities, yet to be published
5 Kelleher, J.D, Tierney, B, Data Science, London, 2018
6 Schneier, B., Data and Goliath: The Hidden Battles to Capture Your Data and Control Your World, W. W. Norton & Company, 2015
7 Kant, T., Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your “Ideal User.”, MIT Case Studies in Social and Ethical Responsibilities of Computing, 2021