Artificial Intelligence for Teachers: An Open Book

All that data: Personal Identity, Bias and Fairness


 

Personal identity

Learning to use technology to improve certain outcomes in education also means compensating for the aspects of education and well-being that technology could potentially undermine. The problem with data-centric systems is that numbers and performance metrics can supersede other considerations. We could get so carried away by the benefits of personalisation that the process of learning itself takes a back seat.

Ethical guidelines for educators remind us to consider people, their identity, integrity and dignity: “approach people with respect of their intrinsic value and not as a data object or a means-to-an-end”. The human-centric approach to AI is to keep in mind that people are not just their data; the label that a piece of software might give students in order to personalise learning pathways or to split them into groups is not their real identity.

https://mit-serc.pubpub.org/pub/identity-advertising-and-algorithmic-targeting#n0mak79jtsj
ACTIVITY

Adapted from “Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your ‘Ideal User’”, licensed under CC BY-NC 4.0 (Kant, T., MIT Case Studies in Social and Ethical Responsibilities of Computing, 2021).
 

Feminist, Nerd, Alcoholic? Meeting Your “Algorithmic Self”

Google and Facebook provide “Ad Preferences” pages that allow users to see their explicitly and implicitly inferred demographics and interests. These are glimpses of what Kylie Jarrett and John Cheney-Lippold call our “algorithmic identities.”
Figure 1. A selection of the author’s “Google Ad Preferences.”

Access your “Ad Settings” profile on Google, Facebook or Instagram. If you regularly use another platform, try to find out whether it offers ad settings and whether you can access them.
Questions for discussion:

Frequently, the categories created by machine learning models might not even correspond to recognisable attributes or social categories that humans would understand, let alone identify with. For example, the category “likes math” might be recognisable to the user, while the category “clicked a resource shown on the screen” would be less meaningful to a human but important for an adaptive learning system when making inferences about student preferences (Milano, S., Taddeo, M., Floridi, L., Recommender systems and their ethical challenges, AI & Soc 35, 957–967, 2020).

While using a data-based system can be beneficial in a number of ways, educators need to check and reorient themselves and their students regularly towards the human values that sustain society, ensure well-being and foster true learning.
 

Bias and Fairness


MIT Press, Data Science: "Bias in Data Science

The goal of ML is to create models that encode appropriate generalizations from data sets. Two major factors contribute to the generalization (or model) that an ML algorithm will generate from a data set. The first is the data set the algorithm is run on. If the data set is not representative of the population, then the model the algorithm generates won’t be accurate. For example, earlier we developed a regression model that predicted the likelihood that an individual will develop Type 2 diabetes based on his BMI. This model was generated from a data set of American white males. As a consequence, this model is unlikely to be accurate if used to predict the likelihood of diabetes for females or for males of different race or ethnic backgrounds. The term sample bias describes how the process used to select a data set can introduce biases into later analysis, be it a statistical analysis or the generation of predictive models using ML.

The second factor that affects the model generated from a data set is the choice of ML algorithm. There are many different ML algorithms, and each one encodes a different way to generalize from a data set. The type of generalization an algorithm encodes is known as the learning bias (or sometimes the modeling or selection bias) of the algorithm. For example, a linear-regression algorithm encodes a linear generalization from the data and as a result ignores nonlinear relationships that may fit the data more closely. Bias is normally understood as a bad thing. For example, the sampling bias is a bias that a data scientist will try to avoid. However, without a learning bias there can be no learning, and the algorithm will only be able to memorize the data.

However, because ML algorithms are biased to look for different types of patterns, and because there is no one learning bias across all situations, there is no one best ML algorithm. In fact, a theorem known as the “no free lunch theorem” (Wolpert and Macready 1997) states that there is no one best ML algorithm that on average outperforms all other algorithms across all possible data sets. So the modeling phase of the CRISP-DM process normally involves building multiple models using different algorithms and comparing the models to identify which algorithm generates the best model. In effect, these experiments are testing which learning bias on average produces the best models for the given data set and task."
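The following short Python sketch (our own illustration, not part of the quoted book) makes sample bias tangible. It uses entirely synthetic data and the scikit-learn library; the numbers, group names and variable names are invented for the example. The same linear model is trained once on a single subgroup and once on a representative random sample, and both are then evaluated on the whole population.

```python
# A minimal sketch of sample bias, assuming synthetic data and scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic population: two groups whose outcome depends differently on the feature.
n = 10_000
group = rng.integers(0, 2, size=n)            # 0 = group A, 1 = group B
x = rng.normal(25, 4, size=n)                 # a BMI-like feature (invented scale)
y = np.where(group == 0, 0.8 * x, 0.3 * x + 10) + rng.normal(0, 1, size=n)
X = x.reshape(-1, 1)

# Model trained only on group A: an unrepresentative sample.
mask_a = group == 0
biased_model = LinearRegression().fit(X[mask_a], y[mask_a])

# Model trained on a representative random sample of the same size.
idx = rng.choice(n, size=mask_a.sum(), replace=False)
representative_model = LinearRegression().fit(X[idx], y[idx])

print("error on whole population, trained on group A only:",
      round(mean_absolute_error(y, biased_model.predict(X)), 2))
print("error on whole population, trained on random sample:",
      round(mean_absolute_error(y, representative_model.predict(X)), 2))
```

On most runs the model trained only on group A shows a noticeably larger error on the full population, which is exactly the effect the quoted passage describes.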


"Following Yao and Huang (2017), Farnadi
et al. (2018) also identify the two primary sources of
bias in recommender systems with two problematic patterns
of data collection, namely observation bias, which results
from feedback loops generated by the system’s recommendations
to specific groups of users, and population imbalance,
where the data available to the system reflect existing social
patterns expressing bias towards some groups. They propose
a probabilistic programming approach to mitigate the system’s
bias against protected social groups." Milano, S., Taddeo, M., Floridi, L. Recommender systems and their ethical challenges, AI & Soc 35, 957–967, 2020

"Our historical examples of the relevant outcomes will almost
always reflect historical prejudices against certain social groups, prevailing cultural
stereotypes, and existing demographic inequalities. And finding patterns in these
data will often mean replicating these very same dynamics."FML

"There’s a more traditional use of the term bias in statistics and machine learning.
Suppose that Amazon’s estimates of delivery dates/times were consistently too
early by a few hours. This would be a case of statistical bias. A statistical estimator
is said to be biased if its expected or average value differs from the true value that
it aims to estimate. Statistical bias is a fundamental concept in statistics, and there
is a rich set of established techniques for analyzing and avoiding it.......Is our goal to faithfully reflect the data? Or do we have
an obligation to question the data, and to design our systems to conform to some
notion of equitable behavior, regardless of whether or not that’s supported by the
data currently available to us? These perspectives are often in tension, and the
difference between them will become clearer when we delve into stages of machine
learning" FML
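The notion of statistical bias can be checked numerically. The simulation below is a hypothetical illustration of the delivery-time example above (synthetic numbers, not real data): an estimator that is systematically about three hours early has an average error close to minus three hours, even though its individual errors vary.

```python
# A small simulation of statistical bias, loosely following the delivery-time
# example above. All numbers are synthetic.
import numpy as np

rng = np.random.default_rng(1)
true_delivery_hours = rng.normal(48, 6, size=100_000)            # actual delivery times
estimates = true_delivery_hours - 3 + rng.normal(0, 2, size=100_000)  # systematically ~3 h early

bias = np.mean(estimates - true_delivery_hours)
print(f"estimated bias: {bias:.2f} hours")  # close to -3: the estimator is biased,
                                            # even though individual errors vary
```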

"One area where machine learning practitioners often have to define new categories is in defining the target variable.8 This is the outcome that we’re trying to
predict – will the defendant recidivate if released on bail? Will the candidate be a
good employee if hired? And so on.
Biases in the training set’s target variable are especially critical, because they
are guaranteed to bias the predictions (not necessarily so with other attributes).
But the target variable is arguably the hardest from a measurement standpoint,
because it is often a construct that is made up for the purposes of the problem
at hand rather than one that is widely understood and measured. For example,
“creditworthiness” is a construct that was created in the context of the problem of
how to successfully extend credit to consumers;8
it is not an intrinsic property that
people either possess or lack.If our target variable is the idea of a “good employee”, we might use performance review scores to quantify it. This means that our data inherits any biases
present in managers’ evaluations of their reports. Another example: the use of
computer vision to automatically rank people’s physical attractiveness.9, 10 The
training data consists of human evaluation of attractiveness, and, unsurprisingly,
all these classifiers showed a preference for lighter skin." FML
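To see how a biased target variable propagates, here is a hedged, synthetic sketch: we invent a true "skill", derive a "performance review" label that systematically under-rates one group, and train a simple classifier on that label. The group labels, the size of the penalty and the choice of scikit-learn model are all assumptions made purely for the illustration.

```python
# Illustrative sketch (synthetic data): when the target variable is a biased
# proxy ("performance review score") rather than the quantity we care about
# ("true skill"), a model trained on it reproduces the bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000
group = rng.integers(0, 2, size=n)                     # two synthetic groups, 0 and 1
skill = rng.normal(0, 1, size=n)                       # true, unobserved quality
features = np.c_[skill + rng.normal(0, 0.5, n), group]

# Reviews under-rate group 1 by a fixed penalty: the label, not the skill, is biased.
review_score = skill - 0.7 * group + rng.normal(0, 0.5, n)
label = (review_score > 0).astype(int)                 # "good employee" target

model = LogisticRegression().fit(features, label)
pred = model.predict(features)
for g in (0, 1):
    print(f"group {g}: true 'good' rate {np.mean(skill[group == g] > 0):.2f}, "
          f"predicted 'good' rate {pred[group == g].mean():.2f}")
```

Both groups have roughly the same true rate, but the model, faithfully fitting the biased label, predicts far fewer "good" outcomes for the penalised group.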

"When we go from individual images to datasets of images, we introduce another
layer of potential biases. Consider the image datasets that are used to train today’s
computer vision systems for tasks such as object recognition. If these datasets
were representative samples of an underlying visual world, we might expect that
a computer vision system trained on one such dataset would do well on another
dataset. But in reality, we observe a big drop in accuracy when we train and test
on different datasets.15 This shows that these datasets are biased relative to each
other in a statistical sense, and is a good starting point for investigating whether
these biases include cultural stereotypes." FML
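The train-on-one-dataset, test-on-another effect can be simulated in a few lines. The sketch below is purely synthetic and does not use any real image data: two datasets are meant to describe the same task but are "collected" differently, and the accuracy of a classifier trained on the first drops sharply on the second.

```python
# Sketch of the dataset-shift effect described above, with synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)

def make_dataset(mu0, mu1, n=5_000):
    # Two classes with different feature means; mu0/mu1 stand in for how the
    # data were collected.
    y = rng.integers(0, 2, size=n)
    means = np.where(y[:, None] == 0, mu0, mu1)
    X = rng.normal(means, 1.0, size=(n, 5))
    return X, y

XA, yA = make_dataset(0.0, 2.0)   # "dataset A": well separated classes
XB, yB = make_dataset(0.8, 1.2)   # "dataset B": same task, different collection

clf = LogisticRegression(max_iter=1000).fit(XA, yA)
print("accuracy when tested on A:", round(accuracy_score(yA, clf.predict(XA)), 3))
print("accuracy when tested on B:", round(accuracy_score(yB, clf.predict(XB)), 3))
```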

"Predictive models trained with supervised learning methods are often good
at calibration: ensuring that the model’s prediction subsumes all features in the
data for the purpose of predicting the outcome. By contrast, human intuition is
notoriously poor at accounting for priors, and this is a major reason that statistical
predictions perform better in a wide variety of settings. But calibration also means
that by default, we should expect our models to faithfully reflect disparities found
in the input data.
Here’s another way to think about it. Some patterns in the training data (smoking is associated with cancer) represent knowledge that we wish to mine using
machine learning, while other patterns (girls like pink and boys like blue) represent
stereotypes that we might wish to avoid learning. But learning algorithms have
no general way to distinguish between these two types of patterns, because they
are the result of social norms and moral judgments. Absent specific intervention,
machine learning will extract stereotypes, including incorrect and harmful ones, in
the same way that it extracts knowledge."

"Finally, it’s also possible for the learning step to introduce demographic disparities that aren’t in the training data. The most common reason for this is the
sample size disparity. If we construct our training set by sampling uniformly from
the training data, then by definition we’ll have fewer data points about minorities.
Of course, machine learning works better when there’s more data, so it will work
less well for members of minority groups, assuming that members of the majority
and minority groups are systematically different in terms of the prediction task.23
Worse, in many settings minority groups are underrepresented relative to
population statistics. For example, minority groups are underrepresented in the
tech industry. Different groups might also adopt technology at different rates,
which might skew datasets assembled form social media. If training sets are drawn
from these unrepresentative contexts, there will be even fewer training points from
minority individuals. For example, many products that incorporate face-detection
technology have been reported to have trouble with non-Caucasian faces, and it’s
easy to guess why.23
When we develop machine-learning models, we typically only test their overall
accuracy; so a “5% error” statistic might hide the fact that a model performs
terribly for a minority group. Reporting accuracy rates by group will help alert us
to problems like the above example. In the next chapter, we’ll look at metrics that
quantify the error-rate disparity between groups." FML
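Reporting accuracy by group, as the passage recommends, is straightforward to do. The sketch below uses invented error rates to show how an overall figure of roughly 5% error can coexist with a 25% error rate for a small minority group.

```python
# Per-group error reporting: an overall error rate can hide a much higher
# error rate for a small group. Error rates here are assumed, not measured.
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
group = rng.choice(["majority", "minority"], size=n, p=[0.95, 0.05])

# Assume the model is wrong 4% of the time for the majority, 25% for the minority.
error = np.where(group == "majority",
                 rng.random(n) < 0.04,
                 rng.random(n) < 0.25)

print(f"overall error rate: {error.mean():.3f}")          # looks like roughly 5%
for g in ("majority", "minority"):
    print(f"{g} error rate: {error[group == g].mean():.3f}")
```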

"There’s one application of machine learning where we find especially high
error rates for minority groups: anomaly detection. This is the idea of detecting
behavior that deviates from the norm as evidence of abuse against a system. A
good example is the Nymwars controversy, where Google, Facebook, and other tech
companies aimed to block users who used uncommon (hence, presumably fake)
names.
Further, suppose that in some cultures, most people receive names from a small
set of names, whereas in other cultures, names might be more diverse, and it might
be common for names to be unique. For users in the latter culture, a popular name
12
would be more likely to be fake. In other words, the same feature that constitutes
evidence towards a prediction in one group might constitute evidence against the
prediction for another group.23
If we’re not careful, learning algorithms will generalize based on the majority
culture, leading to a high error rate for minority groups. This is because of the
desire to avoid overfitting, that is, picking up patterns that arise due to random
noise rather than true differences. One way to avoid this is to explicitly model
the differences between groups, although there are both technical and ethical
challenges associated with this, as we’ll show in later chapters"
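The name-rarity example can also be simulated. In the hypothetical sketch below, a single "rare name means suspicious" rule is applied to two synthetic populations, one drawing names from a small pool of common names and one where most names are unique; the false-positive rates differ dramatically.

```python
# Sketch of the name-rarity problem: one rarity rule, two very different
# false-positive rates. Populations and thresholds are invented.
import numpy as np

rng = np.random.default_rng(5)

def false_positive_rate(n_users, n_distinct_names, rarity_threshold=2):
    # All users here are genuine; names are drawn from the culture's name pool.
    names = rng.integers(0, n_distinct_names, size=n_users)
    values, counts = np.unique(names, return_counts=True)
    count_of = dict(zip(values, counts))
    flagged = np.array([count_of[name] < rarity_threshold for name in names])
    return flagged.mean()

print("culture with few, common names :", false_positive_rate(10_000, 200))
print("culture with many unique names :", false_positive_rate(10_000, 50_000))
```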

"Most attempts to “debias” machine learning in the current research
literature assume simplistic mathematical systems, often ignoring the effect of
algorithmic interventions on individuals and on the long-term state of society"FML

"Despite these important limitations, there are reasons to be cautiously optimistic
about fairness and machine learning. First, data-driven decision-making has
the potential to be more transparent compared to human decision-making. It
forces us to articulate our decision-making objectives and enables us to clearly
understand the tradeoffs between desiderata. However, there are challenges to
overcome to achieve this potential for transparency. One challenge is improving the
interpretability and explainability of modern machine learning methods, which is
a topic of vigorous ongoing research. Another challenge is the proprietary nature
of datasets and systems that are crucial to an informed public debate on this topic.
Many commentators have called for a change in the status quo.42
Second, effective interventions do exist in many machine learning applications,
especially in natural-language processing and computer vision. Tasks in these
domains (say, transcribing speech) are subject to less inherent uncertainty than
traditional decision-making (say, predicting if a loan applicant will repay), removing
some of the statistical constraints that we’ll study in Chapter 3.
Our final and most important reason for optimism is that the turn to automated
decision-making and machine learning offers an opportunity to reconnect with the
moral foundations of fairness. Algorithms force us to be explicit about what we
want to achieve with decision-making. And it’s far more difficult to paper over our
poorly specified or true intentions when we have to state these objectives formally.
In this way, machine learning has the potential to help us debate the fairness of
different policies and decision-making procedures more effectively."

Legitimacy

Legitimacy concerns whether it is fair to deploy such a system at all, that is, whether it is morally justifiable to use machine learning or automated methods at all. (FML)

"Machine learning is not a replacement for _human_ decision making : While it’s true that machine learning models might be difficult for people
to understand, humans are black boxes, too. And while there can be systematic
bias in machine learning models, they are often demonstrably less biased than
humans.
We reject this analogy of machine learning to human decision making. By
understanding why it fails and which analogies are more appropriate, we’ll develop
a better appreciation for what makes machine learning uniquely dangerous as a
way of making high-stakes decisions.....As Katherine Strandburg has argued, “[r]eason giving is a core
requirement in conventional decision systems precisely because human decision
makers are inscrutable and prone to bias and error, not because of any expectation
that they will, or even can, provide accurate and detailed descriptions of their
thought processes”.2
In analogizing machine learning to bureaucratic — rather than individual —
decision making, we can better appreciate the source of some of the concerns about
machine learning. When it is used in high-stakes domains, it undermines the kinds
of protections that we often put in place to ensure that bureaucracies are engaged
in well-executed and well-justified decision making" FML

"This principle is premised on the belief that people are entitled to similar
decisions unless there are reasons to treat them differently (we’ll soon address
what determines if these are good reasons). Consistency is especially important for
consequential decisions. Inconsistency is also problematic when it prevents people
from developing effective life plans based on expectations about the decisionmaking systems they must navigate in order to obtain desirable resources and
opportunities.57 Thus, inconsistent decision making is unfair both because it might
result in unjustified differential treatment of similar individuals and also because
it is a threat to individual autonomy by preventing people from making effective
decisions about how best to pursue their life goals.
The second view of arbitrariness is getting at a deeper concern: are there good
reasons — or any reasons — why the decision-making scheme looks the way that
it does? For example, if a coach picks a track team based on the color of runners’
sneakers, but does so consistently, it is still arbitrary because the criterion lacks a
valid basis. It does not help advance the decision maker’s goals (e.g., assembling a
team of runners that will win the upcoming meet).
Arbitrariness, from this perspective, is problematic because it undermines a
bedrock justification for the chosen decision-making scheme: that it actually helps
to advance the goals of the decision maker. If the decision-making scheme does
nothing to serve these goals, then there is no justified reason to have settled on
that decision-making scheme — and to treat people accordingly. When desirable
resources and opportunities are allocated arbitrarily, it needlessly subjects individuals to different decisions, despite the fact that all individuals may have equal
interest in these resources and opportunities.....Arbitrary decision making fails to respect the gravity of these decisions and
shows a lack of respect for the people subject to them. Even if we accept that we
cannot dictate the goals of institutions, we still object to giving them complete
freedom to treat people however they like. When the stakes are sufficiently high,
decision makers bear some burden for justifying their decision-making schemes
out of respect for the interests of people affected by these decisions. The fact that
people might try their luck with other decision makers in the same domain (e.g.,
another employer, lender, or admission officer) may do little to modulate these
expectations."FML

Teacher in the loop
"Automation requires that an institution determine in advance all of
the criteria that a decision-making scheme will take into account; there is no room
to consider the relevance of additional details that might not have been considered
or anticipated at the time that the software was developed.
Automated decision-making is thus likely to be much more brittle than decisionmaking that involves manual review because it limits the opportunity for decision
subjects to introduce information into the decision-making process. People are
confined to providing evidence that corresponds to a pre-established field in the
software. Such constraints can result in absurd situations in which the strict
application of decision-making rules leads to outcomes that are directly counter to
the goals behind these rules. New evidence that would immediately reverse the
assessment of a human decision maker may have no place in automated decision
making. Discretion is valuable in these cases because humans are often able to
reflect on the relevance of additional information to the decision at hand and the
underlying goal that such decisions are meant to serve. In effect, human review
leaves room to expand the criteria under consideration and to reflect on when the
mechanical application of the rules fails to serve their intended purpose" FML

"Decision makers might have a pre-existing but informal process for making decisions, and they might like to automate it. In this case, machine learning (or other
statistical techniques) might be employed to “predict” how a human would make
a decision, given certain attributes or criteria. The goal isn’t necessarily to perfectly
recover the specific weight that past decision makers had implicitly assigned to
different criteria, but rather to ensure that the model produces a similar set of
decisions to the humans. An educational institution might want to automate the
process of grading essays, and it might attempt to do that by relying on machine
learning to learn to mimic teachers’ grades in the past.
This form of automation might help to address concerns with arbitrariness in
human decision making by formalizing and fixing a decision-making scheme similar to what humans might have been employing in the past. In this respect, machine
learning might be desirable because it can help to smooth out any inconsistencies
in the human decisions from which it has induced some decision-making rule.5
Automation is a natural way to address concerns with arbitrariness understood as
inconsistency.6
A few decades ago, there was a popular approach to automation that relied
on explicitly encoding the reasoning that humans relied on to make decisions.68
This approach, called expert systems, failed for many reasons, including the fact
that people aren’t always able to explain their own reasoning.69 Expert systems
were eventually abandoned in favor of simply asking people to label examples
and having learning algorithms discover how to best predict the label that humans
would assign. While this approach has proved powerful, it has its dangers.
First, it may give the veneer of objective assessment to decision-making schchemes
that simply automate the subjective judgment of humans......Specifically, humans will need to exercise discretion
in specifying and identifying what counts as an example of the target.70 This
approach runs the obvious risk of replicating and exaggerating any objectionable
qualities of human decision making by learning from the bad examples set by
humans.................;Even when the model is able to reliably predict the decisions that
humans would make given any particular input, there is no guarantee that the
model will have inherited all of the nuance and considerations that go into human
decision-making. Worse, models might also learn to rely on criteria in ways that
humans would find worrisome or objectionable, even if doing so still produces a
similar set of decisions as humans would make.72
In short, the use of machine learning to automate decisions previously performed by humans can be problematic both because it can end up being both too
much like human decision makers and too different from them."FML

"The final form of automation is one in which decision makers rely on machine
learning to learn a decision-making rule or policy from data. This form of automation, which we’ll call predictive optimization, speaks directly to concerns with
reasoned decision making. Note that neither of the first two forms of automation
does so. Consistently executing a pre-existing policy via automation does not
ensure that the policy itself is a reasoned one. Nor does relying on past human
decisions to induce a decision-making rule guarantee that the basis for automated
decision making will reflect reasoned judgments. In both cases, the decision making scheme will only be as reasoned as the formal policy or informal judgments
whose execution is being automated.
In contrast, predictive optimization tries to provide a more rigorous foundation
for decision making by only relying on criteria to the extent that they demonstrably
predict the outcome or quality of interest. When employed in this manner, machine
learning seems to ensure reasoned decisions because the criteria that have been
incorporated into the decision making scheme — and their particular weighing —
are dictated by how well they predict the target. And so long as the chosen target
is a good proxy for decision makers’ goals, relying on criteria that predict this
target to make decisions would seem well reasoned because doing so will help to
achieve decision makers’ goals." FML

"First, the
decision makers try to identify an explicit target for prediction which they view
32
as synonymous with their goal — or a reasonable proxy for it.7
In a college
admissions scenario, one goal might be scholastic achievement in college, and GPA
might be a proxy for it. Once this is settled, the decision makers use data to discover
which criteria to use and how to weight them in order to best predict the target.
While they might exercise discretion in choosing the criteria to use, the weighting of
these criteria would be dictated entirely by the goal of maximizing the accuracy of
the resulting prediction of the chosen target. In other words, the decision-making
rule would be learned from data, rather than set down according to decision
makers’ subjective intuitions, expectations, and normative commitments....... The
traditional approach makes it possible to express multiple goals and normative
values through the choice of criteria and the weight assigned to them.
In the machine learning approach, multiple goals and normative considerations
need to be packed into the choice of target. ............One danger of the machine learning approach is that it leads to a narrow focus
on the accuracy of the prediction. In other words, “good” decisions are those
that accurately predict the target. But decision making might be “good” for other
reasons: focusing on the right qualities or outcomes (in other words, the target
is a good proxy for the goal), considering only relevant factors (in a sense we’ll
discuss below), considering the full set of relevant factors, incorporating other
normative principles (e.g., need, desert, etc.), or allowing people to understand and
potentially contest the policy. Even a decision making process that is not terribly
accurate might be seen as good if it has some of these other properties.75
Does the fact that certain criteria reasonably accurately predict a target of
interest mean that making decisions on the basis of these criteria is justified? While
this might be perceived as preferable to decisions that were made on some basis
that had been chosen for no particular reason at all, this treats the predictive value
of any decision making criterion as the only quality that is of concern when asking
whether a decision is justified. In the next few sections, we’ll explore situations
where this answer might not suffice." FML
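The mechanics of predictive optimization described above can be illustrated with a small, entirely hypothetical admissions example: the weights on the criteria are not chosen by anyone, they are whatever best predicts the chosen target (here a synthetic "college GPA"). The data, feature names and coefficients below are invented for the sketch.

```python
# Sketch of predictive optimization: criteria weights are learned from data so
# as to best predict the chosen target. Synthetic admissions data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
n = 2_000
high_school_gpa = rng.normal(3.0, 0.4, n)
test_score      = rng.normal(60, 10, n)
essay_rating    = rng.normal(3.0, 1.0, n)

# Hypothetical relationship used only to generate the synthetic target.
college_gpa = (0.6 * high_school_gpa + 0.01 * test_score + 0.05 * essay_rating
               + rng.normal(0, 0.3, n))

X = np.c_[high_school_gpa, test_score, essay_rating]
rule = LinearRegression().fit(X, college_gpa)
print("learned weights:", dict(zip(
    ["high_school_gpa", "test_score", "essay_rating"], rule.coef_.round(3))))
# Applicants would then be ranked by rule.predict(...), i.e. by predicted GPA:
# whether that target is a good proxy for the institution's goals is exactly
# the normative question raised in the quoted passage.
```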

Data and Goliath: "Those who have this power have enormous power indeed. It’s the power to use discriminatory criteria to dole out different opportunities, access, eligibility, prices (mostly in terms of special offers and discounts), attention (both positive and negative), and exposure. This practice can get very intrusive. […]"

"OVERSIGHT AND ACCOUNTABILITY

In order for most societies to function, people must give others power over themselves. Ceding power is an inherently risky thing to do, and over the millennia we have developed a framework for protecting ourselves even as we do this: transparency, oversight, and accountability. If we know how people are using the power we give them, if we can assure ourselves that they’re not abusing it, and if we can punish them if they do, then we can more safely entrust them with power. This is the fundamental social contract of a democracy."

Data and Goliath: "If systemic imperfections are inevitable, we have to accept them—in laws, in government institutions, in corporations, in individuals, in society. We have to design systems that expect them and can work despite them. If something is going to fail or break, we need it to fail in a predictable way. That’s resilience. In systems design, resilience comes from a combination of elements: fault-tolerance, mitigation, redundancy, adaptability, recoverability, and survivability. […] The problem is that all three branches of government have abrogated their responsibilities for oversight. The normal democratic process of taking a law, turning it into rules, and then turning those rules into procedures is open to interpretation every step of the way, and therefore requires oversight every step of the way."

MIT Press, Data Science: "This personalization can result in preferential treatment for some and marginalization of others. A clear example of this discrimination is differential pricing on websites, wherein some customers are charged more than other customers for the same product based on their customer profiles (Clifford 2012). These profiles are constructed by integrating data from a number of different noisy and partial data sources, so the profiles can often be misleading about an individual. What is worse is that these marketing profiles are treated as products and are often sold to other companies, with the result that a negative marketing assessment of an individual can follow that individual across many domains."

MIT Press, Data Science: "Two aspects of these marketing profiles make them particularly problematic. First, they are a black box, and, second, they are persistent. The black-box nature of these profiles is apparent when one considers that it is difficult for an individual to know what data are recorded about them, where and when the data were recorded, and how the decision processes that use these data work. […] What is more, in the modern world data are often stored for a long time. So data recorded about an event in an individual’s life persists long after an event."

MIT "Furthermore, unless used very carefully, data science
can actually perpetuate and reinforce prejudice. An argument
is sometimes made that data science is objective:
it is based on numbers, so it doesn’t encode or have the
prejudicial views that affect human decisions. The truth
is that data science algorithms perform in an amoral
manner more than in an objective manner. Data science
extracts patterns in data; however, if the data encode a
prejudicial relationship in society, then the algorithm is
likely to identify this pattern and base its outputs on the
pattern. Indeed, the more consistent a prejudice is in a
society, the stronger that prejudicial pattern will appear
in the data about that society, and the more likely a data
science algorithm will extract and replicate that pattern of
prejudice. For example, a study carried out by academic researchers
on the Google Online Advertising system found
that the system showed an ad relating to a high-paying
job more frequently to participants whose Google profile
identified them as male compared to participants whose
profile identified them as female (Datta, Tschantz, and
Datta 2015)."

MIT: "The anticipatory nature of predictive policing means
that individuals may be treated differently not because of
what they have done but because of data-driven inferences
about what they might do. As a result, these types of systems
may reinforce discriminatory practices by replicating
the patterns in historic data and may create self-fulfilling
prophecies."

" the Charter of Fundamental
Rights of the European Union prohibits discrimination
based on any grounds, including race, color, ethnic
or social origin, genetic features, sex, age, birth, disability,
sexual orientation, religion or belief, property, membership
in a national minority, and political or any other opinion
(Charter 2000)."

"Another downside to a field oriented around one-dimensional, competitive
pursuit is that it becomes structurally difficult to address biases in models and
classifiers. If a contestant takes steps to prevent dataset bias from propagating
to their models, there will be an accuracy drop (because accuracy is judged on a
biased dataset) and fewer people will pay attention to the work.
As fairness issues in machine learning have gained prominence, fairnessfocused benchmarks datasets have proliferated, such as the Pilot Parliamentarians
Benchmark for facial analysis319 and the Equity Evaluation Corpus for sentiment
analysis.525 An advantage of this approach is that the scientific and cultural machinery of benchmark-oriented innovation can be repurposed for fairness research.
A potential danger is Goodhart’s law, which states, in its broad form, “When a
measure becomes a target, it ceases to be a good measure.” As we’ve emphasized
in this book, fairness is multifaceted, and benchmarks can capture only narrow
notions of fairness. While these can be useful diagnostics, if they are misconstrued as targets in their own right, then research that is focused on optimizing for these
benchmarks may not result in fairness in a more substantive sense." FML


"Properties of datasets that sometimes (but not always, and not in easily predictable ways) propagate downstream include imbalance, biases, stereotypes, and
categorization. By imbalance we mean unequal representation of different groups.
For example, Buolamwini and Gebru pointed out that two facial analysis benchmarks, IJB-A and Adience, overwhelmingly featured lighter-skinned subjects.319
By dataset biases we mean incorrect associations, especially those corresponding
to social and historical prejudices. For example, a dataset that measures arrests as
a proxy for crime may reflect the biases of policing and discriminatory laws. By
stereotypes we mean associations that accurately reflect a property of the world
(or a specific culture at a specific point in time) that is thought to be the result of
social and historical prejudice. For example, gender-occupation associations can
be called stereotypes. By categorization we mean assigning discrete (often binary)
labels to complex aspects of identity such as gender and race.10
Representational harms occur when systems reinforce the subordination of
some groups along the lines of identity. Representational harms could be downstream harms — such as when models apply offensive labels to people from
some groups — but they could be inherent in the dataset. For example, ImageNet contains numerous slurs and offensive labels inherited from WordNet and
pornographic images of people who did not consent to their inclusion in the
dataset.506, 507
While downstream and representational harms are two categories that have
drawn a lot of attention and criticism, there are many other harms that often
arise including the environmental cost of training models on unnecessarily large
datasets508 and the erasure of the labor of subjects who contributed the data496 or
the annotators who labeled it.466 For an overview of ethical concerns associated
with datasets, see the survey by Paullada et al.509" FML
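Auditing a dataset for the imbalance property mentioned above can start with a simple tabulation. The sketch below uses a hypothetical pandas DataFrame with invented counts and column names; the point is only to show the kind of per-group summary worth producing before any training takes place.

```python
# A simple audit of group representation ("imbalance") in a dataset.
# Column names and counts are hypothetical; adapt them to your own data.
import pandas as pd

df = pd.DataFrame({
    "skin_tone": ["lighter"] * 870 + ["darker"] * 130,   # synthetic counts
    "gender":    (["male"] * 700 + ["female"] * 170
                  + ["male"] * 90 + ["female"] * 40),
})

for column in ["skin_tone", "gender"]:
    # Share of each group in the dataset; large imbalances deserve scrutiny.
    print(df[column].value_counts(normalize=True).round(2), "\n")
```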

"Conversely, one reason why there are fairness interventions applicable to production datasets but not scientific benchmarks is that interventions for production
datasets can be strongly guided by an understanding of their downstream impacts
in specific applications. Language and images, in particular, capture such a variety
of cultural stereotypes that sanitizing all of them has proved infeasible.513 It is
much easier to design interventions once we fix an application and the cultural
context(s) in which it will be deployed. Different interventions may be applicable
to the same dataset used in different applications. Unlike scientific benchmarks,
dataset standardization is not necessary in engineering settings.
In fact, the best locus of intervention even for dataset biases may be downstream
of the data. For example, it has been observed for many years that online translation
systems perpetuate gender stereotypes when translating gender-neutral pronouns.
The text “O bir doctor. O bir hem¸sire.” may be translated from Turkish to English
as “He is a doctor. She is a nurse.” Google Translate mitigated this by showing
multiple translations in such cases.514, 515 Compared to data interventions, this has
the benefit of making the potential bias (or, in some cases, erroneous translation)
more visible to the user"FML

ML fairness practitioners have identified collection and curation of representative data as the most common way to mitigate model biases (Holstein, K., Wortman Vaughan, J., Daumé III, H., Dudik, M., Wallach, H., Improving fairness in machine learning systems: What do industry practitioners need?, Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–16, 2019).
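One simple version of "collection and curation of representative data" is to re-sample the training set so that groups are equally represented before fitting a model. The sketch below assumes a hypothetical pandas DataFrame with a group column of your choosing; it is a starting point, not a complete fairness intervention.

```python
# A crude re-balancing step: down-sample each group to the size of the
# smallest group before training. DataFrame and column names are hypothetical.
import pandas as pd

def rebalance(df: pd.DataFrame, group_column: str, seed: int = 0) -> pd.DataFrame:
    smallest = df[group_column].value_counts().min()
    return (df.groupby(group_column, group_keys=False)
              .apply(lambda g: g.sample(n=smallest, random_state=seed)))

# Example usage (assuming a hypothetical train_df with a "gender" column):
# train_df = rebalance(train_df, group_column="gender")
# ...then fit the model on the rebalanced data as usual.
```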

Ethical guidelines on the use of artificial intelligence and data in teaching and learning for educators (European Commission, October 2022): "Fairness relates to everyone being treated fairly in the social organisation. Clear processes are required so that all users have equal access to opportunity. These include equity, inclusion, non-discrimination, and fair distribution of rights and responsibilities."

+

3 Diversity, non-Discrimination and Fairness
• Is the system accessible by everyone in the same way without any barriers?
• Does the system provide appropriate interaction modes for learners with disabilities or special education needs? Is the AI system designed to treat learners respectfully, adapting to their individual needs?
• Is the user interface appropriate and accessible for the age level of the learners? Has the usability and user-experience been tested for the target age group?
• Are there procedures in place to ensure that AI use will not lead to discrimination or unfair behaviour for all users?
• Does the AI system documentation or its training process provide insight into potential bias in the data?
• Are procedures in place to detect and deal with bias or perceived inequalities that may arise?
4 Societal and Environmental Wellbeing
• How does the AI system affect the social and emotional wellbeing of learners and teachers?
• Does the AI system clearly signal that its social interaction is simulated and that it has no capacities of feeling or empathy?
• Are students or their parents involved in the decision to use the AI system and support it?
• Is data used to support teachers and school leaders to evaluate student wellbeing and, if so, how is this being monitored?
• Does use of the system create any harm or fear for individuals or for society?
+

Bias is an inclination of prejudice towards or against a person, object, or position. Bias can arise in many ways in AI systems. For example, in data-driven AI systems, such as those produced through machine learning, bias in data collection and training can result in an AI system demonstrating bias. In logic-based AI, such as rule-based systems, bias can arise due to how a knowledge engineer might view the rules that apply in a particular setting.

It does not necessarily relate to human bias or human-driven data collection. It can arise, for example, through the limited contexts in which a system is used, in which case there is no opportunity to generalise it to other contexts. Bias can be good or bad, intentional or unintentional. In certain cases, bias can result in discriminatory and/or unfair outcomes (i.e. unfair bias).

+

Assumptions made by AI algorithms could amplify existing biases embedded in current education practices, i.e. bias pertaining to gender, race, culture, opportunity, or disability status. Bias can also arise due to online learning and adaptation through interaction. It can also arise through personalisation, whereby users are presented with recommendations or information feeds that are tailored to the user’s tastes.







 

This page references: