Machine Learning Ethics
Computer software runs on “algorithms”, which are instructions that tell the computer what to do. People use algorithms to make decisions in many important settings, such as health care, law enforcement, and education. For example, doctors and nurses use algorithms to estimate a patient’s risk of developing diseases, and judges use algorithms to predict whether someone will commit a crime in the future. One approach to building algorithms, called machine learning, analyzes large amounts of data to make predictions about the future. Machine learning algorithms have recently become very popular, and they are very effective at certain tasks, such as recognizing objects in images and translating text between languages.
As technology has advanced in recent years, it has become easier and less expensive to obtain the computers and data needed to use machine learning. This means that more people are able to use these tools to make decisions. For example, imagine that a store owner is deciding which candy to buy for their store: chocolate bars or lollipops. To help make this decision, the store owner wants to predict whether people will buy more chocolate bars or more lollipops in the next week. To make this prediction, they build a machine learning algorithm that analyzes three pieces of information (data): the number of chocolate bars sold last week, the number of lollipops sold last week, and the weather forecast for next week. The store owner then uses this algorithm (which uses the data) to decide which treat to buy.
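As a rough illustration (this sketch and all of its numbers are invented for this article, and a real machine learning system would be far more sophisticated), here is what the store owner’s idea might look like in Python: learn a simple pattern from past weeks of data, then use the weather forecast to predict which candy will sell better next week.

```python
# Invented sales history, purely for illustration:
# (chocolate bars sold, lollipops sold, forecast high temperature in °F)
past_weeks = [
    (50, 30, 55),
    (45, 35, 60),
    (30, 55, 80),
    (28, 60, 85),
]

def train(history, warm_cutoff=70):
    """'Learn' a very simple rule from past data: for warm weeks and for
    cool weeks separately, which candy sold more in total?"""
    def favorite(weeks):
        chocolate = sum(c for c, _l, _t in weeks)
        lollipops = sum(l for _c, l, _t in weeks)
        return "lollipops" if lollipops > chocolate else "chocolate bars"
    warm = [w for w in history if w[2] >= warm_cutoff]
    cool = [w for w in history if w[2] < warm_cutoff]
    return {"warm": favorite(warm), "cool": favorite(cool)}

def predict(model, forecast_high, warm_cutoff=70):
    """Use the learned rule and next week's forecast to make a prediction."""
    return model["warm"] if forecast_high >= warm_cutoff else model["cool"]

model = train(past_weeks)
print(predict(model, forecast_high=82))  # -> "lollipops" with this invented data
```

The important part is not this particular rule of thumb but the general pattern: the algorithm finds a pattern in past data and applies it to new data, and the quality of its prediction depends on the data it was given.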
Machine learning algorithms like this are also used to make very important predictions in the real world---such as whether a patient will develop a deadly disease, or whether a person will commit a crime in the future. These predictions are then used to make important decisions, such as a doctor deciding whether to prescribe drugs, or a judge deciding whether or not to put someone in jail.
How Can We Make Sure the Predictions Are Fair?
Many scientists and engineers who build machine learning algorithms want to make sure that these algorithms and their predictions are fair. Most people agree that fairness is important. For instance, we want doctors to treat patients fairly, and we want judges to treat defendants fairly. Scientists and engineers have designed many different ways to make machine learning algorithms fairer, and these methods are largely based on mathematics. However, there are many different ways to define what “fair” means, and scientists and engineers do not agree on a single definition.
To make matters worse, we don’t know whether those who use machine learning to make predictions (such as doctors or judges) understand or agree with scientists and engineers about what is fair.
How Do People Understand Fairness Definitions?
The goal of our study was to learn whether people understand the definitions of fairness used by scientists and engineers to make machine learning more fair. Since these definitions are mathematical, we developed an online survey to measure how well people understand these definitions. We used simple examples and pictures to do this. In this survey we call each fairness definition a “fairness rule” to make the wording simpler.
All of our examples were based on the following scenario: a company is deciding which job applicants to hire, and it needs to be fair to applicants based on their gender. Each example first shows all of the job applicants, along with two pieces of information about each one: 1) whether or not they are qualified (yes or no), and 2) their gender (male or female). We only used two genders in this study to keep things simple. At the end of each example, we show the decision made by the company, that is, which applicants were selected and which were rejected. Each survey tested participants’ understanding of one of the four fairness rules described below (a small worked example follows the list):
(A) The fraction of male candidates who receive job offers should equal the fraction of female candidates who receive job offers.
(B) The fraction of qualified male candidates who do not receive job offers should equal the fraction of qualified female candidates who do not receive job offers.
(C) The fraction of unqualified male candidates who receive job offers should equal the fraction of unqualified female candidates who receive job offers.
(D) The fraction of qualified male candidates who do not receive job offers should equal the fraction of qualified female candidates who do not receive job offers. Similarly, the fraction of unqualified male candidates who receive job offers should equal the fraction of unqualified female candidates who receive job offers.
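To make these four rules more concrete, here is a minimal sketch in Python (a language choice of ours; the study itself used pictures and plain-language quizzes, not code). The tiny applicant pool and all of its numbers are invented for illustration only; the sketch simply counts applicants in each group and compares the fractions that each rule says should be equal.

```python
# Hypothetical applicant pool, invented purely for illustration (not data
# from the study). Each applicant has a gender, a qualification, and the
# company's hiring decision.
applicants = [
    # (gender, qualified, offered_job)
    ("male",   True,  True),
    ("male",   True,  False),
    ("male",   False, True),
    ("male",   False, False),
    ("female", True,  True),
    ("female", True,  True),
    ("female", False, False),
    ("female", False, False),
]

def fraction(group, condition):
    """Fraction of applicants in `group` that satisfy `condition`."""
    group = list(group)
    if not group:
        return 0.0
    return sum(condition(a) for a in group) / len(group)

def gender_group(gender):
    return [a for a in applicants if a[0] == gender]

# Rule (A): the fraction of each gender receiving offers should be equal.
rule_a = {g: fraction(gender_group(g), lambda a: a[2]) for g in ("male", "female")}

# Rule (B): among qualified applicants, the fraction NOT receiving offers
# should be equal across genders.
rule_b = {g: fraction([a for a in gender_group(g) if a[1]], lambda a: not a[2])
          for g in ("male", "female")}

# Rule (C): among unqualified applicants, the fraction receiving offers
# should be equal across genders.
rule_c = {g: fraction([a for a in gender_group(g) if not a[1]], lambda a: a[2])
          for g in ("male", "female")}

print("Rule (A):", rule_a)  # {'male': 0.5, 'female': 0.5} -> rule (A) holds here
print("Rule (B):", rule_b)  # {'male': 0.5, 'female': 0.0} -> rule (B) does not hold
print("Rule (C):", rule_c)  # {'male': 0.5, 'female': 0.0} -> rule (C) does not hold
# Rule (D) holds only when rules (B) and (C) both hold.
```

With these made-up numbers, rule (A) is satisfied while rules (B), (C), and (D) are not, which shows that different fairness rules can disagree about whether the very same hiring decision is “fair.”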
Each survey began with a description of the hiring scenario, a written definition of the fairness rule, and two examples of this rule using pictures. To measure people’s understanding in several different ways, we developed a short “quiz” that used picture-based examples, multiple-choice questions, true/false questions, and yes/no questions about the fairness rule.
For example, one question from this quiz is: “Is the following statement TRUE OR FALSE: This fairness rule always allows the company to send offers only to the most qualified applicants.” Overall, this quiz included 9 questions, so the maximum possible score is 9 and the minimum score is 0.
The survey also asked whether participants agreed with and whether they liked the fairness rule. Additionally, we asked participants to write this rule in their own words. This gave us another way to measure whether participants understood the rule, and how they viewed it. We also asked participants to provide basic information about themselves, including their age, gender, ethnicity, and education level.
In total, 147 participants took our survey. The mix of age, gender, ethnicity, and education level of participants roughly matched the US population according to the 2017 US Census. Below are some of our main findings:
Our quiz is a consistent and reliable measurement of whether participants understood a fairness rule. We compared results from our quiz with the descriptions that participants wrote of each fairness rule. We found that participants who wrote a correct description of the fairness rule usually scored higher on the quiz, and participants who wrote an incorrect fairness definition usually scored lower on the quiz.
Overall, survey participants answered about half of our quiz questions correctly. Of course, some participants answered all questions correctly, and some answered all questions incorrectly.
Overall, participants scored much worse on fairness rule (D) than on the other three. This is not surprising, since rule (D) is a combination of two other rules (B and C). Rules (D) and (B) may also be confusing in the context of hiring, because they count qualified applicants who are not offered jobs.
Overall, participants who said they did not like or did not agree with a fairness rule usually got higher scores on our quiz: their average score was about 7 out of 9, compared with about 5 out of 9 for participants who liked or agreed with the rule. We are not sure why this happened. One possible reason is that participants who understand a fairness rule very well can also see the downsides of using it to make decisions, which might cause them to dislike these rules.
Participants with more education (those with a bachelor’s degree) usually got higher scores on our quiz than those with less education.
Starting a Wider Conversation About Fair Predictions
Most people agree that machine learning algorithms that make predictions about people should treat those people fairly. There are many mathematical methods to make sure that predictions are fair, but experts disagree about which methods should be used. So far, there has been little discussion between those who build machine learning systems and those affected by these systems (such as patients or criminal defendants). We think that this discussion is important: to make sure that machine learning systems are fair and do not cause harm, everyone who builds and uses these systems needs to agree on what “fair” looks like. This discussion needs to include the engineers who build the machine learning systems, the users of these systems, and the people affected by the users’ decisions.
Our study takes a small step toward starting this conversation, by trying to understand how well people understand the mathematical definitions of fairness used by scientists and engineers. We developed a short quiz to measure understanding, and we found that people answered about half of the questions correctly. This suggests that there is still a big gap in understanding between those who build machine learning systems and the general population.
This study also has limitations that are important to consider. Most importantly, we had a small number of participants (147), all within the United States. These participants are a very small fraction of the overall population of the United States (and of the world). Therefore, our conclusions may not accurately represent the greater population in the United States or worldwide. The only way to find out is with more studies like this one. Overall, we encourage engineers and non-engineers to continue the conversation about how to build fair machine learning systems.
Written By: Dr. Duncan McElfresh
Academic Editor: Biologist
Non-Academic Editor: Retiree
Original Paper
• Title: Measuring non-expert comprehension of machine learning fairness metrics
• Authors: Debjani Saha, Candice Schumann, Duncan C. McElfresh, John P. Dickerson, Michelle L. Mazurek, and Michael Carl Tschantz
• Journal: ICML'20: Proceedings of the 37th International Conference on Machine Learning
• Date Published: 13 July 2020
Please remember that research is done by humans and is always changing. A discovery one day could be proven incorrect the next day. It is important to continue to stay informed and keep up with the latest research. We do our best to present current work in an objective and accurate way, but we know that we might make mistakes. If you feel something has been presented incorrectly or inappropriately, please contact us through our website.