A complete guide to Bias & Ethics in Data Collection
Bias at the collection stage, in statistical terms, means that the data you have gathered is not representative of the group or activity about which you want to make a claim. Shortcuts and various types of errors are part of what makes us human. According to the author and psychologist Daniel Levitin (2016):
Remember, people gather statistics. People choose what to count, and how to go about counting. There are a host of errors and biases that can enter into the collection process and these can lead millions of people to draw the wrong conclusions.
Before we jump in, let us quickly look at what bias and ethics mean.
What is Ethics?
Ethics seeks to answer questions such as the following:
- What is right or wrong?
- What is good or bad?
- What is justice?
- What is well-being or equality?
What is Bias?
A bias is a prejudice in favor of or against one thing, person, or group compared with another.
Data collection is the most critical phase of, and the foundation for, any data-driven technology. Bias often enters the collection process when time and staff are in short supply. The following are some of the biases that arise during data collection and can cause ethical harm.
Selection bias is a kind of systematic error that occurs when the data collector decides who and what is going to be studied, so the selection of participants is not randomized. Let us say we want to assess a program for improving the health of employees who work from home. Those who have signed up may differ from those who did not sign up. Perhaps the people who signed up did so because they are more health-conscious.
If that were the case, it would not be fair to conclude that the program was effective. Self-selection alone may have influenced the participants' health more than the program did.
How to manage these ethical harms in Selection Bias?
Selection bias can be minimized in the following ways:
- Include as large a sample as possible in the study
- Conduct an experimental study
- Draw from a sample that is not self-selected
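The work-from-home example can be sketched as a small simulation. This is only an illustration under invented assumptions: the `TRUE_EFFECT` value, the latent "health consciousness" score, and the rule that health-conscious employees sign up are all hypothetical and not taken from any real program. It compares the effect estimated from self-selected participants with the effect estimated when enrollment is decided at random.

```python
import random
from statistics import mean

TRUE_EFFECT = 0.5  # assumed program benefit, in arbitrary health-score units


def simulate(n=10_000, seed=0):
    rng = random.Random(seed)
    # Hypothetical latent "health consciousness" that also drives health.
    consciousness = [rng.gauss(0, 1) for _ in range(n)]

    def outcome(c, enrolled):
        # Health score = consciousness + program effect (if enrolled) + noise.
        return c + (TRUE_EFFECT if enrolled else 0.0) + rng.gauss(0, 1)

    # Self-selection: only the more health-conscious employees sign up.
    self_selected = [c > 0 for c in consciousness]
    # Randomization: a coin flip decides enrollment, independent of consciousness.
    randomized = [rng.random() < 0.5 for _ in range(n)]

    def estimated_effect(enrolled_flags):
        scores = [outcome(c, e) for c, e in zip(consciousness, enrolled_flags)]
        treated = [s for s, e in zip(scores, enrolled_flags) if e]
        control = [s for s, e in zip(scores, enrolled_flags) if not e]
        return mean(treated) - mean(control)

    return estimated_effect(self_selected), estimated_effect(randomized)


self_selected_estimate, randomized_estimate = simulate()
print(f"self-selected: {self_selected_estimate:.2f}, "
      f"randomized: {randomized_estimate:.2f}, true: {TRUE_EFFECT}")
```

Under these assumptions the self-selected comparison overstates the program's benefit, because it mixes the program's effect with the participants' pre-existing health consciousness, while the randomized comparison lands close to the assumed true effect.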
Sampling bias results from a failure to properly randomize the population sample, and it often happens unintentionally. For example, imagine there are 30 people in a classroom and you ask whether they prefer Maths or Physics. If you surveyed only the boys and concluded that the majority of students like Maths, you would have demonstrated sampling bias.
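The classroom example can be made concrete with a few lines of code. The preferences below are invented purely for illustration (the article does not give actual numbers); the point is only that the answer from the boys-only sample can differ from the answer for the whole class.

```python
# Hypothetical class of 30: (gender, preferred subject) pairs, invented for illustration.
classroom = (
    [("boy", "Maths")] * 10 + [("boy", "Physics")] * 5
    + [("girl", "Maths")] * 5 + [("girl", "Physics")] * 10
)


def maths_share(students):
    """Fraction of the given students who prefer Maths."""
    return sum(1 for _, subject in students if subject == "Maths") / len(students)


boys_only = [s for s in classroom if s[0] == "boy"]

boys_only_share = maths_share(boys_only)    # biased sample: boys only
whole_class_share = maths_share(classroom)  # the full population of 30
print(f"boys only: {boys_only_share:.0%}, whole class: {whole_class_share:.0%}")
```

With this made-up data, two thirds of the boys prefer Maths, but only half of the whole class does, so the boys-only survey would wrongly suggest a Maths majority.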
There are several types of sampling bias, including the following:
- Undercoverage: This common type happens when some members of the population are not represented, or are poorly represented, in the sample
- Non-response: Also referred to as participation bias, this happens when selected participants are unable or unwilling to take part in the survey
- Pre-screening: This happens when the selection process deployed in a study results in a sample that is a poor representation of the population
Sampling bias can be managed with the following measures to avoid ethical harms:
- Avoid convenience sampling
- Follow up with non-responders
- Clearly define the target audience
- Make the survey accessible and simple
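One simple way to spot undercoverage or non-response after a survey is to compare the make-up of the respondents with the make-up of the target audience. The sketch below assumes you know the population shares for some grouping variable; the age bands, the shares, the respondent counts, and the `coverage_gaps` helper are all hypothetical.

```python
from collections import Counter


def coverage_gaps(population_shares, sample_groups, tolerance=0.05):
    """Return groups whose share of the sample differs from their
    population share by more than `tolerance` (absolute difference)."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = (expected, observed)
    return gaps


# Hypothetical target audience and survey respondents.
population_shares = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
respondents = ["18-34"] * 55 + ["35-54"] * 38 + ["55+"] * 7

gaps = coverage_gaps(population_shares, respondents)
for group, (expected, observed) in gaps.items():
    print(f"{group}: expected {expected:.0%}, observed {observed:.0%}")
```

Groups flagged this way are candidates for follow-up with non-responders or for a more accessible survey design. The 5-point tolerance is an arbitrary choice for the example.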
Misclassification bias occurs when data points are assigned to incorrect categories. For example, categorizing a smoker as a non-smoker.
There are two types of misclassification bias:
- Non-differential: It occurs when the degree of misclassification of exposure status is equal across all groups
- Differential: It occurs when the degree of misclassification of exposure status differs between those with and those without the outcome
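The difference between the two types can be illustrated with a small calculation on a two-by-two table, using the smoker example. All counts and error rates below are hypothetical. Here `sensitivity` is the probability that a true smoker is recorded as a smoker, and `specificity` is the probability that a true non-smoker is recorded as a non-smoker.

```python
def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected recorded (exposed, unexposed) counts after misclassification."""
    recorded_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    recorded_unexposed = (exposed + unexposed) - recorded_exposed
    return recorded_exposed, recorded_unexposed


def odds_ratio(cases, controls):
    """Odds ratio of exposure for cases versus controls."""
    (case_exp, case_unexp), (ctrl_exp, ctrl_unexp) = cases, controls
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true counts as (smokers, non-smokers).
true_cases, true_controls = (200, 100), (100, 200)
true_or = odds_ratio(true_cases, true_controls)

# Non-differential: the same error rates apply to cases and controls.
nondiff_or = odds_ratio(
    observed_counts(*true_cases, sensitivity=0.8, specificity=0.9),
    observed_counts(*true_controls, sensitivity=0.8, specificity=0.9),
)

# Differential: cases report smoking more accurately than controls do.
diff_or = odds_ratio(
    observed_counts(*true_cases, sensitivity=0.95, specificity=0.9),
    observed_counts(*true_controls, sensitivity=0.7, specificity=0.9),
)

print(f"true: {true_or:.2f}, non-differential: {nondiff_or:.2f}, "
      f"differential: {diff_or:.2f}")
```

In this example the non-differential errors shrink the apparent association (an odds ratio of about 2.6 instead of 4), while the differential errors exaggerate it (about 4.7). With other error rates, differential misclassification could just as well push the estimate in the opposite direction.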
Anyone involved in data collection should have a mitigation plan for handling bias and should build it into the collection process, which reduces the risk of ethical harm. If you would like to know more about how sampling bias might affect your business, visit Payoda.
Authored by: Suri Parathasarathy