1. Surveying the basics of bias and fairness in machine learning. The students will learn the basics from two review articles: “A Survey on Bias and Fairness in Machine Learning” by Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan, and “An Introduction to Algorithmic Fairness” (arXiv:2105.05595v1 [cs.CY]) by Hilde J.P. Weerts.
2. Searching for fairness libraries that can be used in industry. We will use three libraries created by large technology companies, so that they can be trusted for use in industry.
• Fairlearn (by Microsoft)
• AIF360 (by IBM)
• What-If Tool (by Google)
3. Selecting published structured and unstructured datasets. The main goal of the project is to mitigate bias in a structured (tabular) dataset. If possible, we will extend our bias analysis to unstructured data such as text and images.
• Tabular Dataset: TitanicSexism (fairness in ML), https://www.kaggle.com/code/garethjns/titanicsexism-fairness-in-ml/input
• Text Dataset: Fake and real news dataset, https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset
• Image Dataset: UTKFace, https://www.kaggle.com/datasets/jangedoo/utkface-new
4. Discussing the mitigation algorithms that can be used. Mitigation algorithms will be applied at the pre-processing, in-processing, and post-processing stages. Below are examples of the mitigation algorithms that will be used.
• Fairlearn: ExponentiatedGradient, GridSearch, ThresholdOptimizer, CorrelationRemover, AdversarialFairnessClassifier, AdversarialFairnessRegressor.
• AIF360: preprocessing (Disparate Impact Remover, LFR, Optim Preproc, Reweighing), inprocessing (Adversarial Debiasing, ART Classifier, Gerry Fair Classifier, Meta Fair Classifier, Prejudice Remover, Exponentiated Gradient Reduction, GridSearch Reduction), postprocessing (Calibrated EqOdds Postprocessing, EqOdds Postprocessing, Reject Option Classification).
• What-If Tool: mitigation options are still under study.
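To make the pre-processing stage concrete, the following is a minimal pure-Python sketch of the idea behind Reweighing, one of the pre-processing methods listed above. It is not the AIF360 implementation and does not use its API. The function names `reweighing_weights` and `weighted_positive_rate` and the toy data are hypothetical, chosen only for illustration. Each sample receives the weight P(group) * P(label) / P(group, label), so that the sensitive attribute and the label become statistically independent in the weighted data.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label)."""
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    # Counts are used in place of probabilities; the factors of n cancel to one n.
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


# Hypothetical toy data: the sensitive attribute and a binary outcome.
groups = ["female", "female", "female", "male", "male", "male", "male", "male"]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

weights = reweighing_weights(groups, labels)
female_rate = weighted_positive_rate(groups, labels, weights, "female")
male_rate = weighted_positive_rate(groups, labels, weights, "male")
print(weights)
print(female_rate, male_rate)
```

The resulting weights could then be passed as sample weights to any classifier that accepts them. In-processing and post-processing methods intervene later, during training or on the model's predictions.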
5. Discussing the results and summarizing the comparison among the libraries. In the discussion, we will compare the performance of the mitigation algorithms at different stages of the ML life cycle: pre-processing, in-processing, and post-processing.
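One possible way to frame this comparison is to report a predictive metric alongside group fairness metrics for each mitigated model. The sketch below is a hand-written illustration in plain Python, not code from Fairlearn or AIF360. The helper names, the two prediction vectors (`baseline`, `mitigated`), and the toy labels are all hypothetical. It computes accuracy, the demographic parity difference (the gap in selection rates between groups), and the equal opportunity difference (the gap in true positive rates between groups).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def _gap(rates):
    """Difference between the largest and smallest per-group rate."""
    return max(rates.values()) - min(rates.values())


def demographic_parity_difference(y_pred, groups):
    """Gap between groups in the share of positive predictions."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    return _gap(rates)


def equal_opportunity_difference(y_true, y_pred, groups):
    """Gap between groups in true positive rate (assumes each group has positives)."""
    rates = {}
    for g in set(groups):
        preds = [
            p for t, p, gg in zip(y_true, y_pred, groups) if gg == g and t == 1
        ]
        rates[g] = sum(preds) / len(preds)
    return _gap(rates)


# Hypothetical toy labels and predictions from two models.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
baseline = [1, 1, 1, 0, 0, 0, 1, 0]
mitigated = [1, 0, 1, 0, 1, 1, 0, 0]

results = {}
for name, y_pred in [("baseline", baseline), ("mitigated", mitigated)]:
    results[name] = {
        "accuracy": accuracy(y_true, y_pred),
        "dp_diff": demographic_parity_difference(y_pred, groups),
        "eo_diff": equal_opportunity_difference(y_true, y_pred, groups),
    }
    print(name, results[name])
```

In this toy case the mitigated predictions close the demographic parity gap while the equal opportunity gap remains, which is why reporting more than one fairness metric per algorithm may be worthwhile.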
The PI has an undergraduate student whom they would like to have work on this project.
6127 Galleon Dr
Mechanicsburg, Pennsylvania. 17050
CR-Penn State
No
Already behind. Start date is flexible.
03/22/2024
A mentor is needed, with skills in machine learning and bias identification.