Award Date
May 2023
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Science
First Committee Member
Kazem Taghva
Second Committee Member
Laxmi Gewali
Third Committee Member
Wolfgang Bein
Fourth Committee Member
Mingon Kang
Fifth Committee Member
Emma Regentova
Number of Pages
112
Abstract
Model validation is a critical step in the development, deployment, and governance of machine learning models. During validation, the predictive power of a model is measured on unseen datasets with a variety of metrics, such as accuracy and F1-score for classification tasks. Although the most commonly used metrics are easy to implement and understand, they are aggregate measures over all segments of heterogeneous datasets and therefore do not identify how a model's performance varies across different data segments. This lack of insight into how a model performs over segments of unseen data raises significant challenges when deploying machine learning models into production environments. Unstable performance is especially concerning for critical applications such as credit risk models, cancer detection, and self-driving cars, which have significant impacts on users.

In this dissertation, we leverage the notion of information-theoretic explanations to measure the performance of binary classifiers over various segments of data. We provide the following contributions: 1) a distributed implementation of the explanation framework that outperforms a single-node baseline; 2) an application of the framework to summarize a model's performance over various segments of training and testing data in terms of overall accuracy, as well as false-positive and false-negative patterns; and 3) a further application of the framework to annotate test instances with expected performance indicators at inference time. In addition to assisting machine learning engineers with model tuning and data augmentation decisions, the proposed tools can identify potential model bias and unfairness with respect to protected attributes and data segments.
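As a rough illustration of the segment-level diagnostics the abstract describes (not the InfoMoD framework itself), the following sketch computes per-segment accuracy, false-positive, and false-negative rates for a binary classifier; the column names, segment attributes, and data are hypothetical.

import pandas as pd

def segment_report(df, segment_cols, label_col="y_true", pred_col="y_pred"):
    """Summarize binary-classifier performance within each segment defined by segment_cols."""
    rows = []
    for keys, g in df.groupby(segment_cols):
        # Confusion-matrix counts for this segment only.
        tp = ((g[label_col] == 1) & (g[pred_col] == 1)).sum()
        tn = ((g[label_col] == 0) & (g[pred_col] == 0)).sum()
        fp = ((g[label_col] == 0) & (g[pred_col] == 1)).sum()
        fn = ((g[label_col] == 1) & (g[pred_col] == 0)).sum()
        n = len(g)
        rows.append({
            "segment": keys,
            "size": n,
            "accuracy": (tp + tn) / n,
            "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
            "false_negative_rate": fn / (fn + tp) if (fn + tp) else float("nan"),
        })
    return pd.DataFrame(rows)

# Hypothetical usage with a protected attribute and a data segment column:
# report = segment_report(test_df, segment_cols=["gender", "income_bracket"])
# print(report.sort_values("accuracy"))

Sorting the report by accuracy (or by false-negative rate) surfaces the segments where aggregate metrics hide weak performance, which is the kind of variation the dissertation's framework is designed to expose.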
Keywords
artificial intelligence; database; information theory; machine learning
Disciplines
Artificial Intelligence and Robotics | Computer Engineering | Computer Sciences
File Format
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Esmaeilzadeh, Armin, "Information-Theoretic Model Diagnostics (InfoMoD)" (2023). UNLV Theses, Dissertations, Professional Papers, and Capstones. 4676.
http://dx.doi.org/10.34917/36114701
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/