Doctor of Philosophy (PhD)
Model validation is a critical step in the development, deployment, and governance of machine learning models. During validation, the predictive power of a model is measured on unseen datasets with metrics such as accuracy and F1 score for classification tasks. Although the most widely used metrics are easy to implement and understand, they are aggregate measures over all segments of heterogeneous datasets and therefore do not reveal how a model's performance varies across data segments. This lack of insight into segment-level performance on unseen data poses significant challenges for deploying machine learning models in production environments. Unstable performance is especially concerning in critical applications such as credit risk models, cancer detection, and self-driving cars, which have significant impacts on users.
In this dissertation, we leverage the notion of information-theoretic explanations to measure the performance of binary classifiers over various segments of data. We make the following contributions: 1) a distributed implementation of the explanation framework that outperforms a single-node baseline; 2) an application of the framework that summarizes the model’s performance over various segments of training and testing data in terms of overall accuracy and false-positive and false-negative patterns; 3) a further application of the framework that annotates test instances with expected performance indicators at inference time. In addition to assisting machine learning engineers with model tuning and data augmentation decisions, the proposed tools can also identify potential model bias and unfairness with respect to protected attributes and data segments.
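The abstract's motivating point, that an aggregate metric can hide large differences between data segments, can be illustrated with a minimal sketch. This is not the dissertation's information-theoretic framework; it is a plain per-segment tally under stated assumptions. The function names (`overall_accuracy`, `segment_report`), the segment labels "A" and "B", and the toy data are all hypothetical.

```python
from collections import defaultdict


def overall_accuracy(rows):
    """Aggregate accuracy over all (segment, y_true, y_pred) rows."""
    rows = list(rows)
    return sum(y_true == y_pred for _, y_true, y_pred in rows) / len(rows)


def segment_report(rows):
    """Per-segment accuracy plus false-positive and false-negative counts.

    Each row is (segment, y_true, y_pred) with binary labels 0/1.
    """
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "fp": 0, "fn": 0})
    for segment, y_true, y_pred in rows:
        s = stats[segment]
        s["n"] += 1
        if y_true == y_pred:
            s["correct"] += 1
        elif y_pred == 1:
            s["fp"] += 1  # predicted positive, actually negative
        else:
            s["fn"] += 1  # predicted negative, actually positive
    return {
        seg: {"accuracy": s["correct"] / s["n"], "fp": s["fp"], "fn": s["fn"]}
        for seg, s in stats.items()
    }


# Hypothetical toy data: segment "A" is always classified correctly,
# segment "B" is always misclassified.
rows = (
    [("A", 1, 1)] * 4
    + [("A", 0, 0)] * 4
    + [("B", 0, 1), ("B", 1, 0)]
)

overall = overall_accuracy(rows)
report = segment_report(rows)
print(overall)
print(report)
```

On this toy data the aggregate accuracy looks acceptable while every instance in segment "B" is misclassified, which is the kind of variation an aggregate metric alone does not surface.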
artificial intelligence; database; information theory; machine learning
Artificial Intelligence and Robotics | Computer Engineering | Computer Sciences
University of Nevada, Las Vegas
Esmaeilzadeh, Armin, "Information-Theoretic Model Diagnostics (InfoMoD)" (2023). UNLV Theses, Dissertations, Professional Papers, and Capstones. 4676.
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/