Master of Science (MS)
State-of-the-art image captioning models can produce a diverse set of accurate captions. Previous research has focused on improving caption diversity while maintaining a high level of fidelity. We shift the focus from accuracy and diversity to controllability. We use a modified version of the traditional encoder-decoder network that allows the model to produce a meaningful and structured latent space. We then explore the latent space using several latent cartographic methods: linear interpolation (lerp), spherical linear interpolation (slerp), analogy completion, attribute vector rotation, and interpolation graphs. Additionally, we discuss different categories of latent space and provide modifications of each cartographic method for each category. Finally, we show that it is possible to generate a set of diverse and accurate captions with desired real-space semantics by sampling from different areas of the latent space.
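As a rough illustration of three of the cartographic methods named in the abstract, the sketch below gives textbook forms of lerp, slerp, and analogy completion on plain Python lists. It is not taken from the thesis: the function names, the `eps` fallback threshold, and the use of simple vector arithmetic for analogy completion are assumptions made for illustration, and the thesis's own formulations (including its modifications for different categories of latent space) may differ.

```python
import math

def lerp(a, b, t):
    """Linear interpolation between latent vectors a and b at fraction t."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between latent vectors a and b.

    Falls back to lerp when a vector is (near) zero or the two vectors are
    (near) parallel or antiparallel, where the slerp weights are ill-defined.
    The eps threshold is an illustrative assumption.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return lerp(a, b, t)
    # Angle between the two vectors, with the cosine clamped for safety.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return lerp(a, b, t)
    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]

def analogy(a, b, c):
    """Analogy completion 'a is to b as c is to ?', computed as c + (b - a)."""
    return [z + (y - x) for x, y, z in zip(a, b, c)]
```

In a captioning setting, the vectors would be latent codes produced by the encoder, and each interpolated or completed vector would be passed to the decoder to generate a caption. On unit vectors, slerp keeps intermediate points on the unit sphere, whereas lerp cuts through its interior.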
Computer Vision; Generative Networks; Image Captioning; Latent Space; Machine Learning; Natural Language Processing
University of Nevada, Las Vegas
Musser, Mikian J., "Exploring the Latent Space of Image Captioning Networks" (2021). UNLV Theses, Dissertations, Professional Papers, and Capstones. 4306.
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/
Available for download on Thursday, December 15, 2022