13. Computer Vision
From medical diagnosis and self-driving vehicles to camera monitoring and smart filters, many applications of computer vision are closely tied to our current and future lives. In recent years, deep learning has been the transformative force behind advances in the performance of computer vision systems. Indeed, the most advanced computer vision applications are now almost inseparable from deep learning. This chapter therefore focuses on computer vision and investigates methods and applications that have recently been influential in academia and industry.
In Section 6 and Section 7, we studied various convolutional neural networks that are commonly used in computer vision and applied them to simple image classification tasks. We begin this chapter by describing two methods that may improve model generalization, namely image augmentation and fine-tuning, and apply them to image classification. Because deep neural networks can effectively represent images at multiple levels, such layerwise representations have been used successfully in various computer vision tasks such as object detection, semantic segmentation, and style transfer. Following the key idea of leveraging layerwise representations, we start with the major components and techniques for object detection. Next, we show how to use fully convolutional networks for semantic segmentation of images. Then we explain how to use style transfer techniques to generate images like the cover of this book. Finally, we conclude the chapter by applying the material from this chapter and several previous ones to two popular computer vision benchmark datasets.
- 13.1. Image Augmentation
- 13.2. Fine-Tuning
- 13.3. Object Detection and Bounding Boxes
- 13.4. Anchor Boxes
- 13.5. Multiscale Object Detection
- 13.6. The Object Detection Dataset
- 13.7. Single Shot Multibox Detection
- 13.8. Region-based CNNs (R-CNNs)
- 13.9. Semantic Segmentation and the Dataset
- 13.10. Transposed Convolution
- 13.11. Fully Convolutional Networks
- 13.12. Neural Style Transfer
- 13.13. Image Classification (CIFAR-10) on Kaggle
  - 13.13.1. Obtaining and Organizing the Dataset
  - 13.13.2. Image Augmentation
  - 13.13.3. Reading the Dataset
  - 13.13.4. Defining the Model
  - 13.13.5. Defining the Training Function
  - 13.13.6. Training and Validating the Model
  - 13.13.7. Classifying the Testing Set and Submitting Results on Kaggle
  - 13.13.8. Summary
  - 13.13.9. Exercises
- 13.14. Dog Breed Identification (ImageNet Dogs) on Kaggle
  - 13.14.1. Obtaining and Organizing the Dataset
  - 13.14.2. Image Augmentation
  - 13.14.3. Reading the Dataset
  - 13.14.4. Fine-Tuning a Pretrained Model
  - 13.14.5. Defining the Training Function
  - 13.14.6. Training and Validating the Model
  - 13.14.7. Classifying the Testing Set and Submitting Results on Kaggle
  - 13.14.8. Summary
  - 13.14.9. Exercises