Summary of the paper: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Using convolutional neural networks (CNNs) for supervised learning has been popular in the field of computer vision; unsupervised learning with CNNs has received far less attention. This paper helps bridge the gap between the success of supervised and unsupervised learning with CNNs by introducing a class of architectures known as deep convolutional generative adversarial networks (DCGANs). By training these on various image datasets, the model learns a hierarchy of representations in both the generator and discriminator, demonstrating applicability for general image representation.
Learning reusable feature representations from unlabelled datasets is an area of active research. In computer vision, leveraging unlabelled data in this way could boost performance on various supervised learning tasks. Generative adversarial networks (GANs) are one way to do this: their learning process and lack of a heuristic cost function make them attractive for representation learning. However, GANs have been known to be unstable to train.
The paper proposes and evaluates a set of architectural constraints that make convolutional GANs stable to train, calling this family of models deep convolutional GANs (DCGANs).
The trained discriminators can be used for image classification, showing competitive performance with other unsupervised algorithms. The paper also visualizes the filters learnt by the GANs and shows that specific filters learn to draw particular objects. Finally, the generators exhibit interesting vector arithmetic properties, which allow easy manipulation of many semantic qualities of the generated samples.
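The vector arithmetic above operates on the generator's input vectors z: the paper's famous example is that vector("smiling woman") - vector("neutral woman") + vector("neutral man") produces a z whose generated image resembles a smiling man, with the z vector for each concept obtained by averaging three exemplars. A toy sketch with plain Python lists follows; the 4-dimensional vectors are made up purely for illustration (real z vectors are high-dimensional and come from the trained model):

```python
def vec_add(a, b):
    # Element-wise addition of two z vectors
    return [x + y for x, y in zip(a, b)]

def vec_sub(a, b):
    # Element-wise subtraction of two z vectors
    return [x - y for x, y in zip(a, b)]

def average(vectors):
    # The paper averages the z vectors of three exemplars per concept
    # to make the arithmetic stable
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Hypothetical z vectors, for illustration only
smiling_woman = [0.75, 0.25, 0.5, 0.5]
neutral_woman = [0.25, 0.25, 0.5, 0.5]
neutral_man   = [0.25, 0.75, 0.5, 0.25]

# "smiling woman" - "neutral woman" + "neutral man" ~ "smiling man"
smiling_man = vec_add(vec_sub(smiling_woman, neutral_woman), neutral_man)
```

The resulting vector would then be fed back through the generator to produce the manipulated image.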
Representation Learning from Unlabelled Data
One classic approach is to perform clustering on the data and leverage the clusters to improve classification scores. Another is to train auto-encoders that separate the "what" and "where" components of the code, or that use a ladder structure, encoding an image into a compact code and decoding the code to reconstruct the image as accurately as possible.
These methods have been shown to learn useful feature representations from image pixels.
Generating Natural Images
Generative image models fall into two categories: parametric and non-parametric.
Parametric models have been explored extensively. However, they have not had much success generating natural images: the samples tend to be blurry or contain a lot of noise.
Non-parametric models often do matching from a database of existing images. They are used in texture synthesis, super-resolution, and in-painting.
Visualizing the Internals of CNNs
A constant criticism of CNNs is that they are black-box methods: we barely know what the network is doing, even compared to simple human-interpretable algorithms. Using deconvolutions and filtering for maximal activations, we can approximate the purpose of each convolutional filter in the network.
There have historically been difficulties scaling up GANs using the CNN architectures common in the supervised literature.
In order to make DCGANs stable during training, the paper proposes the following architecture guidelines:
Replace the pooling layers with strided convolutions (discriminator) and fractionally-strided convolutions (generator)
Use batch normalization in both the generator and discriminator
Remove fully connected hidden layers for deeper architectures.
Use the ReLU activation in the generator for all the layers except for the output layer, which uses a tanh activation function.
Use the LeakyReLU activation in the discriminator for all of the layers.
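The activation and layer choices above can be sketched in plain Python. This is a minimal illustration of the guidelines, not a working network; a real implementation would use a deep learning framework, and the 4x4 kernel / stride 2 / padding 1 configuration in the usage note is an assumed example:

```python
import math

def relu(x):
    # Generator hidden layers: ReLU
    return max(0.0, x)

def leaky_relu(x, slope=0.2):
    # Discriminator layers: LeakyReLU (the paper sets the leak slope to 0.2)
    return x if x > 0 else slope * x

def tanh(x):
    # Generator output layer: tanh bounds outputs to [-1, 1]
    return math.tanh(x)

def strided_conv_out(n, kernel, stride, pad=0):
    # Spatial output size of a strided convolution
    # (the discriminator's replacement for pooling/downsampling)
    return (n + 2 * pad - kernel) // stride + 1

def frac_strided_out(n, kernel, stride, pad=0):
    # Spatial output size of a fractionally-strided (transposed)
    # convolution (the generator's upsampling layer)
    return (n - 1) * stride - 2 * pad + kernel
```

For example, a 4x4 kernel with stride 2 and padding 1 halves a 64x64 feature map to 32x32 in the discriminator, and the matching fractionally-strided convolution doubles 32x32 back to 64x64 in the generator.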
The DCGANs were trained on three different datasets:
Large-scale Scene Understanding (LSUN) - the bedrooms dataset was used; no data augmentation
Imagenet-1k - the source of natural images for unsupervised training; no data augmentation
Faces dataset - images of faces scraped from the internet; no data augmentation
No pre-processing was applied to the training images except scaling them to the range of the tanh activation function, [-1, 1]. The models were trained with mini-batch stochastic gradient descent, with weights initialized from a zero-centered Normal distribution with a standard deviation of 0.02.
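The pre-processing and weight initialization described here are simple to sketch in plain Python (illustrative only; the function names are my own):

```python
import random

def scale_to_tanh_range(pixel):
    # Map a pixel value in [0, 255] to the tanh output range [-1, 1]
    return pixel / 127.5 - 1.0

def init_weight(rng=random):
    # Weights are drawn from a zero-centered Normal distribution
    # with standard deviation 0.02
    return rng.gauss(0.0, 0.02)
```

Scaling to [-1, 1] matters because the generator's tanh output layer can only produce values in that range, so real and generated images must live on the same scale for the discriminator.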
Classifying the CIFAR-10 Dataset
A standard way to evaluate unsupervised learning is to apply the learnt features to a supervised dataset, so CIFAR-10 was used. After testing and comparing it to other models, here are the results:
Classifying the SVHN Digits Dataset
Another dataset used was the Street View House Numbers (SVHN) dataset, where the DCGAN served as a feature extractor; specifically, the discriminator's features were used for the supervised task. After testing and comparing it to other models, here are the results:
The proposed model (DCGAN) is a much more stable architecture for training GANs. Adversarial networks learn good representations of images for supervised learning, as well as for generative modeling.
Some instability remains: with longer training, models sometimes collapse a subset of their filters, so future work is needed to tackle this problem. The framework could also be extended to other domains like video and audio.
These were the main things that were covered in the paper!