Paper Summary of: "Conditional Image Generation with PixelCNN Decoders"

Dickson Wu
May 23, 2021 · Last updated May 27, 2021 · 4 min read

Abstract

  • Generates new images from an image density model. The density model can be conditioned on any vector: tags, labels, or even embeddings produced by other networks -> It can do a ton with this:

  • We can feed it labels: It could generate new scenes of that label (ex: Tiger)

  • We can feed it the embedding of an image of a face -> Creates new portraits.

  • And it can be a decoder for image autoencoders

  • And its gated layers let it match the state-of-the-art log-likelihood of PixelRNN on ImageNet (as a density model), at a much lower computational cost

Introduction

  • Right now most generative networks don't take any inputs or constraints on them. They just generate stuff!

  • There are lots of applications of generation that require conditions to be set (aka inputs).

  • PixelCNN builds off of PixelRNN and seeks to improve it. Both are generative models, and this paper makes them conditional

  • Unique value = returns probability densities so it's easy to apply to compression + probabilistic planning

  • There are two ways of doing this: 2D LSTMs or CNNs. LSTMs are more accurate but CNNs are faster -> We'll combine the two ideas to get the accuracy of the 2D LSTM with the training speed of the CNN

  • We can literally feed it a one-hot encoding of the class and it can spit out generated images of that class, or generate new images given an embedding

Gated PixelCNN

  • This is the model that combines the strengths of PixelCNN and PixelRNN

  • This is what it looks like (see the masked-convolution figure in the paper):


    • Think of a CNN. It has a kernel which looks at some block, right?

    • The middle image shows where the CNN is actually allowed to look: 1 = open (the kernel can see that pixel), 0 = closed (it can't)

    • Now the CNN takes all the pixels it can see and predicts a distribution for the next pixel; doing this for every pixel gives the joint distribution as a product of conditionals, p(x) = ∏ p(x_i | x_1, ..., x_{i-1})


    • The right part just shows that the kernel can only see what's above it and what's to the left of it

    • Since the CNN only conditions on previously generated pixels, it can capture complex, non-linear dependencies and generate high quality images; a minimal sketch of the masking trick is below
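
A minimal PyTorch sketch of the masked convolution described above. The class name, mask construction, and layer sizes are illustrative, not the paper's code; mask 'A' (first layer) also hides the centre pixel, mask 'B' (later layers) allows it.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Convolution whose kernel is zeroed out to the right of and below
    the centre pixel, so each output only depends on previously seen pixels."""

    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2 + (mask_type == "B"):] = 0  # centre row: block the centre ('A') and everything to its right
        mask[kH // 2 + 1:, :] = 0                          # block every row below the centre
        self.register_buffer("mask", mask[None, None])     # shape (1, 1, kH, kW), broadcast over channels

    def forward(self, x):
        self.weight.data *= self.mask   # re-apply the mask so the "future" taps stay zero
        return super().forward(x)

# First layer uses mask 'A' so the model never sees the pixel it is predicting
layer = MaskedConv2d("A", in_channels=3, out_channels=64, kernel_size=7, padding=3)
out = layer(torch.randn(1, 3, 32, 32))  # -> (1, 64, 32, 32)
```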

  • How we bring this up to the level of PixelRNN

    • PixelRNNs can access the whole available receptive field, while a CNN's receptive field only grows linearly with depth. We can make up for that by using more layers

    • Also, RNNs have multiplicative units (LSTM gates) that may help them model more complex interactions. We can get the same effect by replacing the ReLU activation with a gated activation unit (equation and sketch below):

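The gated activation unit from the paper is y = tanh(W_{k,f} ∗ x) ⊙ σ(W_{k,g} ∗ x), where ∗ is the masked convolution, σ the sigmoid, and ⊙ an elementwise product. A minimal sketch, assuming the two convolutions are fused into a single masked convolution that outputs 2p channels:

```python
import torch

def gated_activation(conv_out):
    """Split the 2p-channel convolution output into a tanh half and a
    sigmoid gate half, then multiply them elementwise: p channels out."""
    x_f, x_g = conv_out.chunk(2, dim=1)
    return torch.tanh(x_f) * torch.sigmoid(x_g)
```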

  • Blind spots

    • The masked convolution above has a blind spot: as the layers stack, the receptive field misses a region of pixels above and to the right of the current pixel

    • The fix is to run 2 CNN stacks, a vertical stack (which sees all rows above) and a horizontal stack (which sees the pixels to the left in the current row), and combine them so that together they cover the blind spot

  • Conditional PixelCNN

    • Remember that we want to generate specific images, so we can't just let the CNN go off on its own

    • We just condition everything on the latent vector h: the model distribution becomes p(x|h), and h is added inside the gated activation (the paper's two equations and a small sketch are below)

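The two equations from the paper: the conditional distribution p(x | h) = ∏ p(x_i | x_1, ..., x_{i-1}, h), and the conditional gated activation y = tanh(W_{k,f} ∗ x + V_{k,f}ᵀ h) ⊙ σ(W_{k,g} ∗ x + V_{k,g}ᵀ h). A minimal sketch of the conditional gated activation (module name and shapes are illustrative):

```python
import torch
import torch.nn as nn

class ConditionalGatedActivation(nn.Module):
    """y = tanh(W_f*x + V_f h) * sigmoid(W_g*x + V_g h), where h is a global
    conditioning vector such as a one-hot class label or a face embedding."""

    def __init__(self, channels, h_dim):
        super().__init__()
        # V_f and V_g fused into one linear map producing 2*channels values
        self.cond = nn.Linear(h_dim, 2 * channels, bias=False)

    def forward(self, conv_out, h):
        # conv_out: (B, 2*channels, H, W), the output of a masked convolution
        # h:        (B, h_dim), the conditioning vector
        cond = self.cond(h)[:, :, None, None]         # broadcast over the spatial dims
        x_f, x_g = (conv_out + cond).chunk(2, dim=1)
        return torch.tanh(x_f) * torch.sigmoid(x_g)
```

Feeding a one-hot class vector as h gives the class-conditional samples from the experiments below; feeding a face embedding gives the new portraits.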

  • Also, we can just replace the decoder in autoencoders with a conditional PixelCNN and it works! The PixelCNN does the heavy lifting of modelling the low-level pixel statistics, so the encoder can focus on high-level abstract information (a rough sketch of the wiring is below)
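
A rough sketch of that wiring, assuming an encoder network and a conditional PixelCNN module like the pieces sketched above; every name here is illustrative rather than the paper's code:

```python
import torch.nn as nn

class PixelCNNAutoencoder(nn.Module):
    """Autoencoder whose decoder is a conditional PixelCNN: the encoder
    compresses the image into a small vector h, and the PixelCNN models
    p(x | h) autoregressively over the pixels."""

    def __init__(self, encoder: nn.Module, pixelcnn_decoder: nn.Module):
        super().__init__()
        self.encoder = encoder            # image -> bottleneck vector h (m dimensions)
        self.decoder = pixelcnn_decoder   # conditional PixelCNN over the pixels

    def forward(self, x):
        h = self.encoder(x)               # (B, m) bottleneck code
        logits = self.decoder(x, h)       # per-pixel distributions, teacher-forced on x
        return logits                     # train with a cross-entropy loss over pixel values
```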

Experimentation

  • On CIFAR-10, Gated PixelCNN clearly beats the original PixelCNN and comes very close to PixelRNN's state-of-the-art log-likelihood

  • On ImageNet (unconditional modelling), it sets a new state of the art while training much faster

  • Then they tested the conditional model (feeding in a one-hot encoding of the ImageNet class)


  • Then they tested generating new portraits conditioned on face embeddings


  • They can also play with linear interpolations between embeddings:


  • Also trained autoencoders with a PixelCNN decoder (m = number of dimensions of the bottleneck)


Conclusion

  • They just summarize the paper

  • In the future they want to generate new images from only 1 example image

  • Also they want to use it as the decoder in variational auto-encoders

  • They could also try conditioning on image captions instead of labels.


