Paper Summary: "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"

Dickson Wu
May 24, 2021 · Last updated May 27, 2021 · 5 min read

Paper Summary: "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" Abstract

  • CycleGAN = a method of turning an image from one group into an image from another group (e.g. zebra to horse)

  • For a ton of applications there isn't a paired training set, meaning we can't use traditional supervised approaches

  • CycleGAN aims to solve that problem.

  • Let X and Y be 2 groups (domains). G is a function (model) and F is another model

  • Our goal is to get G(X) to be indistinguishable from the images in Y.

  • F(Y) turns it into X. So we cycle it up!

  • We try to get F(G(X)) to be equal to X! (and the other way around too)

  • This is super effective for data that doesn't have paired training data

Introduction:

  • Imagine Monet (some painter) painting a scene. He paints in a certain style, so we know the features his paintings have. If we took a photo today and thought about how Monet would paint it, we could somewhat picture what it would look like.

  • If you break it down, we're essentially taking a photo and transforming it into Monet's style

  • That's what CycleGANs seek to do

  • Existing architectures can do this (ex: segmentation models), but they depend on paired, labeled data that's hard to get and expensive. Sometimes the desired paired output isn't even well-defined in the real world (like turning a horse into a zebra)

  • Instead we have to work with unpaired training sets:

    • All we give it is 2 sets: X and Y

    • Function G = X --> Y

    • Function F = Y --> X

    • ŷ = G(x)

    • The goal is to get ŷ to be indistinguishable from the images in set Y

    • We can train that adversarially (like a GAN)

    • But that alone often fails because it just collapses: G can spit out BS, or always spit out a single image (mode collapse) and never optimize further

    • We need to add another loss, so we borrow something from machine translation: translate a sentence from English to French, then translate that sentence back to English. The two English sentences should match

    • We do the same thing: we compare x and F(G(x)) - our second loss (see the sketch below)

    • We combine them together to get our full loss. Train on that and we're good!
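Here's a tiny sketch of that round-trip check, just to show the data flow. The G and F below are placeholder identity modules standing in for the real convolutional generators, so this is an illustration of the idea, not the paper's model:

```python
import torch
import torch.nn as nn

# Placeholder "generators" just to show the flow; the real G and F are conv nets.
G = nn.Identity()   # G: X -> Y
F = nn.Identity()   # F: Y -> X

x = torch.randn(1, 3, 256, 256)   # a stand-in "image" from domain X
x_back = F(G(x))                  # translate to Y, then translate back to X

# Cycle consistency: after a round trip we should get (roughly) the original back.
print((x_back - x).abs().mean())  # exactly 0 here; a trained model keeps it small
```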

Related Work

  • GANs: The main thing we're stealing is the adversarial loss (aka 2 models duke it out: the worse the other one does, the lower your loss; the better it does, the higher your loss)

  • Image-to-image translation: They do okay, but they require paired sets - we're using unpaired sets

  • Unpaired image-to-image translation: Methods exist, but they're all task-specific or rely on a predefined similarity function. We don't need any of that, so CycleGAN is very robust and applicable to many different domains

  • Cycle consistency: AKA optimizing F(G(x)) = x. It's been used for a long time in machine translation (back-translation) and even by Mark Twain

  • Neural style transfer: The results are somewhat similar, but CycleGANs are more generalizable (they learn a mapping between two whole collections of images rather than transferring the style of a single image), and the two approaches achieve it in different ways

Formulation

  • Some notation:

    • $\{x_i\}_{i=1}^{N}$ where $x_i \in X$: that's the set of training images (replace x with y and that's the other set)

    • $x \sim p_{data}(x)$: that's the data distribution of x (remember that our aim is to map x to the data distribution of Y. Opposite too)

    • There are two D's, $D_X$ and $D_Y$, which are the discriminators between the real and fake X's and Y's

    • Adversarial losses = the generator-vs-discriminator games

    • Cycle consistency losses = they're the F(G(x)) ≈ x terms

  • Adversarial Loss:

    • Generator's task: try to fool the discriminator and get our generated images to look as real as possible

    • Discriminator's task: Try and distinguish between real images and fake ones

    • $\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$

    • We plug both real and fake ones into the loss

    • Same on the other side: there's a mirror loss $\mathcal{L}_{GAN}(F, D_X, Y, X)$ for F and $D_X$

  • Cycle Consistency Loss:

    • Adversarial losses could theoretically handle everything, but they don't: a big enough network could map X onto any random shuffling of Y's images. So we add in this guy to make sure each individual x maps to a meaningful y

    • It's what we said before about it maintaining that closed circle

    • $\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1]$

  • Putting everything together

    • We just add up our losses (see the sketch after this list):

    • $\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)$

    • Lambda's there to weight the cycle loss relative to the adversarial losses

    • Actually we can think of this like an auto encoder:

      • Autoencoder = image -> compress -> same image

      • Cyclegan = image -> other image -> Same image

      • We're just a special case where the "compression" is actually an image in another domain, and we happen to use adversarial loss to train that part

    • As it turns out we also can't use just one of the losses - combining them is actually better
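To make the pieces concrete, here's a minimal PyTorch sketch of the whole objective. The networks are toy stand-ins (the real model uses ResNet-style generators and PatchGAN discriminators, see Implementation below), the adversarial term is written in the log-likelihood form above even though the paper actually trains with a least-squares version, and λ = 10 matches the value reported in the paper. All the helper names here are mine, not from the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnF

# Toy stand-ins so the sketch runs end to end; the real model uses ResNet-style
# generators and 70x70 PatchGAN discriminators.
G, F = nn.Identity(), nn.Identity()        # G: X -> Y, F: Y -> X
D_X = nn.Conv2d(3, 1, kernel_size=1)       # judges real vs. fake X's
D_Y = nn.Conv2d(3, 1, kernel_size=1)       # judges real vs. fake Y's
lambda_cyc = 10.0                          # weight on the cycle loss (paper's value)

def g_adv_loss(d, fake):
    # Generator side of L_GAN: try to make the discriminator call the fake real.
    logits = d(fake)
    return nnF.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

def d_adv_loss(d, real, fake):
    # Discriminator side of L_GAN: push D(real) -> 1 and D(fake) -> 0,
    # i.e. the log D(y) + log(1 - D(G(x))) terms written as cross-entropy.
    real_logits, fake_logits = d(real), d(fake)
    return (nnF.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + nnF.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def cycle_loss(x, x_rec, y, y_rec):
    # L_cyc(G, F) = ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1
    return (x_rec - x).abs().mean() + (y_rec - y).abs().mean()

def generator_objective(x, y):
    y_fake, x_fake = G(x), F(y)              # translate each domain into the other
    x_rec, y_rec = F(y_fake), G(x_fake)      # translate back (the "cycle")
    return (g_adv_loss(D_Y, y_fake)          # L_GAN(G, D_Y, X, Y), generator side
            + g_adv_loss(D_X, x_fake)        # L_GAN(F, D_X, Y, X), generator side
            + lambda_cyc * cycle_loss(x, x_rec, y, y_rec))

x = torch.randn(1, 3, 256, 256)
y = torch.randn(1, 3, 256, 256)
loss_G = generator_objective(x, y)                        # minimized w.r.t. G and F
loss_D = (d_adv_loss(D_Y, y, G(x).detach())
          + d_adv_loss(D_X, x, F(y).detach()))            # minimized w.r.t. D_X and D_Y
```

In a real training loop you'd alternate stepping the generators on loss_G and the discriminators on loss_D, which is the minimax game the full objective describes.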

Implementation:

  • They go into detail about the model architecture - replicated from other papers (a ResNet-style generator plus a PatchGAN discriminator)

  • They also go into detail about how they trained it - also replicated from other papers

  • And they talk about the hyperparameters they set (summarized in the sketch below)
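As a quick reference, here's my own summary of the settings the paper reports; the dict itself is just an illustrative restatement, not the authors' configuration file:

```python
# Settings as reported in the CycleGAN paper (my summary, not the authors' code).
cyclegan_settings = {
    "generator": "ResNet-style (Johnson et al.), 9 residual blocks for 256x256 images, instance norm",
    "discriminator": "70x70 PatchGAN",
    "gan_loss": "least-squares (LSGAN) instead of the negative log likelihood",
    "image_buffer": 50,          # discriminators train on a history of generated images
    "lambda_cyc": 10,
    "optimizer": "Adam",
    "learning_rate": 2e-4,
    "batch_size": 1,
    "epochs": 200,               # constant LR for 100 epochs, then linear decay to zero
}
```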



Results

  • They run tests comparing CycleGAN against other models, different losses, and variants of the architecture

  • Applications:

    • Fun Style transfer

    • You can also do object transfiguration - transforming one object class into another (like horse ↔ zebra or apple ↔ orange)

    • Season transfer

    • Paintings -> Photos (they add an identity loss too: feed a generator an image that's already from its target domain and penalize any change; see the sketch after this list)

    • Photo enhancement
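Here's a rough sketch of that identity loss, with stand-in identity modules for the two generators; the helper name is mine, not the authors':

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two generators (G: X -> Y, F: Y -> X).
G, F = nn.Identity(), nn.Identity()

def identity_mapping_loss(x, y):
    # L_identity(G, F) = ||G(y) - y||_1 + ||F(x) - x||_1:
    # a generator fed an image that is already in its target domain should
    # leave it (roughly) unchanged, which helps preserve color composition.
    return (G(y) - y).abs().mean() + (F(x) - x).abs().mean()

# The paper adds this term with weight 0.5 * lambda for the painting -> photo experiments.
loss_idt = identity_mapping_loss(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
```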

Limitations and Discussion

  • Most results are actually really good

  • Geometric changes are a problem. It also can't generalize to stuff it hasn't seen before (since the training data didn't have any humans riding horses, it ends up putting zebra stripes on Putin riding a horse)

  • There's also still a gap between paired-data methods and CycleGAN - the paired ones still perform better. But adding a tiny bit of weak supervision could keep costs low and boost results.
