Paper Summary: "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" Abstract
CycleGAN = a method of turning an image from one group into an image from another group (e.g. zebra to horse)
For a ton of applications, there isn't a paired training set! That means we can't use traditional supervised approaches.
CycleGAN aims to solve that problem.
Let X and Y be 2 groups (domains) of images. G is a function (model) and F is another model.
Our goal is to get G(x) to be indistinguishable from real images in Y.
F goes the other way, turning Y into X. So we cycle it up!
We try and get F(G(x)) to be equal to x! (and the other way around too)
This is super effective for tasks that don't have paired training data
Introduction:
Imagine Monet (some painter) painting a scene. He paints in a certain style, so we know the features his work has. If we took a photo today and imagined how Monet would have painted that scene, we could somewhat picture what it would look like.
If you break it down, we're essentially taking a photo and transforming it into Monet's style.
That's what CycleGANs seek to do
Existing architectures can do this (ex: segmentation models) but they depend on paired, labeled data that's hard and expensive to get. Sometimes a paired version can't even exist in the real world (like a real photo of a horse turned into a zebra)
Instead we only have unpaired training sets:
All we give it is 2 sets: X and Y
Function G = X --> Y
Function F = Y --> X
ŷ = G(x)
The goal is to get ŷ to look like it belongs to set Y
We can train that adversarially (like a GAN)
But that alone often fails because it collapses: G can spit out BS, or always spit out a single image and never properly optimize
We need to add another loss. So we borrow an idea from machine translation: translate a sentence from English to French, then translate it back to English - you should get (roughly) the original sentence back
We do the same thing. We compare x and F(G(x)) - our second loss
We combine them together to get our full loss. Train on that and we're good!
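To make the round-trip idea concrete (this is exactly what the paper formalizes later):

$$x \rightarrow G(x) \rightarrow F(G(x)) \approx x \qquad \text{and} \qquad y \rightarrow F(y) \rightarrow G(F(y)) \approx y$$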
Related Work
GANs: The main thing we're borrowing is the adversarial loss (aka 2 models duke it out: the worse the other guy does, the lower your loss; the better they do, the higher your loss)
Image-to-image translation: they do okay, but they require paired training sets - we're using unpaired sets
Unpaired image-to-image translation: approaches exist, but they're all task-specific or rely on a hand-defined similarity function between the domains. We don't need any of that, which makes CycleGAN robust and applicable to many different domains
Cycle Consistency: AKA optimizing F(G(x)) = x. It's been used for a long time in machine translation as "back translation and reconciliation" (the paper even mentions Mark Twain playing with it)
Neural Style Transfer: the results can look somewhat similar, but CycleGANs are more generalizable and achieve it differently (they learn a mapping between two whole image collections rather than matching the style of a single image)
Formulation
Some notation:
Training images {x_i}, where each x_i ∈ X (replace x with y and that's the other set of images)
The data distribution of x, written x ~ p_data(x) (remember that our aim is to map x onto the data distribution of Y, and the opposite too)
There are two discriminators, D_X and D_Y, which distinguish between real and fake X's and Y's
Adversarial losses = make the generators' outputs look like they come from the target domain (as judged by the discriminators)
Cycle consistency losses = the F(G(x)) ≈ x constraints
Adversarial Loss:
Generator's task: try and screw up the Discriminator by getting our generated images to look as real as possible
Discriminator's task: Try and distinguish between real images and fake ones
We plug both real and fake ones into the loss
Same for both directions (X → Y and Y → X)
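Written out, the adversarial loss from the paper for the G / D_Y direction (the F / D_X direction is symmetric):

$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$$

G tries to minimize this while D_Y tries to maximize it.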
Cycle Consistency Loss:
Now the adversarial losses could in theory be satisfied by mapping inputs to any random images from the target domain - they don't force an individual x to keep its content. So we add in this guy to make sure it does
It's what we said before about maintaining that closed translation circle
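The paper measures "did we get back to where we started" with an L1 distance, in both directions:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\| F(G(x)) - x \|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\| G(F(y)) - y \|_1]$$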
Putting everything together
We just add up our losses:
Lambda is there to weight the cycle loss relative to the adversarial losses (the paper sets λ = 10)
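The full objective from the paper, and the min-max game solved over it:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \, \mathcal{L}_{cyc}(G, F)$$

$$G^*, F^* = \arg\min_{G, F} \; \max_{D_X, D_Y} \; \mathcal{L}(G, F, D_X, D_Y)$$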
Actually we can think of this like an auto encoder:
Autoencoder = image -> compress -> same image
Cyclegan = image -> other image -> Same image
We're just a special case where the "compressed" representation is actually an image in the other domain, and we happen to use an adversarial loss to train that part
As it turns out we also can't use just one of the losses - combining them is actually better
Implementation:
They go into detail about the model - replicated from another paper
They also go into detail about how they trained it - also replicated from other papers
And they talk about the hyperparameters they set
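As a rough sketch of how those pieces fit together in code - this is my own minimal PyTorch version, not the authors' implementation, and it assumes you already have generator networks G, F and discriminators D_X, D_Y plus their optimizers (the paper uses ResNet generators, 70×70 PatchGAN discriminators, a least-squares GAN loss, a buffer of 50 past generated images for the discriminator updates, λ = 10, and Adam at lr 0.0002):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()   # least-squares GAN loss, as used in the paper
l1 = nn.L1Loss()     # L1 for the cycle-consistency terms
lam = 10.0           # weight on the cycle loss (lambda in the full objective)

def generator_step(G, F, D_X, D_Y, real_x, real_y, opt_g):
    """Update both generators on one batch of *unpaired* images."""
    fake_y = G(real_x)                     # X -> Y
    fake_x = F(real_y)                     # Y -> X

    # Adversarial terms: try to make the discriminators output "real" (1)
    pred_fy = D_Y(fake_y)
    pred_fx = D_X(fake_x)
    loss_gan = mse(pred_fy, torch.ones_like(pred_fy)) + \
               mse(pred_fx, torch.ones_like(pred_fx))

    # Cycle-consistency terms: F(G(x)) should reconstruct x, and vice versa
    loss_cyc = l1(F(fake_y), real_x) + l1(G(fake_x), real_y)

    loss = loss_gan + lam * loss_cyc
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return fake_x.detach(), fake_y.detach()

def discriminator_step(D, real, fake, opt_d):
    """Update one discriminator: push real images toward 1, fakes toward 0."""
    pred_real, pred_fake = D(real), D(fake)
    loss = 0.5 * (mse(pred_real, torch.ones_like(pred_real)) +
                  mse(pred_fake, torch.zeros_like(pred_fake)))
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
```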
Results
They run comparisons against other models, plus ablations over different losses and variants of the method
Applications:
Fun Style transfer
Object transfiguration: you can also transform one object class into another related class (horse ↔ zebra, apple ↔ orange)
Season transfer
Paintings -> Photos (they add an identity loss here too, aka feed the generator an image that's already from its target domain and compare the output with that input; see the equation after this list)
Photo enhancement
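The extra identity loss used for the paintings → photos experiments (from the paper): feed each generator an image that's already in its output domain and penalize it for changing anything, which mostly helps preserve color:

$$\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y \sim p_{data}(y)}[\| G(y) - y \|_1] + \mathbb{E}_{x \sim p_{data}(x)}[\| F(x) - x \|_1]$$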
Limitations and Discussion
Most results are actually really good
Geometric changes are a problem. Also it can't generalize to stuff it hasn't seen before (since the training data didn't have any humans riding horses, it ends up putting zebra stripes on Putin in the famous photo of him riding a horse)
There's also still a gap between models trained on paired data and CycleGAN - the fully supervised ones still perform better - but adding a tiny bit of supervision could boost results while keeping the data cheap