Paper Summary: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks"

Dickson Wu
Jun 9, 2021, 4 min read

Abstract:

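  • State-of-the-art object detectors depend on region proposal algorithms. Recent advances (SPPnet, Fast R-CNN) sped up the detection network, which leaves region proposal computation as the bottleneck

  • This paper introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, which makes region proposals nearly cost-free

  • They go further and merge the RPN and Fast R-CNN into a single network by sharing their convolutional features. In "attention" terms, the RPN tells the unified network where to look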



Introduction:

  • Right now the state of the art = region-based CNNs (R-CNNs). They were computationally expensive at first, but we can speed them up by sharing convolutions across proposals (proposed in Fast R-CNN)

  • But the bottleneck we have right now is the proposals (aka the step that finds the regions to scan in the first place)

  • There are several algorithms out there to do it. But they're all slow.

  • We're going to throw that out and just find the proposals using CNNs. More specifically, the proposal network shares its convolutional layers with the object detection network

  • "Sharing the convolutions" means both networks use the same convolutional layers. The RPN takes the feature maps those layers output and adds a few extra conv layers on top to regress region bounds and objectness scores

  • The proposals are predicted relative to reference boxes ("anchors") at several scales and aspect ratios, so the method handles objects of different sizes without rescaling the image



Related Work:

  • Object Proposals:

    • Lots of literature on it already.

    • People group super-pixels, or use sliding windows

    • These methods are usually treated as an external module, independent of the detector network

  • Deep Networks for Object Detection:

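    • R-CNN mostly acts as a classifier: it scores the proposed regions but doesn't predict the object bounds itself (apart from refining them with bounding box regression)

    • We just use CNNs and linear layers to help us do it

    • Sharing the computation of convolutions has been picking up steam recently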



Faster R-CNN:

  • 2 modules:

    • CNN to spit out proposal regions

    • Fast R-CNN detector

  • The attention part comes from the RPN "Telling" the detector where to look

  • RPN (Region Proposal Network):

    • We give it an image, it spits out some rectangular boxes with some "objectness" score

    • We're sharing layers, so there are some shared conv layers + some RPN-specific layers

    • The RPN-specific layers = an additional small conv layer (3×3 window) that slides over the shared conv feature map (not the raw image). Its output goes to an intermediate layer that feeds 2 sibling layers: box regression + box classification

    • Each sliding-window position predicts multiple region proposals at once, one per reference box ("anchor"). Each anchor is centered on the window and is parameterized by a scale + an aspect ratio (the paper uses 3 scales × 3 aspect ratios = 9 anchors per position)

    • Each anchor gets 4 coordinate outputs (for the box) + 2 score outputs (probability that it is or is not an object), so k anchors give 4k + 2k outputs per position

    • Cool feature of this model = it's translation invariant. If an object shifts in the image, the proposal shifts with it and the same function predicts it at either location (other methods such as MultiBox don't guarantee this)

    • To detect objects at multiple scales, people usually use image pyramids (resize the image to several sizes, slow) or filter pyramids (filters of several sizes, also slow). These authors just use anchors of multiple sizes instead. This way there's no extra computation, since we run one image size + one filter size

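    • Loss = classification loss (object vs. not object) + box regression loss (smooth L1, only counted for positive anchors). Intersection over Union (IoU) with the ground truth decides the labels: an anchor is positive if it has the highest IoU with a ground-truth box or an IoU above 0.7, and negative if its IoU is below 0.3 for every ground-truth box. Anchors in between are ignored [plus a ton of math going over it]

    • When training we just take a sub-sample of the anchors to compute the loss (256 per image, up to half of them positive) so the negative anchors, which dominate, don't overwhelm it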
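
A minimal sketch of the anchor idea from the RPN bullets above, in plain Python. The scales (128, 256, 512 pixels), the aspect ratios (1:2, 1:1, 2:1) and the stride of 16 follow the paper's defaults. The function names and the choice to center each anchor in the middle of its stride cell are assumptions made for illustration, not the authors' code.

```python
import math


def make_anchors(center_x, center_y,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x1, y1, x2, y2)
    centered on one sliding-window position."""
    anchors = []
    for scale in scales:
        area = float(scale * scale)        # every ratio keeps this area
        for ratio in ratios:               # ratio = height / width
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((center_x - width / 2, center_y - height / 2,
                            center_x + width / 2, center_y + height / 2))
    return anchors


def anchors_for_feature_map(feat_h, feat_w, stride=16):
    """Place the same k anchors at every feature-map position.

    Assumption: each position maps to the center of its stride x stride
    cell in image coordinates.
    """
    all_anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            center_x = col * stride + stride / 2
            center_y = row * stride + stride / 2
            all_anchors.extend(make_anchors(center_x, center_y))
    return all_anchors
```

With these defaults k = 9, so a 2×3 feature map yields 2 × 3 × 9 = 54 anchors, and the RPN head would emit 4 × 9 = 36 box coordinates and 2 × 9 = 18 scores per position. Only the anchors change shape; the image and the conv filter stay a single size.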

  • Training them together:

    • We want both networks to share the same conv layers, so there are several ways of doing this

    • Alternating training:

      • Train the RPN first

      • Use the RPN's proposals to train Fast R-CNN

      • The network tuned by Fast R-CNN then initializes the RPN, and we repeat from there

    • Approximate joint training:

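      • We merge them into one network and just train the whole thing as one

      • It's easy to implement, but the flaw is that it ignores the gradient with respect to the proposal boxes' coordinates (the boxes are also network outputs, so that gradient isn't zero)

      • Produces pretty close results, and reduces training time by about 25-50% compared with alternating training

    • Non-approximate joint training:

      • We fix the flaw in the approximate joint training by making the RoI pooling layer differentiable with respect to the box coordinates

      • It's complex and "beyond the scope of this paper"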

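    • 4-step alternating training (what the paper actually uses):

      • Train the RPN (initialized from an ImageNet-pre-trained model)

      • Use the RPN's proposals to train a separate Fast R-CNN detector (also initialized from an ImageNet-pre-trained model)

      • At this point the two networks don't share conv layers yet. So we use the detector network to initialize the RPN, freeze the shared conv layers, and only fine-tune the RPN-specific layers

      • Now we switch: keep the shared conv layers frozen and fine-tune only the layers unique to Fast R-CNN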

  • [They go over some implementation details and image processing stuff]
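
Going back to the RPN training bullets above, the IoU-based anchor labels and the sub-sampling step can be sketched as follows. The thresholds (0.7 and 0.3) and the mini-batch of 256 with at most half positives come from the paper. The function names, the -1 "ignore" label and the use of Python's `random` module are illustrative assumptions, not the authors' implementation.

```python
import random


def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor: 1 = object, 0 = background, -1 = ignored."""
    if not anchors:
        return []
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in overlaps:
        best = max(row) if row else 0.0
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    # The anchor with the highest IoU for each ground-truth box is also
    # positive, so every object gets at least one anchor (ties: first wins).
    for g in range(len(gt_boxes)):
        best_anchor = max(range(len(anchors)), key=lambda i: overlaps[i][g])
        if overlaps[best_anchor][g] > 0:
            labels[best_anchor] = 1
    return labels


def sample_minibatch(labels, batch_size=256, rng=random):
    """Pick up to batch_size anchors, at most half positive;
    negatives fill whatever room is left."""
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    n_pos = min(len(positives), batch_size // 2)
    n_neg = min(len(negatives), batch_size - n_pos)
    return rng.sample(positives, n_pos), rng.sample(negatives, n_neg)
```

Only the sampled anchors contribute to the loss: all of them feed the classification term, and only the positive ones feed the box regression term.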



Experimentation:

  • [Also just a bunch of tests and stuff like that. Basically it works really well!]



Conclusion:

  • [Just does a TL;DR of the paper]

