
Paper Summary of "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation"

Dickson Wu
May 23, 2021 · Last updated May 27, 2021 · 6 min read

Note: This was one of the harder papers to read.

Abstract

  • The network takes in raw point clouds directly as input

  • Efficient + effective

  • Used for Object classification, part segmentation, scene semantic parsing

Introduction:

  • Point data = super important data format. But existing NNs can't take it in directly - they need regular input formats. So we transform the points into another form (voxel grids, images) before passing them in

  • The transformed data becomes voluminous + introduces quantization artifacts

  • Instead we just deal with point clouds. These guys are simple and unified structures - less complex than meshes. Thus it's easier to learn from

  • Of course the input set is going to change (not always the same size or order), so we gotta make the network symmetric with respect to input order

  • The architecture of PointNet can do lots of different tasks. Input is the same, output is variable

  • Overall structure:

    • Input points are processed one by one - each point is represented by just its coordinates

    • The key of PointNet is the use of max pooling. The network is forced to find relevant information + encode the reason why it selected it

    • Fully connected layers take the learnt global feature and produce the desired output

  • Since we apply the transformations to the points independently of each other, we can add a "data-dependent spatial transformer network" to extract more useful information from the points

  • PointNet can approximate any continuous set function (so it can learn whatever transformation of the set we need)

  • It can also summarize an input cloud down to its essential points

  • It's also robust to small perturbations in the input data + good at dealing with outliers and missing data

  • For benchmarks, it's faster and on-par if not better than state of the art techniques
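The pipeline described above - the same small network applied to every point, a symmetric max pool, then a classifier head - can be sketched as a toy numpy model. All layer sizes and weights here are made up for illustration; this is not the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(points, w, b):
    # Apply the SAME weights to every point independently (a "shared" layer),
    # followed by a ReLU. points: (n, d_in) -> (n, d_out)
    return np.maximum(points @ w + b, 0.0)

def pointnet_classify(points, w1, b1, w2, b2):
    # Per-point features, then a symmetric max pool over the point axis,
    # then a linear head producing k class scores.
    feats = shared_mlp(points, w1, b1)   # (n, 64)
    global_feat = feats.max(axis=0)      # (64,) - independent of point order
    return global_feat @ w2 + b2         # (k,)

n, k = 128, 10                           # 128 points, 10 classes (invented)
cloud = rng.normal(size=(n, 3))          # xyz coordinates
w1, b1 = rng.normal(size=(3, 64)) * 0.1, np.zeros(64)
w2, b2 = rng.normal(size=(64, k)) * 0.1, np.zeros(k)

scores = pointnet_classify(cloud, w1, b1, w2, b2)
shuffled = pointnet_classify(rng.permutation(cloud), w1, b1, w2, b2)
assert np.allclose(scores, shuffled)     # same output for any point ordering
```

Shuffling the rows of the input leaves the output untouched, which is exactly the order-invariance the paper is after.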

Related Work:

  • Point Cloud Features: Basically handcrafted features that people made up to help extract information from points. Classified as intrinsic or extrinsic, or local or global

  • Deep Learning on 3D Data:

    • Volumetric CNNs = the pioneers, but constrained by resolution -> data too sparse + compute too costly

    • FPNN + Vote3D -> tried to address this, but still struggle with sparsity

    • Multiview CNNs are good at shape classification + retrieval tasks, but not at point classification + shape completion

    • Spectral CNNs = work on meshes, but limited to manifold geometry (organic shapes like a cow); hard to extend to non-isometric shapes like a couch

    • Feature-based DNNs = convert 3D data into vectors first - constrained by the representation power of the extracted features

  • Deep Learning on Unordered Sets: The point sets are without order (no order of placing them down), so they're unordered sets. Not much ML work is in this space - some work in NLP, but not on geometry

Problem Statement:

  • So we need our algorithm to take in unordered sets as inputs

  • The set = {P_i | i = 1, ..., n}, where each P_i = a vector of (x, y, z) coordinates (plus optional extra feature channels)

  • Object classification: We take in an input point cloud (a segmented part of a scene, or a sampled shape) and output k scores (for k candidate classes)

  • Semantic Segmentation: Input = a region of the whole scene. Model will output n x m scores. n = points, m = semantic sub-categories
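A toy sketch of where the n × m segmentation scores come from. The paper combines each point's local feature with a pooled global feature before scoring; the dimensions and the single linear head here are invented for illustration (the real network uses deeper shared MLPs), but the shape bookkeeping is the same:

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 1024, 13                      # n points, m semantic sub-categories (invented)
local = rng.normal(size=(n, 64))     # per-point ("local") features
global_feat = local.max(axis=0)      # (64,) max-pooled global feature

# Broadcast the global feature back onto every point and concatenate, so each
# point sees both its own geometry and the whole-cloud context.
combined = np.concatenate(
    [local, np.broadcast_to(global_feat, (n, 64))], axis=1)  # (n, 128)

w, b = rng.normal(size=(128, m)) * 0.1, np.zeros(m)
seg_scores = combined @ w + b        # (n, m): one score per point per category
assert seg_scores.shape == (n, m)
```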

Deep Learning on Point Sets:

  • The architecture = inspired by the properties of point sets in R^n

  • Properties of point sets in R^n:

    • Unordered: Like a set in Python - no specific order

    • Interaction among points. Points close to each other can be combined together to form a coherent structure - we must capture this structure

    • Invariance under transformation: If we rigidly transform all the points together (e.g. rotate or translate the cloud), the points stay in the same relationship to each other, and the learned representation shouldn't change

  • The PointNet Architecture:

    • [Figure: the PointNet architecture - classification network + segmentation extension]

    • n input points come in. We apply an input transform + a feature transform to them, then aggregate them together through max pooling

    • Output = classification of k classes

    • Segmentation = an extension to the classification net. It just concatenates the global and local features.

    • The MLP = just a multi-layer perceptron, shared across all points

  • There are 3 main modules:

    • Max pooling layer - symmetric function to combine information

    • Local + global information concatenation

    • The thing that extracts features from input space

  • Symmetric function = same output no matter what order the inputs come in. Candidate ways to get this:

    • Sorting the points into a canonical order - but no stable ordering exists in high dimensions, so this doesn't fully work on its own

    • Using an RNN, training on randomly permuted orders in the hope that it becomes order-tolerant - but this doesn't scale to thousands of points

    • Use a symmetric function to aggregate the information. Looks like this:

    • f({x_1, ..., x_n}) ≈ g(h(x_1), ..., h(x_n))

    • h = a small neural network applied to each point on its own; g = the symmetric aggregation (max pooling), with another network applied to the pooled vector afterwards

  • Local and Global Information Aggregation:

    • The output of our symmetric function = big vector

    • We need to turn this into local and global knowledge

    • Global = we just feed the global feature into an SVM or a multi-layer perceptron classifier

    • Concatenate the global feature back onto each per-point feature, then we can extract new local features from the combined vector

    • After that we can use both local and global geometry to predict per-point quantities

  • Joint Alignment Network

    • Labels should be invariant to rigid transformations of the cloud, so a mini-network (T-net) predicts an affine transformation matrix and applies it directly to the input coordinates, aligning the cloud to a canonical pose

    • The same idea is applied again in feature space to align the intermediate point features, but that transformation matrix is much bigger and hard to optimize - so a regularization term pushes it to stay close to an orthogonal matrix

  • They go on to prove the universal approximation result [skimmed some boring math]

  • Also prove that their model is very robust [skimmed some boring math] -> Does this by summarizing the shape down to key points. So outliers or missing data ain't going to affect it
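The critical-points idea falls straight out of max pooling: only the points that win the max in at least one feature channel affect the global feature, so dropping every other point changes nothing. A small numpy check, using random stand-in features in place of a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 256
feats = rng.normal(size=(n, 64))           # per-point features h(x_i) (random stand-ins)
global_feat = feats.max(axis=0)            # max pool over the point axis

# "Critical points" = the points that win the max in at least one of the
# 64 channels; all other points contribute nothing to the global feature.
critical = np.unique(feats.argmax(axis=0))
reduced = feats[critical].max(axis=0)      # drop every non-critical point

assert np.allclose(global_feat, reduced)   # global feature is unchanged
print(f"{len(critical)} of {n} points are critical")
```

At most 64 points (one per channel) can be critical here, which is why outliers and missing points among the rest don't move the output.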

Experiment

  • 3D object classification. On CAD models they were at the state of the art

  • 3D Object part segmentation. State of the art for almost every category

  • Semantic Segmentation in Scenes: State of the art for everything except the sofa

  • [skimmed some stuff] Testing their thing out on some metrics: Effectiveness of certain stuff, robustness

  • [skimmed some stuff] Talked about how they can visualize the PointNet outputs (and critical points)

  • [skimmed some stuff] Talked about Time and Space Complexity

Conclusion

  • Overall this work is transformative in the field of 3D objects. Instead of transforming point clouds into other, inferior, data forms, these guys just gobble them up directly and produce state-of-the-art, and sometimes better-than-state-of-the-art, results in 3 domains.


