
Paper Summary of "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation"

Dickson Wu
May 23, 2021 · Last updated May 27, 2021 · 6 min read

Note: This was one of the harder papers to read.

Abstract

  • The network takes in raw point clouds directly as input

  • Efficient + effective

  • Used for Object classification, part segmentation, scene semantic parsing

Introduction:

  • Point data = super important data format. But existing NNs can't take it in directly - they need regular input formats. So we transform the points into another form (voxel grids, images) before passing them in

  • The transformed data becomes voluminous + introduces quantization artifacts

  • Instead we just deal with point clouds. These guys are simple and unified structures - less complex than meshes. Thus it's easier to learn from

  • Of course the input set is going to change (not always the same size or order), so we gotta make the network symmetric with respect to input order

  • The architecture of PointNet can do lots of different tasks. Input is the same, output is variable

  • Overall structure:

    • Input points are processed one by one - each point is represented by just its coordinates

    • The key of PointNet is the use of max pooling. The network is forced to find relevant information + encode the reason why it selected it

    • Fully connected layers take the learnt global feature and produce the desired output

  • Since we apply the transformations to the points independently of each other, we can add a "data-dependent spatial transformer network" to extract more useful information from the points

  • PointNet can approximate any continuous set function (so it can learn whatever transformation of the set we need)

  • It can also summarize an input cloud down to its essential points

  • It's also robust to small perturbations in the input data + good at dealing with outliers and missing data

  • For benchmarks, it's faster and on-par if not better than state of the art techniques
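The pipeline described above - the same small network applied to every point, a symmetric max pool, then a classifier head - can be sketched as a toy numpy model. All layer sizes and weights here are made up for illustration; this is not the paper's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_mlp(points, w, b):
    # Apply the SAME weights to every point independently (a "shared" layer),
    # followed by a ReLU. points: (n, d_in) -> (n, d_out)
    return np.maximum(points @ w + b, 0.0)

def pointnet_classify(points, w1, b1, w2, b2):
    # Per-point features, then a symmetric max pool over the point axis,
    # then a linear head producing k class scores.
    feats = shared_mlp(points, w1, b1)   # (n, 64)
    global_feat = feats.max(axis=0)      # (64,) - independent of point order
    return global_feat @ w2 + b2         # (k,)

n, k = 128, 10                           # 128 points, 10 classes (invented)
cloud = rng.normal(size=(n, 3))          # xyz coordinates
w1, b1 = rng.normal(size=(3, 64)) * 0.1, np.zeros(64)
w2, b2 = rng.normal(size=(64, k)) * 0.1, np.zeros(k)

scores = pointnet_classify(cloud, w1, b1, w2, b2)
shuffled = pointnet_classify(rng.permutation(cloud), w1, b1, w2, b2)
assert np.allclose(scores, shuffled)     # same output for any point ordering
```

Shuffling the rows of the input leaves the output untouched, which is exactly the order-invariance the paper is after.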

Related Work:

  • Point Cloud Features: Basically handcrafted features that people made up to help extract information from points. Classified as intrinsic or extrinsic, or local or global

  • Deep Learning on 3D Data:

    • Volumetric CNNs = the pioneers, but constrained by resolution -> data too sparse + compute too costly

    • FPNN + Vote3D -> tried to address this, but still struggle with sparsity

    • Multiview CNNs are good at shape classification + retrieval tasks, but not at point classification + shape completion

    • Spectral CNNs = work on meshes, but limited to manifold geometry (organic shapes like a cow); hard to extend to non-isometric shapes like a couch

    • Feature-based DNNs = convert 3D data into vectors first - constrained by the representation power of the extracted features

  • Deep Learning on Unordered Sets: The point sets are without order (no order of placing them down), so they're unordered sets. Not much ML work is in this space - some work in NLP, but not on geometry

Problem Statement:

  • So we need our algorithm to take in unordered sets as inputs

  • The set = {P_i | i = 1, ..., n}, where each P_i = a vector of (x, y, z) coordinates (plus optional extra feature channels)

  • Object classification: We take in an input point cloud (a segmented part of a scene, or a sampled shape) and output k scores (for k candidate classes)

  • Semantic Segmentation: Input = a region of the whole scene. Model will output n x m scores. n = points, m = semantic sub-categories
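A toy sketch of where the n × m segmentation scores come from. The paper combines each point's local feature with a pooled global feature before scoring; the dimensions and the single linear head here are invented for illustration (the real network uses deeper shared MLPs), but the shape bookkeeping is the same:

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 1024, 13                      # n points, m semantic sub-categories (invented)
local = rng.normal(size=(n, 64))     # per-point ("local") features
global_feat = local.max(axis=0)      # (64,) max-pooled global feature

# Broadcast the global feature back onto every point and concatenate, so each
# point sees both its own geometry and the whole-cloud context.
combined = np.concatenate(
    [local, np.broadcast_to(global_feat, (n, 64))], axis=1)  # (n, 128)

w, b = rng.normal(size=(128, m)) * 0.1, np.zeros(m)
seg_scores = combined @ w + b        # (n, m): one score per point per category
assert seg_scores.shape == (n, m)
```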

Deep Learning on Point Sets:

  • The architecture = inspired by the properties of point sets in R^n

  • Properties of point sets in R^n:

    • Unordered: Like a set in Python - no specific order

    • Interaction among points. Points close to each other can be combined together to form a coherent structure - we must capture this structure

    • Invariance under transformation: If we rigidly transform all the points together (e.g. rotate or translate the cloud), the points stay in the same relationship to each other, and the learned representation shouldn't change

  • The PointNet Architecture:

    • [Figure: the PointNet architecture - classification network + segmentation extension]

    • n input points come in. We apply an input transform + a feature transform to them, then aggregate them together through max pooling

    • Output = classification of k classes

    • Segmentation = an extension to the classification net. It just concatenates the global and local features.

    • The MLP = just a multi-layer perceptron, shared across all points

  • There are 3 main modules:

    • Max pooling layer - symmetric function to combine information

    • Local + global information concatenation

    • The thing that extracts features from input space

  • Symmetric function = same output no matter what order the inputs come in. Candidate ways to get this:

    • Sorting the points into a canonical order - but no stable ordering exists in high dimensions, so this doesn't fully work on its own

    • Using an RNN, training on randomly permuted orders in the hope that it becomes order-tolerant - but this doesn't scale to thousands of points

    • Use a symmetric function to aggregate the information. Looks like this:

    • f({x_1, ..., x_n}) ≈ g(h(x_1), ..., h(x_n))

    • h = a small neural network applied to each point on its own; g = the symmetric aggregation (max pooling), with another network applied to the pooled vector afterwards

  • Local and Global Information Aggregation:

    • The output of our symmetric function = big vector

    • We need to turn this into local and global knowledge

    • Global = we just feed the global feature into an SVM or a multi-layer perceptron classifier

    • Concatenate the global feature back onto each per-point feature, then we can extract new local features from the combined vector

    • After that we can use both local and global geometry to predict per-point quantities

  • Joint Alignment Network

    • Labels should be invariant to rigid transformations of the cloud, so a mini-network (T-net) predicts an affine transformation matrix and applies it directly to the input coordinates, aligning the cloud to a canonical pose

    • The same idea is applied again in feature space to align the intermediate point features, but that transformation matrix is much bigger and hard to optimize - so a regularization term pushes it to stay close to an orthogonal matrix

  • They go on to prove the universal approximation result [skimmed some boring math]

  • Also prove that their model is very robust [skimmed some boring math] -> Does this by summarizing the shape down to key points. So outliers or missing data ain't going to affect it
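The critical-points idea falls straight out of max pooling: only the points that win the max in at least one feature channel affect the global feature, so dropping every other point changes nothing. A small numpy check, using random stand-in features in place of a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 256
feats = rng.normal(size=(n, 64))           # per-point features h(x_i) (random stand-ins)
global_feat = feats.max(axis=0)            # max pool over the point axis

# "Critical points" = the points that win the max in at least one of the
# 64 channels; all other points contribute nothing to the global feature.
critical = np.unique(feats.argmax(axis=0))
reduced = feats[critical].max(axis=0)      # drop every non-critical point

assert np.allclose(global_feat, reduced)   # global feature is unchanged
print(f"{len(critical)} of {n} points are critical")
```

At most 64 points (one per channel) can be critical here, which is why outliers and missing points among the rest don't move the output.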

Experiment

  • 3D object classification. On CAD models they were at the state of the art

  • 3D Object part segmentation. State of the art for almost every category

  • Semantic Segmentation in Scenes: State of the art for everything except the sofa

  • [skimmed some stuff] Testing their thing out on some metrics: Effectiveness of certain stuff, robustness

  • [skimmed some stuff] Talked about how they can visualize the PointNet outputs (and critical points)

  • [skimmed some stuff] Talked about Time and Space Complexity

Conclusion

  • Overall this work is transformative in the field of 3D objects. Instead of transforming point clouds into other, inferior, data forms, these guys just gobble them up directly and produce state-of-the-art, and sometimes better-than-state-of-the-art, results in 3 domains.


