Paper Summary: "Pyramid methods in image processing"
TL;DR - Take an image, filter it + downsample it. Do it again and again on the result. Depending on how you filter it (lowpass vs. bandpass), you get either a Gaussian or a Laplacian pyramid. This pyramid is only ~33% bigger than the original image, yet is extremely powerful in a multitude of situations
Importance - Many concepts in this paper show up later in CNNs and subsequent work. Plus it's interesting to see how changing up the representation of the data lets you achieve much better results
How we represent image information = important to the solution. One structure in which we can represent it = the pyramid. It's a pyramid because each level = the image filtered at a different scale.
This paper = describing those pyramid types + showing the cool things they can do.
(Note this paper from back in 1984, 36 years ago)
Image processing = used in image enhancement, data compression, and machine vision. These areas still have massive improvements to make.
How we represent the data = critical to algorithm performance. We can represent an image as a matrix of pixel intensities, as a Fourier transform, or via other transform representations. Each is good at its own task and bad at others.
One cool way of representing the data retains the spatial information of an image. We can do this by breaking the image down with a set of bandpass filters to create several images. Regions within those images = localized data, while the whole images = different levels of detail.
We also have to observe images at different scales - objects come in different sizes and at different distances. So we should be able to recognize patterns at many different scales at the same time - and a lot of work in the area has produced the pyramid! Pyramids are great because they're versatile, convenient, and efficient - so this paper goes on to apply pyramids to applications.
In order to find a feature in an image using convolution, we can do it in two ways:
For some context: if we were looking for a car, and we convolve the target (aka just slide the convolution) over the image, the region where the response is highest is where the target best matches that part of the image (thus we've found the car in the image)
We can either take the target image and expand it to many sizes, then convolve all the expanded targets with the main image. Or we can take the image, shrink that guy down to many sizes, and convolve each with the target.
Turns out the second way is way faster (by the 4th power of the scale factor). And the image pyramid does exactly that: we just take the image and shrink it down to different resolutions so the search is more efficient.
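A quick back-of-envelope check of that 4th-power claim (the 512×512 image and 32×32 template sizes are made up for illustration):

```python
# Cost of correlating a k x k template over an n x n image ~ n^2 * k^2 ops.
def match_cost(n, k):
    return n * n * k * k

full = match_cost(512, 32)        # search at full resolution
small = match_cost(512 // 4, 32 // 4)  # shrink BOTH image and template 4x
print(full // small)  # -> 256, i.e. 4**4: the 4th power of the scale factor
```

Shrinking only the template wouldn't help nearly as much - the image side of the cost would stay at full resolution.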
Bottom of the pyramid, G_0, is the original image. Then we do a lowpass filter + subsample to get the smaller image (the lowpass filter keeps the quality up, so it's not super pixelated/aliased). And then we do that over and over again until we have a range of images of decreasing size - the pyramid.
Some variables for the formula below:
G = the layer
i, j = the rows and columns
m, n = indices into the filter window
w = the lowpass filter (the "generating kernel")
G_l(i, j) = Σ_m Σ_n w(m, n) · G_{l-1}(2i + m, 2j + n)
Layer l at (i, j): we iterate through m and n, which index into w(m, n), a Gaussian-like window function. This takes a weighted average (a blur) of the neighborhood around (2i, 2j) in the layer below - and the factor of 2 is what subsamples the image.
Or we can just call that equation the REDUCE operation: G_l = REDUCE[G_{l-1}].
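A minimal numpy sketch of REDUCE. The 5-tap binomial kernel [1, 4, 6, 4, 1]/16 and the zero-padded borders are my assumptions - the paper only requires w to be a small, symmetric lowpass generating kernel:

```python
import numpy as np

# Separable 5-tap generating kernel (an approximate Gaussian); sums to 1.
W = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0

def blur(img):
    # Convolve every row, then every column, with the 1-D kernel
    # (mode="same" zero-pads the borders - a simplification).
    img = np.apply_along_axis(lambda r: np.convolve(r, W, mode="same"), 1, img)
    img = np.apply_along_axis(lambda c: np.convolve(c, W, mode="same"), 0, img)
    return img

def reduce_(img):
    # G_l = REDUCE[G_{l-1}]: lowpass first, then keep every other pixel.
    return blur(img)[::2, ::2]

g0 = np.random.rand(64, 64)
g1 = reduce_(g0)
print(g1.shape)  # (32, 32) - each level has 1/4 the pixels of the one below
```

Blurring before subsampling is the whole point: dropping pixels without the lowpass step would alias the high frequencies into the smaller image.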
We can also view the pyramid as convolving the image with a family of Gaussian-like lowpass weighting functions (imagine laying these weighting functions over the image), where the functions double in width at each increase of the layers.
Hence we call it the Gaussian pyramid
Instead of lowpass filters, we can also use bandpass filters. We can obtain those levels by subtracting adjacent layers from each other, but first we need to interpolate the smaller layer back up (aka EXPAND it, which is the opposite of REDUCE): G_{l,k}(i, j) = 4 Σ_m Σ_n w(m, n) · G_{l,k-1}((i − m)/2, (j − n)/2), summing only over terms where (i − m)/2 and (j − n)/2 are integers, where k is the number of expansions we need to make.
And we can define the whole bandpass pyramid in terms of the lowpass pyramid. With L as the bandpass pyramid's layers: L_l = G_l − EXPAND[G_{l+1}].
This is like subtracting 2 Gaussians from each other, which resembles the Laplacian operator - thus it's called a Laplacian pyramid.
The cool property of Laplacian pyramids is that we can exactly recover the original image: expand each level back up and sum them all.
Pyramids are great at scaling images for analysis, but we can use them in other domains like data compression, graphics, etc. They're actually better than other techniques because they're less computationally expensive + much simpler.
The pyramid = a code or transformation. We would want to transform our original image into this pyramid because:
We can transform our image into components that are easier to analyze {connection with CNNs! That's what the kernels are doing!} (the pyramid does this by highlighting edges, but doing it over local areas {similar to CNNs}).
Also it compacts the image so we can store it better (we can take an image and squeeze it down, then undo the compacting to retrieve the original - with some degradation if we quantize, but done right the quantization is barely noticeable).
We can do pattern matching really easily - we've seen that up there! But the cool part is that the full pyramid is only 33% larger than the original image. Thus searching every level for the pattern = only ~33% more expensive than searching the original image alone.
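Where does that 33% come from? Each level has 1/4 the pixels of the one below, so the total is a geometric series: 1 + 1/4 + 1/16 + ... → 4/3. A quick check (the 512×512 size is arbitrary):

```python
# Total pixel count of a pyramid over an n x n base image.
def pyramid_pixels(n, levels):
    return sum((n // 2**l) ** 2 for l in range(levels))

base = 512 * 512
total = pyramid_pixels(512, 9)  # 512x512 down to 2x2
print(round(total / base, 3))   # -> 1.333, i.e. ~33% overhead
```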
But performance depends on the complexity of the pattern. If we have something that's simple, it can be picked up at higher (smaller) levels. But if it's complex we need finer matching, thus more in-between layers are required (thus more computation power needed)
Another thing we can do is measure properties that are integrated over the image itself, like texture. We convolve a pattern with the image, then a non-linear function comes in. Finally, we use windows to extract properties of each area. Using pyramids, we can do this efficiently, since we can just do it at all levels at once.
The final application is doing a coarse-to-fine search. So if you have a big and complex pattern in the image, instead of convolving it with the high-resolution image, you can just: take the image and make it low resolution, take the kernel and also make it low resolution - the match is approximately the same, thus we can find the rough position quickly.
If we want more detail, we move both the kernel and the image to higher resolution and refine the match near that position. This saves a ton of computation, especially when the size and orientation of the target aren't known.
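A toy sketch of that coarse-to-fine idea. Everything here is simplified for illustration: 2×2 block averaging stands in for the real lowpass REDUCE, sum-of-squared-differences stands in for correlation, and the image/template sizes are made up:

```python
import numpy as np

def reduce_(img):
    # Crude 2x reduction by 2x2 block averaging (stand-in for REDUCE).
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def best_match(img, tpl):
    # Exhaustive sum-of-squared-differences search; returns (row, col).
    H, W = img.shape
    h, w = tpl.shape
    scores = np.array([[np.sum((img[i:i+h, j:j+w] - tpl) ** 2)
                        for j in range(W - w + 1)] for i in range(H - h + 1)])
    return np.unravel_index(np.argmin(scores), scores.shape)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
tpl = img[20:36, 40:56]  # a 16x16 patch we'll try to find again

# Coarse step: match the reduced template in the reduced image (cheap)...
ci, cj = best_match(reduce_(img), reduce_(tpl))
# ...fine step: search only a small window around (2*ci, 2*cj) at full res.
r0, c0 = max(0, 2 * ci - 2), max(0, 2 * cj - 2)
ri, rj = best_match(img[r0:2*ci+18, c0:2*cj+18], tpl)
row, col = r0 + ri, c0 + rj
print(row, col)  # 20 40 - found without an exhaustive full-resolution search
```

The fine pass only examines a handful of candidate positions instead of thousands, which is where the savings come from.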
One technique to remove noise + sharpen: take the image, break it up into the Laplacian pyramid. Then apply a coring function - low values are set to 0, and larger values are kept. Then we just sum up all the levels and we end up with a nice clean image!
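A coring function in one line of numpy. The idea: small bandpass coefficients are mostly noise, large ones are edges. The threshold value here is a made-up illustration parameter, and real coring curves are often soft rather than this hard cutoff:

```python
import numpy as np

def core(level, t):
    # Zero out coefficients with magnitude below t (assumed noise),
    # keep the rest (assumed real structure like edges).
    return np.where(np.abs(level) < t, 0.0, level)

noisy = np.array([0.02, -0.01, 0.9, -0.8, 0.005, 0.4])
cored = core(noisy, 0.1)
# Small entries are zeroed; the edge-like values 0.9, -0.8, 0.4 survive.
```

Applied per pyramid level and then summed back up (via the reconstruction above), this suppresses noise while leaving strong edges intact.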
We can also extend the depth of field! Take 2 images focused at 2 different depths. The low-frequency parts of the images should be the same since we're looking at the same scene. But the higher-frequency details will be different since we're at 2 different focuses.
This means the levels near the top of the pyramid (low frequencies) = similar, but near the base, where the fine detail is encoded, they're really different. The in-focus parts of each image carry those higher frequencies at higher amplitudes, thus at each level we can just select the coefficients with the larger amplitude - thus we select the in-focus parts of each image.
We can also merge 2 images together without it being too jarring - smooth merging! We used to do this by averaging across the border over a certain transition area, but we have to tune that area correctly. Too narrow and it's jarring. Too wide and you get some weird side effects.
Instead, we can just break each image up into its Laplacian pyramid. Then we do the border averaging per level, and this time it'll actually be nice since the wavelengths are also broken up by level. We just sum the pyramid back up to get the mosaic.
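A sketch of that multiresolution merging, reusing the pyramid machinery from before. The kernel, zero-padded borders, power-of-two sizes, a hard down-the-middle splice, and the flat test images are all my simplifying assumptions:

```python
import numpy as np

W = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0

def _sep(img, k):
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, img)

def reduce_(img):
    return _sep(img, W)[::2, ::2]

def expand(img, shape):
    up = np.zeros(shape)
    up[::2, ::2] = img
    return _sep(up, 2.0 * W)

def lap_pyr(img, levels):
    pyr = []
    for _ in range(levels):
        s = reduce_(img)
        pyr.append(img - expand(s, img.shape))
        img = s
    return pyr + [img]

def reconstruct(pyr):
    img = pyr[-1]
    for lap in reversed(pyr[:-1]):
        img = lap + expand(img, lap.shape)
    return img

a, b = np.zeros((64, 64)), np.ones((64, 64))  # two toy "images" to merge
# Splice each level down the middle, then sum the pyramid back up.
# Because the splice happens at every scale, low frequencies end up blending
# over a wide region and high frequencies over a narrow one -> no visible seam.
spliced = [np.concatenate([la[:, : la.shape[1] // 2],
                           lb[:, lb.shape[1] // 2:]], axis=1)
           for la, lb in zip(lap_pyr(a, 3), lap_pyr(b, 3))]
blended = reconstruct(spliced)
print(blended.shape)  # (64, 64): left side ~0, right side ~1, smooth seam
```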
Pyramids = a super cool way to represent images. They're efficient to compute, faster than other techniques, and there are a ton of things we can do with them!
Special thank you to Saurabh Kumar for recommending the paper to me!