Math of Comp Vision

by Abhiram Kidambi

Project Motivation

This project took me longer to complete than any other, and it is probably the most in-depth and thorough project I will ever complete. It is a compilation of Computer Vision notes written from different angles of learning the subject (from the EE/CE digital image processing (DIP) approach to the CS Deep-Learning approach) to better understand the field. The notes are based on some of the computer vision presentations I made while I was in FIRST Robotics and on lectures I gave at conferences like the Battle O’ Baltimore Conference, but they also draw on content from other sources. For example, one section of the notes is centered on Fourier Analysis (which extends well beyond Computer Vision); that section is based on the content of MATH416, a DSP course offered at UMD. To access the repository, go to this link, and to view a PDF of the notes, go here.

Purpose and Content

This project contains 60 pages of notes with around 10 demos illustrating various concepts in Computer Vision. Note that all the notes in the note-file are mine and should only be used with my permission, and that the MATH416 Notes belong to Dr. Wojciech Czaja at UMD.

Technical Content

Here is a brief description of some of the topics covered in the notes:

  • Basic Overview - this consists of important mathematical and photography tools/concepts you should know. Although PCA and SVD aren't that important in the notes, understanding convolution and the pinhole projection is of paramount importance. The pinhole projection section in particular is one of the longest sections in the notes and took me a long time to make (since it's the only section with proper "photos"), and it becomes very important later down the line.
  • DSP/Fourier-Analysis Topics - this section is pretty much useless for those solely interested in computer vision. That being said, it covers a range of topics in signal processing and transforms: it walks through the Discrete Fourier Transform, the Gabor Transform, the (Discrete) Wavelet Transform, the Discrete Haar Transform, the Laplace and Z Transforms, along with the ideas of Multiresolution Analysis. This is a condensation of a lot of the material from MATH416, so much of this section follows the 416 Notes closely. There are a few demos from this section in the notes.
  • Image Processing and Edge Detection - these are two separate chapters covering old-school computer vision topics that aren't used much today simply because Deep-Learning has taken over. They include concepts such as Kernels, Gradient Kernels, Laplacian Kernels, Template Matching/Correlation, the Canny Edge Detector, the Hough Transform, and SIFT detectors. Some of the underlying mathematical ideas are omitted for ease of understanding; more information on the math can be found in online resources. That being said, you can also find a lot of related demos in the DIP/Edge-Detection folder.
  • Image Stitching - this section is why understanding the pinhole projection is important. It details the process of taking two images and applying a homography (a matrix that acts linearly on homogeneous coordinates) to "match" the two images and create a single one. It's particularly useful in technologies like Google Street View, where we want to stitch photos/videos from different angles to make them appear like a 3D world. The concept of RANSAC (Random Sample Consensus) is introduced, along with the different linear transformations (affine, projective) and homogeneous coordinates, which are used heavily in many computer vision research topics.
  • Facial Detection and Recognition - this is a very broad topic that focuses on detecting faces and then finding ways to recognize them (the latter part is covered very briefly because nowadays this is mostly done through Deep-Learning). Facial Detection is particularly useful in cameras for auto-focusing and auto-adjustment, so that the photo can be adjusted to make the faces "look best" and the focus plane matches where most of the faces are located (concepts that extend from the pinhole projection model taught in section 1). The Viola-Jones algorithm is mentioned along with some other facial recognition algorithms, and the broad focus is on Haar Cascades and the use of Haar Features in developing a classifier. This can be loosely summarized as taking some of the earlier content on Haar Wavelets/Transforms and using it to build classifiers that detect faces and pinpoint them in an image.
  • Deep Learning - this is by far the least dense and least explanatory section of the notes, simply because it occurred to me about midway through that most of the content here isn't really something you could "teach" in the typical sense, since it all revolves around taking Neural Networks and fine-tuning them. Some of the content derives from a textbook I read, but a lot of it comes from my own personal research. That being said, the best way to get into this field is simply to work with models like YOLO. If you would like to see some of my past work with YOLO, you can check out two of my earlier projects.
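
To give a rough feel for the homography and homogeneous-coordinate ideas mentioned under Image Stitching, here is a minimal NumPy sketch. It is not taken from the notes or the repository: the function name apply_homography and the matrix H are made up for illustration. In a real stitching pipeline, H would be estimated from matched feature points (for example with RANSAC) rather than written by hand.

```python
import numpy as np

def apply_homography(H, points):
    """Map an (N, 2) array of pixel coordinates through a 3x3 homography H."""
    points = np.asarray(points, dtype=float)
    # Lift each (x, y) to homogeneous coordinates (x, y, 1).
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.hstack([points, ones])
    # Apply the linear map; each row becomes (x', y', w').
    mapped = homogeneous @ H.T
    # Divide by w' to get back to ordinary 2D coordinates.
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical homography: a shift by (10, 5) plus a small perspective term.
H = np.array([[1.0,   0.0, 10.0],
              [0.0,   1.0,  5.0],
              [0.001, 0.0,  1.0]])

# Corners of an illustrative 100x50 image.
corners = np.array([[0, 0], [100, 0], [100, 50], [0, 50]])
warped = apply_homography(H, corners)
print(warped)
```

The division by the third coordinate is what separates a projective transformation from an affine one: when the bottom row of H is (0, 0, 1), that coordinate stays 1 and the map reduces to an affine transformation.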

Further Notes

To learn more about what is covered in this set of notes, please visit the GitHub README and the notes themselves, along with the various demos. If you have any questions about the content, please let me know by contacting me here.


Author: Abhiram Kidambi
Written: 08-07-2024
Tags: PROJECT

Copyright ©2024, All rights reserved | For more information or permissions, please contact Abhiram Kidambi.