
Computer Vision and Pattern Recognition Review

CVPR Review

Image Processing

Find 3D edges and planes.

Convolution flips the kernel about its center (inverted left-right and up-down) before sliding it over the image.
Cross-correlation does not flip the kernel.
Convolution can be rewritten as a matrix multiplication.
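A minimal NumPy sketch of both points, assuming "valid" output (no padding) and stride 1. The function names `cross_correlate2d`, `convolve2d`, and `conv1d_matrix` are made up for illustration. The matrix form is shown in 1D, where each column of the matrix holds a shifted copy of the kernel.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the image as-is."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """Valid-mode 2D convolution: flip the kernel up-down and left-right, then correlate."""
    return cross_correlate2d(image, kernel[::-1, ::-1])

def conv1d_matrix(kernel, n):
    """Build the (n + k - 1) x n matrix T such that T @ x equals the full 1D convolution."""
    k = len(kernel)
    T = np.zeros((n + k - 1, n))
    for col in range(n):
        T[col:col + k, col] = kernel  # each column is the kernel shifted down by one
    return T

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 2.0], [3.0, 4.0]])  # asymmetric, so flipping matters
corr = cross_correlate2d(image, kernel)
conv = convolve2d(image, kernel)

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, -1.0])
y_matrix = conv1d_matrix(h, len(x)) @ x
```

For a symmetric kernel the two operations give the same result, which is why the distinction is often ignored for box or Gaussian filters.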

IDFT - 2D
Box filter blur

Viewed up close, the high-pass content dominates; viewed from far away, the low-pass content dominates.

Box (mean) filters are simple and fast but may produce blocky artifacts.
Median filters preserve edges better but can still blur fine detail.
Gaussian filters are commonly used for smoothing and noise reduction, offering a more natural blur that better preserves image detail.

  1. Different scales / sizes of filter? Padding if necessary. Why use different scales, and how does the content affect the result?
    Ans: Different filter sizes extract features at different levels of detail. Small filters pick up local, fine-grained details such as edges and textures; large filters pick up more global features such as shapes and objects.
    Padding: with padding, the output feature maps keep the same spatial dimensions as the input. Without padding, the output shrinks and important information at the borders of the image is lost.

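  2. Separability property of a filter / convolution? 2D conv -> 2 x 1D conv
    How can a kernel be separated? Step by step.
    Ans: A K*K kernel is separable when it can be written as the outer product of a K*1 column vector and a 1*K row vector. Filtering with the 2D kernel then equals filtering with the two 1D kernels in sequence, which costs 2K instead of K*K multiplications per output pixel.
    Input image: W*H
    Kernel: K*K
    Stride: S*S
    Output image: [(W - K) / S + 1] * [(H - K) / S + 1]
    The same output is obtained in two steps:
    1. Kernel (1,K) and stride (1,S), giving an intermediate result of size W * [(H - K) / S + 1]
    2. Kernel (K,1) and stride (S,1), giving [(W - K) / S + 1] * [(H - K) / S + 1], the same as above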
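    A small check of the separability claim, as a sketch only: it assumes stride 1 and no padding, and uses the Sobel x kernel as the example because it factors into a smoothing column and a difference row. The helper names `correlate2d_valid` and `separable_filter` are hypothetical.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Valid-mode 2D filtering (cross-correlation), stride 1, no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def separable_filter(image, col, row):
    """Apply a separable kernel as two 1D passes instead of one 2D pass."""
    tmp = correlate2d_valid(image, row[np.newaxis, :])  # horizontal pass, 1 x K kernel
    return correlate2d_valid(tmp, col[:, np.newaxis])   # vertical pass, K x 1 kernel

col = np.array([1.0, 2.0, 1.0])    # smoothing along y
row = np.array([-1.0, 0.0, 1.0])   # difference along x
kernel_2d = np.outer(col, row)     # the 3 x 3 Sobel x kernel

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
direct = correlate2d_valid(image, kernel_2d)
two_pass = separable_filter(image, col, row)
```

    A kernel is separable exactly when its matrix has rank 1, so `np.linalg.matrix_rank(kernel_2d)` is a quick way to test an arbitrary kernel.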

  3. What is the Fourier Transform? What is it used for? How is it calculated in 1D? In 2D?
    Why is it important? Give the 1D equation and the 2D equation; no calculation is needed.
    Ans: The Fourier Transform decomposes a complex signal into its constituent frequencies, so the signal is described in the frequency domain rather than the time (or spatial) domain.
    The Fourier Transform is important because it lets us analyze complex signals in terms of their frequency content. It helps in filtering out noise, extracting meaningful information, compressing data, and understanding how signals behave in different domains. In image processing, it is used for tasks such as image enhancement, denoising, compression, and pattern recognition.
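    For reference, the standard discrete forms are written out below. The symbols are assumptions chosen here, not taken from the course material: f is a 1D signal with N samples or an M*N image, and F is its transform.

```latex
% 1D DFT and its inverse
F(u) = \sum_{x=0}^{N-1} f(x)\, e^{-j 2\pi u x / N},
\qquad
f(x) = \frac{1}{N} \sum_{u=0}^{N-1} F(u)\, e^{\,j 2\pi u x / N}

% 2D DFT and its inverse (IDFT)
F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi \left( \frac{u x}{M} + \frac{v y}{N} \right)},
\qquad
f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{\,j 2\pi \left( \frac{u x}{M} + \frac{v y}{N} \right)}
```

    The 2D transform is itself separable: it can be computed as 1D transforms along the rows followed by 1D transforms along the columns.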

  4. How to work with a kernel approximating a 1st or 2nd derivative?
    Ans: Use gradient operators, which estimate the local gradient, i.e. the rate of change of the function at each point. Steps:
    1. Choose a suitable kernel, e.g. the Sobel operator or the Prewitt operator.
    2. Convolve the kernel with the image or signal.
    3. The result of the convolution approximates the local derivative. For the first derivative, the result is a vector giving the gradient in the x and y directions. For the second derivative (e.g. with a Laplacian kernel), the result is a scalar at each point.
    4. A larger kernel and a higher sampling rate can improve the accuracy of the estimate.
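    A minimal sketch of steps 1 to 3, under some assumptions: the kernels are the usual 3*3 Sobel and 4-neighbour Laplacian, the output is "valid" (no padding), and cross-correlation is used so that the sign follows increasing x and y (true convolution would flip the sign of the first derivative). The helper names are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)

def correlate2d_valid(image, kernel):
    """Valid-mode 2D filtering (cross-correlation), stride 1, no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def gradient(image):
    """First derivative: returns (gx, gy, magnitude). Dividing by 8 normalizes Sobel."""
    gx = correlate2d_valid(image, SOBEL_X) / 8.0
    gy = correlate2d_valid(image, SOBEL_Y) / 8.0
    return gx, gy, np.hypot(gx, gy)

def laplacian(image):
    """Second derivative: one scalar per pixel, d2f/dx2 + d2f/dy2."""
    return correlate2d_valid(image, LAPLACIAN)

# Test images with known derivatives: f = x (slope 1) and f = x^2 (second derivative 2).
ramp = np.tile(np.arange(6, dtype=float), (6, 1))
gx, gy, mag = gradient(ramp)
lap = laplacian(ramp ** 2)
```

    On the ramp image the estimated gradient is 1 in x and 0 in y, and on the squared ramp the Laplacian is 2, matching the analytic derivatives.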

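  5. Convolution in the image domain is equivalent to multiplication in the frequency domain. Why? How to verify?
    Ans: This is the convolution theorem, a property of the Fourier transform: the transform of a convolution equals the pointwise product of the transforms, F{f * g} = F{f} · F{g}. To verify it, compare a direct convolution with the inverse transform of the product of the two transforms.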
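    One way to verify this numerically, sketched with NumPy's FFT. The assumption to watch is that the DFT gives circular convolution, so both inputs are zero-padded to the full output size before transforming. The function names are made up for illustration.

```python
import numpy as np

def conv2d_full(image, kernel):
    """Direct full 2D convolution: out[m, n] = sum_ij kernel[i, j] * image[m - i, n - j]."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += kernel[i, j] * image
    return out

def conv2d_fft(image, kernel):
    """Same result via the frequency domain: multiply the transforms, then invert."""
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    F_image = np.fft.fft2(image, s=shape)    # s zero-pads to the full output size
    F_kernel = np.fft.fft2(kernel, s=shape)
    return np.real(np.fft.ifft2(F_image * F_kernel))

rng = np.random.default_rng(0)
image = rng.normal(size=(6, 7))
kernel = rng.normal(size=(3, 3))
spatial = conv2d_full(image, kernel)
frequency = conv2d_fft(image, kernel)
max_difference = np.max(np.abs(spatial - frequency))
```

    The two results agree up to floating-point rounding. For large kernels the FFT route is also the faster one.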

Steerable Pyramid: It is an extension of the Laplacian pyramid that allows for multi-directional decomposition. It uses a set of steerable filters to compute the image representation at each level.

  1. What is histogram matching? How is it done? What are its applications?
    Histogram matching redistributes the pixel values of an image so that its histogram matches a specified reference histogram. Steps: compute the cumulative distribution function (CDF) of the image and of the reference and normalize both; then, for each pixel value, find the closest CDF value in the reference histogram and replace the pixel value with the corresponding intensity value. Applications: image enhancement, image registration, color transfer, image recognition.
  2. Non-maximum suppression? How? Applications?
    Used in object detection algorithms to eliminate multiple overlapping bounding boxes and keep only the one with the highest confidence score, which represents the most probable location and size of the object. Steps: 1. Sort the detections by score. 2. Select the highest-scoring detection. 3. Compute its overlap (IoU) with the remaining detections. 4. Remove the detections that overlap it too much. 5. Repeat on the remaining detections. Applications: object, text, face, and edge detection.
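Both procedures can be sketched in a few lines of NumPy. These are illustrative versions under stated assumptions, not reference implementations: images are 8-bit grayscale, the histogram mapping takes the first reference level whose CDF reaches the source CDF (a common stand-in for "closest"), boxes are `[x1, y1, x2, y2]`, and the 0.5 IoU threshold is an arbitrary default.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap uint8 `source` so its histogram approximates that of uint8 `reference`."""
    src_cdf = np.cumsum(np.bincount(source.ravel(), minlength=256)) / source.size
    ref_cdf = np.cumsum(np.bincount(reference.ravel(), minlength=256)) / reference.size
    # For each source level, take the first reference level whose CDF reaches it.
    mapping = np.searchsorted(ref_cdf, src_cdf, side="left")
    mapping = np.clip(mapping, 0, 255).astype(np.uint8)
    return mapping[source]

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: returns the indices of the kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]          # step 1: sort by score
    keep = []
    while order.size > 0:
        best = order[0]                       # step 2: take the best remaining box
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])   # step 3: overlap with the rest
        order = rest[overlaps <= iou_threshold]    # step 4: drop heavy overlaps
    return keep                               # step 5 is the loop itself

source = np.array([[0, 0], [1, 1]], dtype=np.uint8)
reference = np.array([[100, 100], [200, 200]], dtype=np.uint8)
matched = match_histogram(source, reference)

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In the toy example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap anything and is kept.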

Neural Network

  1. Cross-entropy and its usage?
    Cross-entropy is used as a loss function. It measures the difference between two probability distributions, the true distribution p and the predicted distribution q:
    H(p, q) = -\sum_i p_i \log(q_i)
    Minimizing the cross-entropy loss pulls the predicted distribution toward the true distribution. The loss is continuous, differentiable, and convex in q.
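    A small numeric sketch of the formula, assuming p is a one-hot label and q comes from a softmax over raw scores. The `eps` clipping is an added safeguard against log(0), not part of the definition.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i), with q clipped away from 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(-np.sum(p * np.log(np.clip(q, eps, 1.0))))

p_true = [1.0, 0.0, 0.0]                              # one-hot label for class 0
loss_far = cross_entropy(p_true, [0.5, 0.25, 0.25])   # equals log(2)
loss_near = cross_entropy(p_true, [0.9, 0.05, 0.05])  # smaller: q is closer to p
loss_logits = cross_entropy(p_true, softmax([2.0, 0.0, 0.0]))
```

    With a one-hot label the sum reduces to -log of the probability assigned to the correct class, so the loss falls as that probability rises.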

The transformer model utilizes self-attention mechanisms to capture the relationships between different words (or tokens) in a sentence. This attention mechanism allows the model to focus on relevant words while processing the input, enabling it to capture long-range dependencies effectively. Unlike traditional recurrent or convolutional neural networks, transformers do not have sequential or local dependencies, making them highly parallelizable and efficient.
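As a rough illustration of the mechanism, here is single-head scaled dot-product self-attention in NumPy. The sizes and random projection matrices `Wq`, `Wk`, `Wv` are placeholders chosen for the sketch; in a real model they are learned, and there are multiple heads, masking, and positional encodings on top.

```python
import numpy as np

def softmax(z, axis=-1):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (tokens, d_model). Returns the attended output and the attention weights."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # every token scores every other token
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
tokens, d_model, d_head = 5, 8, 4             # hypothetical sizes
X = rng.normal(size=(tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```

The `weights` matrix is tokens by tokens, so every token attends to every other token in one matrix product. That is the source of both the long-range modelling and the parallelism.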

One of the main advantages of transformers in computer vision is their ability to capture global contextual information effectively. They can model interactions between all image regions simultaneously, enabling the model to understand the relationships between objects and their context in a scene.

Transformers excel at capturing long-range dependencies and modeling complex relationships in visual data. This makes them well-suited for tasks that require understanding the context and semantic relationships between objects in images. Additionally, transformers have shown great potential in tasks such as image captioning, visual question answering, and image generation, where understanding and generating coherent and contextually relevant output is crucial.

Geometry

Camera coordinates to image coordinates: use the left matrix.
Focal Length as Function of FOV
World coordinates to camera coordinates
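The chain from world point to pixel can be sketched as follows. This assumes the standard pinhole model, which may differ in notation from the matrices in the original figures: an intrinsic matrix K with a single focal length f in pixels and principal point (cx, cy), an extrinsic rotation R and translation t, and the relation f = (W / 2) / tan(FOV / 2) for a horizontal field of view. All function names are hypothetical.

```python
import numpy as np

def focal_length_from_fov(image_width_px, fov_deg):
    """Focal length in pixels from the horizontal field of view."""
    return (image_width_px / 2.0) / np.tan(np.radians(fov_deg) / 2.0)

def intrinsic_matrix(f, cx, cy):
    """Pinhole intrinsics: square pixels, no skew."""
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])

def world_to_camera(X_world, R, t):
    """Extrinsics: rotate and translate a 3D point into the camera frame."""
    return R @ X_world + t

def camera_to_image(X_cam, K):
    """Intrinsics: project a camera-frame point and divide by depth."""
    x = K @ X_cam
    return x[:2] / x[2]

f = focal_length_from_fov(640, 90.0)           # 640 px wide image, 90 degree FOV
K = intrinsic_matrix(f, cx=320.0, cy=240.0)
X_cam = world_to_camera(np.array([1.0, 0.0, 5.0]), np.eye(3), np.zeros(3))
uv = camera_to_image(X_cam, K)
```

A wider field of view gives a shorter focal length, so the same 3D offset from the optical axis lands closer to the image center.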
Disparity is used to estimate the depth information of the scene.
A depth map is obtained from the disparity information. It represents the distance of objects in the scene from the camera. Higher disparity values mean objects are closer to the camera, while lower values indicate objects are farther away.
Collision detection for autonomous driving can use a stereo camera setup and its depth information. By continuously analyzing the depth map, the system can determine the distance of objects in the scene and detect potential collision risks. Depth-based algorithms identify obstacles and calculate their proximity to the vehicle, so that the collision detection system can react and take the necessary measures to avoid accidents.
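For a rectified stereo pair, the inverse relation between disparity and depth is Z = f * B / d, with f the focal length in pixels, B the baseline between the cameras, and d the disparity in pixels. The sketch below applies it and adds a deliberately naive collision check. The camera numbers and the 5 m threshold are made-up example values.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Z = f * B / d per pixel; zero or negative disparity is mapped to infinity."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

def collision_risk(depth_map, min_safe_distance_m):
    """Naive check: flag a risk if anything in the map is closer than the threshold."""
    return bool(np.min(depth_map) < min_safe_distance_m)

disparity_map = np.array([[35.0, 70.0],
                          [0.0, 7.0]])         # pixels; 0 means no match found
depth_map = depth_from_disparity(disparity_map, focal_px=700.0, baseline_m=0.5)
risk = collision_risk(depth_map, min_safe_distance_m=5.0)
```

Because depth is inversely proportional to disparity, depth resolution degrades quickly for distant objects, where a one-pixel disparity error corresponds to a large change in Z.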

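The Fundamental Matrix F is a 3*3 matrix that encodes the epipolar geometry. Given a point x in one image, multiplying it by the fundamental matrix gives the epipolar line l' = F x in the second image, on which the matching point x' must lie. F can be estimated with the eight-point algorithm: each point correspondence gives one linear equation in the entries F_ij,
x'^T F x = 0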
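A compact sketch of the normalized eight-point algorithm, assuming the convention x2^T F x1 = 0 with points given as N x 2 pixel arrays and N >= 8. It stacks one linear equation per correspondence, solves A f = 0 by SVD, and then forces rank 2. The point normalization step (Hartley's) is an addition for numerical stability, and the function names are hypothetical.

```python
import numpy as np

def normalize_points(pts):
    """Shift to zero mean and scale to mean distance sqrt(2); return homogeneous points and T."""
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def eight_point(pts1, pts2):
    """Estimate F such that x2^T F x1 = 0 for matching points x1 (image 1), x2 (image 2)."""
    x1, T1 = normalize_points(np.asarray(pts1, dtype=float))
    x2, T2 = normalize_points(np.asarray(pts2, dtype=float))
    # One row per correspondence; the unknown f is F flattened row by row.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # null-space direction of A
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                        # enforce rank 2
    F = U @ np.diag(S) @ Vt
    F = T2.T @ F @ T1                 # undo the normalization
    return F / np.linalg.norm(F)

def epipolar_line(F, point1):
    """Line l' = F x in image 2 (coefficients a, b, c of a*u + b*v + c = 0)."""
    return F @ np.array([point1[0], point1[1], 1.0])
```

With noisy real matches the residuals x2^T F x1 will not be exactly zero, and the estimate is usually wrapped in a robust scheme such as RANSAC.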
