赞
踩
Find 3D edges.
convolution 将 kernel 中心对称, inverted left-right and up-down
cross-correlation 不用
convolution can be changed to a matrix multiplication
IDFT - 2D
Box filter blur
近看highpass, 远看lowpass
Box filters are simple and fast but may result in blocky effects.
Mean filters preserve edges better but can cause blurring.
Gaussian filters are commonly used for smoothing and noise reduction, offering a more natural blur with preserved image details。
Different scales / size of filter? Padding if necessary. Why different scales and how content affect the result?
Ans: extraction of features at different levels of detail. local features, fine-grained details, such as edges and textures or larger, more global features like shapes and objects.
padding: the output feature maps have the same spatial dimensions. Without padding, a loss of important information at the borders of the image.
Separability property of a filter / convolution? 2d conv->2*1d conv
how can be separated? Step by step
Ans:
Input image: W*H
Kernel: K*K
stride: S*S
Outputimage: [(W - K) / S + 1 ] * [(H - K) / S + 1]
Same as steps:
1. kernel (1,K) and stride (1,S), get (W, [(H - K) / S + 1])
2. kernel (K,1) and stride (S,1), get same result
What is Fourier Transform? What is the usage? How to calculate in 1D? 2D?
Why it is important? 1d equation and 2d equation, no calculation
Ans: decompose a complex signal into its constituent frequencies. frequency domain rather than the time domain.
The Fourier Transform is important because it allows us to analyze complex signals and understand their frequency content. It helps in filtering out noise, extracting meaningful information, compressing data, and understanding the behavior of signals in different domains. In image processing, the Fourier Transform is used for tasks such as image enhancement, denoising, compression, and pattern recognition.
How to work on a kernel approximating a 1st, 2nd derivative?
Gradient operators. estimate the local gradient. rate of change of the function at each point. Steps: 1. Choose a suitable kernel: Sobel operator and Prewitt operator. 2. conv the kernal to image or signal. 3. The result of the convolution operation is an approximation of the local gradient. For the first derivative, the result will be a vector showing the gradient in both the x and y directions. For the second derivative, the result will be a scalar representing the magnitude of the second derivative. 4. kernel size and high sampling rate increase accuracy.
Convolution in image domain is equivalent to multiplication in frequency domain. Why? Verify?
Ans: convolution in the spatial domain corresponds to multiplication in the frequency domain due to the properties of the Fourier transform.
Steerable Pyramid: It is an extension of the Laplacian pyramid that allows for multi-directional decomposition. It uses a set of steerable filters to compute the image representation at each level.
The transformer model utilizes self-attention mechanisms to capture the relationships between different words (or tokens) in a sentence. This attention mechanism allows the model to focus on relevant words while processing the input, enabling it to capture long-range dependencies effectively. Unlike traditional recurrent or convolutional neural networks, transformers do not have sequential or local dependencies, making them highly parallelizable and efficient.
One of the main advantages of transformers in computer vision is their ability to capture global contextual information effectively. They can model interactions between all image regions simultaneously, enabling the model to understand the relationships between objects and their context in a scene.
Transformers at capturing long-range dependencies and modeling complex relationships in visual data. This makes them well-suited for tasks that require understanding the context and semantic relationships between objects in images. Additionally, transformers have shown great potential in tasks such as image captioning, visual question answering, and image generation, where understanding and generating coherent and contextually relevant output is crucial.
camera to image coordinates, use left matrix
Focal Length as Function of FOV
world coordinate to camera coordinates
Disparity is used to estimate the depth information of the scene.
Depth map is obtained by using the disparity information. It represents the distance of objects in the scene from the camera. Higher disparity values mean objects are closer to the camera, while lower values indicate objects are farther away.
Auto driving collision detection can utilize the stereo camera setup and depth information. By continuously analyzing the depth map, the system can determine the distance of objects in the scene and detect potential collision risks. Depth-based algorithms and techniques are used to identify obstacles and calculate their proximity to the vehicle, enabling the collision detection system to react and take necessary measures to avoid accidents.
The Fundamental Matrix is a 3*3 matrix that encodes epipolar geometry. Given a point in one image, multiplying the fundamental matrix will tell us the epipolar line in the second image. Eight-point algorithm.
*[ F_ij] =0
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。