Digital image and video compression is now essential. Internet teleconferencing, High Definition Television (HDTV), satellite communications and digital storage of movies would not be feasible without a high degree of compression. As it is, such applications are far from realizing their full potential, largely due to the limitations of common image compression techniques. The limitations are inherent in the information theory on which they are based, published by Claude Shannon in 1948. The Shannon theory has led modern communications into a theoretical trap from which it is difficult to escape.
The Shannon theory defines “information” merely as binary digits (bits). Data content is irrelevant. The bit rate in television is therefore determined entirely by the system's hardware parameters, such as image size, resolution and scanning rates. The images shown on the screen are irrelevant such that a random noise image requires the same bit rate as a blank image.
Improving television with larger screens and better resolution requires a huge increase in transmission bit rates. The bit rates are, however, limited by the available broadcast spectrum or network connection. The only recourse is lossy image compression, most commonly JPEG, MPEG-2, Wavelets or Fractals. "Lossy" by name and lossy by nature: the more the image is compressed using lossy methods, the worse the image quality.
Autosophy information theory may offer an escape from the trap. Autosophy re-defines “information” as depending only on image content and motion. Hardware parameters such as image size, resolution and scanning rates become virtually irrelevant. Static images produce very low bit rates, while fast action sequences require higher bit rates.
Autosophy transmission is ideally suited to the new packet switching networks, such as Internet TCP/IP, ATM and the future Information Superhighway. Autosophy’s built-in encryption capabilities can also ensure the security of communications even via public networks. Self-learning multimedia databases and robot vision systems are areas of further potential.
Bit rates in digital television: Conventional vs. Autosophy
Bit rates and communication protocols in conventional digital television are determined entirely by system hardware, such as image size, resolution and scanning rates. Images are formed by "pixels" in ordered rows and columns, where each pixel must be constantly re-scanned and re-transmitted. According to the CCIR-601 industry standard, digital television comparable to analog NTSC television contains 720 columns by 486 lines. Each pixel is represented by 2 bytes (5 bits per color = 32 brightness shades), scanned at 29.97 frames per second. That requires a bit rate of about 168 Mb/s, or about 21 megabytes per second. A normal CD-ROM can store only about 30 seconds of such television. The bit rate is unaffected by whatever images are shown on the screen. Because the bit rate is constant, transmission is best suited to fixed-bandwidth channels, such as the 6.75 MHz analog channel in commercial NTSC television. Transmitting the images via packet switching networks faces severe difficulties, including a need for huge compression ratios. But the more the images are compressed, the worse the image quality.
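As a quick check, the 168 Mb/s and 21 megabytes-per-second figures follow directly from the hardware parameters quoted above; a minimal sketch:

```python
# Raw bit rate of CCIR-601-style digital television, using only the figures above.
columns, lines = 720, 486            # pixels per frame
bytes_per_pixel = 2                  # 2 bytes per pixel, as stated above
frames_per_second = 29.97            # NTSC frame rate

bits_per_second = columns * lines * bytes_per_pixel * 8 * frames_per_second
print(f"{bits_per_second / 1e6:.0f} Mb/s")        # ~168 Mb/s
print(f"{bits_per_second / 8 / 1e6:.0f} MB/s")    # ~21 megabytes per second
print(f"650 MB CD-ROM holds ~{650 / (bits_per_second / 8 / 1e6):.0f} s of video")
```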
Every increase in screen size, resolution or frame rates makes the problem worse and requires ever-greater compression ratios. Hence the rather poor image quality of so-called High Definition Television (HDTV) and especially of Internet teleconferencing and streaming video.
Required compression ratios for packet television via commercial channels
Channel | Bit rate | NTSC TV (168 Mb/s) | HDTV (933 Mb/s) | Film quality (2300 Mb/s)
PC local LAN | 30 kb/s | 5,600:1 | 31,000:1 | 76,000:1
Modems | 56 kb/s | 3,000:1 | 17,000:1 | 41,000:1
ISDN | 64 - 144 kb/s | 1,166:1 | 6,400:1 | 16,000:1
T-1, DSL | 1.5 Mb/s | 112:1 | 622:1 | 1,500:1
Ethernet | 10 Mb/s | 17:1 | 93:1 | 230:1
T-3 | 42 Mb/s | 4:1 | 22:1 | 54:1
Fiber optic | 200 Mb/s | 1:1 | 5:1 | 11:1
The table above shows some discouraging facts about the transmission of digital images via commercial networks. With the exception of costly fiber optic cables, each channel requires enormous compression ratios which cannot be achieved with acceptable image quality using conventional compression methods. Internet video is currently of very poor quality with extremely jerky motion. Even the HDTV standard approved by the Federal Communications Commission (FCC) produces rather blurred images with a jumpy flickering motion that is almost dizzying.
In Autosophy television, in contrast, the bit rate depends only on motion within the images. Screen size, resolution and scanning rates are virtually irrelevant. Motion is defined in increments of 1024 pixels/sec (kp/s) which, in normal television, is approximately one square inch of changed screen per second. Change may be distributed throughout any part of the screen image. Motion is usually generated by large moving objects, which produce change along both their leading and trailing edges. High motion values are also generated by rapid panning of the camera. The human eye can perceive very fine color resolution, but only within static images; rapid motion reduces the perception of fine detail. It can perceive either fine color resolution or rapid movement, but not both at the same time. The "true information bandwidth" of the human eye can thus be defined by the Autosophy information theory.
Autosophy television channels for various average motion rates within the images
Channel | Bit rate | Very slow (2 kp/s = 12 kb/s) | Slow (4 kp/s = 24 kb/s) | Normal (8 kp/s = 48 kb/s) | Fast (16 kp/s = 96 kb/s)
PC local LAN | 30 kb/s | 2.5 | 1 | - | -
Modems | 56 kb/s | 4 | 2 | 1 | -
ISDN | 64 - 144 kb/s | 12 | 6 | 3 | 1
T-1, DSL | 1.5 Mb/s | 125 | 62 | 31 | 15
Ethernet | 10 Mb/s | 833 | 416 | 208 | 104
T-3 | 42 Mb/s | 3500 | 1750 | 875 | 437
Fiber optic | 200 Mb/s | 16,666 | 8,333 | 4,166 | 2,083
Assuming a very large television screen with 2048 by 2048 (2k by 2k) pixels and 7-bit resolution per color, each kp/s (1024 pixels changed per second) would generate a bit rate of about 6 kb/s (6000 bits per second). Motion within the images is usually not continuous; periods of slow motion are interspersed with periods of rapid motion. The figures above are for an average motion integrated over time. They would allow a PC-to-PC teleconferencing session via a normal PC local LAN, but only with slow motion within the images. T-1, Ethernet and fiber optic connections could carry hundreds or thousands of simultaneous teleconferencing sessions. The bit rate for each television transmission may be expressly limited by motion feedback, as explained later. The same methods apply to improvements in memory storage capacities, allowing the storage of a full-length motion picture on a credit-card-sized CAROM module.
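A minimal sketch that approximately reproduces the channel counts in the table above (before rounding), using the 6 kb/s-per-kp/s figure from the preceding paragraph:

```python
# Simultaneous Autosophy television channels per connection, assuming ~6 kb/s of
# transmission per kp/s (1024 changed pixels per second) of average motion.
BITS_PER_KPS = 6_000

channels = {"PC local LAN": 30e3, "Modems": 56e3, "ISDN": 144e3,
            "T-1, DSL": 1.5e6, "Ethernet": 10e6, "T-3": 42e6, "Fiber optic": 200e6}
motions = {"very slow": 2, "slow": 4, "normal": 8, "fast": 16}   # kp/s

for name, bit_rate in channels.items():
    counts = {label: round(bit_rate / (kps * BITS_PER_KPS), 1)
              for label, kps in motions.items()}
    print(f"{name:13s}", counts)   # e.g. T-1 carries ~125 very-slow or ~15 fast sessions
```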
In addition to orders-of-magnitude image compression, Autosophy methods have other important advantages.
The transmission of conventional analog television is best accomplished using fixed-bandwidth channels such as the 6.75 MHz NTSC channels. Transmitting such television via the new packet switching networks (such as ATM or Internet TCP/IP) is very difficult and requires a rigidly defined Quality of Service (QoS). Autosophy television, in contrast, is ideal for the new packet switching networks because slowly moving images produce a slow packet rate, while rapidly moving images increase the packet rate. The network can then be shared by many users, each producing packet bursts only when motion occurs in their images. Autosophy television is also much less sensitive to transmission errors or packets being dropped in a congested network.
Because the bit rates in conventional television are determined by the hardware, each advancement in technology towards larger screens and better cameras requires a new transmission standard which may not be compatible with previous standards. For Autosophy television, in contrast, a hardware independent communication protocol could be developed. This would allow evolution towards larger and better screens without any change to communication protocols. Television cameras and monitors could have different screen sizes, resolutions and scanning rates; yet communicate in a universal protocol which would always remain backwards compatible.
Autosophy television's built-in encryption option allows secure teleconferencing via the Internet or satellite without any possibility of unauthorized interception. It would largely solve the security problems associated with the Internet today.
Cosine transform compression in the JPEG, MPEG-2 standards
Cosine transforms are used in JPEG compression for still images, MPEG-2 compression for moving video, and the FCC-standard for HDTV. All use variations on the basic methods, explained below.
The basic idea was conceived by the French mathematician Joseph Fourier, after whom the Fourier transform is named. Fourier discovered that any repeating signal, such as a vibration or sound wave, can be converted from samples into a set of frequency values, where each higher frequency is a whole multiple of the base frequency. The Fourier transform is implemented, for example, in test instruments and oscilloscopes to analyze vibrations and noisy transmission signals.
The cosine transform uses a similar method to convert an image pixel pattern into a set of spatial frequency values. Instead of changing brightness in time, spatial transforms change brightness within an image area. The frequency values can be imagined like image brightness waves that change from light to dark in sine wave fashion. Low frequency values change brightness slowly, while high frequency values change more rapidly. Low frequency values are found in flat, slowly-changing image backgrounds. Higher frequency values add sharp edges and crispness to the images. In short, the theory predicts that any pattern of pixel brightness samples in a television image can be converted into a pattern of spatial frequency values. The frequency values can later be used in a “reverse transform” to reproduce the original pixel brightness samples.
The input image is first cut into 8 by 8 pixel tiles where each color (red-green-blue) is represented by a separate tile. Each tile is then sequentially processed by a computer using the cosine transform algorithm.
Using a very complex algorithm, the 64-pixel tile is then converted into 64 frequency values. This transformation requires enormous computing power; converting HDTV images in real time is very difficult with today's hardware. A DC value represents the overall background brightness of the tile. Because each tile is processed separately from its neighbors, even slight errors in the computed DC value can produce checkerboard-pattern distortion in the compressed images. The transform is theoretically lossless and as such should not distort the images; in real electronic systems, however, computation is imperfect and produces only approximate values. Starting from the DC value, the other 63 frequency values are scanned out in a zigzag pattern, proceeding from the lowest frequency values to higher and higher ones.
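For illustration, a generic 8-by-8 DCT-II and zigzag scan of the kind described above can be sketched as follows (a textbook implementation, not the code of any particular standard):

```python
import numpy as np

N = 8

def dct_2d(tile):
    """Forward 8x8 DCT-II: converts 64 pixel samples into 64 spatial frequency values."""
    k = np.arange(N)
    # Orthonormal basis: C[u, x] = alpha(u) * cos((2x + 1) * u * pi / 16)
    C = np.sqrt(2.0 / N) * np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)                 # alpha(0) row: the DC basis vector
    return C @ tile @ C.T                      # coefficient [0, 0] is the DC value

def zigzag(coeffs):
    """Scan the 8x8 coefficients from low to high spatial frequency, DC first."""
    order = sorted(((u, v) for u in range(N) for v in range(N)),
                   key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 else p[0]))
    return [coeffs[u, v] for u, v in order]

tile = np.random.randint(0, 256, (N, N)).astype(float)   # one 8x8 pixel tile, one color
frequencies = zigzag(dct_2d(tile))                        # 64 frequency values
```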
Up to this point there is no image compression. The 64 frequency values, in fact, require many more bits than the original 64 pixel samples, and distortions so far arise only from flaws in the computation process. The image compression step now selectively removes the information deemed least important to the human eye. A quantization threshold is applied, chosen according to the desired compression ratio: all frequency values smaller than the threshold are cleared to zero. The higher the threshold, the higher the compression, but also the lower the image quality. Erasing the smaller frequency values removes detail resolution and introduces image artifacts, so the result is an image tile that is only an approximation of the tile originally seen by the camera. The introduced artifacts include light or dark lines resembling cracked paint in old paintings. Such distortions and artifacts are obviously not acceptable for scientific or medical imaging.
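The quantization step itself is simple; real codecs use a full quantization matrix rather than a single threshold, but the sketch below captures the idea:

```python
def quantize(frequency_values, threshold):
    """Clear every frequency value whose magnitude falls below the threshold.
    A higher threshold means more zeroes, more compression and lower image quality."""
    return [v if abs(v) >= threshold else 0.0 for v in frequency_values]

quantize([310.0, 42.1, -3.4, 18.9, -0.7, 55.0], threshold=20.0)
# -> [310.0, 42.1, 0.0, 0.0, 0.0, 55.0]
```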
The 64 processed frequency values are then encoded for transmission. A run-length encoding scheme counts the zeroes in each string of zero values and represents the run as a single number. More information on the Huffman coding scheme can be found in the data compression tutorial; basically, it compresses the data by assigning codes with fewer bits to the most frequently encountered output patterns. The final output is a code for each tile containing a variable number of bits.
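A minimal run-length pass over such a value string might look like this (Huffman coding of the resulting pairs is left to the tutorial referenced above):

```python
def run_length_encode(values):
    """Replace each run of zeroes with a (zero_run_length, next_value) pair."""
    pairs, zeros = [], 0
    for v in values:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    pairs.append((zeros, None))          # end-of-block marker with trailing zero count
    return pairs

run_length_encode([310.0, 42.1, 0.0, 0.0, 0.0, 55.0, 0.0, 0.0])
# -> [(0, 310.0), (0, 42.1), (3, 55.0), (2, None)]
```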
Because of run-length and Huffman coding such transmissions are highly sensitive to error propagation. Even a single bit error in the transmission can cause the image to break up into random noise until an error recovery code is detected. This produces very disturbing visual effects in noisy transmissions.
The receiver reconstructs the images in reverse of the above process.
Wavelet image compression
Wavelet compression uses bandpass filters to separate an image into images with low or high spatial frequencies. Low frequency images are those in which brightness change is gradual, for example, flat or rounded background areas. Such images appear soft and blurry. Higher frequency band images are crisp and sharp edged. Adding the frequency band images back together should reconstruct the original input image; perfectly if the processing is perfect.
A pixel data stream from an input image is divided into several sub-bands by a tree of bandpass filters. Each filter allows only a specific band of frequencies to pass. The filters may be analog or digital, but since neither kind is perfect some image distortion can be expected even at this stage.
The process takes several steps backward before taking a step forward. We began with a single input image and now have several images, each of which requires a full measure of bits. However, since low frequency images change brightness more slowly they can be sampled at a slower rate. The sampling rate is so adjusted that the highest frequency image takes half of all samples, while each lower frequency band is sampled at a progressively halved speed. The lowest frequency image is sampled at the lowest rate. In the end, the sum of the samplings from all the frequency bands is exactly the same as the single sampling of the original input image. No image compression has yet been realized. Even so, some distortion will have crept in due to imperfections in the sampling process.
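A one-level Haar filter bank illustrates the subband split and the halved sampling rates described above; production wavelet codecs use longer filters and several decomposition levels, so this is only a sketch:

```python
import numpy as np

def haar_split(signal):
    """Split a 1-D pixel row into a low-frequency band (averages) and a
    high-frequency band (differences); each band has half the samples."""
    even, odd = signal[0::2], signal[1::2]
    low = (even + odd) / np.sqrt(2)     # soft, blurry approximation band
    high = (even - odd) / np.sqrt(2)    # sharp edge / detail band
    return low, high

def haar_merge(low, high):
    """Inverse transform: reconstructs the input exactly if nothing was discarded."""
    even = (low + high) / np.sqrt(2)
    odd = (low - high) / np.sqrt(2)
    out = np.empty(low.size * 2)
    out[0::2], out[1::2] = even, odd
    return out

row = np.random.randint(0, 256, 512).astype(float)
low, high = haar_split(row)
assert np.allclose(haar_merge(low, high), row)   # lossless until coefficients are cleared
```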
Lossy image compression is applied using a quantization threshold. Samples below the threshold are cleared to zero. The higher the threshold, the more samples cleared and the higher the compression ratio. Equally, though, the more samples cleared, the greater the image distortion and the lower the image quality. Output images are therefore only approximations of the images seen by the camera.
Output samples are further processed using Run Length coding (replacing a string of zeroes with a single number) and Huffman coding (assigning shorter bit codes to more frequent patterns). The output codes are then combined in the output data stream. Such transmissions are subject to error propagation. Even a single bit error can cause the image to break up into random noise.
Wavelet compression is lossy. It will always compromise image quality to some extent. The more images are compressed, the worse the image quality. Commercially useful compression ratios can only be achieved with significant distortion. The pattern of distortion will, of course, differ from the “checkerboard” pattern arising from the cosine transforms. But whether the overall image quality is better or worse depends on the application and individual judgment. Certainly both methods require enormous computing resources and can generally only achieve low levels of compression with acceptable image quality. Higher levels of compression come with progressively greater distortion.
Fractal image compression and forging techniques
Forging techniques, such as fractal compression, generate images that look approximately like the originals. The human eye can be fooled into disregarding the differences in some cases. Typical demonstrations use flat images or chaotic ones such as bird plumage.
An example can be found in graphics software packages, such as MS PowerPoint. These generate large geometric shapes from simple vector equations. The geometric objects are then filled with a pattern or color. Images generated in this way can be reduced or enlarged without changing shapes, filling patterns, or color densities. Such images are said to be resolution and size invariant in that a large image contains the same information as a small image.
A closer simulation of reality is attempted with Mandelbrot fractal equations. Simple shapes are combined to form larger and larger shapes. The larger shapes are identical to the smaller ones. Higher magnification will reveal only smaller and smaller shapes that are identical to the original shape. A good example is a mountain landscape of peaks and valleys. Higher magnification reveals smaller and smaller peaks and valleys which look like the original landscape.
For fractal image compression a “reverse Mandelbrot” procedure is used. It matches an image tile to a Mandelbrot equation that approximately simulates its pattern. The equation can then be transmitted and will produce an output tile that looks approximately like the input tile. Higher and higher magnification would reveal smaller and smaller shapes identical to the larger ones.
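For comparison, the fractal coders usually described in the literature (Jacquin-style partitioned iterated function systems) match each small "range" block against larger, contractively transformed "domain" blocks from the same image rather than against Mandelbrot equations; a rough sketch of that matching step, for illustration only:

```python
import numpy as np

def best_domain_match(range_block, image, domain_size=8):
    """Exhaustively search for the 8x8 'domain' block whose shrunken, brightness- and
    contrast-adjusted copy best approximates the given 4x4 'range' block."""
    best = None
    h, w = image.shape
    for y in range(0, h - domain_size + 1, domain_size):
        for x in range(0, w - domain_size + 1, domain_size):
            d = image[y:y + domain_size, x:x + domain_size]
            d = d.reshape(4, 2, 4, 2).mean(axis=(1, 3))   # shrink 8x8 -> 4x4
            # Least-squares contrast (s) and brightness (o) so that s*d + o ~ range_block
            var = d.var()
            s = ((d - d.mean()) * (range_block - range_block.mean())).mean() / var if var else 0.0
            o = range_block.mean() - s * d.mean()
            err = ((s * d + o - range_block) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, (y, x, s, o))   # these four numbers are all that gets stored
    return best[1]

image = np.random.rand(64, 64)                 # stand-in for one image plane
code = best_domain_match(image[:4, :4], image)
```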
Such pattern substitution requires enormous computation and is very difficult to achieve in real time. Anyway, the bottom line is that the output images are mere approximations of the input images. Such forgeries can fool some of the people some of the time and may be suitable for games and other entertainment purposes. But they can have no place in scientific or medical imaging.
Conclusion
Hardware-defined television systems require excessive bit rates which cannot be accommodated by packet switching networks. Mathematical procedures, no matter how complex, cannot circumvent the basic truth that any attempt at image compression must be paid for with deterioration of image quality. The Federal Communications Commission (FCC) tried to repeal that basic law when it imposed its High Definition Television standard. It remains to be seen whether the resulting quality and price will be acceptable to consumers, whose standards are rather higher than those of bureaucrats.
Lossless Autosophy still image compression
Autosophy image compression is different. It uses an approach based on Autosophy information theory in which bit rate is determined not by hardware factors but by image content. Essentially, simple images are highly compressible, complex images less compressible and random noise images not compressible at all. Being radically based on image content, Autosophy compression is entirely lossless. Images are not distorted.
The degree of compression is also influenced by the quality of the camera. Cheap cameras and noisy images are not as suitable as higher quality cameras with low-noise output. Noise level may be reduced by filters as long as care is taken not to remove any essential information.
The system above is suitable for transmitting high-resolution still images of any size or format via the Internet. The resolution is 7 bits per color, or 128 shades for each color, giving better than 1% accuracy in reproduction. That is the limit of human perception and the maximum resolution of commercial color monitors and printers. The output is in common 8 or 16 bit codes, easy to handle in storage and transmission. Compression ratios depend on the complexity and noise in the images. Even random images would not produce any data expansion, while average lossless compression ratios would be about 5:1. Errors in the transmission may cause error propagation, but that should not be a problem in modern Internet communications because data packets contain error-checking codes. Packets with errors are automatically re-transmitted until valid data is received.
First the image is divided into 5 by 5 pixel tiles, with a center pixel address computed for each tile according to the image format. Each 5 by 5 pixel tile is converted into a 25-pixel string by spiral scanning outward from the tile's center pixel address. A hyperspace library contains up to 30k nodes of 22 bits each, where each node consists of a 7-bit pixel brightness value (GATE) and a 15-bit POINTER. The library can contain many thousands of the most common image patterns, stored in a saturating hyperspace mode. (The serial tree library is further explained in the data compression tutorial.) Each of the first 128 library locations contains a GATE equal to the 7 least significant address bits and a POINTER of all zero bits. The last 2k addresses are reserved for special communication codes (such as error-checking or image-format codes) embedded in the output data stream.
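The GATE/POINTER library described above behaves much like a dictionary tree that grows pixel strings one element at a time. The sketch below is only one interpretation of that description; the node layout, the learning policy and the encode loop are assumptions, not the actual Autosophy algorithm:

```python
# Hypothetical sketch of a GATE/POINTER tree library for 25-pixel spiral strings.
# Each node = (POINTER to a previous node, GATE = 7-bit pixel value); nodes are looked
# up by content, so a whole known pixel string collapses into a single node address.

library = {}                       # (pointer, gate) -> node address
ROOT = -1                          # "no previous node" (the all-zero POINTER above)
for value in range(128):           # first 128 locations: single-pixel roots
    library[(ROOT, value)] = value
next_free = 128                    # library saturates; last 2k of 32k addresses reserved

def encode_tile(spiral_pixels):
    """Encode one 25-pixel spiral-scanned tile; returns the node addresses transmitted."""
    global next_free
    output, pointer = [], ROOT
    for pixel in spiral_pixels:
        gate = pixel & 0x7F                       # 7-bit brightness value
        node = library.get((pointer, gate))
        if node is not None:
            pointer = node                        # pattern already known, keep growing it
            continue
        output.append(pointer)                    # transmit the longest known pattern
        if next_free < 30 * 1024:                 # learn the new, longer pattern
            library[(pointer, gate)] = next_free
            next_free += 1
        pointer = library[(ROOT, gate)]           # restart from the single-pixel root
    output.append(pointer)
    return output
```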
STILL IMAGE ENCODING ALGORITHM
Autosophy live video compression
Bit rates are dramatically reduced in Autosophy television because they depend not on system hardware but on the motion and complexity of the images shown on the screen. Autosophy video compression is especially suited to packet switching networks such as Internet TCP/IP or ATM.
According to Autosophy information theory, a communication need only transmit that which is not already known to the receiver. Everything already known is redundant and need not be constantly re-transmitted. Autosophy television therefore requires an Image Buffer in both the transmitter and receiver, which contains the entire current image. Images scanned from the television camera are compared with the current image in the Image Buffer to locate pixels that have changed brightness. The screen addresses of the changed pixels are accumulated in a Change Buffer. The new pixel brightness from the camera replaces the previous pixel brightness in the Image Buffer. Every pixel that has not changed is ignored. The changed pixel addresses in the Change Buffer are then combined, using a hyperspace library, into “superpixel” or cluster codes for transmission. The superpixel codes are used by the receiver to selectively update small clusters of pixels in its own Image Buffer. The image in the Image Buffer is periodically scanned to the output monitor.
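The buffer arithmetic described above can be sketched roughly as follows (a single-color simplification; grouping the changed addresses into superpixel codes via the hyperspace library is omitted):

```python
import numpy as np

def encode_frame(camera_frame, image_buffer, threshold=4):
    """Compare the new camera frame with the Image Buffer, collect the addresses of
    changed pixels (the Change Buffer) and update the Image Buffer in place."""
    changed = np.abs(camera_frame.astype(int) - image_buffer.astype(int)) > threshold
    change_addresses = np.argwhere(changed)            # contents of the Change Buffer
    new_values = camera_frame[changed]
    image_buffer[changed] = new_values                  # unchanged pixels are simply ignored
    return change_addresses, new_values                 # grouped into superpixel codes next

def decode_updates(change_addresses, new_values, image_buffer):
    """Receiver side: selectively update only the changed pixels in its own Image Buffer."""
    for (row, col), value in zip(change_addresses, new_values):
        image_buffer[row, col] = value

buf = np.zeros((480, 640), dtype=np.uint8)             # transmitter-side Image Buffer
frame = buf.copy(); frame[100:110, 200:210] = 255       # a small bright object appears
addresses, values = encode_frame(frame, buf)            # 100 changed pixels; the rest ignored
```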
Superpixel or cluster codes are transmitted only when change or motion occurs in the input images. If the images change slowly then only a few superpixel codes are transmitted. Fast-moving action sequences generate many more superpixel transmissions. Random noise images generate excessive transmissions unless motion feedback (as explained later) is used.
Assuming an HDTV-like image with up to 2k by 2k pixels and 7 bits per color resolution, then each superpixel or cluster code would contain 70 bits. Each superpixel code may describe change of between 2 and 25 pixels in each of the three colors (red-green-blue). Image scanning rates are irrelevant.
Autosophy information theory shows that the human eye has a limited “true information bandwidth.” It can perceive very fine color resolution only in static images; rapid movement reduces color sensitivity. In other words, it can perceive fine color resolution or rapid motion but not both at the same time. This can be exploited in a motion feedback circuit. The Change Buffer contains the number of pixels that have changed brightness in previous frames and is therefore a measure of motion. The number of changed pixel addresses in the buffer is used as feedback to the pixel brightness comparator. The brightness comparator applies a discrimination threshold. Any pixel brightness change below the threshold is ignored.
The more motion in the images, the higher the discrimination threshold. For slow-moving images the threshold is very low, filtering out only random noise from the camera. More rapid motion dynamically increases the threshold, so that even totally random images are cut to an acceptable bit rate. The packet rate is thereby limited even for very high motion or random noise images, without causing any distortion visible to the human eye. Only the most rapidly moving objects in the images will have temporarily reduced color resolution. Static portions of the images are not affected, and once the extremely rapid motion subsides, full color resolution is restored.
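A minimal version of that feedback loop might look like the sketch below; the scaling constants are placeholders, not values from the source:

```python
def adaptive_threshold(changed_pixel_count, base=2, gain=1 / 50_000):
    """Raise the brightness-change threshold as the motion measured in the previous
    frame (the number of entries in the Change Buffer) grows. Slow scenes keep a low
    threshold that filters only camera noise; fast or noisy scenes raise it, capping
    the packet rate without visibly affecting static portions of the image."""
    return base + gain * changed_pixel_count

adaptive_threshold(1_000)       # quiet scene  -> threshold stays near the noise floor
adaptive_threshold(500_000)     # heavy motion -> threshold rises, packet rate is capped
```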
BIT RATE ESTIMATES
An Autosophy television system can be built with ordinary hardware. Any large memory chips will do for the Image Buffers and Change Buffers. For real-time conversion, however, the transmitter requires a CAM (Content Addressable Memory). Commercially available CAMs are acceptable; even better would be the Autosophy-native CAROM. For real-time playback the receiver requires only a normal Read Only Memory, which can be mass produced as a chip containing the hyperspace library. The rest of the hardware consists of run-of-the-mill integrated logic circuits. Note that there is no need for a microprocessor or program storage. The output packet codes can be sent directly to the receiver or stored on CD-ROM or DVD for later playback. A complete television encoder/receiver may eventually be contained in integrated chipsets.
Comparing the features of conventional and Autosophy television.
Image standards and compatibility problems