Popular Searches

download information for Search Optimization  Search Optimization
download information for Search Engine  Search Engine
download information for Adsense  Adsense
download information for RSS  RSS
download information for Blog  Blog
download information for Compression  Compression
download information for Audio  Audio
download information for Video  Video
download information for XML  XML
download information for Screensaver  Screensaver
download information for CSS  CSS
download information for Backup  Backup
download information for Software  Software
download information for Spyware  Spyware



Tags

source coding redirects computer science theory process encoding bearing unencoded through specific schemes example article encoded accept convention



Web Matches



A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

Search Articles e.g. mp4
 

Data compression
"Source coding" redirects here. For the term in computer programming, see Source code.

In computer science and information theory, data compression or source coding is the process of encoding information using fewer bits (or other information-resembleing units) than an unencoded representation would use through use of specific encoding schemes. For example, this article could be encoded with fewer bits if one were to accept the convention that the word "compression" be encoded as "comp." One popular instance of compression with which many computer users are familiar is the ZIP file format, which, as well as providing compression, acts as an archiver, storing many files in a single output file.

As with any Not communication, compressed data communication only works when both the sender and receiver of the information understand the encoding scheme. For example, this text makes sense only if the receiver understands that it is intended to be interpreted as characters representing the English language. Similarly, compressed data can only be understood if the decoding method is known by the receiver.

Compression is useful because it helps reduce the consumption of expensive resources, such as hard disk space or transmission bandwidth. On the downside, compressed data must be decompressed to be used, and this extra processing may be detrimental to some applications. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it's being decompressed (the option of decompressing the video in full before watching it may be inconvenient, and requires storage space for the decompressed video). The design of data compression schemes therefore involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (if using a lossy compression scheme), and the computational resources required to compress and uncompress the data.

Lossless vs. Lossy Compression

Lossless compression algorithms typically exploit statistical redundancy in such a way as to represent the sender's data more concisely with fewer errors. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be proceeded by the letter 'z' is very small.

A further kind of compression, called lossy data compression or perceptual coding, is possible if some loss of fidelity is acceptable. Generally, a lossy data compression will be guided by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG image compression works in part by "rounding off" some of this less-important information. Lossy data compression provides a way to obtain the best fidelity for a given amount of compression. In some cases, transparent (unnoticeable) compression is desired; in other cases, fidelity is sacrificed to reduce the amount of data as much as possible.

Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.

However, lossless data compression algorithms will always fail to compress some files; indeed, any compression algorithm will necessarily fail to compress any data containing no discernible patterns. Attempts to compress data that has been compressed already will therefore typically result in an expansion, as will attempts to compress encrypted data.

In practice, lossy data compression will also come to a point where compressing again does not work, although an extremely lossy algorithm, which for example always removes the last byte of a file, will always compress a file up to the point where it is empty.

An example of lossless vs. lossy compression is the following string:25.888888888

This string can be compressed as:25.8

Interpreted as, "twenty five point 9 eights", the original string is perfectly recreated, just written in a smaller form. In a lossy system, using26

instead, the original data is lost, at the benefit of a smaller file size.

Applications

The above is a very simple example of run-length encoding, wherein large runs of consecutive indistinguishable data values are replaced by a simple code with the data value and length of the run. This is an example of lossless data compression. It is often used to optimize disk space on office computers, or better use the connection bandwidth in a computer network. For symbolic data such as spreadsheets, text, executable programs, etc., losslessness is essential because changing even a single bit can't be tolerated (except in some limited cases).

For visual and audio data, some loss of quality can be tolerated without losing the essential nature of the data. By taking advantage of the boundaries of the human sensory system, a great deal of space can be saved while producing an output which is nearly indistinguishable from the original. These lossy data compression methods typically offer a three-way tradeoff between compression speed, compressed data size and quality loss.

Lossy image compression is used in digital cameras, greatly increasing their storage capacities while hardly degrading picture quality at all. Similarly, DVDs use the lossy MPEG-2 codec for video compression.

In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less audible) components of the signal. Compression of human speech is often performed with even more specialized techniques, so that "speech compression" or "voice coding" is sometimes distinguished as a separate discipline than "audio compression". Different audio and speech compression standards are listed under audio codecs. Voice compression is used in Internet telephony for example, while audio compression is used for CD ripping and is decoded by audio players.

Theory

The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) and by rate-distortion theory. These fields of study were essentially created by Claude Shannon, who published fundamental papers on the topic in the late 1940s and early 1950s. Cryptography and coding theory are also closely related. The idea of data compression is deeply connected with statistical inference.

Many lossless data compression systems can be viewed in terms of a four-stage model. Lossy data compression systems typically include even more stages, including, for example, prediction, frequency transformation, and quantization.

The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ which is optimized for decompression speed and compression ratio, although compression can be slow. DEFLATE is used in PKZIP, gzip and PNG. LZW (Lempel-Ziv-Welch) is used in GIF images. Also noteworthy are the LZR (LZ-Renau) methods, which serve as the basis of the Zip method. LZ methods utilize a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX). A current LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.

The very best compressors use probabilistic models whose predictions are coupled to an algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen, and turned into a practical method by Witten, Neal, and Cleary, achieves superior compression to the better-known Huffman algorithm, and lends itself particularly well to adaptive data compression tasks where the predictions are strongly context-dependent. Arithmetic coding is used in the bilevel image-compression standard JBIG, and the document-compression standard DjVu. The text entry system, Dasher, is an inverse-arithmetic-coder.

Matt Mahoney, one of the 3 founders of the Hutter Prize, claims that "Compression is Equivalent to General Intelligence" .

Comparative

Independent comparison of different methods of data compression (Results & Softwares, in French. Airelle, 2007). Numbers in parenthesis are the rank of the method of compression for the category of file specified above.

Text files such as HTML (*.htm) and plain text (.txt) files can be hard compressed.

Some files are already compressed (e.g.mp3 or .zip), so the compression rate of such files is poor. Due to the addition of header data, often there are diminishing returns in such compression, causing the file to actually be slightly larger upon storage.

To be more representative of the performance, the global score (/20) is calculated with a non-parametric formula after the sum of the ranks (1 to 20) for each of the 20 tested methods.

Comparison of different methods of data compression
Files

*.avi

*.dll

*.doc

*.exe

*.gif

*.htm

*.jpg

*.mp3

*.mpg

*.pdf

*.txt

*.wav

*.zip

Notation

TOTAL

Numberoffiles

16

26

138

24

246

79

44

29

8

36

8

1

19

674

Initialsize

5,261,152

5,254,220

5,254,656

5,254,056

5,246,209

5,261,187

5,246,116

5,250,432

5,257,720

5,257,876

5,253,436

5,256,024

5,262,680

68,315,764

7z

4,524,067 (2)

1,543,179 (3)

147,690 (3)

3,910,541 (3)

4 620 354 (1)

341,996 (4)

4,770,061 (4)

5,053,813 (2)

4,879,067 (5)

4,258,863 (3)

1,270,884 (3)

3,670,225 (5)

5,226,742 (14)

16/20

44,217,482

arj

4,696,659 (9)

2,160,530 (15)

1,018,050 (17)

4,130,505 (11)

4,702,449 (12)

898,370 (17)

4,803,740 (11)

5,108,093 (17)

4,910,699 (16)

4,606,736 (15)

1,875,329 (16)

4,450,535 (12)

5,223,905 (13)

6.1/20

48,585,600

bh

4,703,291 (12)

2,156,986 (12)

1,010,284 (15)

4,128,594 (9)

4,693,021 (9)

889,650 (15)

4,806,914 (13)

5,105,811 (13)

4,904,209 (11)

4,601,545 (13)

1,848,972 (13)

4,451,648 (15)

5,201,639 (4)

7.5/20

48,502,564

bz2

4,720,926 (18)

2,095,832 (7)

573,721 (5)

4,273,885 (18)

4,896,084 (18)

645,243 (5)

4,743,918 (2)

5,069,593 (4)

4,888,293 (7)

4,444,829 (5)

1,531,448 (6)

3,771,508 (7)

5,238,677 (16)

11.7/20

46,893,957

bza

4,639,340 (6)

2,166,940 (17)

987,806 (11)

4,231,254 (17)

4,878,327 (17)

783,188 (8)

4,787,973 (7)

5,076,189 (5)

4,873,810 (2)

4,618,970 (17)

1,516,326 (5)

3,770,938 (6)

5,227,572 (15)

9.8/20

47,558,633

cab

4,701,113 (11)

2,148,386 (10)

893,796 (7)

4,127,044 (8)

4,678,810 (5)

842,129 (10)

4,798,500 (8)

5,099,787 (8)

4,900,314 (10)

4,584,969 (8)

1,846,233 (12)

4,451,857 (18)

5,201,717 (5)

10.8/20

48,274,655

gza

4,703,371 (13)

2,157,116 (13)

1,001,990 (13)

4,126,436 (7)

4,693,136 (10)

874,444 (12)

4,803,739 (10)

5,105,765 (12)

4,904,249 (12)

4,597,720 (11)

1,840,188 (11)

4,451,638 (14)

5,201,436 (3)

9.2/20

48,461,228

j

4,678,506 (8)

1,914,777 (5)

703,722 (6)

4,057,445 (5)

4,681,437 (6)

691,916 (6)

4,805,059 (12)

5,092,070 (7)

4,898,847 (8)

4,326,394 (4)

1,629,228 (8)

3,594,954 (4)

5,215,150 (12)

13/20

46,289,505

jar

4,704,088 (14)

2,158,273 (14)

1,017,205 (16)

4,129,816 (10)

4,705,456 (13)

893,622 (16)

4,809,136 (16)

5,107,254 (15)

4,904,615 (13)

4,603,367 (14)

1,849,394 (14)

4,451,718 (16)

5,202,611 (8)

6.2/20

48,536,555

lha

4,711,090 (16)

2,215,476 (18)

1,020,194 (18)

4,204,071 (15)

4,830,501 (15)

913,845 (18)

4,918,792 (19)

5,206,933 (19)

5,066,716 (19)

4,802,049 (19)

1,895,771 (17)

4,447,253 (10)

5,263,136 (18)

6.7/20

49,495,827

lzh

4,711,090 (16)

2,215,476 (18)

1,066,340 (19)

4,143,461 (14)

4,819,157 (14)

971,166 (19)

4,816,349 (18)

5,107,584 (16)

4,924,974 (18)

4,635,416 (18)

1,945,961 (19)

4,449,756 (11)

5,212,837 (11)

5.3/20

49,019,567

pkz

4,899,083 (20)

2,354,373 (20)

1,173,097 (20)

4,401,289 (20)

5,120,590 (19)

1,018,250 (20)

5,162,114 (20)

5,253,006 (20)

5,203,747 (20)

5,076,577 (20)

2,084,290 (20)

5,027,854 (20)

5,264,213 (19)

0.2/20

52,038,483

rar

4,634,009 (5)

1,693,150 (4)

173,313 (4)

3,948,241 (4)

4,639,881 (4)

318,269 (3)

4,780,095 (6)

5,081 085 (6)

4,887,973 (6)

4,258,775 (2)

1,318,381 (4)

2,657,731 (3)

5,202,579 (7)

15.5/20

43,593,482

rk

4,589,894 (3)

1,474,339 (2)

132,629 (1)

3,866,814 (1)

4,628,017 (3)

257,588 (1)

4,434,701 (1)

5,017,545 (1)

4,787,286 (1)

4,498,992 (6)

1,168,720 (1)

1,659,771 (1)

5,183,337 (1)

18.2/20

41,699,633

rs

4,625,725 (4)

2,137,145 (9)

937,954 (10)

4,221,864 (16)

4,850,493 (16)

768,711 (7)

4,776,635 (5)

5,066,886 (3)

4,878,852 (3)

4,612,537 (16)

1,560,879 (7)

3,804,335 (8)

5,240,116 (17)

10.7/20

47,482,132

sqx

4,662,560 (7)

2,078,866 (6)

991,992 (12)

4,105,933 (6)

4,699,518 (11)

878,469 (14)

4,808,697 (15)

5,102,452 (10)

4,908,341 (14)

4,590,245 (10)

1,836,245 (9)

4,415,575 (9)

5,208,275 (10)

9.8/20

48,287,168

gz

4,707,481 (15)

2,165,409 (16)

907,006 (8)

4,133,949 (12)

4,684,949 (7)

861,638 (11)

4,807,701 (14)

5,105,913 (14)

4,909,789 (15)

4,588,822 (9)

1,853,650 (15)

4,451,792 (17)

5,202,392 (6)

7.8/20

48,380,491

uha

4,498,275 (1)

1,474,005 (1)

136,880 (2)

3,879,360 (2)

4,625,014 (2)

284,363 (2)

4,760,572 (3)

5,104,837 (11)

4,879,047 (4)

4,237,400 (1)

1,233,812 (2)

2,435,124 (2)

5,187,408 (2)

17.3/20

44,736,097

yz1

4,814,935 (19)

2,128,899 (8)

924,706 (9)

4,279,162 (19)

4,686,669 (8)

804,198 (9)

4,810,966 (17)

5,124,596 (18)

4,922,886 (17)

4,568,274 (7)

1,901,300 (18)

4,561,179 (19)

5,207,874 (9)

6.4/20

48,735,644

zip

4,701,064 (10)

2,155,923 (11)

1,009,814 (14)

4,135,619 (13)

5,270,565 (20)

877,679 (13)

4,799,508 (9)

5,101,205 (9)

4,898,961 (9)

4,599,883 (12)

1,839,080 (10)

4,450,719 (13)

5,264,564 (20)

7.5/20

49,104,584

Intermediatecompressedsize

4,701,089

2,152,155

962,880

4,130,160

4,696,327

851,884

4,803,740

5,103,645

4,902,262

4,593,983

1,839,634

4,448,505

5,210,556

48,519,559

Intermediatecompressionrate

10.6 %

59.0 %

81.7 %

21.4 %

10.5 %

83.8 %

8.4 %

2.8 %

6.8 %

12.6 %

65.0 %

15.4 %

1.0 %

29.0 %




Related Ads



Add to Google Add to Yahoo Add to Furl  Add to del.icio.us  Add to DIGG

Resource: Part or all of the information provided in this section is brought to you via wikipedia and other similar sites. Please repsect their licenses and for more information visit the homepages of these sites.

Links
Freeware Downloads Download Information RGB Hex Converter Web Colors
Home  Link to Us
Copyright © iFreeware Downloads 2005-2012
All rights reserved