User:Ryan Cooley/MPEG1: Difference between revisions

imported>Rcooley
(10)
imported>Rcooley
(11)
Line 1: Line 1:
MPEG-1 articles (MPEG-1, MP1, MP2, MP3) on wikipedia are complete crap.  Disorganized, slanted, incomplete, misconstrued, etc.
It's far easier to start from scratch than try to fix all the individual existing ones, and will give far better end results; I will use ''some'' small bits of content from the existing articles.
'''Do not make any changes to this page''' for now.  
'''Do not make any changes to this page''' for now.  
This is my mind-dump and accommodating others before I'm done will just make much, much more work for me. Put any suggestions on the Talk page, and I will eventually address them. -RC
This is my mind-dump and accommodating others before I'm done will just make much, much more work for me. Put any suggestions on the Talk page, and I will eventually address them. -RC


'''MPEG-1''' was an early [[standard]] for [[lossy]] compression of [[video]] and [[audio]].  It was designed to compress raw video and CD audio to 1.5Mb/s without discernible quality loss, making [[Video CD]]s and [[Digital Video Broadcasting]] possible.
'''MPEG-1''' was an early [[standard]] for [[lossy]] compression of [[video]] and [[audio]].  It was designed to compress raw video and CD audio to 1.5Mb/s without discernible quality loss, making [[Video CD]]s and [[Digital Video Broadcasting]] possible.
Line 13: Line 9:


== History ==
== History ==
Modeled on the successful collaborative approach and the compression technologies developed by the [[Joint Photographics Expert Group]] and [[CCITT]]'s [[Experts Group on Telephony]] (creators of the [[JPEG]] image compression standard and the [[H.261]] standard for [[video conferencing]] over [[ISDN]] lines respectively) the [[MPEG]] working group was established in January 1988.  MPEG was formed to address the need for [[standard]] video and audio encoding formats, and build on H.261 to get better quality through the use of more complex (non-[[realtime]]) encoding methods. <ref>http://www.cis.temple.edu/~vasilis/Courses/CIS750/Papers/mpeg_6.pdf pp.2</ref>
Modeled on the successful collaborative approach and the compression technologies developed by the [[Joint Photographics Experts Group]] and [[CCITT]]'s [[Experts Group on Telephony]] (creators of the [[JPEG]] image compression standard and the [[H.261]] standard for [[video conferencing]] over [[ISDN]] lines respectively) the [[MPEG]] working group was established in January 1988.  MPEG was formed to address the need for [[standard]] video and audio encoding formats, and build on H.261 to get better quality through the use of more complex (non-[[realtime]]) encoding methods. <ref>http://www.cis.temple.edu/~vasilis/Courses/CIS750/Papers/mpeg_6.pdf pp.2</ref>


Development of the MPEG-1 standard began in [[May 1988]].  14 video and 14 audio codec proposals were submitted by individual companies and institutions for evaluation.  The codecs were extensively tested for computational complexity and subjective (human perceived) quality, at (combined video+audio) data rates of 1.5Mbps.  The codecs that excelled in this testing were utilized as the basis for the standard and refined further, with additional features and other improvements being incorporated. <ref>http://www.chiariglione.org/mpeg/meetings/santa_clara90/santa_clara_press.htm</ref>  
Development of the MPEG-1 standard began in [[May 1988]].  14 video and 14 audio codec proposals were submitted by individual companies and institutions for evaluation.  The codecs were extensively tested for computational complexity and subjective (human perceived) quality, at (combined video+audio) data rates of 1.5Mbps.  The codecs that excelled in this testing were utilized as the basis for the standard and refined further, with additional features and other improvements being incorporated. <ref>http://www.chiariglione.org/mpeg/meetings/santa_clara90/santa_clara_press.htm</ref>  


After 20 meetings of the full group in various cities around the world, and 4 <sup>1</sup>/<sub>2</sub> years of development and testing, (a draft standard was produced September 1990, and only minor changes were introduced) the final standard was approved in early [[November 1992]]. <ref>http://www.chiariglione.org/mpeg/meetings.htm</ref>  Before the MPEG-1 standard had even been finalized, work began on a second standard, [[MPEG-2]], intended to extend MPEG-1 technology to provide full broadcast-quality at high bitrates (3 - 15 [[Mbps]]), and support for [[interlaced]] video. <ref>http://www.chiariglione.org/mpeg/meetings/london/london_press.htm</ref>  Due in part to the similarity between the two codecs, the MPEG-2 standard included full backwards compatibility with MPEG-1 video.
After 20 meetings of the full group in various cities around the world, and 4 <sup>1</sup>/<sub>2</sub> years of development and testing, the final standard was approved in early [[November 1992]]. <ref>http://www.chiariglione.org/mpeg/meetings.htm</ref> The completion date, as commonly reported, for the MPEG-1 standard varies greatly, because a largely complete draft standard was produced in September 1990, and from that point on, only minor changes were introduced. Before the MPEG-1 standard had even been finalized, work began on a second standard, [[MPEG-2]], intended to extend MPEG-1 technology to provide full broadcast-quality video at high bitrates (3 - 15 [[Mbps]]), and support for [[interlaced]] video. <ref>http://www.chiariglione.org/mpeg/meetings/london/london_press.htm</ref>  Due in part to the similarity between the two codecs, the MPEG-2 standard included full backwards compatibility with MPEG-1 video.


Notably, the MPEG-1 standard very strictly defines the [[bitstream]], and decoder function, but does not define how MPEG-1 encoding is to be performed (although they did provide a reference implementation).  This means that MPEG-1 coding efficiency can drastically vary depending on the encoder used, and generally means that newer encoders perform significantly better than their predecessors.
Notably, the MPEG-1 standard very strictly defines the [[bitstream]], and decoder function, but does not define how MPEG-1 encoding is to be performed (although they did provide a reference implementation).  This means that MPEG-1 coding efficiency can drastically vary depending on the encoder used, and generally means that newer encoders perform significantly better than their predecessors.


== Applications ==
== Applications ==
*Today, MPEG-1 is by far the most widely compatible lossy audio/video format in the world.   
*Today, MPEG-1 has become by far the most widely compatible lossy audio/video format in the world.   
*MPEG-1 Video and Layer I/II audio can be implemented without payment of license fees. <ref>http://www.emedialive.com/Articles/ReadArticle.aspx?ArticleID=12165</ref> <ref>http://www.extremetech.com/article2/0,1697,1153916,00.asp</ref> <ref>http://www.snazzizone.com/TP09.html</ref> <ref>http://213.130.34.82/resources/technical/mpegcompared/index.htm</ref> (Due to its age, patents on the technology have expired in most countries.???)  
*MPEG-1 Video and Layer I/II audio can be implemented without payment of license fees. <ref>http://www.emedialive.com/Articles/ReadArticle.aspx?ArticleID=12165</ref> <ref>http://www.extremetech.com/article2/0,1697,1153916,00.asp</ref> <ref>http://www.snazzizone.com/TP09.html</ref> <ref>http://213.130.34.82/resources/technical/mpegcompared/index.htm</ref> (Due to its age, (most?) patents on the technology have expired in most countries.???)  
*Most computer software for video playback includes MPEG-1 decoding, in addition to any other supported formats.
*Most computer software for video playback includes MPEG-1 decoding, in addition to any other supported formats.
*The immense popularity of [[MP3]] audio has established a massive [[installed base]] of hardware that can playback all 3 layers of MPEG-1 audio.   
*The immense popularity of [[MP3]] audio has established a massive [[installed base]] of hardware that can playback (all 3 layers of) MPEG-1 audio.   
*Millions of portable digital audio players (such as [[iPod]]s) can playback MPEG-1 audio.
*Millions of portable [[digital audio]] [[digital audio players|players]] (such as [[iPod]]s) can playback MPEG-1 audio.
*The widespread popularity of MPEG-2 (mostly with broadcasters) means MPEG-1 is playable by most digital cable/satellite set-top-boxes, and digital disc and tape players.
*The widespread popularity of MPEG-2 (mostly with broadcasters) means MPEG-1 is playable by most digital cable/satellite set-top boxes, and digital disc and tape players, due to backwards compatibility.
*MPEG-1 video and audio is the exclusive format used on [[Video CD]] (VCD), the first [[consumer]] digital video format, and still very popular around the world.
*MPEG-1 video and audio is the exclusive format used on [[Video CD]] (VCD), the first [[consumer]] digital video format, also the first disc based digital format, and still a very popular option around the world.
*The [[Super Video CD]] standard, based on VCD uses MPEG-1 audio exclusively, as well as MPEG-2 video.
*The [[Super Video CD]] standard, based on VCD, uses MPEG-1 audio exclusively, as well as MPEG-2 video.
*[[DVD video]] use primarily MPEG-2 video, but MPEG-1 support is explicitly specified in the standard.   
*[[DVD video]] uses MPEG-2 video primarily, but MPEG-1 support is explicitly defined/specified in the standard.   
*[[DVD video]] originally required MPEG-1 Layer II audio for PAL countries, but was changed to allow [[Dolby Digital]] AC-3.  MPEG-1 Layer II audio is still allowed on DVDs, although the MPEG-2 additions, like [[MPEG multichannel]] and [[VBR]], are rarely supported.  Most DVD players also support [[Video CD]] and MP3 CD playback.
*The [[DVD video]] standard originally required MPEG-1 Layer II audio for PAL countries, but was changed to allow AC-3/[[Dolby Digital]]-only discs.  MPEG-1 Layer II audio is still allowed on DVDs, although the MPEG-2 additions, like [[MPEG multichannel]] and [[VBR]], are rarely supported.  Most DVD players also support [[Video CD]] and MP3 CD playback, which use MPEG-1.
*The international [[Digital Video Broadcasting]] (DVB) standard primarily uses MPEG-1 Layer II audio, as well as MPEG-2 video.
*The international [[Digital Video Broadcasting]] (DVB) standard primarily uses MPEG-1 Layer II audio, as well as MPEG-2 video.
*The international [[Digital Audio Broadcasting]] (DAB) standard uses MPEG-1 Layer II audio exclusively due to error resilience and low complexity of decoding.
*The international [[Digital Audio Broadcasting]] (DAB) standard uses MPEG-1 Layer II audio exclusively, due to error resilience and low complexity of decoding.
*MPEG-1 Layer II audio, with [[MPEG multichannel]] extensions, was proposed for use in the North American [[ATSC]] standard but [[Dolby Digital]] (aka. AC-3, A/52) was chosen instead.  This is a matter of significant controversy, as it has been revealed that at least 2 (The [[Massachusetts Institute of Technology]] and [[Zenith]]) of the 4 voting board members received millions of dollars of compensation from [[Dolby Laboratories]] in exchange for their votes. <ref>http://www-tech.mit.edu/V122/N54/54hdtv.54n.html</ref>
*MPEG-1 Layer II audio, with [[MPEG multichannel]] extensions, was proposed for use in the North American [[ATSC]] standard but [[Dolby Digital]] (aka. AC-3, A/52) was chosen instead.  This is a matter of significant controversy, as it has been revealed that the organizations (The [[Massachusetts Institute of Technology]] and [[Zenith]]) behind at least 2 of the 4 voting board members received tens of millions of dollars of compensation from secret deals with [[Dolby Laboratories]] in exchange for their votes, with one of the board members directly receiving several million. <ref>http://www-tech.mit.edu/V122/N54/54hdtv.54n.html</ref>




== Video ==
== Video ==
Part 2 of the MPEG-1 standard covers video and is defined in [[ISO/IEC 11172-2]]   
Part 2 of the MPEG-1 standard covers video and is defined in ISO/[[IEC 11172-2]]   


=== Color Space ===
=== Color Space ===
Before encoding to MPEG-1 the color-space is transformed to Y'CbCr (Y'=Luma, Cb=Chroma Blue, Cr=Chroma Red).  Luma (brightness/resolution) is stored separately from chroma (color) and even further separated into red and blue components.  The chroma is also subsampled to 4:2:0, meaning it is decimated by half vertically and half horizontally, or just one quarter the resolution of the video.  Because the human eye is much less sensitive to small changes in color than in brightness, [[chroma subsampling]] is a very effective way to reduce the amount of video data that needs to be compressed.  On videos with fine/complex details this can manifest as chroma aliasing artifacts.  Compared to other digital [[compression artifact]]s, this issue seems to be very minor/rare source of annoyance.
Before encoding video to MPEG-1 the color-space is transformed to Y'CbCr (Y'=Luma, Cb=Chroma Blue, Cr=Chroma Red).  Luma (brightness/resolution) is stored separately from chroma (color, hue, phase) and even further separated into red and blue components.  The chroma is also subsampled to 4:2:0, meaning it is [[decimated]] by half vertically and half horizontally, to just one quarter the resolution of the video.   


Because of subsampling, Y'CbCr video must always be encoded using even dimensions ([[divisible]] by 2), otherwise chroma "ghosts" will appear in the encoded video, as it will appear the color is ahead of, or behind the rest of the video, much like a shadow.  
Because the human eye is much less sensitive to small changes in color than in brightness, [[chroma subsampling]] is a very effective way to reduce the amount of video data that needs to be compressed.  On videos with fine (complex?) details this can manifest as chroma aliasing artifacts.  Compared to other digital [[compression artifact]]s, this issue seems to be very (minor?) rarely a source of annoyance.
 
Because of subsampling, Y'CbCr video must always be stored (encoded?) using even dimensions ([[divisible]] by 2), otherwise chroma mismatch ("ghosts") will occur, and it will appear the color is ahead of, or behind the rest of the video, much like a shadow. (in the encoded video, ?)


=== Resolution ===
=== Resolution ===
MPEG-1 supports resolutions up to 4095 &times; 4095.   
MPEG-1 supports resolutions up to 4095&times;4095.   


MPEG-1 videos are most commonly found using ([[SIF]]) resolutions: 352x240, 352x288, or 320x240. These low resolutions, combined with a bitrate less than 1.5Mb/s, makes up what is known as a [[constrained parameters bitstream]].  This is the commonly accepted minimum video specifications any decoder should be able to play, to be considered MPEG-1 compliant.  This was selected to provide a good balance between quality and performance, allowing the use of reasonably inexpensive hardware of the time.
MPEG-1 videos are most commonly found using ([[SIF]]) resolutions: 352x240, 352x288, or 320x240. These low resolutions, combined with a bitrate less than 1.5Mb/s, makes up what is known as a [[constrained parameters bitstream]].  This is the commonly accepted minimum video specifications any decoder should be able to play, to be considered MPEG-1 compliant.  This was selected to provide a good balance between quality and performance, allowing the use of reasonably inexpensive hardware of the time.


Bitrate ?
  Max Bitrate ?
 
 
=== I-Frames / Pictures ===
MPEG-1 has several frame and picture types.  The first, most important, yet simplest are '''I-frames'''. 
 
I-frame is a common abbreviation for '''Intra-frame'''.  They may also be known as key-frames due to their somewhat similar function to the [[keyframe]]s used in animation.
 
I-frames can be considered effectively identical to [[JPEG]] images.  I-frames are the only frame type that can be decoded independently of any other frames.  This is important.
 
High-speed seeking through an MPEG-1 video is only possible to the nearest I-frame.  When cutting a video, without computationally intensive re-encoding, it is only possible to start a (segment of?) video from (an?) the first I-frame in the segment.  For this reason, I-frame only MPEG videos are used in editing applications.
 
I-frame only compression is very, very fast, but produces very large file sizes, on the order of 2 - 14 &times; larger than normally encoded MPEG-1 video. <!--mostly mathematical fact; partly original research, verifiable using libavcodec: compare MPEG1 encoding using vqscale=X:keyint=1 to vqscale=X:keyint=15)-->  I-frame only MPEG-1 video is very similar to [[MJPEG]] video, so much so that very high-speed lossless conversion can be made from one format to the other. <ref>http://citeseer.ist.psu.edu/acharya98compressed.html Compressed Domain Transcoding of MPEG ''infers that quantization tables differ, but those are user selectable''</ref>
 
The length between I-frames is known as the [[Group of Pictures]] (GOP) size.  MPEG-1 most commonly uses a GOP size of 15.  ie. 1 I-frame for every 14 non-I-frame (some combination of P-frames and B-frames).  With more intelligent encoders, GOP size is dynamically chosen, up to some pre-selected maximum limit.
 
Limits are placed on the maximum number of frames between I-frames due to encoding complexing, decoder buffer size, seeking ability, and and accumulation of IDCT errors in low-precision implementations common in hardware decoders (chips?).
 
An I-frame can be defined as a frame of video that contains only I-pictures, that is, all the blocks (see ''macro-block'' below) in the frame are independently encoded.
 
=== P-frames ===
 
P-frames are '''predicted''', (or ''forward''-predicted) encoding only the difference in image from frame (I- or P-) immediately preceding it. 
 
The difference between frames is calculated using ''motion vectors'' (see below).  Motion vector data will be embedded in the P-frame for use by the decoder.
 
If a reasonable match is found, the block from the previous frame is used, and the error (or difference between the block and the current frame) is encoded and stored in the P-frame. 
 
If a match from the previous frame for a block cannot be found, the block will be encoded as an I-picture. ie. intra-coded, storing the entire block as an image, in full.
 
If a video drastically changes from one frame to the next (such as a [[scene change]]), it can be more efficient (performance, buffer, seek-ability) to encode it as an I-frame. 
 
A P-frame can contain any number of I-pictures, in addition to P-pictures.


=== B-frames ===
A B-frame is similar to a P-frame, except it can predict from the previous and/or future frame.  This makes B-frames very computationally complex, cause (1/FPS * #b-frames) delay in both encoding and decoding.  As such are subject of much controversy, and often omitted.
B-frames can be highly beneficial in scenes where the background is being revealed over several frames, or fading transitions (from one scene to the next). 
No other frames are predicted from a B-frame.  This is good and bad. 
Because they are not referenced, a very low bitrate B-frame can be inserted where needed to help control the bitrate, but not dragging down future P-frames, so without causing as much quality loss as a P-frame might. 
Because B-frames are not referenced, the following P-frame must still encode the changes between it and the previous I or P frame a second time, in addition to encoding much of it in the B-frame.
A B-frame can contain any number of I-pictures and B-pictures, in addition to B-pictures.


=== D-frames ===
=== D-frames ===
Line 62: Line 105:
MPEG-1 operate on video in a series of 8x8 blocks for quantization, motion estimation, etc.  Because chroma is subsampled by 4, however. you need 4 luma blocks to correspond to 1 chroma block.  This gives us the 16x16 macroblock as the smallest independent unit in video.
MPEG-1 operate on video in a series of 8x8 blocks for quantization, motion estimation, etc.  Because chroma is subsampled by 4, however. you need 4 luma blocks to correspond to 1 chroma block.  This gives us the 16x16 macroblock as the smallest independent unit in video.


It is very important to maintain video resolutions that are [[multiple]]s of 16.   
It is very important to maintain video resolutions that are [[multiple]]s of 16.  See Motion Vectors for more reasons.


The same problem can be seen where black bars do not fall on a macroblock boundary.
  Black Bars
  Cropped macroblocks
  Noise around edges




Line 95: Line 140:


RLE is particularly effective after quantization, as a significant number of the AC coefficients are now zero, and can be represented with just a couple bytes (in a special 2-dimensional Huffman table that codes the run-length and the ending character).
RLE is particularly effective after quantization, as a significant number of the AC coefficients are now zero, and can be represented with just a couple bytes (in a special 2-dimensional Huffman table that codes the run-length and the ending character).
=== Motion Vectors ===
P and B frames
Error encoded
Macroblock multiples of 16
Cropped macroblocks
The same problem can be seen where black bars do not fall on a macroblock boundary.




   Datarate
   Datarate


   I-frames (Intraframe)  
   I-frames (Intraframe)*
     Seeking
     Seeking*
   P-frames (Predicted)
   P-frames (Predicted)*
   B-frames (Bidirectional)
   B-frames (Bidirectional)*
     Complexity (memory)
     Complexity (memory)*
     Delay
     Delay*
   GOP
   GOP*
     Keyframe placement
     Keyframe placement*


   Quantization*
   Quantization*
Line 195: Line 253:
   Hybrid filtering*
   Hybrid filtering*
     aliasing issues*
     aliasing issues*
     "aliasing compensation"?
     "aliasing compensation"? need more details
   mid/side (or impulse) joint stereo  
   mid/side (or impulse) joint stereo  
   "If there is a transient, 192 samples are taken instead of 576 to limit the temporal spread of quantization noise"
   "If there is a transient, 192 samples are taken instead of 576 to limit the temporal spread of quantization noise"
Line 211: Line 269:
   Interleaving
   Interleaving
   PES
   PES
  SCR
  PTS
     Wrap-around
     Wrap-around
   DTS
   DTS
Line 219: Line 279:
== See Also ==
== See Also ==


*[[MPEG]] The Moving Picture Experts Group
*[[MPEG]] The Moving Picture Experts Group, developers of the MPEG-1 format
*[[MP3]] The Cultural Phenomenon in Music
*[[MP3]] More details on MPEG-1 Layer III audio
*[[MPEG multichannel]] Backwards compatible 5.1 channel [[surround sound]] extension to Layer II
*[[MPEG-2]] The direct successor to the MPEG-1 standard.
 
;Implementations
*[[Libavcodec]] includes MPEG-1 video/audio encoders and decoders
*[[MJPEGtools]] MPEG-1/2 video/audio encoders
*[[Twolame]] high quality MPEG-1 Layer II audio encoder based on [[Lame]] psychoacoustic models
 
*[[Musepack]] high quality audio format originally based on MPEG-1 Layer II, with significant incompatible changes and improvements


== References ==
== References ==

Revision as of 21:57, 23 March 2008

Do not make any changes to this page for now. This is my mind-dump and accommodating others before I'm done will just make much, much more work for me. Put any suggestions on the Talk page, and I will eventually address them. -RC

MPEG-1 was an early standard for lossy compression of video and audio. It was designed to compress raw video and CD audio to 1.5Mb/s without discernible quality loss, making Video CDs and Digital Video Broadcasting possible.

Perhaps the most well-known part of the MPEG-1 standard today is the MP3 audio format it introduced.

The MPEG-1 standard is published as ISO/IEC 11172.

History

Modeled on the successful collaborative approach and the compression technologies developed by the Joint Photographic Experts Group and CCITT's Experts Group on Telephony (creators of the JPEG image compression standard and the H.261 standard for video conferencing over ISDN lines respectively), the MPEG working group was established in January 1988. MPEG was formed to address the need for standard video and audio encoding formats, and to build on H.261 to get better quality through the use of more complex (non-realtime) encoding methods. [1]

Development of the MPEG-1 standard began in May 1988. 14 video and 14 audio codec proposals were submitted by individual companies and institutions for evaluation. The codecs were extensively tested for computational complexity and subjective (human perceived) quality, at (combined video+audio) data rates of 1.5Mbps. The codecs that excelled in this testing were utilized as the basis for the standard and refined further, with additional features and other improvements being incorporated. [2]

After 20 meetings of the full group in various cities around the world, and 4 1/2 years of development and testing, the final standard was approved in early November 1992. [3] Commonly reported completion dates for the MPEG-1 standard vary greatly, because a largely complete draft standard was produced in September 1990 and only minor changes were introduced from that point on. Before the MPEG-1 standard had even been finalized, work began on a second standard, MPEG-2, intended to extend MPEG-1 technology to provide full broadcast-quality video at high bitrates (3 - 15 Mbps), and support for interlaced video. [4] Due in part to the similarity between the two codecs, the MPEG-2 standard included full backwards compatibility with MPEG-1 video.

Notably, the MPEG-1 standard very strictly defines the bitstream and the decoder function, but does not define how MPEG-1 encoding is to be performed (although a reference implementation was provided). This means that MPEG-1 coding efficiency can vary drastically depending on the encoder used, and generally means that newer encoders perform significantly better than their predecessors.

Applications

  • Today, MPEG-1 has become by far the most widely compatible lossy audio/video format in the world.
  • MPEG-1 Video and Layer I/II audio can be implemented without payment of license fees. [5] [6] [7] [8] (Due to its age, (most?) patents on the technology have expired in most countries.???)
  • Most computer software for video playback includes MPEG-1 decoding, in addition to any other supported formats.
  • The immense popularity of MP3 audio has established a massive installed base of hardware that can play back (all 3 layers of) MPEG-1 audio.
  • Millions of portable digital audio players (such as iPods) can play back MPEG-1 audio.
  • The widespread popularity of MPEG-2 (mostly with broadcasters) means MPEG-1 is playable by most digital cable/satellite set-top boxes, and digital disc and tape players, due to backwards compatibility.
  • MPEG-1 video and audio is the exclusive format used on Video CD (VCD), the first consumer digital video format, also the first disc-based digital format, and still a very popular option around the world.
  • The Super Video CD standard, based on VCD, uses MPEG-1 audio exclusively, as well as MPEG-2 video.
  • DVD video uses MPEG-2 video primarily, but MPEG-1 support is explicitly defined/specified in the standard.
  • The DVD video standard originally required MPEG-1 Layer II audio for PAL countries, but was changed to allow AC-3/Dolby Digital-only discs. MPEG-1 Layer II audio is still allowed on DVDs, although the MPEG-2 additions, like MPEG multichannel and VBR, are rarely supported. Most DVD players also support Video CD and MP3 CD playback, which use MPEG-1.
  • The international Digital Video Broadcasting (DVB) standard primarily uses MPEG-1 Layer II audio, as well as MPEG-2 video.
  • The international Digital Audio Broadcasting (DAB) standard uses MPEG-1 Layer II audio exclusively, due to error resilience and low complexity of decoding.
  • MPEG-1 Layer II audio, with MPEG multichannel extensions, was proposed for use in the North American ATSC standard, but Dolby Digital (a.k.a. AC-3, A/52) was chosen instead. This is a matter of significant controversy, as it has been revealed that the organizations (the Massachusetts Institute of Technology and Zenith) behind at least 2 of the 4 voting board members received tens of millions of dollars of compensation from secret deals with Dolby Laboratories in exchange for their votes, with one of the board members directly receiving several million. [9]


Video

Part 2 of the MPEG-1 standard covers video and is defined in ISO/IEC 11172-2

Color Space

Before video is encoded to MPEG-1, its color space is transformed to Y'CbCr (Y' = luma, Cb = chroma blue, Cr = chroma red). Luma (brightness/resolution) is stored separately from chroma (color, hue, phase), and the chroma is further separated into red and blue components. The chroma is also subsampled to 4:2:0, meaning it is decimated by half vertically and half horizontally, leaving each chroma component with just one quarter the resolution of the video.

Because the human eye is much less sensitive to small changes in color than in brightness, chroma subsampling is a very effective way to reduce the amount of video data that needs to be compressed. On videos with fine detail this can manifest as chroma aliasing artifacts. Compared to other digital compression artifacts, this issue seems to be only rarely a source of annoyance.

Because of subsampling, Y'CbCr video must always be stored using even dimensions (divisible by 2), otherwise a chroma mismatch ("ghosts") will occur in the encoded video, and the color will appear to be ahead of, or behind, the rest of the video, much like a shadow.
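The color-space steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the conversion uses the well-known BT.601 luma weights with a simplified full-range offset of 128, and the 2x2 averaging is just one possible decimation filter (the standard does not mandate how an encoder downsamples). The function names are hypothetical.

```python
def _clamp(value):
    """Keep a sample inside the 8-bit range 0..255."""
    return min(255.0, max(0.0, value))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit R'G'B' pixel to Y'CbCr.

    Assumption: BT.601 luma weights, full-range output centered on 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return _clamp(y), _clamp(cb), _clamp(cr)


def subsample_420(plane):
    """Halve a chroma plane horizontally and vertically (4:2:0).

    Assumption: each output sample is the plain average of a 2x2 block.
    The plane is a list of rows and must have even dimensions.
    """
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("4:2:0 subsampling needs even dimensions")
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def samples_per_frame(width, height):
    """Count stored samples: one full luma plane plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma
```

For a 352x240 frame this gives 126720 samples instead of the 253440 needed for three full-resolution planes, i.e. subsampling alone halves the data before any actual compression takes place.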

Resolution

MPEG-1 supports resolutions up to 4095×4095.

MPEG-1 videos are most commonly found using SIF resolutions: 352x240, 352x288, or 320x240. These low resolutions, combined with a bitrate of less than 1.5Mb/s, make up what is known as a constrained parameters bitstream. These are the commonly accepted minimum video specifications any decoder should be able to play to be considered MPEG-1 compliant. They were selected to provide a good balance between quality and performance, allowing the use of reasonably inexpensive hardware of the time.
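To put the 1.5Mb/s target in perspective, the following rough sketch estimates the uncompressed data rate of a SIF picture and the resulting compression ratio. The frame rate of 25 frames per second for 352x288 is an assumption (it is not stated above), as are 8 bits per sample and 4:2:0 storage; the 1.5Mb/s figure also has to carry audio, so the ratio for video alone is somewhat higher.

```python
def raw_bitrate(width, height, fps, bits_per_sample=8):
    """Uncompressed bits per second for 4:2:0 video.

    4:2:0 stores 1.5 samples per pixel (one luma, plus two chroma
    planes at quarter resolution each).
    """
    samples_per_frame = width * height * 3 // 2
    return samples_per_frame * bits_per_sample * fps


def compression_ratio(width, height, fps, target_bps=1_500_000):
    """How many times smaller the target stream is than the raw video."""
    return raw_bitrate(width, height, fps) / target_bps


# Assumed example: 352x288 at 25 frames per second.
print(raw_bitrate(352, 288, 25))        # bits per second before compression
print(compression_ratio(352, 288, 25))  # roughly 20:1
```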

 Max Bitrate ?


I-Frames / Pictures

MPEG-1 has several frame and picture types. The first, most important, yet simplest are I-frames.

I-frame is a common abbreviation for Intra-frame. They may also be known as key-frames due to their somewhat similar function to the keyframes used in animation.

I-frames can be considered effectively identical to JPEG images. I-frames are the only frame type that can be decoded independently of any other frames. This is important.

High-speed seeking through an MPEG-1 video is only possible to the nearest I-frame. When cutting a video without computationally intensive re-encoding, it is only possible to start a segment of video from the first I-frame in the segment. For this reason, I-frame only MPEG videos are used in editing applications.
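As a minimal sketch of why seeking snaps to I-frames, the helper below takes a sorted list of I-frame positions (a hypothetical index that a player might build while scanning the stream) and returns the I-frame from which decoding would have to start in order to reach a requested frame. It picks the last I-frame at or before the request, which is an assumption about player behavior rather than anything the standard prescribes.

```python
from bisect import bisect_right


def seek_target(i_frame_indices, wanted):
    """Return the last I-frame position at or before `wanted`.

    `i_frame_indices` must be sorted in ascending order.
    """
    pos = bisect_right(i_frame_indices, wanted)
    if pos == 0:
        raise ValueError("no I-frame at or before the requested frame")
    return i_frame_indices[pos - 1]


# With an I-frame every 15 frames, a request for frame 37 starts at frame 30.
print(seek_target([0, 15, 30, 45], 37))
```

In an I-frame only stream every frame is its own seek target, which is what makes such streams convenient for editing.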

I-frame only compression is very, very fast, but produces very large file sizes, on the order of 2 - 14 × larger than normally encoded MPEG-1 video. I-frame only MPEG-1 video is very similar to MJPEG video, so much so that very high-speed lossless conversion can be made from one format to the other. [10]

The distance between successive I-frames is known as the Group of Pictures (GOP) size. MPEG-1 most commonly uses a GOP size of 15, i.e. 1 I-frame for every 14 non-I-frames (some combination of P-frames and B-frames). With more intelligent encoders, the GOP size is chosen dynamically, up to some pre-selected maximum limit.
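A fixed GOP can be pictured as a repeating string of frame types. The sketch below builds one such pattern in display order; the choice of two B-frames between reference frames is purely an assumption for illustration, since the mix of P-frames and B-frames is left to the encoder.

```python
def gop_pattern(gop_size=15, b_run=2):
    """Frame types of one GOP in display order.

    Assumption: a fixed layout with `b_run` B-frames between
    consecutive I- or P-frames.
    """
    types = []
    for i in range(gop_size):
        if i == 0:
            types.append("I")
        elif i % (b_run + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


pattern = gop_pattern(15, 2)
print(pattern)             # IBBPBBPBBPBBPBB
print(pattern.count("I"))  # 1 I-frame ...
print(len(pattern) - 1)    # ... for every 14 non-I-frames
```

With `b_run=0` the same function yields a GOP containing only one I-frame followed by P-frames.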

Limits are placed on the maximum number of frames between I-frames due to encoding complexity, decoder buffer size, seeking ability, and the accumulation of IDCT errors in the low-precision implementations common in hardware decoders.

An I-frame can be defined as a frame of video that contains only I-pictures, that is, all the blocks (see macro-block below) in the frame are independently encoded.

P-frames

P-frames are predicted (or forward-predicted) frames, encoding only the difference in image from the frame (I- or P-) immediately preceding them.

The difference between frames is calculated using motion vectors (see below). Motion vector data will be embedded in the P-frame for use by the decoder.

If a reasonable match is found, the block from the previous frame is used, and the error (or difference between the block and the current frame) is encoded and stored in the P-frame.

If a match from the previous frame for a block cannot be found, the block will be encoded as an I-picture, i.e. intra-coded, storing the entire block as an image, in full.
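
The matching step can be sketched as a brute-force block search. This is a minimal illustration and not the method of any particular encoder: the exhaustive search, the sum-of-absolute-differences cost, and all function names are assumptions made for the example. Frames are plain lists of rows of pixel values.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of `cur` and a
    block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, size=16, search=2):
    """Try every displacement within +/- `search` pixels.

    Returns ((dx, dy), cost) for the cheapest candidate.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or bx + dx + size > width:
                continue
            if by + dy < 0 or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def residual(cur, ref, bx, by, mv, size=16):
    """The error block: current block minus the matched reference block."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(size)]
            for y in range(size)]
```

An encoder built along these lines would compare the best cost against some threshold of its own choosing, and fall back to intra-coding the block when no candidate is good enough.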

If a video drastically changes from one frame to the next (such as a scene change), it can be more efficient (performance, buffer, seek-ability) to encode it as an I-frame.

A P-frame can contain any number of I-pictures, in addition to P-pictures.

B-frames

A B-frame is similar to a P-frame, except that it can predict from the previous frame, the future frame, or both. This makes B-frames very computationally complex, and causes a delay of (1/FPS * number of B-frames) in both encoding and decoding. As such, they are the subject of much controversy, and are often omitted.
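
The delay comes from reordering: a B-frame cannot be coded or decoded until the future frame it refers to is available. A small sketch of that reordering and of the delay estimate given above follows; the function names are assumptions made for the example.

```python
def coding_order(display_types):
    """Reorder display-order frames so every B-frame comes after
    the I- or P-frame that follows it in display order."""
    out = []
    pending_b = []
    for index, ftype in enumerate(display_types):
        if ftype == "B":
            pending_b.append((index, ftype))   # wait for the next reference frame
        else:
            out.append((index, ftype))         # the reference frame goes out first
            out.extend(pending_b)              # then the B-frames that needed it
            pending_b = []
    out.extend(pending_b)                      # trailing B-frames, if any
    return out


def b_frame_delay_seconds(fps, consecutive_b_frames):
    """Delay estimate from the text: (1 / FPS) * number of B-frames."""
    return consecutive_b_frames / fps
```

For the display sequence I B B P B B P, the frames are coded in the order 0, 3, 1, 2, 6, 4, 5; at 25 frames per second, two consecutive B-frames give an estimated delay of 0.08 seconds.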

B-frames can be highly beneficial in scenes where the background is being revealed over several frames, or fading transitions (from one scene to the next).

No other frames are predicted from a B-frame. This is good and bad.

Because they are not referenced, a very low bitrate B-frame can be inserted where needed to help control the bitrate. Since it does not drag down future P-frames, this causes less quality loss than a low-bitrate P-frame might.

Because B-frames are not referenced, the following P-frame must still encode the changes between it and the previous I or P frame a second time, in addition to encoding much of it in the B-frame.

A B-frame can contain any number of I-pictures and P-pictures, in addition to B-pictures.

D-frames

MPEG-1 has a unique frame type not found in later video standards. D-frames or DC-pictures are independent images (intra-frames) that have been encoded DC-only (AC coefficients are removed—see DCT below) and hence are very low quality. D-frames are never used/referenced by I, P or B frames. D-frames are only useful for fast previews of video, for instance when seeking through a video at high speed.

Given moderately higher performance decoding equipment, this feature can be approximated by processing I-frames, and discarding the AC coefficients before display.

Macroblocks

MPEG-1 operates on video in a series of 8x8 blocks for quantization, motion estimation, etc. Because chroma is subsampled by 4, however, 4 luma blocks are needed to correspond to 1 chroma block. This gives us the 16x16 macroblock as the smallest independent unit in video.

It is very important to maintain video resolutions that are multiples of 16. See Motion Vectors for more reasons.
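
As a quick illustration, the helpers below round a dimension up to the next multiple of 16 and count the macroblocks in a frame. The function names are made up for this sketch.

```python
def padded_dimension(size, macroblock=16):
    """Round `size` up to the next multiple of the macroblock size."""
    return -(-size // macroblock) * macroblock


def macroblock_grid(width, height, macroblock=16):
    """Number of macroblock columns and rows needed to cover the frame."""
    cols = padded_dimension(width, macroblock) // macroblock
    rows = padded_dimension(height, macroblock) // macroblock
    return cols, rows
```

The SIF resolutions are already multiples of 16, so 352x240 is covered exactly by 22 x 15 macroblocks, while a width of 360 would have to be stored as 368.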

 Black Bars
 Cropped macroblocks 
 Noise around edges


DCT

Each 8x8 block is encoded using the Forward Discrete Cosine Transform (FDCT). This process by itself is lossless (practically: there are some rounding errors), and is reversed by the Inverse DCT (IDCT) upon playback to produce the original values.

The FDCT process converts the 64 uncompressed pixel values (brightness) into 64 different frequency values. These consist of one (large) value that is the average of the entire 8x8 block (the DC coefficient) and 63 smaller, positive or negative values (the AC coefficients) that are relative to the value of the DC coefficient.
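
A direct, unoptimized sketch of the 8x8 transform pair is given below. It uses the orthonormal DCT-II scaling, which is an assumption made for the example; under that scaling the DC coefficient is 8 times the block average rather than the average itself. Real decoders use fast integer approximations instead of this quadruple loop.

```python
import math

N = 8


def _scale(k):
    # Orthonormal DCT-II scaling factor (an assumption of this sketch).
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))


def fdct(block):
    """Forward 2-D DCT of an 8x8 block given as a list of rows."""
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += block[y][x] * _basis(x, u) * _basis(y, v)
            out[v][u] = _scale(u) * _scale(v) * total
    return out


def idct(coeffs):
    """Inverse 2-D DCT, recovering the pixel values."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for v in range(N):
                for u in range(N):
                    total += (_scale(u) * _scale(v) * coeffs[v][u]
                              * _basis(x, u) * _basis(y, v))
            out[y][x] = total
    return out
```

A flat block where every pixel is 100 transforms to a DC coefficient of 800 with all AC coefficients at zero (up to floating-point error), and `idct(fdct(block))` returns the original block to within rounding.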

The (large) DC coefficient remains mostly consistent from one block to the next, and can be compressed quite effectively with DPCM, so only the amount of difference between each DC value needs to be stored. A significant number of the AC coefficients will be near 0, which can then be more efficiently compressed in a later step. Additionally, the frequency conversion is necessary for quantization.
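
The DPCM step can be sketched as storing each DC value as its difference from the previous one. The `start` predictor value and the function names are assumptions for this example; when and to what value a real bitstream resets its predictor is not covered here.

```python
def dpcm_encode(values, start=0):
    """Replace each value with its difference from the previous value."""
    previous = start
    diffs = []
    for value in values:
        diffs.append(value - previous)
        previous = value
    return diffs


def dpcm_decode(diffs, start=0):
    """Rebuild the original values by accumulating the differences."""
    previous = start
    values = []
    for diff in diffs:
        previous += diff
        values.append(previous)
    return values
```

For DC values 800, 804, 801, 801, 790 the stored differences are 800, 4, -3, 0, -11: after the first block, only small numbers remain.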

Quantization

Quantization (of digital data) is, essentially, the process of reducing the accuracy of a signal.

A quantization table is a string of 64 numbers (0-255) that tells the encoder how relatively important or unimportant each piece of visual information is. Each number in the table corresponds to a certain frequency component of the video image.

Each of the 64 frequency values of the DCT block is divided by its corresponding value in the quantization table. This reduces or completely eliminates the information in some frequency components of the video, deemed less visually important. This quantization process usually reduces a significant number of the AC coefficients to zero.

Quantization eliminates a large amount of data, and is the main lossy processing step in MPEG-1 video encoding. This is also the source of most MPEG-1 video compression artifacts, like blockiness, color banding, noise, ringing, discoloration, etc. when video is encoded with an insufficient bitrate.
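
The division step can be sketched as follows. The table used here is a made-up example that simply grows with frequency; it is not the default matrix from the standard, and the function names are likewise assumptions.

```python
def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[int(round(c / q)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply back. The rounding loss is not recovered."""
    return [[level * q for level, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


# Hypothetical table: coarser steps for higher frequencies.
EXAMPLE_TABLE = [[8 + 4 * (u + v) for u in range(8)] for v in range(8)]
```

Applied to a block with a large DC coefficient and small AC coefficients, most of the quantized values come out as zero, which is what makes the later lossless steps effective.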

Lossless Data Compression

Several steps in the encoding of MPEG-1 video are lossless, meaning they will be reversed on decoding to produce exactly the same values. Since these lossless data compression steps don't add noise into or otherwise change the video (unlike quantization), they are often referred to as noiseless coding in the context of lossy codecs. Since lossless compression aims to remove as much redundancy as possible, it is also known as entropy coding in information theory.

Huffman Coding

After perceptual coding, the data is analyzed to find strings that repeat often. Those strings are then put into a special table, with the most frequently repeating data assigned the shortest code to keep the data as small as possible.

Once the table is constructed, those strings in the data are replaced with their (much smaller) codes, which reference the appropriate entry in the table.
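
A generic sketch of building and applying such a table is shown below. It illustrates the textbook Huffman construction only; the function names are assumptions, and a particular bitstream format may instead rely on tables that are fixed in advance.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build {symbol: bit string}; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)


def huffman_decode(bits, table):
    reverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:       # codes are prefix-free, so this is unambiguous
            out.append(reverse[current])
            current = ""
    return out
```

For example, 18 symbols drawn from four distinct values would need 36 bits at a fixed 2 bits each; if one value accounts for 10 of them, the Huffman table brings that down to 30 bits.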

RLE

Run-length encoding (RLE) is a very simple method of compressing repetition. A sequential string of characters, no matter how long, can be replaced with a few bytes, noting the value that repeats, and how many times.

RLE is particularly effective after quantization, as a significant number of the AC coefficients are now zero, and can be represented with just a couple bytes (in a special 2-dimensional Huffman table that codes the run-length and the ending character).
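
The run-length step for a block of quantized coefficients can be sketched as (run of zeros, value) pairs. The `EOB` end-of-block marker and the function names are assumptions made for this example, and the pairs are left as Python tuples rather than being mapped to Huffman codes.

```python
EOB = "EOB"  # hypothetical end-of-block marker for this sketch


def run_level_encode(coeffs):
    """Encode a coefficient list as (zeros skipped, non-zero value) pairs."""
    pairs = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append(EOB)        # any trailing zeros are implied by the marker
    return pairs


def run_level_decode(pairs, length=64):
    """Expand the pairs back into a full coefficient list."""
    out = []
    for item in pairs:
        if item == EOB:
            break
        run, value = item
        out.extend([0] * run)
        out.append(value)
    out.extend([0] * (length - len(out)))
    return out
```

A 64-value block containing only four non-zero coefficients is reduced to four pairs and the end marker.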

Motion Vectors

P and B frames

Error encoded

Macroblock multiples of 16

Cropped macroblocks

The same problem can be seen where black bars do not fall on a macroblock boundary.


 Datarate
 I-frames (Intraframe)*
   Seeking*
 P-frames (Predicted)*
 B-frames (Bidirectional)*
   Complexity (memory)*
   Delay*
 GOP*
   Keyframe placement*
 Quantization*
   Ringing (large coefficients in high frequency sub-bands)
   zigzag
 Motion Vectors/Estimation
   Black borders/Noise
   pel precision (half pixel IIRC)
   Two MV per macroblock (forward/backward pred)
   Prediction error
   DPCM encoded, just like DC coeffs
   Blockiness
 CBR/VBR
 Spatial Complexity
 Temporal Complexity


Audio

Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC 11172-3

MPEG-1 audio utilizes psychoacoustics to significantly reduce the data rate required by an audio stream. It reduces or completely discards certain parts of the audio that the human ear can't hear, either because they are in frequencies where the ear has limited sensitivity, or are masked by other, typically louder, sounds.


Channel Encoding:

  • Mono
  • Joint Stereo (intensity encoded)
  • Stereo
  • Dual (two uncorrelated mono channels)
  • Sampling rates: 32, 44.1 and 48 kHz
  • Bitrates: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384 kbit/s
 mono, stereo, joint stereo (intensity, m/s), dual.*
 efficient time-domain concealment characteristics 

Layer I

MPEG-1 Layer I is nothing more than a simplified version of Layer II, designed for very low delay and low complexity to facilitate real-time encoding on the hardware available in 1990, for applications like teleconferencing and studio editing. With the substantial performance improvements in digital processing since, it has now been long obsolete.

It saw limited adoption in its time, and most notably was used on the defunct Digital Compact Cassette at 384 kbps. Layer I audio files use the extension .mp1

Layer II

MPEG-1 Layer I/II are time-domain encoders that utilize an empirically determined psychoacoustic model based on the absolute threshold of hearing with the global masking threshold determined using a 1024 point FFT; as well as perceptual auditory masking. A low-delay 32 sub-band polyphase filter bank is used for time-frequency mapping, with overlapping ranges to prevent aliasing.

Time domain refers to how the psychoacoustic model is applied: to short, discrete samples/chunks of the audio waveform. This implies low-delay as a small number of samples are analyzed before encoding, as opposed to frequency domain encoding which must analyze a large number of samples before it can decide how to transform and output encoded audio.

Despite some 20 years of progress in the field of digital audio coding, MP2 remains the preeminent lossy audio coding standard due to its especially high audio coding performance on highly critical audio material such as castanet, symphonic orchestra, male and female voices and particularly high quality percussive sounds (impulses) like triangle, glockenspiel and audience applause. Testing has shown MP2 to be equivalent or superior to much more recent audio codecs, such as Dolby Digital AC-3. [11]

Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer transparent audio compression at 256 kbps for 16-bit 44.1 kHz CD audio. [12] That (approximately) 1:6 compression ratio for CD audio is particularly impressive since it is quite close to the upper theoretical limit of perceptual entropy, at just over 1:8. [13] [14] Achieving much higher compression is simply not possible without discarding some perceptible information.
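
The ratio quoted above can be checked with simple arithmetic, assuming two-channel CD audio. The function names are made up for this sketch.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(source_kbps, coded_kbps):
    """How many times smaller the coded stream is than the source."""
    return source_kbps / coded_kbps


cd_kbps = pcm_bitrate_kbps(44100, 16, 2)      # 1411.2 kbit/s
ratio = compression_ratio(cd_kbps, 256)       # about 5.5
```

Stereo CD audio runs at 1411.2 kbit/s, so 256 kbit/s corresponds to a ratio of roughly 1:5.5, in line with the approximate figure of 1:6.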

 audio broadcasting
 error resilient
 Musicam


Layer III/MP3

MP3 is a frequency domain transform encoder that utilizes a dynamic psychoacoustic model. It is based on the ASPEC and OCF algorithms.

Even though it utilizes some of the lower layer functions, MP3 is quite different from MP2.

MP3 does not benefit from the 32 sub-band filter bank, instead just transforming the data again with the MDCT, and processing it in the frequency domain in much smaller pieces. In fact, being forced to use the filter bank (to fit in the MPEG-1 audio standard) wastes processing time and compromises MP3 quality.

The Layer I/II 1024 point (FFT) window for spectral estimation is too small for MP3, so it has to utilize two passes to cover the full 1152 samples, potentially selecting a less appropriate global masking threshold because of it.

MP3 outputs 1152 samples, but spreads the larger MP3 frames over a varying number of several Layer I/II-sized frames, making editing much more difficult, and proving more vulnerable to errors.

Unlike Layers I/II, MP3 uses Huffman coding (after perceptual coding) to losslessly reduce the bitrate further, without any additional quality loss. This makes MP3 more sensitive to small transmission errors.

MP3 benefits greatly from being able to divide the audio into 576 frequency components using the (overlapping) MDCT transform. This allows MP3 to more accurately apply psychoacoustic rules (than can Layer II with just 32 sub-bands), particularly in the critical bands and providing much better low-bitrate performance.

The frequency domain (MDCT) design of MP3 imposes some limitations as well. It causes a factor of 12 - 36 times worse temporal resolution than MP2, so the artifacts from (unexpected) transient sounds, like percussive events, are spread over a larger window. This results in audible smearing and pre-echo. [15]

This hybrid design also introduces aliasing artifacts. These are compensated for, but the compensation produces excess energy that must be encoded in the frequency domain, decreasing coding efficiency.

Because of these issues, MP2 sound quality is actually superior to MP3 at high bitrates (at the very least, above 112 kbps per channel).

 "Frequency resolution is limited by the small long block window size, decreasing coding efficiency

No scale factor band for frequencies above 15.5/15.8 kHz"

 9 months?
 ASPEC (Fraunhofer) 
 entropy coding (Huffman)*
 Hybrid filtering*
   aliasing issues*
   "aliasing compensation"? need more details
 mid/side (or intensity) joint stereo 
 "If there is a transient, 192 samples are taken instead of 576 to limit the temporal spread of quantization noise"
 psychoacoustic model and frame format from MP1/2*
 ringing
 CBR/VBR

Systems

Part 1 of the MPEG-1 standard covers systems, which is the logical layout of the encoded audio, video, and other bitstream data.

"The MPEG-1 Systems design is essentially identical to the MPEG-2 Program Stream structure." [16]

 Program Stream
 Interleaving
 PES
 SCR
 PTS
   Wrap-around
 DTS
 Timebase correction
 Pixel/Display Aspect Ratio


See Also

  • MPEG The Moving Picture Experts Group, developers of the MPEG-1 format
  • MP3 More details on MPEG-1 Layer III audio
  • MPEG multichannel Backwards compatible 5.1 channel surround sound extension to Layer II
  • MPEG-2 The direct successor to the MPEG-1 standard.

Implementations

  • Libavcodec includes MPEG-1 video/audio encoders and decoders
  • MJPEGtools MPEG-1/2 video/audio encoders
  • Twolame high quality MPEG-1 Layer II audio encoder based on Lame psychoacoustic models
  • Musepack high quality audio format originally based on MPEG-1 Layer II, with significant incompatible changes and improvements

References

  1. http://www.cis.temple.edu/~vasilis/Courses/CIS750/Papers/mpeg_6.pdf pp.2
  2. http://www.chiariglione.org/mpeg/meetings/santa_clara90/santa_clara_press.htm
  3. http://www.chiariglione.org/mpeg/meetings.htm
  4. http://www.chiariglione.org/mpeg/meetings/london/london_press.htm
  5. http://www.emedialive.com/Articles/ReadArticle.aspx?ArticleID=12165
  6. http://www.extremetech.com/article2/0,1697,1153916,00.asp
  7. http://www.snazzizone.com/TP09.html
  8. http://213.130.34.82/resources/technical/mpegcompared/index.htm
  9. http://www-tech.mit.edu/V122/N54/54hdtv.54n.html
  10. http://citeseer.ist.psu.edu/acharya98compressed.html Compressed Domain Transcoding of MPEG infers that quantization tables differ, but those are user selectable
  11. Wustenhagen et al, Subjective Listening Test of Multi-channel Audio Codecs, AES 105th Convention Paper 4813, San Francisco 1998
  12. http://www.faqs.org/faqs/mpeg-faq/part1/ "You can compress the same stereo program down to 256 Kbits/s with no loss in discernable quality." (the original papers would be much, much better refs, but I can't seem to find them! This just proves they exist!)
  13. J. Johnston, Estimation of Perceptual Entropy Using Noise Masking Criteria, in Proc. ICASSP-88, pp. 2524-2527, May 1988.
  14. J. Johnston, Transform Coding of Audio Signals Using Perceptual Noise Criteria, IEEE J. Sel. Areas in Comm., pp. 314-323, Feb. 1988.
  15. http://www.cs.columbia.edu/~coms6181/slides/6R/mpegaud.pdf pp.8
  16. http://www.chiariglione.org/mpeg/faq/mp1-sys/mp1-sys.htm

External Links