Sunday, November 10, 2013

[Encoding] Understanding a Quantization Matrix

Ok Folks, here is an explanation of what a matrix really does.

With Xvid you can not only use different built-in Quantization Matrices, like H.263 and Mpeg, but you can also use custom matrices either made by yourself or someone else.

Since few people actually understand what a matrix does, I will try to give an explanation here that is possible to understand even if you're not a math expert.

You will all have heard about macroblocks by now. These are the 16x16 blocks of which a single MPEG-4 frame is composed.
For brightness, each macroblock is made up of four 8x8 blocks that are grouped together (the color information gets its own 8x8 blocks). These 8x8 blocks form the basis of MPEG-4 compression.
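To make the grouping concrete, here is a minimal Python sketch that cuts one 16x16 grid of brightness samples into its four 8x8 blocks. The function name `split_macroblock` and the toy sample values are made up for illustration; this is not how Xvid itself stores a frame.

```python
def split_macroblock(mb):
    """Split a 16x16 grid (16 rows of 16 samples) into four 8x8 blocks."""
    if len(mb) != 16 or any(len(row) != 16 for row in mb):
        raise ValueError("expected a 16x16 macroblock")
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            # Take 8 rows starting at 'top', and 8 samples of each row from 'left'.
            blocks.append([row[left:left + 8] for row in mb[top:top + 8]])
    return blocks  # order: top-left, top-right, bottom-left, bottom-right


# Toy macroblock where every sample simply encodes its own position.
macroblock = [[16 * y + x for x in range(16)] for y in range(16)]
blocks = split_macroblock(macroblock)
print(len(blocks), "blocks of", len(blocks[0]), "x", len(blocks[0][0]))
```

Everything that follows in this explanation happens inside one such 8x8 block at a time.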

Instead of being composed of proper pixels, like a normal bitmap or a film still, a block is more like a representation of a complex formula that tries to mimic the content of the original picture as best as it can.

The human eye is much more sensitive to changes in brightness, or luminance, than it is to changes in color.
So MPEG-4 uses a type of color space in its file structure that assigns fewer bits to color changes than it does to brightness changes.
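One common way to spend fewer bits on color is to store the color planes at half the resolution in both directions (usually called 4:2:0). The sketch below assumes that layout; the helper name `subsample_420` and the plain 2x2 averaging are illustrative choices, not something the text above specifies.

```python
def subsample_420(plane):
    """Average every 2x2 group of color samples into a single sample."""
    height, width = len(plane), len(plane[0])
    return [
        [
            # +2 rounds to nearest instead of always rounding down.
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# One 16x16 macroblock: brightness stays 16x16, each color plane becomes 8x8.
color_plane = [[100] * 16 for _ in range(16)]
small_plane = subsample_420(color_plane)

luma_samples = 16 * 16
chroma_samples = 2 * len(small_plane) * len(small_plane[0])
print(luma_samples, "brightness samples,", chroma_samples, "color samples")
```

Under this assumption the two color planes together take half as many samples as the brightness plane, before any further compression happens.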

An 8x8 block is not made up of pixels. It is made up of a single value which represents the average brightness (or color) value, and all the remaining values are mathematical representations of the amount of variation from this average value for the whole block.
To put it in other words, you have one basic mean, or average value, and all the variation, or detail of the picture, is represented by the end result of a certain complex formula.
The other places in the 8x8 block, which we invariably but inaccurately call pixels, represent different types of detail, and especially the variation of this detail.

Let's take a look at one block:

X X X X X X X X
X X X X X X X X
X X X X X X X X
X X X X X X X X
X X X X X X X X
X X X X X X X X
X X X X X X X X
X X X X X X X X

The first place in the upper left corner represents the average, or mean value. So if the whole block is dark brown or light red on average, this place says so.
Going from this place to the right or down, we get representations of the amount of variation from this value.
Now this is hard to grasp, so pay attention.
The further you go from left to right, or from top to bottom, the finer the detail that a place represents.
If you take the original picture of 8x8 pixels, the amount of detail is transformed into values depending on a certain frequency at which the detail is present.
Finer detail is represented by higher frequencies.
So, the further you go to the right in the block, the higher the frequency (or the finer the detail).
Let's take three examples:
An 8x8 picture with one big iron bar in it has little detail, so it has a very low frequency.
An 8x8 picture with four broomsticks in it has some detail, so it has a medium frequency.
An 8x8 picture with rain, a cornfield or hair in it, has lots of detail in it and so it has a very high frequency.
As you can guess by now, the values in a block represent both the horizontal and vertical frequency in the original picture.
The formula that does the translation or transformation from detail to frequency is called the Discrete Cosine Transform, or DCT.
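For those who want to see the formula at work, here is a small, slow, pure-Python sketch of an 8x8 DCT. It uses the textbook scaling found in JPEG/MPEG descriptions; real encoders use fast integer versions, and the function name `dct_2d` and the two test pictures are just illustrations.

```python
import math

N = 8


def dct_2d(block):
    """Forward 8x8 DCT. coeffs[v][u]: v = vertical, u = horizontal frequency."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * c(u) * c(v) * total
    return coeffs


# A completely flat picture: no detail at all.
flat = [[100] * N for _ in range(N)]
# Thin vertical stripes: very fine detail in the horizontal direction only.
stripes = [[100 + (20 if x % 2 == 0 else -20) for x in range(N)]
           for _ in range(N)]

flat_c = dct_2d(flat)
stripes_c = dct_2d(stripes)

print("flat, top-left value:", round(flat_c[0][0], 1))
print("stripes, top row:", [round(value, 1) for value in stripes_c[0]])
```

The flat picture ends up with only the top-left (average) value filled in, while the thin stripes put most of their energy in the far right of the top row, which is the highest horizontal frequency.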

So when looking at a block in the way we have just come to understand it, we can see the following:


                        Average brightness and color of the whole block
                        I
                        I  Low frequency (bigger) detail
                        I  I
                        I  I     Medium frequency (normal) detail
                        I  I     I
                        I  I     I        Higher frequency (fine) detail
                        I  I     I        I
                        X--X--X--X--X--X--X--X
                        X--\--X--X--X--X--X--X
Low frequency detail----X--X--\--X--X--X--X--X
                        X--X--X--\--X--X--X--X
Medium frequency detail-X--X--X--X--\--X--X--X
                        X--X--X--X--X--\--X--X
                        X--X--X--X--X--X--\--X
High frequency detail---X--X--X--X--X--X--X--A
                                             I
                                             I
                        Mathematical representation of the finest
                        detail, both horizontally and vertically

Ok, now we understand how a block is built.
So how do quantization matrices come into play?

A quantization matrix (QM from now on) looks something like this:

08 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83

Now there is actually a rather complex process behind this, but I'll try to describe it in a simple manner:
Every value in the QM acts as the threshold for the DCT detail-to-frequency translation at that position in the block.
(Strictly speaking, the encoder divides each DCT value by the matching QM value, scaled by the current quantizer, and rounds the result; anything that rounds to zero is gone, which is why the QM value behaves like a threshold.)
All detail below the threshold will not be regarded as detail and will NOT be encoded. It will simply be discarded.
Now you can understand why Xvid is a so-called lossy codec.
It throws detail away. The amount of detail thrown away is determined by the QM.
As you can see, the farther to the right and the farther to the bottom you go, the higher the threshold gets.
The finer the detail, the more it has to stand out from the rest of the picture to be encoded and not thrown away.
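The threshold effect can be sketched in a few lines of Python. This is a deliberately simplified model: it just divides by the QM value and rounds, leaving out the per-frame quantizer and the special treatment a real MPEG-4 encoder gives the top-left value. The names `quantize` and `dequantize` and the hand-picked input values are illustrative only.

```python
# The example matrix from above.
QM = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]


def quantize(coeffs, qm):
    """Divide every DCT value by its QM entry and round (simplified model)."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, qm)]


def dequantize(levels, qm):
    """What the decoder gets back: the stored level times the QM entry."""
    return [[lv * q for lv, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, qm)]


# Hand-made block: an average value plus the same amount of coarse and fine detail.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 800.0   # average
coeffs[0][1] = 20.0    # coarse horizontal detail
coeffs[7][7] = 20.0    # finest detail, same strength

levels = quantize(coeffs, QM)
restored = dequantize(levels, QM)
print("coarse detail:", levels[0][1], "->", restored[0][1])
print("fine detail:  ", levels[7][7], "->", restored[7][7])
```

The same amount of detail survives (slightly altered) in the coarse position but disappears completely in the finest position, simply because the QM value there is so much larger.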

So say you have a picture of a girl with long blond hair standing in front of a very light gray wall and behind two prison bars. (don't we all love to see that now)
Of course it's hard to portray that in an 8x8 picture, but bear with me for a minute.

The blond girl and the wall will have little contrast between them, so the difference between the average values for brightness and color and the maximum values will not be that large.
The values will be lower and will not go over the threshold that often. This means less difference has to be encoded, and the picture will have high compressibility.
If the wall had been black, the contrast would be much higher and the difference between the average value (sort-of-grey) and the extremes (blond and black) would be much larger. So many more values would have gone over the threshold and would have to be encoded. So higher-contrast scenes result in lower compressibility, which we already know of course.
Now the two black prison bars in front of the girl are detail, albeit not very fine. So they get a low frequency and they will be encoded if their difference from the average goes over the threshold.
So assuming they're not blond, they will be encoded.
Now the girl has hair, and the texture of hair is of course very fine. So the hair has a lot of detail and gets a very high frequency.
As you can see in our QM, the threshold for high frequencies is much higher than for low frequencies. So the fine detail (the hair) has to differ much more from the average values to be encoded.
So unless the difference is very high, which in this case we assume it isn't, the details of the hair won't be encoded.
The matter would be different if she had something like coloured streaks in her hair, which would increase the contrast.

So the end result is that the finer the detail, the bigger the contrast of this detail needs to be from the average values of the picture, to be encoded.
This is of course on a per-block basis and not on the picture as a whole, which generally consists of more than one 8x8 block. Let's hope you can see through the simplification.
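The hair example can be imitated with a toy experiment: the same very fine stripe pattern, once with little contrast (blond hair on a light wall) and once with strong contrast (coloured streaks). The sketch below uses the example QM from earlier and the same simplified divide-and-round model; `dct_2d`, `surviving_detail` and the stripe amplitudes are assumptions made for this illustration, not anything taken from Xvid.

```python
import math

QM = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]


def dct_2d(block):
    """Plain (slow) 8x8 forward DCT with the usual textbook scaling."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    return [[0.25 * c(u) * c(v) * sum(
                block[y][x]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for y in range(8) for x in range(8))
             for u in range(8)] for v in range(8)]


def surviving_detail(amplitude):
    """Count detail values (everything but the average) left after quantizing."""
    # Thin vertical stripes around a mid-grey average of 128.
    block = [[128 + (amplitude if x % 2 == 0 else -amplitude) for x in range(8)]
             for _ in range(8)]
    coeffs = dct_2d(block)
    levels = [[round(coeffs[v][u] / QM[v][u]) for u in range(8)]
              for v in range(8)]
    return sum(1 for v in range(8) for u in range(8)
               if (v, u) != (0, 0) and levels[v][u] != 0)


low_contrast = surviving_detail(2)    # hair that barely differs from the wall
high_contrast = surviving_detail(20)  # hair with strongly contrasting streaks
print("low contrast keeps", low_contrast, "detail values")
print("high contrast keeps", high_contrast, "detail values")
```

With the low-contrast stripes nothing but the average survives, so the decoder would see a flat grey block; with the high-contrast stripes some of the fine detail makes it through.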

Now you can understand why some matrices soften the picture, like H.263, while others like MPEG produce a sharper picture.
The values in a sharper QM simply give finer detail a lower threshold and are therefore more likely to encode finer detail, at the price of compressibility.
You can also see that the QM that I took as an example isn't a very high compressibility matrix; the values are rather low in general.
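As a rough illustration of how a higher-compression matrix could be derived from the example, the sketch below raises each value by a factor that grows towards the bottom-right corner. The function `make_heavier`, the `strength` parameter, the linear ramp and the cap of 255 (assuming 8-bit matrix entries) are all assumptions for this example, not a recipe taken from any existing matrix.

```python
QM = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]


def make_heavier(qm, strength=0.5):
    """Raise thresholds more and more towards the fine-detail corner."""
    result = []
    for v, row in enumerate(qm):
        new_row = []
        for u, value in enumerate(row):
            # Factor is 1.0 at the top-left and 1.0 + strength at the bottom-right.
            factor = 1.0 + strength * (u + v) / 14
            new_row.append(min(255, round(value * factor)))  # cap is an assumption
        result.append(new_row)
    return result


heavier = make_heavier(QM)
for row in heavier:
    print(" ".join(f"{value:3d}" for value in row))
```

The average value and the coarse detail are left almost alone, while the fine detail needs noticeably more contrast to survive, so the result should compress better at the cost of a softer picture.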

Some other points:
-End credits usually have very little detail, so you could design a QM especially for this, with VERY high compressibility.

-A heavy compression matrix simply ups all the finer detail values so less of that detail is encoded.

-You could design specific matrices for specific types of content.
You could design matrices especially for sci-fi space adventures, anime and animation movies, and jungle scenes.

-If you know the exact frequency of interlacing artifacts, you could up their threshold to filter them out!

-Same might work for other types of noise and artifacts.

-I don't know if the one QM is meant for both luminance and color information. I assume it is, but separate QMs for luminance and color would allow more tweaking. (Don't know if it would be MPEG-4 compliant though, or if it's possible at all.)

--------------------------------------------------------------------
Well that's it folks!
Hope I didn't make too many errors, and please correct me if and where I'm wrong.

http://forum.doom9.org/showthread.php?t=54147
