virtualdub.org <http://www.virtualdub.org/>
Alpha blending without SIMD support

Now that we've covered averaging bitfields <http://6>, how do you efficiently alpha blend with a factor other than one-half?

Alpha blending is normally done using an operation known as /linear interpolation/, or /lerp/:

    lerp(a, b, f) = a*(1-f) + b*f = a + (b-a)*f

...where a and b are the values to be blended, and f is a blend factor from 0 to 1, where 0 gives a and 1 gives b. To blend a packed pixel, you could just expand all channels of the source and destination pixels and do the blend in floating point, but it really hurts to see this, since the code turns out nasty on practically any platform. Unless you've got a platform that gets pixels in and out of a floating-point vector really easily, you should use integer math for fast alpha blending.

So how do you blend quickly without resorting to per-channel processing?

First, if you are dealing with an alpha channel instead of a constant alpha value, chances are that the alpha value ranges from 0 to 2^N-1, which is not a convenient factor for division. You could cheat and just divide by 2^N, but then either the fully transparent or the fully opaque case doesn't work correctly (sloppy). Conditionally adding one to the alpha value fixes this at the cost of introducing a tiny amount of error. I used to add the high bit, thus mapping [128,255] to [129,256] for 8-bit values; I'm told that shifting [1,255] to [2,256] leads to better accuracy. Either way will prevent the glaring error cases, though.

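The two alpha adjustments described above can be sketched in C as follows. This is only an illustration for 8-bit alpha; the function names are made up for the example and are not from VirtualDub.

```c
#include <assert.h>
#include <stdint.h>

/* Map an 8-bit alpha in [0,255] onto [0,256] by adding its high bit:
   [128,255] becomes [129,256], while [0,127] is unchanged. */
static uint32_t alpha_add_high_bit(uint32_t a) {
    assert(a <= 255);
    return a + (a >> 7);
}

/* Alternative: add one to any nonzero alpha, so [1,255] becomes [2,256]. */
static uint32_t alpha_add_if_nonzero(uint32_t a) {
    assert(a <= 255);
    return a + (a != 0);
}
```

Both variants leave 0 at 0 and send 255 to 256, so the fully transparent and fully opaque cases come out exact once you divide by 256.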
The next step is to reformulate the blend equation in terms of integer math:

    lerp(a, b, f) = a + (((b-a) * f + round) >> shift)

where round = 1 << (shift - 1).

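For a single 8-bit channel, the integer form might look like the sketch below. It assumes shift = 8 and a blend factor already mapped to [0,256]; the name lerp8 is hypothetical. The subtraction is done in unsigned arithmetic, so b - a can wrap when b < a, and the final mask is what throws the wrapped high bits away.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel. f is in [0,256]: 0 gives a, 256 gives b. */
static uint32_t lerp8(uint32_t a, uint32_t b, uint32_t f) {
    assert(a <= 255 && b <= 255 && f <= 256);
    const uint32_t shift = 8;
    const uint32_t round = 1u << (shift - 1);
    /* Unsigned arithmetic is modulo 2^32, so a negative difference wraps;
       masking to 8 bits recovers the correct channel value. */
    return (a + (((b - a) * f + round) >> shift)) & 0xff;
}
```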
To eliminate some of the pack/unpack work, realize that you can alpha blend a channel in place as long as you isolate it and have enough headroom above it in the machine word to accommodate the intermediate result of the multiply. In other words, instead of extracting red = (pixel >> 16)&0xff, blending that, and then shifting it back up, simply blend (pixel & 0x00ff0000).

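As a rough sketch of the in-place idea, here is the red channel of a 32-bit pixel blended without ever shifting it down. It assumes an 8-bit blend factor mapped to [0,256], which uses exactly the 8 bits of headroom above the red field; the function name is invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Blend only the red field (bits 16-23) of two 32-bit pixels, in place.
   f is in [0,256]: 0 gives dst, 256 gives src. Returns the red field only. */
static uint32_t blend_red_in_place(uint32_t dst, uint32_t src, uint32_t f) {
    assert(f <= 256);
    uint32_t sr = src & 0x00ff0000;
    uint32_t dr = dst & 0x00ff0000;
    /* The rounding constant is 0x80 moved up to the field's position.
       sr - dr may wrap; the final mask discards the wrapped bits. */
    return (dr + (((sr - dr) * f + 0x00800000) >> 8)) & 0x00ff0000;
}
```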
Now, the magic: you can actually do more than one bitfield this way as long as you have enough space between them. If you have two non-overlapping bitfields combined as (a << shift1) + (b << shift2), multiplying their combined form by an integer gives the same result as splitting them apart, multiplying each, and then recombining. For a 565 pixel, you could thus blend red and blue in the following manner (remember that the red/green/blue masks for 565 are 0xf800, 0x07e0, and 0x001f, respectively):

    rbsrc = src & 0xf81f
    rbdst = dst & 0xf81f
    rbout = ((rbsrc * f + rbdst * (32-f) + 0x8010) >> 5) & 0xf81f

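The claim that one multiply serves both fields is easy to check against a per-channel version. The sketch below assumes 32-bit unsigned arithmetic and a blend factor f in [0,32]; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Blend red and blue of two 565 pixels with the packed formula. */
static uint32_t blend565_rb_packed(uint32_t dst, uint32_t src, uint32_t f) {
    assert(f <= 32);
    uint32_t rbsrc = src & 0xf81f;
    uint32_t rbdst = dst & 0xf81f;
    return ((rbsrc * f + rbdst * (32 - f) + 0x8010) >> 5) & 0xf81f;
}

/* Reference: extract each channel, blend it separately, and repack. */
static uint32_t blend565_rb_split(uint32_t dst, uint32_t src, uint32_t f) {
    assert(f <= 32);
    uint32_t rs = (src >> 11) & 0x1f, rd = (dst >> 11) & 0x1f;
    uint32_t bs = src & 0x1f,         bd = dst & 0x1f;
    uint32_t r = (rs * f + rd * (32 - f) + 16) >> 5;
    uint32_t b = (bs * f + bd * (32 - f) + 16) >> 5;
    return (r << 11) | b;
}
```

The six unused bits where green normally sits are what keep the blue product from spilling into red.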
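Which leads to the surprising result that you can safely subtract one pair of bitfields from the other and scale the difference without any fancy SIMD bitfield support. The packed value is just the integer 2048*red + blue, so the subtract, multiply, and add work out the same as they would on the separate channels, and the final mask discards the junk left in the bits between the fields:

    rbout = (rbdst + (((rbsrc - rbdst) * f + 0x8010) >> 5)) & 0xf81f
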
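Putting the difference form together with the green channel, a complete 565 blend could look something like this. It is a sketch under the same assumptions as the formulas above (f in [0,32], 32-bit unsigned intermediates); the function name and the green rounding constant 0x0200 (16 moved up to bit 5) are my own additions.

```c
#include <assert.h>
#include <stdint.h>

/* Blend two 565 pixels. f is in [0,32]: 0 gives dst, 32 gives src. */
static uint16_t blend565_mul(uint16_t dst, uint16_t src, uint32_t f) {
    assert(f <= 32);
    uint32_t rbsrc = src & 0xf81fu, rbdst = dst & 0xf81fu;
    uint32_t gsrc  = src & 0x07e0u, gdst  = dst & 0x07e0u;
    /* Differences may wrap in unsigned arithmetic; the masks clean up. */
    uint32_t rbout = (rbdst + (((rbsrc - rbdst) * f + 0x8010) >> 5)) & 0xf81f;
    uint32_t gout  = (gdst  + (((gsrc  - gdst)  * f + 0x0200) >> 5)) & 0x07e0;
    return (uint16_t)(rbout | gout);
}
```

That comes to two multiplies per pixel instead of three, with no channel extraction at all.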
The remaining green channel is easy. Doing it this way does limit precision in the blend factor, since you're limited to the number of bits of headroom you have, but five bits for 565 is decent. If you also have an alpha channel to blend, you can do so, although you might need to temporarily shift down green and alpha together to make headroom at the top of the machine word if you're dealing with a big pixel.

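For a 32-bit 8888 pixel, the shift-down idea might be sketched as below. This assumes alpha in bits 24-31 and a blend factor in [0,256]; the function name is hypothetical, and whether you want the alpha channel itself interpolated this way depends on your compositing model.

```c
#include <assert.h>
#include <stdint.h>

/* Blend all four channels of two 8888 pixels, two fields at a time.
   f is in [0,256]: 0 gives dst, 256 gives src. */
static uint32_t blend8888(uint32_t dst, uint32_t src, uint32_t f) {
    assert(f <= 256);
    uint32_t srb = src & 0x00ff00ff;
    uint32_t drb = dst & 0x00ff00ff;
    /* Shift alpha and green down by 8 so they get the same headroom. */
    uint32_t sag = (src >> 8) & 0x00ff00ff;
    uint32_t dag = (dst >> 8) & 0x00ff00ff;
    uint32_t orb = (drb + (((srb - drb) * f + 0x00800080) >> 8)) & 0x00ff00ff;
    uint32_t oag = (dag + (((sag - dag) * f + 0x00800080) >> 8)) & 0x00ff00ff;
    return orb | (oag << 8);
}
```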
What if you don't have a hardware multiply, or the one you have is very slow? Well, you might use lookup tables, then. Ideally, though, you'd like to avoid inserting and extracting the channels again. One dirty trick you can use revolves around the fact that you can distribute the multiplication over the additive nature of bits, thus allowing the lookup tables to be indexed off the raw bytes instead of the channels:

    unsigned blend565tab[33][2][256];

    void init() {
        for(unsigned alpha=0; alpha<=32; ++alpha) {
            unsigned f = alpha;

            for(unsigned i=0; i<256; ++i) {
                // High byte: red plus the upper three bits of green. Only half of
                // the rounding constant goes in, since this table is summed twice.
                blend565tab[alpha][1][i] = (((i & 0xf8)*f) << 19) + (((i & 0x07)*f) << 3) + (0x04008010 >> 1);
                // Low byte: lower three bits of green plus blue.
                blend565tab[alpha][0][i] = (((i & 0xe0)*f) >> 5) + (((i & 0x1f)*f) << 11);
            }
        }
    }

    unsigned blend565(unsigned dst, unsigned src, unsigned alpha) {
        unsigned ialpha = 32-alpha;
        unsigned sum = blend565tab[alpha][0][src & 0xff] + blend565tab[ialpha][0][dst & 0xff]
                     + blend565tab[alpha][1][src >> 8] + blend565tab[ialpha][1][dst >> 8];

        // Red and blue end up in the high word, green in the low word.
        sum &= 0xf81f07e0;

        return (sum & 0xffff) + (sum >> 16);
    }

It may look odd because we're actually splitting the green bitfield between the two lookup tables, but it works -- essentially, it's combining partial products from the lower and upper halves of the green bitfield. I've also thrown the rounding constant into the tables to save an addition. The table's rather big at 67K, but if you are doing alpha blending off of a constant, you can cache pointers to the two pertinent rows and then only 4K of tables are used, which is much nicer on the cache. The shifting/masking in the table lookups is also unnecessary if you load the source pixels as pairs of bytes instead of as words.

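For the constant-alpha case, one possible arrangement is to build just the two rows you need and run a whole span through them. The sketch below reuses the table layout from the code above; the struct and function names are hypothetical, and it builds the rows on demand rather than caching pointers into the full table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two table rows for one constant alpha: 2 x 2 x 256 x 4 bytes = 4K. */
typedef struct {
    uint32_t src_row[2][256];  /* weighted by alpha      */
    uint32_t dst_row[2][256];  /* weighted by 32 - alpha */
} Blend565Const;

static void fill_row(uint32_t row[2][256], uint32_t f) {
    for (uint32_t i = 0; i < 256; ++i) {
        /* High byte: red, upper bits of green, half the rounding constant. */
        row[1][i] = (((i & 0xf8) * f) << 19) + (((i & 0x07) * f) << 3)
                  + (0x04008010u >> 1);
        /* Low byte: lower bits of green, and blue. */
        row[0][i] = (((i & 0xe0) * f) >> 5) + (((i & 0x1f) * f) << 11);
    }
}

static void blend565_const_init(Blend565Const *t, uint32_t alpha) {
    assert(alpha <= 32);
    fill_row(t->src_row, alpha);
    fill_row(t->dst_row, 32 - alpha);
}

/* Blend n source pixels over n destination pixels with the cached rows. */
static void blend565_const_span(const Blend565Const *t, uint16_t *dst,
                                const uint16_t *src, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        uint32_t s = src[i], d = dst[i];
        uint32_t sum = t->src_row[0][s & 0xff] + t->dst_row[0][d & 0xff]
                     + t->src_row[1][s >> 8]   + t->dst_row[1][d >> 8];
        sum &= 0xf81f07e0;
        dst[i] = (uint16_t)((sum & 0xffff) + (sum >> 16));
    }
}
```

The inner loop is four table loads, three adds, a mask, and a fold, with no multiply anywhere.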
Incidentally, if you think about it, this trick can also be used to convert /any/ bitfield-based 16-bit packed pixel format to /any/ other bitfield-based pixel format up to 32 bits with a single routine, just by changing 2K of tables. This generally isn't worthwhile if you have a SIMD multiplier -- Intel's MMX application notes describe how you can abuse MMX's pmaddwd instruction to convert 8888 to 565 at about 2.1 clocks/pixel -- but it can be handy if you find yourself without a hardware multiplier or even a barrel shifter.

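As one illustration of the conversion trick, here is a 565 to 8888 converter driven by 2K of tables. The choice of output format, the bit-replication expansion of each channel, and all of the names are assumptions made for this sketch; swapping in a different pair of formats should only require filling the tables differently.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t conv_tab[2][256];  /* 2 x 256 x 4 bytes = 2K */
static int conv_ready = 0;

static void conv565to8888_init(void) {
    for (uint32_t i = 0; i < 256; ++i) {
        /* Low byte: blue in bits 0-4, lower three bits of green in bits 5-7. */
        uint32_t b  = i & 0x1f;
        uint32_t gl = i >> 5;
        conv_tab[0][i] = ((b << 3) | (b >> 2)) + (gl << 10);
        /* High byte: upper three bits of green in bits 0-2, red in bits 3-7. */
        uint32_t gh = i & 0x07;
        uint32_t r  = i >> 3;
        conv_tab[1][i] = (((r << 3) | (r >> 2)) << 16)
                       + (((gh << 5) | (gh >> 1)) << 8);
    }
    conv_ready = 1;
}

/* Convert one 565 pixel to 8888 with two lookups and one add. */
static uint32_t conv565to8888(uint16_t p) {
    assert(conv_ready);
    return conv_tab[0][p & 0xff] + conv_tab[1][p >> 8];
}
```

The split green field works here for the same reason it does in the blend tables: the contributions from the two bytes land in different bits, so adding them never carries.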
Jul 08, 2006 at 00:10

Comments

I'm a bit confused, can you show how blending is done between an ARGB (foreground) and RGB32 (background, no alpha data, same bit positions for RGB values)?

*Blight* - 08 07 06 - 16:44

    unsigned blend2(unsigned src, unsigned dst) {
        unsigned alpha = src >> 24;
        alpha += (alpha > 0);

        unsigned srb = src & 0xff00ff;
        unsigned sg = src & 0x00ff00;
        unsigned drb = dst & 0xff00ff;
        unsigned dg = dst & 0x00ff00;

        unsigned orb = (drb + (((srb - drb) * alpha + 0x800080) >> 8)) & 0xff00ff;
        unsigned og = (dg + (((sg - dg ) * alpha + 0x008000) >> 8)) & 0x00ff00;

        return orb+og;
    }

*Phaeron* - 08 07 06 - 17:01

Another nice trick in this area is using premultiplied inverse alpha when you need to stack together a lot of images with alpha channel before blending them on top of the destination image/video.

*Haali* - 08 07 06 - 18:22

Thanks Phaeron, quite informative.

Also, you can skip the blend if alpha = 0 or alpha = 255 (just copy the source/destination at 100%; that actually speeds things up considerably).

*Blight* - 09 07 06 - 07:08
