virtualdub.org <http://www.virtualdub.org/>
Proof that I had too much free time in college

    Alpha blending without SIMD support

Now that we've covered averaging bitfields, how do we efficiently alpha
blend with a factor other than one-half?

Alpha blending is normally done using an operation known as /linear
interpolation/, or /lerp/:

lerp(a, b, f) = a*(1-f) + b*f = a + (b-a)*f

...where a and b are the values to be blended, and f is a blend factor
from 0 to 1, where 0 gives a and 1 gives b. To blend a packed pixel, you
could just expand all channels of the source and destination pixels and
do the blend in floating point, but it really hurts to see this, since
the code turns out nasty on practically any platform. Unless you've got
a platform that gets pixels in and out of a floating-point vector really
easily, you should use integer math for fast alpha blending.

So how do we blend quickly without resorting to per-channel math?

First, if you are dealing with an alpha channel instead of a constant
alpha value, chances are that the alpha value ranges from 0 to 2^N-1,
which is not a convenient divisor. You could cheat and just divide by
2^N, but then either the fully transparent or the fully opaque case
doesn't come out exactly right, which is sloppy. Conditionally adding
one to the alpha value fixes this at the cost of introducing a tiny
amount of error. I used to add the high bit, thus mapping [128,255] to
[129,256] for 8-bit values; I'm told that shifting [1,255] to [2,256]
leads to better accuracy. Either way prevents the glaring error cases,
though.
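A minimal sketch of the two alpha adjustments for 8-bit alpha, plus a per-channel blend that divides by 256 with a shift; the function names are hypothetical. After either adjustment, alpha 0 returns the destination exactly and alpha 255 returns the source exactly.

```c
#include <assert.h>

/* Old approach: add the high bit, mapping [128,255] to [129,256]. */
static unsigned expand_alpha_high_bit(unsigned a) {
    return a + (a >> 7);
}

/* Alternative: add one to any nonzero value, mapping [1,255] to [2,256]. */
static unsigned expand_alpha_nonzero(unsigned a) {
    return a + (a > 0);
}

/* Blend one 8-bit channel with a factor f in [0,256]; the divide by 256 is a shift. */
static unsigned blend_channel8(unsigned dst, unsigned src, unsigned f) {
    assert(f <= 256);
    return (dst * (256 - f) + src * f + 128) >> 8;
}
```

Without an adjustment, blending 255 over 0 at alpha 255 gives blend_channel8(0, 255, 255) = (255*255 + 128) >> 8 = 254 instead of 255.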

The next step is to reformulate the blend equation in terms of integer math:

lerp(a, b, f) = a + (((b-a) * f + round) >> shift)

where f now runs from 0 to 1 << shift and round = 1 << (shift - 1).
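The integer formula transcribes directly into C. This is a sketch for a single unpacked channel; the name lerp_int is hypothetical, and it relies on >> of a negative int being an arithmetic (flooring) shift, which C leaves implementation-defined but mainstream compilers provide.

```c
#include <assert.h>

/* lerp(a, b, f) = a + (((b - a) * f + round) >> shift), f in [0, 1 << shift].
   Assumption: >> on a negative int floors (arithmetic shift). */
static int lerp_int(int a, int b, int f, int shift) {
    assert(shift >= 1 && f >= 0 && f <= (1 << shift));
    int round = 1 << (shift - 1);
    return a + (((b - a) * f + round) >> shift);
}
```

For instance, lerp_int(0, 255, 128, 8) computes (255*128 + 128) >> 8 = 128.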

To eliminate some of the pack/unpack work, realize that you can alpha
blend a channel in place as long as you isolate it and have enough
headroom above it in the machine word to accommodate the intermediate
result of the multiply. In other words, instead of extracting red =
(pixel >> 16)&0xff, blending that, and then shifting it back up, simply
blend (pixel & 0x00ff0000).
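A sketch of in-place blending for the red channel of a 32-bit pixel, assuming the 0x00ff0000 red position from the text and a blend factor f in [0, 256]; blend_red_in_place is a hypothetical name. The rounding constant is moved up to the channel as well, and the 8 bits above red give the multiply its headroom.

```c
#include <assert.h>
#include <stdint.h>

/* Blend the red channel of two 32-bit XRGB pixels without shifting it down.
   f is in [0, 256]. If rs < rd the unsigned subtraction wraps, but the
   wrapped bits end up at bit 24 and above and are removed by the final mask. */
static uint32_t blend_red_in_place(uint32_t dst, uint32_t src, uint32_t f) {
    assert(f <= 256);
    uint32_t rd = dst & 0x00ff0000u;
    uint32_t rs = src & 0x00ff0000u;
    /* 0x00800000 is the rounding constant (1 << 7) moved up to the red field. */
    return (rd + (((rs - rd) * f + 0x00800000u) >> 8)) & 0x00ff0000u;
}
```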

Now, the magic: you can actually do more than one bitfield this way as
long as you have enough space between them. If you have two
non-overlapping bitfields combined as (a << shift1) + (b << shift2),
multiplying their combined form by an integer gives the same result as
splitting them apart, multiplying each, and then recombining. For a 565
pixel, you could thus blend red and blue in the following manner
(remember that the red/green/blue masks for 565 are 0xf800, 0x07e0, and
0x001f, respectively):

rbsrc = src & 0xf81f
rbdst = dst & 0xf81f
rbout = ((rbsrc * f + rbdst * (32-f) + 0x8010) >> 5) & 0xf81f

This leads to the surprising result that you can safely subtract one
packed red/blue pair from the other and scale the difference, without
any fancy SIMD bitfield support:

rbout = (rbdst + (((rbsrc - rbdst) * f + 0x8010) >> 5)) & 0xf81f
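A minimal sketch of both red/blue formulas as C functions, assuming 16-bit RGB565 pixels held in 32-bit unsigned integers and f in [0, 32]; the function names are hypothetical. The two return identical results: a borrow from blue into red cancels once rbdst is added back, and the wraparound from a negative difference lands above bit 15, where the mask removes it. The 32-bit word matters, since rbsrc * f alone can exceed 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Red and blue of an RGB565 pixel, blended together.
   f is in [0, 32]; 0x8010 holds the rounding half for red (0x8000) and blue (0x10). */
static uint32_t blend565_rb_sum(uint32_t dst, uint32_t src, uint32_t f) {
    assert(f <= 32);
    uint32_t rbsrc = src & 0xf81f;
    uint32_t rbdst = dst & 0xf81f;
    return ((rbsrc * f + rbdst * (32 - f) + 0x8010) >> 5) & 0xf81f;
}

/* Same result with one multiply: scale the packed difference and add it back. */
static uint32_t blend565_rb_diff(uint32_t dst, uint32_t src, uint32_t f) {
    assert(f <= 32);
    uint32_t rbsrc = src & 0xf81f;
    uint32_t rbdst = dst & 0xf81f;
    return (rbdst + (((rbsrc - rbdst) * f + 0x8010) >> 5)) & 0xf81f;
}
```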

The remaining green channel is easy. Doing it this way does limit
precision in the blend factor, since you're limited to the number of
bits of headroom you have, but five bits for 565 is decent. If you also
have an alpha channel to blend, you can do so, although you might need
to temporarily shift down green and alpha together to make headroom at
the top of the machine word if you're dealing with a big pixel.
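Putting the pieces together, here is a sketch of a complete RGB565 blend with red and blue handled as a pair and green on its own; the function name and the use of uint16_t pixels are assumptions. Green gets its own rounding constant 0x0200, which is 16 (one half of 32) shifted up to green's position at bit 5.

```c
#include <assert.h>
#include <stdint.h>

/* Full RGB565 blend. f is in [0, 32], where 0 keeps dst and 32 gives src. */
static uint16_t blend565_arith(uint16_t dst, uint16_t src, uint32_t f) {
    assert(f <= 32);
    uint32_t rbs = src & 0xf81fu, rbd = dst & 0xf81fu;
    uint32_t gs = src & 0x07e0u, gd = dst & 0x07e0u;
    /* red and blue together; any unsigned wraparound is masked off */
    uint32_t rb = (rbd + (((rbs - rbd) * f + 0x8010u) >> 5)) & 0xf81fu;
    /* green alone, with its rounding half at bit 4 after the shift */
    uint32_t g = (gd + (((gs - gd) * f + 0x0200u) >> 5)) & 0x07e0u;
    return (uint16_t)(rb | g);
}
```

For example, blending white (0xffff) halfway over black (0x0000) with f = 16 gives 0x8410: red 16, green 32, blue 16.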

What if you didn't have a hardware multiply, or the one you have is very
slow? Well, you might use lookup tables, then. Ideally, though, you'd
like to avoid inserting and extracting the channels again. One dirty
trick you can use relies on the fact that multiplication distributes
over addition: a channel value is just the sum of its bits, so f times
a channel can be built from f times the bits in each byte, which allows
the lookup tables to be indexed off the raw bytes instead of the
channels:

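/* Assumes 32-bit unsigned and 16-bit RGB565 pixels; alpha is in [0,32].
   blend565tab[alpha][0][] is indexed by the low byte, [alpha][1][] by the high byte. */
unsigned blend565tab[33][2][256];

void init(void) {
    for(unsigned alpha=0; alpha<=32; ++alpha) {
        unsigned f = alpha;

        for(unsigned i=0; i<256; ++i) {
            /* high byte: red*f -> bits 22-31, top 3 green bits*f -> bits 0-10,
               plus half the rounding constant (both high-byte lookups add it) */
            blend565tab[alpha][1][i] = (((i & 0xf8)*f) << 19) + (((i & 0x07)*f) <<  3) + (0x04008010 >> 1);
            /* low byte: bottom 3 green bits*f -> bits 0-10, blue*f -> bits 11-20 */
            blend565tab[alpha][0][i] = (((i & 0xe0)*f) >>  5) + (((i & 0x1f)*f) << 11);
        }
    }
}

unsigned blend565(unsigned dst, unsigned src, unsigned alpha) {
    unsigned ialpha = 32-alpha;
    unsigned sum = blend565tab[alpha][0][src & 0xff] + blend565tab[ialpha][0][dst & 0xff]
                 + blend565tab[alpha][1][src >> 8]   + blend565tab[ialpha][1][dst >> 8];

    /* keep red (bits 27-31), blue (bits 16-20), and green (bits 5-10) */
    sum &= 0xf81f07e0;

    /* fold red/blue down from the top half next to green */
    return (sum & 0xffff) + (sum >> 16);
}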

It may look odd because we're actually splitting the green bitfield
between the two lookup tables, but it works -- essentially, it's
combining partial products from the lower and upper halves of the green
bitfield. I've also thrown the rounding constant into the tables to save
an addition. The table's rather big at 67K, but if you are alpha
blending with a constant factor, you can cache pointers to the two
pertinent rows (alpha and 32-alpha), and then only 4K of tables are
used, which is much nicer on the cache. The shifts and masks in the
table lookups are also unnecessary if you load the source pixels as
pairs of bytes instead of as words.
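A sketch of the constant-factor case: a hypothetical span function fetches the two pertinent rows once and then touches only those 4K of table inside the loop. It assumes RGB565 pixels in uint16_t arrays and rebuilds a table with the same layout as init() so the example stands alone; tab565, init_tab565, and blend565_span_const are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the table built by init(): [f][0] indexed by the low byte,
   [f][1] by the high byte, rounding folded into the high-byte rows. */
static uint32_t tab565[33][2][256];

static void init_tab565(void) {
    for (uint32_t f = 0; f <= 32; ++f) {
        for (uint32_t i = 0; i < 256; ++i) {
            tab565[f][1][i] = (((i & 0xf8) * f) << 19) + (((i & 0x07) * f) << 3) + (0x04008010u >> 1);
            tab565[f][0][i] = (((i & 0xe0) * f) >> 5) + (((i & 0x1f) * f) << 11);
        }
    }
}

/* Blend n pixels of src over dst with one alpha in [0, 32]. Only the rows for
   alpha and 32 - alpha are read inside the loop. */
static void blend565_span_const(uint16_t *dst, const uint16_t *src, size_t n, unsigned alpha) {
    assert(alpha <= 32);
    const uint32_t *s_lo = tab565[alpha][0], *s_hi = tab565[alpha][1];
    const uint32_t *d_lo = tab565[32 - alpha][0], *d_hi = tab565[32 - alpha][1];
    for (size_t k = 0; k < n; ++k) {
        uint32_t s = src[k], d = dst[k];
        uint32_t sum = s_lo[s & 0xff] + d_lo[d & 0xff] + s_hi[s >> 8] + d_hi[d >> 8];
        sum &= 0xf81f07e0u;                 /* keep red, blue, and green */
        dst[k] = (uint16_t)((sum & 0xffff) + (sum >> 16));
    }
}
```

Call init_tab565() once before the first span; after that each pixel costs four loads, three adds, a mask, and the final fold.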

Incidentally, if you think about it, this trick can also be used to
convert /any/ bitfield-based 16-bit packed pixel format to /any/ other
bitfield-based pixel format up to 32 bits with a single routine, just by
changing 2K of tables. This generally isn't worthwhile if you have a
SIMD multiplier -- Intel's MMX application notes describe how you can
abuse MMX's pmaddwd instruction to convert 8888 to 565 at about 2.1
clocks/pixel -- but it can be handy if you find yourself without a
hardware multiplier or even a barrel shifter.

Jul 08, 2006 at 00:10


      Comments

*Comments posted:*

I'm a bit confused, can you show how blending is done between an ARGB
(foreground) and RGB32 (background, no alpha data, same bit positions
for RGB values)?

*Blight* - 08 07 06 - 16:44


unsigned blend2(unsigned src, unsigned dst) {
    unsigned alpha = src >> 24;
    alpha += (alpha > 0);   /* map [1,255] to [2,256] */

    unsigned srb = src & 0xff00ff;
    unsigned sg  = src & 0x00ff00;
    unsigned drb = dst & 0xff00ff;
    unsigned dg  = dst & 0x00ff00;

    unsigned orb = (drb + (((srb - drb) * alpha + 0x800080) >> 8)) & 0xff00ff;
    unsigned og  = (dg  + (((sg  - dg ) * alpha + 0x008000) >> 8)) & 0x00ff00;

    return orb + og;
}

*Phaeron* - 08 07 06 - 17:01


Another nice trick in this area is using premultiplied inverse alpha
when you need to stack together a lot of images with alpha channel
before blending them on top of the destination image/video.

*Haali* - 08 07 06 - 18:22


Thanks Phaeron, quite informative.

Also, you can skip the blend when alpha = 0 or alpha = 255 (just keep the
destination or copy the source, respectively); that actually speeds
things up considerably.

*Blight* - 09 07 06 - 07:08

