SEP
5
2008

Better algorithm for QPainter::fillRect() with raster based painting

In my last blog I found out that Qt is being evil when using QPainter::eraseRect() with a QImage based textured brush. How evil? Well, calling QPainter::fillRect() with the same brush results in something like a 30-50% speedup while achieving the exact same results. Not only that, but the QPainter::eraseRect() codepath makes QImage not thread safe for painting outside the main thread because it is silently using QPixmap behind the scenes. However, this isn't the whole story. I was surprised that even with all this fixed the algorithm is still not optimal.

In fact, you can see a very substantial increase in performance using a fractal based filling and tiling algorithm. Here are the results for a 10000x10000 QImage for tiling operations:


Algorithm Brush msecs
----------------------------------------
naive checkered brush 1186
eraseRect checkered brush 4445
fillRect checkered brush 704
fractalFill checkered brush 284



As you can see, the QPainter::eraseRect() is evil. Just changing to QPainter::fillRect() provides a very big decrease in the amount of time to paint a pattern. However, the fractalFill algorithm is still much faster. It cuts that time by more than half. So what is this algorithm?


void ...
{
QImage image(WIDTH, HEIGHT, FORMAT);

/* create our pattern */
QImage base(20, 20, FORMAT);
QPainter p1(&base);
p1.setCompositionMode(QPainter::CompositionMode_Source);
p1.fillRect(0, 0, 10, 10, Qt::gray);
p1.fillRect(10, 0, 10, 10, Qt::white);
p1.fillRect(0, 10, 10, 10, Qt::white);
p1.fillRect(10, 10, 10, 10, Qt::gray);

QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);

/* draw the pattern once at 0,0 */
p.drawImage(0, 0, base);

const int imageW = image.width();
const int imageH = image.height();
int w = base.width();
int h = base.height();
while (w < imageW || h < imageH) {
if (w < imageW) {
/* Copy and draw the existing pattern to the right */
p.drawImage(QRect(w, 0, w, h), image, QRect(0, 0, w, h));
/* Update width of our pattern */
w *= 2;
}
if (h < imageH) {
/* Copy and draw the existing pattern to the bottom */
p.drawImage(QRect(0, h, w, h), image, QRect(0, 0, w, h));
/* Update height of our pattern */
h *= 2;
}
}
}


This would be even faster if Qt were smarter about the overloads of QPainter::drawImage(). The real inline drawImage overload that does the heavy lifting takes a QRect. This forces the other overloads to create and destroy a QRect which is minimal overhead, but in a tight loop this cost starts starts to grow.

Implementing this algorithm and seeing the improvement led icefox and me to investigate whether we could use the same idea to speedup the general case of QPainter::fillRect() with a solid brush. We didn't have much hope it could be sped up as this is a fundamental painting routine in Qt, but surprise this algorithm does give a nice speedup :)


Algorithm Brush msecs
----------------------------------------
fillRect solid brush took 508
fractalFill solid brush took 293


This is a large performance increase in a fundamental painting routine. This same idea could be used inside of QPainter::eraseRect(), QPainter::fillRect(), and hopefully a future QPainter::drawTiledImage(). Hopefully the Trolls can investigate and see if something like this should go into Qt 4.5.

Here are the two methods side by side:

QImage fillRectSolid()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(QRect(0, 0, WIDTH, HEIGHT), Qt::red);
return image;
}

QImage fractalFillSolid()
{
QImage image(WIDTH, HEIGHT, FORMAT);

QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(0, 0, 1, 1, Qt::red);

const int imageW = image.width();
const int imageH = image.height();
int w = 1;
int h = 1;
while (w < imageW || h < imageH) {
if (w < imageW) {
p.drawImage(QRect(w, 0, w, h), image, QRect(0, 0, w, h));
w *= 2;
}
if (h < imageH) {
p.drawImage(QRect(0, h, w, h), image, QRect(0, 0, w, h));
h *= 2;
}
}
return image;
}



A few notes: this speedup only occurs with non-alpha channel QImage::Format's or at least pre-multiplied. With the inline overloads rearranged again it'd be even faster. I hope this is useful. Read on for the source to all of these tests...

Here you go. Happy playing!


#include

#define WIDTH 10000
#define HEIGHT 10000
#define FORMAT QImage::Format_RGB16

QImage naiveCheckered()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
int w = image.width();
int h = image.height();
QBrush brush(Qt::gray);
for (int j = 0; j < h; j+= 10)
for (int i = (j % 20 == 0) ? 0 : 10; i < w;i += 20)
p.fillRect(i, j, 10, 10, brush);

return image;
}

QImage eraseRectCheckered()
{
QImage image(WIDTH, HEIGHT, FORMAT);

QImage base(20, 20, QImage::Format_RGB16);
QPainter p1(&base);
p1.fillRect(0, 0, 10, 10, Qt::gray);
p1.fillRect(10, 0, 10, 10, Qt::white);
p1.fillRect(0, 10, 10, 10, Qt::white);
p1.fillRect(10, 10, 10, 10, Qt::gray);

QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.setBackground(QBrush(base));
p.eraseRect(QRect(0, 0, image.width(), image.height()));

return image;
}

QImage fillRectCheckered()
{
QImage image(WIDTH, HEIGHT, FORMAT);

QImage base(20, 20, QImage::Format_RGB16);
QPainter p1(&base);
p1.fillRect(0, 0, 10, 10, Qt::gray);
p1.fillRect(10, 0, 10, 10, Qt::white);
p1.fillRect(0, 10, 10, 10, Qt::white);
p1.fillRect(10, 10, 10, 10, Qt::gray);

QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(QRect(0, 0, image.width(), image.height()), QBrush(base));

return image;
}

QImage fractalFillCheckered()
{
QImage image(WIDTH, HEIGHT, FORMAT);

QImage base(20, 20, FORMAT);
QPainter p1(&base);
p1.setCompositionMode(QPainter::CompositionMode_Source);
p1.fillRect(0, 0, 10, 10, Qt::gray);
p1.fillRect(10, 0, 10, 10, Qt::white);
p1.fillRect(0, 10, 10, 10, Qt::white);
p1.fillRect(10, 10, 10, 10, Qt::gray);

QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.drawImage(0, 0, base);

const int imageW = image.width();
const int imageH = image.height();
int w = base.width();
int h = base.height();
while (w < imageW || h < imageH) {
if (w < imageW) {
p.drawImage(QRect(w, 0, w, h), image, QRect(0, 0, w, h));
w *= 2;
}
if (h < imageH) {
p.drawImage(QRect(0, h, w, h), image, QRect(0, 0, w, h));
h *= 2;
}
}
return image;
}

QImage fillRectSolid()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(QRect(0, 0, WIDTH, HEIGHT), Qt::red);
return image;
}

QImage fractalFillSolid()
{
QImage image(WIDTH, HEIGHT, FORMAT);

QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(0, 0, 1, 1, Qt::red);

const int imageW = image.width();
const int imageH = image.height();
int w = 1;
int h = 1;
while (w < imageW || h < imageH) {
if (w < imageW) {
p.drawImage(QRect(w, 0, w, h), image, QRect(0, 0, w, h));
w *= 2;
}
if (h < imageH) {
p.drawImage(QRect(0, h, w, h), image, QRect(0, 0, w, h));
h *= 2;
}
}
return image;
}

void makeTab(QTabWidget *w, const QImage &image, const QString &text)
{
QLabel *label = new QLabel;
label->setPixmap(QPixmap::fromImage(image));
w->addTab(label, text);
}

int main (int argc, char **argv)
{
QApplication application (argc, argv);

qDebug() << "starting tests...";

QTime time;
time.start();

QImage naiveCheckImage = naiveCheckered();
qDebug() << "naive \t\tcheckered brush took\t" << time.restart();

QImage eraseRectCheckImage = eraseRectCheckered();
qDebug() << "eraseRect \tcheckered brush took\t" << time.restart();

QImage fillRectCheckImage = fillRectCheckered();
qDebug() << "fillRect \tcheckered brush took\t" << time.restart();

QImage fractalFillCheckImage = fractalFillCheckered();
qDebug() << "fractalFill \tcheckered brush took\t" << time.restart();

QImage fillRectSolidImage = fillRectSolid();
qDebug() << "fillRect \tsolid brush took\t" << time.restart();

QImage fractalFillSolidImage = fractalFillSolid();
qDebug() << "fractalFill \tsolid brush took\t" << time.restart();

#if 0
QTabWidget w;

makeTab(&w, naiveCheckImage, QLatin1String("naive checkered"));
makeTab(&w, eraseRectCheckImage, QLatin1String("eraseRect checkered"));
makeTab(&w, fillRectCheckImage, QLatin1String("fillRect checkered"));
makeTab(&w, fractalFillCheckImage, QLatin1String("fractalFill checkered"));
makeTab(&w, fillRectSolidImage, QLatin1String("fillRect solid"));
makeTab(&w, fractalFillSolidImage, QLatin1String("fractalFill solid"));

w.show();
return application.exec();
#else
return 0;
#endif
}

Comments

Maybe you should file a bug about this in Qt's bugtracker. I guess they don't accept patches - maybe something Nokia should start to think about.


By superstoned at Fri, 09/05/2008 - 18:03

I don't understand.

eraseRect should translate into an XRender solid draw of rop type clear. This should be accelerated on almost any card.

Most hardware acceleration drivers won't accelerate if the pixmap is larger than 4096x4096 (or possibly 2048x2048, depending on driver), which would make the 10000x10000 QImage fail acceleration in ways that no real pixmap would.


By John Tapsell at Fri, 09/05/2008 - 18:05

This is for raster based painting as I made clear in the title. The eraseRect should not convert a QImage based texture to a QImage as this is *wrong*. The X server does not even enter into this discussion.


By Adam Treat at Fri, 09/05/2008 - 18:39

Did you file a bug at Qt's bugtracker? I suppose it isn't possible to add a patch over there. Maybe something Nokia should think about adding, after all, opening up is good, right?


By superstoned at Fri, 09/05/2008 - 18:22

When playing with KDE4 I noticed a massive CPU usage when simply hovering over listviews (not sure what exact class that is), for instance the plugin lists in KRunner config or KWin effects config, also the process list in the process manager (accessible via KRunner). Simply hovering over the lists elements makes my CPU (Pentium M 1,4GHz) go up to 80-90% (half Xorg and half application). Could this be the same cause?


By psychotron at Fri, 09/05/2008 - 19:44

What???


By Adam Treat at Fri, 09/05/2008 - 19:57

:) You talked about inefficient painting in Qt, if I understood that right? And I experienced heavy CPU usage in the cases described in my comment, which seem to be related to the painting of the highlighed elements. (If you hover over the list, one item after the other is painted highlighted) So I wondered, whether your findings are related to that... but maybe I'm on the wrong track here :)


By psychotron at Fri, 09/05/2008 - 21:33

Cool that you are providing these benchmarks, we're always looking for points of improvement in the Qt graphics system.

For 4.5 we are investigating several performance areas and this case is already covered. At least on my local build of Qt 4.5 optimization branch, the fractal filling runs slower than a plain fillRect() or drawTiledPixmap when done on a QImage.

Also, I noticed that you are using RGB16 as the format. Which format you use usually have a lot to say. For instance the ARGB32_Premultiplied format is highly optimized and runs fillRect() roughly twice as fast, because of MMX and SSE, at least on x86.


By gunnar.sletta at Mon, 09/08/2008 - 13:42

Yes, I ran it on ARGB32_Premultiplied and the fractal fill is still faster. I'm glad to hear that your local branch of Qt 4.5 optimization is even faster still. That's awesome! Do you have any idea when this will hit the snapshots?

Also, are you planning on introducing drawTiledImage? As no matter if it is faster, it is just plain wrong to convert a QImage to a QPixmap for tiling purposes especially when the canvas device you are drawing to is QImage.


By Adam Treat at Mon, 09/08/2008 - 14:27

Maybe of little interest now, since it turns out that Qt 4.5 will have a faster implementation yet, but I ran the benchmark and thought you might be interested in seeing the numbers. Of course then I had to wait for my account to clear before I could post a comment :p

The numbers are at:
http://pastebin.com/m40bd6a5a (MacOS X, Macbook Pro Core 2 Duo)


By giddie at Mon, 09/08/2008 - 15:45

Pages