Friday, March 6, 2009

float cast underflow slow

Today I noticed that some code that just blends images was running really slowly, depending on some of the coefficients used in the blending. I checked the coefficients for strange values (NaNs/infs) and found none, yet the code was still slow. The problem only appeared in one blending function (integer code that used float coefficients cast from doubles); it didn't appear when strictly using floating-point arithmetic. It turns out the cause is slow casting from double to float when the value underflows.
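One possible workaround, sketched here as an assumption rather than something measured for this post, is to replace coefficients that are too small for a normal float with zero before the cast, so the conversion never has to underflow. The helper name to_float_flushed is hypothetical, and the threshold (the smallest normal float) is just one reasonable choice.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not from the original code): narrow a double to float,
// but return 0 for magnitudes below the smallest normal float (about 1.18e-38),
// so the cast itself never sees a value that would underflow.
inline float to_float_flushed(double x) {
    const double tiny = std::numeric_limits<float>::min();  // smallest normal float
    if (std::fabs(x) < tiny) {
        return 0.0f;  // would be denormal or zero as a float anyway
    }
    return static_cast<float>(x);  // in range (or inf/NaN): ordinary cast
}
```

For blending coefficients, a value like 1e-50 contributes nothing visible, so treating it as zero should be harmless. Whether this actually removes the slowdown depends on the compiler and CPU, and it is untested against the timings in this post.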

Some timing results for casting from double to double (d), double to float (f), and double to integer (i), from the test program included below:

Without optimization
time for d (init=1e-30) 0.013476
time for f (init=1e-30) 0.012164
time for i (init=1e-30) 0.016217
time for d (init=1e-50) 0.019088
time for f (init=1e-50) 0.229608
time for i (init=1e-50) 0.019010

With optimization (-O3)
time for d (init=1e-30) 0.005358
time for f (init=1e-30) 0.004038
time for i (init=1e-30) 0.003713
time for d (init=1e-50) 0.004566
time for f (init=1e-50) 0.135538
time for i (init=1e-50) 0.003527

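You can see the double-to-float cast is about 19x slower without optimization, and about 34x slower with -O3, for the value 1e-50 (where there is underflow) than for 1e-30 (where there is no underflow). The smallest positive float is about 1.4e-45 (a denormal; the smallest normal float is about 1.2e-38), so 1e-50 cannot be represented and rounds to zero, while 1e-30 is comfortably in range.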
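To check which side of the underflow boundary a given coefficient falls on, a small classifier like the following can help. It is a minimal sketch assuming IEEE 754 floats and the default rounding mode; the names FloatCastKind and classify_float_cast are made up for illustration.

```cpp
#include <cmath>

// Hypothetical helper (not part of the test program): reports what happens
// to a double when it is narrowed to float.
enum FloatCastKind { kZero, kNormal, kSubnormal, kUnderflowToZero, kOther };

FloatCastKind classify_float_cast(double x) {
    if (x == 0.0) return kZero;               // already zero, nothing is lost
    float f = static_cast<float>(x);          // the cast under study
    if (f == 0.0f) return kUnderflowToZero;   // nonzero double became zero
    int c = std::fpclassify(f);
    if (c == FP_SUBNORMAL) return kSubnormal; // representable only as a denormal
    if (c == FP_NORMAL) return kNormal;       // ordinary in-range value
    return kOther;                            // inf or NaN
}
```

With these definitions, 1e-30 is classified as kNormal, 1e-40 as kSubnormal, and 1e-50 as kUnderflowToZero, which matches the two init values used in the timings above.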

Try the code for yourself.


#!/bin/sh

cat <<EOF > _f_.cc
#include <stdio.h>     // printf
#include <sys/time.h>  // gettimeofday, timersub
#include <typeinfo>    // typeid

// Time 20 passes of casting m doubles (all equal to init) to type T.
template <class T>
void test(int m, double init){
  double * v = new double[m];
  T * f = new T[m];
  struct timeval tv;
  gettimeofday(&tv, 0);

  for(int its=0; its< 20; its++){
    v[0] = init;
    double * vptr = v;
    for(int i=0; i< m; i++){
      *(vptr++) = v[0];   // fill v with init
      f[i] = ((T)v[i]);   // the cast being timed
    }
  }
  struct timeval tva, el;
  gettimeofday(&tva, 0);
  timersub(&tva, &tv, &el);
  // Under g++, typeid(T).name() prints d, f, or i for double, float, int.
  printf("time for %s (init=%g) %lf\n", typeid(T).name(), init, double(el.tv_sec) + double(el.tv_usec)/1e6);
  delete [] f;
  delete [] v;
}

int main(int ac, char * av[]){
  int m = 100000;

  test<double>(m, 1e-30);
  test<float>(m, 1e-30);
  test<int>(m, 1e-30);

  test<double>(m, 1e-50);
  test<float>(m, 1e-50);
  test<int>(m, 1e-50);
  return 0;
}
EOF

echo "Without optimization"
g++ _f_.cc -lm -o _f_
./_f_

echo "With optimization (-O)"
g++ _f_.cc -lm -o _f_ -O
./_f_

echo "With optimization (-O3)"
g++ _f_.cc -lm -o _f_ -O3
./_f_

echo "With optimization (-O3 -mmmx -msse)"
g++ _f_.cc -lm -o _f_ -O3 -mmmx -msse
./_f_

rm _f_.cc _f_
