Earlier I talked about the performance benefits of moving code from Ruby into small sections of C code. The good news is that this is not only fairly effective, but the interfacing is also fairly painless.

The tests I did centered on floating-point calculations (this is what I do in my research work, so that is what matters to me). Integer-only calculations are a totally different animal. The tests used the following totally arbitrary calculation:

```
# pure_ruby.rb
total = 0.0
1.0.step(2000.0, 0.0001) do |x|
  result = (5.4*x**5 - 3.211*x**4 + 100.3*x**2 - 100 +
            20*Math.sin(x) - Math.log(x)) * 20*Math.exp(-x/100.3)
  total += result / 0.0001
end
puts total
```

This is the type of calculation you would see in a numerical integration. Atomic potentials typically have these types of terms; I picked random coefficients.

This particular calculation runs in just under 100 seconds (about 1.6 minutes) on my iMac, which is slow. The slowest part of the program is the one hefty calculation; the only other major contributor is the looping itself. The first test was simply to move the formula to C, which gives

```
#c_calc_call.rb
require "mathc"
mc = MathC.new
total = 0.0
1.0.step(2000.0, 0.0001) do |x|
  result = mc.calc(x)
  total += result / 0.0001
end
puts total
```

and

```
/* mathc.c */
#include "ruby.h"
#include <stdio.h>
#include <math.h>

static VALUE t_init(VALUE self)
{
    return self;
}

/* Convert the Ruby number to a C double, evaluate the formula,
   and wrap the result in a new Ruby Float. */
static VALUE t_calc(VALUE self, VALUE x)
{
    double newx;
    double result;

    newx = NUM2DBL(x);
    result = (5.4 * pow(newx, 5) - 3.211 * pow(newx, 4) +
              100.3 * pow(newx, 2) - 100 + 20 * sin(newx) -
              log(newx)) * 20 * exp(-newx / 100.3);
    return rb_float_new(result);
}

VALUE cTest;

/* Called by Ruby when the extension is loaded via require "mathc". */
void Init_mathc()
{
    cTest = rb_define_class("MathC", rb_cObject);
    rb_define_method(cTest, "initialize", t_init, 0);
    rb_define_method(cTest, "calc", t_calc, 1);
}
```

The extconf.rb file is the same as in my earlier example, with some of the names changed.
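For readers who do not have the earlier post at hand, a minimal extconf.rb for an extension named `mathc` usually looks something like the sketch below. This is an assumption about what the earlier file contained, not a copy of it; the only thing the code here fixes is the name, which has to match `Init_mathc` and `require "mathc"`.

```ruby
# extconf.rb (assumed contents; the earlier post's file is not shown here)
require "mkmf"

# Generates a Makefile that builds the C sources in this directory
# into an extension that can be loaded with require "mathc".
create_makefile("mathc")
```

Running `ruby extconf.rb` followed by `make` in the same directory as the C source should then produce the loadable extension.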

This version runs in 25 seconds, **almost 4x faster**. (By the way, I compute “times faster” as slow time / fast time, that is, how many times the fast one would run before the slow one finished. I don’t know if there is a more accepted way to do this.)
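The slow time / fast time ratio is what is commonly called the speedup. As a small sketch, using the measured times from the results table at the end of this post (the method name `speedup` is just for illustration):

```ruby
# Speedup as defined in the text: slow time divided by fast time.
def speedup(slow_seconds, fast_seconds)
  slow_seconds / fast_seconds
end

pure_ruby = 98.483 # seconds, measured pure Ruby run
c_call    = 25.853 # seconds, measured run with the formula in C

puts format("%.1fx", speedup(pure_ruby, c_call))
```

This prints `3.8x`, which is where the "almost 4x" comes from.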

The final test was to move the looping code into C as well. This makes the Ruby part essentially a shell.

```
#c_loop_call.rb
require "mathc"
mc = MathC.new
total = 0.0
total = mc.bigcalc()
puts total
```

Now all of the work is in the C code:

```
/* mathc.c */
#include "ruby.h"
#include <stdio.h>
#include <math.h>

static VALUE t_init(VALUE self)
{
    return self;
}

/* Evaluate the formula for a single x passed in from Ruby. */
static VALUE t_calc(VALUE self, VALUE x)
{
    double newx;
    double result;

    newx = NUM2DBL(x);
    result = (5.4 * pow(newx, 5) - 3.211 * pow(newx, 4) +
              100.3 * pow(newx, 2) - 100 + 20 * sin(newx) -
              log(newx)) * 20 * exp(-newx / 100.3);
    return rb_float_new(result);
}

/* Run the whole loop in C and hand back only the final total. */
static VALUE t_bigcalc(VALUE self)
{
    double total = 0.0;
    double x;
    double result;

    for (x = 1.0; x <= 2000.0; x += 0.0001) {
        result = (5.4 * pow(x, 5) - 3.211 * pow(x, 4) +
                  100.3 * pow(x, 2) - 100 + 20 * sin(x) -
                  log(x)) * 20 * exp(-x / 100.3);
        total += result / 0.0001;
    }
    return rb_float_new(total);
}

VALUE cTest;

void Init_mathc()
{
    cTest = rb_define_class("MathC", rb_cObject);
    rb_define_method(cTest, "initialize", t_init, 0);
    rb_define_method(cTest, "calc", t_calc, 1);
    rb_define_method(cTest, "bigcalc", t_bigcalc, 0);
}
```

This version runs in **8 seconds**, 12x faster than the original code and another 3x faster than calling the C calculation from a Ruby loop.

Just for fun, I wrote the entire program in C to see how much overhead was inherent in the Ruby setup. I’m including this for completeness.

```
/* purec.c */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double total = 0.0;
    double x;
    double result;

    for (x = 1.0; x <= 2000.0; x += 0.0001) {
        result = (5.4 * pow(x, 5) - 3.211 * pow(x, 4) +
                  100.3 * pow(x, 2) - 100 + 20 * sin(x) -
                  log(x)) * 20 * exp(-x / 100.3);
        total += result / 0.0001;
    }
    printf("%e\n", total);
    return 0;
}
```

This bit ran in just over 7 seconds. Given that the initialization cost is something you pay once, there aren’t many situations where this degree of optimization would be necessary.

The final results

```
Version          Time (s)   Speedup
Pure Ruby          98.483     1.0x
C call only        25.853     3.8x
C call + loop       7.911    12.4x
Pure C              7.172    13.7x
```
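The post does not say how these times were taken. One way to collect such numbers from inside Ruby is the standard library's `Benchmark.realtime`, sketched below. The method names `calc` and `run_loop` are hypothetical, and the range is shortened so the sketch finishes quickly; the real runs use `1.0.step(2000.0, 0.0001)`.

```ruby
require "benchmark"

# The formula from pure_ruby.rb, wrapped in a method (hypothetical name).
def calc(x)
  (5.4*x**5 - 3.211*x**4 + 100.3*x**2 - 100 +
   20*Math.sin(x) - Math.log(x)) * 20*Math.exp(-x/100.3)
end

# Same loop as pure_ruby.rb, with the upper limit and step as parameters.
def run_loop(limit, step)
  total = 0.0
  1.0.step(limit, step) { |x| total += calc(x) / step }
  total
end

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
elapsed = Benchmark.realtime { run_loop(2.0, 0.0001) }
puts format("%.3f s", elapsed)
```

Swapping the block for `mc.calc(x)` or `mc.bigcalc` would time the other variants the same way.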

This example is simple enough that I would consider these *ideal* speedups. A real application is unlikely to see gains this large; how close it gets depends on how much of the run time is actually spent in the code you move. What these numbers are valuable for is a measure of what to expect. It should be relatively easy for any computationally intensive program to get to the 3-4x level. If you need something in the 10x range, you’re going to have to work much harder. If you need better than that, either your algorithm needs work, or this approach is just not going to work.
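The dependence on how much of the program is really the bottleneck can be made concrete with Amdahl's law, which is not something I used above but describes the same effect: if only a fraction of the run time is sped up, the rest limits the overall gain. A rough sketch, where the fractions are made-up illustrations and 12.4 is the "C call + loop" factor from the table:

```ruby
# Amdahl's law: overall speedup when `fraction` of the run time is
# accelerated by `factor` and the remainder is left unchanged.
def overall_speedup(fraction, factor)
  1.0 / ((1.0 - fraction) + fraction / factor)
end

# Hypothetical shares of run time spent in the code moved to C.
[100, 90, 50].each do |percent|
  gain = overall_speedup(percent / 100.0, 12.4)
  puts format("%3d%% of run time moved to C: %.1fx overall", percent, gain)
end
```

With everything in the hot loop you get the full 12.4x, but if only half of the run time is in the code you move, the overall gain stays under 2x no matter how fast the C version is.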