C Extensions in Ruby – Benchmarks

Earlier I talked about the performance benefits of moving code from Ruby into small sections of C code. The good news is that this is not only fairly effective, but the interfacing is also fairly painless.

The tests I did were centered on floating-point calculations (this is what I do in my research work, so that is what is important to me). Integer-only calculations are a totally different animal. My tests used the following totally arbitrary calculation:


# pure_ruby.rb
total = 0.0

1.0.step(2000.0, 0.0001) do |x|
  result = (5.4*x**5 - 3.211*x**4 + 100.3*x**2 - 100 +
            20*Math.sin(x) - Math.log(x)) * 20*Math.exp(-x/100.3)
  total += result / 0.0001
end

puts total

This is the type of calculation you would see in a numerical integration. Atomic potentials typically have these types of terms; I picked random coefficients.
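For comparison, here is a minimal sketch of what a conventional numerical integration loop looks like. Note that a Riemann sum multiplies each sample by the step width, whereas the benchmark divides by it; since the calculation is arbitrary, that makes no difference to the timing. The name `riemann_sum` and the x**2 test function are my own illustrations, not part of the benchmark.

```ruby
# Left Riemann sum of a function over [a, b] with step width dx.
# Each sample is multiplied by dx, the width of its slice.
# (riemann_sum is a hypothetical helper, not from the benchmark.)
def riemann_sum(a, b, dx)
  n = ((b - a) / dx).round          # number of slices
  total = 0.0
  n.times do |i|
    x = a + i * dx                  # left edge of slice i
    total += yield(x) * dx
  end
  total
end

# Sanity check against a known value: the integral of x**2
# over [0, 1] is exactly 1/3.
approx = riemann_sum(0.0, 1.0, 0.0001) { |x| x**2 }
puts approx
```

The result should land close to 0.3333, with the error shrinking as the step gets smaller.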

This particular calculation runs in just under 100 seconds (about 1.6 min) on my iMac, which is slow. The slowest part of the program is the one hefty calculation; the only other major contributor is the looping itself. The first test was to simply move the formula to C, which gives


# c_calc_call.rb
require "mathc"
mc = MathC.new
total = 0.0

1.0.step(2000.0, 0.0001) do |x|
  result = mc.calc(x)
  total += result / 0.0001
end

puts total

and


/* mathc.c */
#include "ruby.h"
#include <stdio.h>
#include <math.h>

static VALUE t_init(VALUE self)
{
    return self;
}

/* Evaluate the test formula for a single x. */
static VALUE t_calc(VALUE self, VALUE x)
{
    double newx;
    double result;

    newx = NUM2DBL(x);

    result = (5.4 * pow(newx, 5) - 3.211 * pow(newx, 4) +
              100.3 * pow(newx, 2) - 100 + 20 * sin(newx) -
              log(newx)) * 20 * exp(-newx / 100.3);

    return rb_float_new(result);
}

VALUE cTest;

/* Called by Ruby when the extension is loaded with require "mathc". */
void Init_mathc(void)
{
    cTest = rb_define_class("MathC", rb_cObject);
    rb_define_method(cTest, "initialize", t_init, 0);
    rb_define_method(cTest, "calc", t_calc, 1);
}

The extconf.rb file is the same as in my earlier example, with some of the names changed.

This version runs in 25 seconds, almost 4x faster. (By the way, I do “times faster” as slow time / fast time, or how many times would the fast one run before the slow one finished. I don’t know if there is a more accepted way to do this.)
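The ratio described here (slow time divided by fast time) is what is usually called speedup. As a small sketch, the helper below computes it for the measured times of the pure Ruby and C-call versions reported in this post; the method name `speedup` is just an illustrative choice.

```ruby
# "Times faster" as defined in the text: slow time divided by fast time.
# (speedup is a hypothetical helper name.)
def speedup(slow_seconds, fast_seconds)
  slow_seconds / fast_seconds
end

pure_ruby = 98.483  # seconds, measured time reported in this post
c_call    = 25.853  # seconds, measured time reported in this post

puts format("%.1fx", speedup(pure_ruby, c_call))  # prints 3.8x
```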

The final test was to also move the looping code into C. This makes the Ruby part essentially a shell.


# c_loop_call.rb
require "mathc"
mc = MathC.new

total = mc.bigcalc

puts total

Now all of the work is in the C code:


/* mathc.c */
#include "ruby.h"
#include <stdio.h>
#include <math.h>

static VALUE t_init(VALUE self)
{
    return self;
}

/* Evaluate the test formula for a single x. */
static VALUE t_calc(VALUE self, VALUE x)
{
    double newx;
    double result;

    newx = NUM2DBL(x);

    result = (5.4 * pow(newx, 5) - 3.211 * pow(newx, 4) +
              100.3 * pow(newx, 2) - 100 + 20 * sin(newx) -
              log(newx)) * 20 * exp(-newx / 100.3);

    return rb_float_new(result);
}

/* Run the whole loop in C and return the accumulated total. */
static VALUE t_bigcalc(VALUE self)
{
    double total = 0.0;
    double x;
    double result;

    for (x = 1.0; x <= 2000.0; x += 0.0001) {
        result = (5.4 * pow(x, 5) - 3.211 * pow(x, 4) +
                  100.3 * pow(x, 2) - 100 + 20 * sin(x) -
                  log(x)) * 20 * exp(-x / 100.3);
        total += result / 0.0001;
    }

    return rb_float_new(total);
}

VALUE cTest;

/* Called by Ruby when the extension is loaded with require "mathc". */
void Init_mathc(void)
{
    cTest = rb_define_class("MathC", rb_cObject);
    rb_define_method(cTest, "initialize", t_init, 0);
    rb_define_method(cTest, "calc", t_calc, 1);
    rb_define_method(cTest, "bigcalc", t_bigcalc, 0);
}

This bit runs in 8 seconds, 12x faster than the original code, and another 3x faster than just using the calculation as a function call.
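One caveat if you compare the printed totals between the Ruby and C versions: the C loop builds x by repeated addition (x += 0.0001), while Ruby's float step is documented to work out the number of iterations up front rather than accumulating. Repeated float addition drifts, so the two loops may not hit exactly the same x values, and could even differ by an iteration at the end. That should not matter for the timings, but it can explain small differences in the totals. A small sketch of the drift, using a step of 0.1 instead of the benchmark's 0.0001 so the effect is easy to see:

```ruby
# Accumulating a float step: rounding error builds up with every addition.
accumulated = 0.0
10.times { accumulated += 0.1 }
puts accumulated == 1.0   # false: accumulated is 0.9999999999999999

# Computing the value from an integer index avoids the drift.
indexed = 0.0 + 10 * 0.1
puts indexed == 1.0       # true
```

If matching totals matter, driving the C loop with an integer counter and computing x from it is one way to keep the two versions aligned.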

Just for fun, I wrote the entire program in C to see how much overhead was inherent in the Ruby setup. I’m including this for completeness.


/* purec.c */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double total = 0.0;
    double x;
    double result;

    for (x = 1.0; x <= 2000.0; x += 0.0001) {
        result = (5.4 * pow(x, 5) - 3.211 * pow(x, 4) +
                  100.3 * pow(x, 2) - 100 + 20 * sin(x) -
                  log(x)) * 20 * exp(-x / 100.3);
        total += result / 0.0001;
    }
    printf("%e\n", total);
    return 0;
}

This bit ran in just over 7 seconds. Given that the initialization cost is something you pay once, there aren’t many situations where this degree of optimization would be necessary.

The final results:


Version          Time          Speedup
Pure Ruby       98.483 s        1.0 x
C call only     25.853 s        3.8 x
C call + loop    7.911 s       12.4 x
Pure C           7.172 s       13.7 x
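The post does not say how these times were collected. One way to time a block from inside Ruby is the standard library's Benchmark module; the sketch below is an assumption about how one might do it, not the method used for the table. To keep it quick, it runs the same formula over the much shorter range 1.0 to 2.0, and the method name `workload` is made up for the example.

```ruby
require "benchmark"

# Hypothetical reduced workload: same formula as the benchmark,
# but only over x in [1, 2] so the sketch finishes quickly.
def workload
  total = 0.0
  1.0.step(2.0, 0.0001) do |x|
    result = (5.4 * x**5 - 3.211 * x**4 + 100.3 * x**2 - 100 +
              20 * Math.sin(x) - Math.log(x)) * 20 * Math.exp(-x / 100.3)
    total += result / 0.0001
  end
  total
end

total = nil
# Benchmark.realtime returns the elapsed wall-clock time in seconds.
elapsed = Benchmark.realtime { total = workload }
puts format("total = %e, elapsed = %.3f s", total, elapsed)
```

Wall-clock time includes whatever else the machine is doing, so repeating the run a few times and comparing gives a better picture than a single measurement.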
 

This example is simple enough that I would consider these ideal speedups. A real application is unlikely to see gains this large; how close it gets depends on how much of its runtime is really spent in the bottleneck code. What these numbers are valuable for is a measure of what to expect. It should be relatively easy for any computationally intensive program to get to the 3-4x level. If you need something in the 10x range, you’re going to have to work much harder. If you need better than that, either your algorithm needs work, or this approach is just not going to work.
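To put a rough number on "how much code is really the bottleneck", Amdahl's law gives the overall speedup when only a fraction of the runtime is accelerated. The sketch below plugs in the 12.4x figure from the results table; the fractions are hypothetical, chosen only to show the trend, and `overall_speedup` is an illustrative name.

```ruby
# Amdahl's law: overall speedup when `fraction` of the runtime is
# accelerated by `factor` and the rest is left unchanged.
# (overall_speedup is a hypothetical helper name.)
def overall_speedup(fraction, factor)
  1.0 / ((1.0 - fraction) + fraction / factor)
end

# Hypothetical fractions of runtime spent in the code moved to C,
# using the 12.4x figure from the results table for that part.
[0.5, 0.9, 1.0].each do |fraction|
  puts format("%3.0f%% of runtime in C -> %.1fx overall",
              fraction * 100, overall_speedup(fraction, 12.4))
end
```

With only half the runtime in the accelerated code, the overall gain stays under 2x no matter how fast the C part is, which is why a real application rarely matches the ideal numbers.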
