Ruby used for computation book

June 23, 2007

I ran across a neat book a little while ago that was designed as an introduction to numerical calculation in the context of gravitational science, starting with a 2 body system and going through many body. It has a good introduction to things like different order integrators, energy conservation, and stability. They decided to use Ruby for the book as a simpler introduction to programming than traditional languages. I think the intention is to include this in a classroom setting, so it will be interesting to see how that works out.

From my personal experience with this type of class (numerical programming, not gravitation) I really think it will help, because so much overhead of the class is dedicated to how to get the compiler to work and all of the wrangling with things that aren’t really important. There is a point where you need to introduce some C code for performance reasons (which is also discussed in one of the chapters), but the nice thing about Ruby is that you can do so much of the prototyping and “learning” in Ruby, then replace a very small amount of code and get tremendous performance gains. It would be very easy for the C part to be provided to a class once students have proven that they can implement the algorithm in Ruby.

Advertisements

Reading a file with Ruby

June 22, 2007

I intended to write this just after the file writing post, but got sidetracked with other ideas. The basic setup is like writing to a file.

f = File.new(filname)

There are two easy methods, and then one with more control.

The easy way

If all you want is the text of a small file, then simply use

text = f.read

which dumps the entire file into a string, with the \n characters embedded inside. This seems to be most useful for files that you are going to do something to, and then write back out. This is what I used for my template system. I didn’t want to have to worry about tags that happened to be split across lines.

text = f.readlines

is interesting in that it returns an array, where the elements of the array are the lines in the file. This is easiest to use when each line of the file is an individual ‘element’ (whatever that means for your situation.)

Also, you can read a file multiple times by calling #rewind, which resets you at the beginning of the file. The downside to both of these methods is that they read the entire file into memory. If the file is too large, you could exhaust the available memory, which would certainly cause bad things.

The hard way

The method with the most control is #gets. This returns the next line of the file, and also updates the #lineno attribute to hold the current line number. If you need (say) the fourth line of the file, you can do so with

3.times {f.gets}
text = f.gets

I have used this with some of the larger data files that our simulation programs produce. Since some of these data files can be very large, I wasn’t sure I wanted to read the whole thing into memory at once. The way I got around this was to read the file, and build a table of contents array with a new entry when I found a new timestep (which had distinctive text). Then I could rewind the file and read in only the timestep I was interested in. To get another timestep you simply repeat the process.

What about the \n?

There is a built in method to deal with the extra \n’s you will get with readlines or gets. The #chomp method will remove any number of newline characters (and is smart enough to deal with newline, carriage return, or both with one call.) Unless you need the newlines for some reason, all of you calls will likely be

text = f.gets.chomp

or

text = f.readlines.map {|line| line.chomp}

and thats all there is to it. This is fairly simple compared to some other languages I’ve used, and is one of the reasons I really like using Ruby.


Easy Installation of Ruby Classes

June 21, 2007

Just a quick tidbit, but I’ve got a copule of classes that I consistently require, and it gets a bit redundant to include particular directories. Its not something that I’m ready to make a gem out of, but I looked into the standard Ruby inlude path. I beleive the paths are fairly standard, but you can always check this using irb

irb(main):001:0> $:

will display the include path.

If you copy .rb files to any of these directories (probably requiring sudo) you can require them from anywhere. I know there are other ways to do this, but this is a very quick way to do it.


YAML for serialization

June 20, 2007

One of the projects I’m working on for research has to deal with atomic potentials, or a way to try to describe the forces between atoms when they are in a certain configuration. The more realistic ones are fairly complicated functions.

One of the tricks in dealing with potentials is to try to plot some specific configurations, and see how the energy varies. ‘Wells’ represent stable points where atoms will tend to sit, while ‘humps’ are barriers between stable points that an atom must overcome somehow. The height of the hump gives you an idea of how stable the low points are.

Plotting one of these potentials is not super difficult, just looping over some positions and (in my case) outputting the positions and energies to a file, which is then plotted with gnuplot. One of the things I wanted to do, however, is to plot a bunch of variations with one of the parameters swept over a range of values. One way would be to create the potential as an object and modify the parameters that way. What I wanted to do, is have parameter file that I read in to setup the potential.

Enter YAML

It turns out that this is essentially built into ruby, through the YAML methods that are designed for marshaling objects over various communication methods between computers/processes. All you have to do is

require 'yaml' # you may not need this line
thing.to_yaml

and you get a string representation of the object. You can then save it to a file. After this, restoring the object is as simple as

YAML.load(string)

or

YAML.load_file(filename)

Even complicated things

One of the neatest things about this is that nested objects translate seamlessly with this. For example, I have to following files that I work with.

#atom.rb
class Atom
  attr_accessor :x, :y, :z

  def initialize
    @x = @y = @z = 0.0
  end

  def r_ij(atom_two)
    r_squared = (@x-atom_two.x)**2 + (@y-atom_two.y)**2 + (@z-atom_two.z)**2
    return Math.sqrt(r_squared)
  end
end

and

#molecule.rb
class Molecule
  attr_accessor :atoms
  def initialize
    @atoms = []
  end
end

which lets you do this

imac:~/code/ruby-yaml brian$ irb
irb(main):001:0> require 'atom'
=> true
irb(main):002:0> require 'molecule'
=> true
irb(main):003:0> require 'yaml'
=> true
irb(main):004:0> a = Atom.new
=> #<Atom:0x59e304 @x=0.0, @z=0.0, @y=0.0>
irb(main):005:0> a.x,a.y,a.z = [1.0,2.0,3.0]
=> [1.0, 2.0, 3.0]
irb(main):006:0> b = Atom.new
=> #<Atom:0x598774 @x=0.0, @z=0.0, @y=0.0>
irb(main):007:0> b.x,b.y,b.z = [4.0,5.0,6.0]
=> [4.0, 5.0, 6.0]
irb(main):008:0> m = Molecule.new
=> #<Molecule:0x592b08 @atoms=[]>
irb(main):009:0> m.atoms.push a
=> [#<Atom:0x59e304 @x=1.0, @z=3.0, @y=2.0>]
irb(main):010:0> m.atoms.push b
=> [#<Atom:0x59e304 @x=1.0, @z=3.0, @y=2.0>, 
    #<Atom:0x598774 @x=4.0, @z=6.0, @y=5.0>]
irb(main):011:0> print save = m.to_yaml
--- !ruby/object:Molecule 
atoms: 
- !ruby/object:Atom 
  x: 1.0
  y: 2.0
  z: 3.0
- !ruby/object:Atom 
  x: 4.0
  y: 5.0
  z: 6.0
=> nil
irb(main):013:0> new = YAML.load(save)
=> #<Molecule:0x586cf4 
           @atoms=[#<Atom:0x58726c @x=1.0, @z=3.0, @y=2.0>, 
                   #<Atom:0x586f38 @x=4.0, @z=6.0, @y=5.0>]>
irb(main):014:0>

YAML is plain text

From my perspective, the real advantage of doing this is having the output in a plain text file. I can then edit the text in an editor and tweak things this way. I can also markup the text and run it through a template engine and generate a whole bunch of files. This was part of the magic behind my rake example, where I generated 800 or so configurations in an hour or so, which was very cool.

Edit: tweaked the formatting in the irb section to see the code better


Rake tricks

June 19, 2007

Recently, I had some plots I was trying to generate. The neat part was that everything broke down into little peices. I looked at using Rake to put everything together. The nice thing about Rake is being able to set up the rules for generating each piece, and letting the computer sort out what needs to be done.

Here are the steps I needed

  1. Generate .yaml files (I was doing a parameter sweep of a particular function)
  2. Run a script that takes the .yaml file and outputs x,y,z triples as a .dat file
  3. Generate a script for gnuplot to plot the .dat files as .png
  4. Run gnuplot with the script

I scoured a bunch of online resources about Rake. The rule and file list tricks seem to not be covered as much (or I’m not very bright) and they took me a long time to figure out. On the off chance that someone else isn’t clear about them, here is the Rakefile I used (with the calculation parts removed.)

SRC = FileList['*.yaml']
DAT = SRC.ext('dat')
GP = SRC.ext('gp')
PNG = SRC.ext('png')

task :default => [:png]

task :png => [:dat, :gp]

task :dat => DAT
task :gp => GP
task :png => PNG

rule '.png' => ['.gp', '.dat'] do |f|
  puts "png test #{f.name} #{f.source}"
  sh "touch #{f.name}"
end

rule '.dat' => ['.yaml'] do |f|
  puts "dat test #{f.name} #{f.source}"
  sh "touch #{f.name}"
end

rule '.gp' => ['.yaml'] do |f|
  puts "gp test #{f.name} #{f.source}"
  sh "touch #{f.name}"
end

So the SRC variable becomes a list of all the .yaml files. Changing the extension to png, and making a task of :png => PNG, makes a set of file tasks for the (non-existant) .png files. The rules then give the details of how to generate the .png from the required files.

The thing that was most interesting about this was the ability to define the rules to generate files, without having to specify all the steps for each file, which would be pages and pages (I ended up with over 800 .yaml files). All I had to do was generate the .yaml files, and then run rake.

Refrences

Rake Docs

Rake Tutorial from Jim Weirich (creator of Rake)Update: Fixed link

Rake Documentation

Automating tasks with Rake (registration required)


Templator – Ruby Template System

June 18, 2007

Most simulation software is based on a text input file. Mostly I’m doing molecular dynamics simulation using LAMMPS. One input file from the samples is the following

# 3d Lennard-Jones melt

units       lj
atom_style  atomic

lattice     fcc 0.8442
region      box block 0 20 0 20 0 20
create_box  1 box
create_atoms    1
mass        1 1.0

velocity    all create 3.0 87287

pair_style  lj/cut 2.5
pair_coeff  1 1 1.0 1.0 2.5

neighbor    0.3 bin
neigh_modify    every 20 delay 0 check no

fix     1 all nve

dump        id all atom 10 dump.melt

thermo      50
run     250

So the input file has the setup of where the atoms go, what the temperature is, what type of simulation you are doing, where the output files record and what the filenames are. If you are doing a sweep of some parameter over a large range it is a bit annoying to have to make a bunch of copies of the file, tweak them, and keep everything in sync. I usually like to have the output files have the major parameter in the filename and it is very easy to get these out of sync with the actual parameter.

I started looking for a simple template system with Ruby, where I could specify a template file, and then somehow loop over a range of numbers and replace specific parts.

So, what I wanted was something to take a template file like this…

regular text with <% title %> embedded
inside that I want to try to replace
and variable p equals <% velocity %>
along with another <%title %> tag, since
thats the point. And another <% title%> for
good measure.

and pass it to a function, along with an option hash, and have it return this…

regular text with test.0.5 embedded
inside that I want to try to replace
and variable p equals 0.5
along with another test.0.5 tag, since
thats the point. And another test.0.5 for
good measure.

I wrote up some code to take care of this. The base code I came up with is this…

class Templator
  def Templator.generate(template, options)
    #template is text string of the template file
    #options is a hash of things to replace

    #currently not checking they match up

    tag_regex = /<%s*w+_*w*s*%>/
    hits = template.scan(tag_regex)
    tags = hits.map {|item| item.chomp('%>').reverse.chomp('%<').reverse.strip}
    tags.map! {|a| a.intern}
    tags.uniq!

    tags.inject(template) {|ntext,tag|
        ntext.gsub(Templator.symbol_to_tag_regex(tag),
        options[tag].to_s)}
  end

  def Templator.symbol_to_tag_regex(tag_name)
    Regexp.new('<%s*' + tag_name.to_s + 's*%>')
  end
end

This design is one that is going to be called from Ruby code, and not from the command line. I designed this to be extremely flexible, and easy to automate for, say, a hundred files or so. All you have to do is read in the template file, build the options hash, and pass the two to the Templator#generate method. Here is the driver code I used.

require 'templator'

template = File.open('template.txt') {|f| f.read}

0.5.step(1.5, 1.0) do |x|
  opt = {:title => "test." + x.to_s, :velocity => x}
  newtext = Templator.generate(template, opt)
  filename = "test-file-" + x.to_s
  File.open(filename,'w') {|f| f.print newtext }
end

There are certainly more options for different purposes, but this one did what I wanted. Feel free to use this code if you’re in a similar situation, or if there is something you think it should do that it doesn’t, let me know and I’ll take a look at it.


C Extensions in Ruby – Benchmarks

June 15, 2007

Earlier I talked about the performance benefits of moving code from ruby into small sections of C code. The good news is that not only is this fairly effective, but the interfacing is fairly painless.

The test I did were centered around floating point calculations (this is what I do in my research work, so that is what is important to me.) Integer only calcualtions are a totally different animal. My tests centered on the following totally arbitrary calculation


#pure_ruby.rb
total = 0.0 

1.0.step(2000.0,0.0001) do |x| 

result = (5.4*x**5 - 3.211*x**4 + 100.3*x**2 - 100 +
     20*Math.sin(x) - Math.log(x)) * 20*Math.exp(-x/100.3)
total += result / 0.0001 

end 

puts total

Which is the type of calculation you would see in a numerical integration type of calculation. Atomic potentials typically have these type of terms; I picked random coefficients.

This particular calculation on my iMac runs in just under 100 seconds (1.5 min), which is slow. The slowest part of the program is the one hefty calculation, the only other major contributor is the looping itself. The first test was to simply move the formula to C, which gives


#c_calc_call.rb
require "mathc"
mc = MathC.new
total = 0.0 

1.0.step(2000.0,0.0001) do |x| 

result = mc.calc(x)
   total += result / 0.0001 

end 

puts total

and


/* math.c */
 #include "ruby.h"
 #include "stdio.h"
 #include "math.h"static VALUE t_init(VALUE self)
 {
     return self;
 } 

static VALUE t_calc(VALUE self, VALUE x) {
     double newx;
     double result; 

newx = NUM2DBL(x); 

result = (5.4 * pow(newx,5) - 3.211 * pow(newx, 4) +
         100.3 * pow(newx,2) - 100 + 20 * sin(newx) -
         log(newx)) * 20 * exp(-newx/100.3); 

return rb_float_new(result);
 } 

VALUE cTest; 

void Init_mathc() {
     cTest = rb_define_class("MathC", rb_cObject);
     rb_define_method(cTest, "initialize", t_init, 0);
     rb_define_method(cTest, "calc", t_calc, 1);
 } 

the extconf.rb file is the same as in my earlier example, with some of the names changed.

This version runs in 25 seconds, almost 4x faster. (By the way, I do “times faster” as slow time / fast time, or how many times would the fast one run before the slow one finished. I don’t know if there is a more accepted way to do this.)

The final test I did was also to move the looping code into C. This makes the ruby part essentially a shell.


#c_loop_call.rb
 require "mathc"
 mc = MathC.new
 total = 0.0 

total = mc.bigcalc() 

puts total 

now all of the work is in the C code


/* mathc.c */
 #include "ruby.h"
 #include "stdio.h"
 #include "math.h"static VALUE t_init(VALUE self)
 {
     return self;
 } 

static VALUE t_calc(VALUE self, VALUE x) {
     double newx;
     double result; 

newx = NUM2DBL(x); 

result = (5.4 * pow(newx,5) - 3.211 * pow(newx, 4) +
         100.3 * pow(newx,2) - 100 + 20 * sin(newx) -
         log(newx)) * 20 * exp(-newx/100.3); 

return rb_float_new(result);
 } 

static VALUE t_bigcalc(VALUE self) {
     double total = 0.0;
     double x;
     double result; 

for (x=1.0; x <= 2000.0; x += 0.0001) {
         result = (5.4 * pow(x,5) - 3.211 * pow(x, 4) +
             100.3 * pow(x,2) - 100 + 20 * sin(x) -
             log(x)) * 20 * exp(-x/100.3);
         total += result/0.0001;
     } 

return rb_float_new(total);
 } 

VALUE cTest; 

void Init_mathc() {
     cTest = rb_define_class("MathC", rb_cObject);
     rb_define_method(cTest, "initialize", t_init, 0);
     rb_define_method(cTest, "calc", t_calc, 1);
     rb_define_method(cTest, "bigcalc", t_bigcalc, 0);
 } 

This bit runs in 8 seconds, 12x faster than the original code, and another 3x faster than just using the calculation as a function call.

Just for fun, I wrote the entire program in C, just to see how much overhead was inherent in the ruby setup. I’m including this just for completeness.


/* purec.c */
 #include "math.h"
 #include "stdio.h"int main() { 

double total = 0.0;
     double x;
     double result; 

for (x=1.0; x <= 2000.0; x += 0.0001) {
         result = (5.4 * pow(x,5) - 3.211 * pow(x, 4) +
             100.3 * pow(x,2) - 100 + 20 * sin(x) -
             log(x)) * 20 * exp(-x/100.3);
         total += result/0.0001;
     }
     printf("%en", total);
 } 

This bit ran in just over 7 seconds. Given that the initialization cost is something you pay once, there aren’t many situations where this degree of optimization would be necessary.

The final results


Pure Ruby       98.483 s        1.0 x
C call only     25.853 s        3.8 x
C call + loop    7.911 s       12.4 x
Pure C           7.172 s       13.7 x
 

This example is simple enough that I would consider these ideal speedups. A real application will not likely see these gains, depending on how much code is really the bottleneck. What these numbers are valuable for is a measure of what to expect. It should be realatively easy for any computationally intensive program to get to the 3-4 x level. If you need something in the 10 x range, you’re going to have to work much harder. If you need better than that, either your algorithm needs work, or this approach is just not going to work.