YAML for serialization

June 20, 2007

One of the projects I’m working on for research has to deal with atomic potentials, or a way to try to describe the forces between atoms when they are in a certain configuration. The more realistic ones are fairly complicated functions.

One of the tricks in dealing with potentials is to try to plot some specific configurations, and see how the energy varies. ‘Wells’ represent stable points where atoms will tend to sit, while ‘humps’ are barriers between stable points that an atom must overcome somehow. The height of the hump gives you an idea of how stable the low points are.

Plotting one of these potentials is not super difficult: I just loop over some positions and (in my case) output the positions and energies to a file, which is then plotted with gnuplot. One of the things I wanted to do, however, is to plot a bunch of variations with one of the parameters swept over a range of values. One way would be to create the potential as an object and modify the parameters that way. What I wanted to do instead is have a parameter file that I read in to set up the potential.

Enter YAML

It turns out that this is essentially built into ruby, through the YAML methods that are designed for marshaling objects over various communication methods between computers/processes. All you have to do is

require 'yaml' # you may not need this line
thing.to_yaml

and you get a string representation of the object. You can then save it to a file. After this, restoring the object is as simple as

YAML.load(string)

or

YAML.load_file(filename)
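As a minimal sketch of the round trip for a parameter file, here is a plain Hash being dumped and loaded again. The parameter names and values are made up for illustration and are not the ones from my potential. One caveat: in recent Ruby versions, YAML.load is restrictive by default, so loading custom classes may need them to be permitted explicitly or loaded with YAML.unsafe_load; a plain Hash of strings and floats avoids that question.

```ruby
require 'yaml'

# Hypothetical parameter set for a pair potential (names are illustrative only).
POTENTIAL_PARAMS = {
  'name'    => 'lennard-jones',
  'epsilon' => 1.0,
  'sigma'   => 1.0,
  'cutoff'  => 2.5
}

# Serialize the parameters to a YAML string (what would be written to a file).
def dump_params(params)
  params.to_yaml
end

# Parse YAML text back into a plain Hash.
def load_params(text)
  YAML.load(text)
end

text = dump_params(POTENTIAL_PARAMS)
puts text
puts load_params(text) == POTENTIAL_PARAMS
```

The string returned by dump_params is what you would save with File.open and later hand to YAML.load_file.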

Even complicated things

One of the neatest things about this is that nested objects translate seamlessly. For example, I have the following files that I work with.

#atom.rb
class Atom
  attr_accessor :x, :y, :z

  def initialize
    @x = @y = @z = 0.0
  end

  def r_ij(atom_two)
    r_squared = (@x-atom_two.x)**2 + (@y-atom_two.y)**2 + (@z-atom_two.z)**2
    return Math.sqrt(r_squared)
  end
end

and

#molecule.rb
class Molecule
  attr_accessor :atoms
  def initialize
    @atoms = []
  end
end

which lets you do this

imac:~/code/ruby-yaml brian$ irb
irb(main):001:0> require 'atom'
=> true
irb(main):002:0> require 'molecule'
=> true
irb(main):003:0> require 'yaml'
=> true
irb(main):004:0> a = Atom.new
=> #<Atom:0x59e304 @x=0.0, @z=0.0, @y=0.0>
irb(main):005:0> a.x,a.y,a.z = [1.0,2.0,3.0]
=> [1.0, 2.0, 3.0]
irb(main):006:0> b = Atom.new
=> #<Atom:0x598774 @x=0.0, @z=0.0, @y=0.0>
irb(main):007:0> b.x,b.y,b.z = [4.0,5.0,6.0]
=> [4.0, 5.0, 6.0]
irb(main):008:0> m = Molecule.new
=> #<Molecule:0x592b08 @atoms=[]>
irb(main):009:0> m.atoms.push a
=> [#<Atom:0x59e304 @x=1.0, @z=3.0, @y=2.0>]
irb(main):010:0> m.atoms.push b
=> [#<Atom:0x59e304 @x=1.0, @z=3.0, @y=2.0>, 
    #<Atom:0x598774 @x=4.0, @z=6.0, @y=5.0>]
irb(main):011:0> print save = m.to_yaml
--- !ruby/object:Molecule 
atoms: 
- !ruby/object:Atom 
  x: 1.0
  y: 2.0
  z: 3.0
- !ruby/object:Atom 
  x: 4.0
  y: 5.0
  z: 6.0
=> nil
irb(main):013:0> new = YAML.load(save)
=> #<Molecule:0x586cf4 
           @atoms=[#<Atom:0x58726c @x=1.0, @z=3.0, @y=2.0>, 
                   #<Atom:0x586f38 @x=4.0, @z=6.0, @y=5.0>]>
irb(main):014:0>

YAML is plain text

From my perspective, the real advantage of doing this is having the output in a plain text file. I can then edit the text in an editor and tweak things this way. I can also mark up the text and run it through a template engine to generate a whole bunch of files. This was part of the magic behind my rake example, where I generated 800 or so configurations in an hour or so, which was very cool.
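Generating a batch of parameter files can also be sketched directly in Ruby, without a template engine. This is not the approach described above, just an alternative under assumed names: the keys, values and file names below are hypothetical.

```ruby
require 'yaml'

# Build one YAML document per swept value of a single parameter.
def sweep_yaml(base, key, values)
  values.map { |v| base.merge(key => v).to_yaml }
end

base = { 'epsilon' => 1.0, 'sigma' => 1.0 }
docs = sweep_yaml(base, 'sigma', [0.9, 1.0, 1.1])

docs.each_with_index do |doc, i|
  # In real use each document would be written to its own file.
  puts "# sweep-#{i}.yaml"
  puts doc
end
```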

Edit: tweaked the formatting in the irb section to see the code better


Rake tricks

June 19, 2007

Recently, I had some plots I was trying to generate. The neat part was that everything broke down into little pieces. I looked at using Rake to put everything together. The nice thing about Rake is being able to set up the rules for generating each piece, and letting the computer sort out what needs to be done.

Here are the steps I needed

  1. Generate .yaml files (I was doing a parameter sweep of a particular function)
  2. Run a script that takes the .yaml file and outputs x,y,z triples as a .dat file
  3. Generate a script for gnuplot to plot the .dat files as .png
  4. Run gnuplot with the script

I scoured a bunch of online resources about Rake. The rule and file list tricks seem to not be covered as much (or I’m not very bright) and they took me a long time to figure out. On the off chance that someone else isn’t clear about them, here is the Rakefile I used (with the calculation parts removed.)

SRC = FileList['*.yaml']
DAT = SRC.ext('dat')
GP = SRC.ext('gp')
PNG = SRC.ext('png')

task :default => [:png]

task :png => [:dat, :gp]

task :dat => DAT
task :gp => GP
task :png => PNG

rule '.png' => ['.gp', '.dat'] do |f|
  puts "png test #{f.name} #{f.source}"
  sh "touch #{f.name}"
end

rule '.dat' => ['.yaml'] do |f|
  puts "dat test #{f.name} #{f.source}"
  sh "touch #{f.name}"
end

rule '.gp' => ['.yaml'] do |f|
  puts "gp test #{f.name} #{f.source}"
  sh "touch #{f.name}"
end

So the SRC variable becomes a list of all the .yaml files. Changing the extension to png, and making a task of :png => PNG, makes a set of file tasks for the (non-existent) .png files. The rules then give the details of how to generate the .png from the required files.

The thing that was most interesting about this was the ability to define the rules to generate files, without having to specify all the steps for each file, which would be pages and pages (I ended up with over 800 .yaml files). All I had to do was generate the .yaml files, and then run rake.
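To make the name mapping concrete, here is a plain-Ruby sketch of which files each .yaml source leads to. It does not use the Rake API at all; it only mimics the extension swap, and the file names are hypothetical.

```ruby
# For one .yaml source, list the files the three rules would produce.
def targets_for(yaml_file)
  base = yaml_file.sub(/\.yaml\z/, '')
  { dat: "#{base}.dat", gp: "#{base}.gp", png: "#{base}.png" }
end

%w[sweep-0.5.yaml sweep-1.5.yaml].each do |src|
  t = targets_for(src)
  # The .png depends on both the .gp and the .dat, which each depend on the .yaml.
  puts "#{src} -> #{t[:dat]}, #{t[:gp]} -> #{t[:png]}"
end
```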

References

Rake Docs

Rake Tutorial from Jim Weirich (creator of Rake). Update: Fixed link

Rake Documentation

Automating tasks with Rake (registration required)


Templator – Ruby Template System

June 18, 2007

Most simulation software is based on a text input file. Mostly I’m doing molecular dynamics simulation using LAMMPS. One input file from the samples is the following

# 3d Lennard-Jones melt

units       lj
atom_style  atomic

lattice     fcc 0.8442
region      box block 0 20 0 20 0 20
create_box  1 box
create_atoms    1
mass        1 1.0

velocity    all create 3.0 87287

pair_style  lj/cut 2.5
pair_coeff  1 1 1.0 1.0 2.5

neighbor    0.3 bin
neigh_modify    every 20 delay 0 check no

fix     1 all nve

dump        id all atom 10 dump.melt

thermo      50
run     250

So the input file has the setup of where the atoms go, what the temperature is, what type of simulation you are doing, where the output files record and what the filenames are. If you are doing a sweep of some parameter over a large range it is a bit annoying to have to make a bunch of copies of the file, tweak them, and keep everything in sync. I usually like to have the output files have the major parameter in the filename and it is very easy to get these out of sync with the actual parameter.

I started looking for a simple template system with Ruby, where I could specify a template file, and then somehow loop over a range of numbers and replace specific parts.

So, what I wanted was something to take a template file like this…

regular text with <% title %> embedded
inside that I want to try to replace
and variable p equals <% velocity %>
along with another <%title %> tag, since
thats the point. And another <% title%> for
good measure.

and pass it to a function, along with an option hash, and have it return this…

regular text with test.0.5 embedded
inside that I want to try to replace
and variable p equals 0.5
along with another test.0.5 tag, since
thats the point. And another test.0.5 for
good measure.

I wrote up some code to take care of this. The base code I came up with is this…

class Templator
  def Templator.generate(template, options)
    #template is text string of the template file
    #options is a hash of things to replace

    #currently not checking they match up

    tag_regex = /<%\s*\w+_*\w*\s*%>/
    hits = template.scan(tag_regex)
    tags = hits.map {|item| item.chomp('%>').reverse.chomp('%<').reverse.strip}
    tags.map! {|a| a.intern}
    tags.uniq!

    tags.inject(template) {|ntext,tag|
        ntext.gsub(Templator.symbol_to_tag_regex(tag),
        options[tag].to_s)}
  end

  def Templator.symbol_to_tag_regex(tag_name)
    Regexp.new('<%\s*' + tag_name.to_s + '\s*%>')
  end
end

This design is meant to be called from Ruby code, not from the command line. I designed it to be extremely flexible, and easy to automate for, say, a hundred files or so. All you have to do is read in the template file, build the options hash, and pass the two to the Templator.generate class method. Here is the driver code I used.

require 'templator'

template = File.open('template.txt') {|f| f.read}

0.5.step(1.5, 1.0) do |x|
  opt = {:title => "test." + x.to_s, :velocity => x}
  newtext = Templator.generate(template, opt)
  filename = "test-file-" + x.to_s
  File.open(filename,'w') {|f| f.print newtext }
end

There are certainly more options for different purposes, but this one did what I wanted. Feel free to use this code if you’re in a similar situation, or if there is something you think it should do that it doesn’t, let me know and I’ll take a look at it.
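The code above notes that it does not check that the tags and the options hash match up. A minimal sketch of one way to add that check is below. It is a separate, hypothetical helper (fill_template is not part of Templator) that uses gsub with a block and Hash#fetch, so a tag with no matching option raises KeyError instead of silently becoming an empty string.

```ruby
# Matches tags such as <% title %>, <%title %> or <% title%>.
TAG = /<%\s*(\w+)\s*%>/

# Replace every tag with the matching option; raise KeyError if one is missing.
def fill_template(template, options)
  template.gsub(TAG) { options.fetch(Regexp.last_match(1).to_sym).to_s }
end

text = "p equals <% velocity %> in run <%title %>"
puts fill_template(text, { title: "test.0.5", velocity: 0.5 })
```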


C Extensions in Ruby – Benchmarks

June 15, 2007

Earlier I talked about the performance benefits of moving code from ruby into small sections of C code. The good news is that not only is this fairly effective, but the interfacing is fairly painless.

The tests I did were centered around floating point calculations (this is what I do in my research work, so that is what is important to me.) Integer-only calculations are a totally different animal. My tests centered on the following totally arbitrary calculation


#pure_ruby.rb
total = 0.0

1.0.step(2000.0,0.0001) do |x|
  result = (5.4*x**5 - 3.211*x**4 + 100.3*x**2 - 100 +
       20*Math.sin(x) - Math.log(x)) * 20*Math.exp(-x/100.3)
  total += result / 0.0001
end

puts total

Which is the type of calculation you would see in a numerical integration type of calculation. Atomic potentials typically have these type of terms; I picked random coefficients.
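For anyone wanting to time something similar, here is a rough sketch using Ruby's standard Benchmark module. It is not how the numbers in this post were measured; the method names are made up, and the range is much shorter so that it finishes quickly.

```ruby
require 'benchmark'

# The same arbitrary formula as above, wrapped in a method.
def integrand(x)
  (5.4 * x**5 - 3.211 * x**4 + 100.3 * x**2 - 100 +
    20 * Math.sin(x) - Math.log(x)) * 20 * Math.exp(-x / 100.3)
end

# Run the loop from 1.0 to stop and return [total, elapsed seconds].
def timed_total(stop, step)
  total = 0.0
  seconds = Benchmark.realtime do
    1.0.step(stop, step) { |x| total += integrand(x) / step }
  end
  [total, seconds]
end

total, seconds = timed_total(2.0, 0.001)
puts "total=#{total} elapsed=#{seconds} s"
```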

This particular calculation on my iMac runs in just under 100 seconds (about 1.6 min), which is slow. The slowest part of the program is the one hefty calculation; the only other major contributor is the looping itself. The first test was to simply move the formula to C, which gives


#c_calc_call.rb
require "mathc"
mc = MathC.new
total = 0.0

1.0.step(2000.0,0.0001) do |x|
  result = mc.calc(x)
  total += result / 0.0001
end

puts total

and


/* math.c */
#include "ruby.h"
#include "stdio.h"
#include "math.h"

static VALUE t_init(VALUE self)
{
    return self;
}

static VALUE t_calc(VALUE self, VALUE x) {
    double newx;
    double result;

    newx = NUM2DBL(x);

    result = (5.4 * pow(newx,5) - 3.211 * pow(newx, 4) +
        100.3 * pow(newx,2) - 100 + 20 * sin(newx) -
        log(newx)) * 20 * exp(-newx/100.3);

    return rb_float_new(result);
}

VALUE cTest;

void Init_mathc() {
    cTest = rb_define_class("MathC", rb_cObject);
    rb_define_method(cTest, "initialize", t_init, 0);
    rb_define_method(cTest, "calc", t_calc, 1);
}

the extconf.rb file is the same as in my earlier example, with some of the names changed.

This version runs in 25 seconds, almost 4x faster. (By the way, I do “times faster” as slow time / fast time, or how many times would the fast one run before the slow one finished. I don’t know if there is a more accepted way to do this.)
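The "times faster" definition is simple enough to write down as code. The sketch below applies it to the timings reported in the results table of this post; the method and constant names are just for illustration.

```ruby
# "Times faster" as defined above: slow time divided by fast time.
def speedup(slow_seconds, fast_seconds)
  slow_seconds / fast_seconds
end

PURE_RUBY_SECONDS = 98.483

timings = { 'C call only' => 25.853, 'C call + loop' => 7.911, 'Pure C' => 7.172 }
timings.each do |label, seconds|
  puts format('%-14s %5.1f x', label, speedup(PURE_RUBY_SECONDS, seconds))
end
```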

The final test I did was also to move the looping code into C. This makes the ruby part essentially a shell.


#c_loop_call.rb
require "mathc"
mc = MathC.new
total = 0.0

total = mc.bigcalc()

puts total

now all of the work is in the C code


/* mathc.c */
#include "ruby.h"
#include "stdio.h"
#include "math.h"

static VALUE t_init(VALUE self)
{
    return self;
}

static VALUE t_calc(VALUE self, VALUE x) {
    double newx;
    double result;

    newx = NUM2DBL(x);

    result = (5.4 * pow(newx,5) - 3.211 * pow(newx, 4) +
        100.3 * pow(newx,2) - 100 + 20 * sin(newx) -
        log(newx)) * 20 * exp(-newx/100.3);

    return rb_float_new(result);
}

static VALUE t_bigcalc(VALUE self) {
    double total = 0.0;
    double x;
    double result;

    for (x=1.0; x <= 2000.0; x += 0.0001) {
        result = (5.4 * pow(x,5) - 3.211 * pow(x, 4) +
            100.3 * pow(x,2) - 100 + 20 * sin(x) -
            log(x)) * 20 * exp(-x/100.3);
        total += result/0.0001;
    }

    return rb_float_new(total);
}

VALUE cTest;

void Init_mathc() {
    cTest = rb_define_class("MathC", rb_cObject);
    rb_define_method(cTest, "initialize", t_init, 0);
    rb_define_method(cTest, "calc", t_calc, 1);
    rb_define_method(cTest, "bigcalc", t_bigcalc, 0);
}

This bit runs in 8 seconds, 12x faster than the original code, and another 3x faster than just using the calculation as a function call.

Just for fun, I wrote the entire program in C, just to see how much overhead was inherent in the ruby setup. I’m including this just for completeness.


/* purec.c */
#include "math.h"
#include "stdio.h"

int main() {
    double total = 0.0;
    double x;
    double result;

    for (x=1.0; x <= 2000.0; x += 0.0001) {
        result = (5.4 * pow(x,5) - 3.211 * pow(x, 4) +
            100.3 * pow(x,2) - 100 + 20 * sin(x) -
            log(x)) * 20 * exp(-x/100.3);
        total += result/0.0001;
    }
    printf("%e\n", total);
}

This bit ran in just over 7 seconds. Given that the initialization cost is something you pay once, there aren’t many situations where this degree of optimization would be necessary.

The final results


Pure Ruby       98.483 s        1.0 x
C call only     25.853 s        3.8 x
C call + loop    7.911 s       12.4 x
Pure C           7.172 s       13.7 x
 

This example is simple enough that I would consider these ideal speedups. A real application will not likely see these gains, depending on how much code is really the bottleneck. What these numbers are valuable for is a measure of what to expect. It should be relatively easy for any computationally intensive program to get to the 3-4 x level. If you need something in the 10 x range, you’re going to have to work much harder. If you need better than that, either your algorithm needs work, or this approach is just not going to work.


C Extensions in Ruby

June 14, 2007

Tooo Slooow

I recently wrote some analysis code for MD simulations my group is running, and decided to give it a shot in Ruby, instead of C++ (which a couple of people in my group use) or FORTRAN (which everyone else uses).

The entire experience was pretty nice, other than some meandering about how I wanted the class relationships to be. The ability to easily test things, poking around in the irb interactive shell, and extremely quick tweaks were the nice things. The downside is that it runs slow. Now I know about the dangers of needless speed optimizations, but this code is the core analysis tool for this type of simulation, and forms the base of almost any other calculation done later.

To try to deal with this, I started looking into extensions to ruby, but most of the examples I saw dealt with strings, and only one even mentioned arrays at all. Starting off, I had two goals:

  • Speedup in floating point code (generally slower all around than integer, and what my code uses)
  • How to pass numbers back and forth (single parameters and in an array).

In the future I will hopefully address working with ruby classes from C, but I haven’t gotten that far yet.

Extension basics

Ok, first things first. Getting something to work. Programming Ruby has an example that I started from. There is some information there, but not a ton. You start with two files, the .c code, and an extconf.rb file to produce a makefile.

First, the c file. This one is my_test.c

#include "ruby.h"
static VALUE t_init(VALUE self){
    return self;
}
static VALUE t_two_cubed(VALUE self) {
    int a = 2*2*2;
    return INT2NUM(a);
}
static VALUE t_sum(VALUE self, VALUE arr) {
    int i;
    int length;
    double num;
    double total;   

    total = 0.0;
    for (i=0; i < RARRAY(arr)->len; i++) {
        num = NUM2DBL(RARRAY(arr)->ptr[i]);
        total += 2 * num;
    }
    return rb_float_new(total);
}
VALUE cTest;
void Init_my_test() {
    cTest = rb_define_class("MyTest", rb_cObject);
    rb_define_method(cTest, "initialize", t_init, 0);
    rb_define_method(cTest, "sum", t_sum, 1);
    rb_define_method(cTest, "two_cubed", t_two_cubed, 0);
}

The t_init and Init_ parts will be there in every extension. This is how everything gets set up by ruby. Ruby calls Init_my_test to initialize the extension; this happens when the extension is loaded by require, not when an object is created. The Init_my_test function defines the class name the extension will use, in this case MyTest. It also “wires up” the other functions so that they are available from ruby. For some reason this reminds me of doing code-behind methods in ASP.NET. The rb_define_method function is passed (among other things) the name for ruby to use as the method name (the one in quotes) and the c function this method should point to (here they all start with t_). The number that is the last parameter in rb_define_method is the number of parameters the function takes in addition to self (this part is strange.)

For example, the t_sum function is defined with two parameters, but one (self) is an internal mechanism thing, and the other is an array from ruby. So this function takes one parameter in addition to the self that every function needs to have, and the number in rb_define_method is 1. The t_two_cubed function doesn’t take external parameters (only the required self), so rb_define_method uses 0 to show that. Presumably there is no requirement for the ruby name and the function name to be anywhere close to each other, but why would you do that to yourself?

The c code itself is pretty vanilla. The only thing special in the t_two_cubed function is the call to INT2NUM() that performs the translation from a C data type to a Ruby object. For a complete list of these, look in the ruby.h file.

The t_sum function has two of these calls, one for the input and one for the output. This function takes an array, doubles each item, and returns the sum. The extension system understands the basic Ruby objects, such as Array and Hash (in addition to the numeric and string classes.) This is done by building a struct in C that mimics the particular class. The two used most for Array are RARRAY(arr)->len, which gives the length of array arr, and RARRAY(arr)->ptr, which gives a pointer to the array data. Items in the array are accessed by RARRAY(arr)->ptr[0], ptr[1], etc. The code above shows the standard way to iterate over all of the items in an array. The call to NUM2DBL translates the array items into C double precision numbers (floats in Ruby are double precision) and rb_float_new wraps the output so that Ruby can interpret it as a float. I’m not sure why there is a discrepancy in naming between some of the functions (INT2NUM vs rb_float_new) but this is how they are defined. If you get strange numbers (like 2+2=-1860438), these conversions are the first thing to check.

Makefile Generation

To compile and run this thing you need a very simple extconf.rb file. The one I used has two lines

require 'mkmf'
create_makefile("my_test")

There are many more options. The value in parentheses controls what the compiled binary will be called, and what goes into require to use the extension. In this case you use require ‘my_test’, or -r my_test.

To compile, it is as simple as

ruby extconf.rb
make

then

imac:~/code/cmoduleruby brian$ irb -r my_test
irb(main):001:0> t = MyTest.new
=> #<MyTest:0x1d167c>
irb(main):002:0> t.two_cubed
=> 8
irb(main):003:0> t.sum([2.3,5.0,10.0])
=> 34.6
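One way to sanity-check results like these is to compare against a pure-Ruby version of the same class. The sketch below is a hypothetical stand-in (MyTestReference is not part of the extension) that mirrors the two C functions.

```ruby
# Pure-Ruby stand-in mirroring the two methods defined in my_test.c.
class MyTestReference
  # Same constant calculation as t_two_cubed.
  def two_cubed
    2 * 2 * 2
  end

  # Doubles each item and returns the sum as a Float, like t_sum.
  def sum(arr)
    arr.inject(0.0) { |total, num| total + 2 * num.to_f }
  end
end

ref = MyTestReference.new
puts ref.two_cubed
puts ref.sum([2.3, 5.0, 10.0])
```

If the extension and the reference disagree, the conversion calls (NUM2DBL, INT2NUM, rb_float_new) are a good place to start looking.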

Hopefully this makes sense. I’ll post a more numerical example and some benchmarks next.


Integer loops

June 13, 2007

This is the final bit that I’ve written about using loops in Ruby. These are loops that either run a certain number of times, or that have some type of counter that is strictly integers.

The Goal

As I mentioned before, my goal here is to replace this very simple C fragment with Ruby.

for (i=0; i<10; i++) {
    printf("%d \n",i);
}

Simplest loop

I should have mentioned this one first, but the step method is a more natural fit for the for loop. To run a block 10 times, simply do

10.times { |i| puts i }

Times passes the iteration number to the block (starting from 0.) This obviously also works if you assign (say) x=10 and then call x.times.

Counting loops

Ruby also provides the methods upto and downto, to start from one integer and count to another.

0.upto(9) {|i| puts i}

does what our for loop above does, but could start from any other integer. Downto goes the opposite way

9.downto(0) {|i| puts i}

If you are starting at zero, it makes more sense to use the times method, but these are a bit more flexible.
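As a quick check that these loops really cover the same numbers, the sketch below collects what each one yields into an array and compares them. The helper method names are made up for the example.

```ruby
# Collect the values yielded by 10.times.
def with_times
  out = []
  10.times { |i| out << i }
  out
end

# Collect the values yielded by 0.upto(9).
def with_upto
  out = []
  0.upto(9) { |i| out << i }
  out
end

# Collect the values yielded by 9.downto(0).
def with_downto
  out = []
  9.downto(0) { |i| out << i }
  out
end

puts with_times == with_upto           # both yield 0 through 9
puts with_downto == with_upto.reverse  # same values, opposite order
```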

Ranges

Ranges are a little more flexible, but their syntax and usage aren’t really straightforward. A range object is set up by two numbers separated by two or three dots.

0..10     # 0 to 10
0...10    # 0 to 9

The .. version includes the last number, the ... version does not. Once you have a range, you can use each to iterate over all the numbers. The only advantage I see here over upto and downto is that you have some more flexibility in which numbers to iterate over. Ranges have methods called reject and find_all that take a block and keep the numbers where the block returns false or not false, respectively. This could be helpful if you wanted odd or even numbers, for example.
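A small sketch of the odd/even idea, along with the difference between the two kinds of range. The helper names are hypothetical.

```ruby
# Keep the numbers for which the block is true.
def evens_upto(n)
  (0..n).find_all { |i| i.even? }
end

# Drop the numbers for which the block is true.
def odds_upto(n)
  (0..n).reject { |i| i.even? }
end

puts((0..10).to_a.length)   # 11, the end value is included
puts((0...10).to_a.length)  # 10, the end value is left out
p evens_upto(10)            # [0, 2, 4, 6, 8, 10]
p odds_upto(10)             # [1, 3, 5, 7, 9]
```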

Conclusion

For me personally, coming to Ruby from C-type languages, loops took a fair amount of time to get my head around. Iterators are a great trick, but there are times in numerical work where they just don’t work. I’ve talked to some other people who have had similar issues, so hopefully this series has helped, and shown you something you didn’t know.

I certainly have not covered everything. There are variations that may make more sense depending on your code and personal taste. There are also more primitive things (such as while) where you should be able to do almost anything that doesn’t fit in what I’ve covered here, but I haven’t had to do that yet. Please let me know in the comments if I’ve missed anything clever.


For like loops

June 12, 2007

In my earlier post, I talked about the use of iterators instead of pure loops in Ruby. When you can use them, they work wonderfully, but sometimes you just need to make the code loop. My impression is that these are the things that are not as clearly explained in Ruby material, and they are likely to be a thorn in your side if you are coming to Ruby from something like Fortran or C, where you have more control over the control flow of your code.

The Goal

As I mentioned before, my goal here is to replace this very simple C fragment with Ruby.

for (i=0; i<10; i++) {
    printf("%d \n",i);
}

A note about blocks

I didn’t mention this before, but all of these looping techniques work by accepting a “block.” If you are new to Ruby this concept will hound you until it finally makes sense. I am not going to be able to explain blocks here, so if you aren’t sure about them I would suggest you do some reading on those first. This won’t make much sense until they are a little more clear.

Step – the one that works with decimals

If you need start/end/step values that are not integers, this is (I think) your only real option. It is a method built into the numeric classes, and you pass it the ending point and the step value, like this

0.0.step(9.0, 1.0) do |x|
  puts x
end

The block is passed values from step until the value would be greater than the end number (9.0 in this case), so this is more like

for (x=0.0; x<=9.0; x+=1.0)

where the <= instead of < can potentially throw you.
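The inclusive end point is easy to see by collecting the values that step yields. This is only an illustration; step_values is a made-up helper name.

```ruby
# Collect every value that step passes to the block.
def step_values(start, stop, step)
  values = []
  start.step(stop, step) { |x| values << x }
  values
end

values = step_values(0.0, 9.0, 1.0)
puts values.length  # 10 values, because the end point is included
puts values.last    # 9.0
```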

I have used this most recently for mapping out a complicated function using

0.0.step(0.8,0.01) do |x|
  0.0.step(0.8,0.01) do |y|
    #calculate function value and do something with it
  end
end

this will also count backwards if the start is higher than the end number, with a negative step.
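A short sketch of both ideas, counting down with a negative step and walking a two-dimensional grid with nested step calls. The method names and the small grid are chosen only to keep the example readable.

```ruby
# Count from start down to stop using a negative step.
def countdown(start, stop, step)
  out = []
  start.step(stop, step) { |x| out << x }
  out
end

# Visit every (x, y) pair on a square grid from 0.0 to stop.
def grid_points(stop, step)
  points = []
  0.0.step(stop, step) do |x|
    0.0.step(stop, step) { |y| points << [x, y] }
  end
  points
end

p countdown(3.0, 0.0, -1.0)        # [3.0, 2.0, 1.0, 0.0]
puts grid_points(1.0, 0.5).length  # 9, a 3 x 3 grid
```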

Fencepost errors

Not to belabor the point about iterators, but notice that I had to change i<10 to i<=9 when the type of test changed. If you forget to do this you end up with one more or one fewer number than you were expecting. These are called fencepost errors, after the trick question ‘how many fence posts do you need for 3 sections of fence?’, where the answer is not 3 but 4.

Finally, Ruby offers a bunch of methods for integer loops. I will talk about them in the next post.