Shortcomings in Scientific Computing

July 16, 2007

I read an interesting article (via The Third Bit) a couple of weeks ago that addresses some of the shortcomings of ‘Scientific Computing’. The major shortcomings mentioned are:

  • Simple text editors vs IDE
  • Lack of version control
  • Not enough testing

The major item (to me) is version control. If you aren’t using some type of version control, start now. Subversion is an easy enough system to get working (although the repository setup is a little harder than it needs to be). If you think of this as just a backup system, you miss the real power of version control. In the project I am working on, I can instantly revert to the code as it was on any date. I can tag or label a certain version for a paper or other event. I can trace a certain section of code and figure out how it has changed over the versions. This came in handy recently as I was trying to figure out where a certain constant in our code came from. I was able to look back and find that the code I inherited was already set up that way. It would have taken much longer to sift through tarballs looking for this value.

I’m not convinced that text editor vs IDE is that significant an issue, but it depends on the language and environment that you are using. Some languages, like Java and the C variants, have a lot of boilerplate text for function and class definitions. For these languages, having something that can autogenerate this text is a real timesaver. For languages that are more compact, this is less of an issue. The same applies to some frameworks, like Microsoft’s .NET (which I used in some web development contract work I did a few years ago). The class and function names in .NET are generally long to very long, but the Visual Studio IDE is really good about popping up a list of things that could be called at whatever point you are in the code (since the MS languages are statically typed). There is some value to this, but I’m not convinced that the efficiency differences are enough to totally change how you work. My advice would be to use the editor you are most comfortable with, but try other things from time to time and keep an open mind.

One helpful thing that an IDE does provide is a good debugging facility. If you have to spend lots of time inside a debugger figuring things out, then you will benefit from doing more testing, but there are times when having a debugger is much more helpful than adding a whole bunch of print statements.

Finally, the article talks about testing. Testing and verification are vitally important to scientific computing. There have been cases where published results have turned out to be incorrect due to bugs in the program. A particular case came out this year where incorrect results were published in a very high profile journal, which led to some groups not getting grant funding or not getting their results published, because their results didn’t agree. The problem with correcting this is that it isn’t a simple change. Software testing is a real commitment and a pretty fundamental shift in how you develop your code. I believe that the gains in productivity and in knowing what your code is doing offset the additional time it takes to do the testing, but even in commercial software development, where testing is much more common, it is still not done by everyone. Even if you don’t adopt a full methodology, such as TDD or XP, you really owe it to yourself to do as much verification as you possibly can. Test simple cases by hand. Verify things using simple models. Make sure your results make sense. I think it would be valuable to pick up a good test-driven development book, or look around for a unit testing framework for your particular language and give it a try.
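As a rough illustration of “test simple cases by hand”, here is a minimal sketch in Python. The trapezoid-rule integrator and the test names are made up for this example and are not taken from the article. Each check compares the routine against an answer that can be worked out on paper: a straight line (which the trapezoid rule should integrate exactly), a known integral, and the expected drop in error when the step size shrinks.

```python
import math


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] using n equal-width trapezoids."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_linear_function_is_exact():
    # Hand calculation: the integral of 2x from 0 to 3 is 9, and the
    # trapezoid rule has no truncation error for a straight line.
    assert abs(trapezoid(lambda x: 2 * x, 0.0, 3.0, 4) - 9.0) < 1e-12


def test_known_integral_of_sine():
    # Hand calculation: the integral of sin(x) from 0 to pi is exactly 2.
    assert abs(trapezoid(math.sin, 0.0, math.pi, 1000) - 2.0) < 1e-5


def test_error_shrinks_with_more_steps():
    # Sanity check on the method itself: the rule is second order, so ten
    # times as many steps should cut the error by roughly a factor of 100.
    err_coarse = abs(trapezoid(math.sin, 0.0, math.pi, 10) - 2.0)
    err_fine = abs(trapezoid(math.sin, 0.0, math.pi, 100) - 2.0)
    assert err_fine < err_coarse / 50


if __name__ == "__main__":
    test_linear_function_is_exact()
    test_known_integral_of_sine()
    test_error_shrinks_with_more_steps()
    print("all checks passed")
```

Functions named like this can also be picked up by a test runner such as pytest, but plain asserts are enough to get started.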

The real bottom line is that there is a lot of development going on outside of scientific computing. Keeping an open mind and trying new things is the best assurance we have of getting better quality code. One resource is the Software Carpentry page at scipy.org. This page is Python-related, but most of the concepts should transfer to other languages fairly well.

Ruby used for computation book

June 23, 2007

I ran across a neat book a little while ago that was designed as an introduction to numerical calculation in the context of gravitational science, starting with a two-body system and working up to many-body systems. It has a good introduction to things like integrators of different orders, energy conservation, and stability. The authors decided to use Ruby for the book as a simpler introduction to programming than traditional languages. I think the intention is to use this in a classroom setting, so it will be interesting to see how that works out.
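To give a flavor of what “different order integrators” and “energy conservation” mean in practice, here is a small sketch. It is written in Python rather than the Ruby the book uses, and it is not the book’s code. It assumes the relative two-body problem in units where G(m1 + m2) = 1, starts from a circular orbit of radius 1, and compares how far the energy drifts under a first-order forward Euler step versus a second-order leapfrog (kick-drift-kick) step. All function names are invented for this example.

```python
import math


def acceleration(pos):
    """Acceleration of the relative coordinate, assuming G*(m1 + m2) = 1."""
    x, y = pos
    r3 = (x * x + y * y) ** 1.5
    return (-x / r3, -y / r3)


def energy(pos, vel):
    """Energy per unit reduced mass: kinetic plus (negative) potential."""
    x, y = pos
    vx, vy = vel
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)


def euler_step(pos, vel, dt):
    """First-order forward Euler: both updates use values from the old step."""
    ax, ay = acceleration(pos)
    new_pos = (pos[0] + dt * vel[0], pos[1] + dt * vel[1])
    new_vel = (vel[0] + dt * ax, vel[1] + dt * ay)
    return new_pos, new_vel


def leapfrog_step(pos, vel, dt):
    """Second-order leapfrog in kick-drift-kick form."""
    ax, ay = acceleration(pos)
    half_vx = vel[0] + 0.5 * dt * ax
    half_vy = vel[1] + 0.5 * dt * ay
    new_pos = (pos[0] + dt * half_vx, pos[1] + dt * half_vy)
    ax, ay = acceleration(new_pos)
    new_vel = (half_vx + 0.5 * dt * ax, half_vy + 0.5 * dt * ay)
    return new_pos, new_vel


def max_energy_drift(step, dt, n_steps):
    """Largest |E - E0| seen while integrating a circular orbit of radius 1."""
    pos, vel = (1.0, 0.0), (0.0, 1.0)
    e0 = energy(pos, vel)
    worst = 0.0
    for _ in range(n_steps):
        pos, vel = step(pos, vel, dt)
        worst = max(worst, abs(energy(pos, vel) - e0))
    return worst


if __name__ == "__main__":
    dt, n_steps = 0.01, 10000  # roughly 16 orbital periods
    print("Euler drift:   ", max_energy_drift(euler_step, dt, n_steps))
    print("Leapfrog drift:", max_energy_drift(leapfrog_step, dt, n_steps))
```

With the same step size, the Euler orbit slowly spirals outward and its energy error keeps growing, while the leapfrog energy error stays small and bounded. Checking a conserved quantity like this is also a cheap correctness test for any integrator.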

From my personal experience with this type of class (numerical programming, not gravitation), I really think it will help, because so much of the overhead in the class goes to getting the compiler to work and wrangling things that aren’t really important. There is a point where you need to introduce some C code for performance reasons (which is also discussed in one of the chapters), but the nice thing about Ruby is that you can do so much of the prototyping and “learning” in Ruby, then replace a very small amount of code and get tremendous performance gains. It would be very easy to provide the C part to a class once students have proven that they can implement the algorithm in Ruby.