Make Lisp 15x faster than Python or 4x faster than Java

Hi Guys - welcome back to the Postabon blog (after our not-so-brief hiatus)!

I've been insanely busy for the last few weeks (I don't think I ever appreciated just how much actual work launching a startup is, until I actually did one). In the last month or so we've rewritten most of the front end of our website (it's much prettier!), made several trips to NYC and the west coast (most recently for Jason Calacanis's Open Angel Forum (among other things) where I got to meet lots of amazing angels and entrepreneurs), and so much more I can't talk about publicly just yet ;-)

I've had this post in mind for a few weeks now and, since I couldn't sleep last night, I finally got around to banging it out.

One of the three most common criticisims I hear when I tell people I'm using Lisp for my startup is that 'Lisp is so slow!' (the other two being 'How can you code with all those ugly parentheses?!', and the dirth of libraries - which I've addressed in other places).

This is quite funny to me, because Lisp is one of the fastest high level languages I know of (with a few obvious exceptions, in very strongly typed languages like Haskell or OCaml ;-)). One of the featuers I like most about Common Lisp is that you can optionally provide the compiler type hints, and you'll come close to the speed of raw C. Most languages provide some sort foreign function interface - but it's usually ugly, hard to use, and non-portable. Being able to take any existing Lisp code, and drop in a few '(declare (type ...))' declarations and getting the speed of a low level language is pretty great!

I thought I'd demonstrate this functionality, with a bit of real world code. Postabon recommends deals to users based on a variety of factors such as the age of the deal, other people's votes, your preferences, and (relevant for this example) your distance from the deal. Most of the other factors can be computed asyncronously and just cached, but your distance from deals is computed a lot, and can't really be pre-computed since I have no way of knowing your location a priori (well, that's not strictly true, and we do do some memoization, but it has a relatively low hit rate). The most common way to compute the distance between two latitude/longitude pairs is to assume the earth is a perfect sphere, and then do some basic trig that's known as the 'Haversine Formula' (Postabon actually does something a little different internally, but this is a reasonable simplification).

Some profiling suggested my program was spending a fair amount of time in a 'distance' function - and, since I'd already tackled a lot of the other low hanging fruit, I decided to take a few minutes to speed it up.

Without further ado, I'll present the results and the code, and talk about how I ran these tests.

Media_httppostabonmed_wxbmf

 

None of the code is 'optimal', but they are all just literal line-by-line translations of each other (and of the formula in the Wikipedia article :-D). Wherever one uses a temporary variable, so do the others, etc. All the benchmarks were run 5x each on the same Debian machine, using the latest runtime/compiler available from apt. Also of note is that virtual machine start up and end time was disregarded since all timings were done in code. I did experiment with some changes to make things more idiomatic (e.g., using doubles in Java, or list comprehensions in Python), but I ended up using whatever was fastest.

To transform the first piece of Lisp code into the second, I told the compiler I wanted to optimize for speed by typing (declare (optimize (speed 3) (safety 0) (debug 0))) at the top of the salient functions. Then, when I tried to compile my code, it would then tell me whenever it couldn't infer the type of something (or was being forced to coerce types of something) with a warning like:

; in: DEFUN DISTANCE
;     (* (THE FIXNUM *EARTH-RADIUS*) 2
;        (THE SINGLE-FLOAT (ASIN (THE SINGLE-FLOAT (SQRT #)))))
; ==>
;   (* (* (THE FIXNUM *EARTH-RADIUS*) 2)
;      (THE SINGLE-FLOAT (ASIN (THE SINGLE-FLOAT (SQRT #)))))
;
; note: doing signed word to integer coercion (cost 20), for:
;       the first argument of GENERIC-*

Translated, that just means that when SBCL is multiplying the Earth's radius by two, it doesn't know for certain that the result will fit in a 'fixnum' (analagous to an 'int' in most other languages) so it has to assume the result is an integer (which is an infinite precision integer, similar to a 'BigInt' in most other languages). This conversion is relatively expensive but, since I know the Earth's radius will always be under 7,000 km, - I can just give the compiler a 'hint' and guarantee that the result will always be small enough to fit in a 'fixnum.'

I'll admit these warnings are a little cryptic. When I first started programming with SBCL it used to me almost an hour to figure them all out properly optimize a small piece of code. But, after doing it a half dozen times, I can usually anticipate where the type conversions/ambiguities are going to be before the compiler even tells me - and I can usually speed up most small functions three or four fold with just five minutes of work.

Of course, I still do this sort of optimization fairly rarely, since the code loses some flexibility (and I'd rather work on adding new features until profiling tells me I have to go do this). I also very rarely use (safety 0) in production (if I do, I'm careful to have enough pre-assertions to prove that my hints will never be wrong). But, it's certainly nice to have the option to make the code go so fast so easily.

If I'm making some terrible mistake in my Java or Python code (which is possible, I'm a bit rusty with both), please let me know or submit a revised (but still equivalent) solution, and I'll be happy to rerun the benchmarks.

Why Key-Value stores are like C (and why you might want to use one anyways)

Introduction

A few months ago, I was starting to write code for my new startup, Postabon, which helps people find and share local discounts. One of the most important decisions I had to make was deciding what technology I should use for persisting data. I'd traditionally been a PostgreSQL guy for historical reasons (several of the past products I'd worked on had decided to use Postgres back when MySQL was missing a lot of really important features). I was willing to consider MySQL since all the concerns I had years ago seemed to be addressed - although I was a little scared by the recent Oracle/Sun acquisition. I'd also been hearing a lot about a 'NoSQL' movement ...

NoSQL

There's been a lot of talk these days about non-relational databases. The moniker 'NoSQL' explains what this trend is not - but doesn't do a good job explaining what it is. There is a reason for that: NoSQL encompasses many vastly different technologies (each of which is arguably as distinct from each other as from traditional relational databases) such key-value stores, document stores, graph databases, and so on.

A lot of big companies that operate at the sort of scale most of us only dream about had built their internal infrastructure on (often custom) key-value stores - which is what piqued my interest about the topic in the first place. When Amazon, Facebook, Digg, LinkedIn, and many other big names all start relying on similar technologies, it's a good idea to ask 'Why?' (yes, their datastores aren't exactly or exclusively Key-Value stores - but it's one of the most salient common threads).

 

Key Value Store

The core feature a key-value store should support must support is associating one object with another (where these objects can be as simple as 'binary blob' that the database has no semantic understanding of). This provides a persistent Associative Array (like the hashtables or associative lists we all know and love). Many key-value stores also provide some way to order/sort the keys they are storing. Obviously, this is all much simpler and lower level than a full relational database (particularly when used with an ORM or DSL).

By giving a programmer more direct control, and providing less safety, key-value stores can often provide better performance. They also allow for easier and automatic horizontal scalaiblity, since it's much simpler to partition your data across multiple servers when your database never has to do JOINs. On the down side, KV-stores almost all of safety guarantees that a good relational database can provide - foreign key and unique key constraints are difficult to enforce - and they encourage data denormilization (which can lead to inconsistency/corruption if you aren't careful). KV-stores are also often slower to develop with (the programmer has to manually decide which indexes to use, keep his denormalized data consistent, write his own 'joins', etc).

 

Language Wars

If you follow the programming language wars, this might all sound mighty familiar to you. In his essay Beating the Averages, Paul Graham argued that under a wide range of circumstances companies should use 'high level' programming languages that allow developers to create better software more rapidly. High Level Languages are often safer (e.g., no pointers),  take care of a lot of details for you (e.g., garbage collection), and are less verbose to allow for more rapid developement. For most common use cases the run-time performance hit of a 'high level' language is more than outweighed by the shortened developement time. That's why you'll almost never see a web startup using C, C++, Assembler, etc.

There's a pretty obvious analogy between high level languages and relational databases  - so the obvious question is that if high level languages are so great, then what's wrong with relational databases? Basically, the problem boils down to fact that the performance profile for most web applications is that they'll spend roughly 20% of their time processing - but 80% of their time waiting on I/O (database, disk, etc). The reason for this is that modern processors are unbelievably fast. A 3GHz CPU can add two numbers together in less time than it takes for a photon to travel from your monitor to your eye (speed of light / 3GHz ~ 10cm). But your disk is really slow - your hard disk seek time is probably on the order of 8 milliseconds - which means it takes over 20 Million CPU Cycles for your disk to get a random piece of data! Even your fancy SSD's seek time is probably around half a million CPU cycles.

Because of the huge disparity in CPU and disk speed, many applications are disk or I/O bound. If rewriting their application in C makes your code run twice as fast, it may only buy you an absolute performance boost of 10%. On the other hand, if you could make your database go twice as fast, it might buy you a net 30% performance boost - which is huge (and still far easier to do than rewriting your app in C).

 

Benchmarks

As a simple example of the sort of performance gains you might see from a key value store, I designed a trivial problem that's fairly representive of the sort of problem my  webapps deal with. I then wrote two solutions to it - one based on MySQL and the other on BerkeleyDB (a relatively simple key-value store from Oracle).

This benchmark comes with the usual caveats - the problem may not be representative of what you work with, both my solutions are grossly sub-optimal, my hardware or software (my Macbook, running Debian in VMWare, Common Lisp (SBCL) as my programming language, Elepahnt and CL-SQL as my persistence libraries, unoptimized databases, etc) may not be representative of yours. All the tests were run with a cold cache - which is probably not very realistic.

The problem I decided to tackle deals with a simple class with 3 notable instance variables:

  1. A unique identifier (an integer, hereafter referred to as the UID)
  2. A 'score' which can go up or down (an integer)
  3. A 'category' which is stored as a string

Based on this class, I envisioned 5 simple tests I could perform to test out the two databases:

  1. Populate the database with 100,000 randomly created instances
  2. Randomly increment or decrement the 'score' of each instance 20 times
  3. Get the UIDs of the items with the 10, 100, and 1000 highest scores
  4. Get the UIDs of the items with the 10, 100, and 1000 highest scores in a given category
  5. Delete all 100,000 instances from the database.

You can see the full code of my two solutions below: MySQL Solution and BerkeleyDB Solution

The benchmark results speak for themselves:

Benchmark Results

BerkelyDB was about 2x-3x faster than MySQL. As expected, the code for the key value store was considerably more complex. For example compare the code to fetch the the objects with 10, 100, and 1000 the highest scores:


Conclusion

I'm not here to make a hard sell of key value stores. I just wanted to let people know there's a new game in town - and if your application is I/O bound, you might want to write your own small benchmark to see if the benefit is worth the cost. Especially on the next project you start from scratch.