Why I don't want to use Go

There has been a lot of excitement about Go's upcoming 1.0 release - but I'm going to express a controversial opinion: I don't really care and have no intention of using Go.

I'll grant that C is sometimes painful - I've often searched for a more 'modern' systems-level language (with built-in decent implementations of standard data structures, a well-defined memory model, cross-platform abstractions for things like I/O and concurrency, saner imports, and a type system strong enough to not require constant casts). D was fragmented last time I looked, so C++ seemed like the obvious choice, and it's perfectly workable as long as you limit yourself to the sane parts (although I never really loved it - it's too big and complex, and has too much C cruft hanging around).

So, when Go was first announced, I was excited. Who wouldn't be, at the prospect of a new systems language from Pike, Thompson, et al.? But after a few days of playing with it, I had trouble figuring out what niche Go is supposed to fill.

For 'high-level' work where performance doesn't really matter, I tend to use Python (others prefer Ruby or Perl, but you get the idea). These languages are great for things like web front ends that scale easily horizontally, or for one-off scripts.

For 'mid-level' work, where performance is important but I'm not willing to sacrifice all ease of development, the JVM languages are my go-to (although I sometimes fall back to Haskell/Lisp/Python if I need native interop and JNI is too expensive).

Finally for low level work, where nanos/micros matter, C or C++ fit the bill (no unpredictable GC pauses, no huge object overhead, etc).

So, where is Go supposed to fit?

I'll gladly acknowledge Go is a far better C, with garbage collection. But the GC makes it unacceptable for the only types of problems I'd ever consider using C for. And if I'm willing to put up with a GC, I'd always prefer Java/Scala/Python/Lisp/Haskell over Go for any problem I can think of.

And before you tell me that the GC isn't that big a deal: it is. Literally every big project I've worked on in a garbage-collected language reserves a bunch of memory off-heap at startup and uses that for allocations on the critical path because the GC kills performance. Greplin's index servers, HBase's region servers, Signpost's realtime learning systems, ITA's fare finders, and others I probably can't talk about publicly *all* do this because they need to (trust me, no one *wants* to write a slab allocator and reference-counting system if they can avoid it). The Oracle JVM's GC has had man-centuries of dev time put into it, and it is still too slow for every performance-critical project I've seen. Take a look at the performance improvements HBase saw by moving some allocations outside of the GC's domain.
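To make 'slab allocator and reference-counting system' concrete, here is a toy sketch written in Python only so that the examples in this document share one language. It is not code from any of the systems named above (on the JVM, the preallocated buffer would typically be an off-heap direct ByteBuffer), and the class and method names are hypothetical. It shows the bookkeeping: one big buffer reserved up front, fixed-size slots handed out from a free list, and slots recycled when their reference count drops to zero.

```python
class SlabAllocator:
    """Hypothetical fixed-size slab allocator with manual reference counting."""

    def __init__(self, slot_size: int, num_slots: int) -> None:
        self.slot_size = slot_size
        # One large buffer reserved at start-up; the GC only ever sees this one object.
        self.buffer = bytearray(slot_size * num_slots)
        # Stack of free slot indices, ordered so slot 0 is handed out first.
        self.free_slots = list(range(num_slots - 1, -1, -1))
        self.refcounts = [0] * num_slots

    def allocate(self) -> int:
        """Take a free slot and give it a reference count of 1."""
        if not self.free_slots:
            raise MemoryError("slab exhausted")
        slot = self.free_slots.pop()
        self.refcounts[slot] = 1
        return slot

    def retain(self, slot: int) -> None:
        """Record one more owner of the slot."""
        self.refcounts[slot] += 1

    def release(self, slot: int) -> None:
        """Drop one owner; when none remain, zero the slot and recycle it."""
        if self.refcounts[slot] <= 0:
            raise ValueError(f"slot {slot} is not allocated")
        self.refcounts[slot] -= 1
        if self.refcounts[slot] == 0:
            start = slot * self.slot_size
            self.buffer[start:start + self.slot_size] = bytes(self.slot_size)
            self.free_slots.append(slot)

    def view(self, slot: int) -> memoryview:
        """A writable window onto the slot's bytes (no copy)."""
        start = slot * self.slot_size
        return memoryview(self.buffer)[start:start + self.slot_size]
```

The cost of this design is exactly what the paragraph complains about: every retain must be matched by a release, and a missed release leaks a slot forever.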

I think the only way for Go to gain traction is to directly confront, and best, either the high-level languages or the mid-level languages. And, frankly, I don't see either of those two things happening.

 

PS: Follow me on Twitter (@smanek) if you're interested in JVM/Concurrency/Distributed Systems

The Modern Java Ecosystem (for the Sinatra or web.py lover)

The Horror of Java

Many developers, with good reason, are terrified of the Java ecosystem. If you were unlucky, you may have been plopped in the middle of a large Java EE service with thousands of lines of inflexible XML and tens of thousands of lines of generated Java that no human will ever read (former defense contractor here - I've written five-thousand-line WSDLs by hand and worked on projects that generate thousands of lines of SOAP responses with String concatenation). If you were one of the lucky ones, you worked on a 'modern and lightweight' project with Spring and Hibernate instead, which only entailed plumbing the depths of abstraction with joys such as the AbstractSingletonProxyFactoryBean.

Compared to the ease of working with web.py or Sinatra, you'd be forgiven for writing off the entire Java world for good. I came pretty close - after my time defense contracting, I basically refused to write anything but Lisp and Javascript (which is basically just a Lisp with infix syntax ;-)) for the better part of a year because of the post-traumatic flashbacks.

 

The Appeal of Java

Whether it was the need to work on search with Lucene, big data with Hadoop, or other JVM languages like Scala or Clojure - I kept getting pulled back into the Java world. Fortunately, in the last few years great libraries and frameworks have emerged that make Java more usable. Some of the ones I've found most useful include the JSON library Jackson, the RESTful web framework Jersey, and the miscellaneous utility functions in Google's Guava library.

 

Intended Audience

The rest of this post assumes that you already know Java (or will be learning it from a different source). There are a number of excellent books for learning the nooks and crannies of the Java language; in particular, I can't endorse Bloch's Effective Java or Goetz et al.'s Java Concurrency in Practice enough.

 

 

Jersey

Jersey is my preferred framework for writing RESTful services. It's closer to Sinatra or web.py - but if you want something like Rails or Django you should look at Play.

It takes a little bit of glue to set up (read on - I explain it all in later sections!), but once you do, it's easy to define a new resource that can handle many different 'verbs.' Here's an example that defines a 'Student' resource where you can add, list, and delete students:

 

And it's really that easy. You just annotate methods/classes with 'routing' information, and you're done. We can now list, create, and delete student records from a RESTful HTTP web service. Take a look:

You can see the whole thing on GitHub - and the rest of this article will explain how to set up some of the magic that makes this possible.

 

Jackson

I don't think I need to sell anyone on the advantages of JSON as a data interchange format: it's easy to work with, human readable, and flexible. Jackson is by far the best Java JSON library I've come across. The one problem is that it's a little difficult to figure out how to do non-trivial things straight from the Javadocs. The author of Jackson is the amazing Tatu Saloranta - and his personal blog is full of great tips on how to use Jackson in the real world. But, of course, sometimes it's hard to find the example you need on the blog - so I'm going to show you some common usages.

If you use 'normal'-looking plain old Java objects (POJOs) with a no-argument constructor and getters and setters for each instance variable, Jackson will work out of the box with no trouble or special annotations.

However, one of the best habits I ever got into was making objects immutable unless there is a good reason not to. The problem is that, by default, Jackson instantiates objects using the no-argument constructor and then uses the setters to set the instance variables. That won't work for immutable objects - so we can use the @JsonCreator annotation to tell Jackson how to instantiate them. Take a look at an example:

 

Now we're getting somewhere - this is more like code I might use in production. Next, let's say that we're using this JSON to communicate with other services, and a requirement comes down that courses be represented by a string of the department's shortname and course number, instead of the object that includes the vanity name, etc. That is, instead of the courses looking like:

we want them to look like:

This is where things start to get a little tricky, and Jackson's flexibility really shines. We can define completely custom classes that are responsible for serializing and deserializing the Course object like this:

 

There is a lot more to Jackson - but this should be more than enough to get started. Check out the actual documentation for ideas.

 

Guava

Guava is an enormous, and extremely powerful, library of utility functions. There is a lot of overlap with Apache Commons - but I generally find the Guava versions faster and easier. I've put together some quick examples of the pieces I use most often:

If you want to learn more about Guava check out the full docs. At least take a look at the Files utility functions and MoreExecutors static factories.

 

Maven

Maven is an all-in-one build and dependency management system. It also has one of my favorite features, called 'archetypes.' These allow you to create a 'template' that can be used as the basis of new projects. I put together an archetype for Java web services which you can easily reuse - check it out on GitHub.

The most common thing you'll need to do with Maven is add a new dependency. Just head over to MVNRepository - search for the library you want to add, and copy the dependency (which MVNRepository will list verbatim) into your project's pom.xml.

You can use Maven to run your service too - just type 'mvn tomcat:run' in your service's directory, and it will automatically start up. In production, you'll want to deploy your project's WAR file to an application server (like Tomcat) - but that's a story for another day.

 

Much More

There is much more we could talk about, but I hope I've shown you enough to convince you that Java web services have come out of the dark ages. If there's interest in more posts about the Java world, please let me know.

 

Make Lisp 15x faster than Python or 4x faster than Java

Hi Guys - welcome back to the Postabon blog (after our not-so-brief hiatus)!

I've been insanely busy for the last few weeks (I don't think I ever appreciated just how much actual work launching a startup is, until I actually did one). In the last month or so we've rewritten most of the front end of our website (it's much prettier!), made several trips to NYC and the west coast (most recently for Jason Calacanis's Open Angel Forum (among other things) where I got to meet lots of amazing angels and entrepreneurs), and so much more I can't talk about publicly just yet ;-)

I've had this post in mind for a few weeks now and, since I couldn't sleep last night, I finally got around to banging it out.

One of the three most common criticisms I hear when I tell people I'm using Lisp for my startup is that 'Lisp is so slow!' (the other two being 'How can you code with all those ugly parentheses?!' and the dearth of libraries - which I've addressed in other places).

This is quite funny to me, because Lisp is one of the fastest high-level languages I know of (with a few obvious exceptions, in very strongly typed languages like Haskell or OCaml ;-)). One of the features I like most about Common Lisp is that you can optionally provide the compiler with type hints and come close to the speed of raw C. Most languages provide some sort of foreign function interface - but it's usually ugly, hard to use, and non-portable. Being able to take any existing Lisp code, drop in a few '(declare (type ...))' declarations, and get the speed of a low-level language is pretty great!

I thought I'd demonstrate this functionality with a bit of real-world code. Postabon recommends deals to users based on a variety of factors such as the age of the deal, other people's votes, your preferences, and (relevant for this example) your distance from the deal. Most of the other factors can be computed asynchronously and just cached, but your distance from deals is computed a lot, and can't really be pre-computed since I have no way of knowing your location a priori (well, that's not strictly true, and we do do some memoization, but it has a relatively low hit rate). The most common way to compute the distance between two latitude/longitude pairs is to assume the earth is a perfect sphere, and then do some basic trig that's known as the 'Haversine Formula' (Postabon actually does something a little different internally, but this is a reasonable simplification).
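For reference, here is the Haversine distance written out in Python. This is a minimal sketch following the usual textbook form of the formula, not the benchmarked code; the function name and the 6,371 km mean radius are assumptions. The last line, 2 × radius × asin(sqrt(a)), is the same expression that appears in the SBCL compiler note further down.

```python
import math

# Assumption: mean Earth radius in km; the radius used in the original code isn't shown.
EARTH_RADIUS_KM = 6371


def distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    # Clamping guards against sqrt(a) drifting a hair above 1.0 from rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Every call does four trig evaluations and a square root, which is why a function like this can show up in a profile when it runs once per user/deal pair.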

Some profiling suggested my program was spending a fair amount of time in a 'distance' function - and, since I'd already tackled a lot of the other low hanging fruit, I decided to take a few minutes to speed it up.

Without further ado, I'll present the results and the code, and talk about how I ran these tests.


 

None of the code is 'optimal', but they are all just literal line-by-line translations of each other (and of the formula in the Wikipedia article :-D). Wherever one uses a temporary variable, so do the others, etc. All the benchmarks were run 5x each on the same Debian machine, using the latest runtime/compiler available from apt. Also of note: virtual machine start-up and shutdown time were disregarded, since all timings were done in code. I did experiment with some changes to make things more idiomatic (e.g., using doubles in Java, or list comprehensions in Python), but I ended up using whatever was fastest.
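Timing 'in code' means starting and stopping the clock inside the running program, so runtime start-up never counts. A minimal Python sketch of that idea, assuming nothing about the actual harness used (the helper name is hypothetical):

```python
import time
from typing import Callable, List


def time_runs(fn: Callable[[], object], runs: int = 5) -> List[float]:
    """Call fn `runs` times and return the wall-clock seconds for each call.

    The clock starts and stops inside the running process, so interpreter
    start-up and shutdown never count toward the measurement.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations
```

Taking the minimum or the median of the five runs is a common way to damp noise; which statistic the benchmarks above report isn't stated.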

To transform the first piece of Lisp code into the second, I told the compiler I wanted to optimize for speed by adding (declare (optimize (speed 3) (safety 0) (debug 0))) at the top of the salient functions. Then, whenever I compiled my code, the compiler would tell me each place where it couldn't infer the type of something (or was being forced to coerce types), with a warning like:

; in: DEFUN DISTANCE
;     (* (THE FIXNUM *EARTH-RADIUS*) 2
;        (THE SINGLE-FLOAT (ASIN (THE SINGLE-FLOAT (SQRT #)))))
; ==>
;   (* (* (THE FIXNUM *EARTH-RADIUS*) 2)
;      (THE SINGLE-FLOAT (ASIN (THE SINGLE-FLOAT (SQRT #)))))
;
; note: doing signed word to integer coercion (cost 20), for:
;       the first argument of GENERIC-*

Translated, that just means that when SBCL is multiplying the Earth's radius by two, it doesn't know for certain that the result will fit in a 'fixnum' (analogous to an 'int' in most other languages), so it has to assume the result is a general integer (which has infinite precision, similar to a 'BigInt' in most other languages). This conversion is relatively expensive but, since I know the Earth's radius will always be under 7,000 km, I can just give the compiler a 'hint' and guarantee that the result will always be small enough to fit in a 'fixnum.'

I'll admit these warnings are a little cryptic. When I first started programming with SBCL, it used to take me almost an hour to figure them all out and properly optimize a small piece of code. But after doing it a half dozen times, I can usually anticipate where the type conversions/ambiguities are going to be before the compiler even tells me - and I can usually speed up most small functions three or four fold with just five minutes of work.

Of course, I still do this sort of optimization fairly rarely, since the code loses some flexibility (and I'd rather work on adding new features until profiling tells me I have to go do this). I also very rarely use (safety 0) in production (if I do, I'm careful to have enough pre-assertions to prove that my hints will never be wrong). But, it's certainly nice to have the option to make the code go so fast so easily.

If I'm making some terrible mistake in my Java or Python code (which is possible, I'm a bit rusty with both), please let me know or submit a revised (but still equivalent) solution, and I'll be happy to rerun the benchmarks.

Why Key-Value stores are like C (and why you might want to use one anyways)

Introduction

A few months ago, I was starting to write code for my new startup, Postabon, which helps people find and share local discounts. One of the most important decisions I had to make was deciding what technology I should use for persisting data. I'd traditionally been a PostgreSQL guy for historical reasons (several of the past products I'd worked on had decided to use Postgres back when MySQL was missing a lot of really important features). I was willing to consider MySQL since all the concerns I had years ago seemed to be addressed - although I was a little scared by the recent Oracle/Sun acquisition. I'd also been hearing a lot about a 'NoSQL' movement ...

NoSQL

There's been a lot of talk these days about non-relational databases. The moniker 'NoSQL' explains what this trend is not - but doesn't do a good job explaining what it is. There is a reason for that: NoSQL encompasses many vastly different technologies (each of which is arguably as distinct from each other as from traditional relational databases) such as key-value stores, document stores, graph databases, and so on.

A lot of big companies that operate at the sort of scale most of us only dream about had built their internal infrastructure on (often custom) key-value stores - which is what piqued my interest about the topic in the first place. When Amazon, Facebook, Digg, LinkedIn, and many other big names all start relying on similar technologies, it's a good idea to ask 'Why?' (yes, their datastores aren't exactly or exclusively Key-Value stores - but it's one of the most salient common threads).

 

Key Value Store

The core feature a key-value store must support is associating one object with another (where these objects can be as simple as a 'binary blob' that the database has no semantic understanding of). This provides a persistent associative array (like the hashtables or association lists we all know and love). Many key-value stores also provide some way to order/sort the keys they are storing. Obviously, this is all much simpler and lower level than a full relational database (particularly when used with an ORM or DSL).
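The associative-array model fits in a few lines. This toy Python class is an in-memory stand-in (no persistence, no concurrency) for an ordered key-value store: put, get, and delete on opaque values, plus a range scan over sorted keys. The class and method names are hypothetical and don't correspond to any particular product's API.

```python
import bisect


class SortedKVStore:
    """Toy in-memory ordered key-value store (no persistence, no concurrency)."""

    def __init__(self) -> None:
        self._keys = []    # every key, kept in sorted order for range scans
        self._values = {}  # key -> opaque value (e.g. a bytes blob)

    def put(self, key, value) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def delete(self, key) -> None:
        if key in self._values:
            del self._values[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan(self, lo, hi):
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        for key in self._keys[start:end]:
            yield key, self._values[key]
```

The store never looks inside a value; any structure (and any query beyond key lookups and key ranges) is the application's job.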

By giving a programmer more direct control and providing less safety, key-value stores can often provide better performance. They also allow for easier, more automatic horizontal scalability, since it's much simpler to partition your data across multiple servers when your database never has to do JOINs. On the downside, KV-stores give up almost all of the safety guarantees that a good relational database can provide - foreign key and unique key constraints are difficult to enforce - and they encourage data denormalization (which can lead to inconsistency/corruption if you aren't careful). KV-stores are also often slower to develop with (the programmer has to manually decide which indexes to use, keep his denormalized data consistent, write his own 'joins', etc).
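Here is roughly what maintaining your own index and writing your own 'joins' looks like, as a hypothetical Python sketch that uses plain dicts as the key-value 'tables' (the users/deals data and all names are made up for illustration). Every write must update the secondary index by hand, and nothing stops a deal from pointing at a user that doesn't exist.

```python
# Hypothetical data: two "tables", each just a key-value mapping.
users = {}          # user_id -> {"name": str}
deals = {}          # deal_id -> {"user_id": int, "title": str}
deals_by_user = {}  # hand-maintained secondary index: user_id -> set of deal_ids


def put_deal(deal_id: int, user_id: int, title: str) -> None:
    """Write a deal and keep the secondary index consistent by hand."""
    old = deals.get(deal_id)
    if old is not None:
        # If the deal changes owner, remove it from the old owner's index entry.
        deals_by_user[old["user_id"]].discard(deal_id)
    deals[deal_id] = {"user_id": user_id, "title": title}
    deals_by_user.setdefault(user_id, set()).add(deal_id)


def titles_with_poster_names():
    """A hand-written 'join': roughly what SQL expresses as deals JOIN users."""
    return sorted(
        (deal["title"], users[deal["user_id"]]["name"])
        for deal in deals.values()
        # No foreign-key constraint guarantees the user exists, so check.
        if deal["user_id"] in users
    )


def titles_posted_by(user_id: int):
    """Use the secondary index instead of scanning every deal."""
    return sorted(deals[d]["title"] for d in deals_by_user.get(user_id, ()))
```

Forgetting the discard in put_deal is exactly the kind of bug that silently corrupts denormalized data.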

 

Language Wars

If you follow the programming language wars, this might all sound mighty familiar to you. In his essay Beating the Averages, Paul Graham argued that under a wide range of circumstances companies should use 'high-level' programming languages that allow developers to create better software more rapidly. High-level languages are often safer (e.g., no pointers), take care of a lot of details for you (e.g., garbage collection), and are less verbose, which allows for more rapid development. For most common use cases the run-time performance hit of a 'high-level' language is more than outweighed by the shortened development time. That's why you'll almost never see a web startup using C, C++, Assembler, etc.

There's a pretty obvious analogy between high-level languages and relational databases - so the obvious question is: if high-level languages are so great, then what's wrong with relational databases? Basically, the problem boils down to the performance profile of most web applications: they spend roughly 20% of their time processing, but 80% of their time waiting on I/O (database, disk, etc). The reason for this is that modern processors are unbelievably fast. A 3GHz CPU can add two numbers together in less time than it takes for a photon to travel from your monitor to your eye (speed of light / 3GHz ~ 10cm). But your disk is really slow - your hard disk seek time is probably on the order of 8 milliseconds - which means it takes over 20 million CPU cycles for your disk to get a random piece of data! Even your fancy SSD's seek time is probably around half a million CPU cycles.
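The numbers in that paragraph are easy to check. A quick Python calculation, taking the stated 3 GHz clock and 8 ms seek as given (the half-million-cycle SSD figure works out to roughly a sixth of a millisecond):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
CLOCK_HZ = 3e9  # a 3 GHz CPU

# How far light travels during one clock cycle, in centimetres.
cm_per_cycle = SPEED_OF_LIGHT_M_PER_S / CLOCK_HZ * 100
print(f"light travels {cm_per_cycle:.1f} cm per cycle")

# A typical ~8 ms hard-disk seek, measured in clock cycles.
hdd_seek_cycles = 8e-3 * CLOCK_HZ
print(f"8 ms seek = {hdd_seek_cycles:,.0f} cycles")

# Half a million cycles, expressed as milliseconds.
ssd_seek_ms = 500_000 / CLOCK_HZ * 1000
print(f"500,000 cycles = {ssd_seek_ms:.2f} ms")
```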

Because of the huge disparity in CPU and disk speed, many applications are disk or I/O bound. If rewriting your application in C makes your code run twice as fast, it may only buy you an absolute performance boost of 10%. On the other hand, if you could make your database go twice as fast, it might buy you a net 30% performance boost - which is huge (and still far easier to do than rewriting your app in C).
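This is Amdahl's law: speeding up one component only shrinks that component's share of the total runtime. The sketch below assumes, for illustration, that the database accounts for 60% of total runtime (three quarters of the 80% spent on I/O), which is the split that reproduces the 30% figure; if the database were all of the I/O, doubling its speed would cut runtime by 40%. The function name is hypothetical.

```python
def remaining_runtime(shares: dict, speedups: dict) -> float:
    """Fraction of the original runtime left after speeding up some components.

    shares:   fraction of total runtime spent in each component (should sum to 1)
    speedups: factor by which a component gets faster; missing means unchanged
    """
    return sum(share / speedups.get(name, 1) for name, share in shares.items())


# Assumed breakdown: 20% CPU, 80% I/O, with the database making up 60 points of that I/O.
shares = {"cpu": 0.2, "database": 0.6, "other_io": 0.2}

faster_code = remaining_runtime(shares, {"cpu": 2})           # about 0.9
faster_database = remaining_runtime(shares, {"database": 2})  # about 0.7
print(f"2x faster code:     runtime cut by {1 - faster_code:.0%}")
print(f"2x faster database: runtime cut by {1 - faster_database:.0%}")
```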

 

Benchmarks

As a simple example of the sort of performance gains you might see from a key-value store, I designed a trivial problem that's fairly representative of the sort of problem my webapps deal with. I then wrote two solutions to it - one based on MySQL and the other on BerkeleyDB (a relatively simple key-value store from Oracle).

This benchmark comes with the usual caveats - the problem may not be representative of what you work with, both my solutions are grossly sub-optimal, and my hardware or software (my MacBook running Debian in VMware, Common Lisp (SBCL) as my programming language, Elephant and CL-SQL as my persistence libraries, unoptimized databases, etc) may not be representative of yours. All the tests were run with a cold cache - which is probably not very realistic.

The problem I decided to tackle deals with a simple class with 3 notable instance variables:

  1. A unique identifier (an integer, hereafter referred to as the UID)
  2. A 'score' which can go up or down (an integer)
  3. A 'category' which is stored as a string

Based on this class, I envisioned 5 simple tests I could perform to test out the two databases:

  1. Populate the database with 100,000 randomly created instances
  2. Randomly increment or decrement the 'score' of each instance 20 times
  3. Get the UIDs of the items with the 10, 100, and 1000 highest scores
  4. Get the UIDs of the items with the 10, 100, and 1000 highest scores in a given category
  5. Delete all 100,000 instances from the database.
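As a reference point for the five tests, here is a hypothetical in-memory Python version of the workload with no database at all, which can be handy for checking that two database-backed implementations return the same answers. The category names, the initial score range, and all function names are assumptions; the original benchmark code is not reproduced here.

```python
import heapq
import random
from dataclasses import dataclass

CATEGORIES = ["food", "apparel", "electronics"]  # hypothetical category names


@dataclass
class Item:
    uid: int
    score: int
    category: str


def populate(n: int, rng: random.Random) -> dict:
    """Test 1: create n random items keyed by UID."""
    return {uid: Item(uid, rng.randint(0, 100), rng.choice(CATEGORIES))
            for uid in range(n)}


def jiggle_scores(items: dict, rounds: int, rng: random.Random) -> None:
    """Test 2: randomly increment or decrement every item's score `rounds` times."""
    for _ in range(rounds):
        for item in items.values():
            item.score += rng.choice((-1, 1))


def top_uids(items: dict, k: int, category=None) -> list:
    """Tests 3 and 4: UIDs of the k highest-scoring items, optionally in one category."""
    candidates = (item for item in items.values()
                  if category is None or item.category == category)
    return [item.uid for item in heapq.nlargest(k, candidates, key=lambda item: item.score)]
```

Test 5 is just items.clear() here. Note that top_uids scans every item on each query, which is what a database without a usable index on score would have to do as well.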

You can see the full code of my two solutions below: MySQL Solution and BerkeleyDB Solution

The benchmark results speak for themselves:

Benchmark Results

BerkeleyDB was about 2x-3x faster than MySQL. As expected, the code for the key-value store was considerably more complex. For example, compare the code to fetch the objects with the 10, 100, and 1000 highest scores:
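For illustration only (this is not the benchmark's code): with SQL, the top-N query is a single ORDER BY score DESC LIMIT n statement, whereas on a plain key-value store you typically maintain an ordered index of scores yourself and keep it in sync on every update. A hypothetical Python sketch of such an index:

```python
import bisect


class ScoreIndex:
    """Hypothetical hand-maintained index of (score, uid) pairs in ascending order."""

    def __init__(self) -> None:
        self._entries = []   # sorted list of (score, uid)
        self._score_of = {}  # uid -> current score, to find the stale entry on update

    def set_score(self, uid: int, score: int) -> None:
        old = self._score_of.get(uid)
        if old is not None:
            # The stale (old_score, uid) entry must be removed by hand.
            del self._entries[bisect.bisect_left(self._entries, (old, uid))]
        bisect.insort(self._entries, (score, uid))
        self._score_of[uid] = score

    def remove(self, uid: int) -> None:
        score = self._score_of.pop(uid)
        del self._entries[bisect.bisect_left(self._entries, (score, uid))]

    def top(self, k: int) -> list:
        """UIDs of the k highest scores, best first, read off the end of the index."""
        if k <= 0:
            return []
        return [uid for _, uid in reversed(self._entries[-k:])]
```

Every score change in test 2 now costs an index update as well, which is the price paid for cheap top-N reads.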


Conclusion

I'm not here to make a hard sell of key value stores. I just wanted to let people know there's a new game in town - and if your application is I/O bound, you might want to write your own small benchmark to see if the benefit is worth the cost. Especially on the next project you start from scratch.

A Simple Lisp Webapp for beginners

Introduction

Hello again, sorry for the delay, but getting Postabon (my Lisp-based startup that helps you find and share in-store discounts) off the ground over the last few weeks (both business and technology wise) has been consuming a lot of my free time :-D

This post is a simple explanation of how to set up a Common Lisp webapp. This isn't the only way to get things done, but actually building a deployable app is something that often gives Lisp newbies a bit of trouble, and these are a general set of techniques that I've been using with some success for the last few years. The key trick is that everything should be as self-contained as possible and not overly rely on anything outside of the app (i.e., the app should include all its own libraries). If you want to cut straight to the meat, check out the GitHub repo - it's engineered so it can easily be 'forked' and modified to be the basis of a project.

 

Choosing a Lisp Implementation

Before we get started, you'll have to install a Common Lisp implementation. Out of the free ones, I've had the best experience with Steel Bank Common Lisp (SBCL) -  but I've been hearing a lot of good things about Clozure Common Lisp (CCL) lately. This example is tested with, and geared towards, SBCL - although I suspect with a one line modification in the startup script it should work with CCL as well.

The one major caveat with SBCL is that the threading support is a bit annoying - it only really works on Linux (I've occasionally had some mixed success with other *nixes like Mac OS X or FreeBSD, but I would recommend against it). Threading is a requirement if you hope to get any semblance of reasonable performance out of a blocking webserver but, as of a few weeks ago, the binary distribution of SBCL has threading disabled (although there seem to be plans to change this behaviour back to the more sensible default of having threads enabled for Linux in the near future).

The unfortunate consequence of this decision is that you'll have to install SBCL from source with threading enabled. Ironically, to compile SBCL from source, you must first have a working Common Lisp installation ;-) So, what we'll end up doing is installing an SBCL binary from somewhere and then using this crippled binary to bootstrap a proper SBCL installation from source (with threading enabled).

 

Install SBCL Binary

So, the first step is to install a binary SBCL. If it's possible, I'd highly recommend you use your distribution's package manager (yum, apt, emerge, etc) to install SBCL - otherwise you can install it from http://www.sbcl.org/platform-table.html (using the INSTALL document).

Once you install a binary SBCL, test it out by running 'sbcl' from the command line. When greeted by a '*' prompt (called a REPL), verify that (+ 2 2) returns '4'. You can also try typing '*features*' at the REPL - if the resulting list contains ':SB-THREAD' then your binary installation contains threading support out of the box (and you can skip the source installation). If it doesn't, you'll need to install SBCL from source.

 

Install SBCL from Source (only after Binary)

To install SBCL from source, first we must get a copy of the source. I personally prefer Git to SVN, so I usually get a copy of the SBCL source tree from Andreas Fuchs' Git repository with a command like:

git clone git://git.boinkor.net/sbcl

Using Git is probably overkill for a casual user (since it pulls in the entire commit history, which is ~50MB), but I prefer it. After you download the source: 'cd' into the SBCL directory and create a file called 'customize-target-features.lisp' with the following contents:

Then run:

sh make.sh

sudo sh install.sh

Now that you have a customized SBCL installed from source, remove the binary copy you installed (if you used your distro's package manager) and ensure your PATH contains /usr/local/bin/. Then, run SBCL and verify that it's working and that the *features* includes :SB-THREAD.

 

Run the Webapp

Don't worry - I'll explain how the web app works in a moment, but now that you have a Lisp compiler installed you can try to run it. Start by grabbing a copy of the code with:

git clone git://github.com/smanek/trivial-lisp-webapp.git

Then you can 'cd' into trivial-lisp-webapp/scripts and run:

$ ./startserver.sh

After a bunch of stuff scrolls down the screen, you should see a line that says:

Webserver started on port 8080.

Verify that you can visit your site on http://localhost:8080, and then you're good to go.

 

Webapp Structure

There are a few noteworthy directories and files in the trivial-lisp-webapp:

  • asdf-systems: This is the directory that Another System Definition Facility (ASDF) uses to look for packages it wants to load. Every package that is loaded (including trivial-lisp-webapp) has a soft link to its '.asd' file which describes the package, its files, dependencies, and so on.
  • aux: This directory contains a bunch of ASDF enabled libraries that our webapp uses. It's generally more reliable to deploy libraries you want with your webapp than try to depend on libraries being installed system-wide. The two most notable libraries we use are Edi Weitz's Hunchentoot webserver and html-template - most of the other libraries are just dependencies for these two.
  • document-root/
    • static/: Static files that are served up by Hunchentoot (notably CSS, Javascript, and images). Anything placed in this directory is automatically served.
    • templates/: A series of templates that are compiled to generate HTML pages that users see. In the 'trivial' web app, we're basically just using html-template for server-side includes - but see the html-template documentation for instructions on how to do more complex things with them.
  • scripts: Scripts contains any scripts we need. In this case it simply contains 'startserver.sh', but in the future you could add scripts to analyze log files, update libraries, do backups, or anything else.
  • src/: Finally, the actual Lisp source code of the webapp. The files are:
    • init.lisp: A small file that bootstraps the webapp. It configures ASDF, loads dependencies, etc. when loaded by startserver.sh.
    • webapp.asd: The file that tells ASDF how to load our webapp (what dependencies are required, what should be exported, what order to load files in, etc).
    • global.lisp: A few global settings, such as paths.
    • misc.lisp: Any utility functions go here
    • pages.lisp: Sets up handlers for any pages in the webapp
    • control.lisp: Functions to start and stop the webapp

A more substantial webapp might have some more functionality for a test suite, advanced configuration, easier deployment, backups, and so on - but this makes a pretty good starting point.

 

Next Steps

Now that you've got the basics down, it's time to get hacking! Try figuring out how to add a second page to the site (hint: modify pages.lisp and add a template to document-root/templates). Make a page that accepts GET/POST parameters and saves state across visits (see the hunchentoot documentation). You'll probably want to use emacs/slime/paredit mode, and there are plenty of great tutorials out there on doing so (I'm personally a bit partial to Marco Baringer's screencast - although it's getting a bit long-in-the-tooth these days).

In production, I run my webserver (via a script similar to, but slightly more powerful than, startserver.sh) under the GNU Screen multiplexer (instead of as a daemon). I also set up a reverse proxy (nginx) in front of Lisp to shield Hunchentoot from malformed requests, provide faster gzip compression, and serve more resilient error pages.

If people find this helpful, and there's demand for it, I can write another post on how to expand our trivial-lisp-webapp to add persistence via a database. We could eventually even work our way up to building a very minimal MVC framework for Hunchentoot.

Rapid Iteration and the Effect of Network Effects

Hi Guys! This is John Buchanan, co-founder and COO of Postabon. I'm a former Investment Banker, Harvard Business School graduate, and new father. I thought I'd share some thoughts about the business and motivation behind Postabon, since I have a relatively unique perspective on this. Shaneal's been a little busy with the tech side of things since the launch (although he promises me he'll post another blog entry this week too!).

As some background, Postabon (www.postabon.com) is a crowd-sourcing platform that encourages users to find and share in-store discount recommendations. Users can post shopping recommendations from their mobile phone or PC that are then viewable in real time by any other user of the platform. We currently have 800 recommendations posted by users in Manhattan, ranging from after-holiday sales to happy hours and stores with generous return policies.

During the two years I was a student at Harvard Business School, I read over 800 business cases that outlined a problem facing a "protagonist" or a real-life business manager. During each 90-minute class, a group of students (often unfairly) evaluates the manager, their industry, and the situation at hand. While I often complained about the amount of reading, I found the training helpful while developing a business plan for Postabon, the company I co-founded with another HBS grad this June. Network effects, a concept I learned about at HBS, are an important aspect of our business model and the subject of this blog post.

Two trends led us to start the business: 1) intense consumer demand for discounts and 2) a change in the way people use mobile devices. Initially this led us to a conclusion similar to that of mobile couponers like Cellfire, Yowza, and others: partner with advertisers to offer location-based discounts. While some of these mobile coupon providers have gained traction, we identified a flaw in this business model related to network effects.

The concept of network effects is based on the idea that the value of a networked service increases with each additional user. Think of owning a telephone: it's worthless if you're the only person who owns one, but as more telephones are added to the network its value grows much faster than the number of phones, since each new phone can call every existing one. In the case of mobile couponers, network effects apply to both users and advertisers. From the user's perspective, a network of local coupons becomes more valuable as the density and quality of discounts improve. From the advertiser's perspective, the value of advertising on a coupon network improves as users are added.
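One way to make the telephone example concrete is to count possible connections, the reasoning behind what is often called Metcalfe's law. The sketch below assumes, as a simplification, that every pair of users is one potentially useful link; the function name `possible_connections` is just for illustration. The count grows roughly with the square of the number of users, which is why each new user adds more potential value than the one before.

```python
def possible_connections(n: int) -> int:
    """Number of distinct pairs among n users (each pair is one possible call)."""
    return n * (n - 1) // 2


# Adding phones increases the number of possible calls far faster than linearly.
for users in (1, 2, 10, 100, 1000):
    print(f"{users:>5} phones -> {possible_connections(users):>7} possible connections")
```

Going from 100 to 1,000 phones multiplies the phones by 10 but the possible connections by roughly 100.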

Several mobile coupon companies build sales forces to persuade large retailers or CPG companies to provide content on their networks. Without users, it's hard to convince these national players to publish content. For example, despite a fair amount of capital and some success (especially among grocers and West Coast businesses), Cellfire has no deals posted in Manhattan. Without deals, there's little reason for me or other New Yorkers to use the Cellfire platform. Network effects create a barrier on both sides of Cellfire's business model and have prevented mainstream success.

We strive to make Postabon a very agile business that can quickly change to find Product/Market Fit. Shaneal has done some work for the military and likes to talk about the OODA (Observe-Orient-Decide-Act) Loop and Boyd's Law, which roughly says that the team that can iterate fastest wins. As soon as we realized some of the downsides of an advertiser-based approach, we adapted the Postabon platform to take full advantage of network effects, focusing it more on user-generated content and deal finding.

Postabon's crowd-sourcing approach was developed to address the network-effect constraints we identified in these other companies. Crowd sourcing allows us to generate significant content density on our network at minimal cost. Over the past six weeks we've generated over 1,000 posts, 800 of them in Manhattan, or approximately 80 deals per square mile. The closest competitor has approximately 200 deals on the entire island of Manhattan. As we continue to attract more users who post, review, and act on deals, our platform becomes more valuable to local and national advertisers. While Postabon also faces user-side network effects, we believe that a small number of active posters quickly creates a platform that is more valuable than the alternatives, and that we can scale quickly in a city like New York at minimal cost.

While the first month of our launch has significantly exceeded our expectations, Postabon still has a ways to go. We're looking forward to expanding in New York and other cities and continuing to prove our business model.

Why I chose Common Lisp over Python, Ruby, and Clojure

A few months ago, two co-founders (Stu Wall and John Buchanan) and I (Shaneal Manek) started working on a startup called Postabon.

The idea behind Postabon is simple: we wanted to create a platform where users could find and share ‘deals’ at brick and mortar stores (be they sales, coupons, happy hours, specials, etc). For example, if I’m out near a mall and need a pair of jeans I can pull out my phone and see which store near me is having the best sale on pants right now.
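Answering "which store near me has a good deal right now" boils down to a radius query over geotagged deals. The sketch below is a hypothetical, brute-force version and not Postabon's implementation: the `Deal` record, the store names, and the coordinates are all made up, and it simply scans every deal and computes great-circle (haversine) distances. With many deals, a real system would put a spatial index in front of this scan.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Deal:
    store: str
    description: str
    lat: float
    lon: float


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def deals_near(deals, lat: float, lon: float, radius_miles: float):
    """Return (distance, deal) pairs within radius_miles, nearest first (linear scan)."""
    hits = []
    for deal in deals:
        d = haversine_miles(lat, lon, deal.lat, deal.lon)
        if d <= radius_miles:
            hits.append((d, deal))
    hits.sort(key=lambda pair: pair[0])  # sort on distance only
    return hits


# Made-up sample data for illustration.
SAMPLE_DEALS = [
    Deal("Example Jeans Co.", "20% off denim", 40.7590, -73.9845),
    Deal("Sample Shoes", "Buy one, get one sneakers", 40.7484, -73.9857),
    Deal("Demo Bakery", "Happy-hour pastries", 40.6782, -73.9442),
]

if __name__ == "__main__":
    # A shopper standing near Times Square asks for deals within one mile.
    for dist, deal in deals_near(SAMPLE_DEALS, 40.7580, -73.9855, 1.0):
        print(f"{deal.store}: {deal.description} ({dist:.2f} mi)")
```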

I just wanted to talk about a few of the high-level technical decisions I’ve made, in the hopes that they could help other people starting new projects (and that I can get some feedback and learn something myself). This post is going to be pretty tightly focused on the language I chose. I have a few other posts in mind, on topics such as the database (BerkeleyDB) and the overall architecture, that I’m planning to write up in the next week or two.

Language

The way I saw it, as the sole programmer I needed a language that was concise and powerful and that let me work quickly. This set of requirements, in my mind, eliminated Java (and I don’t know any C# …), which left me considering Python, Ruby, Clojure, and Common Lisp. I thought Haskell and Erlang were promising – but I’m just too inexperienced with them to commit to a large project (and, for better or worse, they aren’t really known as great languages for web applications).

Python

I am fairly experienced with Python, there are lots of great libraries/frameworks for anything I would want to do, and it would be easy to bring other programmers on-board later. However there were a few negatives that, in aggregate, were enough to get me to move on.

First, and most importantly, the Python 2 to 3 conversion really scared me. Most libraries I wanted to use were still Python 2 only – which meant I would have had to write Postabon’s back end in Python 2. But it makes no sense to me to write a large app, that I may have to maintain for years, in a language that has effectively received a death sentence. Python 2 is fine now – but in the coming months and years new libraries, features, and performance improvements are only going to be introduced in Python 3, and I didn’t want to get left behind or forced to take on an expensive and time consuming port in the future.

Second, I know it’s a bit cliché, but I don’t like the Global Interpreter Lock, which prevents threads from executing Python code in parallel and so makes it basically impossible to write a multi-threaded app that uses multiple CPUs for CPU-bound work. Of course, writing a multi-process app would be a reasonable work-around, but it is a bit of an annoyance.
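The multi-process work-around mentioned above looks roughly like this with the standard library's `concurrent.futures`. Each worker is a separate process with its own interpreter and its own GIL, so CPU-bound tasks can run on different cores at once. The prime-counting function is only a hypothetical stand-in for real CPU-heavy work.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Deliberately CPU-bound stand-in: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_parallel(limits):
    # Each call to count_primes runs in a worker process with its own
    # interpreter (and its own GIL), so the tasks can use separate cores.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":  # required so worker processes can import this module safely
    print(count_primes_in_parallel([10_000, 20_000, 30_000, 40_000]))
```

The cost of this approach is that arguments and results are pickled and copied between processes, so it works best when each task is coarse-grained.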

Finally, Guido’s disdain for functional programming makes it clear that I would be a second-class citizen in Python-land. For a few small examples, see:

My mind just works functionally, and I don’t want to be forced to fight the language I’m using at every turn.
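Two commonly cited sources of this friction, offered here as general illustrations, are that Python 3 moved `reduce` out of the builtins into `functools`, and that CPython deliberately does not eliminate tail calls, so even tail-recursive code hits the interpreter's recursion limit. The function names below are hypothetical.

```python
import operator
import sys
from functools import reduce  # reduce is no longer a builtin in Python 3


def product(xs):
    """A fold over xs with multiplication, via functools.reduce."""
    return reduce(operator.mul, xs, 1)


def count_down(n):
    """Tail-recursive in form, but CPython still pushes a new frame per call."""
    return n if n == 0 else count_down(n - 1)


def tail_recursion_hits_limit() -> bool:
    """True if recursing past the interpreter's limit raises RecursionError."""
    try:
        count_down(sys.getrecursionlimit() + 100)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    print(product([1, 2, 3, 4, 5]))
    print(tail_recursion_hits_limit())
```

In a language with guaranteed tail-call elimination, `count_down` would run in constant stack space; in CPython the idiomatic fix is to rewrite it as a loop.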

Ruby

I’ve played with Ruby some (mostly in the context of Rails), although I’m nowhere near as proficient with it as I am with Python, Lisp, etc. Ruby has a lot of the same strengths as Python, with fewer weaknesses. My criticisms of Python’s GIL apply to it too, but again, simply using processes is an acceptable work-around.

The biggest reason I chose not to go this route is that the Ruby community is just moving too fast for me right now. Some major component of the preferred development/production stack seems to change every 6 months (e.g., I’ve seen the webserver go from FastCGI/Apache to Mongrel to Phusion to Unicorn). I couldn’t even easily figure out which version (1.8 or 1.9?) to use, or even which implementation (Ruby MRI, Ruby EE, JRuby, etc.). Most of the articles I found online were a few months old, and I was told they were no longer accurate.

Also, much of the Ruby community is built around Rails, and I’m a bit wary of using heavyweight frameworks like Rails (or Django) on large custom projects. In my experience they make the first 90% of what I’m trying to do really easy, but then make the last 10% a living hell, since I need to modify something the framework never intended me to control. I probably could have written a bare-bones implementation of the site’s back end in Rails in one week instead of two, but I would rather ‘waste’ that one week up front to have more flexibility later.

For example, I ended up writing my own completely stateless session handling, building a fairly smart geo-spatial cache (in-memory R* trees that asynchronously persist to disk using B-trees), and using a key-value store and raw B-trees for persistence (instead of a relational database). These (and a lot of other non-standard decisions I’ve made) are possible within Rails, but I think they would have cost me more time and energy than Rails would have saved up front – especially in light of my lack of experience with Rails.
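The post doesn’t describe how its stateless session handling works. One common way to get sessions without any server-side session store, shown here purely as an illustrative sketch, is to sign the session data with an HMAC and hand it to the client (for example, as a cookie); the server then needs only the secret key to verify it. The token format, `SECRET_KEY`, and function names below are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical; load from config in practice


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def make_session_token(user_id: int, ttl_seconds: int = 3600, now=None) -> str:
    """Encode the user id and an expiry time, then append an HMAC-SHA256 signature."""
    now = time.time() if now is None else now
    payload = json.dumps({"uid": user_id, "exp": int(now) + ttl_seconds},
                         separators=(",", ":")).encode()
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(sig)


def read_session_token(token: str, now=None):
    """Return the user id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or bad base64 (binascii.Error subclasses ValueError)
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    data = json.loads(payload)
    if data["exp"] < now:
        return None
    return data["uid"]
```

The trade-off is that a signed token can’t be revoked before it expires without reintroducing some server-side state (such as a blocklist), so short expiry times are typical.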

In principle, I could see Ruby being the right choice for someone who is more experienced with it up front, is adequately plugged into the community, and is willing and able to switch out components of their stack. But, personally, I prefer a bit more stability.

Clojure

There are a lot of great things about Clojure: it runs on the JVM so I get all the Java libraries and that great JVM performance, it’s functional from the ground up, and it even has macros.

However, 6 months ago Clojure hadn’t even had its 1.0 release (and the language was constantly changing). When I tried to download it, the SLIME integration was completely broken, and I had to manually search through the SVN repos of several key components to find a relatively recent set of tools that worked together.

My feeling is that things are better now (Clojure is 2 years old!) and if I were making this decision again today, I would give much more serious consideration to Clojure.

Common Lisp

Finally, that brings me to Common Lisp. I have plenty of experience writing web apps in Lisp, so the high barrier to entry wasn’t a deterrent in my case. Make no mistake, though: the learning curve for writing a good CL web app is steep enough that I would warn most programmers away from Lisp for a production app on a tight schedule (I would like to write a post or two that help new users get over the hump, though).

I like that the language stabilized 15 years ago, that the kinks have been worked out, and that it has stood the test of time. I know the traditional complaint is the dearth of libraries, and there obviously aren’t as many options as there are for other languages. I have been fortunate to find several great options for everything I’ve needed to do so far (JSON/XML parsing, HTTP servers and clients, ORMs, etc.), but more obscure needs like Thrift and OpenID support may be an issue in the future. The lack of libraries is, without a doubt, the biggest disadvantage of CL and one of the reasons Clojure is so appealing to me. I can usually just write my own bindings to a C library using a foreign function interface, but that’s really time-consuming compared to downloading an egg/gem/jar.

The ability to use dynamic typing for most of my code, but optionally give the compiler type hints and get all the performance of a statically typed language for critical portions, is a killer feature that I still haven’t found elsewhere.

On balance, I think that Common Lisp was the best choice for me given my background and the needs of this project.

Status

Postabon just launched publicly in New York! We are up to about 5K lines of Common Lisp (and well over 15K LoC total, including a website, mobile website, and iPhone app), and things have been progressing smoothly.

The next post I’ll write (probably in a few days) will cover our site’s general architecture. After that, I plan to dig into some specifics about the best way to manage and deploy a production Common Lisp webapp, which I hope will help new Lispers get off the ground.