Three Weeks with CouchDB

In early August, I started work with a new client - CodeBaby. Their main product is a Flash animation that interacts with the user of a particular page on a customer's site.

CodeBaby collects data on this interaction, and each animation has many events that need to be captured and stored. I joined the project to help build a system that will collect this data and make it available for analysis and reporting.

After diving into the problem, it quickly became apparent that storing the data in a conventional relational database would not work for long. Currently, the animation event data is stored in a MySQL database. This works - to a point. CodeBaby is growing quickly, and after analyzing some numbers, it was clear that the data storage requirements would reach the limits of MySQL, or any relational database system, pretty quickly.

So I began to investigate alternatives. CouchDB quickly made its way to the top of the list, so I set out to build a prototype system using CouchDB and Rails.

The first step was to set up a CouchDB server and learn how it works. I was able to get CouchDB 0.9.1 up and running pretty quickly on both Amazon EC2 and Rackspace Cloud Servers using Ubuntu 9.04. Once it was running, I was able to use Futon, the web interface to CouchDB. Futon is actually a very good utility. You can do almost everything you need to in the browser, including changing server config values (you occasionally have to restart the server via the command line).

Once I had CouchDB running, I needed to load sample data into a database. Using Ruby and the couchrest gem, this turns out to be very easy. In short order I had a sample database set up and populated with about 100K documents. At several hundred MB, this is a nice size to work with for development purposes. I was able to try out CouchDB's replication support between the EC2 server and the Rackspace server. Replication in CouchDB is easy and painless. CouchDB supports much more advanced replication functionality, but it's very good at the basics.
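To give a feel for what loading sample data involves, here is a minimal sketch that builds the JSON body for CouchDB's `_bulk_docs` endpoint using only the Ruby standard library. This is not the couchrest code I used; the event fields, database name, and host below are made up for illustration.

```ruby
require 'json'
require 'securerandom'

# Build the JSON body for CouchDB's POST /{db}/_bulk_docs endpoint.
# The event fields (animation_id, event, at) are hypothetical sample data.
def bulk_docs_payload(events)
  docs = events.map do |event|
    # Assign a 32-character hex _id on the client side.
    { '_id' => SecureRandom.uuid.delete('-') }.merge(event)
  end
  JSON.generate('docs' => docs)
end

sample_events = Array.new(3) do |i|
  { 'animation_id' => 'demo', 'event' => 'click', 'at' => 1_250_000_000 + i }
end

payload = bulk_docs_payload(sample_events)
# The payload would then be POSTed to a URL such as
# http://localhost:5984/events_dev/_bulk_docs (host and database name assumed).
```

Sending documents in batches like this, rather than one request per document, is the usual way to populate a development database quickly.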

At this point, I should note that CouchDB is a good answer for certain problems, but not others. If you plan on building a typical Rails web app, I would still recommend MySQL or another traditional relational database. CouchDB is the better choice for problems that don't have a well-defined schema, or where you need to store a lot of data that does not change often and need to be able to read and sort that data - a pretty good description of what CodeBaby needs.

After getting CouchDB set up and populated with data, it was time to learn about CouchDB views. If you are like me, and have spent your development career using SQL and a relational database, be prepared to spend a solid two weeks learning views. A CouchDB view is similar to a view in MySQL, in that it lets you retrieve a subset of data, but the similarity ends there. CouchDB views are built using JavaScript and the MapReduce algorithm. Results of the MapReduce calculations are indexed and stored in a B-tree on disk. Essentially, this means that the first time you run a view in CouchDB, it takes time: minutes or hours, depending on how many documents you are storing. After that, views are very fast, and this is the real power of CouchDB. If you have a CouchDB database containing hundreds of millions of documents, you will be able to filter and view those documents much faster than if you had the same data in MySQL.
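Real CouchDB views are written in JavaScript, but the idea is easier to see in a few lines of plain Ruby. The sketch below only simulates what a view does: a map step emits key/value rows for each document, the rows are kept sorted by key, and a reduce step combines the values per key. The documents and field names are hypothetical.

```ruby
# Hypothetical event documents; the field names are made up for illustration.
DOCS = [
  { '_id' => 'a', 'animation_id' => 'intro', 'event' => 'play' },
  { '_id' => 'b', 'animation_id' => 'intro', 'event' => 'click' },
  { '_id' => 'c', 'animation_id' => 'help',  'event' => 'play' }
].freeze

# Map step: called once per document, emitting zero or more [key, value] rows.
def map_doc(doc)
  return [] unless doc['animation_id']

  [[doc['animation_id'], 1]]
end

# Reduce step: combines the values emitted for one key (here, a count).
def reduce_values(values)
  values.sum
end

# Build the "index": all emitted rows sorted by key, as a view stores them.
def build_view(docs)
  docs.flat_map { |doc| map_doc(doc) }.sort_by { |key, _value| key }
end

# Query the view grouped by key, similar to asking CouchDB for ?group=true.
def query_grouped(rows)
  rows.group_by { |key, _value| key }
      .map { |key, pairs| [key, reduce_values(pairs.map { |_key, value| value })] }
      .to_h
end

rows = build_view(DOCS)
counts = query_grouped(rows)
```

The expensive part in CouchDB is the equivalent of `build_view`, which has to visit every document once. After that, queries only read the sorted index, which is why they are fast.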

CouchDB seemed to be a great fit. I was able to build a prototype system with CouchDB and Rails pretty quickly on Amazon's EC2 infrastructure. But when it came time to run performance tests, I discovered some issues.

Because of the way CouchDB indexes and stores views on disk, writes to the database are serialized. This was a bottleneck. Using specific data from CodeBaby, I was able to get about 300 document inserts per second. This was after optimizations such as generating each document's '_id' value, rather than letting CouchDB handle it. 300 inserts per second turned out to be too slow, and short of setting up a complicated system of multiple databases and intelligent load balancers (using a consistent hashing algorithm), I came to the conclusion that CouchDB would not work for this project.
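For the curious, the consistent hashing idea mentioned above can be sketched in a few lines of Ruby. This is not something I built for the project; it is a minimal illustration of how document ids could be spread across several databases, and the database names and replica count are assumptions.

```ruby
require 'digest'

# A small consistent-hash ring that maps document ids to database names.
class HashRing
  def initialize(nodes, replicas: 100)
    @ring = {}
    nodes.each do |node|
      # Place each node at many points on the ring to even out the spread.
      replicas.times { |i| @ring[hash_of("#{node}:#{i}")] = node }
    end
    @sorted_points = @ring.keys.sort
  end

  # Return the node that owns the first ring point at or after the key's hash,
  # wrapping around to the start of the ring if there is none.
  def node_for(key)
    key_hash = hash_of(key)
    point = @sorted_points.bsearch { |pt| pt >= key_hash } || @sorted_points.first
    @ring[point]
  end

  private

  # Use the first 32 bits of an MD5 digest as the position on the ring.
  def hash_of(value)
    Digest::MD5.hexdigest(value)[0, 8].to_i(16)
  end
end

# Database names are hypothetical.
ring = HashRing.new(%w[events_a events_b events_c])
target = ring.node_for('doc-12345')
```

The appeal of this scheme is that adding a database only moves the documents that now belong to the new database; everything else stays where it was. The cost is exactly the extra routing layer that made this option unattractive for the project.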

To be honest, I was a bit disappointed. I like everything about CouchDB, from its simple API to the way you write views in JavaScript. CouchDB is a solid product and I'll definitely consider using it again in a future project.

Announcing Quiltivate.com Quilt Builder

I'm pleased to announce that one of my long-term side projects has launched a new service today.

Quiltivate.com is a community for quilters. You can browse hundreds of quilt block patterns and see what other quilters have built.

Today, Quiltivate launches a new service called the Quilt Builder.

What do I know about quilting? A lot more than I used to. My wife Kacie is the real expert though. I've been working with her over the past year or so to build a service for quilters to make their life easier. When it comes to planning a quilt, graph paper and colored pencils still rule the day. And if you are unlucky enough to spend money on the software available to 'help' you plan a quilt, you'll be ready to gouge your eyes out with those colored pencils after you try to figure out the software. The leading software available for quilters today costs well over $100 and you have to take a class to learn how to use it. And it requires Windows.

So we decided to create something simple and easy for quilters. Give it a try. I'm willing to bet that if you are reading this, you are like me and don't know much about quilting. But I'm also willing to bet you can figure out the Quilt Builder.

Quiltivate is built on Rails, of course. The Quilt Builder is a Flex application, talking to Rails behind the scenes. I had quite a bit of help building the Flex app from Daniel Wanja over at OnRails.org.

I've invested a lot of time, energy, and money in this project, and I'm really excited to see it launch. Kacie and I have built this service by continually reminding ourselves that good design matters and that 'less is more' - we are definitely trying to under-do the competition. We think the Quilt Builder does one thing extremely well. If you know any quilters out there, be sure to let them know about Quiltivate.com!

Merchant Account Lessons Learned

If you've ever been lucky enough to go through the process of setting up a merchant account and payment gateway, you know it can be a long and tricky process. The terminology, the fees, the forms to fill out...it's not fun. But it's a necessary evil. The alternative is PayPal, so let's not even go there.

Once you have a merchant account set up, it tends to run fairly smoothly. You pay the monthly fee and money shows up in your account, minus the occasional chargeback.

But what happens when you need a second merchant account? Why would you even need a second merchant account, or a third or fourth for that matter?

If I told you I can bring $250,000 in revenue to your business every month, you would probably do whatever it takes to work with me. But it's not that way with merchant account providers. Surprisingly, merchant accounts have a monthly cap on the total amount of transactions. So if your business is doing $75,000 per month in transactions and your growth shows that you will be at $100,000 next month, and your merchant account is capped at $80,000 per month, you had better find a second merchant account soon.

You would think that the more money you send to a company the better, but not with merchant account providers. It turns out they do not like chargebacks. The more transactions you send their way, the more likely they are to have to process and pay chargebacks - which eventually get billed to you, but not immediately.

So if your business is processing a high dollar amount in transactions each month, you will end up with multiple merchant accounts. This causes some interesting problems in whatever application or billing system you are running. First, refunds must go through the same merchant account as the original transaction. Merchant account providers will scream if you charge a customer $20 on a separate merchant account and refund the customer on their system - and for good reason. It's easy to commit fraud when you can't trace the flow of money through a system. Second, if you 'use up' the monthly limit on one account halfway through the month, you must send all new transactions to a different account. You will end up creating what amounts to a 'load balancer' for merchant accounts to ensure that you do not reach the limit on one merchant account too soon.
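The two rules above can be sketched in Ruby roughly as follows. This is a simplified illustration, not the system we built: the account names and caps are invented, amounts are whole dollars, and a real version would call the payment gateway and persist its state.

```ruby
# Routes charges to the merchant account with the most remaining monthly
# headroom, and sends refunds back to the account that took the charge.
class MerchantRouter
  Account = Struct.new(:name, :monthly_cap, :used)

  def initialize(caps)
    @accounts = caps.map { |name, cap| Account.new(name, cap, 0) }
    @charges = {} # transaction id => name of the account that processed it
  end

  # Pick an account that can still fit the amount under its cap, record the
  # charge against it, and return the account name.
  def charge(txn_id, amount)
    account = @accounts
              .select { |a| a.used + amount <= a.monthly_cap }
              .max_by { |a| a.monthly_cap - a.used }
    raise 'no merchant account has capacity left this month' unless account

    account.used += amount
    @charges[txn_id] = account.name
    account.name
  end

  # A refund must go through the account that handled the original charge.
  def refund(txn_id)
    @charges.fetch(txn_id)
  end
end

# Hypothetical accounts and monthly caps.
router = MerchantRouter.new('acct_one' => 80_000, 'acct_two' => 50_000)
first  = router.charge('t1', 60_000)
second = router.charge('t2', 40_000)
```

Choosing the account with the most headroom is only one possible policy. Spreading volume in proportion to each cap, or preferring the account with the lowest fees, would fit the same structure.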

The client I am working for processes a large number of transactions each month, and we have done what I've outlined above. Our team has essentially learned on the fly and built a system that handles refunds and charges correctly: a load balancer for merchant accounts. Along the way, we've added a half dozen merchant accounts to the system, and more are on the way. Once again, the ActiveMerchant library has been great. Some of the new merchant accounts were supported already, and for others we've had to add support. Here are two that we are using that will someday make it into the ActiveMerchant library:

ActiveMerchant Support for the JetPay Gateway

ActiveMerchant Support for the FirstPay Gateway

An interesting side note: many companies deal with this multiple merchant account issue, and it can be very tricky. In particular, the chargebacks can really cause problems. A few years ago, the Denver-based airline Frontier had to file for bankruptcy. The cause, in simple terms, had to do with merchant accounts and chargebacks. So many flyers were telling their card company not to pay for already purchased tickets (for weather delays, cancellations, etc.) that Frontier ended up having cashflow issues with their merchant providers. The only solution was to file for bankruptcy and fix the system.

What I've Been Doing for the Past Year

It's been over a year since I last updated this area of the blog. I've been extremely busy, with both work and life. My wife and I had our first baby in early December 2008. These past few months have been amazing, but most of my time in 2008 was spent working hard for one client.

So what have I been doing? Simply put, I've been building a large system using both Ruby and Rails. Remember that small project for ID Watchdog (IDW) I built? Well, it turned into something much larger.

After launching the new signup system, the team I'm working with continued to iterate, build, and maintain applications for IDW. Legacy systems needed to be migrated to the new system we had built. New processes had to be designed and implemented. Then the real fun began: IDW had an IPO in August 2008, and since then IDW's growth has been off-the-charts. All of the systems we built in the first half of 2008 needed to be modified to handle this tremendous growth. New third party APIs needed to be added. Databases had to be optimized. Servers were added. More developers were hired. In short, the project grew very fast.

Some of the highlights while working with IDW in 2008:

I designed and built the core functionality for IDW's monitoring process using Ruby and BackgrounDRb. I designed and built a distributed billing system in BackgrounDRb. I wrote code to interface with a handful of third party APIs. Some of the code is set to become open source patches to the ActiveMerchant library. Our team made the migration from Subversion to Git, and never looked back. We built an API for the IDW system, which is now in use by hundreds of companies every day. We added servers as needed to handle the load. Finally, as the system began to push the limits of BackgrounDRb, I moved the background processing components to the delayed_job queue system.

In a nutshell, this is what I do. I build applications and systems for companies. Often I will start with a small piece of the puzzle, and work with the company over time to make sure the system will grow and scale as the company grows. Both Ruby and Rails are perfect for companies like IDW, where flexibility and short iterations are necessary to keep pace with the company's growth.

I've been doing this for over 10 years now, and it's what I love to do. I'm passionate about building applications and systems. I'm fortunate that I can work on projects that are interesting, using tools I love, with a great team of developers.

Open Source Ruby Library for the Merlin API

Along with the previous post, I would like to announce the ruby-merlin library.

If you are working with Merlin API data inside a Ruby or Rails application, be sure to try the library.

Open Source Ruby Library for the IDology API

While working on a recent project, I wrote a Ruby library for interfacing with the IDology API.

If you are building a Ruby or Rails application and need to work with IDology data, have a look at the ruby-idology library.
