Articles tagged with the "Database" category:

Three Weeks with CouchDB

In early August, I started working with a new client, CodeBaby. Their main product is a Flash animation that interacts with visitors to a particular page on a customer's site.

CodeBaby collects data on this interaction, and each animation has many events that need to be captured and stored. I joined the project to help build a system that will collect this data and make it available for analysis and reporting.

After diving into the problem, I quickly realized that storing the data in a conventional database would not work. The animation event data is currently stored in a MySQL database. This works, up to a point. CodeBaby is growing quickly, and after analyzing some numbers, it was clear that the data storage requirements would soon reach the limits of MySQL, or of any relational database system.

So I began to investigate alternatives. CouchDB quickly made its way to the top of the list, so I set out to build a prototype system using CouchDB and Rails.

The first step was to set up a CouchDB server and learn how it works. I was able to get CouchDB 0.9.1 up and running pretty quickly on both Amazon EC2 and Rackspace Cloud Servers using Ubuntu 9.04. Once it was running, I could use Futon, CouchDB's web interface. Futon is actually a very good utility: you can do almost everything you need in the browser, including changing server configuration values (occasionally you still have to restart the server from the command line).

Once I had CouchDB running, I needed to load sample data into a database. Using Ruby and the couchrest gem, this turned out to be very easy. In short order I had a sample database set up and populated with about 100K documents. At several hundred megabytes, this is a nice size to work with for development purposes. I also tried out CouchDB's replication between the EC2 server and the Rackspace server. Replication in CouchDB is easy and painless. CouchDB supports far more advanced replication setups than the one I tried, but it is very good at the basics.
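A minimal sketch of how sample documents might be prepared in Ruby before loading. The document shape is an assumption: the field names `animation`, `event` and `created_at`, and the helper names `sample_event` and `sample_batches`, are hypothetical and not CodeBaby's actual schema. It generates each `_id` on the client and groups documents into batches, so each batch can go to CouchDB in one bulk request rather than one HTTP request per document. The couchrest calls are left in a comment because they need the gem and a running server.

```ruby
require "securerandom"
require "time"

# Hypothetical shape of one animation event document. The field names are
# illustrative only; they are not CodeBaby's actual schema.
def sample_event(i)
  {
    "_id"        => SecureRandom.uuid,                # generated on the client
    "type"       => "animation_event",
    "animation"  => "anim-#{i % 50}",
    "event"      => %w[load play click close][i % 4],
    "created_at" => (Time.utc(2009, 8, 1) + i).iso8601 # one event per second
  }
end

# Build `count` sample documents and split them into batches of
# `batch_size`, so each batch can be written in one bulk request.
def sample_batches(count, batch_size)
  (0...count).map { |i| sample_event(i) }.each_slice(batch_size).to_a
end

# With the couchrest gem and a running CouchDB server, loading might look
# roughly like this (the database URL is a placeholder):
#
#   require "couchrest"
#   db = CouchRest.database!("http://127.0.0.1:5984/events_sample")
#   sample_batches(100_000, 1_000).each { |batch| db.bulk_save(batch) }

p sample_batches(2_500, 1_000).map(&:size) # prints [1000, 1000, 500]
```

Batching is the main design choice here: it trades a little memory per batch for far fewer round trips to the server.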

At this point, I should note that CouchDB is a good answer to certain problems, but not others. If you plan on building a typical Rails web app, I would still recommend MySQL or another traditional relational database. CouchDB is the better choice for problems that don't have a well-defined schema, or where you need to store a lot of data that does not change often and still be able to read and sort that data. That is a pretty good description of what CodeBaby needs.

After getting CouchDB set up and populated with data, it was time to learn about CouchDB views. If you are like me and have spent your development career using SQL and a relational database, be prepared to spend a solid two weeks learning views. A CouchDB view is similar to a view in MySQL in that it lets you retrieve a subset of data, but the similarity ends there. CouchDB views are written in JavaScript and built with MapReduce. The results of the MapReduce calculations are indexed and stored in a B-tree on disk. In practice, this means that the first time you query a view, CouchDB has to build that index, which can take minutes or hours depending on how many documents you are storing. After that, views are very fast, because CouchDB only has to update the stored index with documents that have changed. This is the real power of CouchDB: if you have a CouchDB database containing hundreds of millions of documents, you will be able to filter and view those documents much faster than if you had the same data in MySQL.
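CouchDB view functions are written in JavaScript, so the Ruby below is only an in-memory model of the idea, not CouchDB's implementation or API. It shows the pieces described above: a map function that emits [key, value] rows, rows kept sorted by key (the role of the on-disk B-tree), and a reduce function applied over a key range or once per key. `build_view` stands in for the slow first build of the index; CouchDB persists that result and only updates it afterward. The function names `build_view` and `query_view` and the sample documents are hypothetical.

```ruby
# Map phase: run the map function over every document. The map function
# receives an `emit` callable and may emit any number of [key, value] rows.
# Rows are sorted by key, mimicking the ordered index CouchDB keeps on disk.
def build_view(docs, map_fn)
  rows = []
  emit = ->(key, value) { rows << [key, value] }
  docs.each { |doc| map_fn.call(doc, emit) }
  rows.sort_by { |key, _value| key }
end

# Query phase, loosely modelled on CouchDB's startkey/endkey (inclusive key
# range) and group=true (reduce once per distinct key) query options.
def query_view(rows, reduce_fn: nil, startkey: nil, endkey: nil, group: false)
  hits = rows.select do |key, _value|
    (startkey.nil? || (key <=> startkey) >= 0) &&
      (endkey.nil? || (key <=> endkey) <= 0)
  end
  return hits if reduce_fn.nil?
  return [[nil, reduce_fn.call(hits.map(&:last))]] unless group

  # Equal keys are adjacent after sorting, so reduce each run of one key.
  hits.chunk_while { |a, b| a[0] == b[0] }
      .map { |run| [run[0][0], reduce_fn.call(run.map(&:last))] }
end

# Hypothetical event documents.
events = [
  { "animation" => "intro", "event" => "play" },
  { "animation" => "intro", "event" => "click" },
  { "animation" => "promo", "event" => "play" },
  { "animation" => "intro", "event" => "play" }
]

# Ruby stand-in for the JavaScript map function
#   function(doc) { emit([doc.animation, doc.event], 1); }
by_event = ->(doc, emit) { emit.call([doc["animation"], doc["event"]], 1) }
sum = ->(values) { values.sum }

rows = build_view(events, by_event) # the expensive, one-time step
p query_view(rows, reduce_fn: sum, group: true)
# prints [[["intro", "click"], 1], [["intro", "play"], 2], [["promo", "play"], 1]]
p query_view(rows, reduce_fn: sum, startkey: ["intro", "play"], endkey: ["intro", "play"])
# prints [[nil, 2]]
```

The option names mirror CouchDB's `startkey`, `endkey` and `group` query parameters, but key ordering here is plain Ruby `<=>`, which is not the same as CouchDB's own collation rules.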

CouchDB seemed to be a great fit. I was able to build a prototype system with CouchDB and Rails pretty quickly on Amazon's EC2 infrastructure. But when it came time to run performance tests, I discovered some issues.

Because of the way CouchDB stores and indexes documents on disk, writes to a database are serialized, and this became the bottleneck. Using CodeBaby's own data, I was able to get about 300 document inserts per second. This was after optimizations such as generating each document's '_id' value on the client rather than letting CouchDB assign it. 300 inserts per second turned out to be too slow, and short of setting up a complicated system of multiple databases and intelligent load balancers (using a consistent hashing algorithm), I came to the conclusion that CouchDB would not work for this project.
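The multi-database setup mentioned above would need a way to decide which database each document is written to. A consistent hash ring is one common approach; the sketch below is an assumed design, not something from the original project. The `HashRing` class, the use of `Zlib.crc32` for positions, 100 virtual nodes per database, and the database URLs are all illustrative choices. Because each `_id` is already generated on the client, the router can pick a database before the write, and removing a database only moves the documents that were assigned to it.

```ruby
require "zlib"
require "securerandom"

# Minimal consistent-hash ring for routing documents to one of several
# CouchDB databases. Each database is placed on the ring many times
# ("virtual nodes") so that ids spread out more evenly.
class HashRing
  def initialize(nodes, replicas: 100)
    @replicas = replicas
    @ring = {}        # ring position => node
    @positions = []   # sorted ring positions
    nodes.each { |node| add(node) }
  end

  def add(node)
    @replicas.times { |i| @ring[position("#{node}##{i}")] = node }
    @positions = @ring.keys.sort
  end

  def remove(node)
    @ring.delete_if { |_pos, n| n == node }
    @positions = @ring.keys.sort
  end

  # A key belongs to the first node at or after its hash, wrapping
  # around to the start of the ring if necessary.
  def node_for(key)
    return nil if @positions.empty?
    h = position(key)
    pos = @positions.bsearch { |p| p >= h } || @positions.first
    @ring[pos]
  end

  private

  def position(str)
    Zlib.crc32(str)
  end
end

# Placeholder database URLs.
ring = HashRing.new(%w[
  http://db1.example:5984/events
  http://db2.example:5984/events
  http://db3.example:5984/events
])
doc_id = SecureRandom.uuid # client-generated _id
puts "#{doc_id} -> #{ring.node_for(doc_id)}"
```

This only covers routing; the reads, views and load balancing the post mentions would still have to span every database, which is the complexity the post decided against.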

To be honest, I was a bit disappointed. I like everything about CouchDB, from its simple API to the way you write views in JavaScript. CouchDB is a solid product, and I'll definitely consider using it again in a future project.

schema.rb troubles

While watching the migrations video on the Rails site today, I noticed a feature I was not aware of. If you edit config/environment.rb and uncomment the line

config.active_record.schema_format = :ruby

Rails will dump your database schema as Ruby in db/schema.rb instead of using the database system's own definition language (SQL). This means you will no longer see development_structure.sql in the db directory. However, you can still generate that file by running "rake db_structure_dump" if you need it.

I had no trouble generating schema.rb, but when it came time to run my unit tests, I got the following error:

Phil:~/Sites/rails phil$ rake test_units
(in /Users/phil/Sites/rails)
rake aborted!
Mysql::Error: Table 'admins' already exists:

I spent a while trying to figure this one out. When you run "rake test_units", the test database is supposed to be wiped and rebuilt from schema.rb, but that was not happening.

The solution? Drop the test database manually and create a new, empty one. Strange.

I do love migrations though.
