Q: How is Memcached used?
A: The vast majority of top Websites already use Memcached. Databases can hold huge amounts of data, but for really high-performance Websites they’re just too slow. Most of the data you look up in a database, you’re going to look up over and over again: as similar pages get rendered, the same data is shown on page after page. As you browse through a site, most of the information displayed doesn’t change from moment to moment, so it’s wasteful for the application server to go back to the database for it again and again.
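To make that concrete, here is a minimal Python sketch of the usual cache-aside lookup: check the cache first and only query the database on a miss, then store the result for the next request. `TTLCache` is a hypothetical in-process stand-in for a real Memcached client (a real client talks to a shared server over the network and stores serialized bytes), and `cached_lookup` and `load_user_42` are illustrative names, not part of any library.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical stand-in for a Memcached client: get/set with expiry times."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None  # cache miss
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def cached_lookup(cache: TTLCache, key: str,
                  load_from_db: Callable[[], Any],
                  ttl_seconds: float = 60.0) -> Any:
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db()              # slow path: query the database
        cache.set(key, value, ttl_seconds)  # populate for the next request
    return value


# Usage: three page renders need the same row, but the database is hit once.
db_calls = 0


def load_user_42() -> Dict[str, Any]:
    global db_calls
    db_calls += 1
    return {"id": 42, "name": "alice"}


cache = TTLCache()
for _ in range(3):
    cached_lookup(cache, "user:42", load_user_42)
print(db_calls)  # 1
```

The expiry time is what keeps slightly stale data from living forever; picking it is a per-application trade-off between freshness and database load.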
One solution would be for an application server to keep a local cache, so it doesn’t have to look up data so often. But really large sites run multiple instances of their application servers, so they need a cache that is shared between them, and Memcached is the one that fits that problem most cleanly. It’s a very simple protocol that was designed to run on very affordable hardware. Writing to its API involves little complexity and hardly any learning curve, so it really took off in this space.
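As an illustration of how small that protocol is, the sketch below builds and parses commands in Memcached’s text protocol: `set <key> <flags> <exptime> <bytes>` followed by a data block, and `get <key>` answered by `VALUE` lines and a final `END`. It only handles byte strings and leaves out the network connection and error replies; the function names are hypothetical.

```python
from typing import Dict, Tuple


def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a 'set' command: a header line, then the raw data block, each ending in CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def encode_get(*keys: str) -> bytes:
    """Build a 'get' command; one request may name several keys."""
    return ("get " + " ".join(keys) + "\r\n").encode("ascii")


def parse_get_response(response: bytes) -> Dict[str, Tuple[int, bytes]]:
    """Parse 'VALUE <key> <flags> <bytes>' blocks until the terminating 'END' line."""
    results: Dict[str, Tuple[int, bytes]] = {}
    pos = 0
    while True:
        line_end = response.index(b"\r\n", pos)
        line = response[pos:line_end].decode("ascii")
        pos = line_end + 2
        if line == "END":
            return results  # keys that missed are simply absent
        parts = line.split()
        if not parts or parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, flags, length = parts[1], int(parts[2]), int(parts[3])
        # The length prefix lets the value contain any bytes, even CRLF.
        results[key] = (flags, response[pos:pos + length])
        pos += length + 2  # skip the data block and its trailing CRLF


print(encode_set("greeting", b"hello"))  # b'set greeting 0 0 5\r\nhello\r\n'
print(parse_get_response(b"VALUE greeting 0 5\r\nhello\r\nEND\r\n"))  # {'greeting': (0, b'hello')}
```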
There are many other caching solutions, but they’re all some combination of complicated and very expensive to use. Memcached hit a sweet spot by being simple and free.
Q: What attracted you to Gear6 and what will you be doing in your new role?
A: I am going to be the face of Gear6 to the open-source communities that are part of this new Web-scale architecture. These include Memcached, Gearman, Drizzle and libmemcached. Gear6 has a clear vision of how it wants to be positioned in this new Web stack, and I want to be part of it. It’s also exciting to me that commercial companies are beginning to form around Memcached. I am a big believer in open source and think I can help augment Gear6’s contributions to open source. It will be good for the company and for the open-source community.
Q: What is the most interesting thing about Memcached today?
A: Its amazing simplicity. It’s a very simple key-value, or KV, store. KV stores are becoming very big, and Memcached was there before people realized just how important they would become.
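A key-value store of this kind needs little more than get, set and delete plus an eviction policy; when Memcached runs out of memory it evicts the least recently used items. The toy store below sketches that idea with Python’s `OrderedDict`. It is a deliberate simplification: capacity is counted in items rather than bytes, and Memcached’s slab allocator and per-class LRU lists are not modeled. `LRUStore` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Optional


class LRUStore:
    """Toy key-value store that evicts the least recently used key when full.

    Capacity is counted in items for simplicity; Memcached limits memory in
    bytes and organizes it into slab classes, which this sketch ignores.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # reading a key makes it most recently used
        return self._items[key]

    def set(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used key

    def delete(self, key: str) -> bool:
        if key in self._items:
            del self._items[key]
            return True
        return False


store = LRUStore(capacity=2)
store.set("a", b"1")
store.set("b", b"2")
store.get("a")         # touch "a", so "b" is now the oldest
store.set("c", b"3")   # over capacity: "b" is evicted
print(store.get("b"))  # None
```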
The other interesting thing about Memcached is really how inefficient it is. It was written so simply and so quickly, and it solved a need so well, that no one realized how wasteful it was of the memory on the machine it was running on. One of the things Gear6 brings to the game, at the cost of a great deal more thought and some careful engineering, is more efficient memory use. The Gear6 Memcached distribution is Memcached at the same speed and with the same API, but with a much smaller memory footprint.
Q: You also contribute to Drizzle and libmemcached. How do they fit with Memcached?
A: Libmemcached is a client library for Memcached, written mainly by Brian Aker with input from many other people, including myself. Brian wrote it when he discovered that the existing C client library for Memcached was slow and buggy, so he set out to write a much faster, more efficient one, and it is now becoming the basis for bindings in many other languages. Python, Ruby and at least one of the widely distributed Perl bindings use libmemcached.
Drizzle is a fork of MySQL 6.0. It was done with the idea of re-architecting the server and revisiting some decisions, maybe correcting some mistakes. But instead of focusing on enterprise use of a database, that is, competing with Oracle, Drizzle is designed to run behind web and application servers. It’s designed to run on rack-mounted machines with many processor cores, serving the kinds of queries that application servers ask when building web pages. Drizzle is going to start showing up on high-performance websites because that’s what it was designed for, and Memcached shows up in these same deployments because it enables high-performance websites.
Q: Does this high performance extend to cloud applications?
A: Cloud providers either are running, or should be running, a fair amount of Memcached under the hood for their own uses. I hope that soon they will expose a Memcached service to their users. Multi-tenancy support will be very important for that, allowing multiple people to use the same actual cache servers without interfering with each other. Gear6 has done some work on multi-tenancy support and I know that several of the developers in the open source community have been doing some work on this. My hope is that we can get everything aligned so that we have only one implementation to manage.
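One simple building block for sharing cache servers between tenants is key namespacing: prefix every key with a tenant identifier so two customers’ `user:42` entries never collide. The sketch below is a hypothetical scheme, not Gear6’s or the community’s multi-tenancy design. It respects Memcached’s 250-byte key limit and the text protocol’s ban on whitespace and control characters by hashing any key that would break those rules. Namespacing alone gives no memory quotas or access control, which real multi-tenancy would also need.

```python
import hashlib

MAX_KEY_BYTES = 250  # Memcached's maximum key length


def tenant_key(tenant_id: str, key: str) -> str:
    """Map a tenant's key to a namespaced Memcached key (hypothetical scheme).

    Keys that are printable ASCII and fit the length limit become
    '<tenant>:k:<key>'; anything else is replaced by a SHA-1 digest as
    '<tenant>:h:<digest>'. The 'k'/'h' markers keep the two forms from colliding.
    """
    if not (tenant_id.isascii() and tenant_id.isalnum()) or len(tenant_id) > 200:
        raise ValueError("tenant_id must be 1-200 ASCII letters or digits")
    candidate = f"{tenant_id}:k:{key}"
    # Text-protocol keys may not contain whitespace or control characters.
    printable = all(33 <= ord(ch) <= 126 for ch in candidate)
    if printable and len(candidate) <= MAX_KEY_BYTES:
        return candidate  # printable ASCII, so characters == bytes
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return f"{tenant_id}:h:{digest}"


print(tenant_key("acme", "user:42"))    # acme:k:user:42
print(tenant_key("globex", "user:42"))  # globex:k:user:42
```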
Some people are resisting moving into the cloud because they can’t get high performance Memcached in their environment. They get medium performance Memcached by running the open source server on EC2 [Amazon Elastic Compute Cloud] nodes. Having a native Memcached in AWS [Amazon Web Services], Rackspace, Network.com or any of the other services is something that I would like to see happen.
Q: What is your best advice about Memcached to LAMP developers?
A: Design your applications to scale out. And design your applications to scale out. Don’t assume that you’re just going to get a bigger and bigger machine. Start with the assumption that you’re going to be running multiple instances of your application server talking to multiple shards or copies of your database and, of course, cache as much as you can in Memcached.
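When you run several Memcached servers, the client has to decide which server owns each key. Client libraries such as libmemcached can do this with consistent hashing, so adding or removing a server remaps only a fraction of the keys instead of nearly all of them. The ring below is a minimal Python sketch of the technique (MD5 ring positions and 100 virtual nodes per server are arbitrary choices), not libmemcached’s actual ketama implementation; the server names are placeholders.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(value: str) -> int:
    """Place a string on the ring using the first 8 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping cache keys to servers.

    Each server owns `replicas` virtual points; a key belongs to the first
    point clockwise from its own position, wrapping around at the end.
    """

    def __init__(self, servers: List[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = ring_position(f"{server}#{i}")
            if point not in self._owner:  # skip the (very unlikely) collision
                self._owner[point] = server
                bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        self._owner = {p: s for p, s in self._owner.items() if s != server}
        self._points = sorted(self._owner)

    def server_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("no servers on the ring")
        idx = bisect.bisect(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Adding a fourth server should move roughly a quarter of the keys, all to it.
ring = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.add("cache-d:11211")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

For comparison, with naive `hash(key) % N` placement, going from three servers to four would remap about three quarters of the keys. On the ring, every moved key lands on the new server, so the rest of the cache stays warm while you scale out.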