You’re reading Signal vs. Noise, a publication about the web by 37signals since 1999.

37signals

Taylor

About Taylor

Vizsla owner. Husband. Son. Brother. Ops Manager at 37signals.

Making Application Maintenance Fun

Taylor wrote this on / Discuss

Today I made intermission public. As I mentioned in my post about mysql_role_swap we’ve been working hard to limit / eliminate the impact our operations maintenance tasks have on our customer’s experience.

A few people noticed the /tmp/hold “leftover” in mysql_role_swap script. intermission is a product of that early exploration with coordinating database maintenance with request pausing in the web application tier. I’ve done a good bit of non production testing with intermission, but only limited production testing.

Last Friday we used intermission with mysql_role_swap to move Writeboard’s database to a new server. We had a single user facing exception, and we think it was likely caused by something other than the maintenance. For Friday’s maintenance we enabled request pausing via intermission, ran mysql_role_swap, restarted the unicorn (rails) processes, and then unpaused the requests. Total maintenance time was just a few seconds!

Making Database Maintenance Fun: mysql_role_swap

Taylor wrote this on / 10 comments

We’ve come a long way in the last year in the way we operate our sites. We’ve stabilized our applications, improved their response time, and increased their availability.

To accomplish these improvements we’ve done a series of database maintenances that varied from upgraded hardware, to new database servers, to configuration changes that required restart. In each of these operations we had one common goal: minimize the interruption to our customers.

Today we are releasing a small script that has made our lives, and our customer’s lives a whole lot better. We use this script to change the roles of our databases from replication masters to slaves, and vice versa. The fact that the script does all the steps previously performed by a human in a more timely and perfect manner is where we achieve all the gain.

Without this script we used to spend minutes accomplishing these maintenance tasks. With the script we’ve swapped databases under production load with no user noticeable interruption!

The script has lots of hard coded paths and users and other assumptions. But this is too good to keep to ourselves. We’re sharing it with you with the hope that it will improve your operations experience, and that you will contribute back changes that make it even better.

Behind the Scenes: The Hardware that Powers Basecamp, Campfire and Highrise

Taylor wrote this on / 20 comments

A few of us recently attended Velocity Conference in San Jose, CA. In the “hallway track sessions” a number people asked about the hardware that powers Basecamp, Campfire and Highrise.

Application Servers
All of our Ruby/Rails application roles run on Dell C Series 5220 servers. We chose C5220 servers because they provide high density, high performance, and low cost compute sleds at a decent cost point. The C5220 sleds replaced invidual Dell R710 servers which consumed a greater amount of power and rack space in addition to offering expandability we were not utilizing.

We use an 8 sled configuration with E31270 3.40GHz processors, 32/16G of ram, an LSI raid card and 2 non Dell SSDs. (For those of you thinking of ordering these … get the LSI raid card. The built in Intel raid is unreliable.) Each chassis with 8 sleds takes up 4U of rackspace: 3 for the chassis and 1 for cabling.

Job / Utility Servers
We use a combination of C6100 and C6220 servers to power our utility/jobs and API roles. We exclusively use the 4 sled version (of each) which means we get 4 “servers” in 2U. Each sled has 2x X5650 processors, 48-96G of ram, 2-6 ssds, and 4×1G or 1×10G network interfaces. This design allows to have up to 24 disks in a single chassis while consuming the same space as a single R710 server (which holds 8 disks max).

Search
For Solr we run R710s filled with SSDs. Each instance varies, but a common configuration is 2x E5530 processors, 48G of ram, 4-8 ssds, and 4×1g network interfaces. For Elastic Search we run a mix of Poweredge 2950 servers and C 5220 sleds with 12-16G of ram and 2×400G ssds in a raid 1.

Database and Memcache/Redis Servers
For Database roles we use R710s with 2x X5670 processors, 1.2TB Fusion-IO duo cards and varying amounts of memory. (Varies based on the database size.) We also have a number of older R710s powering Memcache and Redis instances. Each of these has has 2x E5530 processors and 2-4 disks with 4×1G network interfaces.

Storage
We have around 400TB / 9 nodes of Isilon 36 and 72NL storage. We serve all of the user uploaded content off this storage with backups to S3.

OS Choice
Database servers run RHEL or Centos 6 while application and utility servers run Ubuntu LTS.

Matt Kent joins 37signals as Sysop

Taylor wrote this on / 11 comments

Today Matt Kent started at 37signals as the sixth member of our operations team. Previously Matt worked for 10 years(!) at Bravenet as a System Administrator handling application deployment and scaling tasks. Some of Matt’s managers and coworkers used phrases like “backbone of our team” and “a great person, and a fantastic engineer” to describe what working with Matt is like.

Recently Matt was selected as an Opscode MVP for his contributions to Chef not once, not twice, but three times. Less than a day after Matt’s first interview, he resolved both our Chef bug reports endearing him to the entire team. In addition, Matt has experience with multiple data center setups, automated deployment, monitoring, and just about every other area we touch.

Welcome Matt!

I heard you like numbers...

Taylor wrote this on / 61 comments

In 2011

  • Our support team responded to 100,000 cases
  • Our syslog server logged an average of 1,500,000,000+ messages each day
  • Our solr indexer processed 428,000,000 indexing requests for Highrise alone
  • We hit our 100,000,000th person/company creation in Highrise
  • And a Basecamp user uploaded the 100,000,000th file (It was a picture of a cat!)

In the first 10 business days of 2012

  • We stored 100,000,000 unique statsd measurements
  • Our syslog server logged over 20,000,000,000 messages
  • Our applications sent more than 2,000 email notifications per minute
  • And Basecamp accepted an average of 75,000 file uploads from users per day

Interested in numbers other than revenue and profit? Leave a comment and we (mostly Noah) will dig them up for a future post.

Why Attacking Application Exceptions is Important

Taylor wrote this on / 14 comments

Background

Every week, a few of our programmers focus on responding to customer problems that might indicate a bigger issue with one of our applications. In addition, we’re constantly looking at issues (Rails exceptions, performance “hotspots”, etc.) that don’t bubble up to the customer level, but that are just “good housekeeping”. Taking care of these issues can have both real and measurable impacts — for example, we reduced application exceptions by 43% in December, allowing us to proactively fix even more problems before customers noticed.

Lets enter the house of Campfire, our real time chat product. Campfire uses memcache in a number of ways, one of which is to store fragments of messages that have been “spoken” in a room. When a user requests the transcript for that room we attempt to build a list of these messages from the cache before we hit the database. To do this, we build an array of message keys which are dispatched to a memcache-client multi_get request.

For a long time Campfire has generated exceptions showing multi_get failures, but the errors aren’t bringing down the site or causing users problems. When I first came to 37signals the exceptions were described as being caused by a bug in older versions of memcached (server), but no one knew “for sure”. Lacking any open issues or similar bug reports, as a work around our developers had patched the memcache-client to rescue ENOPIPE errors, raise by these failed get requests. The patch didn’t make the problem go away, but it kept it from being customer impacting…

Identifying the Problem

While testing a new branch of Campfire in staging, one of our developers was repeatedly encountering this error to the point where an exception was being raised for almost every page. Remember how I said these things pop up at the most inconvenient times? I decided to try and figure out these exceptions once and for all. (Ultimately, it turns out we hadn’t bundled the patched gem and that’s why the exception was being raised so often.)

After reviewing the code for memcache-client, I decided there wasn’t anything obviously wrong with its implementation for multi_get requests. To start my debugging, I fired up telnet, set two keys, and issued a “get keyone keytwo”. This worked fine. I had established that multi gets in the most basic form do work.

Next I used strace to attach to the unicorn (Ruby on Rails) process and see what the client was sending:

write(15, "get enter_messages/343555238/hide_author=false&hide_date=false&last_read=343841341 timestamp_messages/343575826/hide_author=false&hide_date=false&last_read=343841341 enter_messages/343575827/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343582929/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343582930/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343589192/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343589193/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343592436/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343592437/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343606646/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343606647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343654455/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343654456/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343663288/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663294/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663298/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663301/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663459/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663763/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663833/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343664243/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664244/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664249/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664928/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665538/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665582/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665661/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665730/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343668494/hide_author=false&hide_date=true&last_read=343841341 text_messages/343668495/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343670646/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343670647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343674727/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343674728/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675113/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675330/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675512/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675662/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675783/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675850/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675926/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675962/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676003/hide_author=true&hide_date=true&last_read=343841341 text_messages/343676419/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676627/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343683775/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343683778/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343697190/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343697191/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698503/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698601/hide_author=true&hide_date=true&last_read=343841341 text_messages/343698676/hi", 4096) = 4096
rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
select(17, [], [15], [], {30, 525125})  = 1 (out [15], left {30, 525122})
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(15, "de_author=false&hide_date=true&last_read=343841341 text_messages/343698734/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343699275/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699336/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699454/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699578/hide_author=false&hide_date=true&last_read=343841341 text_messages/343699907/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700196/hide_author=true&hide_date=true&last_read=343841341 text_messages/343700303/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343700490/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700491/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700960/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701033/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701249/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701495/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343701499/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701536/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701811/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343705859/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343705860/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343705927/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343707289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707290/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707541/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707657/hide_author=false&hide_date=true&last_read=343841341 text_messages/343708101/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343711297/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343711298/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343711359/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343714551/hide_author=false&hide_date=true&last_read=343841341 text_messages/343714552/hide_author=false&hide_date=true&last_read=343841341 text_messages/343714779/hide_author=false&hide_date=true&last_read=343841341 text_messages/343715190/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715303/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715380/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715452/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715558/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716023/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716257/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716298/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716338/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716577/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716745/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716816/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716900/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716999/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343717255/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717256/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717265/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717555/hide_author=true&hide_date=true&last_read=343841341 text_messages/343717960/hide_author=true&hide_date=true&last_read=343841341 text_messages/343718088/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718097/hide_author=false&hide_date=true&last_read=343841341 text_messages/34371819", 4096) = 4096
--- SIGRT_27 (Real-time signal 25) @ 0 (0) ---
getrusage(RUSAGE_SELF, {ru_utime={4, 270000}, ru_stime={0, 190000}, ...}) = 0
timer_settime(0x14, 0, {it_interval={0, 12712000}, it_value={0, 12712000}}, NULL) = 0
rt_sigreturn(0x4b5412a22c0ef)           = 4096
rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
select(17, [], [15], [], {30, 517198})  = 1 (out [15], left {30, 517196})
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(15, "2/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718248/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718670/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718779/hide_author=true&hide_date=true&last_read=343841341 text_messages/343718955/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343725746/hide_author=false&hide_date=true&last_read=343841341 text_messages/343725747/hide_author=false&hide_date=true&last_read=343841341 text_messages/343725757/hide_author=true&hide_date=true&last_read=343841341 text_messages/343725762/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343729823/hide_author=false&hide_date=true&last_read=343841341 text_messages/343729824/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343758537/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343758538/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343764428/hide_author=false&hide_date=true&last_read=343841341 text_messages/343764429/hide_author=false&hide_date=true&last_read=343841341 text_messages/343764474/hide_author=true&hide_date=true&last_read=343841341 text_messages/343764528/hide_author=true&hide_date=true&last_read=343841341 text_messages/343764620/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343773529/hide_author=false&hide_date=true&last_read=343841341 text_messages/343773530/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343783468/hide_author=false&hide_date=true&last_read=343841341 text_messages/343783469/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343783615/hide_author=true&hide_date=true&last_read=343841341 text_messages/343783650/hide_author=true&hide_date=true&last_read=343841341 text_messages/343783875/hide_author=true&hide_date=true&last_read=343841341 text_messages/343784004/hide_author=true&hide_date=true&last_read=343841341 text_messages/343784277/hide_author=false&hide_date=true&last_read=343841341 text_messages/343784372/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343784578/hide_author=false&hide_date=true&last_read=343841341 text_messages/343784579/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343802940/hide_author=false&hide_date=true&last_read=343841341 text_messages/343802941/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343802954/hide_author=true&hide_date=true&last_read=343841341 text_messages/343803300/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343812622/hide_author=false&hide_date=true&last_read=343841341 text_messages/343812623/hide_author=false&hide_date=true&last_read=343841341 text_messages/343813469/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343813500/hide_author=true&hide_date=true&last_read=343841341 text_messages/343813695/hide_author=true&hide_date=true&last_read=343841341 text_messages/343813765/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814066/hide_author=true&hide_date=true&last_read=343841341 enter_messages/343814107/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343814360/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814556/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343814680/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814681/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815559/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815889/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343815893/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815915/hide_author=true&hide_date=true&last_read=343841341 text_messages/343815953/hide_author=true&hide_date=true&last_read=343841341 text_messages/343816359/hide_author=false&hide_date=true&last_read=343841341 text_messages/343816657/hide_author=true&hide_date=true&last_read=3438413", 4096) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sendto(9, "<13>Dec 29 20:43:11 campfire[17363]: [37s] Generic failure: Errno::EPIPE: Broken pipe", 85, MSG_NOSIGNAL, NULL, 0) = 85
rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
select(17, [], [15], [], {30, 516190})  = 1 (out [15], left {30, 516188})
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
select(17, [], [15], [], {30, 516071})  = 1 (out [15], left {30, 516069})
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(15, "get enter_messages/343555238/hide_author=false&hide_date=false&last_read=343841341 timestamp_messages/343575826/hide_author=false&hide_date=false&last_read=343841341 enter_messages/343575827/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343582929/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343582930/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343589192/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343589193/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343592436/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343592437/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343606646/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343606647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343654455/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343654456/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343663288/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663294/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663298/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663301/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663459/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663763/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663833/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343664243/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664244/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664249/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664928/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665538/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665582/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665661/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665730/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343668494/hide_author=false&hide_date=true&last_read=343841341 text_messages/343668495/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343670646/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343670647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343674727/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343674728/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675113/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675330/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675512/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675662/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675783/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675850/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675926/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675962/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676003/hide_author=true&hide_date=true&last_read=343841341 text_messages/343676419/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676627/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343683775/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343683778/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343697190/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343697191/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698503/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698601/hide_author=true&hide_date=true&last_read=343841341 text_messages/343698676/hi", 4096) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sendto(9, "<13>Dec 29 20:43:11 campfire[17363]: [37s] Generic failure: Errno::EPIPE: Broken pipe", 85, MSG_NOSIGNAL, NULL, 0) = 85
close(15)                               = 0
munmap(0x7f7114948000, 4096)            = 0
close(15)                               = -1 EBADF (Bad file descriptor)

Here’s a basic run down of what you see above: The unicorn process (Rails) opens a socket to our memcache server in staging, and then issues a get command which is broken up into multiple 4096 byte packets. However, instead of receiving a reply from the server, our client socket gets a Broken Pipe error and it’s closed.

“OK…” I thought. I need to reference the memcache protocol specification and make sure this request conforms to the “get” protocol command. Was this request within in the protocol description? Yes.

At this point I knew the client was sending a legitimate request, so I needed to debug the server. Since our application uses multiple instances, I downed all but one instance, and fired up strace (strace -s 64000 -fp 7133).

[pid  7136] epoll_wait(19, {{EPOLLIN, {u32=32, u64=32}}}, 32, 4294967295) = 1
[pid  7136] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7136] read(32, "get enter_messages/343555238/hide_author=false&hide_date=false&last_read=343841341 timestamp_messages/343575826/hide_author=false&hide_date=false&last_read=343841341 enter_messages/343575827/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343582929/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343582930/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343589192/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343589193/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343592436/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343592437/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343606646/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343606647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343654455/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343654456/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343663288/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663294/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663298/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663301/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663459/hide_author=false&hide_date=true&last_read=343841341 text_messages/343663763/hide_author=true&hide_date=true&last_read=343841341 text_messages/343663833/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343664243/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664244/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664249/hide_author=false&hide_date=true&last_read=343841341 text_messages/343664928/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665538/hide_author=false&hide_date=true&last_read=343841341", 2048) = 2048
[pid  7136] read(32, " text_messages/343665582/hide_author=true&hide_date=true&last_read=343841341 text_messages/343665661/hide_author=false&hide_date=true&last_read=343841341 text_messages/343665730/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343668494/hide_author=false&hide_date=true&last_read=343841341 text_messages/343668495/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343670646/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343670647/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343674727/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343674728/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675113/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675330/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675512/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675662/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675783/hide_author=true&hide_date=true&last_read=343841341 text_messages/343675850/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675926/hide_author=false&hide_date=true&last_read=343841341 text_messages/343675962/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676003/hide_author=true&hide_date=true&last_read=343841341 text_messages/343676419/hide_author=false&hide_date=true&last_read=343841341 text_messages/343676627/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343683775/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343683778/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343697190/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343697191/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698503/hide_author=false&hide_date=true&last_read=343841341 text_messages/343698601/hide_author=true&hide_date=true&last_read=343841341 text_messages/343698676/hi", 2048) = 2048
[pid  7136] read(32, "de_author=false&hide_date=true&last_read=343841341 text_messages/343698734/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343699275/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699336/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699454/hide_author=true&hide_date=true&last_read=343841341 text_messages/343699578/hide_author=false&hide_date=true&last_read=343841341 text_messages/343699907/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700196/hide_author=true&hide_date=true&last_read=343841341 text_messages/343700303/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343700490/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700491/hide_author=false&hide_date=true&last_read=343841341 text_messages/343700960/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701033/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701249/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701495/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343701499/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701536/hide_author=true&hide_date=true&last_read=343841341 text_messages/343701811/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343705859/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343705860/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343705927/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343707289/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707290/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707541/hide_author=false&hide_date=true&last_read=343841341 text_messages/343707657/hide_author=false&hide_date=true&last_read=343841341 text_messages/343708101/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343711297/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343711298/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343711359/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343714551/hide_author=false&hide_date=true&last_read=343841341 text_messages/343714552/hide_author=false&hide_date=true&last_read=343841341 text_messages/343714779/hide_author=false&hide_date=true&last_read=343841341 text_messages/343715190/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715303/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715380/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715452/hide_author=true&hide_date=true&last_read=343841341 text_messages/343715558/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716023/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716257/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716298/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716338/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716577/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716745/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716816/hide_author=true&hide_date=true&last_read=343841341 text_messages/343716900/hide_author=false&hide_date=true&last_read=343841341 text_messages/343716999/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343717255/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717256/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717265/hide_author=false&hide_date=true&last_read=343841341 text_messages/343717555/hide_author=true&hide_date=true&last_read=343841341 text_messages/343717960/hide_author=true&hide_date=true&last_read=343841341 text_messages/343718088/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718097/hide_author=false&hide_date=true&last_read=343841341 text_messages/34371819", 4096) = 4096
[pid  7136] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7136] read(32, "2/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718248/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718670/hide_author=false&hide_date=true&last_read=343841341 text_messages/343718779/hide_author=true&hide_date=true&last_read=343841341 text_messages/343718955/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343725746/hide_author=false&hide_date=true&last_read=343841341 text_messages/343725747/hide_author=false&hide_date=true&last_read=343841341 text_messages/343725757/hide_author=true&hide_date=true&last_read=343841341 text_messages/343725762/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343729823/hide_author=false&hide_date=true&last_read=343841341 text_messages/343729824/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343758537/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343758538/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343764428/hide_author=false&hide_date=true&last_read=343841341 text_messages/343764429/hide_author=false&hide_date=true&last_read=343841341 text_messages/343764474/hide_author=true&hide_date=true&last_read=343841341 text_messages/343764528/hide_author=true&hide_date=true&last_read=343841341 text_messages/343764620/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343773529/hide_author=false&hide_date=true&last_read=343841341 text_messages/343773530/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343783468/hide_author=false&hide_date=true&last_read=343841341 text_messages/343783469/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343783615/hide_author=true&hide_date=true&last_read=343841341 text_messages/343783650/hide_author=true&hide_date=true&last_read=343841341 text_messages/343783875/hide_author=true&hide_date=true&last_read=343841341 text_messages/343784004/hide_author=true&hide_date=true&last_read=343841341 text_messages/343784277/hide_author=false&hide_date=true&last_read=343841341 text_messages/343784372/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343784578/hide_author=false&hide_date=true&last_read=343841341 text_messages/343784579/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343802940/hide_author=false&hide_date=true&last_read=343841341 text_messages/343802941/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343802954/hide_author=true&hide_date=true&last_read=343841341 text_messages/343803300/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343812622/hide_author=false&hide_date=true&last_read=343841341 text_messages/343812623/hide_author=false&hide_date=true&last_read=343841341 text_messages/343813469/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343813500/hide_author=true&hide_date=true&last_read=343841341 text_messages/343813695/hide_author=true&hide_date=true&last_read=343841341 text_messages/343813765/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814066/hide_author=true&hide_date=true&last_read=343841341 enter_messages/343814107/hide_author=false&hide_date=true&last_read=343841341 upload_messages/343814360/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814556/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343814680/hide_author=false&hide_date=true&last_read=343841341 text_messages/343814681/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815559/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815889/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343815893/hide_author=false&hide_date=true&last_read=343841341 text_messages/343815915/hide_author=true&hide_date=true&last_read=343841341 text_messages/343815953/hide_author=true&hide_date=true&last_read=343841341 text_messages/343816359/hide_author=false&hide_date=true&last_read=343841341 text_messages/343816657/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343816830/hide_author=false&hide_date=true&last_read=343841341 text_messages/343817036/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343817042/hide_author=true&hide_date=true&last_read=343841341 text_messages/343817189/hide_author=false&hide_date=true&last_read=343841341 text_messages/343817229/hide_author=true&hide_date=true&last_read=343841341 text_messages/343817955/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343818321/hide_author=false&hide_date=true&last_read=343841341 text_messages/343818322/hide_author=false&hide_date=true&last_read=343841341 text_messages/343819079/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343825796/hide_author=false&hide_date=true&last_read=343841341 text_messages/343825797/hide_author=false&hide_date=true&last_read=343841341 text_messages/343825802/hide_author=true&hide_date=true&last_read=343841341 text_messages/343825809/hide_author=true&hide_date=true&last_read=343841341 text_messages/343826383/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343826403/hide_author=true&hide_date=true&last_read=343841341 text_messages/343826708/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343826722/hide_author=true&hide_date=true&last_read=343841341 text_messages/343827244/hide_author=true&hide_date=true&last_read=343841341 text_messages/343827298/hide_author=true&hide_date=true&last_read=343841341 text_messages/343827299/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343830049/hide_author=false&hide_date=true&last_read=343841341 text_messages/343830050/hide_author=false&hide_date=true&last_read=343841341 text_messages/343830187/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343831000/hide_author=false&hide_date=true&last_read=343841341 text_messages/343831001/hide_author=false&hide_date=true&last_read=343841341 text_messages/343831123/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831211/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831487/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343831492/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831675/hide_author=false&hide_date=true&last_read=343841341 text_messages/343831725/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831788/hide_author=true&hide_date=true&last_read=343841341 text_messages/343831979/hide_author=false&hide_date=true&last_read=343841341 text_messages/343833552/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343834848/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343834849/hide_author=false&hide_date=true&last_read=343841341 text_messages/343834896/hide_author=true&hide_date=true&last_read=343841341 text_messages/343835031/hide_author=false&hide_date=true&last_read=343841341 text_messages/343836063/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343840446/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343840447/hide_author=false&hide_date=true&last_read=343841341 text_messages/343840829/hide_author=false&hide_date=true&last_read=343841341 text_messages/343840885/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841050/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343841076/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841236/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841341/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841351/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841428/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841582/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841603/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841692/hide_author=true&hide_date=true&last_read=343841341 text_messages/343841855/hide_author=true&hide_date=true&last_read=343841341", 8192) = 8192
[pid  7136] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7136] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7136] read(32, " text_messages/343841924/hide_author=false&hide_date=true&last_read=343841341 text_messages/343841969/hide_author=false&hide_date=true&last_read=343841341 text_messages/343842042/hide_author=false&hide_date=true&last_read=343841341 text_messages/343842523/hide_author=true&hide_date=true&last_read=343841341 text_messages/343842910/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843039/hide_author=true&hide_date=true&last_read=343841341 text_messages/343843171/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843284/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343843427/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843428/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343843476/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843477/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843547/hide_author=false&hide_date=true&last_read=343841341 text_messages/343843686/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343843878/hide_author=true&hide_date=true&last_read=343841341 text_messages/343843929/hide_author=false&hide_date=true&last_read=343841341 text_messages/343844070/hide_author=false&hide_date=true&last_read=343841341 text_messages/343844547/hide_author=false&hide_date=true&last_read=343841341 text_messages/343844842/hide_author=true&hide_date=true&last_read=343841341 text_messages/343844983/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343847757/hide_author=false&hide_date=true&last_read=343841341 text_messages/343847758/hide_author=false&hide_date=true&last_read=343841341 text_messages/343848887/hide_author=true&hide_date=true&last_read=343841341 text_messages/343849266/hide_author=false&hide_date=true&last_read=343841341 text_messages/343849330/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343850692/hide_author=false&hide_date=true&last_read=343841341 text_messages/343850693/hide_author=false&hide_date=true&last_read=343841341 text_messages/343850994/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343851014/hide_author=true&hide_date=true&last_read=343841341 text_messages/343851040/hide_author=false&hide_date=true&last_read=343841341 text_messages/343851100/hide_author=true&hide_date=true&last_read=343841341 text_messages/343851171/hide_author=false&hide_date=true&last_read=343841341 text_messages/343852048/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343852052/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852107/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852129/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852168/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852282/hide_author=false&hide_date=true&last_read=343841341 text_messages/343852487/hide_author=false&hide_date=true&last_read=343841341 text_messages/343852631/hide_author=true&hide_date=true&last_read=343841341 text_messages/343852765/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343855843/hide_author=false&hide_date=true&last_read=343841341 text_messages/343855844/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343856603/hide_author=false&hide_date=true&last_read=343841341 text_messages/343856604/hide_author=false&hide_date=true&last_read=343841341 text_messages/343856802/hide_author=false&hide_date=true&last_read=343841341 text_messages/343856970/hide_author=false&hide_date=true&last_read=343841341 text_messages/343857838/hide_author=false&hide_date=true&last_read=343841341 text_messages/343858343/hide_author=false&hide_date=true&last_read=343841341 text_messages/343858380/hide_author=false&hide_date=true&last_read=343841341 text_messages/343858474/hide_author=true&hide_date=true&last_read=343841341 text_messages/343858544/hide_author=true&hide_date=true&last_read=343841341 text_messages/343858596/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343863819/hide_author=false&hide_date=true&last_read=343841341 text_messages/343863820/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343863844/hide_author=true&hide_date=true&last_read=343841341 text_messages/343864027/hide_author=true&hide_date=true&last_read=343841341 text_messages/343864037/hide_author=false&hide_date=true&last_read=343841341 text_messages/343864071/hide_author=false&hide_date=true&last_read=343841341 text_messages/343864588/hide_author=false&hide_date=true&last_read=343841341 text_messages/343864630/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343864755/hide_author=false&hide_date=true&last_read=343841341 text_messages/343865034/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343866397/hide_author=false&hide_date=true&last_read=343841341 text_messages/343866398/hide_author=false&hide_date=true&last_read=343841341 text_messages/343866591/hide_author=true&hide_date=true&last_read=343841341 paste_messages/343866596/hide_author=true&hide_date=true&last_read=343841341 text_messages/343867002/hide_author=true&hide_date=true&last_read=343841341 text_messages/343867646/hide_author=false&hide_date=true&last_read=343841341 text_messages/343868189/hide_author=false&hide_date=true&last_read=343841341 text_messages/343868443/hide_author=false&hide_date=true&last_read=343841341 text_messages/343868603/hide_author=false&hide_date=true&last_read=343841341 text_messages/343868688/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343868946/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343868947/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343869494/hide_author=false&hide_date=true&last_read=343841341 text_messages/343869517/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343875073/hide_author=false&hide_date=true&last_read=343841341 text_messages/343875074/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343875398/hide_author=true&hide_date=true&last_read=343841341 text_messages/343875689/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343877689/hide_author=false&hide_date=true&last_read=343841341 text_messages/343877690/hide_author=false&hide_date=true&last_read=343841341 text_messages/343878129/hide_author=false&hide_date=true&last_read=343841341 text_messages/343878477/hide_author=false&hide_date=true&last_read=343841341 text_messages/343878485/hide_author=true&hide_date=true&last_read=343841341 text_messages/343878491/hide_author=true&hide_date=true&last_read=343841341 text_messages/343878499/hide_author=false&hide_date=true&last_read=343841341 text_messages/343878588/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343880810/hide_author=false&hide_date=true&last_read=343841341 text_messages/343880811/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343880830/hide_author=true&hide_date=true&last_read=343841341 text_messages/343880919/hide_author=false&hide_date=true&last_read=343841341 text_messages/343880994/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343885259/hide_author=false&hide_date=true&last_read=343841341 text_messages/343885260/hide_author=false&hide_date=true&last_read=343841341 text_messages/343885374/hide_author=true&hide_date=true&last_read=343841341 text_messages/343885436/hide_author=true&hide_date=true&last_read=343841341 text_messages/343885452/hide_author=true&hide_date=true&last_read=343841341 enter_messages/343885679/hide_author=false&hide_date=true&last_read=343841341 text_messages/343885711/hide_author=false&hide_date=true&last_read=343841341 text_messages/343885723/hide_author=true&hide_date=true&last_read=343841341 text_messages/343885744/hide_author=true&hide_date=true&last_read=343841341 text_messages/343885878/hide_author=true&hide_date=true&last_read=343841341 text_messages/343886065/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343886343/hide_author=false&hide_date=true&last_read=343841341 text_messages/343886344/hide_author=false&hide_date=true&last_read=343841341 text_messages/343888444/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343890769/hide_author=false&hide_date=true&last_read=343841341 text_messages/343890770/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343894891/hide_author=false&hide_date=true&last_read=343841341 text_messages/343894892/hide_author=false&hide_date=true&last_read=343841341 text_messages/343895056/hide_author=false&hide_date=true&last_read=343841341 text_messages/343895130/hide_author=true&hide_date=true&last_read=343841341 text_messages/343895152/hide_author=true&hide_date=true&last_read=343841341 text_messages/343895269/hide_author=true&hide_date=true&last_read=343841341 text_messages/343895393/hide_author=true&hide_date=true&last_read=343841341 text_messages/343895542/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343902958/hide_author=false&hide_date=true&last_read=343841341 text_messages/343902959/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903053/hide_author=true&hide_date=true&last_read=343841341 text_messages/343903267/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903512/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903628/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903725/hide_author=true&hide_date=true&last_read=343841341 text_messages/343903864/hide_author=true&hide_date=true&last_read=343841341 text_messages/343903882/hide_author=false&hide_date=true&last_read=343841341 text_messages/343903968/hide_author=false&hide_date=true&last_read=343841341 text_messages/343904055/hide_author=false&hide_date=true&last_read=343841341 text_messages/343904090/hide_author=true&hide_date=true&last_read=343841341 text_messages/343904553/hide_author=false&hide_date=true&last_read=343841341 text_messages/343904624/hide_author=true&hide_date=true&last_read=343841341 text_messages/343904678/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343908400/hide_author=false&hide_date=true&last_read=343841341 text_messages/343908401/hide_author=false&hide_date=true&last_read=343841341 text_messages/343908458/hide_author=false&hide_date=true&last_read=343841341 text_messages/343908510/hide_author=false&hide_date=true&last_read=343841341 text_messages/343908566/hide_author=true&hide_date=true&last_read=343841341 text_messages/343908629/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343910155/hide_author=false&hide_date=true&last_read=343841341 text_messages/343910156/hide_author=false&hide_date=true&last_read=343841341 text_messages/343910180/hide_author=true&hide_date=true&last_read=343841341 text_messages/343910261/hide_author=true&hide_date=true&last_read=343841341 text_messages/343910355/hide_author=true&hide_date=true&last_read=343841341 text_messages/343910536/hide_author=true&hide_date=true&last_read=343841341 text_messages/343910565/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343912945/hide_author=false&hide_date=true&last_read=343841341 text_messages/343912946/hide_author=false&hide_date=true&last_read=343841341 text_messages/343912955/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343917638/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343917639/hide_author=false&hide_date=true&last_read=343841341 text_messages/343919040/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343928070/hide_author=false&hide_date=true&last_read=343841341 text_messages/343928071/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343944056/hide_author=false&hide_date=true&last_read=343841341 text_messages/343944057/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343946376/hide_author=false&hide_date=true&last_read=343841341 text_messages/343946377/hide_author=false&hide_date=true&last_read=343841341 text_messages/343946786/hide_author=true&hide_date=true&last_read=343841341 text_messages/343947443/hide_author=true&hide_date=true&last_read=343841341 text_messages/343948263/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343951411/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343951412/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343958275/hide_author=false&hide_date=true&last_read=343841341 text_messages/343958276/hide_author=false&hide_date=true&last_read=343841341 paste_messages/343958310/hide_author=true&hide_date=true&last_read=343841341 text_messages/343959039/hide_author=false&hide_date=true&last_read=343841341 text_messages/343959484/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/343961703/hide_author=false&hide_date=true&last_read=343841341 text_messages/343961704/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343972558/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343972559/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343976120/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343976121/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343982780/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343982781/hide_author=false&hide_date=true&last_read=343841341 text_messages/343982834/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343984059/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343984060/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343987418/hide_author=false&hide_date=true&last_read=343841341 enter_messages/343987419/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/343997513/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343997514/hide_author=false&hide_date=true&last_read=343841341 kick_messages/343997666/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/344008833/hide_author=false&hide_date=true&last_read=343841341 enter_messages/344008834/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/344009870/hide_author=false&hide_date=true&last_read=343841341 text_messages/344009871/hide_author=false&hide_date=true&last_read=343841341 text_messages/344010043/hide_author=false&hide_date=true&last_read=343841341 upload_messages/344010066/hide_author=true&hide_date=true&last_read=343841341 upload_messages/344010083/hide_author=true&hide_date=true&last_read=343841341 text_messages/344010323/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/344015438/hide_author=false&hide_date=true&last_read=343841341 kick_messages/344015439/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/344018437/hide_author=false&hide_date=true&last_read=343841341 kick_messages/344018438/hide_author=false&hide_date=true&last_read=343841341 timestamp_messages/344033626/hide_author=false&hide_date=true&last_read=343841341 enter_messages/344033627/hide_author=false&hide_date=true&last_read=343841341 text_messages/344033767/hide_author=false&hide_date=true&last_read=343841341 text_messages/344033769/hide_author=true&hide_date=true&last_read=343841341 text_messages/344033772/hide_author=true&hide_date=true&last_read=343841341 text_messages/344033774/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/344041143/hide_author=false&hide_date=true&last_read=343841341 upload_messages/344041144/hide_author=false&hide_date=true&last_read=343841341 text_messages/344041191/hide_author=true&hide_date=true&last_read=343841341 text_messages/344041227/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/344041414/hide_author=false&hide_date=true&last_read=343841341 text_messages/344041415/hide_author=false&hide_date=true&last_read=343841341 text_messages/344041637/hide_author=true&hide_date=true&last_read=343841341 text_messages/344041654/hide_author=true&hide_date=true&last_read=343841341 timestamp_messages/344052470/hide_author=false&hide_date=true", 16384) = 16384
[pid  7136] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7136] --- SIGRT_27 (Real-time signal 25) @ 0 (0) ---
[pid  7136] getrusage(RUSAGE_SELF, {ru_utime={12, 90000}, ru_stime={13, 600000}, ...}) = 0
[pid  7136] rt_sigreturn(0x4b54187fd1996) = 0
[pid  7136] epoll_ctl(19, EPOLL_CTL_DEL, 32, {EPOLLIN, {u32=32, u64=32}}) = 0
[pid  7136] close(32)                   = 0
[pid  7136] listen(28, 1024)            = 0
[pid  7136] listen(27, 1024)            = 0
[pid  7136] brk(0)                      = 0x1c52000
[pid  7136] brk(0)                      = 0x1c52000
[pid  7136] brk(0)                      = 0x1c52000
[pid  7136] brk(0x1c4b000)              = 0x1c4b000
[pid  7136] brk(0)                      = 0x1c4b000
[pid  7136] brk(0)                      = 0x1c4b000
[pid  7136] brk(0)                      = 0x1c4b000
[pid  7136] brk(0)                      = 0x1c4b000
[pid  7136] brk(0x1c49000)              = 0x1c49000
[pid  7136] brk(0)                      = 0x1c49000
[pid  7136] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7136] epoll_wait(19,  <unfinished ...>
[pid  7133] <... epoll_wait resumed> {}, 32, 85) = 0
[pid  7133] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7133] --- SIGRT_27 (Real-time signal 25) @ 0 (0) ---
[pid  7133] getrusage(RUSAGE_SELF, {ru_utime={12, 90000}, ru_stime={13, 600000}, ...}) = 0
[pid  7133] timer_settime(0x104, 0, {it_interval={0, 12812000}, it_value={0, 12812000}}, NULL) = 0
[pid  7133] rt_sigreturn(0x4b54187fe0371) = 0
[pid  7133] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7133] epoll_wait(3, {}, 32, 1)    = 0
[pid  7133] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7133] rt_sigprocmask(SIG_BLOCK, [RT_27 RT_28], [], 8) = 0
[pid  7133] epoll_wait(3, {}, 32, 1000) = 0
[pid  7133] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  7133] --- SIGRT_27 (Real-time signal 25) @ 0 (0) ---
[pid  7133] getrusage(RUSAGE_SELF, {ru_utime={12, 90000}, ru_stime={13, 600000}, ...}) = 0
[pid  7133] timer_settime(0x104, 0, {it_interval={0, 12787000}, it_value={0, 12787000}}, NULL) = 0
[pid  7133] rt_sigreturn(0x4b541880d6c8e) = 0

(Strace Tip: I used -f to follow the forked processes, and -s 64000 to make sure I got enough string output to see the full read/write data sent to the socket.)

You can see that the memcache server reads the first 2048 bytes, then reads another 2048 bytes, then reads another 4096 bytes, then reads 8192 bytes, and finally reads another 16384 bytes. So after the first read, it doubles the amount of data it’s reading for each iteration. However, after the fourth read, the epoll call returns 0, and the client’s connection is closed.

Identifying the Root Cause

In this case, I knew Jeremy had worked on this bug before and I thought he would understand the all the data I had gathered. So with Jeremy’s help, I started looking at the memcache code on Github. Within minutes Jeremy pinged me saying he had noticed this code that’s designed to nuke junk connections. Which makes sense, but also doesn’t help us here!

This code path does exactly what I had seen in the strace above: It reads from buffer 4 times, while doubling the amount read each time. After 4 times, it stops reading and returns.

Conclusion and Resolution

Although memcache’s behavior is by design, it’s not compatible with the expected behavior of the protocol. In the issue I opened, I recommended a configuration option to set the amount of data to read, or the number of times to attempt reading more data. I’m not sure if this is a realistic solution though.

Hypothetically we could work around this by increasing the initial buffer size and recompiling memcache so that in 4 iterations the server would receive all of the get command. We could also patch the memcache-client multi_get method to break the multi_get requests into multiple requests with fewer keys. For now we’ve left our patched client in place, with hope that the memcache team will be able to get this resolved soon.

As I said previously, the goal is to identify a root cause. And, with Jeremy’s help, I’ve done just that. It feels great to have a complete understanding of this exception, and where the break down is in our stack.

We now know, we don’t have to be afraid of upgrading memcache with a fear of causing more multi_get problems. We also know that this is a legitimate problem that’s causing a significant performance penalty in our application. (Since large multi_gets fail, that means we aren’t getting the data back from memcache and hitting the database more than we need to.) Finally, for me, it shows that I don’t have to be bothered by debugging with strace. I’m no C code expert, but even with my limited understanding, strace provided enough clues for me to follow this all the way to the source in less than one hour. And since you all have an extra hour to spare, I hope this encourages you to take a look at that problem that’s been lingering in your stack.

Anton Koldaev join 37signals as newest Sysop

Taylor wrote this on / 22 comments

Today, we are announcing Anton Koldaev as the newest member of our operations team! Anton hails from Moscow and recently worked at the first Russian-based cloud hosting company.

In David’s recent post on remote working, there were a number of comments about investing in more junior or less experienced individuals. Anton is 24 and a great example of that and quickly won us over during his 30 day trial as a junior member of our team.

How we hired Anton

Anton saw our job board post via an old colleague’s tweet and responded with a well-written email to us. We reviewed his resume and gave a traditional operations task to complete: Deploy Redmine to EC2 using Chef, and document the process.

Anton not only did an excellent job from the operational side of things, he was the only applicant to use the Redmine instance to document his work. In addition, he was the only applicant to take a page from tally and build a little Campfire integration for code pushes, deployment notifications, etc.

Impressed with Anton’s work on the trial project, each member of the team had a chat with him on Skype. (We also talked to his references who all had great things to say.) As a team we agreed that Anton would be a good contractor, and offered him a 30 day trial.

Anton started with documentation, and worked his way through projects such as upgrading our Chef server, deploying new hardware for Basecamp and rebuilding our staging database servers. Near the end of his trial, Anton was able to get a visa to visit Will in the UK for a week of co-working and in depth learning about our operations.

Welcome Anton!

Behind the Scenes: Internet Connectivity

Taylor wrote this on / 23 comments

Last year, we suffered a number of service outages due to network problems upstream. In the past 9 months we have diligently worked to install service from additional providers and expand both our redundancy and capacity. This week we turned up our third Internet provider, accomplishing our goals of circuit diversity, latency reduction and increased network capacity.

We now have service from Server Central / Nlayer Networks, Internap and Level 3 Communications. Our total network capacity is in excess of 1.5 gigabits per second, while our mean customer facing bandwidth utilization is between 500 megabits and 1 gigabit. In addition, we’ve deployed two Cisco ASR 1001 routers which aggregate our circuits and allow us to announce our /24 netblock (our own IP address space) via each provider.

Keeping Basecamp, Highrise, Backpack, and Campfire available to you at all times is our top priority, and we’re always looking for ways to increase redundancy and service performance. This setup has prevented at least 4 significant upstream network issues from becoming customer impacting… which we can all agree is great!

Looking for two more people to join our operations team

Taylor wrote this on / Discuss

We are looking for two more people to join our operations team. We would prefer individuals interested in both application development and systems engineering. We’ve got 100’s of servers running Ubuntu, 400+ terabytes of Isilon storage, and we’re building out a second site. We use lots of “bare metal” in addition to VMware virtualization and some of the Amazon and Rackspace Cloud services.

Come work with us at 37signals and do the best work of your career.

Want more information? Check out this post on the Job Board.

Eron Nicholson joins 37signals as Sysop

Taylor wrote this on / 7 comments

Yesterday Eron Nicholson joined John, Will, and me on our operations team!

Recently Eron’s work included building and maintaining several multi-data center installations for a green energy startup. In addition to wearing the hardware hat, Eron was responsible for networking, monitoring and systems administration. As if those duties weren’t enough, Eron also developed internal tools and did other software design (including embedded systems).

We are really looking forward to Eron’s work on our monitoring and high availability systems.

Welcome Eron!

PS Hat tip to Nic at Newrelic for introducing us to Eron.

PPS Eron hails from North Carolina and enjoys racing … of a different variety (24 Hours of Lemons).