Keeping your slave warm Apr 12 2010
17 comments Latest by Matthew Savage
In 2009 we ran into some problems when failing over to the Basecamp slave database. Basecamp relies on a keeping large working set of recently-accessed data in its InnoDB buffer cache for speed. Normal MySQL replication only sends writes, not reads, to the slave. How could we ensure the data in the slave’s cache is up to date?
We contracted Percona to build a solution into their Maatkit toolset based on their experiments with SELECT query mirroring. It involves a clever usage of tcpdump to capture and replay SELECT queries from the master to the slave database.
Here’s the resulting command.
/usr/bin/mk-query-digest --statistics --iterations 4 --run-time 15m --type tcpdump
--filter '$event->{arg} && $event->{arg} =~ m/^SELECT/i'
--statistics --execute \"h=db-slave,P=3306,u=slave,p=password,D=production\"
--execute-throttle 70,30,5
The tcpdump utility captures MySQL traffic from the master and feeds the data into the mk-query-digest script. This script filters only the SELECT queries and executes them on the slave database. The throttle argument sets the percentage of time the script should execute queries on the slave, how often to check that value, and a percentage probability that queries will be skipped when the threshold is exceeded.
Here’s some sample statistical output:
# execute_executed 124668
# throttle_checked_rate 29
# throttle_rate_avg 29.84
# throttle_rate_ok 29
According to these values, the script didn’t reach the 70% query execution threshold we set. Our queries are executing on the slave cleanly.
Since we began using this tool we switched production database servers without a performance reduction.
Got a web design project in mind? Find a web designer on Sortfolio. Browse by visual style, portfolio, budget, and geographic location.
Over 1 million people use 37signals' simple web-based software to collaborate on projects, track contacts, and organize their business with an intranet.
17 comments so far
seimen 12 Apr 10
sounds like a ugly hack to me
James Byers 12 Apr 10
I’ve found maatkit to be invaluable. For every awkward, time-consuming, or seemingly impossible MySQL administration task, it seems there’s an mk-something that eases the pain. Thanks for helping support Percona in building maatkit.
Pablo Torres 12 Apr 10
You guys are known for loving the Mac. Do you use it for sysadminning too?
Ben Tucker 12 Apr 10
Out of curiosity, why not have reads always distributed between the two DB servers from the clients and just have writes going to the primary DB? Seems this would solve the problem of keeping both caches warm with the added benefit of distributing the load. We’ve had great success using mysql-proxy for this task.
Anonymous Coward 12 Apr 10
@37signals
I thought you were using Amazon EC2 for your entire deployment environment.
Unfortunately, per the article linked above, I never recall seeing the follow-up article how on well using EC2 went.
Anonymous Coward 12 Apr 10
+1 Ben Tucker
I don’t understand why you want to “replicate” reads across your environment.
Only write’s should matter.
And then like Ben said, simply have multiple read-only servers.
Sounds like your current architecture needs some REWORK
MI 12 Apr 10
Ben, the reason we don’t use mysql-proxy for that is that the last benchmarks I saw (admittedly quite old) showed it adding a very significant amount of latency: http://www.mysqlperformanceblog.com/2009/06/09/mysql-proxy-urgh-performance-and-scalability/
Splitting reads and writes is definitely something we could do, but it’s not particularly valuable if your read load can be handled by a single well tuned server. It also means you need at least three database servers instead of two to maintain N+1 and ensure that you can lose a server without performance impact.
wk 12 Apr 10
Best. Title. Ever.
Alexis 13 Apr 10
I like this idea. However what happens if a select statement is inside and insert or update ?
MI 13 Apr 10
Alexis, if you look closely you’ll notice that the SELECT regular expression pattern is bound to the beginning of the string (ie: ^SELECT) so it will only match queries that begin with SELECT . We also lock down the user who does the slave warming such that it only has SELECT privileges.
markd 13 Apr 10
This title is why most companies would have an approval process for blog posts. I hope you guys can scale and stay classy.
DL 13 Apr 10
Nice work around to your issue. I agree with @seimen but at the same time the solution has a simple elegance to it that is very 37signals.
I love the title.
George Anderson 13 Apr 10
I’m genuinely surprised by these comments: “Best. Title. Ever.” and “I love the title.” Sure, you have the right to title your posts as you see fit, but “Keeping your slave warm” seems downright insensitive. Clever in some respect, maybe, but insensitive nonetheless given the innuendo.
Adam 14 Apr 10
Lighten up, George.
markd 14 Apr 10
Looking at it again, I’m not sure it’s clear that my previous comment was negative. This title is pathetic, and completely not in keeping with 37signals’ character.
Mark French 14 Apr 10
I’m not a database guy, but I’m familiar with the concept of “master” and “slave” when it comes to electronic musical devices, and I found it shocking then too. But I realize that to the database guys, the title is kinda funny. I have a question though: is there any other synonymous terminology in usage for this process that isn’t so… emotionally loaded?
Thanks.
Matthew Savage 17 Apr 10
Wow, are we really so wound up and PC that we cant even use the words ‘master’ and ‘slave’ in a neutral context any more? I suppose if this is the case then we should also ban the name Adolf… right?
Come on people, sure slavery is bad, but lets think before we take things out of context. Perhaps twitter has changed our thought processing to the point where we say it first, regret it later?
Comments are closed