Bigger Ain’t Always Better
Recently I was troubleshooting some inefficiencies in the job system’s locking and fetching queries at work. Like a good little boy, I originally came up with one index that satisfied all the queries I needed to run against this particular critical table.
(`completed`,`heartbeat`,`priority`,`datacenter`,`worker`) -- datetime,datetime,tinyint,varchar(16),varchar(255)
the query looked like this:
SELECT `id`,GET_LOCK(CONCAT('foo_',`id`),0) as mylock FROM `foo_jobs` WHERE `completed` = '0000-00-00 00:00:00' AND `worker` = '' AND `dirty` = 0 AND IS_FREE_LOCK(CONCAT('foo_',`id`)) = 1 AND `priority` = '0' LIMIT 1
This worked really well as long as the number of jobs in the table remained small. But there was an event horizon of sorts: past a certain number of rows (say 10k-25k) the query above took so long to return rows that the system could no longer catch up and fell steadily further behind. Apparently, the problem was that there was too much data in the key. Because the requirements for these queries had changed slightly (due to other scaling improvements) I was able to simplify the indexes and break them into parts which vastly outperform the previous index.
KEY `get_a_job` (`priority`,`workerpid`) -- tinyint, int
KEY `janitor` (`completed`,`heartbeat`,`worker`,`dirty`) -- datetime,datetime,varchar(255),tinyint
This works because workerpid is enough to tell us whether a job definition is in process (or was, and has not been cleaned up yet), and it lets the following query run against a super small index:
SELECT `id`,GET_LOCK(CONCAT('foo_',`id`),0) as mylock FROM `foo_jobs` WHERE `priority` = '0' AND `workerpid` IS NULL AND IS_FREE_LOCK(CONCAT('foo_',`id`)) = 1 LIMIT 1
If you’re interested in what dark monstrosity could possibly require queries like these (and are a brave PHP developer) I invite you to check out the jobs system
Erlang… Starting to come together…
I’m finally starting to “get” erlang… just a little bit… I’ve managed to make several TCP daemons… an echo server, a reverse echo server, and a server which spits out the md5 of the input given.
Yea… I know… Lame… but it’s one hump I’m finally over…
Just because you build it, doesn’t mean they will come
Here’s a small bit of advice for all you would-be “cloud storage providers.” Just because you have a buttload of disks doesn’t mean people will be falling over themselves to use your software. If I have to spend *any* of my time worrying about your load, storage, or other internal algorithms (or unnecessary limitations for that matter) then YOU . HAVE . FAILED.
If I have to take the time to shard my data into 4096 different containers because you couldn’t be bothered to think “hey what if a service with a lot of users that create a lot of stuff decides to use us as a store?” Then you’re obviously not in it to win it (so to speak.)
Give us ABSTRACTED storage. Non-abstracted storage we can do on our own, thank you.
Just what you need to know to write a CouchDB reduce function
Let’s say you have the CouchDB classes (located here) all compiled together and included in your test.php script. Let’s also say that you have used the built-in web UI to create a database called “testing”. Finally, let us say that your test.php has the following code in it, which adds a record to the db every time it is run. (I know that the data in the document serves no useful purpose, but really I just want to figure out this map/reduce thing so that I can make awesome views, so this suffices.)
require_once dirname( dirname( __FILE__ ) ) . '/includes/couchdb.php';

$couchdb = new CouchDB('testing', 'localhost', 5984);
$key = microtime();
$result = $couchdb->send(
    '/' . md5($key),
    'put',
    json_encode( array(
        "_id"  => md5($key),
        "time" => $key,
        'md5'  => md5($key),
        'sha1' => sha1($key),
        'crc'  => crc32($key)
    ) )
);
print_r($result->getBody(true));
After running the code a bunch of times you would end up with a bunch of documents which look more or less like this:
Now let’s say you want to write a view that tells you what the first character of each _id is and how many documents share that first character. This is analogous to the following in MySQL:
SELECT LEFT(md5, 1) AS `lchar`, count(md5) FROM `md5table` GROUP BY `lchar`
Your map function is easy: because you don’t have any selection criteria, we process all rows.
function(doc){ emit(doc._id,doc); }
The reduce function is where the actual programming comes in… And it seems there aren’t many well explained examples of exactly how to do this (I just brute forced it by trial and error)
function(key, values, rereduce) {
    var output = {};
    if ( rereduce ) {
        // key is null, and values are values returned by previous calls
        //
        // see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
        //
        // essentially we are taking the previously reduced view, and the
        // reduced view for new records, and we are reducing those two things
        // together. Summarizing two summaries, essentially
        for ( var i in values ) {
            // here we have multiple prebuilt output objects and we're simply combining them
            // just like below we have an array with a numeric id and an output object
            //
            // retrieve a summary
            var vals = values[i];
            for ( var key in vals ) {
                // debugging
                // log(key);
                //
                // store in or increment our new output object
                if ( output[key] == undefined )
                    output[key] = vals[key];
                else
                    output[key] = output[key] + vals[key];
            }
        }
    } else {
        // key is an array, which we don't care about, and values are the
        // values returned by the map
        //
        // see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
        //
        // we are taking each document and processing that, reducing it down
        // to a summary object (output) for each of the rows passed
        for ( var i in values ) {
            // we have an array, values, with numeric ids and document objects
            //
            // retrieve a document
            var doc = values[i];
            // get what we want from it, the first char of the md5
            var key = doc._id.substr(0, 1);
            // debugging
            // log( key + " :: " + doc._id );
            //
            // store or increment the output object
            if ( output[key] !== undefined )
                output[key] = output[key] + 1;
            else
                output[key] = 1;
        }
    }
    // done
    return output;
}
In code, using a temporary view (if you used this view all the time you would want to make it permanent, but this is about how to lay out a reduce function, nothing more), request code that looks like this
$view = array(
    'map' => 'function(doc){ emit(doc._id,doc); }',
    'reduce' => '
        function(key, values, rereduce) {
            var output = {};
            if ( rereduce ) {
                // key is null, and values are values returned by previous calls
                for ( var i in values ) {
                    var vals = values[i];
                    for ( var key in vals ) {
                        // log(key);
                        if ( output[key] == undefined )
                            output[key] = vals[key];
                        else
                            output[key] = output[key] + vals[key];
                    }
                }
            } else {
                // key is an array, which we dont care about, and values are the values returned by the map
                for ( var i in values ) {
                    var doc = values[i];
                    var key = doc._id.substr(0, 1);
                    // log( key + " :: " + doc._id );
                    if ( output[key] !== undefined )
                        output[key] = output[key] + 1;
                    else
                        output[key] = 1;
                }
            }
            return output;
        }
    '
);
$result = $couchdb->send('/_temp_view', 'POST', json_encode($view) );
print_r($result->getBody(true));
would give you output that looks like this:
stdClass Object
(
    [rows] => Array
        (
            [0] => stdClass Object
                (
                    [key] =>
                    [value] => stdClass Object
                        (
                            [0] => 15
                            [1] => 17
                            [2] => 16
                            [3] => 13
                            [4] => 27
                            [5] => 18
                            [6] => 26
                            [7] => 15
                            [8] => 18
                            [9] => 21
                            [a] => 12
                            [b] => 23
                            [c] => 20
                            [d] => 27
                            [e] => 28
                            [f] => 26
                        )
                )
        )
)
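If you want to poke at the reduce/rereduce split without a CouchDB server, here is a rough sketch that calls the same logic by hand in plain JavaScript. The function name `reduceFirstChar`, the sample `_id` values, and the way the documents are split into two batches are all made up for illustration; CouchDB decides for itself how to batch rows and when to rereduce.

```javascript
// Same logic as the reduce function above, given a name so it can be called directly.
function reduceFirstChar(key, values, rereduce) {
    var output = {};
    if (rereduce) {
        // values is a list of summary objects produced by earlier reduce calls
        for (var i in values) {
            var vals = values[i];
            for (var k in vals) {
                if (output[k] === undefined) output[k] = vals[k];
                else output[k] = output[k] + vals[k];
            }
        }
    } else {
        // values is a list of documents emitted by the map function
        for (var j in values) {
            var first = values[j]._id.substr(0, 1);
            if (output[first] !== undefined) output[first] = output[first] + 1;
            else output[first] = 1;
        }
    }
    return output;
}

// Hypothetical documents; only _id matters for this view.
var docs = ['a1', 'a2', 'b1', 'c1', 'a3', 'b2'].map(function (id) {
    return { _id: id };
});

// First pass: reduce two arbitrary batches of mapped rows separately.
var batchA = reduceFirstChar(null, docs.slice(0, 3), false);
var batchB = reduceFirstChar(null, docs.slice(3), false);

// Second pass: rereduce the two summaries into one.
var combined = reduceFirstChar(null, [batchA, batchB], true);

// Reducing everything in a single call should give the same counts.
var allAtOnce = reduceFirstChar(null, docs, false);

console.log(JSON.stringify(combined));  // {"a":3,"b":2,"c":1}
console.log(JSON.stringify(allAtOnce)); // {"a":3,"b":2,"c":1}
```

The point of the exercise: whatever the reduce branch returns has to be something the rereduce branch can merge, and merging summaries has to give the same answer as reducing all the rows at once.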
I hope this helps somebody out.
random php… a multi-channel chat room class using memcached for persistence
why? i dunno… just because… just a toy…
no sql, no flat file, no write permissions required anywhere, no fuss
class mc_chat {

    var $chan = null;
    var $mc = null;
    var $ret = 5;

    function __construct($memcached, $channel, $retention=5) {
        $this->mc = $memcached;
        $this->chan = $channel;
        $this->ret = $retention;
    }

    function messages( $from=0 ) {
        $max = (int)$this->mc->get("$this->chan:max:posted");
        $min = (int)$this->mc->get("$this->chan:min:posted");
        $messages = array();
        for ( $i=$min; $i<=$max; $i++ ) {
            if ( $i < $from )
                continue;
            $m = $this->get($i);
            if ( $m['user'] && $m['message'] )
                $messages[$i] = $m;
        }
        return $messages;
    }

    function get($id) {
        return array(
            'user' => (string)$this->mc->get("$this->chan:msg:$id:user"),
            'message' => (string)$this->mc->get("$this->chan:msg:$id"),
        );
    }

    function add($user, $message) {
        $id = (int)$this->mc->increment("$this->chan:max:posted");
        if ( !$id ) {
            $id=1;
            $this->mc->set("$this->chan:max:posted", 1);
        }
        $this->mc->set("$this->chan:msg:$id:user", (string)$user);
        $this->mc->set("$this->chan:msg:$id", (string)$message);
        if ( $id >= $this->ret ) {
            if ( !$this->mc->increment("$this->chan:min:posted") )
                $this->mc->set("$this->chan:min:posted", 1);
        }
    }
}

$mc = new Memcache;
$mc->connect('localhost', 11211);
$keep_messages = 10;
$chatter_id = 1;
$chat = new mc_chat($mc, 'chat-room-id', $keep_messages);
$chat->add($chatter_id, date("r").": $chatter_id : foo");
$chat->messages(37); // messages only above id=37
$chat->messages(); // all the latest messages
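To see how the min/max counters behave without a memcached server handy, here is a rough JavaScript sketch of the same key layout. `FakeMemcache` is a made-up in-memory stand-in (not a real client library) that only imitates the two behaviours the class leans on: `get` on a missing key returns false, and `increment` on a missing key fails. `McChat` is a straight port of the class above, nothing more.

```javascript
// Hypothetical in-memory stand-in for a memcached client (illustration only).
class FakeMemcache {
    constructor() { this.store = new Map(); }
    get(key) { return this.store.has(key) ? this.store.get(key) : false; }
    set(key, value) { this.store.set(key, value); return true; }
    increment(key) {
        if (!this.store.has(key)) return false; // increment fails on a missing key
        const next = Number(this.store.get(key)) + 1;
        this.store.set(key, next);
        return next;
    }
}

// Port of mc_chat: one counter for the newest id, one for the oldest retained id.
class McChat {
    constructor(mc, channel, retention = 5) {
        this.mc = mc;
        this.chan = channel;
        this.ret = retention;
    }
    messages(from = 0) {
        const max = Number(this.mc.get(`${this.chan}:max:posted`)) || 0;
        const min = Number(this.mc.get(`${this.chan}:min:posted`)) || 0;
        const out = {};
        for (let i = min; i <= max; i++) {
            if (i < from) continue;
            const m = this.get(i);
            if (m.user && m.message) out[i] = m;
        }
        return out;
    }
    get(id) {
        return {
            user: String(this.mc.get(`${this.chan}:msg:${id}:user`) || ''),
            message: String(this.mc.get(`${this.chan}:msg:${id}`) || ''),
        };
    }
    add(user, message) {
        let id = this.mc.increment(`${this.chan}:max:posted`) || 0;
        if (!id) {
            id = 1;
            this.mc.set(`${this.chan}:max:posted`, 1);
        }
        this.mc.set(`${this.chan}:msg:${id}:user`, String(user));
        this.mc.set(`${this.chan}:msg:${id}`, String(message));
        // once we have at least `ret` messages, slide the lower bound forward
        if (id >= this.ret) {
            if (!this.mc.increment(`${this.chan}:min:posted`))
                this.mc.set(`${this.chan}:min:posted`, 1);
        }
    }
}

const chat = new McChat(new FakeMemcache(), 'chat-room-id', 3);
for (let n = 1; n <= 5; n++) chat.add('user1', `message ${n}`);

console.log(Object.keys(chat.messages()));  // [ '3', '4', '5' ]
console.log(Object.keys(chat.messages(5))); // [ '5' ]
```

With a retention of 3 and five messages posted, only ids 3 through 5 are listed. The older messages are never deleted here, they just fall below the min counter; with a real memcached they would presumably hang around until evicted.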
Debian Lenny, Avahi, AFP… Linux Fileserver for OSX Clients
If you’re like me you have an OSX computer or 3 at home, and a debian file server. If you’re like me you hate samba/nfs on principle and want your debian server to show up in finder. If you’re like me you aren’t using debian 3, which is what most of the walkthroughs seem to expect… This is how I did it… With Debian Lenny.
What we’re using, and why:
- Avahi handles zeroconf (making it show up in finder) (most howtos involve howl which is no longer in apt)
- netatalk has afpd
- afpd is the fileserver
From: http://blog.damontimm.com/how-to-install-netatalk-afp-on-ubuntu-with-encrypted-authentication/
- apt-get update
- mkdir -p ~/src/netatalk
- cd ~/src/netatalk
- apt-get install cracklib2-dev libssl-dev
- apt-get source netatalk
- apt-get build-dep netatalk
- cd netatalk-2.0.3
From: http://www.sharedknowhow.com/2008/05/installing-netatalk-under-centos-5-with-leopard-support/
- vim bin/cnid/cnid_index.c ## replace “ret = db->stat(db, &sp, 0);” with “ret = db->stat(db, NULL, &sp, 0);” line 277
- vim etc/cnid_dbd/dbif.c ## replace “ret = db->stat(db, &sp, 0);” with “ret = db->stat(db, NULL, &sp, 0);” line 517
Mine
- ./configure --prefix=/usr/local/netatalk
- make
- make install
- vim /etc/rc.local ## add “/usr/local/netatalk/sbin/afpd”
- /usr/local/netatalk/sbin/afpd
- apt-get install avahi-daemon
- vim /etc/nsswitch.conf ## make the hosts line read “hosts: files dns mdns4”
- cd /etc/avahi/services
- wget http://www.disgruntled-dutch.com/media/afpd.service
- /etc/init.d/avahi-daemon restart
in case that file drops off the face of the net, this is its contents:
<?xml version="1.0" standalone='no'?><!--*-nxml-*-->
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
<name replace-wildcards="yes">%h</name>
<service>
<type>_afpovertcp._tcp</type>
<port>548</port>
</service>
</service-group>
At this point your server should show up under the network in your finder… and you should be able to connect with any system username/pw combo
randomly decided to install opensolaris inside virtualbox
just to try it out. it’s not done installing…. but… 8 char max for the user login is pretty lame… what is this… 1990?
making munin-graph take advantage of multiple cpus/cores
I do a lot of things for Automattic, and many of the things I do are quite esoteric (for a php developer anyways.) Perl is not my language of choice, but I’ve never balked at a challenge…. just… did it have to be perl? Anyways. We have more than a thousand machines that we track with munin… which means a TON of graphs. munin-update is efficient, taking advantage of all cpus and getting done in the fastest time possible, but munin-graph started taking so long as to be useless (and munin-cgi-graph takes almost a minute to fully render a server’s day/week summary page, which is completely unacceptable when we’re trying to troubleshoot a sudden, urgent problem.) So I got to dive in and make it faster…
Step 1: add in this function (which i borrowed from somewhere else)
sub afork (\@$&) {
    my ($data, $max, $code) = @_;
    my $c = 0;
    foreach my $data (@$data) {
        wait unless ++ $c <= $max;
        die "Fork failed: $!\n" unless defined (my $pid = fork);
        exit $code -> ($data) unless $pid;
    }
    1 until -1 == wait;
}
Step 2: replace this
for my $service (@$work_array) { process_service ($service); }
with this
afork(@$work_array, 16, \&process_service);
I also have munin-html and munin-graph running side-by-side
( [ -x /usr/local/munin/lib/munin-graph ] && nice /usr/local/munin/lib/munin-graph --cron $@ 2>&1 | fgrep -v "*** attempt to put segment in horiz list twice" )&
waitgraph=$!
( [ -x /usr/local/munin/lib/munin-html ] && nice /usr/local/munin/lib/munin-html $@; )&
waithtml=$!
wait $waitgraph
wait $waithtml
I did several other, more complicated hacks as well, such as not generating month and year graphs via cron and letting those render on-demand with munin-cgi-graph.
All said, we’re doing in under 2.5 minutes what was taking 7 or 8 minutes previously.
Google stops development on 6 services
[edit: link]
I already see the Stallmanites rallying for their battle cries. Never using anything you didn’t write yourself is an asinine concept, in my opinion… This coming from someone who can write web services himself. The truth is that using services “in the cloud,” “on the web,” or anywhere else is just like using local software in one very important sense. (I particularly like one comment I heard on this once which went something like: “Would you be able to validate the source code to the ls binary on your own?”)
If your data is not in two completely separate locations, then it’s not safe.
My wife, who’s in need of some extra storage space (she’s starting to get into photography some) got two external 120gb hard drives (which were on clearance.) Before I let her use them I sat her down and gave her some important advice: “If you store your data on one of these drives… it is NOT backed up… it’s just stored on that disk. And if something happens to that disk there is NOTHING that I can do to save your photos. Period. I bought you two because every so often you need to copy what’s important to the second disk. That way if one disk dies, you don’t lose your stuff.”
I’m not sure why people feel that if they’re using apps in the cloud that this doesn’t apply to them. A service shutting down is basically equivalent to losing a hard disk. Be prepared. Back your data up if it’s that important!
As an aside. All WordPress installations allow you to export your data — even WordPress.com. I suggest people take advantage of that on occasion!
A Pure Memcached Queue
This is pretty clever… Might have to code this up in PHP… memcachequeue-a-pure-memcached-queue