Saturday, April 3, 2010

Connecting to Memcached

The Memcached project provides a "distributed memory object caching system", allowing quick storage and lookup functionality for key-value pairs. It is both free and highly useful and a necessary component to scaling most high traffic websites.

Libraries are available in many languages, but not Factor. I thought it would be fun to fix that and document some of the steps.

The first step is to understand what protocols are supported (in this case both text and binary) and over what mediums (either UDP or TCP). An example of the text protocol looks like:

$ telnet localhost 11211 Trying ::1... Connected to localhost. Escape character is '^]'. set foo 0 60 3 bar STORED get foo VALUE foo 0 3 bar quit Connection closed by foreign host.

We will start by implementing the binary protocol (which is more efficient than the text one) over TCP.

First, we will list the needed vocabularies and begin a namespace for the client implementation.

USING: accessors arrays assocs byte-arrays combinators fry
io io.encodings.binary io.sockets kernel make math math.parser
namespaces pack random sequences strings ;

IN: memcached

The io.sockets vocabulary provides some nice words for doing simple networking. One of those words is with-client, which connects to a specified address and then runs a quotation with the input and output streams configured to read and write to the connected socket. We will use this to provide a structure for making Memcached requests:

SYMBOL: memcached-server
"127.0.0.1" 11211 <inet> memcached-server set-global

: with-memcached ( quot -- )
    memcached-server get-global
    binary [ call ] with-client ; inline

Requests are made to the Memcached server by specifying a command type, and occasionally providing other fields:

TUPLE: request cmd key val extra opaque cas ;

: <request> ( cmd -- request )
    "" "" "" random-32 0 \ request boa ;

The header to every request is 24 bytes, and includes a magic byte, the command, the key length, the extra header length, the total body length including the value, an opaque value, and a field used for CAS (compare and swap):

: send-header ( request -- )
    {
        [ cmd>> ]
        [ key>> length ]
        [ extra>> length ]
        [
            [ key>> length ]
            [ extra>> length ]
            [ val>> length ] tri + +
        ]
        [ opaque>> ]
        [ cas>> ]
    } cleave
    '[ HEX: 80 _ _ _ 0 0 _ _ _ ] "CCSCCSIIQ" pack-be write ;

Now that we have that, sending a request is straight-forward:

: (send) ( str -- )
    [ >byte-array write ] unless-empty ;

: send-request ( request -- )
    {
        [ send-header    ]
        [ extra>> (send) ]
        [ key>>   (send) ]
        [ val>>   (send) ]
    } cleave flush ;

The response packet from the server has a similar structure to the request, so we can start by parsing the header:

: read-header ( -- header )
    "CCSCCSIIQ" [ packed-length read ] [ unpack-be ] bi ;

We can check the header for some error conditions:

: check-magic ( header -- )
    first HEX: 81 = [ "bad magic" throw ] unless ;

: check-status ( header -- )
    [ 5 ] dip nth {
        { HEX: 01  [ "key not found" throw     ] }
        { HEX: 02  [ "key exists" throw        ] }
        { HEX: 03  [ "value too large" throw   ] }
        { HEX: 04  [ "invalid arguments" throw ] }
        { HEX: 05  [ "item not stored" throw   ] }
        { HEX: 06  [ "value not numeric" throw ] }
        { HEX: 81  [ "unknown command" throw   ] }
        { HEX: 82  [ "out of memory" throw     ] }
        [ drop ]
    } case ;

We need some words to optionally read the key and value if they are present:

: (read) ( n -- str )
    dup 0 > [ read >string ] [ drop "" ] if ;

: read-key ( header -- key )
    [ 2 ] dip nth (read) ;

: read-val ( header -- val )
    [ [ 6 ] dip nth ] [ [ 2 ] dip nth ] bi - (read) ;

And now we have everything we need to read the response:

: read-response ( -- val key )
    read-header {
        [ check-magic  ]
        [ check-status ]
        [ read-key     ]
        [ read-val     ]
    } cleave swap ;

Believe it or not, but this is all we really need to implement the various Memcached commands, including the two most basic: GET and SET:

: m/get ( key -- val )
    HEX: 00 <request> swap >>key
    send-request read-response drop 4 tail ;

: m/set ( val key -- )
    HEX: 01 <request> swap >>key swap >>val
    { 0 0 } "II" pack-be >>extra ! flags exp
    send-request read-response 2drop ;

Replicating the telnet session from the beginning of the article from the Factor listener:

( scratchpad ) [ "bar" "foo" m/set ] with-memcached
( scratchpad ) [ "foo" m/get ] with-memcached .
"bar"

This could be improved in several ways, for example: handling servers that stop responding, keeping the network connections open over several sets of requests (or use UDP), and allowing access to some of the more advanced parts of the Memcached protocol (including flags and expiration timers).

It is available on my Github, and hopefully will be pulled into the main repository soon.

2 comments:

Unknown said...

I didn't find extra/memcached in your GitHub. Is it gone now?

mrjbq7 said...

The code is in my re-factor project.

I haven't had a chance to work with Slava to merge it into the main repository (although it passes all its tests and works fine). Also, I hope to add UDP support in the future.