SQLAlchemy and memcached

I think it is safe to say that my personal goal of learning and using Python this winter has been a huge success! I have now found myself hard at work on a large Python-based project in my spare time and investigating issues that I haven't been able to track down much documentation on. How to use Memcached in conjunction with SQLAlchemy was one of those issues.

Making Sure Everything Is Ready
Nearly all of the trouble I encountered with SQLAlchemy boiled down to how I initialized my objects. For a given mapped object, I now follow this order in my projects:

  1. Create the engine
  2. Create the schema metadata
  3. Bind the engine to the MetaData
  4. Declare the orm class to be mapped
  5. Create the schema table to be mapped
  6. Create the orm mapping
  7. Bind the orm mapping to the schema table
  8. Compile the bound orm mapping

Leaving The Session Behind
One of the first things I attempted to do in my web application with SQLAlchemy was detach a mapped object from the session and store it across page requests. I wanted to detach the object from the session context to separate class inheritance from the context of the child and avoid the thread-safety dangers of the session.

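import sqlalchemy.orm.exc

class DetachedORMObject(object):
    @classmethod
    def fetch_by_field(cls, field, value):
        """Fetch one object by column value, detached from any session."""
        # SESSION is the module-level sessionmaker described below.
        session = SESSION()
        try:
            class_object = session.query(cls).filter(field == value).one()
        except sqlalchemy.orm.exc.NoResultFound:
            class_object = None
        finally:
            # Closing the session detaches every object it loaded.
            session.close()
        return class_object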

The above code works when a common sessionmaker, SESSION, has been defined in the module. A base class like this gives a single home for a whole collection of utility methods that the implementing classes can share. For my purposes, it allowed for a common path for fetching orm objects from the database that could be overridden in the inheritance chain.

The issue I ran into with the above code dealt with related tables. By default, SQLAlchemy lazy-loads related table attributes. Once the orm object has been detached from the session, the related attribute can no longer fetch its data from the database. This can be fixed by disabling lazy loading (lazy=False) on the relations of mappings that are going to be disconnected.

UserTable.mapper = mapper(User, UserTable,
    properties={'user_status': relation(UserStatus, lazy=False)})
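To make the failure mode concrete without a database, here is a toy model of the difference between lazy and eager loading once the session is gone. Nothing in it is SQLAlchemy: ToySession, ToyUser and fetch are hypothetical stand-ins I made up for illustration, and the RuntimeError only imitates the kind of failure a detached object runs into.

```python
class ToySession(object):
    """Hypothetical stand-in for a session: it only tracks whether it is open."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def load_status(self, status_id):
        # Pretend to query the related table; refuse once the session is closed.
        if not self.is_open:
            raise RuntimeError("session is closed; cannot load related row")
        return "status-%d" % status_id


class ToyUser(object):
    """Hypothetical object with one related attribute, user_status."""

    def __init__(self, session, status_id, lazy=True):
        self._session = session
        self._status_id = status_id
        self._status = None
        if not lazy:
            # Eager: fetch the related row while the session is still open.
            self._status = session.load_status(status_id)

    @property
    def user_status(self):
        if self._status is None:
            # Lazy: the fetch is deferred until the attribute is first read.
            self._status = self._session.load_status(self._status_id)
        return self._status


def fetch(lazy):
    """Mimic fetch_by_field: load the object, then always close the session."""
    session = ToySession()
    try:
        return ToyUser(session, 3, lazy=lazy)
    finally:
        session.close()


eager_user = fetch(lazy=False)
lazy_user = fetch(lazy=True)

# The eager object carries its related data with it.
assert eager_user.user_status == "status-3"

# The lazy object only tries to load after the session has been closed.
try:
    lazy_user.user_status
    lazy_failed = False
except RuntimeError:
    lazy_failed = True
assert lazy_failed
```

The eager version pays for the related query up front, which is the trade I was happy to make for objects that are going to live in a cache anyway.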

Serial Killer
Now that I had objects being detached from the session successfully, I needed to start serializing them. My goal for serialization is to store the orm objects in memcached. We can avoid the extra complexity of introducing memcached into the mix at this point by using cPickle for testing. This is the same module that python-memcached uses to serialize objects as it interacts with a memcached server.

Initially, I thought pickling worked like a charm. I was able to pickle an object to a string and load it again without encountering errors. Once I tried to interact with the attributes of the object, I started getting AttributeError exceptions. After doing some digging, I discovered that the mapping was broken when we attempted to unpickle the object. The solution was to explicitly compile the mapping.

UserTable.mapper = mapper(User, UserTable,
    properties={'user_status': relation(UserStatus, lazy=False)})
UserTable.mapper.compile()

The compilation step is not included in the tutorials in the SQLAlchemy documentation. It is implicitly invoked when Python interacts with the mapped object through the SQLAlchemy API. In fact, if the program that is unpickling loads a separate object first, an AttributeError exception will not be thrown. By explicitly compiling the mapping, we ensure that pickling can occur successfully before the SQLAlchemy API is called.
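Since a clean load proved nothing in my case, a useful test is a pickle round trip that also reads the attributes back afterward. The sketch below uses the standard pickle module (the Python 3 home of cPickle) and a plain stand-in class; PlainUser and round_trip are hypothetical names for illustration, not part of my application. With a real mapped class, the attribute access at the end is where a missing compile would show up.

```python
import pickle


class PlainUser(object):
    """Hypothetical stand-in for a mapped class; no SQLAlchemy involved."""

    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name


def round_trip(obj):
    """Pickle obj to a byte string and load it back, as a cache would."""
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


original = PlainUser(7, "alice")
restored = round_trip(original)

# Loading without an error is not enough; read every attribute back too.
for attr in ("user_id", "name"):
    assert getattr(restored, attr) == getattr(original, attr)
```

Running the load half of this in a fresh interpreter, before anything else touches the SQLAlchemy API, is the closest match to what a web request hitting the cache will actually do.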

Whose Object Is It Anyway?
The final issue I encountered was in finding a method to generate an instance key in a generic fashion. While including abstract methods forcing child classes to provide instance keys to the parent class would work, I wanted a more elegant solution. Investigating the SQLAlchemy internals pointed me in the direction of the class manager for the mapped objects. It links the orm objects to the mapper, which in turn links them to the metadata. I can update the fetch_by_field method as follows:

    @classmethod
    def fetch_by_field(cls, field, value):
        """Fetch the requested object from the cache, falling back to the database"""
        orm_object = None
        # Only a lookup on the primary key column can be answered from the cache.
        matched_primary_key = True
        for key in cls._sa_class_manager.mapper.primary_key:
            if field.key != key.key:
                matched_primary_key = False
        if matched_primary_key:
            orm_object = cls.get_cached_instance('(' + str(value) + ')')
        if orm_object is None:
            # Cache miss or non-key lookup: query the database, then cache the result.
            orm_object = super(MemcachedORMObject, cls).fetch_by_field(
                field, value)
            if orm_object is not None:
                orm_object.set_cached_instance()
        return orm_object

The method first compares the field to the primary key collection in the mapper. If the field is the primary key, it attempts to fetch the cached value. If nothing is cached, it fetches the object using the parent fetch_by_field and adds it to the cache. It should be noted that I am converting the field value to a string when creating the instance key. In Python 2, the __repr__ of a long integer ends with the letter L, but its __str__ does not. Because of this discrepancy, keys may not match between the initial set and subsequent get unless the conversion is explicitly handled.
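The same trap exists outside of Python 2 longs: any type whose repr differs from its str will produce mismatched keys. Here is a small sketch of the key format used above, assuming a hypothetical helper named instance_key that always goes through str(). It uses Decimal to show the discrepancy, since the L suffix no longer exists in Python 3.

```python
from decimal import Decimal


def instance_key(value):
    """Build the '(value)' key format used above, always going through str()."""
    return "(" + str(value) + ")"


# The same logical primary key arriving as three different types.
from_request = instance_key("42")          # e.g. parsed from a URL
from_int = instance_key(42)
from_database = instance_key(Decimal(42))  # e.g. a NUMERIC column

assert from_request == from_int == from_database == "(42)"

# repr() would have produced three different keys for the same row.
assert len({repr("42"), repr(42), repr(Decimal(42))}) == 3
```

Funnelling every key through one helper means the set and the get cannot disagree about the format.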

Putting it All Together
With all of the issues handled, it's time to include memcached in the mix. My solution was to create a set of classes that mapped objects can inherit from to gain the necessary caching behavior.

  • DetachedORMObject
    Implements a generic fetch_by_field as well as the necessary db synchronization tasks. All session handling in the dependency chain is done here.

  • MemcachedObject
    Abstract class that has the ability to save and restore itself from memcached.

  • MemcachedORMObject
    Inherits from both DetachedORMObject and MemcachedObject. Contains the instance key management logic.
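The three classes fit together roughly as follows. This is a stripped-down sketch, not the code from my application. FakeMemcacheClient is a dict-backed stand-in with only get() and set(), ROWS and db_hits stand in for the database, and instance_key is a hypothetical method name. The primary key check from fetch_by_field is left out to keep it short. A real memcached client pickles objects itself, so the explicit pickle calls are only there because the fake client does not.

```python
import pickle


class FakeMemcacheClient(object):
    """Dict-backed stand-in for a memcached client (assumption for this sketch)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value
        return True


CACHE = FakeMemcacheClient()


class DetachedORMObject(object):
    """Database side; here a dict plays the part of the table."""

    ROWS = {}
    db_hits = 0

    @classmethod
    def fetch_by_field(cls, field, value):
        DetachedORMObject.db_hits += 1
        return cls.ROWS.get(value)


class MemcachedObject(object):
    """Knows how to save itself to, and restore itself from, the cache."""

    @classmethod
    def get_cached_instance(cls, instance_key):
        data = CACHE.get(cls.__name__ + instance_key)
        return None if data is None else pickle.loads(data)

    def set_cached_instance(self):
        CACHE.set(type(self).__name__ + self.instance_key(), pickle.dumps(self))


class MemcachedORMObject(DetachedORMObject, MemcachedObject):
    """Cache first, database second; owns the instance key format."""

    def instance_key(self):
        return "(" + str(self.id) + ")"

    @classmethod
    def fetch_by_field(cls, field, value):
        obj = cls.get_cached_instance("(" + str(value) + ")")
        if obj is None:
            obj = super(MemcachedORMObject, cls).fetch_by_field(field, value)
            if obj is not None:
                obj.set_cached_instance()
        return obj


class User(MemcachedORMObject):
    def __init__(self, id, name):
        self.id = id
        self.name = name


User.ROWS = {1: User(1, "alice")}
first = User.fetch_by_field("id", 1)   # misses the cache, reads the "database"
second = User.fetch_by_field("id", 1)  # served from the cache
```

After the two calls, db_hits is 1 and second is a fresh unpickled copy, not the object the first call returned. That copy semantics is worth keeping in mind when mutating cached objects.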

Final Notes
I have included a tarball of example scripts that show the full interaction with memcached. I pulled the database classes straight out of my application, and verified that all of the issues were repeatable using SQLite instead of PostgreSQL.

In my own implementation, I included classes to implement an optional second level of caching using resident memory. I also included a separate class hierarchy covering read-only database objects as well as object collections.