Saturday, February 16, 2008

Tell Python to Shove it!

I recently stumbled across the shove Python module while reading the excellent PyMOTW posting about the shelve module. The shove module is very similar to shelve, except that it provides multiple storage back ends (including some very interesting ones like S3 and SVN), a caching layer, and optional compression.
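For comparison, here is a minimal sketch of the dict-style persistence that shelve provides and that shove builds on. It uses only the standard library and assumes Python 3; the temporary path and the sample contact are made up for illustration.

```python
import os
import shelve
import tempfile

# Hypothetical scratch location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "example_shelf")

# A shelf behaves like a dict whose values are pickled to disk.
with shelve.open(path) as db:
    db["hello"] = "Hello World"
    db["contacts"] = {"chuck": "chuck@example.com"}

# Reopening the same path shows that the data persisted.
with shelve.open(path) as db:
    greeting = db["hello"]
    contacts = db["contacts"]
```

The key point is that nothing beyond plain dict operations is needed to read or write the stored objects.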

The very simplest usage of shove would look something like:

from shove import Shove
data = Shove()

This will give you a simple in-memory shove object that you can use just as you would a plain dict.

To use one of the other storage back ends, you pass a URI when creating the object. Some examples are:

# Store persistent objects in files
file_data = Shove('file://path/to/file')

# Store persistent objects in a BSDDB database file
bsddb_data = Shove('bsddb://path/to/file')

# Store persistent objects in SVN
svn_data = Shove('svn://user:password@')

# Store persistent objects in S3
s3_data = Shove('s3://s3_key:s3_secret@bucket')

You can also store data in any DB that SQLAlchemy supports by sending a SQLAlchemy DB URI to Shove (Note that this does require having SQLAlchemy installed).

# Store persistent objects in SQLite using SQLAlchemy
sqlite_data = Shove('sqlite:///relative/path/to/database.txt')

To compress the data that you store, pass compress=True as an argument:

# Store compressed data in the 'data.db' BSDDB file
bsddb_data = Shove('bsddb://data.db', compress=True)
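To give a feel for what compression buys you, here is a rough sketch of the general idea: pickle a value, then compress the resulting bytes with zlib. This is my own illustration using the standard library, not shove's actual code, and the helper names are hypothetical.

```python
import pickle
import zlib


def dumps_compressed(value):
    """Pickle a value, then compress the resulting bytes."""
    return zlib.compress(pickle.dumps(value))


def loads_compressed(blob):
    """Reverse of dumps_compressed: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(blob))


# Repetitive data compresses well; random data would not.
record = {"notes": "hello " * 1000}
raw = pickle.dumps(record)
packed = dumps_compressed(record)
```

The trade-off is extra CPU time on every read and write in exchange for smaller stored values, which matters most for remote back ends like S3.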

Coming full circle now, I am finding this to be very useful with Mulib. Shove gives me simple persistent RESTful storage with Mulib. Taking a simple example from a previous post, it looks something like this:

from mulib import mu, stacked
from eventlet import api, httpd
from shove import Shove

# Create methods so that stacked knows how to handle Shove objects
stacked.add_consumer(Shove, stacked.consume_dict)
stacked.add_producer(Shove, stacked.produce_dict_html, 'text/html')
stacked.add_producer(Shove, stacked.produce_anything, '*/*')

root = Shove('bsddb://data.db')

# Initialize data if it is not there
root.setdefault('hello', 'Hello World')
root.setdefault('contacts', {})

try:
    httpd.server(
        api.tcp_listener(('', 5000)),
        mu.SiteMap(root))
finally:
    # Make sure to sync the data before we exit
    root.sync()

It also wouldn't be that hard to add a greenlet that syncs the db periodically to help guard against losing data if the server crashes. The one downside is that reading from or writing to the DB blocks the app, since it is not an async operation. I still need to figure out a decent way to handle that, but so far it hasn't been noticeable for the simple stuff that I am experimenting with at the moment.
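As a rough sketch of the periodic sync idea, here is a version that uses a standard library thread rather than a greenlet, so it runs without eventlet. PeriodicSync and CountingStore are hypothetical names of my own; the only thing assumed about the store is that it has a sync() method, as a Shove object does.

```python
import threading


class PeriodicSync:
    """Call store.sync() every `interval` seconds until stopped."""

    def __init__(self, store, interval=30.0):
        self._store = store
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._store.sync()

    def stop(self):
        self._stop.set()
        self._thread.join()
        # One last sync so nothing written since the last tick is lost.
        self._store.sync()


class CountingStore(dict):
    """Stand-in for a Shove object; it only counts sync() calls."""

    sync_calls = 0

    def sync(self):
        self.sync_calls += 1
```

You would call start() before serving requests and stop() on the way out. In an eventlet app the same loop would live in a greenlet that sleeps cooperatively instead of in a thread.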

I have not yet had time to explore the caching capabilities of Shove, but it also gives you several ways to cache the data (in memory, Memcached, etc.). I would also be interested in seeing a back end for Amazon's new SimpleDB service and also for git.

If you are looking for simple object persistence in Python, Shove is a very nice solution.


Esaj said...

Wow, good find!

Anonymous said...

If you are looking for simple object persistence in python why not check out zodb also?

Chuck said...

@anonymous: That is a good point (I should also have noted that zodb is one of the available back ends for Shove). The advantage of shove in this instance is the availability of different back ends behind a single interface. It allows me to prototype quickly and then switch to, say, S3 storage later if I need to. It also makes it easy for someone else using a script that I write to use a different storage back end if they need to.

Asymptote said...

One more good reason to use shove is that it's thread-safe; take a look inside the root "" file of the package and you'll notice a "synchronized" decorator.

Then look around at the .py files in the store subdirectory and you'll notice "@synchronized" sprinkled here and there.

Just keep in mind that this locking mechanism results in single-reader single-writer access. If you're using shove for a simple desktop application, you don't need to care. However, if you want to use it for a web server you may want to explore more powerful databases, such as SQL or BSDDB.
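For anyone who has not seen the pattern, here is a minimal sketch of what a lock-based "synchronized" decorator generally looks like. It is an illustration written from scratch, not a copy of shove's code, and LockedStore is a hypothetical class.

```python
import threading
from functools import wraps


def synchronized(method):
    """Run the wrapped method while holding the instance's lock."""
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class LockedStore:
    """Hypothetical dict-backed store; not shove's real class."""

    def __init__(self):
        self._lock = threading.RLock()
        self._data = {}

    @synchronized
    def increment(self, key):
        self._data[key] = self._data.get(key, 0) + 1

    @synchronized
    def get(self, key):
        return self._data.get(key)


store = LockedStore()


def worker():
    for _ in range(1000):
        store.increment("hits")


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every decorated method takes the same lock, only one thread is inside the store at a time, which is exactly why access ends up serialized.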

Chuck said...

@asymptote: I had totally forgotten about that, but it is a great point as well. Shove works great for the same use cases where you would use shelve (your data fits very nicely in dicts). In my case it works out very well, since mulib.stacked stores your data in a dict.