You are currently browsing the monthly archive for October 2008.

Antonio Cangiano has just made my life a lot easier by releasing a TextMate bundle wrapping the completely awesome Pygments syntax highlighting library. Since I tend to draft posts in TextMate before cleaning them up in WordPress, this will immediately enhance the readability of my posts. I spent a few bucks on the Custom CSS extra and now we can insert some readable code:

Before Pygments:


import wsgiref.handlers
import uuid

from google.appengine.ext import webapp


class MainHandler(webapp.RequestHandler):

  def get(self):
    self.response.out.write(uuid.uuid4())


def main():
  application = webapp.WSGIApplication([('/', MainHandler)],
                                       debug=True)
  wsgiref.handlers.CGIHandler().run(application)


if __name__ == '__main__':
  main()

After Pygments:

import wsgiref.handlers
import uuid

from google.appengine.ext import webapp


class MainHandler(webapp.RequestHandler):

  def get(self):
    self.response.out.write(uuid.uuid4())


def main():
  application = webapp.WSGIApplication([('/', MainHandler)],
                                       debug=True)
  wsgiref.handlers.CGIHandler().run(application)


if __name__ == '__main__':
  main()

Awesome. Now I just have to go back and edit all my posts.

In our last article, we introduced CherryPy’s default dispatcher that maps URL components to a tree of objects and their respective methods. One drawback to this approach is that we must instantiate the hierarchy of objects ourselves, or define a spaghetti of nested classes. In this article we’ll take a look at a more sophisticated method of mapping URLs to page handlers based on the Routes project, which in turn is based on routing concepts from Ruby on Rails.

RoutesDispatcher

In contrast to the default dispatcher, where the object structure defines the available routes, using the RoutesDispatcher entails that we predefine the set of available URLs and assign the components of each to specific page handlers in our application. Let’s take a look at an example of the default dispatcher versus the routes dispatcher. Here is our application:

# main.py
import cherrypy
import wsgiref.handlers

class Root:
  @cherrypy.expose
  def index(self):
    # InternalRedirect is an exception, so it must be raised to take effect
    raise cherrypy.InternalRedirect("hello")
class HelloWorld:
  @cherrypy.expose
  def index(self):
    return "Hello world!"
class GoodbyeWorld:
  @cherrypy.expose
  def index(self,num=None):
    return "Goodbye World!"

def main():
  root = Root()
  root.hello = HelloWorld()
  root.goodbye = GoodbyeWorld()
  app = cherrypy.tree.mount(root,"/")
  wsgiref.handlers.CGIHandler().run(app)

if __name__ == '__main__':
  main()

Using the DefaultDispatcher the following picture depicts the translation of URL components to a page handler, with the default root URL “/” highlighted in blue:

Here is the application emulating the same URL-to-handler matching using the RoutesDispatcher:

# main.py
import cherrypy
import wsgiref.handlers

class HelloWorld:
  @cherrypy.expose
  def index(self):
    return "Hello world!"
class GoodbyeWorld:
  @cherrypy.expose
  def index(self,num=None):
    return "Goodbye World!"

def main():
  hw = HelloWorld()
  gw = GoodbyeWorld()
  mapper = cherrypy.dispatch.RoutesDispatcher()
  mapper.connect('home',"/",controller=hw)
  mapper.connect('hello','/hello',controller=hw)
  mapper.connect('goodbye','/goodbye',controller=gw)
  app = cherrypy.tree.mount(None,config={"/":{"request.dispatch": mapper}})
  wsgiref.handlers.CGIHandler().run(app)

if __name__ == '__main__':
  main()

The pictorial representation of this dispatcher would look like so:

There are several things to notice, including that the Root class is no longer needed. Another is that the HelloWorld.index() page handler is mapped twice, to two different URLs, obviating the need for the redirect used with the default dispatcher. A side effect of multiple routes to a single controller, however, is that a client-side redirect is not issued; CherryPy itself just serves the same content from two different locations. This is actually a faster approach, since the client does not have to re-issue its request to get to the right page, but it may create some “code debt” that has to be maintained across versions of your application.

A bit of improvement is made by use of the RoutesDispatcher, but not so much that it makes it worthwhile to adopt yet. Let’s take a look at some more examples and see if we can convince ourselves that this is indeed a good investment in our development effort. First let’s change our HelloWorld.index() method to take an optional argument for the number of times to print “Hello World!”:

class HelloWorld:
  @cherrypy.expose
  def index(self,num=1):
    msg = ''
    for x in range(0,int(num)):
      msg+= "Hello World!"
    return msg

Next let’s create a route to handle this request:

mapper.connect('hello-2','/hello/:num', controller=hw, num=1)

The colon prefix is used to denote that a URL component is actually a request parameter. Thus :num is provided as a keyword argument to the HelloWorld.index() method. You can define default values for URL components within the route definition, and these will override any default values within the class methods. For instance, if our previous example had been:

# main.py
class HelloWorld:
  @cherrypy.expose
  def index(self,num=1):
    msg = ''
    for x in range(0,int(num)):
      msg+= "Hello World!"
    return msg
  # ...
def main():
  # ...
  mapper.connect('hello-2','/hello/:num', controller=hw, num=None)
  # ...

A request to http://localhost:8080/hello will result in an error, since num will always equal None in this case and int(None) raises a TypeError. Knowing this, I would not define default values in the method signatures, leaving that for the route definitions to supply. Another word of caution: CherryPy’s RoutesDispatcher provides a wrapper for the Routes library and reuses some of the same method names, but there are several important differences between the two.
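
To see why the route's default wins over the method's, here is a minimal sketch in plain Python with no CherryPy involved. The helper call_with_route_defaults is hypothetical: it only imitates the idea that route defaults and URL parameters reach the page handler as keyword arguments, and it is not how the RoutesDispatcher is actually implemented.

```python
def index(num=1):
    """Same body as HelloWorld.index(), minus the class."""
    msg = ''
    for x in range(0, int(num)):
        msg += "Hello World!"
    return msg


def call_with_route_defaults(handler, route_defaults, url_params):
    """Hypothetical helper: merge route defaults with values parsed from the URL."""
    kwargs = dict(route_defaults)
    kwargs.update(url_params)   # values found in the URL beat the route defaults
    return handler(**kwargs)    # the method's own default only applies if the key is absent


# Imitating /hello/3 with num=None as the route default: the URL value is used.
three = call_with_route_defaults(index, {'num': None}, {'num': '3'})

# Imitating /hello with num=None as the route default: int(None) raises TypeError.
try:
    call_with_route_defaults(index, {'num': None}, {})
    failed = False
except TypeError:
    failed = True
```

Because the route always supplies a num keyword, the num=1 in the method signature never gets a chance to apply, which is the reason to keep defaults in one place.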

Specifically, the RoutesDispatcher.connect() method must be given a pattern name, a URL, and a controller to map the request to. This is true even when the URL pattern has the :controller keyword parameter defined. In contrast, the default URL pattern for Routes, which is “/:controller/:action/:id”, would have obviated the need to define a controller or a name for the route. Another important difference is that you must provide an actual object instance for the controller to the connect() method, not just a class name. Perhaps the two are related, but I have not had a chance to look into this more closely.

Even so, the easy specification and handling of request parameters within URL patterns is a big enough reason to choose the RoutesDispatcher over the default, but the decision really becomes a no-brainer when you start planning your application around serving resources in a RESTful manner. We’ll take a look at providing REST style URL patterns for CherryPy applications in our next post, Advanced Routing in CherryPy.

My only confirmed regular reader, Sal, notified me of a potential problem with using CherryPy on GAE which stems from the multi-threaded nature of CherryPy and the single-threaded ways of GAE. It was first noticed in trying to take advantage of the cherrypy.session facilities. In short, you can use cherrypy.session to store session values in the memcached backend, and it seems to work just fine, but I am still trying to determine whether the session information is actually deleted from the store on a cherrypy.session.delete() call, and I have to confirm a few things regarding the session cookie left behind on the browser.

The original comment is here, for the curious.

If you still need sessions and just can’t wait for the bugs to be ironed out, here is how you go about using cherrypy.session. In your main.py:


import sys
from google.appengine.api import memcache
# patch sys memcache module locations to use GAE memcache
sys.modules['memcache'] = memcache

You’ll need these configuration options defined and applied to the tree.mount() in the main():


cf = {"/":{'tools.sessions.on':  True,
           'tools.sessions.storage_type': "memcached",
           'tools.sessions.servers': ['memcached://'],
           'tools.sessions.name': 'hello_gb_session_id',
           'tools.sessions.clean_thread': True,
           'tools.sessions.timeout': 10, # ten minute session timeout, not sure if this works
           }}
app = cherrypy.tree.mount(MyClass(),config=cf)


Then in your methods you can get and set session variables at will like so:


def index(self):
    cherrypy.session['msg'] = "Hello World!"
    return cherrypy.session['msg']

Just a quick post today, covering a few miscellaneous items:

  • Google App Engine just released a hint of a road map for the next two quarters and three very interesting points are discussed:

    1. Big data file uploads and downloads
    2. Bulk uploading and downloading tools for very large databases
    3. Payment services for increased quota needs
  • There is a GAE-specific Webapp framework out there called Google App Engine Oil which seems to be bringing a lot of the goodness from the Ruby on Rails world to GAE
  • My next topic, Using CherryPy’s RoutesDispatcher, is taking a bit of time to write, since there are gobs of examples and the topic is a big one. Sorry for the delay, but I would rather write it right, than write it quick.

Until next time!

In our last post we did a bare bones introduction to CherryPy. Now let’s dig a little deeper into how GAE and CherryPy work together to map URLs to objects and methods, referred to as dispatching.

As discussed, CherryPy’s default behavior is to group together classes to create an Application, where classes and methods form a hierarchical tree of objects. CherryPy does this by virtue of the cherrypy.tree.mount(class, URL) method. Once you have a suitable hierarchy of objects, CherryPy will parse incoming URL requests into path components. Each path component is then used to trace a path from the Application’s root to a page handler method of one of the classes. If CherryPy gets to a leaf node (a method) before all of the path components are used up, the remaining components are turned into positional arguments for that method. If the URL path ends before a page handling method, CherryPy will call the index() method of the class. While this sounds a bit complex, in practice it is fairly straightforward. Let’s set up an example.

HelloWorld, Part Deux

In this example we’ll be creating two inter-related classes, a greeting of Hello World! and a reply class, Good Day!:


# main.py
import cherrypy
import wsgiref.handlers

class HelloWorld:
  @cherrypy.expose
  def index(self):
    return "Hello world!"
class GoodDay:
  @cherrypy.expose
  def index(self,num=None):
    reply = "Good Day!"
    if num is not None:
      for x in range(1,int(num)):
        reply += "<br />Good Day!"
    return (reply)

def main():
  root = HelloWorld()
  root.reply = GoodDay()
  app = cherrypy.tree.mount(root,"/")
  wsgiref.handlers.CGIHandler().run(app)

if __name__ == '__main__':
  main()

If you were to make a graph of these objects it would look something like this:

And, assuming you are running GAE locally on port 8080, the following table represents some URL to page handler mappings:

URL                                    Page handler
http://localhost:8080/                 HelloWorld().index()
http://localhost:8080/index            HelloWorld().index()
http://localhost:8080/reply            GoodDay().index()
http://localhost:8080/reply/4          ERROR PAGE
http://localhost:8080/reply/index/4    GoodDay().index(4)
http://localhost:8080/reply/?num=4     GoodDay().index(num=4)

As expected, the root URL, “/”, maps to our first object HelloWorld(). Since no other path components are given, the default index() method is called. Note the second example where we make an explicit call to index().

For the third example, we continue to traverse the object tree, following the reply path to GoodDay() and its default index() method. The fourth example shows that, by default, the mapping of URL components is quite literal: “4” is expected to map to GoodDay().4(), which is not defined, so the request results in an error. The fifth and sixth examples show the proper way to pass arguments to the GoodDay().index() method, using the positional and keyword styles, respectively.
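
If you want to poke at this traversal without a server, here is a toy resolver that reproduces the table above. It is a simplified imitation written for this post, not CherryPy's actual dispatcher code: the expose() stand-in and the resolve() function are my own names, and details like default() methods and keyword arguments are left out.

```python
def expose(func):
    """Stand-in for cherrypy.expose: just flag the function."""
    func.exposed = True
    return func


class HelloWorld(object):
    @expose
    def index(self):
        return "Hello world!"


class GoodDay(object):
    @expose
    def index(self, num=None):
        return "Good Day!" * int(num or 1)


def resolve(root, path):
    """Return (handler, positional_args), or (None, []) when nothing matches."""
    # An implicit 'index' is tried after the last path component.
    names = [p for p in path.strip("/").split("/") if p] + ["index"]
    node = root
    trail = [node]            # trail[i] is the object reached after names[:i]
    for name in names:
        node = getattr(node, name, None) if node is not None else None
        trail.append(node)
    # Walk back from the deepest node to find the first exposed callable.
    for i in range(len(trail) - 1, -1, -1):
        candidate = trail[i]
        if callable(candidate) and getattr(candidate, "exposed", False):
            # Unused components (minus the implicit 'index') become arguments.
            return candidate, names[i:-1]
    return None, []


root = HelloWorld()
root.reply = GoodDay()

handler, args = resolve(root, "/reply/index/4")   # GoodDay().index, ['4']
missing = resolve(root, "/reply/4")               # (None, []): no attribute named '4'
```

Note that the argument arrives as the string '4', which is why the page handlers in this post call int() on it.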

Notice that we have yet to talk about GAE. In this example, we have pretty much left all routing to CherryPy. Other than copying the CherryPy modules into the directory and editing main.py, nothing was touched in the default GAE application. If you would like to serve requests out of a non-root URL, then you would need to sync the mount points of app.yaml and cherrypy.tree.mount() like so:


  # in app.yaml
  handlers:
  - url: /hello/.*
    script: main.py

  # in main.py
  app = cherrypy.tree.mount(root,"/hello/")

  # now accessible as http://localhost:8080/hello/

Also note that CherryPy’s default dispatcher does not differentiate between GET and POST requests. The MethodDispatcher class adds this capability on top of the default dispatcher, but if method parsing is of interest, then I suspect that you have more requirements on URL mapping than either of these dispatchers can grant. In our next article, we’ll cover the RoutesDispatcher, which is a very robust method of mapping URLs to page handlers that takes a cue from the Ruby on Rails world.

Just a quick note on the one-day cloud-computing focused conference I just attended, Computing Among the Clouds. Of particular interest to this blog was a presentation by Joe Gregorio on GAE.

The talk was basically covering the tutorial to GAE, so I personally didn’t get anything out of it, but I serendipitously sat next to him during the previous talk and had a chance to talk to him during the intermission about a hot-button topic: bulk data loading.

A little context: I recently went looking for a way to set a key_name value using the packaged bulk-loading tools. By default, the bulk load tools turn CSV rows into Datastore entities using a sequential numerical ID and not a key_name, even when there is a “key_name” column in the type descriptor. I wanted to create entities with varchar keys, without having to create a new entity in a custom handler, in an effort to minimize CPU usage during uploads, which is a known problem. In the package google.appengine.ext.bulkload there is a pair of methods that set a key_name if defined in the uploaded data, but these are tied to a cryptic mention of a “version-1” format.

I asked Joe whether he knew what these methods were about, if Google was working on better tools for data upload / download / sync, or at least if he knew what “version 1” format data was or might possibly refer to, and he pleaded ignorance on all counts. Reflecting on this conversation, I think I have to call bullshit here, at risk of going against my “no-negative vibes” mantra. I think he knew exactly what I was talking about and for whatever reason was not at liberty to disclose details. Which would have been a fine answer by me, frankly.

Why am I calling Joe out about this? Well, mainly because I just found the methods the night prior to the conference as I was researching a project and he was the GAE representative at the conference. Sorry, but them’s the breaks.

Why do I think he maybe could have given me a more reasonable answer than pleading complete ignorance? Two reasons: (1) Protocol Buffers and (2) the released protocol buffer version 2 code for the memcached API on the groups list. Version 1 I think refers to protocol buffers version one, which has just been upgraded to version 2 and GAE has already announced that V2 specs are going through QA. My thinking is that this is either someone’s 20%, or that protocol buffer client/servers are used internally at Google to load data (or both) and somehow these methods have ended up in the HEAD branch by mistake. There is certainly no released client that talks to these server methods, and no documentation elsewhere in the code base, official API reference, or articles that hint at how the PB loads would work or what is required to make them work.

Now this is all perfectly understandable since PB V2 is coming soon to all parts of GAE, and it would be confusing to say the least to release some uber-complicated stream protocol that is soon to be replaced. But don’t plead complete ignorance, that’s just insulting.

Let’s Talk About Cherrys and Pie

Or better yet, let’s discuss CherryPy, the excellent Web application stack that we have chosen to act as the controller in our GAE applications. Remember that controller represents the “C” in the MVC design pattern. As the name implies, a controller guides the requests of users to the data of interest and back again. Essentially it provides a flow to the stateless protocol that is the Web.

Since the focus of CherryPy is on HTTP request handling, you can easily swap out any templating or data access layer that is best suited to a project. Because of this, the current version of CherryPy (3.1) works out of the box within GAE’s framework, specifically it works well with the Datastore API. We’ll cover the Mako template library that will serve as our View in a later post.

Read the Zen of CherryPy if you have a spare moment, as some of those nuggets will lodge themselves into your subconscious for later use. For now, you should note that writing CherryPy applications is very close to regular Python object oriented programming. As an example, let’s run through the requisite “Hello World” example using various levels of complexity. Up first is a regular Python session invoked from a shell and the resulting output:


  angel$ python -c 'print "Hello World" ' 
  Hello World 

Taking this up a notch, let’s encapsulate that basic printing logic within a class:


# hello_world.py
class HelloWorld:
  def speak(self):
    return "Hello World"

def main():
  print HelloWorld().speak()

if __name__ == '__main__':
  main()

Here’s the result of running this script in the shell:


angel$ python hello_world.py 
Hello World

Now let’s take a look at the CherryPy version, from the CherryPy Tutorial:


# cherry_hello.py
import cherrypy

class HelloWorld:
    @cherrypy.expose
    def index(self):
        return "Hello world!"
cherrypy.quickstart(HelloWorld())

And invoking this script from the shell gives us:


angel$ python cherry_hello.py
[16/Oct/2008:21:56:43] ENGINE Listening for SIGHUP.
[16/Oct/2008:21:56:43] ENGINE Listening for SIGTERM.
[16/Oct/2008:21:56:43] ENGINE Listening for SIGUSR1.
[16/Oct/2008:21:56:43] ENGINE Bus STARTING
CherryPy Checker:
The Application mounted at '' has an empty config.

[16/Oct/2008:21:56:43] ENGINE Started monitor thread '_TimeoutMonitor'.
[16/Oct/2008:21:56:43] ENGINE Started monitor thread 'Autoreloader'.
[16/Oct/2008:21:56:43] ENGINE Serving on 127.0.0.1:8080
[16/Oct/2008:21:56:43] ENGINE Bus STARTED

Whoa Blossom! Where’s my hello world? What’s going on here? Well, the last command cherrypy.quickstart(HelloWorld()) has taken our code and mapped the class and method definitions to URLs that are Web accessible using CherryPy’s default configurations. If you run this locally and point your browser to http://127.0.0.1:8080/ you will see our not-so-secret message. The one thing I should point out is that, much like how index.html is the default page of a web site, the index() method of a class is considered the default method to invoke if another method is not mapped to a URL.

Sidebar: that “@cherrypy.expose” above the method definition is called a decorator, and it affects the behavior of the method that immediately follows it. The alternative to using the expose decorator is to change the method directly through its attributes, but this can only be done after the method has been defined:


def index(self):
  # ... some code
index.exposed = True
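
For the curious, the two spellings in this sidebar end up in the same place. Here is a small sketch using my own stand-in for the decorator; the real cherrypy.expose does more than this (it also accepts aliases, for example), so treat expose() below as a simplified assumption rather than the library's code.

```python
def expose(func):
    """Simplified stand-in for cherrypy.expose: flag the function, return it unchanged."""
    func.exposed = True
    return func


class WithDecorator(object):
    @expose
    def index(self):
        return "Hello world!"


class WithAttribute(object):
    def index(self):
        return "Hello world!"
    # Must come after the def, since the function has to exist first.
    index.exposed = True


# Both methods carry the same flag and behave identically when called.
same_flag = WithDecorator().index.exposed and WithAttribute().index.exposed
```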

Pretty simple eh? Sure looks like any other Python program to me, and that’s a good thing. Here is that same example, but modified to run on GAE:


# cherry_hello_gae.py
import cherrypy
import wsgiref.handlers

class HelloWorld:
    @cherrypy.expose
    def index(self):
        return "Hello world!"

def main():
    app = cherrypy.tree.mount(HelloWorld(),"/")
    wsgiref.handlers.CGIHandler().run(app)

if __name__ == '__main__':
  main()

You’ll also need to set up your GAE app.yaml configuration to point to this file:


# app.yaml
application: cherry_hello
version: 1
runtime: python
api_version: 1

handlers:
- url: .*
  script: cherry_hello_gae.py

Still pretty simple. The rest of the CherryPy tutorial is worth a read, but you should note that there will be differences between the way it presents basic CherryPy applications and how they will work within GAE. Specifically, the section on the configuration file does not really apply to GAE applications unless you make a concerted effort to bring the configuration options into your application script. The file will not be loaded automatically, as it is for regular CherryPy applications outside of GAE.

Hope you enjoyed this quick introduction to CherryPy. Look forward to more articles delving deeper into GAE and CherryPy.

Here’s a quick peek into the posts I am cooking up for the MainStory.

  • A few articles on CherryPy, including:
    • Introducing CherryPy
    • How it is affected by GAE’s environment
    • Using CherryPy’s RoutesDispatcher to route requests
    • CherryPy tools you should get familiar with
    • When to choose a GAE mechanism over the equivalent function in CherryPy
  • A few articles on Mako, including:
    • Intro to Mako syntax and usage
    • Nested layouts
    • Caching page fragments

All these articles will take some time to write, though, since I have a day job and two young boys. I am about halfway done with the CherryPy intro, and expect to have it finished by the end of the weekend. But I’m curious as to whether folks would like to see several articles that get released faster, but may leave things for later posts, or would y’all like to see monolithic and more complete pieces?

I often see questions put to the GAE group list related to data management. Whether the particular question deals with bulk loading of data, backing up data from the production Datastore, or offline processing of data, it seems this is an area crying for a strong and generally applicable open source tool, yet there is none to my knowledge.

The bundled CSV bulk-uploading tools leave much to be desired. The lack of reasonable feedback from the tool, its reliance on a cookie that times out as part of the URL parameter set, and the way it will just skip to the next set of inserts in your CSV file, even when an insert fails, make bulkload_client.py unsuitable for more than a few hundred rows. This situation will likely change in the future, and in fact if you take a look in the source, you’ll note some mention of “Version 1” formatted data, which I think are probably zipped and/or pickled entities that may offer us some more robust data loading. In the meantime, we’ll have to assume that a white knight will not come to our rescue, and we’ll have to get our hands dirty, hopefully learning something along the way.

My idea for a CRUD style admin console would actually consist of two parts: a reasonably generic admin console that sits on the server, and a local client to act as the data broker. This way you get a standardized API for accessing your data models. The thought is to create a re-distributable base application for data administration. GAE developers would only need to supply the data models and some authentication scheme. For authentication, a simple API key (unique to each application) would do, but basic authentication, or even using Google AuthSub tokens, should not be complicated to implement.

Once configured, the admin console would then be loaded as a separate release from the regular application, say with a version value of “adminconsole”, and accessed solely through the release URL.

The local client can build up a database of objects locally and pre-compute the keys needed for relationships, thus avoiding the CPU overloads I’ve noticed when inserting entities that are related to other pre-existing rows. Finally, the local client can ensure that objects were indeed inserted by checking the status messages given by the admin console. If not, the client should retry the transaction with increasing lags between requests, as the most probable causes of insertion failures are quota related.
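
To make the retry idea concrete, here is a rough sketch of what the client-side loop could look like. Nothing here exists yet: insert_with_backoff, the send callable (assumed to return True when the admin console confirms the insert) and the doubling delay are all assumptions on my part, not a finished design.

```python
import time


def insert_with_backoff(send, batch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(batch) until it reports success, doubling the wait after each failure.

    Returns the number of attempts used; raises RuntimeError if every attempt fails.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if send(batch):
            return attempt
        if attempt < max_attempts:
            sleep(delay)      # give the quota a chance to recover
            delay *= 2        # 1s, 2s, 4s, ...
    raise RuntimeError("batch still failing after %d attempts" % max_attempts)


# Usage with a fake sender that fails twice before succeeding.
outcomes = [False, False, True]
waits = []
attempts = insert_with_backoff(lambda batch: outcomes.pop(0),
                               ["row1", "row2"],
                               sleep=waits.append)   # record waits instead of sleeping
```

Passing the sleep function in makes the loop easy to test without actually waiting.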

I’ll be working on such a system in the very near future. If you are interested in a joint development, please leave a comment.

n00b note: This post is related to several other posts regarding the 1000 file per application quota and may be a little more advanced than appropriate for this point in the MainStory. If it goes over your head, don’t worry too much about it, since I’ll refer back to it when the n00b-to-ninja journey reaches this point.

Right, so packaging modules as zip files and altering the system path has gone a long way toward making the 1000 file limit a moot issue. At least that’s the case for your application code, but what about the content? What if you are building the next Beanie Babies® fan site focused on spotting and cataloging knock-offs? You would need a catalog of images containing every beanie baby ever, as well as pictures of the knock-offs and comparative close-ups to help your users identify fakes. That’s a lot of Beanies.

There are three options available to solve this issue:

  1. Store the images in the data store and stream them from there.
  2. Zip archive a set of images and use the ZipHandler class to serve the images.
  3. Use an external service to host the files.

Option (3) assumes that you are willing and able to pay for whatever infrastructure or service costs it will take to host your content. It also means that you’ll need to coordinate the deployments of content and application code, and that you will have to “code in defense” to account for service outages of the content delivery mechanism. This level of complexity deserves its own post, so we’ll take it as a given that external hosting of files is not an option right now.

That leaves options (1) and (2). Both have trade-offs in terms of complexity, performance and implementation details. Let’s take a look at each in turn.

Preamble

For our example I’ll pass on SpotFakeBeanies.com, since I am no expert in the subject matter and also want to avoid any litigiously vulnerable content. Instead we’ll create a site dedicated to pictures of my thumb, about 10 should be more than adequate. Throughout this post, you should be keeping in mind that GAE has a hard limit of 1MB for file size and Datastore record size. We’ll look at the implications of that limit during the final discussion.

Using the Datastore

The official Google App Engine site contains an article on how to serve dynamic images that are stored within the Datastore, but it lacks a few details for the beginner. It glosses over how one uploads files that are not already on the web, and also assumes that the content type is the same for all your files. Both of these assumptions usually do not hold. The article does provide a few hints, though, like how to create a model for storing files, and how to serve the files from the Datastore once they are in place. Let’s fill in a few blanks, shall we?

Using the Datastore requires that we create a model with a property that can store what is known as a BLOB (“binary large object”). We should also store a few more attributes about the picture along with the data. Here is our model class:

# AllThumbs models
from google.appengine.ext import db
class MyThumbs(db.Model):
  name = db.StringProperty()
  content_type = db.StringProperty(default=None)
  content = db.BlobProperty(default=None)

Next we’ll need a way to get our images into the data store. To test out the functionality a simple HTML form will do for now. We’ll put on our fancy-pants in a later post for selecting more than one file to upload:


<form action="/create" enctype="multipart/form-data" method="post" >
File: <input type="file" name="thumbpic" />
<input type="submit" value="upload picture" />
</form>

Finally, let’s review the controller code. We’ll need an index() method to query for our current pictures and another method to handle the file upload, create(). The create() method will simply redirect to the index() method once the image is inserted into the Datastore:


class Root:
  tl = TemplateLookup(directories=['views'])
  @cherrypy.expose
  def index(self, page=1):
    """index action"""
    pcount = MyThumbs.all().count() 
    pics = []
    if pcount > 0:
      offset = 3 * (int(page) - 1)  # three pictures per page
      pics = MyThumbs.all().fetch(3, offset=offset)
    return self.tl.get_template("index.html").render(pics=pics,
                                                     page=page,
                                                     pcount=pcount )

  @cherrypy.expose
  def create(self,thumbpic=None):
    """upload action"""
    if thumbpic is not None:
      mt = MyThumbs(name=thumbpic.filename, 
                    content_type=thumbpic.type, 
                    content=thumbpic.file.read())
      mt.put()
    raise cherrypy.HTTPRedirect("/")
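
As a sanity check on the paging arithmetic, assuming three pictures per page as in index(), here is the calculation pulled out into two small helpers. The names page_offset and page_count are mine and do not appear in the example application.

```python
def page_offset(page, per_page=3):
    """Index of the first picture on a 1-based page; bad page numbers clamp to page 1."""
    page = max(int(page), 1)      # page arrives from the URL as a string
    return per_page * (page - 1)


def page_count(total, per_page=3):
    """Number of pages needed to show `total` pictures (ceiling division)."""
    return (total + per_page - 1) // per_page


first = page_offset(1)      # start of page 1
third = page_offset("3")    # start of page 3, given as a URL string
pages = page_count(10)      # 10 pictures at 3 per page
```

The offset depends only on the page number and page size, not on how many pictures are stored; the total only matters for deciding how many page links to show.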

One last method is needed to serve the dynamically streamed images into your web pages. This is called from HTML img tags as the value for the src attribute:


@cherrypy.expose
def tpic(self,i=None):
  if i:
    mt = MyThumbs.get(i)
    cherrypy.response.headers["Content-Type"] = mt.content_type
    return mt.content
  else:
    return None

It is used like so in our view:


% for p in pics:
<a href="/tpic?i=${p.key()}">
<img src="/tpic?i=${p.key()}" alt="${p.name}" width="50"/>
</a>
%endfor

Using ZipHandler

The alternative to storage of content in the Datastore is to upload it as a static asset of your application. In other words, put all your images into a zip file called images.zip. A simple directive in your app.yaml file will serve you well here:

- url: /images/.*
  script: $PYTHON_LIB/google/appengine/ext/zipserve

This directive will map a URL like “/images/my_thumb001.png” to a file “my_thumb001.png” in the root of the zip file. If you collected the images within a directory (e.g. “thumbpics”), you’ll need to add this to the URL path (e.g. “/images/thumbpics/my_thumb001.png”).

Since this method assumes that you know what you are uploading, there is no need to store the locations and names into the Datastore. You can just create the URLs in the Mako template if you like:


% for x in range(1,10):
<img src="/images/thumbpics/my_thumb${"%0.3d" % x }.png"/>
% endfor

Another option would have been to create entries in the Datastore that held the image URL location as a StringProperty.

Summary

We’ve taken a look at how to implement uploading and serving of static content in GAE applications in order to circumvent any issues related to the 1000 file per application limit. At face value, the ZipHandler method seems less complex, but when you start to consider the 1MB file size limit, you’ll soon realize that you’ll need to split the image set into separate zip files. In turn, that means that URL path creation will depend upon which image you are referring to, unless you create your own CustomZipHandler class to search all zip files for any given file name.
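
Here is one way the splitting could be planned before you build the zip files. This is only a sketch under a few assumptions: plan_archives is a made-up helper, it uses uncompressed file sizes as a rough estimate of what each archive will weigh (already-compressed images shrink very little when zipped), and the limit is passed in rather than being an official number.

```python
def plan_archives(sizes, limit=1000 * 1000):
    """Greedily group (name, size_in_bytes) pairs so each group stays under limit."""
    archives, current, used = [], [], 0
    for name, size in sizes:
        if size > limit:
            raise ValueError("%s alone exceeds the limit" % name)
        if current and used + size > limit:
            archives.append(current)     # close the full archive, start a new one
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        archives.append(current)
    return archives


def archive_index(archives):
    """Map each file name to the number of the archive that holds it."""
    return dict((name, i) for i, group in enumerate(archives) for name in group)


plan = plan_archives([("a.png", 600), ("b.png", 500), ("c.png", 400)], limit=1000)
where = archive_index(plan)   # lets a template look up which zip a picture lives in
```

The lookup table is what makes the URL creation manageable, since each image path then depends on a dictionary lookup instead of on you remembering which zip holds what.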

On the other side of the fence, using the Datastore adds a small bit of complexity to your code to handle uploads and serving of content. You’ll also need to write your own data uploading routines, as there are no pre-packaged Datastore upload scripts for GAE that handle BLOBs.

As for speed, I don’t think it is an issue, since the ZipHandler automatically uses client-side caching, as well as in-memory caching, and the Datastore method can take advantage of the memcache API, as well as set client-side caching headers as needed.

So really your choice should come down to one thing: will this content change during the lifetime of the application? If not, then you will most likely find it easier to use the ZipHandler method. If so, then I highly recommend using the Datastore method.

Also note that these are not mutually exclusive. For instance, you can choose to bundle logos, icons and other ubiquitously used images (and other static files) within zip archives, and any potentially dynamic content within the Datastore.

Not much of an answer, I know, but nothing is ever as simple or easy as one hopes. You can find an example application that utilizes both approaches outlined above in the footnotes. Also a hearty shout-out to AirTight Interactive’s Simpleviewer for providing a good working example of an image sharing site to the GAE community. Their example actually uses a bit of error checking for uploads and downloads; mine, as solely a demonstration project, does not.


Footnotes
