Bottle : Authentication

29 Oct

Continuing on with our middleware series, we now cover authentication. There are a ton of authentication and authorization WSGI middleware packages, as well as a basic authentication example in the WSGI documentation. Some are out of date, and a lot of others are tightly integrated with other parts of a particular framework’s request handling. It would have been easy enough to roll my own basic authentication, but I really hate reinventing a wheel I don’t have to.

I decided to investigate AuthKit, part of Pylons, to service my authentication needs, and struggled through a lack of documentation and a fairly large code base, all for your pleasure.

Authentication with AuthKit

AuthKit assumes that much of your middleware setup follows Pylons conventions. It was a struggle for me to make heads or tails of the examples, not being familiar with Pylons application configuration and how requests are routed. The secret sauce to making AuthKit work with Bottle is realizing that there are multiple levels of AuthKit middleware you have to invoke to get the authorization chain to even start up. Here is how you go about it in Bottle:

from authkit import authenticate, authorize 
from authkit.permissions import RemoteUser

from bottle import *

# bottle exposed function
@default() # maps to root URL
def hello():
    return "hello"

# get the default bottle application
app = default_app()

# set up an authorization permission for 
# basic authentication of a remote user
app = authorize.middleware(app, RemoteUser())

# A simple authentication function
def basic_auth(environ, username, password):
    return username ==  password

# now activate the authentication
auth_config = {
    'authkit.setup.method':'basic',
    'authkit.basic.realm':'Test Realm',
    'authkit.basic.authenticate.function':basic_auth,
    'authkit.setup.enable':'True'
}
app = authenticate.middleware(app,app_conf=auth_config)

# run the application
run(app=app)

To make this work on App Engine, you need to include the AuthKit sources and account for deploying Bottle applications on GAE, which is covered in the Bottle docs and other posts.
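For the curious, the `basic` setup method boils down to a simple exchange: the browser sends an `Authorization: Basic <credentials>` header carrying a base64-encoded `username:password`, and the server answers with a 401 and a `WWW-Authenticate` challenge when the credentials are missing or wrong. Here is a minimal stdlib-only sketch of that exchange as WSGI middleware (Python 3). It is not AuthKit's implementation; `BasicAuth` and its `check` parameter are hypothetical names, and `check` takes the same `(environ, username, password)` arguments as the `basic_auth()` function above.

```python
import base64


class BasicAuth(object):
    """Minimal HTTP Basic authentication middleware (hypothetical sketch).

    `check` is any callable taking (environ, username, password) and
    returning True when the credentials are acceptable.
    """

    def __init__(self, app, check, realm="Test Realm"):
        self.app = app
        self.check = check
        self.realm = realm

    def _credentials(self, environ):
        # WSGI servers expose the "Authorization" request header
        # as environ["HTTP_AUTHORIZATION"].
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, encoded = header.partition(" ")
        if scheme.lower() != "basic" or not encoded:
            return None
        try:
            decoded = base64.b64decode(encoded).decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            # binascii.Error (bad padding) is a subclass of ValueError.
            return None
        username, sep, password = decoded.partition(":")
        if not sep:
            return None
        return username, password

    def __call__(self, environ, start_response):
        creds = self._credentials(environ)
        if creds and self.check(environ, *creds):
            # Record the authenticated user the way CGI/WSGI servers do.
            environ["REMOTE_USER"] = creds[0]
            return self.app(environ, start_response)
        # Challenge the browser so it prompts for a username and password.
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", 'Basic realm="%s"' % self.realm),
        ])
        return [b"Authentication required"]
```

Wrapping works the same way as with AuthKit, e.g. `app = BasicAuth(default_app(), basic_auth)`. Keep in mind that Basic credentials are only base64-encoded, not encrypted, so they should travel over HTTPS.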

Bottle : Request Method Overriding

29 Oct

Bottle is a great little Web application framework, but in its quest for simplicity, it left out a couple of key components that are needed for cu3w0rx: HTTP method overriding and basic authentication. Luckily, Python’s WSGI middleware can fulfill this role.

WSGI Middleware

WSGI middleware is a handy way to add functionality to an application by adding layers to the request/response chain between the client request and your application. Incidentally, Ruby’s Rack project took inspiration from WSGI middleware.
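To make the layering concrete, here is a minimal sketch of a middleware that touches both sides of the chain: it could inspect `environ` on the way in, and it wraps `start_response` to add a header on the way out. `AddHeader`, `simple_app` and the `X-Powered-By` value are illustrative names, not part of any library.

```python
class AddHeader(object):
    """Hypothetical middleware that appends one header to every response."""

    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        # Request side: environ could be inspected or modified here
        # before the wrapped application ever sees it.
        def custom_start_response(status, headers, exc_info=None):
            # Response side: add our header, then defer to the real server.
            return start_response(status, list(headers) + [self.header], exc_info)
        return self.app(environ, custom_start_response)


def simple_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Layers compose by wrapping: the outermost layer sees the request first.
app = AddHeader(simple_app, "X-Powered-By", "WSGI middleware")
```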

Method Overriding

A while back I bemoaned the fact that CherryPy’s RoutesDispatcher could not handle PUT and DELETE requests from forms submitted by your typical web browser, which usually only sends GET and POST requests. I submitted a patch to the CherryPy project, but I now believe that this is the wrong approach, and that middleware can handle altering the REQUEST_METHOD value in the WSGI environ in response to a submitted form with a hidden “_method” parameter, as is the accepted convention. Here is some middleware for the server-side application, which will reside on GAE:

# method_override.py
# WSGI middleware to set the REQUEST_METHOD environ value from a submitted form
# that contains a "_method" hidden variable.
from google.appengine.ext import webapp

class MethodOverride(object):
  def __init__(self, app):
    self.app = app

  def __call__(self, environ, start_response):
    # webapp.Request.get() looks the argument up in the query string or POST body
    method = webapp.Request(environ).get('_method')
    if method:
      environ['REQUEST_METHOD'] = method.upper()
    return self.app(environ, start_response)

And here is how you would insert it in front of your Bottle application:

from bottle import *
from google.appengine.ext.webapp import util
from method_override import MethodOverride

@route("/test_put",method="PUT")
def testput():
  return "PUT success"

@route("/test_delete",method="DELETE")
def testdelete():
  return "DELETE success"

# run in GAE
def main():
  app = default_app()
  # insert the method override middleware
  app = MethodOverride(app)

  util.run_wsgi_app(app)

if __name__ == '__main__':
  main()

Now forms that carry the desired HTTP method in a hidden “_method” parameter will be routed to the correct function.
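The `MethodOverride` class above relies on App Engine’s `webapp.Request` to parse the form. If you want something with no GAE dependency, here is a hedged stdlib-only sketch (Python 3; `StdlibMethodOverride` is a hypothetical name). It differs from the original in two deliberate ways: it only honors `_method` on form-encoded POST requests, which is the usual convention, and it only accepts PUT and DELETE. Because it reads the request body, it puts a fresh copy back into `wsgi.input` so the wrapped application can still read the form.

```python
import io
from urllib.parse import parse_qs


class StdlibMethodOverride(object):
    """Hypothetical variant of MethodOverride with no App Engine dependency."""

    ALLOWED = {"PUT", "DELETE"}

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        content_type = environ.get("CONTENT_TYPE", "")
        if (environ.get("REQUEST_METHOD") == "POST"
                and content_type.startswith("application/x-www-form-urlencoded")):
            try:
                length = int(environ.get("CONTENT_LENGTH") or 0)
            except ValueError:
                length = 0
            body = environ["wsgi.input"].read(length)
            # Restore the body so the application sees an unread stream.
            environ["wsgi.input"] = io.BytesIO(body)
            fields = parse_qs(body.decode("latin-1"))
            method = fields.get("_method", [""])[0].upper()
            if method in self.ALLOWED:
                environ["REQUEST_METHOD"] = method
        return self.app(environ, start_response)
```

It slots in exactly where `MethodOverride` does, e.g. `app = StdlibMethodOverride(default_app())`.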

cu3w0rx : Bottle

28 Oct

One of the great aspects of CloudCrowd is that the code base is so small and readable. A big part of that comes by virtue of using the Sinatra Web application framework for the master and slave daemon processes. Sinatra is much closer to the GAE webapp framework than CherryPy (by default), in that you define methods that correspond to the HTTP verbs (GET, PUT, POST, DELETE and HEAD). Sinatra differs in that the method’s first argument is the route that the method services. The GAE webapp framework instead forces you to define a class that defines the HTTP verbs, and later map that class to a route. Let’s take a look at “Hello World!” to illustrate the difference. Here is the GAE webapp version:

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class HelloWorld(webapp.RequestHandler):
    def get(self):
        self.response.out.write("Hello World!")

app = webapp.WSGIApplication([("/", HelloWorld)])

def main():
    run_wsgi_app(app)

if __name__ == "__main__":
    main()

Now here is the version in Sinatra:

require "rubygems" # RubyGems is optional, depending on your setup
require "sinatra"
get '/' do
  "Hello World!"
end

Much more readable and concise. Here is a version of Hello World in CherryPy, using the default MethodDispatcher, for comparison:

import cherrypy

class HelloWorld:
    exposed = True
    def GET(self):
        return "Hello World!"

app = HelloWorld()

d = cherrypy.dispatch.MethodDispatcher()
conf = {'/': {'request.dispatch': d}}
cherrypy.tree.mount(app, "/", conf)

Better, but you are still separating the mapping of the root URL “/” to the HelloWorld class from the class itself. I started digging around the interwebz for Python frameworks that work more like the lightweight Sinatra, and found Bottle. Bottle is small, self-contained and uses Python decorators to map a function to a route. Brilliant. Here is the example using Bottle:

from bottle import route, run

@route("/") # assumes GET method
def hello():
	return "Hello world!"

run() # This starts the HTTP server

Now that’s what I’m talking about! Deploying a Bottle application to GAE is covered in their documentation. We’ll be using Bottle for cu3w0rx to create the master and slave daemons in later posts.
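Part of what makes the decorator style attractive is how little machinery it needs. The toy sketch below illustrates the idea only; it is not how Bottle is actually implemented, and `route`, `ROUTES` and `toy_app` are hypothetical names. The decorator records each function in a table keyed by HTTP method and path, and a tiny WSGI application consults that table on every request.

```python
# A toy route table: (HTTP method, path) -> handler function.
ROUTES = {}


def route(path, method="GET"):
    """Hypothetical decorator that registers a handler for a path and method."""
    def decorator(func):
        ROUTES[(method.upper(), path)] = func
        return func  # the function itself is returned unchanged
    return decorator


def toy_app(environ, start_response):
    """Minimal WSGI app that dispatches using the table built by @route."""
    key = (environ.get("REQUEST_METHOD", "GET"), environ.get("PATH_INFO", "/"))
    handler = ROUTES.get(key)
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [handler().encode("utf-8")]


@route("/")  # defaults to GET, like the Bottle example
def hello():
    return "Hello world!"
```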

Designing cu3w0rx

27 Oct

cu3w0rx. Lovely name, eh? Moving on …

In this series we will be looking to implement a simple Map-Reduce framework that closely models the design and implementation of CloudCrowd, which is written in Ruby. CloudCrowd has some nice design choices: specifically, its small size (~1,800 LOC), its use of JSON for message transport, and its emphasis on HTTP as the protocol. It is also pretty to look at, and the interface is entirely AJAX-driven, so it relies on the same service calls as the rest of the suite. I think it is worth our while to set the stage for the project. Here is a list of things we will be taking directly from CloudCrowd:

  1. The App Engine application will serve as the central master resource.
  2. It will serve as the master work queue. All jobs will be submitted to it for processing.
  3. All communication will be via JSON messages. As in CloudCrowd, the web site will make use of the JSON returned from AJAX requests to the resource handlers.
  4. There will be a clear specification for the top-level properties of a Job message, but handlers will be responsible for vetting the provided options to handle the request.
  5. Operations on inputs are assumed to happen on local disk. E.g. jobs will be staged onto the worker nodes’ scratch space.
  6. Jobs will have the option of a callback URL on success.
  7. For simplicity’s sake, authentication to the master, and between master and slaves, will be the same basic HTTP authentication credentials.
  8. A worker node will accept work item requests based on the machine’s load.
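To make points 3, 4 and 6 concrete, here is a hedged sketch of how the master might check an incoming Job message. The key names (`action`, `inputs`, `options`, `callback_url`) and the `parse_job` function are assumptions for illustration only, not a settled cu3w0rx specification and not CloudCrowd’s format. Only the top-level shape is checked; the `options` dict is passed through for the handler to vet.

```python
import json

# Hypothetical top-level properties for a Job message (placeholders only).
REQUIRED_KEYS = {"action", "inputs"}
OPTIONAL_KEYS = {"options", "callback_url"}


def parse_job(raw):
    """Decode a JSON Job message and check only its top-level shape."""
    job = json.loads(raw)
    if not isinstance(job, dict):
        raise ValueError("job message must be a JSON object")
    missing = REQUIRED_KEYS - set(job)
    if missing:
        raise ValueError("missing keys: %s" % ", ".join(sorted(missing)))
    unknown = set(job) - REQUIRED_KEYS - OPTIONAL_KEYS
    if unknown:
        raise ValueError("unknown keys: %s" % ", ".join(sorted(unknown)))
    if not isinstance(job["inputs"], list):
        raise ValueError("inputs must be a list of input locations")
    # Handler-specific options are left for the handler to vet (point 4).
    job.setdefault("options", {})
    # Optional URL to notify on success (point 6).
    job.setdefault("callback_url", None)
    return job
```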

There will be several points where we stray from CloudCrowd’s implementation as well:

  1. Nodes will publish their capabilities to the master. It will not be assumed that all nodes have the same capabilities. In this respect cu3w0rx will more closely resemble the nanite project (Ruby + Erlang).
  2. Worker nodes will not share the same code base as the master. The major reason for this is that worker nodes will not run on GAE, hence it makes no sense to hamstring them with the restrictions that GAE imposes on Python.
  3. Workers will maintain their own state in a local database. This will serve to keep track of capabilities, number of jobs processed, monitoring statistics, and results from previous jobs.
  4. Map and reduce are implemented as two separate jobs, as far as the worker nodes are concerned.
  5. Since this is a demonstration project, we will not implement a scheme to save result files to non-volatile storage. Instead we will provide a way to give authenticated access to result files on worker nodes. Result files from Reduce phases will live only at the final destination host (e.g. the hosts that have checked back into the master as having succeeded at a particular job).
  6. The queue is not necessarily FIFO. I am actually not sure if this is also true of CloudCrowd, but it’s worth mentioning here.
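Points 1 and 6 fit together naturally: once nodes advertise what they can do, the master can hand a node the first queued job it is able to run, even if older jobs are still waiting for a more capable node. A minimal sketch, assuming each job carries a hypothetical `action` field and each node publishes its capabilities as a set of action names (`next_job_for` is likewise a hypothetical name):

```python
def next_job_for(node_capabilities, queue):
    """Return (and remove) the first queued job this node can run, or None.

    Jobs the node cannot handle are skipped rather than blocking the
    queue, so dispatch order is not strictly FIFO.
    """
    for index, job in enumerate(queue):
        if job["action"] in node_capabilities:
            # Safe to pop here because we return immediately afterwards.
            return queue.pop(index)
    return None
```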

I would also like to implement a way to provision and configure cloud VMs using the excellent libcloud library, but I think that is outside the scope of a demonstration project. If you see anything missing (or think some stuff can be left out) leave a comment!

App ID struggle

24 Oct

I just tried for 10 minutes to get an unused application identifier that would make sense for the new example app. Adding insult to injury, every single supposedly taken ID I tried to access via the <appID>.appspot.com URL was a 404. Perhaps there is a bit of squatting going on?

Yeah, you can map to a domain name, but it makes writing tutorials that much more of a pain ;)

JRuby on GAE

14 Oct

After a long hiatus, I am trying to pick up the series of GAE posts. One small problem is that by day I am a Ruby programmer and switching context to Python for posts is a bit of extra work. That and a friendly prod from Ilya Grigorik for Ruby programmers to start writing about JRuby on Google App Engine has me thinking that I should play to my strengths more.

Having said that, I plan on doing one more series for Pythonistas, in order to implement a simple Map-Reduce work queue system using GAE as the master node. This comes from a direct need for us to support MapReduce-type workflows on both Windows and Linux machines, and an existing Ruby project that I would have loved to use (CloudCrowd) does not work on Windows. In general, any Ruby project that assumes fork() is available on the system tends to have problems in a Windows environment. The project is small enough that it will not be too much work to port the concepts over to Python.

“But Angel, doesn’t Disco already fill the MapReduce void for Python?”

Technically, yes it does, but it relies on Erlang for communication between master and nodes, which is obviously a no-go for GAE.

“But Angel, if you are a Ruby guy, why don’t you just fork CloudCrowd and make it run on JRuby + GAE?”

Maybe one day I will, but Windows worker node compatibility is a must-have for us, and as I researched CloudCrowd, the code base kept getting more and more Unix-centric. I did make a branch of the codebase that uses Ruby threads to overcome the use of fork(), but the solution is non-optimal and broke when I tried to merge it back into the master branch, which had added even more Unix dependencies for node CPU and memory statistics. Then other priorities took over and I have not looked back on that project.

Plus it will be an interesting project to cover on AppMecha!

Google App Engine Backup and Restore

22 Dec

A quick note to point folks to the following project, which may be of interest to GAE developers:

Google App Engine Backup and Restore

http://github.com/aral/gaebar/tree/master

Check it out! Also GitHub.com is probably the most useful site I have ever used.

Breaking the Chain

15 Nov

Well folks, I can see a light at the end of the tunnel for tutorials and background material on GAE. Thus far I have been using the Seinfeld Calendar method (via dontbreakthechain.com) to keep my posts coming, but recently I’ve had to burn a few hours at home finishing up day-job matters, as you can see:

[Image: Don't Break the Chain calendar]

Those white spaces are really pretty annoying, but there is not much I can do about it now, unless my name is Hiro, and it’s not. As bad as it is, though, it is only going to get worse, as there are a lot of projects that need to get out the door by end of year. Admittedly this is a self-imposed deadline, but a deadline nonetheless.

I think what really bugs me is that the white spaces don’t have an obvious pattern. I think I would not mind it as much if it was more checker-board type, or stripes:

[Image: calendar with a checkerboard pattern]

Much better. The message is a little annoying, but I can deal with that. Which is to say that I am trying to balance my day job’s workload with family and hobbies, so the posts will keep coming, but at a slower pace.

Platform Poll Results

14 Nov

So we got a whopping total of 9 votes (and one “other” that gave no alternate answer). Predictably, the top answer was Java (60%), with Ruby running second (30%). On the surface this makes sense, since there is a well-established developer community and Google itself uses Java for a lot of its projects.

Personally I don’t think it will be Java, for a few reasons. First and foremost, even though GWT is a Java-based framework, its output is JavaScript, not compiled code. Second, the whole reason GAE’s Python runtime is so ill-disposed to C extensions is to restrict access to the file system. I would assume that arbitrary compiled Java packages carry the same problem. Possibly you can get around this by stringent sandboxing of the JVM or by providing a custom compiler for GAE deployments. I don’t know enough about it to make any intelligent sort of comment (I may have already proven that), but both of these solutions seem like a heavy investment in an immature product, especially when there are viable alternatives, like PHP and Ruby, that share the same sort of constraints as the current Python implementation.

If I had a vote, it would be Ruby, since I work in it every day. I am biased, yes, but it’s my blog, so I am entitled ;) I would not be surprised if it were PHP, though. After all, it pretty much still runs the Web.

Mako Introduction

13 Nov

Thus far we have spent a lot of time covering CherryPy. I think it is high time to cover our choice of view template engine, Mako. As a reminder, view system components serve to present data that is fed to them from the controllers. Ideally, view templates should be given all of the data they need to render appropriately, and not have to request more information from the data models directly. The view template then cherry-picks (no pun intended) from the data it is given and formats it into an appropriate view for the task at hand, for instance a summary of a post versus the full entry.

Mako is a template library written in Python. HTML markup is intermixed with Python variables, control structures and statements. As such it uses a similar approach to JSP, ASP and PHP, but given its origins in MVC-style Web applications, it is closer to ERB templates in Ruby land. Strictly speaking, Mako can produce any text format, not just HTML, but HTML is its main purpose.

Sidebar: Mako templates are compiled to Python modules at runtime for maximum performance. This has implications for memory and CPU usage quotas on GAE, but only if you have a tremendous number of templates. In any case, you can configure Mako to hold only a certain number of compiled templates in memory, and the engine will retire the least recently used templates (based on last access time) to make room for new ones.
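In Mako itself, I believe this limit is set with the `collection_size` argument of `mako.lookup.TemplateLookup` (the default, -1, means no limit). The retirement policy described above is least-recently-used; here is a small stdlib sketch of that policy using `collections.OrderedDict`. `LRUTemplateCache` and the `compile_func` callback are hypothetical stand-ins, not Mako classes.

```python
from collections import OrderedDict


class LRUTemplateCache(object):
    """Hypothetical bounded cache illustrating least-recently-used eviction."""

    def __init__(self, capacity, compile_func):
        self.capacity = capacity
        self.compile_func = compile_func  # turns a template name into "compiled" code
        self._cache = OrderedDict()

    def get(self, name):
        if name in self._cache:
            # Cache hit: mark the entry as most recently used.
            self._cache.move_to_end(name)
            return self._cache[name]
        compiled = self.compile_func(name)
        self._cache[name] = compiled
        if len(self._cache) > self.capacity:
            # Evict the entry whose last access is the oldest.
            self._cache.popitem(last=False)
        return compiled
```

With a capacity of 2, touching `index.html` again before loading a third template means `post.html` is the one evicted, and asking for it later forces a recompile.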

Let’s take a look at a basic example:

<%!
  import re
  def censor(text):
    return re.sub(r'ass', 'butt', text)
%>
<%
  msg = "<b>Hello World!</b>"
%>
<html>
<body>
  <p>Hello World!</p>
  <p>${ msg }</p>
  <p>${ msg | h }</p>
  <p>
    "Classic mistake" == ${ censor("Classic mistake") }
  </p>
</body>
</html>

OK, that’s a lot to take in, so let’s review each piece in turn. First and foremost are the directives for arbitrary Python blocks, denoted by <% %>, and module-level blocks, denoted by <%! %>. The difference between the two is that a module-level block is evaluated only once, when the template is first loaded into memory (which may happen only once per application), whereas a <% %> block runs every time the template is rendered. Ideally, module-level blocks should be used for imports and function definitions. The blocks follow all of the syntax rules for Python, including proper indentation.

Next is a simple expression substitution, denoted by ${ }. The contents of the ${} tag are evaluated as a Python expression, and the result is converted to a string before being inserted into the template output. You can also call any imported function, or any function defined within the scope of the template, such as the censor() function defined in the module block.
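A rough way to picture the example template is as ordinary Python: the <%! %> block becomes module-level code that runs once, while the <% %> block and the ${} expressions end up inside a function that runs on every render. The hand-written sketch below is an analogy, not the code Mako actually generates (which writes into a context buffer); `render()` is a hypothetical name, and `html.escape` stands in for the `h` filter.

```python
import re
from html import escape


# Module-level block (<%! %>): defined once, when the template is loaded.
def censor(text):
    # Naive substring replacement, hence the famous "Clbuttic" result.
    return re.sub(r'ass', 'butt', text)


def render():
    """Hand-written analog of the template body, run on every render."""
    out = []
    msg = "<b>Hello World!</b>"                # the <% %> block
    out.append("<p>%s</p>" % msg)              # ${ msg }: inserted as-is
    out.append("<p>%s</p>" % escape(msg))      # ${ msg | h }: HTML-escaped
    out.append('<p>"Classic mistake" == %s</p>' % censor("Classic mistake"))
    return "\n".join(out)
```

Calling `censor("Classic mistake")` returns "Clbuttic mistake", which is exactly the joke the example template is making.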

If the content for the expression is user-supplied (e.g. user-generated content from the data store or forms), you often want to further process the result before presenting it to the client, for security reasons (e.g. preventing malicious JavaScript). This is done by using filters to escape the expression’s output string into something that is non-threatening to a browser. These escapes can be added to an expression substitution using the | operator:

  ${ "<script> alert(\"danger will robinson\") ; </script> " | h }

The above expression produces

  &lt;script&gt; alert(&quot;danger will robinson&quot;) ; &lt;/script&gt;

(the exact entity used for the double quotes varies between Mako versions) instead of the possibly malicious JavaScript within the content of the expression. Mako includes a number of built-in escaping mechanisms, including HTML, URI and XML escaping, as well as a “trim” function.
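The filters are easier to reason about once you see that the same string needs different treatment depending on where it lands in the page. As a rough illustration, here are stdlib counterparts in spirit of three of Mako’s filters; they are not Mako’s own functions, and Mako’s implementations may differ in details such as which entity it uses for quotes.

```python
from html import escape
from urllib.parse import quote_plus

payload = '<script> alert("danger will robinson") ; </script> '

# Rough stdlib counterparts of three Mako filters (not Mako's own code):
as_html = escape(payload)      # like "h": safe inside HTML markup
as_url = quote_plus(payload)   # like "u": safe inside a URL query string
trimmed = payload.strip()      # like "trim": surrounding whitespace removed
```

Note that the HTML-escaped string is still unsafe to drop into a URL, and vice versa, which is why Mako lets you pick the filter per expression.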

Control Structures

Mako templates provide the basic set of control structures: conditionals (i.e. if/elif/else), loops (while and for), and exception handling (try/except). These are denoted by a plain “%” sign as the first non-whitespace character on a line. Since Python indentation does not apply to them, they must be explicitly closed by the corresponding end tag (e.g. % endif or % endfor). Here are two examples copied from the Mako docs:

% if x==5:
    this is some output
% endif

% for a in ['one', 'two', 'three', 'four', 'five']:
    % if a[0] == 't':
     its two or three
    % elif a[0] == 'f':
    four/five
    % else:
    one
    % endif
% endfor

As you can see, the conditions and iterators from Python work just fine.
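Because control lines are just Python with explicit end tags, the loop above behaves like the following plain Python function. `classify` is a hypothetical name, and each appended string stands for a line the template would emit (the real template output also carries the surrounding whitespace).

```python
def classify(words):
    """Plain-Python equivalent of the % for / % if control lines above."""
    lines = []
    for a in words:
        if a[0] == 't':
            lines.append("its two or three")
        elif a[0] == 'f':
            lines.append("four/five")
        else:
            lines.append("one")
    return lines
```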

Tags

In addition to these Python-embedding structures, Mako provides a few XML-style tags that add functionality specific to Mako templates. The tag names all begin with “%”, but unlike the Python block delimiters, the tag end does not have a corresponding “%”. Confusing, I know, but here is the basic syntax format:

<%include file="foo.txt"/>

<%def name="foo" buffered="True">
    this is a def
</%def>

Notice how the closing “>” is not prefixed with “%” as it is in the Python blocks. Also notice that when the tag has content, the closing tag follows XML syntax (</%def>). The following table lists the available tags:

Tag          Description
%page        defines general characteristics of the template
%include     includes another template, optionally passing arguments to it
%def         defines a Python function which contains a set of content and can be called at some other point in the template
%namespace   like a Python import statement, allows access to rendering functions and metadata of other template files, Python modules, etc.
%inherit     allows templates to arrange themselves in inheritance chains
%call        used to call <%defs> with additional embedded content
%doc         multiline comments
%text        suspends parsing and returns the content as plain text

All of the tags are very well documented in the Mako documentation, but it is worth discussing the <%def> tag, which defines a function like Python’s def, except that the body of the tag follows Mako’s syntax rules. Defs are compiled into Python functions and can be called from expressions like so:

<%def name="int_square(x)">
  Integer square of ${x} is ${ int(x) * int(x) }
</%def>

${ int_square(33) }
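Conceptually, the def behaves like the plain Python function below. This is a simplification: Mako compiles a def into a function that writes into the template’s output buffer rather than returning a string, so treat `int_square` here as an analogy, not generated code.

```python
def int_square(x):
    """Hand-written analog of the <%def name="int_square(x)"> above."""
    # Like the def body, both ${x} and ${ int(x) * int(x) } become strings.
    return "Integer square of %s is %s" % (x, int(x) * int(x))
```

Calling `int_square(33)` returns "Integer square of 33 is 1089", the same text the template expression renders (minus surrounding whitespace).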

Next post, we’ll cover how templates can call and include each other using the inherit tag.
