Optimizing Python Generator Functions in the AST

April 19th, 2008

I know I’ve written quite a few Python articles of late, so it may be frustrating for some of you to see yet another Python-specific post. There is a reason for it! First, Python is perhaps my favourite scripting language (Ruby’s a close second, if only because I’ve had some bad experiences with Ruby bindings to C libraries in the past – probably related to language popularity at the time). Second, the AST optimization patch I’ve been working on has meant I’ve been pretty immersed in the source code of the Python compiler the last few week.

Anyway, this post is a direct result of some work I’ve been doing with the AST optimization patch. Specifically, I’ve been seeing test_generators fail with a couple of specific optimizations.

In the current Python trunk, code like this:

def mygenerator1():
  if 0:
    yield 5
  return

or this:

def mygenerator2():
  return
  yield 7

will result in the function being treated as a generator, even though the generator will not actually return anything. That’s fine, but what happens when we throw the AST optimizer into the mix? Suddenly mygenerator1 becomes:

def mygenerator1():
  return

Similarly, mygenerator2 becomes:

def mygenerator2():
  return

What effect does this have when we actually call our generator functions? Here’s an example of what happens with mygenerator1 before the optimizer does its work:

>>> mygenerator1()
<generator object at 0xb7d2bc2c>
>>>

And here's what happens after the AST optimization patch is applied:

>>> mygenerator1()
>>>

As you can see, mygenerator1 is no longer a generator function with the AST optimizer in effect. Ouch.

In order to try and find a way to make optimization of generator functions a possibility in the AST optimizer, I went digging through the Python source code to investigate how and when an ordinary function becomes a generator function. The answer is a little complicated, but I'll do my best to summarize it all here.

It all begins in Python/compile.c's PyAST_Compile function. Before any code generation takes place, PySymtable_Build (see Python/symtable.c) is called to generate a symbol table from the AST being compiled.

PySymtable_Build in turn visits every node in the AST to construct the symbol table for compilation. Among the visitor functions is symtable_visit_expr, which is where a function is initially marked as a generator function if it is found to contain a "Yield" node. This is done by setting the ste_generator field of the current symtable_entry:

           case Yield_kind:
                if (e->v.Yield.value)
                    VISIT(st, expr, e->v.Yield.value);
                st->st_cur->ste_generator = 1;
                // ... code omitted ...

st_cur is effectively the symbol table entry for the current local namespace, and the "yield" statement can only appear inside a function. So here, when we're marking st_cur as a generator, we're effectively saying "this function is a generator!". This is only the first step in the making of a generator function.

Later on in the compile process (see Python/compile.c), we hit the compile_function function, which is responsible for generating the Python bytecode for a FunctionDef AST node. compile_function calls assemble() to build the code object that will be executed when this function is called. The assemble() function calls the makecode() function which in turn calls compute_code_flags(). In compute_code_flags, we find this:

    if (ste->ste_type == FunctionBlock) {
        if (!ste->ste_unoptimized)
            flags |= CO_OPTIMIZED;
        if (ste->ste_nested)
            flags |= CO_NESTED;
        if (ste->ste_generator)
            flags |= CO_GENERATOR;
    }

The result of this function will then be used as the co_flags for the PyCodeObject generated by the assemble() function. So, as we can see here, if the ste_generator field is set by PySymtable_Build for our function, our corresponding PyCodeObject is going to have its CO_GENERATOR flag set.

However, this still doesn't explain the behaviour we see when we actually call a generator function. Ordinary functions will just pass back whatever value is given to "return" (or Py_None, in the event that no "return" is explicitly given, or if no value is given to "return"). In the case of a generator function, however, the call will return a new "generator" object. So where does this generator come from?

Inside Python/ceval.c we find PyEval_EvalCodeEx. This function will eventually be called either by run_mod or run_pyc_file in Python/pythonrun.c or by exec_statement inside Python/ceval.c. PyEval_EvalCodeEx will step into itself recursively until it hits the execution frame of our generator function. Like the symtable_entry for the Symtable, the "frame" can be thought of as a sort of "local namespace" for the code object being executed at runtime. This is a simplified explanation, but it should be enough to make the concepts presented here easy enough to follow.

The frame for the function being executed has a code object associated with it, which in turn has the co_flags that were set for the function at compile time. Thus, PyEval_EvalCodeEx merely needs to test for the existence of the CO_GENERATOR flag:

    if (co->co_flags & CO_GENERATOR) {
        /* Don't need to keep the reference to f_back, it will be set
         * when the generator is resumed. */
        Py_XDECREF(f->f_back);
        f->f_back = NULL;

        PCALL(PCALL_GENERATOR);

        /* Create a new generator that owns the ready to run frame
         * and return that as the value. */
        return PyGen_New(f);
    }

The call to PyGen_New() seen here is how our generator function is truly differentiated from an ordinary function: the resulting PyObject will be a generator instance (see Objects/genobject.c). This will eventually bubble up to your Python code and, whenever we call the next() method or pass the generator off to a for loop, the tp_iternext method on the generator will be called to execute the function and yield the next available value. Once a StopIteration exception is thrown or the function returns (which, within the generator object, results in a StopIteration being thrown anyway), the generator will have ceased execution.

*exhales* So that's how generator functions work. :)

Despite all this analysis, the only work-around I can see is doing a full scan of the FunctionDef body for Yield nodes. If we find a Yield anywhere in the body, then rather than replacing an "if" node with a "pass", we might just insert an unreachable "yield" somewhere to force the compiler to produce a proper generator function. This isn't really an ideal solution, but it would require the least number of changes to the existing code base. Another option involving a bit more work would introduce an annotated AST: we can mark FunctionDef nodes as "generators" at the AST level.

In any case, the ultimate solution to this requires a broader discussion than what I can cover here. The exploration into how Python "knows" a function is actually a generator function was quite interesting, nonetheless. Hopefully it's interesting to somebody else out there. :)

Categories: Software Development | No Comments

The Internals of Python’s IMPORT_NAME Bytecode

April 14th, 2008

This was originally planned as a response to this post by Paul Bonser, but grew a little unwieldy (and his comment submission form seems to be broken?).

Effectively, Paul was (somewhat sleepily) mulling over the workings of the IMPORT_NAME bytecode. This bytecode is generated in response to Python code like the following:

import sys

And also for:

from foo import bar, baz

You’ll have to see the original post for the actual bytecode generated for this code, but Paul was asking why the latter syntax generates an IMPORT_NAME bytecode instruction which seems to do nothing at all with the fromlist, then generates additional IMPORT_FROM bytecodes that fetch the fromlist attributes from the parent module.

The documentation for __import__ somewhat solves this mystery:

Note that even though locals() and ['eggs'] are passed in as arguments, the __import__() function does not set the local variable named eggs; this is done by subsequent code that is generated for the import statement. (In fact, the standard implementation does not use its locals argument at all, and uses its globals only to determine the package context of the import statement.)

Essentially, when the IMPORT_NAME is executed for ‘from foo import bar, baz’, the __import__ builtin is called with the fromlist and a few other arguments (namely the globals() and locals() from the current frame of execution) to provide custom import handling for your Python programs. For example, you may want to prevent users of your program from writing scripts that import certain modules. I imagine Google’s App Engine might be using something like this to prevent access to certain evil or unavailable modules (but that’s just a wild, unfounded guess).

The code in Python/ceval.c for IMPORT_NAME seems to back this up (I’ve annotated the code with a few comments):


        case IMPORT_NAME:
            w = GETITEM(names, oparg);
            /* 1. LOCATE THE __import__ BUILTIN */
            x = PyDict_GetItemString(f->f_builtins, "__import__");
            if (x == NULL) {
                PyErr_SetString(PyExc_ImportError,
                        "__import__ not found");
                break;
            }
            Py_INCREF(x);
            v = POP();
            u = TOP();
            /* 2. BUILD THE LIST OF ARGUMENTS FOR __import__ USING THE fromlist, globals() AND locals() */
            if (PyInt_AsLong(u) != -1 || PyErr_Occurred())
                w = PyTuple_Pack(5,
                        w,
                        f->f_globals,
                        f->f_locals == NULL ?
                          Py_None : f->f_locals,
                        v,
                        u);
            else
                w = PyTuple_Pack(4,
                        w,
                        f->f_globals,
                        f->f_locals == NULL ?
                          Py_None : f->f_locals,
                        v);
            Py_DECREF(v);
            Py_DECREF(u);
            if (w == NULL) {
                u = POP();
                Py_DECREF(x);
                x = NULL;
                break;
            }
            READ_TIMESTAMP(intr0);
            v = x;
            /* 3. CALL __import__ WITH THE module name, fromlist, globals() AND locals() */
            x = PyEval_CallObject(v, w);
            Py_DECREF(v);
            READ_TIMESTAMP(intr1);
            Py_DECREF(w);
            SET_TOP(x);
            if (x != NULL) continue;
            break;

So this answers the question of why IMPORT_NAME needs the fromlist in the first place: it is merely passed along to __import__ to make it available to custom import handling code. But why aren’t the fromlist attributes added to the namespace inside IMPORT_NAME? I’m guessing it was a design decision: we already have an opcode for adding elements to the namespace, so why have a special case for imports? Of course the details may be more involved than that, but it’s the most obvious explanation I can think of.

In any case, thanks for the thought-provoking post, Paul!

UPDATE: Seems my comment made it through to his blog after all. Sorry for the double-up!

Categories: Compilers, Python, Software Development | 2 Comments

Python 2.6a2: Compile ASTs from within Python code

April 4th, 2008

I’m not going to go into too much depth because Georg Brandl has already covered it, but it’s an interesting topic. I couldn’t help but write a little entry about it. :)

A new alpha of Python has just been released, including a patch I wrote for compiling Python Abstract Syntax Trees down to bytecode. This means it’s now possible to manipulate ASTs from within your Python program, which lets you do all sorts of crazy things – like this, for example.

Piping this little program into itself yields the following:

$ ./python wacky.py <wacky.py
Bwahaha! I was once an Assign node!
Bwahaha! I was once an Assign node!

Neat huh?

Anyway, since then – on Neal Norwitz’s advice – I’ve started on an experimental patch for what I hope will one day be an optimizer for Python ASTs. Even though it’s early days, the possibilities offered by optimizing at the AST level are very interesting. For example, the (dirty, filthy, ugly, hack of a) patch I’m working on at the moment has support for optimizing this code:

if 1:
    'true'
else:
    'false'

Down to this (remember, no bytecode has been generated yet):

'true'

Very, very exciting stuff.

Anyway, I have a train to catch. More on this when I have more to show!

Categories: Compilers, Python, Software Development | No Comments

When Speed Matters

April 3rd, 2008

I’d be the first to tell you if, when confronted with a task that requires a degree of automation, you use a scripting language like Python, Ruby, Perl, Bash or some mix of the four. However, I recently had a problem that involved changing the permissions on a large number of files.

My initial approach used a small (10 lines or so) Bash script to traverse the filesystem hierarchy and change the permissions based on whether we were processing a file or a directory. The resulting script ran over more than 3000 files and directories in about a minute and a half. Which wasn’t exactly slow, but I had to run this script about twenty to thirty times throughout the day and it felt very unproductive to have to wait for a whole minute and a half before I could continue my work. So I did something crazy: I rewrote that little Bash program in C.

The resulting C program – of maybe 75 lines of code – finished in 1.3 seconds.

Initially I thought I had made a mistake, so I checked the processed files. They were all exactly as I expected them to be. I was astounded: I could execute this program almost 150 times before my old Bash solution finished even once! This decision made my day – the rest of my afternoon was much more productive. I even felt happier to know I wasn’t wasting so much time on something trivial and secondary to the actual task at hand.

Again, I’m not one to go preaching about how important performance is with respect to any given language – often it’s much, much easier to write a few Python/Ruby/Perl scripts and push data between them with some Bash glue. In this particular case, however, the choice of a lower-level language was clearly a massive win.

Even in retaining the pragmatic perspective, it really makes you wonder just how much time is lost to “inefficient” software stacks…

Categories: C, Linux, Software Development | 4 Comments