What do you most dislike about PHP?

December 17th, 2008

The title of this post is probably a little bit flamebait-ish, but I’m actually genuinely interested to get the Internet’s opinion on the following:

Pick any three aspects of PHP that makes you feel dirty as a developer, and post them as a comment here.

For me it’s:

  • The standard library is a huge, monolithic mess. This has been covered before, so I won’t beat a dead horse. I’d love to see the standard library reorganized and some sanity introduced around the naming of certain aspects.
  • Superglobals because they behave weirdly in certain scenarios and make unit testing a pain in the ass.
  • omg\namespace\separator\is\Backslash, which has effectively killed this long-awaited feature for me.

Again, I don’t intend for this to be about hating on PHP or its developers (read: comments that don’t contribute anything meaningful to the discussion won’t see the light of day). Instead, I want to formalize what causes people pain when working with PHP today. Who knows — maybe somebody, somewhere will listen?

Categories: PHP, Software Development | 2 Comments

Python Language Internals: From Source to Execution

December 10th, 2008

Last week I gave a presentation in Sydney for the Open Source Developer’s Conference on the internals of the Python compiler. Now that the conference is over, I figure there’s no better place for the original paper than right here on my blog.

Thanks again to my employer, Shine for all their wonderful support with this. Thanks too to the folks who reviewed my paper before it was submitted and my colleagues at Shine who endured a painful test run that ran well over time! ;)

OSDC 2008 was a whole lot of fun. If you happened to miss the conference, I strongly recommend you make it along next year. It’s well worth the trip.

Download Python Language Internals: From Source to Execution, and a corresponding patch against Python 2.6 beta 3 implementing the “unless” statement. If you want to test the patch, you also be aware of the nasty little circular dependency that may cause you some grief — apply the latest (non-review) patch from the tracker before you start the build and you should be fine.

Categories: Compilers, Python, Software Development | 1 Comment

Speaking at OSDC 2008

August 15th, 2008

I’ve been offered a position to speak at this year’s Open Source Developers’ Conference on the internals of the Python compiler. I’ll be introducing the audience to the basic workings of the Python bytecode compiler, demonstrating how one would add a new construct to the language.

If you’re at the conference and interested in messing with the guts of a real compiler but not quite sure where to start, please come along. The only real prerequisite is a sound understanding of the C language.

The conference runs from the 2nd to the 5th of December. Registration should open some time in October if you’re interested in tagging along.

Now, back to my paper …

OSDC 2008: Sydney

Categories: Compilers, Python, Software Development | 2 Comments

Taking the Pain Out of Complex Forms in Rails

July 20th, 2008

The other day I was discussing Rails’ form processing behavior with Ben, when the topic of editing multiple associations in a single form came up. Effectively, he needed to be able to manipulate a collection of records associated with a parent via a has_many association. For each item in this collection, he wanted something like two drop down lists — nothing unreasonable. He said it was similar to recipe 13 from Advanced Rails Recipes — “Handle Multiple Models in One Form” — except that he had to use two fields instead of the one textfield, and it was proving to be surprisingly tough.

I know what you’re thinking: surely this can be extended to more than one form field per item without any problem? Up until I read the code myself a few months ago, I wouldn’t have thought it would be so difficult either — but it’s true. The parameter parsing code is quite quirky and, without reading the detail of the source, borderline unpredictable. I’ve had a few ideas for dealing with more complicated forms in Rails ever since I first ran into this issue. Only recently have I actually gotten around to doing something about it. Here’s an abridged version of an email I just sent to the rails-core mailing list:

Hi,

A colleague of mine recently ran into a related problem when dealing with collections of records in Rails forms, which prompted me to finally do something about the parameter parsing behavior. Now, while I’m sure we’re not the only ones hitting this wall, I’m aware that proposals to change to the parameter parsing semantics in Rails is likely to be met with a little caution, hesitation — possibly even terror. :) With that in mind, I’ve put up a plugin so everybody can give this a go without having to apply any nasty patches:

http://www.deskchecked.com/git/rails/plugins/form_collections.git

Please see the README (attached) for the full details.

Cheers,
Tom

This is a plugin which greatly simplifies the existing Rails parameter parsing code, and makes it a whole lot easier to deal with collections in your forms. I encourage all you Rails developers out there to take a look when you get the opportunity, I’d love to hear your opinions on this one. You can get a copy of the git repository using the following command:

$ git clone \
    http://www.deskchecked.com/git/rails/plugins/form_collections.git

UPDATE: Here’s the contents of the README file:
UPDATE 2: Removed the README: looks terrible in the new design due to space restrictions.

Categories: Ruby, Software Development | 11 Comments

JVM Compiler Construction with Scala and BCEL, Part 1.5

July 9th, 2008

The second part of my Scala compiler construction tutorial has been a long time coming. This post is, unfortunately, not the second part of the article — although that is coming soon. Honest.

Since part 1 was published, Scala 2.7 has been released which — among other things — introduced changes to the parser combinator library. Changes that meant the source code from part 1 will not compile in Scala 2.7. Sorry about that. Updated, working code can be found at the end of this post.

So what exactly has changed? Well, let’s see …

keyword() no longer discards its result

A bit of a refresher first:

In both Scala 2.6 and 2.7, the keyword implicit is called whenever you use a string in your “grammar rules”. For example:

def sum = expr ~ "+" ~ expr

Is the equivalent of:

def sum = expr.~(keyword("+").expr)

In Scala 2.6’s parser combinator library, the result of the keyword() call was actually the UnitParser — that is, a parser that would discard its result. At the time, that meant we could use the tilde operator (”~”) to create a sequenced parser and everything was fine:

def sum = expr ~ "+" ~ expr ^^
    ((left : Expr, right : Expr) => Sum(left, right))

In 2.7, the keyword() implicit returns a Parser[String] rather than a UnitParser. This means we have to either deal with the newly introduced tokens as follows:

def sum = expr ~ "+" ~ expr ^^
    ((left : Expr, op : String, right : Expr) => Sum(left, right))

Or alternatively …

Use <~ and ~> to indicate the important parts of a parse rule

Let’s start with something simple based on the 2.6 combinators:

def bracketExpr = "(" ~ expr ~ ")" ^^ ((e : Expr) => e)

Again, in 2.7 we know that keyword() is no longer a UnitParser, so we have to deal with it like so:

def bracketExpr = "(" ~ expr ~ ")" ^^
    ((l : String, e : Expr, r : String) => e)

Alright, this compiles and does what we expect. But why should we keep those strings around if we don’t need them? They sure do clutter up the code a whole bunch.

<~ and ~> can be used to include or discard the result of a given parser. <~ builds a parser that takes the result of two parsers (the parsers to its left and right) and builds a parser that takes the result of both and discards the result of the one on the right. Conversely, ~> yields a parser that takes the result of both parsers and discards the result on the left.

So, using these two operators we can rewrite bracketExpr as follows:

def bracketExpr = "(" ~> expr <~ ")" ^^
    ((e : Expr) => e)

Ah. Much better. :)

The ^^^ operator

I had to go digging in the Scala source code to work out what exactly this one does.

First, let’s take a look at some 2.6 code:

def simpleExpr = term * (
    "+" ^^ ((x : Expr, y : Expr) => Add(x, y)) |
    "-" ^^ ((x : Expr, y : Expr) => Sub(x, y))
)

In 2.6, this can parse zero or more repetitions of “term” interleaved by “+” and “-” (check out part 1 if you need a refresher on how the * combinator works). In 2.7 it’s a compile-time error because of the fact keyword() is no longer a UnitParser (are you seeing a pattern here? :) ), and ^^ is trying to pass the resulting String on to the anonymous method in each case.

If we use the ^^^ operator here, we can effectively discard the result of the keyword parse, and build a parser that uses a simple anonymous method to parse the current pair of terms:

def simpleExpr = term * (
    "+" ^^^ ((x : Expr, y : Expr) => Add(x, y)) |
    "-" ^^^ ((x : Expr, y : Expr) => Sub(x, y))
)

Exactly what we’re after. This compiles and behaves as expected.

What else?

There may be more changes to the parser combinator library which I haven’t covered here, but I’m not going to go looking for any more changes since the updated code seems to work just fine. This should be enough to at least understand the updated code for the compiler described in part 1 without needing to deal with any cryptic compiler errors.

Finally, the new code!

Special thanks to Harshad for sending through working code for 2.7 ages ago which I never got around to posting here. This code, along with the Scala API docs, was used to figure out just what had changed since 2.6. The code below is derived from some code he sent to me a few months back.

I’m really sorry this has been so long in the making. I’ll try to get around to writing the “real” part two of this article. In the meantime, here’s the updated code. Thanks! Please post or email any comments or questions.

Categories: Compiler Construction with Scala and BCEL, Compilers, Functional Programming, Scala, Software Development | 3 Comments

Implicits for the Masses

July 5th, 2008

I just finished reading Tony Morris’s blog post on Scala implicits and saw the following comment:

To me, implicits look a lot like global variables, which is why I don’t like them.

Or maybe I missed something?

Uh, yep — you missed something :) I was going to reply in the comments to Tony’s post but as per usual it became far too long, so I’m posting it here instead.

Let’s try something a little less abstract than what was covered in Tony’s article. I haven’t actually tested any of this code to check that it compiles (lazy, lazy, lazy!), but it should be pretty straightforward & clarify what implicits can be used for:

class Person(name : String) {}
class Conversation(person1 : Person, person2 : Person) {
    def greet() = {
       println(person1.name + " says: Hello, " + person2.name)
       println(person2.name + " says: Oh, hello " + person1.name)
    }
}
implicit def stringToPerson(name : String) : Person = new Person(name)

With this implicit in effect, we can create a Conversation without even mentioning the Person class:

val conv = new Conversation("John", "Mary")
conv.greet()

See how we passed two strings to the Conversation constructor, even though it’s supposed to take two Persons?

Global variables don’t really come into it, it’s more about implicit type conversion.

Why would you do this? Well for one, it lets you create faux multimethods:

class FrameWrapper(frame : JFrame) {
    def title_=(s : String) = frame.setTitle(s)
    def visible_=(b : Boolean) = frame.setVisible(b)
}
implicit def wrapFrame(frame : JFrame) : FrameWrapper =
    new FrameWrapper(frame)

This means we can call FrameWrapper methods on a JFrame and the Scala compiler will figure out what we mean:

val frame = new JFrame
frame.title = "My Application"
frame.visible = true

Normally you’d have to call setTitle/setVisible at the second and third lines of the code above because JFrame doesn’t have any public attributes called “title” and “visible”. However, with the implicits in effect the Scala compiler will wrap your JFrame in a FrameWrapper instance to call that method. A weak example to be sure, but it really is a nice way to “scala-ize” any existing Java code. All this is statically checked too, so it won’t try to set your children on fire when you’re not looking.

Nifty eh?

Categories: Scala, Software Development | 15 Comments

Ruby Releases Are Scary (Or: How CI Can Save Your Ass)

July 4th, 2008

In many open source software projects, full backwards compatibility is ensured between minor point releases (for example, 1.2.3 to 1.2.4). Generally speaking, these releases are made mostly to get important defect and/or security fixes to the public in a relatively timely manner. PHP is a notable exception to the rule: I haven’t been following development all that closely of late, but in the past it was not uncommon to break backwards compatibility between point releases.

Another major exception to the rule that hits a little closer to home is Ruby. In the past, backwards-incompatible changes have crept into point release changes — I’m uncertain as to whether or not this was intentional. However, the Ruby project also provides what’s known as a “patchlevel” release for each given point release. The patchlevel counts the number of patches applied to any given point release. Generally speaking, these seem to be bug and security fixes.

If you’re interested, the patchlevel release of your Ruby installation can be seen like so:

[ tom ] ~
$ ruby --version
ruby 1.8.x (YYYY-MM-DD patchlevel XXX) [i486-linux]

Anyway, over the past few days I’ve been watching the discussion leading up to patchlevel releases of Ruby 1.8.6 and 1.8.7. It’s been an interesting experience. You can follow the discussion from message #17499 here.

First there is an announcement that new versions of Ruby will drop in three days’ time. Shortly after this, the announcement that both versions are failing numerous rubyspecs (57 failing for 1.8.7!). Then a report that memory leaks may be present in both versions, followed by a report that one of the tests hang in Win32. Eventually a request to delay the release came through, along with a recommendation by Charles Nutter (the guy who brought us JRuby) to implement continuous integration.

I’m really surprised at a few things:

  • The original decision to make a release was based upon “I think current 1.8.6/1.8.7 is [more] stable than p230/p22″. Please, please tell me that the decision that the trunk is more stable than a previous release wasn’t based on a gut feeling. Please.
  • No continuous integration. It’s very important for software projects to maintain and continuously run a set of regression test. Otherwise, sooner or later they will break their users’ code.

Ruby is a great language and I applaud the work of the developers — maintaining a beast like the Ruby core for naught but love must be hard work. However, I have to say that I think their fumbling to get a stable release out the door on short notice — especially in the face of their recent security problems — is a concern for anybody relying on the Ruby code base. All that said, it sounds like the project is already taking steps in the right direction based on an earlier email to the list from Matz.

One day in the near future I’m sure that cutting a Ruby release won’t be quite so painful, but the lesson here is clear: Exercise your tests frequently — and ideally, automatically — or getting a stable release out the door might be nigh on impossible.

Categories: Compilers, Ruby, Software Development | 7 Comments

The Internals of Python’s IMPORT_NAME Bytecode

April 14th, 2008

This was originally planned as a response to this post by Paul Bonser, but grew a little unwieldy (and his comment submission form seems to be broken?).

Effectively, Paul was (somewhat sleepily) mulling over the workings of the IMPORT_NAME bytecode. This bytecode is generated in response to Python code like the following:

import sys

And also for:

from foo import bar, baz

You’ll have to see the original post for the actual bytecode generated for this code, but Paul was asking why the latter syntax generates an IMPORT_NAME bytecode instruction which seems to do nothing at all with the fromlist, then generates additional IMPORT_FROM bytecodes that fetch the fromlist attributes from the parent module.

The documentation for __import__ somewhat solves this mystery:

Note that even though locals() and ['eggs'] are passed in as arguments, the __import__() function does not set the local variable named eggs; this is done by subsequent code that is generated for the import statement. (In fact, the standard implementation does not use its locals argument at all, and uses its globals only to determine the package context of the import statement.)

Essentially, when the IMPORT_NAME is executed for ‘from foo import bar, baz’, the __import__ builtin is called with the fromlist and a few other arguments (namely the globals() and locals() from the current frame of execution) to provide custom import handling for your Python programs. For example, you may want to prevent users of your program from writing scripts that import certain modules. I imagine Google’s App Engine might be using something like this to prevent access to certain evil or unavailable modules (but that’s just a wild, unfounded guess).

The code in Python/ceval.c for IMPORT_NAME seems to back this up (I’ve annotated the code with a few comments):


        case IMPORT_NAME:
            w = GETITEM(names, oparg);
            /* 1. LOCATE THE __import__ BUILTIN */
            x = PyDict_GetItemString(f->f_builtins, "__import__");
            if (x == NULL) {
                PyErr_SetString(PyExc_ImportError,
                        "__import__ not found");
                break;
            }
            Py_INCREF(x);
            v = POP();
            u = TOP();
            /* 2. BUILD THE LIST OF ARGUMENTS FOR __import__ USING THE fromlist, globals() AND locals() */
            if (PyInt_AsLong(u) != -1 || PyErr_Occurred())
                w = PyTuple_Pack(5,
                        w,
                        f->f_globals,
                        f->f_locals == NULL ?
                          Py_None : f->f_locals,
                        v,
                        u);
            else
                w = PyTuple_Pack(4,
                        w,
                        f->f_globals,
                        f->f_locals == NULL ?
                          Py_None : f->f_locals,
                        v);
            Py_DECREF(v);
            Py_DECREF(u);
            if (w == NULL) {
                u = POP();
                Py_DECREF(x);
                x = NULL;
                break;
            }
            READ_TIMESTAMP(intr0);
            v = x;
            /* 3. CALL __import__ WITH THE module name, fromlist, globals() AND locals() */
            x = PyEval_CallObject(v, w);
            Py_DECREF(v);
            READ_TIMESTAMP(intr1);
            Py_DECREF(w);
            SET_TOP(x);
            if (x != NULL) continue;
            break;

So this answers the question of why IMPORT_NAME needs the fromlist in the first place: it is merely passed along to __import__ to make it available to custom import handling code. But why aren’t the fromlist attributes added to the namespace inside IMPORT_NAME? I’m guessing it was a design decision: we already have an opcode for adding elements to the namespace, so why have a special case for imports? Of course the details may be more involved than that, but it’s the most obvious explanation I can think of.

In any case, thanks for the thought-provoking post, Paul!

UPDATE: Seems my comment made it through to his blog after all. Sorry for the double-up!

Categories: Compilers, Python, Software Development | 2 Comments

Python 2.6a2: Compile ASTs from within Python code

April 4th, 2008

I’m not going to go into too much depth because Georg Brandl has already covered it, but it’s an interesting topic. I couldn’t help but write a little entry about it. :)

A new alpha of Python has just been released, including a patch I wrote for compiling Python Abstract Syntax Trees down to bytecode. This means it’s now possible to manipulate ASTs from within your Python program, which lets you do all sorts of crazy things – like this, for example.

Piping this little program into itself yields the following:

$ ./python wacky.py <wacky.py
Bwahaha! I was once an Assign node!
Bwahaha! I was once an Assign node!

Neat huh?

Anyway, since then – on Neal Norwitz’s advice – I’ve started on an experimental patch for what I hope will one day be an optimizer for Python ASTs. Even though it’s early days, the possibilities offered by optimizing at the AST level are very interesting. For example, the (dirty, filthy, ugly, hack of a) patch I’m working on at the moment has support for optimizing this code:

if 1:
    'true'
else:
    'false'

Down to this (remember, no bytecode has been generated yet):

'true'

Very, very exciting stuff.

Anyway, I have a train to catch. More on this when I have more to show!

Categories: Compilers, Python, Software Development | No Comments

When Speed Matters

April 3rd, 2008

I’d be the first to tell you if, when confronted with a task that requires a degree of automation, you use a scripting language like Python, Ruby, Perl, Bash or some mix of the four. However, I recently had a problem that involved changing the permissions on a large number of files.

My initial approach used a small (10 lines or so) Bash script to traverse the filesystem hierarchy and change the permissions based on whether we were processing a file or a directory. The resulting script ran over more than 3000 files and directories in about a minute and a half. Which wasn’t exactly slow, but I had to run this script about twenty to thirty times throughout the day and it felt very unproductive to have to wait for a whole minute and a half before I could continue my work. So I did something crazy: I rewrote that little Bash program in C.

The resulting C program – of maybe 75 lines of code – finished in 1.3 seconds.

Initially I thought I had made a mistake, so I checked the processed files. They were all exactly as I expected them to be. I was astounded: I could execute this program almost 150 times before my old Bash solution finished even once! This decision made my day – the rest of my afternoon was much more productive. I even felt happier to know I wasn’t wasting so much time on something trivial and secondary to the actual task at hand.

Again, I’m not one to go preaching about how important performance is with respect to any given language – often it’s much, much easier to write a few Python/Ruby/Perl scripts and push data between them with some Bash glue. In this particular case, however, the choice of a lower-level language was clearly a massive win.

Even in retaining the pragmatic perspective, it really makes you wonder just how much time is lost to “inefficient” software stacks…

Categories: C, Linux, Software Development | 4 Comments