Entries from April 2008 ↓
April 27th, 2008 — search
(A story about Captchas, and why they do more harm than good. You may also like Parable of the Nofollow.)
Part 1: Captcha and Spammers
In a time not so long ago, and in a place not so far off, there existed a country called Theweb. It was a well-populated place, home to upstanding citizens who called themselves Webpages and lived together in harmony.
For a long time there was no conflict in Theweb, but where happiness resides, can evil be far behind? The evil S.P.Ammers infiltrated Theweb. They were able to bend a Webpage to do their bidding.
Of course the Webpages were enraged. How dare the evil S.P.Ammers do their evil in the fair land of Theweb? So they developed a weapon called Captcha. The S.P.Ammers’ evil weapons were useless against Captcha, and so Theweb had a period of peace and prosperity.
But of course the S.P.Ammers were not idle during this time. They had developed huge weapons such as Botnets which broke Captcha, and rendered it ineffective. Chaos reigned in Theweb, while the Webpages rushed to build bigger, stronger and better Captchas. Ah, what a war it was! The Webpages building bigger and stronger Captchas, the S.P.Ammers building bigger and stronger Botnets to fight them.
And yet, Webpages had a single purpose, a single raison d’être. They wanted to serve TheHumans, who sought the information the Webpages had. TheHumans were affected less than the S.P.Ammers, but Captchas made it difficult for them to use Webpages as well. As the Captchas got bigger and stronger, it got harder and harder for TheHumans to use Webpages.
Part 2: Sometime later
The Captcha-Botnet arms race heated up. TheHumans, caught in this crossfire, had so much trouble talking to Webpages that they stopped using them altogether.
Part 3: The Moral of the story
As I argued in Parable of the Nofollow, some social problems have no technical solution. Trying to beat automated spam using automated methods is a very slippery slope. There is one area where spam is even more rampant than in comments: email. Just imagine if you needed to solve a captcha each time you wanted to send a mail.
And captchas have been getting stronger, and more difficult for humans to solve. And yet spammers can crack the hardest captchas with a success rate greater than 10%. How long till the success rates of humans and bots start to converge?
Of course spam is a problem, and we need to find a method to combat this menace. Akismet does a very good job of identifying comment spam, and it is free or extremely affordable, depending on your needs. Of course it can have some false positives and false negatives, so some human intervention will be needed.
By now, we have a very good idea how to fight spam unobtrusively, using lessons we learnt in fighting email spam. Let us use these techniques to fight comment spam and give captcha a rest.
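One well-known lesson from email is statistical (Bayesian) filtering, which scores a message by the words it contains and never asks the reader to solve anything. The post does not name a specific technique, so the following is only a minimal sketch under that assumption; the class name, the training comments and the labels are all made up for illustration, and a real filter would need far more training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Tiny two-class (spam/ham) naive Bayes text classifier (illustrative)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Start from the log prior of the class.
        score = math.log(self.doc_counts[label] / total_docs)
        for token in tokens:
            # Add-one smoothing so an unseen word does not zero the score.
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


# Made-up training comments, purely for illustration.
comment_filter = NaiveBayesFilter()
comment_filter.train("cheap pills buy now click here", "spam")
comment_filter.train("win money now click this link", "spam")
comment_filter.train("great post, the queryset example helped me", "ham")
comment_filter.train("thanks, I had the same problem with my templates", "ham")
```

Calling `comment_filter.is_spam(...)` on a new comment returns a boolean; borderline comments could be queued for a human to review, which is where the "some human intervention" comes in.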
If you liked this parable, you may like to read Parable of the Nofollow, or subscribe to the feed so you can stay on top of such stories.
April 27th, 2008 — django, interviews
James Bennett is the release manager of Django, and a long time contributor. He works on Ellington, a CMS designed for news organizations. His book, Practical Django Projects, is being published by Apress, and is scheduled to hit bookshelves in June 2008. He graciously agreed to be interviewed at the 42topics.com blog. His blog, The B-List, can be found here.
Shabda: Would you tell us something about yourself: how did you get started with Django, and what other OSS projects are you involved with?
James: I got into Django fairly soon after the initial public release; I’d been doing PHP and Perl work (mostly Textpattern on the PHP side, Scoop on the Perl side), and I was working on teaching myself Ruby and Rails because it looked interesting. But I’d always liked Python; it was just that there weren’t a whole lot of good options for Python web development at that point. You could do Zope, or you could do Twisted, but they both had pretty steep learning curves when compared to the type of work I was doing, so it just wasn’t worth it.
Django changed that; I think I did the tutorial the day it was released, and I just fell in love.
These days most of my open-source time is devoted to Django, and to various Django-based apps I’ve written and released. Back in the day I used to do the occasional bit of PHP hacking, and I wrote some plugins for blogging engines, but none of that stuff’s maintained anymore.
Shabda: What are your responsibilities as the release manager of Django? Who are the other core contributors of Django, and what are their current areas of focus?
James: My responsibility is largely bureaucracy. Within the project, I keep an eye on the various branches and their progress, try to stay on top of active/problem areas in the ticket tracker, and maintain a list of things that have to happen before 1.0.
Outside the project itself, I also stay in touch with people who are distributing Django — Linux distros which package it, etc. — and watch their Django-related bug reports. When there’s a security fix, I also send out an advance notice to them that we’re going to roll a release, and do the initial disclosure so they have some lead time to respond to that.
Outside of that, I also contribute to the documentation, as well as the occasional code patch, and I maintain the 0.91-bugfixes branch, which provides legacy support and security updates for some of the really ancient Django installs out there (our policy is to support the current release and two prior releases with security updates, which means 0.91, 0.95 and 0.96 right now).
The rest of the core team is just people everybody knows if they watch the developers’ list or the Trac timeline: Adrian and Jacob are the lead guys, obviously. Then there’s Malcolm, who you interviewed the other day, and who’s carved out a pretty big niche for himself: he did the bulk of the Unicode work and he’s doing queryset-refactor, he does a lot of maintenance on the i18n system, and he somehow still finds time to have a day job. I don’t know how he does that.
Then there’s Russell Keith-Magee (russellm in Trac), who’s contributed all sorts of useful stuff. He’s the go-to guy for testing and serialization, and just an all-around brilliant guy who fixes things left and right.
Gary Wilson and Joseph Kocherhans are also both big names you’ll recognize from commit messages; Joseph used to work at the next desk over at World Online, but now he’s up in Chicago with Adrian, working on EveryBlock. He’s done a lot of the grunt work on newforms and on laying groundwork for newforms-admin.
Ian Kelly and Matt Boersma keep our Oracle support working, and do an amazing job of helping to track down and solve obscure problems with that.
And rounding out the core commit bits are Simon Willison and Luke Plant, who aren’t as active these days but can still be seen popping up every so often.
Out on the branches, we’ve had a lot of work done on newforms-admin lately by Brian Rosner, who’s been stepping up and helping to really get that sorted out, and there’s been a lot of unsung UI work by Christian Metts, who’s a designer at World Online but also a not-bad JavaScript programmer and can flaunt some Python when he feels like it.
And then there’s the “gis” branch, GeoDjango, which is adding support for GIS — spatial querying — to the Django ORM.
Justin Bronn and Jeremy Dunck really kick-started that; Justin gave a nice demo of some of their work at PyCon this year.
And recently we’ve also been giving commit access to some of the translators, so they don’t have to go through as much bureaucracy to submit updated translation files; that’s sped some things up on the i18n front, because we’re blessed with a large number of people who are willing to pitch in and do that work.
Oh, and I can’t forget Wilson Miner; he doesn’t flaunt it that often, but he’s the guy who originally designed the Django admin, and he’s been known to make the occasional tweak to fix CSS stuff.
(hope I didn’t forget anybody there)
Shabda: Would GeoDjango be sometime merged with the trunk, or is it forever going to be a parallel branch to trunk?
James: The goal is that the GIS branch will merge, sometime after queryset-refactor lands, and it’s going to provide an application — django.contrib.gis — that you can use to enable spatial queries for your models. There’s a pretty good writeup on the wiki page of how that works, and they’ve been tracking queryset-refactor because it helps make some of the custom query construction easier.
I don’t know for certain if it’ll hit trunk before Django 1.0, but it will not be a branch forever; they’ve put in a ton of work on that, and I’m looking forward to getting to use it.
Shabda: Once Django hits 1.0, would 0.91 be end-of-lifed, while support for 0.96 and 0.95 continues?
James: Like I said, the policy is we provide security fixes for the current release plus the previous two, so 0.91 would sort of fall off there after Django 1.0. But there are a lot of legacy installs out there that are perfectly happy for now, and I wouldn’t be surprised to see people unofficially continuing to submit patches and keep that maintained for as long as there are people who want to use 0.91.
Shabda: A little about you. You majored in philosophy. I can think of one other person who did, Paul Graham. Would you say that what you learnt from philosophy helps with programming?
James: Well, I wouldn’t say there’s anything specific necessarily. But I think there’s a big place for people with liberal-arts backgrounds to come to programming, and I think philosophy’s a good path to do that.
If you look at a typical philosophy program, you’re doing a lot of logic, a lot of critical analysis, a lot of abstract reasoning.
You have to get comfortable sooner or later with all sorts of formalisms that don’t necessarily have any practical meaning, and that’s very similar in a lot of ways to programming.
And when you get right down to it, as programmers, about 90% of what we’re paid to do is think: our job is to take a problem, analyze it, break it down into pieces and solve them.
And that’s not terribly different from what you spend four years doing in a philosophy program.
I’ve actually joked about that a bit with some of my former professors, that I still get to argue as much as when I was doing philosophy, but the programming pays a lot better.
I do think, though, that there’s a big need for that sort of thing; we don’t really teach critical thinking anymore, and while it’s a vital skill to have no matter what you do for a living, it’s absolutely crucial to programming. So if you can get a good liberal-arts background where you’ve been taught how to look at things and pick them apart and analyze them, you can definitely do well as a programmer. Though it’d also be a good idea to take at least a few elective math courses…
Shabda: You have been working on Ellington, how is Ellington CMS different from, say, a customized version of Drupal?
James: Well, they have some things in common: both are modular CMS-style products, both are meant to be extensible.
But Ellington is really targeted from the ground up at news operations. There’s all sorts of specialized stuff in there that’s really optimized for the way a newsroom works, most of it culled from our experience as a newspaper, and of working with other papers.
So where with Drupal you’d really have to do a lot of customization because nobody’s really done this kind of “news all the way through” version of Drupal, with Ellington it’s there out of the box, and you hit the ground running.
I think we also have a very unique position because we are a news company, we’ve got a bunch of newspapers, some periodicals, TV news, etc.
And we work with those folks every day, we see how they do stuff, we hear about it when they run into problems, and so we’re in a good spot to see just what a newsroom staff really needs out of their online platform.
Shabda: Wordpress is PHP’s killer app, arguably Basecamp is ROR’s. What would you say Django’s killer app is?
James: Honestly I don’t know right now; I think the thing that Django wins on is that there doesn’t have to be the One Big App that everybody uses, instead there’s this huge blooming ecosystem of applications.
In a way, it’s like asking what the “killer library” of the Python stdlib is; the killer feature is that you’ve got all that stuff available.
Though there are definitely some cool Django apps out there right now, and a lot more on the way. I think there’s a different mentality, though, because in general Python people seem to keep their heads down and just get stuff done. So it may be you don’t hear about some project until maybe they decide to do an open-space talk at PyCon or OSCON, and then you just get blown away.
There was a guy at PyCon who came up and showed me an app he’s been developing, and I won’t spoil it and give away what it is, but it made my jaw drop.
He’d taken something that’s a really common software niche that’s been dominated by these abominable products because it’s not really a sexy thing to be doing, and just absolutely nailed it. Guy’s probably gonna make millions.
But I don’t know if there is a general-purpose “killer app” for Django right now, and I’m not really sure I want there to be one; people can see that sort of thing and think “oh, that’s all it does”. They look at, say, Ellington, and think “oh, this is only good for a newspaper-style CMS”, or they see Rails and Basecamp and think “oh, this is no good for me, I’m not doing a Web 2.0 thing”. So in a way I’m kind of glad we don’t have a “killer app” hanging over us and pigeonholing Django.
Though I should definitely give a shout out to Review Board; of the public Django apps I’ve seen, it’s probably the coolest, and again takes something that’s not usually sexy in terms of software and really nails it.
Shabda: As you said, “Python people seem to keep their heads down and just get stuff done”. Do you think Django needs to do a better job of marketing itself? For example, I have pitched Django to a fair number of people, and I always have to start with “Django what?”, as compared to, say, ROR or PHP, which seem to have good brand recall.
James: Well, to some extent Django hasn’t done a whole lot of explicit marketing because we’re not at 1.0 yet, and I expect that after 1.0 it’ll both be a lot easier to do marketing and that there will be more of it going on.
But at the same time, Django’s doing pretty well as it is; Google’s doing their App Engine stuff with Django bundled, there are startups using it to build the next big thing, and it’s even been quietly sneaking its way into some huge corporations.
Plus it seems like every time I turn around there’s another book coming out.
I saw one yesterday at the bookstore downtown; I hadn’t heard about it until that moment, but I think that makes five or six books that’ll be out by the end of this year.
So I think that’ll help. Digital Web magazine did a feature article on Django just recently, and I did an article for Sitepoint a while back as well.
I like to view this as the phase where we build up momentum until eventually Django is an unstoppable juggernaut and everybody’s listening to gypsy jazz music ;).
Shabda: Everyone is pretty excited about your upcoming book. When can we have it in book stores? Would you give a brief overview of what’s in it?
James: The book will, I think (and hope) be shipping around the end of June.
It’s very much a hands-on introduction to Django: walking through building three applications, picking up progressively more advanced bits of Django as you go, and seeing some best practices in action.
So you start out with just simple stuff, using the contrib apps and learning the basics of getting Django running. Then you do some simple customizations of admin templates, then start building some models and views, then on into building full-on applications.
There’s not room to cover every single thing you can do with Django, but I think it handles a pretty good spectrum of techniques from basics up to some advanced things that let you poke around and really get a feel for how stuff works.
And of course, you get periodic bouts of me up on my soapbox yelling at people about how to write reusable applications, because that’s what I do.
Shabda: You talk a little about App Engine in Batteries sold separately. What effect do you see App Engine having on the Python hosting ecosystem, on Python web applications, and on Django?
James: I honestly don’t know what to expect from App Engine. It’s a very different kind of thing from what anybody outside Google is used to, and it’ll probably be a while before it’s really shaken out and we get an idea of the impact it’ll have. I fully expect that Google’s going to start supporting other languages, so there won’t be this effect where you’ll always have to use Python to use App Engine. And the way they’ve sandboxed it is going to make it feel weird to Python people, I think. And that’s on top of getting used to the fact that you’re not using an RDBMS; I’ve been watching the blog flamewars about that, and that seems to be the big takeaway.
If I had to make a prediction now, I’d say that App Engine will bring some people to Python, but probably not in hordes, and that its big long-term effect is going to be to point out to a lot of folks that they’re not really using the “R” in “RDBMS”, and so maybe it’s OK to think about their applications in a different way. And that’s not Python-specific at all.
Shabda: What were the focus areas of the newforms-admin branch? How far have they been achieved? About when will the newforms-admin branch be merged? How backwards incompatible is this going to be?
James: Well, there are a couple goals running parallel.
The first thing, obviously, is that with the oldforms package deprecated we’ve got to get stuff migrated to newforms, and the admin’s got to make that jump.
And because the admin does a lot of tricky and complex stuff with forms, it’s also been a fertile proving ground for designing some advanced newforms features that’ll find their way back into trunk and make it easier to do that tricky stuff on your own.
Another big thing is cleaning out the coupling issue where you declare a class inside a model to activate the admin interface; there’s no reason for that to happen, so instead there’s a class you’ll subclass and override stuff on to customize the behavior, and then you’ll register a model to have the admin interface you’ve set up that way.
Finally, there’s been a lot of stuff abstracted in newforms-admin; while I don’t think it’s really an official design goal, there’s been a lot of opportunity to provide hooks where people can go in and customize stuff. I looked at the code a couple months ago, and it was about 95% of the way to being able to run completely without django.contrib.auth, for example, because there are enough hooks where you can override something and replace defaults to make it use something else.
And that’s a huge deal, because it means you could do stuff like run the admin on HTTP auth, or run it in a corporate environment against an LDAP database, and you barely have to touch anything; you can already do an auth backend that handles most of the work, then you tweak a few things for the admin and you’re good to go with no hacking directly on Django code.
Plus, a lot of people will be happy to see that the methods you can go in and override all get the HttpRequest object as an argument, so even though it’s bad workflow a lot of the time you’ll be able to have the admin do stuff like automatically fill in a foreign key with the current user.
And along the way there’ve been huge numbers of bug fixes; there’s a lot of stuff that’s always just been really fragile in the admin, like edit_inline, and newforms-admin is a good opportunity to just yank out all the things that made them troublesome and do it right.
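The registration approach James describes (a separate options class that you subclass and register against a model, with hooks that receive the request) can be sketched in plain Python. This is not Django's code and does not need Django installed: `AdminSiteSketch`, `ModelAdminSketch`, `FakeRequest` and the simplified `save_model(request, obj)` signature are all hypothetical names chosen for illustration.

```python
class ModelAdminSketch:
    """Hypothetical stand-in for an admin options class you would subclass."""

    list_display = ()

    def save_model(self, request, obj):
        # Default behaviour: just save. Subclasses may make use of the request.
        obj.save()


class AdminSiteSketch:
    """Hypothetical registry mapping model classes to admin option objects."""

    def __init__(self):
        self._registry = {}

    def register(self, model, admin_class=ModelAdminSketch):
        if model in self._registry:
            raise ValueError("%s is already registered" % model.__name__)
        self._registry[model] = admin_class()

    def admin_for(self, model):
        return self._registry[model]


class Article:
    """A plain model class; note that it contains no admin declaration."""

    def __init__(self, title):
        self.title = title
        self.author = None
        self.saved = False

    def save(self):
        self.saved = True


class ArticleAdmin(ModelAdminSketch):
    list_display = ("title", "author")

    def save_model(self, request, obj):
        # The hook receives the request, so it can fill in the current user.
        if obj.author is None:
            obj.author = request.user
        super().save_model(request, obj)


class FakeRequest:
    """Minimal request object carrying only the logged-in user's name."""

    def __init__(self, user):
        self.user = user


site = AdminSiteSketch()
site.register(Article, ArticleAdmin)

article = Article("Queryset-refactor lands")
site.admin_for(Article).save_model(FakeRequest("shabda"), article)
```

The point of the design is that `Article` knows nothing about the admin: all customization lives in `ArticleAdmin`, and the coupling happens only at the `register` call.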
Shabda: Time for a meme. What is the one thing you absolutely love about Django, and one thing which Django should have done differently?
James: I love contrib.localflavor, and I want people to go look at it and adore it and use it. There’s so much useful stuff in there, you can validate checksums on Scandinavian social-security numbers, you can get lists of all the states in Brazil, it’s just this great gigantic resource for localizing your apps, and it kicks so much ass.
Something to do differently… I’d probably rewrite the tutorial to not do the apps-inside-projects thing. That trips a lot of people up because they come away thinking they have to do it that way, and so they really miss out on the benefits of being able to write an app once and just use it over and over and over.
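To give a flavour of the checksum validation James mentions for Scandinavian identity numbers: the Swedish personal identity number ends in a check digit computed with the Luhn algorithm over the preceding nine digits. The sketch below is a standalone illustration, not the code in contrib.localflavor; the function names are hypothetical, it only accepts the ten-digit form with an optional hyphen, and it does not check that the date part is a real date.

```python
def luhn_check_digit(digits):
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk from the rightmost digit, doubling every other digit
    # starting with the rightmost one.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 0:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return (10 - total % 10) % 10


def is_valid_swedish_id(number):
    """Check the final digit of a YYMMDD-NNNC number against Luhn."""
    cleaned = number.replace("-", "")
    if len(cleaned) != 10 or not cleaned.isdigit():
        return False
    return luhn_check_digit(cleaned[:9]) == int(cleaned[9])
```

For example, `is_valid_swedish_id("811218-9876")` accepts a commonly cited sample number, while changing the last digit makes it fail.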
Shabda: Before we leave, would you like to give a quick tip, or hard to find information about Django to our readers?
James: The best tip I can give is not to be afraid to look at code. We do our best to document all the useful stuff, but that’s a pretty huge area to cover, so there are always going to be neat little things hidden away that you either won’t see unless you read the code, or won’t see until some date in the far future when somebody gets time to document it.
Plus, if you find something cool you can write up a little tutorial and post it, and then people will start reading your blog. That’s pretty much what happened to me, where my blog was for a long time just me writing up these little articles to remind myself of stuff I’d learned.
And especially if you’re learning Python as you go, reading a significant piece of code like Django can really help to improve your understanding of the language. There’s really nothing better than that for assimilating the way a language works, and I think Django’s around the right size — it’s small enough you can carry a solid understanding of it around in your head, but it’s big enough that there’s lots of stuff, and lots of different kinds of stuff, that you can look at and learn from.
Shabda: Thanks a ton for this interview, and for sharing this useful information.
James: Sure.
This was James Bennett’s interview. I plan to interview a few more leaders from the Django community, so if you would like me to ask any questions, write them in the comments.
April 23rd, 2008 — django, interviews
Malcolm Tredinnick is a core contributor to Django, and was the driving force behind the queryset-refactor branch of Django, which adds important capabilities such as model inheritance. He has a long association with OSS, and has contributed significantly to GNOME and Django. He graciously agreed to be interviewed at the 42topics blog. Malcolm’s blog, Defying Classification, can be read here.
Shabda: Would you tell us a few things about yourself: how did you get involved with OSS and GNOME, and with Django?
Malcolm: Here are some recollections I’ve written about: 1 and 2
Short version: started using Linux in university as a poor undergrad; kept using it since then (started back in 1993).
Started using GNOME in 1999 (after trying a very early version of Qt and what became KDE), started contributing to GNOME about 12 months later (mid-2000).
Started using Django in October 2005, I guess (a few months after it was open sourced) and haven’t stopped. Started contributing patches more or less immediately and was given commit privileges about March of 2006, from memory.
Shabda: So what is one thing in Django which you absolutely love, and one thing which you think Django should have done differently?
Malcolm: I guess Django’s separation of responsibilities is something that has always been attractive to me. It’s very easy to keep the cross-application business logic and the persistent storage logic and the state control and the visual presentation separate.
I guess the piece that probably routinely annoys me the most is a slight inconsistency in the template tags: we often mix up how to use a literal string in template tags (sometimes it’s in quotes, sometimes not), which makes it very hard to later make that tag also take a template variable.
As somebody who handles bug reports a lot, something like that ends up taking up a lot of time in my life, which could probably be better spent elsewhere. Still, it’s almost impossible to change now, so that’s the way life goes.
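The quoting issue Malcolm describes is easier to see in code. If every tag treated a quoted token as a literal and a bare token as a variable name, one small resolver would serve all tags. The function below is a hypothetical sketch of that convention, not Django's template implementation, and the context is a plain dict.

```python
def resolve_argument(token, context):
    """Resolve one template tag argument.

    A token wrapped in matching quotes is a literal string; anything
    else is treated as a variable name and looked up in the context.
    """
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return token[1:-1]
    return context[token]


context = {"section": "interviews"}

literal_value = resolve_argument('"interviews"', context)   # quoted: literal
variable_value = resolve_argument("section", context)       # bare: variable
```

A tag that already accepts a bare, unquoted word as a literal cannot adopt this rule later without breaking existing templates, which is why the inconsistency is so hard to fix after the fact.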
Shabda: Queryset-refactor is almost done; what were the major overarching goals for this branch?
Malcolm: A number of things…
1. Clean things up internally, so that future extensions are easier. A lot of the existing (trunk) query construction grew from something much smaller. It all mostly worked, but it was getting pretty hard to manage. Some bugs were almost impossible to fix in the trunk form too, which was the real motivation.
2. Organise things so that I could add model inheritance, which was the main thing that started me looking deeply at all that. This has been accomplished.
3. Make it easier to add backends like Oracle and MS SQL Server that don’t support, say, limit and offset in SQL. That meant providing a more object-based query construction so that they can tweak things before creating the SQL string.
In the process, we’ve made it a lot easier for people to extend this for backends and to add functionality to existing backends.
For example, adding new lookup types is now possible (not particularly easy, but possible). And the geo-django branch can do their stuff without having to patch the query construction code any longer.
(Justin Bronn, the geo-django maintainer has been tracking qs-rf very closely, so we know that geo-django works nicely with qs-rf).
I guess they’re the main points. The first one — a better/different internal organisation — was the main one and was really the bulk of the work. Getting everything to mostly work was relatively easy. Getting it all to work perfectly took a lot longer.
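Malcolm's third goal, letting a backend tweak the query before the SQL string is built, can be illustrated with a small sketch. The classes below are hypothetical and are not Django's internals: the query is held as an object with limit marks, and a backend without LIMIT/OFFSET overrides only the step that renders SQL, here using a ROWNUM-style wrapper of the kind commonly used on Oracle.

```python
class QuerySketch:
    """Hypothetical query object: holds the parts, builds SQL only on demand."""

    def __init__(self, table):
        self.table = table
        self.low_mark = 0      # offset
        self.high_mark = None  # offset + limit, or None for "no limit"

    def set_limits(self, low=0, high=None):
        self.low_mark, self.high_mark = low, high

    def base_sql(self):
        return "SELECT * FROM %s" % self.table

    def as_sql(self):
        # Default rendering for backends that understand LIMIT and OFFSET.
        sql = self.base_sql()
        if self.high_mark is not None:
            sql += " LIMIT %d" % (self.high_mark - self.low_mark)
        if self.low_mark:
            sql += " OFFSET %d" % self.low_mark
        return sql


class RownumQuerySketch(QuerySketch):
    """Variant for a backend without LIMIT/OFFSET (ROWNUM-style wrapping)."""

    def as_sql(self):
        sql = self.base_sql()
        if self.high_mark is None and not self.low_mark:
            return sql
        inner = "SELECT t.*, ROWNUM AS rn FROM (%s) t" % sql
        if self.high_mark is not None:
            inner += " WHERE ROWNUM <= %d" % self.high_mark
        return "SELECT * FROM (%s) WHERE rn > %d" % (inner, self.low_mark)


standard = QuerySketch("blog_entry")
standard.set_limits(low=10, high=30)

wrapped = RownumQuerySketch("blog_entry")
wrapped.set_limits(low=10, high=30)
```

Both objects carry the same limits; only `as_sql()` differs. With string concatenation spread through the code, as in the older approach Malcolm describes, there would be no single place for a backend to step in.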
Shabda: So would this make, say, writing a backend for non-relational DBs like Bigtable easier? Michael Trier was working on a SQLAlchemy-based ORM for Django; how would this be affected by the qs-rf merge?
Malcolm: Well, that’s really two questions. So, one at a time…
Whether it would make a non-relational backend easier… yes. The QuerySet class passes all the work of talking to the backend, whatever it is, off to a class called Query. Writing a different sort of backend would require writing a new Query class that knew how to take the function calls QuerySet makes and turn them into the right return types.
Mostly, QuerySet does not poke at the internals of Query: it uses “public” methods only. This is deliberate. It means that writing something like a Query class for a different sort of backend (RDF store, or LDAP or whatever) might very well be possible.
As for something like SQLAlchemy, I gather it’s made things easier. I know Michael and Brian have been writing their code against the qs-rf code and it seems to be working.
However, I haven’t looked at their code, so I’m not sure if they’re using the QuerySet/Query split very much or not. I gather they’re doing it slightly differently from how I might have done, based on some things they mentioned when I was talking to them yesterday: apparently a recent qs-rf commit “broke” their approach to something and that was a little surprising to me. But it wasn’t a showstopper apparently, so I’m not too worried.
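The QuerySet/Query split Malcolm describes can be sketched with a toy example. Everything here is hypothetical (the class names, the method names `clone`, `add_filter` and `results`, and the sample rows); it only illustrates the idea that the front end calls public methods on a query object, so a non-SQL backend can be swapped in.

```python
class InMemoryQuery:
    """Hypothetical backend: answers queries from a list of dicts, not SQL."""

    def __init__(self, rows, filters=None):
        self.rows = rows
        self.filters = dict(filters or {})

    def clone(self):
        return InMemoryQuery(self.rows, self.filters)

    def add_filter(self, field, value):
        self.filters[field] = value

    def results(self):
        for row in self.rows:
            if all(row.get(f) == v for f, v in self.filters.items()):
                yield row


class QuerySetSketch:
    """Front end that only ever calls the query object's public methods."""

    def __init__(self, query):
        self.query = query

    def filter(self, **conditions):
        # Work on a copy so the original queryset stays unchanged.
        query = self.query.clone()
        for field, value in conditions.items():
            query.add_filter(field, value)
        return QuerySetSketch(query)

    def __iter__(self):
        return iter(self.query.results())

    def count(self):
        return sum(1 for _ in self.query.results())


rows = [
    {"title": "Release notes", "author": "james"},
    {"title": "Reusable apps", "author": "james"},
    {"title": "Model inheritance", "author": "malcolm"},
]
entries = QuerySetSketch(InMemoryQuery(rows))
by_james = entries.filter(author="james")
```

Replacing `InMemoryQuery` with a class that talks to an RDF store or LDAP directory would not require touching `QuerySetSketch`, as long as the new class offers the same three methods.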
Shabda: As you said, a commit sort of “broke” their approach. How backwards compatible is qs-rf? Is it mostly backwards compatible, with some corner cases breaking, or should anything which worked previously but does not work now be reported as a bug?
Malcolm: It’s almost 100% backwards compatible. The thing I “broke” (and I’m using quotes, because it wasn’t really a breakage, just a change) was a very internal thing. Obviously if you’re writing something like a new storage backend, you need to dive into the internal API of the ORM a bit and that’s what the django-sqlalchemy stuff is doing…
For normal user code, there should be very few changes required. There’s a list on the wiki page for the branch but they are mostly things that won’t affect normal code.
Only if you are doing something very “tricky” or slightly unsupported will changes be required.
Of course, some people will see slightly different results in complex querysets because we have fixed a lot of bugs (over 60 that were reported to Trac plus some that never got that far) and some of those fixes change the results that were returned.
So some people may be relying on currently incorrect behavior, but I doubt that’s going to be too troublesome.
Also, I’ve tried to make the error reporting a lot clearer so that when somebody does try to do something undefined, it should hopefully give an error message (e.g. introducing an infinite ordering loop in models).
Shabda: Speaking of Bigtable, Google’s App Engine seemed to be sort of a letdown from the initial hype. What are your opinions about the effect of App Engine on Django?
Malcolm: Well, I think the letdown came from people letting their expectations get ahead of their brains a bit. Google didn’t really seem to over-hype it or misrepresent it.
Obviously when Google releases something, everybody gets excited. But you have to at least take the time to look at what was released.
For a particular class of applications, I think App Engine is probably ideal. It gives cheap hosting, great reliability and access to some of Google’s storage stuff.
For example, if you have something that presents a read-only view onto a large bunch of data, App Engine would be very appropriate. Okay, you have to do data extraction from multiple models manually, since there’s no joining, but that’s not too hard to work around and people will write little helper functions for those cases.
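The "little helper functions" for joining by hand might look something like the sketch below. It uses plain dictionaries rather than the App Engine datastore API, and the helper name, field names and sample data are all hypothetical.

```python
def attach_related(rows, related_rows, foreign_key, related_key="id", attr="related"):
    """Return copies of rows with the matching related row attached.

    This does in application code what a SQL join would do in the database:
    index the related rows by key once, then look each one up.
    """
    index = {related[related_key]: related for related in related_rows}
    joined = []
    for row in rows:
        combined = dict(row)
        # Rows whose key has no match get None, like a LEFT JOIN.
        combined[attr] = index.get(row[foreign_key])
        joined.append(combined)
    return joined


authors = [{"id": 1, "name": "Malcolm"}, {"id": 2, "name": "James"}]
entries = [
    {"title": "Queryset-refactor", "author_id": 1},
    {"title": "Release management", "author_id": 2},
    {"title": "Orphaned entry", "author_id": 99},
]
entries_with_authors = attach_related(entries, authors, "author_id", attr="author")
```

Building the index first keeps the work proportional to the number of rows rather than comparing every entry with every author, which matters for the read-heavy sites Malcolm has in mind.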
Remember that a lot of very successful, very popular websites are basically read-only: a lot of newspaper sites, things like EveryBlock (and, formerly, Chicago Crime) — all those sorts of sites. Even news.google.com.
There’s a great deal of information that wants to be “presented” to people. Not everything requires form entry and comments and the like and even those are possible on App Engine if you need them.
So I’m kind of glad the initial reaction phase has settled down a bit and people are starting to think about what is possible, rather than focusing on what isn’t possible. Some interesting things will come out of App Engine, I suspect. We just don’t know what they are yet because people are still writing them.
As far as how it affects Django, that’s probably both good and bad.
If people view App Engine as being representative of everything Django, it’s not going to be too handy. However, I doubt that will be the prevailing opinion. People realise that Google have used portions of Django to implement portions of the App Engine experience. And those pieces of Django are quite handy. The URL dispatching, the general data flow, the templates — all will be nice for people to use.
It’s also getting a few people interested in the internals of Django as they try to work out whether they can move the extra bits that are ‘missing’ from App Engine into the fold.
In addition, we’ve also already received some direct positive benefits: some of the Google team, particularly Guido van Rossum, have been filing a few bugs against Django as they did the initial App Engine development. So there are a few bug fixes in things like newforms and the templating component that are a direct result of App Engine development inside Google.
Shabda: There is app-engine-django, which is trying to bridge the impedance mismatch between the Django ORM and the App Engine ORM. Is there any effort to bridge this from the Django side by making the Django ORM play nice with App Engine? If such a request comes in, what would Django’s response be?
Malcolm: shrug. I have no idea.
I think this is an ideal opportunity for some people to write stuff like django-appengine and work out what might be needed and come up with a concrete list of things that might be needed. Predicting the future is hard enough to be mostly futile and it’s not really worth speculating.
Right now, the core Django developers are fully focused on getting Django 1.0 released. That’s where our time is going. How other people spend their time is up to them and it’s definitely not for me to judge whether it’s worthwhile or not.
People need to experiment. It’s the way new software products are developed. So anybody who’s interested and working on this should be encouraged. That’s how open source development has always worked and I think it works quite well.
I guess I could hazard a guess that any intrusive changes just to support some App Engine style thing probably aren’t interesting prior to Django 1.0, since they’re not necessary to get that out of the door. But that’s as much of a prediction as I’m willing to make and anybody who actually attaches any value to that prediction is probably being foolish. I’m wrong more often than I’m right.
Shabda: Well this gets asked a lot on the mailing lists, qs-rf is almost done, Newforms admin should be done soon, so when can we expect Django 1.0, and what new goodies will this bring?
Malcolm: heh
Well, the real answer is “as soon as possible”.
Notice that this is a big step up from “when it’s ready”, which is the normal answer here. Hopefully people appreciate that the maintainers want to get 1.0 out the door as much as anybody else. It’s just that releasing 1.0 when it’s not at a point we’re happy with won’t be useful in the future.
More concretely, we have a few major things to do between now and 1.0 and they’re all getting pretty close.
Then there’s a period of stabilizing and random bug squashing and triage to make sure we haven’t missed anything big.
The idea (and driving goal) is that whatever we release as 1.0 should remain backwards-compatible on an API level until, say, 2.0. So we need to get all the big infrastructure changes in before 1.0.
That isn’t the same as saying 1.0 will be bug free or any pipe dream like that, because it won’t be. But fixing bugs is an ongoing job and as long as we can feel comfortable that we can fix existing bugs without having to make major breakages, we can feel happy about 1.0.
Shabda: This has happened to me a few times, I am trying to pitch Django to a client, and then I mention Django has not hit 1.0 and we would be developing their site against a subversion checkout, and boom, I am against a brick wall. I am guessing this happens to a lot of people. Any suggestions on overcoming client’s reluctance on this point.
Malcolm: Well, this is the hard problem, of course. There are at least three issues here.
Firstly, it’s well known that in a lot of corporate situations, they only want to go with “released” versions, whatever that means. The fact that released versions are often a disaster (e.g. Microsoft Vista) seems to be forgotten, but that’s the way things go. However, it shouldn’t be the ultimate driving force in this equation. It’s one consideration only and having the software released faster just to meet some immovable, often-times unsupportable policy isn’t the answer.
The second aspect is to realise that there are a lot of successful sites built on Django 0.96 and even Django 0.91. So it’s not impossible to stick with the last release.
Now, of course, that tends to show up the problem with the first point, since 0.96 has some bugs that have subsequently been fixed, so sticking to a release handicaps you there.
However, if we did more frequent releases, that adds a lot of burden to the release process. For example, distributions (Debian, Suse, Fedora, etc), now have a bigger problem in choosing the right version to ship since they also want some stability. Security releases either become harder (we have to patch more versions) or else we have to “end of life” earlier releases more frequently, which harms those people using those releases (again, it reduces stability for the userbase).
The third point is somewhat related to that: in order to make sure that a release can be relied upon, we sometimes need to make sure that we do all the necessary preliminary work so that we don’t immediately break any code that relies on this new release.
That is exactly the situation we’re in now. We released 0.96 as a sort of “stability point” for people to rely on. Almost immediately, we then started to introduce a bunch of necessary changes to things that will require code to be ported when people upgrade. When this round of churning has finished, we’ll make the next release. That happens to be 1.0 in this case.
If we released, say 0.97 at some point in the past few months, people would have a similar but slightly different sort of marketing problem: now they have 0.96 code that has to be ported if they want to move to 0.97. But it’s also just “more work” on some level, since they’ll have to do even more porting when 1.0 comes out. So we aren’t helping developers by providing them with a lot of stability.
It’s fairly well understood by most experienced developers that this is a tough path to walk. With open source software it’s even harder, since everybody is a volunteer and we have a largely unknown userbase to satisfy.
Sometimes that difference is forgotten by people who argue that “no company ever works like this”. A corporation can set release deadlines and in exchange for everybody having to work to those deadlines, those people get paid. It’s called a salary. Open Source software doesn’t work like that.
I’m sure that once Django reaches 1.0 or perhaps a little later, we’ll slip into time-based releases so that things go a bit more smoothly. At the moment that simply isn’t practical because of all the changes we need to make.
Every project goes through that phase. Before Linux had time-based releases, they had to get an amount of features in and a certain stability level reached. Ditto for GNOME (which only started doing time-based releases at 2.0 and there was a long gap between 1.4 and 2.0 precisely because we needed to be able to guarantee that at 2.0). Ditto for KDE and Ubuntu and …
So, yeah, it’s slightly tough times for everybody at the moment. Partly Django is a victim of its own success here: everybody wants to use it and we’re trying to keep up.
Hopefully people realise that ultimately that’s a good thing. It’s an investment in the future in the sense that we’ll still be here in a year, two years, five years… because of this great use.
Shabda: What are the future plans for Django. Django has most of the things which I want in a framework, (import soul), and we would hate to have Django suffer from featuritis. After 1.0 what are the areas Django wants to tackle?
Malcolm: Again, this is asking for a prediction and I don’t do those.
Partly because I don’t know and am not really in a position to say, in any case. (I’m just a contributor). Partly because having too many goals too far ahead of time is possibly going to restrict people.
Django runs on its contributors. A lot of the ideas that are implemented are initially suggested and often developed in quite some detail by people other than the core developers. It might not always seem that way, because you sometimes have to see the common thread in a bunch of requests before you notice the lurking feature requirement, but it’s true.
So what happens post 1.0 is going to depend a lot on how people use Django, on what people offer in the way of code and, particularly, what good ideas are suggested.
Who could have predicted Google AppEngine when Django 0.96 was released? What other things like that will appear as a result of Django 1.0? Who knows?
Of course, there are some things that we could probably say will be worked on (successfully or not is another question entirely). Multi-database support seems to be popular and one day somebody might do it; maybe there’ll be more front-end developments — e.g. somebody like you who keeps wanting more Ajax support will actually develop something that provides this support everybody seems to want but hasn’t ever specified.
We can sort of see a few things coming up again and again, so I guess they’ll be worked on. But it’s really going to be up to the people who write the code, who come up with the ideas, who write the websites that use Django, who try to teach Django to others. And right now, I have no idea what direction that will go in.
Shabda: Before we close, would you like to share a handy tip which you use a lot, but does not get used so much otherwise?
Malcolm: People possibly overlook both the {% debug %} template tag (best to use it as <pre>{% debug %}</pre> in a template) and the debug context processor django.core.context_processors.debug.
Both of those are very useful for trying to see what’s going on when you’re passing information between a view function and a template.
Shabda: Thanks Malcolm. It was great talking to you.
I plan to interview leaders from the Django community in the coming few weeks, so if you would like me to ask any specific questions, put them in the comments and I will ask them when I interview.
April 22nd, 2008 — google, search
Part 1: Webpage and the Umans
In a time not so long ago, and not so far off, existed a country called Theweb. It was a well populated place, populated with upstanding citizens who called themselves Webpage, and lived together in harmony.
Each Webpage knew some other Webpages in Theweb, and thought well of them. You could ask a Webpage if they knew another Webpage, and they always replied truthfully.
Next to Theweb existed another country called Realworld. Its citizens, called Umans started to trade with Theweb, and Webpages. When Umans wanted any information, they traded with Webpage which had the information. This trade of information for time worked well, but for one problem.
For Umans, finding the Webpage they wanted to trade with was hard, as there were too many Webpages. Asking each page if they had the information the Uman sought, or knew a Webpage who did, was difficult for the Umans.
Meanwhile some Umans had created a Deus ex machina to solve this problem. Called the G.O.Gle, it had knowledge of each Webpage from Theweb. Umans could ask the G.O.Gle which Webpage had the information they sought, and the G.O.Gle found the correct answers.
Yet the G.O.Gle needed to find which Webpage was more important. For this it devised a system, where it asked each Webpage about the other Webpages they knew about. The Webpage which was known by many Webpages was deemed to be important, and the G.O.Gle asked more Umans to trade with that Webpage.
This system seemed to work well, with trade between Umans and Webpages flowing freely. Yet when trade starts, can corruption be far behind? Unlike Theweb, Realworld had evil citizens as well. Called S.P.Ammers, they wanted to divert the trade to the Webpages they wanted. S.P.Ammers could hypnotise a Webpage into telling the G.O.Gle about pages they did not know about.
This baffled the G.O.Gle. Earlier, each Webpage told truthfully about the other Webpages it knew. So using the popularity of a Webpage, the G.O.Gle could find which Webpage was important. Now it could not be sure if a Webpage was telling the truth, or had been hypnotized by an S.P.Ammer. The G.O.Gle was stumped, its results started getting flawed, and the trade between Webpages and Umans was disrupted.
Part 2: Revenge of the G.O.Gle
After trying to find a solution to this problem for a long time, the G.O.Gle had a brainwave. “Why not solve this problem at the source? Let the Webpages deal with this problem.” So the G.O.Gle, who now had a huge say in the trade, asked Webpages to deal with other Webpages in a different way. Earlier, when asked if a Webpage knew another Webpage, they could have said “Yes, I know her” or “No, I do not know her”. Now when a Webpage knew another Webpage, but was not sure if it was the work of an S.P.Ammer, it could also say “Yes, I know her. NOFOLLOW.”
Everyone was amazed by the great wisdom shown by the G.O.Gle, and praised this solution.
Part 3: Collateral damage
A few years had passed, and the work of S.P.Ammers was still relentless. They were still busy hypnotizing Webpage, to suit the trade to their liking. What everyone seemed to have forgotten in the affair of the Nofollow, was that there were other methods of trade apart from going via G.O.Gle. And for S.P.Ammers hypnotizing the Webpages did not take any work. Given enough volumes, they could still mould the Trade to their liking.
And also by now Webpages had become very wary of other Webpages. If they were not absolutely sure about the other Webpage, they were saying, “Yes, I know her. NOFOLLOW.”
The whole of Theweb was built on the premise of one Webpage knowing many other Webpages. They even had a name for this relationship, Hyperlinks.
Influential citizens like Ms. W.I.Kipedia refused to know anyone. Even if she knew a Webpage, and being such an upstanding citizen she knew all the nice people, she said “Yes, I know her. NOFOLLOW.”
For the G.O.Gle also this proved to be a can of worms. The basic information it used to find which Webpages were important was skewed, and so the results from the G.O.Gle were skewed as well.
Part 4: The Good news
Unfortunately there is no good news. A few years have passed, and S.P.Ammers have been winning. More people are falling back behind the wall of blanket Nofollow, without regard for whether this will deter spammers. There is less data available for the G.O.Gle to find important Webpages, which means that unimportant pages are considered important. Trade between Theweb and Realworld is happening, but spammers can turn the flow of trade to their will.
Moral of this Parable
There is no parable without a moral, is there? Nofollow, of course, has not been as successful as first promised. Sites with clean link sources like Wikipedia label all external links as nofollow. This means that useful webpages which Wikipedia links to (and which are hence important and useful) are ranked lower than a page which a spammer creates and which is linked from a link farm.
So what is the solution? As argued in The Tragedy of the Commons, some social problems have no purely technical solution. This is one of those problems. A much better solution is to use something like the Akismet API to check if a link looks like spam, and let the user handle the few false negatives which happen.
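As a rough sketch of that idea, the function below adds rel="nofollow" only to links that a spam check flags, instead of to every link. The looks_like_spam callable and the toy rule behind it are assumptions made up for this example; a real site would plug in a call to a spam checking service there.

```python
from html import escape


def render_link(url, text, looks_like_spam):
    """Return an anchor tag, adding rel="nofollow" only for suspect links.

    `looks_like_spam` is a hypothetical callable standing in for a spam
    checking service; it takes the URL and returns True or False.
    """
    rel = ' rel="nofollow"' if looks_like_spam(url) else ""
    return '<a href="%s"%s>%s</a>' % (escape(url, quote=True), rel, escape(text))


def toy_checker(url):
    # Placeholder rule for the example only; a real check would call a service.
    return "cheap-pills" in url
```

Links that pass the check keep passing their vote on, and the site owner only has to clean up the few spam links that slip through.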
April 22nd, 2008 — personal, rambling, startup
In the glorious tradition of the internet, where we generalize from way too little data, I am going to tell you exactly why people start startups. Of course, I have no experience, on why people make this trade of security for adventure, apart from my own, and a few other very early stage startups. So yes, this is a personal story. This is why I, and everyone I know started a startup.
“If you want to build a ship, don’t drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea.”
Paul Graham’s influential essay, Why to not not start a startup, explains why starting a startup is a rational and logically sound decision. And of course, Marc Andreessen tells the reasons for not doing a startup. As does Matt Marron.
Today it is exactly three months since I left my job. When I started, I did not know what I wanted to build. After three months of finding a co-founder, losing a co-founder, and talking to a number of very early stage startups, I can tell you why everyone who starts a startup does so for exactly one reason. Without further ado, that reason is:
You would rather be doing this than any other thing, and unless you do this you will be depressed, sad, possibly suicidal until you do this.
Everyone tries to minimize the risks and uncertainties in their life. We would rather take a 90% chance of having a million dollars than, say, a 2% chance of 100 million dollars, even though the probability adjusted value of 2% * 100mil is larger. So even if logical analysis of probabilities tells you that a startup is a good decision, unless you can ask “Would I rather be doing this, no matter what the odds”, and answer YES, chances are you are not going to start a startup.
The reasons for “Would I rather be doing this, no matter what the odds” are different for everyone. Maybe you want to see a change in the world, and you do not see anyone else doing this. Maybe you consider yourself the most persistent, the hardest working, or the smartest person, and wonder: if Bill Gates can make 53 billion dollars, why can’t you? If you “would rather be doing this, no matter the odds”, you are going to start a startup someday, whether you want to or not. No amount of persuasion or dissuasion can change that. You can run but you can not hide.
Why I would rather be doing this ….
India has a few great Software services companies, but not one great Software Product company. I want to start a world class product company in India. This is a small start, but then “The First 20 Million Dollars Is Always the Hardest”.
April 21st, 2008 — google, marketing
Widely respected for their engineering talent and amazing hacker culture, Google is never credited for the amazing marketing they do. When we look for the companies which have done a great job at marketing/branding, we think of Guinness, Apple, or Ikea. Today I want to talk about the marketing lessons we can learn from Google.
It is all about the ROI.
For Barcamp Hyderabad, Google let us use its offices. They generally let you use their premises for hacker events. Let us calculate the ROI for Google for allowing this event.
This event was on Saturday, so cost of distraction to normal operations was minimal. There is some cost incurred in lighting and electricity, and Google’s famed free snacks. For this the 300 odd hackers, in this event, get to see Google’s campus/infrastructure/people first hand. In many companies in India, the cost to get an employee is in the ballpark of INR 50000. If even ten people from this group decide to sometime join Google, the ROI for Google is made up, many, many times over.
This is such a no-brainer, I always wonder why every company does not go all out to ask people to organize such events in their campuses.
Owning a word in the prospect’s mind is the best way you can market yourself.
“The most powerful concept in marketing is owning a word in the prospect’s mind” - So says Al Ries.
Google does not own the word search, at least not yet. But they own the word PAGERANK in the prospect’s (webmaster’s) mind. Now pagerank is a grossly overhyped concept, and is just one of the 200+ factors even Google considers when ranking a page. Yet as this is a simple and easy to understand concept, countless webmasters have spent countless hours agonizing over their pagerank. Using pagerank, Google is able to push their toolbar (people install the toolbar so that they can see the page rank of each web page), which helps them get more data and further refine their search.
You should always be pushing your products.
In his book The High Performance Entrepreneur, Subroto Bagchi tells the story of attending a high profile keynote address by Bill Gates. Bagchi sat expecting Gates to lay out the direction IT was going to take, in a speech befitting a visionary. Instead, Gates gave a sales pitch about how great Windows XP was. “If you are not proud of your product, and sell it at every opportunity, who will?” infers Bagchi.
Google mixes new products it launches, or wants to give a boost, very strongly in its organic search results. A search for video shows embedded Youtube links in organic search results. Most searches link to Google-groups, and Google videos. For a long time Google pushed Booksearch with web searches. Not that it is bad, but you must always be pushing your product, nepotistically.
If you can’t be first in a category, set up a new category you can be first in.
This is again from The 22 Immutable Laws of Marketing. Google can not beat Microsoft in desktop office suites, so they start a web based office suite. This seems so obvious, and yet Microsoft and Yahoo are trying to better search by playing by Google’s rules. They might create better algorithms (and I believe that Yahoo’s algorithm is at least as good as Google’s), but more data beats better algorithms, and Google has way more data than Yahoo or Microsoft have. Social search has so much potential, and yet all we see are half baked products from Wikia.
You can get away with murder, you just need to position yourself right.
What would you call a software which logged every search you did, sites you visited, times you stayed on them? If not from Google, spyware, if from Google, toolbar.
Of course, you can classify this as an example of permission marketing: by showing you the page rank of all sites and giving easy access to Google sites, they get permission to track you and data mine you.
April 18th, 2008 — Uncategorized
I posted five things I hate about Django, so as a penance, I will of course have to tell the “Five things I love about Django”.
The Admin interface rocks:
I have demoed Django to a fair number of People, and when you write a few lines in models.py, and then show the auto generated Admin interface, this is a jaw-dropping moment. Happened with me every time I introduced Django to someone. In Barcamp Hyderabad 05, people could not believe this was so easy, and asked if there was some more code behind this.
Of course Admin is much more useful than as a show off tool. When I am developing a new website I write the views to query the DB first, at that time the Admin is indispensable. And in many cases, just writing the Models, and tweaking the Admin is enough for me.
Documentation is comprehensive, available, and always maintained:
When I was looking around to learn a Python framework, after searching I had to choose between Django and Turbogears. Now, after having used Django a lot, and Turbogears for comparison, I believe Django to be better. But I could not have made this decision when I was just starting out. I chose Django because it seemed better documented, with tutorials more freely available.
Having the whole framework documented, from the most basic to the more esoteric, is a huge time saver.
The community is extremely helpful:
Whenever I have asked a question in django-users or on IRC #django, I have always got helpful responses, and a lot of help. Few communities have this culture, where people with so much experience are willing to help out people who are just starting out.
There is a reusable app for almost everything.
Need to handle hierarchical data as relational data, django-mptt to the rescue. Need to handle voting, tagging, or registration. Well we got you covered.
Easy things are easy, and hard things are possible:
You just want to do a SELECT * FROM ... WHERE ..., just use Model.objects.filter. Ok, so your queries span a number of relationships, and you want to reduce the number of queries: use select_related. Or maybe you want to model a relationship which the Django ORM can not; maybe a little SQL in .extra will help you? Or maybe you have a lot of GROUP BY or UNION ALL: drop to raw SQL in connection.cursor. Depending upon your requirement, you moved from the trivially easy to the somewhat hard.
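To give a feel for the last rung of that ladder, here is a minimal sketch of a GROUP BY query run through a DB-API cursor. It uses the standard library sqlite3 module as a stand-in so that it runs on its own; the assumption is that the cursor you get from Django's connection.cursor() is used with the same execute and fetchall calls. The vote table and its columns are made up for the example.

```python
import sqlite3

# Stand-in database: plain sqlite3 is used here only to show the
# execute/fetchall call pattern of a DB-API cursor.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE vote (story TEXT, score INTEGER)")  # hypothetical table
cur.executemany(
    "INSERT INTO vote (story, score) VALUES (?, ?)",
    [("a", 1), ("a", 1), ("b", 1), ("b", -1), ("b", 1)],
)


def totals_by_story(cursor):
    """Return (story, vote_count, total_score) rows using GROUP BY."""
    cursor.execute(
        "SELECT story, COUNT(*), SUM(score) FROM vote "
        "GROUP BY story ORDER BY story"
    )
    return cursor.fetchall()


rows = totals_by_story(cur)
```

Each row comes back as a plain tuple, so at this level you give up model instances in exchange for full control over the SQL.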
Similarly with templates, you just want to substitute some variables. Use {% for ... %} and {{ ... }}. Or you want to do something difficult/custom in template, (and are sure that template is the right place to do this), just write a template tag.
Bonus
The development web server: When you are developing, not worrying about Apache, and any configuration, is a huge help. And the fact that pdb plays nice with web server is awesome, (compared to GAE servers which give a BdbQuit exception).
April 18th, 2008 — django, python
The “five things I hate about *” meme seems to have died down, and memes should not be allowed to die.
Of course I love Django, and have bet very heavily on it. But we do not know a topic until we know its warts, so here you go. The listing is in no particular order, so sorry, no numbering.
Ajax with Django is hard:
Most of the Django community has decided that bundling Javascript helpers with a Python framework is a bad idea. Though I can understand the reasoning (argued here and rebuttal) that Javascript is so basic that you can not be expected to not know it, I can not agree with it. SQL is as basic as Javascript, and yet we have an ORM for abstracting away the common and the tedious.
Of course, with simplejson and a good Javascript library, you can build Ajax apps fast and with only a minimal amount of fuss. And yet switching between Python and Javascript twice every hour is a huge time drain. E.g. I put commas after the last element in Python lists; in JS this would work in FF, but fail with weird errors in IE.
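One way to sidestep the trailing comma problem is to never hand write the Javascript literal and let the serializer produce it. The sketch below uses the standard library json module, which offers the same dumps call as simplejson; the tags data is made up for the example.

```python
import json

# A trailing comma is fine in a Python list literal...
tags = ["django", "python", "ajax",]

# ...and the serializer never emits one, so the payload can be handed
# to the browser without tripping over the IE behaviour described above.
payload = json.dumps({"tags": tags})
```

The view then returns payload as the response body, and the Javascript side only ever sees well formed JSON.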
If you get the same row from the DB twice using Model.objects.get, you will get two different objects. Apart from the performance problems of two DB queries, when only one should have done, when you update one of them, the other does not get updated, and you will have interesting things happening in your application. And if you update both of them, you might write two inconsistent changes to the DB.
Look at this code for example.
In [2]: from django.contrib.auth.models import User
In [3]: usr1 = User.objects.create_user('ram', 'demo@demo.com', 'demo')
In [4]: usr2 = User.objects.get(username='ram')
In [5]: usr3 = User.objects.get(username='ram')
In [7]: usr2 == usr3
Out[7]: True
In [8]: usr3.username = 'not_ram'
In [9]: usr3.save()
In [10]: usr2.username
Out[10]: u'ram'
In [12]: usr3.username
Out[12]: 'not_ram'
In [13]: usr2 == usr3
Out[13]: True
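What is missing here is usually called an identity map: a cache that hands back the same Python object every time the same row is requested. The sketch below shows the idea outside Django, with sqlite3 from the standard library standing in for the database. The IdentityMap and User classes are hypothetical and exist only for this illustration; this is not how Django's ORM works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO user (id, username) VALUES (1, 'ram')")


class User:
    def __init__(self, pk, username):
        self.pk = pk
        self.username = username


class IdentityMap:
    """Hypothetical cache that returns one Python object per primary key."""

    def __init__(self, connection):
        self.connection = connection
        self._loaded = {}

    def get(self, pk):
        # Reuse the object if this row was already loaded.
        if pk in self._loaded:
            return self._loaded[pk]
        row = self.connection.execute(
            "SELECT id, username FROM user WHERE id = ?", (pk,)
        ).fetchone()
        if row is None:
            raise LookupError("no user with id %r" % (pk,))
        user = User(*row)
        self._loaded[pk] = user
        return user


users = IdentityMap(conn)
usr2 = users.get(1)
usr3 = users.get(1)
usr3.username = "not_ram"
```

With the map in place the second get does not hit the database, and usr2 and usr3 are the same object, so a change made through one name is visible through the other.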
Whether sessions are browser length/persistent is set sitewide:
You can set whether you want sessions to be browser length/persistent using SESSION_EXPIRE_AT_BROWSER_CLOSE in settings.py. But you can not set this per user, without mucking with Django internals. This might seem a minor annoyance, yet this is something which you need to do for every app, as the “remember me” function will not work without it.
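Django releases that came after this post give the session object a set_expiry method, which makes a per user choice possible. The sketch below assumes such a release. The apply_remember_me helper, the two week lifetime and the FakeSession stand-in are all made up for this example; in a real view you would pass request.session.

```python
TWO_WEEKS = 14 * 24 * 60 * 60  # seconds; an arbitrary choice for this sketch


def apply_remember_me(session, remember):
    """Pick the session lifetime for one user.

    `session` is expected to behave like Django's request.session.
    """
    if remember:
        # Integer: expire after this many seconds of inactivity.
        session.set_expiry(TWO_WEEKS)
    else:
        # Zero: expire when the user's browser is closed.
        session.set_expiry(0)


class FakeSession:
    """Stand-in used only to exercise the helper outside Django."""

    def __init__(self):
        self.expiry = None

    def set_expiry(self, value):
        self.expiry = value
```

The login view would call apply_remember_me(request.session, remember) right after authenticating, with remember taken from the checkbox on the login form.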
Newforms is very limited:
Let us say you want the Form to contain a variable number of fields. How can you define the newforms class to do your bidding?
from django import newforms as forms
class MyForm(forms.Form):
    foo = forms.CharField()
    bar = forms.CharField()
This can only create a form with a fixed number of fields. While there are ways to generate forms with a variable number of fields (generate the Form class programmatically), they are not easy or well documented. (Remind me to write such a tutorial sometime.)
Bonus question: How can you generate a form with same form elements multiple (and variable number) times, ala what happens with edit_inline?
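For the first problem, generating the class programmatically comes down to Python's three argument type() call. The sketch below shows the mechanism with stand-in Form and CharField classes so that it runs without Django; they are not Django's classes, and make_form_class and the field_N naming are made up for this example. The expectation is that passing the real form base class and real field instances to type() would work the same way.

```python
class CharField:
    """Stand-in for a form field; not Django's class."""

    def __init__(self, label=None):
        self.label = label


class Form:
    """Stand-in base class that can list its field attributes by name."""

    @classmethod
    def field_names(cls):
        return sorted(
            name for name, value in vars(cls).items()
            if isinstance(value, CharField)
        )


def make_form_class(count):
    """Build a Form subclass with `count` CharFields named field_0, field_1, ..."""
    attrs = {"field_%d" % i: CharField(label="Field %d" % i) for i in range(count)}
    # type(name, bases, attrs) creates a class at runtime.
    return type("DynamicForm", (Form,), attrs)


ThreeFieldForm = make_form_class(3)
```

The view decides the count at request time, builds the class, and then instantiates it like any hand written form.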
Settings mixes application configuration which should be public and passwords, which should be private:
If I am distributing an app, MIDDLEWARE_CLASSES is something which I would assume users would not (generally) modify. Similarly, in most cases, INSTALLED_APPS would also be something which users would not change (unless you are distributing standalone apps). This means I want to source control settings.py. But settings.py also contains my DB settings and SECRET_KEY, which means I cannot source control settings.py.
And while we are at it, can we refactor settings.py, so it works without:
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
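For the split between public and private settings, one workaround is to keep settings.py in source control and read the private values from the environment at startup. This is only a sketch of that approach: the read_private_settings helper and the MYSITE_ variable names are made up for this example.

```python
import os


def read_private_settings(environ=os.environ):
    """Pull the private values out of the environment.

    The variable names are hypothetical; pick whatever your deployment uses.
    """
    return {
        "SECRET_KEY": environ.get("MYSITE_SECRET_KEY", ""),
        "DATABASE_PASSWORD": environ.get("MYSITE_DB_PASSWORD", ""),
    }


# Public configuration: safe to commit.
INSTALLED_APPS = ("django.contrib.auth", "django.contrib.contenttypes")

# Private configuration: supplied at deploy time, never committed.
_private = read_private_settings()
SECRET_KEY = _private["SECRET_KEY"]
DATABASE_PASSWORD = _private["DATABASE_PASSWORD"]
```

A variation on the same idea is to import the private values from a second module that is left out of source control.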
Bonus:
Two things which used to bug me but no more.
1. You cannot extend Models - Well, now you can if you use queryset-refactor, or soon can if you are on trunk.
2. Url configuration using regexes. - The “now they have two problems” joke notwithstanding, mapping URLs to views is one problem where regexes fit beautifully. With less than 50 lines of code, you can manage a large number of views and Url patterns.
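To see why regexes fit the problem, here is a toy dispatcher written with nothing but the re module. It is a sketch of the idea, not Django's URL resolver; the archive and entry views, the URL shapes and the resolve function are made up for this example.

```python
import re


def archive(year):
    return "archive for %s" % year


def entry(year, slug):
    return "entry %s from %s" % (slug, year)


# (pattern, view) pairs, checked in order; names and URLs are hypothetical.
URLPATTERNS = [
    (re.compile(r"^blog/(?P<year>\d{4})/$"), archive),
    (re.compile(r"^blog/(?P<year>\d{4})/(?P<slug>[\w-]+)/$"), entry),
]


def resolve(path):
    """Call the first view whose pattern matches, passing named groups."""
    for pattern, view in URLPATTERNS:
        match = pattern.match(path)
        if match:
            return view(**match.groupdict())
    raise LookupError("no pattern matches %r" % path)
```

Named groups turn straight into keyword arguments, which is why adding a view costs one extra line in the pattern list.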
Now stay tuned for Five things I love about Django
April 17th, 2008 — appengine, django, python
I wrote a new tutorial on building a search engine using Appengine, and Yahoo Search API here. This uses pure Appengine API, and not Django, and is a tutorial on how to use Appengine without Django.
April 12th, 2008 — python, satire, startup
With launch of Google Appengine, there has never been a better time to start a startup. Let not the lack of a business plan or a pitch hold you back. Go to our web 2.0 startup pitch generator, and get your own, custom, startup pitch. Hurry only 24192 available.
The original source for this was written by Nathan and was in Perl. Of course we needed a web2.0 logo for such a marvelous piece of code. This comes from web2.0 logo generator.
The source for this is available here