Entries from May 2008 ↓

An Interview with Jacob Kaplan-Moss - Creator of Django

Jacob Kaplan-Moss is the co-creator of Django along with Adrian Holovaty, as well as the author of the Django Book. He has been involved with Django since before it was called Django. He is currently employed at Whiskey Media where his job is hacking at Django. He blogs on Jacobian.org. He graciously agreed to be interviewed at the 42topics blog.


Shabda: Would you tell us a little about yourself? How did you get started with Django? What other software/applications have you worked with (both OSS and otherwise)?

Jacob: So a bit about me: I grew up in Silicon Valley, so like many other geeks I got started with computers really early; I was programming professionally before I graduated high school. I didn’t start out doing web development; in college I worked on video surveillance systems for airports, harbors, marinas, and highways. That’s where I found Python: I rewrote a Java-based camera controller in Python and haven’t looked back since. I started doing web development pretty seriously when I moved to New York and took a job for a design firm there. The job was pretty terrible, and the technologies were worse: the good sites were PHP, and there was a bunch of WebLogic crap that was absolute hell to maintain.
So in 2004 I saw the job opening that Simon and Adrian posted on their blogs — and jumped on it. Web development in Python seemed like a dream job, and it really was. So that’s how I ended up in Kansas working for the local paper. At that point I suppose I was using Django, though we didn’t have a name for it yet: we just called it “The CMS”.
I’d been getting more into Open Source all along, though I’d barely contributed (I think I got a single line into Python at one point years ago — that was my largest contribution).

Shabda: What areas of Django have you worked most in? What are your current areas of focus?

Jacob: Well, at this point I’ve touched pretty much every part of Django, whether in the form of improvements, refactoring, or just small patches and bug fixes. I’m far from an expert in any particular area, though, so I’d say my main role is more holistic: I’m most concerned with making sure that Django “feels” correct and that APIs and conventions match across the framework.
Right now my current focus is documentation: I’ve been working on improving the structure and organization of the documentation so that it’s easier to find what you’re looking for. We’ve got something like 40,000 lines of documentation, so organization and metadata are critical.
Now that I’m lucky enough to get to work on Django at work, I’ll probably be able to take on some more big tasks like that in the future.

Shabda: One of the most loved things about Django is its comprehensive documentation. What is the motivation behind refactoring this? What is going to be the new organization? How is this progressing?

Jacob: Thanks — I’ve always thought that Django’s documentation set us apart from most Open Source projects, and I’ve been really proud of what we’ve done there. However, over the past year or so as the size of the documentation (and Django itself) grew I think we’ve slipped from “outstanding” to merely “above average.” To us (the core maintainers and I) that’s not acceptable.
The best place to read up on the project and goals is in a post I made to Django-dev a couple of months ago. In a nutshell, I’m breaking up the documentation into smaller, more manageable chunks — the current DB API is almost sixty printed pages! I’m also separating more high-level how-to’s and topical guides from the detailed API references that get in the way when you’re just learning.
There’s some tool improvements going on under the hood that’ll make the documentation easier to write, edit, maintain, and contribute to; hopefully that’ll help decrease the number of undocumented features.
The work’s pretty close to done, actually. I was trying to finish before I went on vacation a month ago, but didn’t quite get there. So that means I need to roll in a month’s worth of documentation improvements that happened to the current docs while I was gone, but that shouldn’t take much time. You can expect to see the new docs rolled out online very soon.

Shabda: What does Whiskey Media, the startup you are currently associated with, do? Is your role there developing Django, or are you involved in other day-to-day activities as well?

Jacob: I think I’ll have to be a bit coy and not say very much about Whiskey Media; we’d rather let our actions speak for themselves than try to build some sort of artificial hype. I’ll just say that we’re trying to solve some of the big problems in web publishing; you can probably see why Django’s the best technical bet for a company trying to be on the cutting edge of content publishing.
Most of my time at Whiskey is devoted to working on Django — coding, but also community management, evangelism, and organization. I also have some internal responsibilities, of course, but those are more nebulous: mostly I just help out wherever I’m needed.

Shabda: You said, “we’d rather let our actions speak for themselves than try to build some sort of artificial hype”. Do you see Django taking a potential hit in marketing because the Django community holds a similar belief? Does Django need to market itself differently than it has until now? What can the Django community do about this?

Jacob: Huh, that’s a tricky one. Well, let me break down the question a bit and look at a few different angles.
So first, I’ve always been a bit uncomfortable with the idea — pervasive in Open Source — that this is some sort of popularity contest. People are always comparing traffic figures, or numbers of job postings, or numbers of Google results, or whatever. I think the idea’s supposed to be that if Project Foo has more users than Project Bar that Foo somehow “wins”. But I don’t see it that way at all.
I mean, I moved from New York City to Lawrence, KS. New York City has a population of, what, 8 million or so? Lawrence has a population of around 80,000. Does that mean New York City is “better”? To most of those 8 million, sure… but to me? Hell no.
As in most areas, there’s no accounting for taste: one of my good friends here in Lawrence is a die-hard Perl fan, and though it makes him a bit twisted it doesn’t mean that I’m somehow “better” than him — we just have different needs and different ways of thinking.
All that said, though, there is undeniably a value in popularity.
Only a tiny number of people who use a given piece of Open Source software become involved in the community, and only a tiny fraction of those contribute back to the project, and only a tiny fraction of those contributors become long-term committers. So more users translates directly to more contributors, and more contributors brings more “value” to the project.
(By “value” I’m referring to code, good ideas, etc.)
So as you can probably tell this is something we struggle with. On one hand I think Django’s great, and it’s in the project’s best interest to persuade others to give it a shot. But on the other hand there’s a huge amount of bullshitting that goes on where software is concerned, and I try my best to not add to the pile.
It’s hard to tell when you’ve crossed the line from evangelism to hype, and hype can be toxic. Witness the recent backlash against Ruby on Rails: do you think they’d get such violent vitriol if they’d been more modest in their promotion?

Shabda: Talking of ROR, a lot of people these days have to choose between Django and ROR. What questions should they ask themselves to make that choice? (Well, apart from “do you know Python better or Ruby better?”) To make this more interesting: when would YOU choose ROR over Django, if you knew both Ruby and Python equally well?

Jacob: So first let me back up a step — I’ll answer directly in a moment — and make a quick point about web development in general. The past couple-three years have seen a radical improvement in the quality of tools available to web developers. It really wasn’t that long ago that CGI represented the state-of-the-art in web development.
Today, though, there’s really some fantastic tools available. Rails, Django, Symfony, Pylons, TurboGears, Seaside… any of these tools represent a major improvement over the CGI/PHP model of web development. If you’re still writing web sites the way you did five years ago, you’re missing out.
When it comes to which of these tools to choose, of course, we’re back to that question of taste. Each tool comes from a different world, and has a different “attitude” towards web development. The cool part, though, is that most new web development tools emphasize a quick start as a key feature: you could probably evaluate a dozen web development frameworks in just a couple of days. So my best advice is to try a few and see what “clicks”.
It’s probably important to also pay attention to the language that the framework uses. One of the real pleasures of writing Django apps is that you get to take advantage of the awesome Python community. I’ve got a set of libraries in Python that I can’t imagine developing web apps without; many of those libraries don’t have analogues in any other environment.
So if I knew Python and Ruby equally well — I don’t — I’d still probably lean towards Python, and towards Django. However, there’s one place I can think of where Rails is far superior: Rails runs on the JVM. This is a big deal: there’s any number of large corporate environments where the JVM is the only game in town. And obviously if I had to choose between Java and Ruby I’d choose Ruby!
I’ll mention, though, that Jython (Python on the JVM) is improving by leaps and bounds, and that getting Django working perfectly on the JVM is one of the Google Summer of Code projects the Jython team is sponsoring.

Shabda: Coming back to the marketing question: for most non-hackers, choosing which framework to use is a big question, and the decision they make depends on various non-tech factors, such as the availability of capable, skilled people. You mentioned that you would choose Django over Rails due to the awesome Python libraries. For the long-term survival and growth of Django, do you not think that an early capture of mindshare in the developer community is important? For example, I have pitched Django to a fair number of people, and I always have to start with “Django, what?” as compared to “Yeah, we are evaluating Rails for our requirements.”

Jacob: Sure, there’s definitely a value of having Django be familiar to your friendly local Pointy Haired Boss, so in that context a certain amount of advertising is important. There’s nothing worse than being forced by management to use the wrong tool for the job. Keep in mind, though, that this cuts both ways: I’d feel pretty unhappy if there was a team using Django because someone read about it in CTO Monthly. I’d agree that we could be doing more in terms of “brand awareness” or something, but all in all I’m pretty happy with the size and quality of Django’s community.

Shabda: There is not much information about the early days of Django, so a little about that. How did the move at World Online from PHP to Python happen? Why did you create a new framework, instead of reusing something like Zope?

Jacob: I wasn’t yet at World Online when we moved from PHP to Python, but from what I understand it was a pretty typical change. Adrian and Simon got fed up with the pain and suffering wrought by PHP, and wanted something cleaner and — most importantly — something that would be easy to maintain. Python really shines here.
The main reason we ended up building our own framework was that we didn’t know we were building a framework. We just wanted to “build cool shit” and, over time, we built tools to help us do that. It wasn’t until we started showing it off that we realized we had something that could be used by other people.
This, by the way, is one of the reasons I think Django turned out so great. If you sit down one day and say, “I’m going to develop a framework!” you’re almost certainly going to become an Architecture Astronaut, and if you ever actually finish, the thing’ll be so over-designed nobody will want to use it. If, on the other hand, you simply try to solve real-world problems in a clean, obvious way, you’ll eventually end up with a great tool.
Look at Rails, TurboGears, even PHP; they all started as simple libraries written by frustrated programmers just trying to get the job done.

Shabda: What are the overarching goals of Django? Again, to make this more interesting, here is a quote from you: “My work as a core developer of Django focuses on giving anyone — even (especially) non-programmers — the tools to create dynamic, content-driven websites.” Should that not be the job of something like Wordpress, while Django aims to give “programmers, but not non-programmers — the tools to create dynamic, content-driven websites”?

Jacob: Wordpress is great if you want to publish a blog, but what if you want a website to track a book collection, or sell tickets to concerts, or organize a local farmer’s market? There’s a great deal that Wordpress (and other single-purpose tools) can do, but the amazing thing about the web is just how wide-open the possibilities are. There’s a whole world of possibilities out there, and the end-goal is to help anyone self-publish anything they can dream up.
One of the most fascinating aspects of the history of communication is how intertwined literacy is with social controls. For most of the history of humankind, literacy was strictly available to the elite. The Web created the greatest democratization of publishing ability in history, and has almost immediately turned into a battleground between traditional, centralized publishing and decentralized democratic publishing.
As programmers, we don’t personally play much of a role in this sea change, but I do see the lowering of the barrier to self-publishing as something we ought to continuously think about.

Shabda: If Django’s end-goal is “to help anyone self-publish”, does that not mean Django is trying to fill the niche of an extensible CMS like Drupal, rather than filling the gap left by PHP? Are non-programmers really using Django, or are they using apps built with Django?

Jacob: No, you’re right that Django’s lower-level than something like Drupal. Django’s not trying to be a CMS but to be a tool you could use to build your own CMS. It’s easier to design your own content models than to shoehorn your publishing into a CMS limited by the ideas of its developers.
There are indeed quite a few non-programmers using Django, though in the long run I think the interesting trend is that more and more computer users are learning a little bit of programming — enough to develop a site with Django, say.

Shabda: (Since I ask this to everyone!) What is one thing about Django which you absolutely love, and one thing which you think Django should do differently?

Jacob: I only get to choose one thing I love?
I think my favorite bit of Django is the URLconf system. I love that Django forces me to think about URL design as part of my application instead of some byproduct; I’ve always hated web tools that try to somehow pretend that you’re writing a desktop app. I’ll admit to obsessing over my URL design from time to time, but I really enjoy clean, semantic URLs.
As for something we should do differently: the assumption that you’ll only use a single database is a bad one, and needs to be fixed. Unfortunately, it’s an assumption that we made really early on, which means that fixing it is going to be tricky. There’s some smart people working on it right now so I’ve got high hopes, but I wish we’d not made that assumption to begin with.
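(An aside for readers who have not seen a URLconf: the idea Jacob mentions is an explicit, ordered table that maps URL patterns to view functions. The sketch below imitates that idea in plain Python using only the standard `re` module. It is not Django's actual implementation or API; the names `URLPATTERNS`, `resolve`, `dispatch`, and both views are made up for illustration.)

```python
import re

# Hypothetical views: plain functions that receive the captured URL parts.
def archive_year(year):
    return f"archive for {year}"

def article_detail(year, slug):
    return f"article {slug} from {year}"

# A URLconf-style table: ordered (pattern, view) pairs.
# The URL design lives here, in one place, as part of the application.
URLPATTERNS = [
    (re.compile(r"^articles/(?P<year>\d{4})/$"), archive_year),
    (re.compile(r"^articles/(?P<year>\d{4})/(?P<slug>[\w-]+)/$"), article_detail),
]

def resolve(path):
    """Return (view, kwargs) for the first pattern that matches, else None."""
    for pattern, view in URLPATTERNS:
        match = pattern.match(path)
        if match:
            return view, match.groupdict()
    return None

def dispatch(path):
    """Call the matching view with the named groups as keyword arguments."""
    resolved = resolve(path)
    if resolved is None:
        return "404"
    view, kwargs = resolved
    return view(**kwargs)
```

With this table, `dispatch("articles/2008/")` calls `archive_year(year="2008")`, and a path that matches no pattern falls through to the "404" string.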

Shabda: Django started as an in-house project and was later open sourced. At your current startup your major job is with Django itself. The Django book which you wrote was released under a permissive license. How difficult is it to convince people outside the hacker culture of the business value of open source? What are the best ways to do this? How difficult was doing this at World Online?

Jacob: These days, thankfully, the business value of Open Source is pretty well-established. There really hasn’t been a lot of “convincing” necessary. For example, Apress didn’t just agree to release the Django Book under a permissive license: they actively encouraged it. You’d have to ask them if it was “worth it” in terms of sales, but I’m sure that being the first Google hit for “django book” didn’t hurt!
Releasing Django at World Online is actually an interesting story. We decided at the 2005 PyCon that we should Open Source some of our software, and started building a business case for doing so. We prepared a series of arguments — open source will increase our visibility and lead to easier sales and easier hiring, open source will improve the quality of our software, etc. — and took them to our management. All of those things turned out to be true, by the way, but to our surprise, the argument that was the most effective was actually a “moral” one. We talked about how Open Source had helped our business (Apache, Linux, Python, etc.) and argued that it was time to “give back” to the community. The World Company has always been a company that’s tried to be a conscientious part of our local community here in Lawrence, and they really jumped at the chance to participate in the global Open Source community.
I like to tell this story, by the way, because it really makes me hopeful that this “hacker culture” is in fact compatible with business culture. Fact is that most businesses are in fact concerned with doing the “right thing” — they just often don’t know what that is when it comes to technology. Of course the business case for Open Source needs to be there, too — and it is — but I think there are a lot of companies that’ll jump on the chance to “give back.”

Shabda: Before we close, would you like to share any tips with us?

Jacob: If you don’t read the Django community aggregator you really should: there are some incredibly smart people blogging about Django and you’ll learn something new from all of them.


This was the interview with Jacob Kaplan-Moss. We have a few more Django interviews coming before we close this series, so stay tuned.

An idea a day - A geographical wiki

This is an article in the Five Startup Ideas series at the 42topics blog. In his essay, Ideas for Startups, Paul Graham argues that ideas are not a critical factor for success of startups. Although I do not believe that ideas are worthless, as many people do, I believe that they are not anywhere near as important as execution. So to prove my point, I am giving away 5 startup ideas in the next five days. All of them describe a problem, its solution, the technology involved, the competition and market size. If you are not a hacker, and want to build any of these things, may I suggest Uswaretech.


Title:

A Geographical wiki

The problem:

There have been a few attempts to mix Wikipedia-style collaborative editing with geographical data, such as Wikimapia. However, they suffer from two problems.

  1. For collaborative software such as a wiki to work, it must be very open, easy to edit, easy to audit, and easy to roll back to a previous version. You must be able to see whose contributions were constructive and who is just spamming. These features are missing from the current solutions.
  2. These sites have no focus: all geographical information is fair game. Instead, limit your wiki to commercial information only, and make businesses provide a reasonable amount of information to add themselves to the wiki.

The solution:

  1. Learn from Wikipedia, and how they succeeded. Provide extremely easy ways to audit changes and contributions. Make it awfully easy to change, roll back and edit information.
  2. Only allow businesses to add themselves to the wiki. “My House” is not a valid place to show. This is similar to the notability guidelines in Wikipedia. Make businesses provide enough information before they can add themselves to the wiki. (Of course you need to provide enough benefits to businesses in return for that.)

Technologies involved:

Do not use Google Maps, or make a mashup with another API. You need a lot of control over the mapping part, something which no API will provide. Instead go the Everyblock route. They built the mapping technology themselves. Either buy this technology from them, or wait another year until their Knight News Challenge grant expires and they release the code under an open license.

Existing Competition:

Wikimapia does something similar. Read The Problem section to find out how you can differentiate yourself.

Market Size:

Wikimapia has an Alexa rank of less than 2000.

Others:

None


This was part 5 of the series of 5 startup ideas. For the next five days we will publish a new idea a day. If you want to read all of them, please subscribe. Oh, and have you seen the 42topics startup section? Or, if you want, you can create your own topic.

This was the last essay in 5startupideas series. Hope you use some of these ideas, or get one of your own. Here is a quote from Yogi Berra to finish things up. “When you come to a fork in the road, take it.”

An idea a day - Alternative to GAE



Title:

Building a scalable alternative to Google App Engine.

The problem:

Scalability is Hard, let’s go shopping. -Consultant Barbie

There are many companies trying to take the pain out of scaling. Amazon EC2 and GAE make this much easier.

However, there are still big problems with GAE. For example, not being able to run cron jobs makes it instantly unusable for many people. Many of these shortcomings will be removed in time. Still, I believe that when a site can pay for itself, many people would like to move it onto their own infrastructure, with much more freedom to do things.

The solution:

Create a drop-in replacement for GAE. People must be able to just drop in their GAE applications and start using it, without modification.

Technologies involved:

Of course this is a hard problem. You essentially want to build a super scalable system. But this is possible if you mix some open source components well. For hardware, host your whole system on EC2 instances. Use Hadoop to get MapReduce functionality. Use HBase or Hypertable instead of Bigtable. Use Django to talk to them.

Existing Competition:

This is such a new area that there is no existing competition. But if you count other systems promising infinite scalability as competition, then there is Heroku. Of course, GAE is a competitor as well.

Market Size:

Heroku raised 3 Million USD recently.

Others:

It would not be a lot of extra work to build massively scalable solutions for Django. You play with the same stack, but write two database APIs: one which mimics Django’s, and another which mimics GAE’s.


This was part 4 of the series of 5 startup ideas. For the next five days we will publish a new idea a day. If you want to read all of them, please subscribe. Oh, and have you seen the 42topics startup section? Or, if you want, you can create your own topic.

An idea a day - Remotely hosted Analytics solution



Title:

Remotely hosted Analytics solution.

The problem:

For most commercial websites, tracking results (who their visitors are, which sites link to them, which users convert to sales or leads) is a big problem. Google Analytics is the default solution for doing this. In truth, Google Analytics is an absolutely marvelous piece of software. Yet there are a few problems with it.

For most webmasters, giving Google information about their sites is an issue. If you install Google Analytics on your site, what you know about your site, Google knows too: your best performing Adwords keywords, your conversion ratio, and almost everything else. This is a situation many webmasters would rather not be in, but as Google Analytics is the best analytics software, most webmasters choose it anyway.

The solution:

Create remotely hosted analytics software. Do not think much, just STEAL all the features from Google Analytics. If you want to go beyond Analytics, you may try providing the services provided by Crazyegg, but I think that would be a case of featuritis. Provide ironclad guarantees in your SLA that this data will never be provided to any third party.

Technologies involved:

A server side technology to collect the data and make sense of it. JavaScript code to capture the visitors’ information.

Existing Competition:

There are a lot of people who are trying to do this, Mint and Clicky among them. Of these, Mint is a self hosted solution, which means that users need to install it on their own server, a hassle many people do not want to go through. Clicky is remotely hosted, but it does not match the level of detail provided by Google Analytics.

Market Size:

Google bought Urchin in 2005. The exact terms of the purchase were not disclosed, but the price was expected to be in the ballpark of 30 million USD.

Others:

The big issue here is gaining webmasters’ trust that you will never share their data with any third party.


This was part 3 of the series of 5 startup ideas. For the next five days we will publish a new idea a day. If you want to read all of them, please subscribe. Oh, and have you seen the 42topics startup section? Or, if you want, you can create your own topic.

Popularising Django - Part 2

If you have read my Popularizing Django post, you know that I consider building a killer packaged app to be the best way to popularize Django. This is a post about what that app must be.

For PHP it was Wordpress and PhpBB. Both were free, very easy to install and came with everything packaged. If you have followed the history of either you must know that they have always been plagued by security problems.
Assertion: Most users, even programmers, value ease of use and install over technical superiority. Case in point, Windows vs Linux.

For Rails, arguably it is Basecamp. This is a website people can use for free, and can see that building complex, rich and engaging applications is possible with Rails. With the “build a weblog in 15 minutes” demo, Rails already proves that building webapps is fast and easy.
Assertion: Killer apps need to prove easy things are easy, but complex things are possible.


Can a blog application be Django’s killer app? We have many, many blog applications written with Django. If we build a technically better application than Wordpress, can this be Django’s killer app? Sorry, if you are still writing a blogging app (apart from personal use/learning), you are deluding yourself. You can build an awesome application, but can you get Chris Pearson to design a theme for your app for free? Wordpress’s three gazillion free themes/plugins make competing against them with a blogging application almost impossible. The same goes for competing against MediaWiki or PhpBB or Drupal.

The other end of the spectrum is building, say, a CRM as the killer app for Django. There is SugarCRM, but it is not widespread, and a technically superior solution can knock it off its pedestal. Or might it be an open source project management application?
Yet with killer apps you want width of penetration, not depth of penetration. With a CRM you can get a few rabid fans, but not an army of people who are tweaking your application every day.

So what can be the killer app of Django? Are there still applications left which can have a depth of penetration, and are not yet in widespread use? Yes. A lot of people want to build a social news site, and they are forced to use Pligg, and Pligg absolutely sucks. Or maybe a social network app?

But I am most hopeful about Everyblock. Once the source for Everyblock is released, a ton of localities and cities not served by Everyblock would like to build such a site. Imagine a large number of Everyblock clones, and a million people hacking on Django. Never underestimate the power of a large number of college kids tweaking small little things.

An idea a day - Recommendation system based ad network


Title:

Ad network which takes into account user feedback.

The problem:

Today Adsense is the default Ad network. If a person has a website they just slap an adsense ad unit on the page, and try to get a few bucks.

Adsense is a contextual advertising solution. It reads the page and tries to find the area the page is about. For example, try this site. From reading this site, Google decided that it is about web hosting, and showed ads about web hosting.

The problem with this is that this site is about free web hosting. So no person on that site is clicking on paid web hosting ads, and the site owner is not making any money. Assume now that the demographic of this site is gamers. If you showed ads about games, the ad revenue for the site owner could potentially be much higher.

Of course you need to do this algorithmically. So you need to take users’ feedback into account. See the solution for how this can be done.

The solution:

There are a few ways to find out what ads will convert for a given webpage. When the ad unit is put up on the page, show a random collection of ads. After a few clicks have happened, you can pin down with reasonable accuracy the niches from which these clicks happened. Now start showing ads from those niches, and keep track of which subniches convert best. Soon you can find out which niches and ads perform best, and show them.
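One way to sketch the loop described above is as an epsilon-greedy chooser: mostly show the niche with the best observed click rate, but keep showing a random niche some of the time so that new data keeps arriving. Epsilon-greedy is my choice for illustration, not something the idea depends on, and the class name `NicheAdSelector`, its methods, and the `explore_rate` parameter are all hypothetical.

```python
import random

class NicheAdSelector:
    """Epsilon-greedy chooser over ad niches for one page (illustrative only)."""

    def __init__(self, niches, explore_rate=0.1, rng=None):
        self.niches = list(niches)
        self.explore_rate = explore_rate
        self.rng = rng if rng is not None else random.Random()
        self.impressions = {niche: 0 for niche in self.niches}
        self.clicks = {niche: 0 for niche in self.niches}

    def click_rate(self, niche):
        """Observed clicks per impression; 0.0 for a niche never shown."""
        shown = self.impressions[niche]
        return self.clicks[niche] / shown if shown else 0.0

    def choose(self):
        """Pick the niche for the next ad impression and count it."""
        no_data_yet = sum(self.impressions.values()) == 0
        if no_data_yet or self.rng.random() < self.explore_rate:
            # Explore: show a random niche, as in the "random collection" phase.
            niche = self.rng.choice(self.niches)
        else:
            # Exploit: show the niche that has converted best so far.
            niche = max(self.niches, key=self.click_rate)
        self.impressions[niche] += 1
        return niche

    def record_click(self, niche):
        """Record that the ad shown for this niche was clicked."""
        self.clicks[niche] += 1
```

You would keep one selector per page, call `choose()` for every impression and `record_click()` for every click. The same class could be reused one level down to rank subniches inside the winning niche.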

When you are just starting you would not have a large inventory of ads to show on all pages. So you need to show ads from other networks as well. Amazon has an API, so does Ebay, and Adsense has one too, though I am not sure it has what you would be looking for. You can use these to show ads your members would be interested in.

Technologies involved:

Use any server side technology: ROR, Django, or, if you’re feeling adventurous, J2EE. All of the services mentioned above have SOAP or REST APIs, so the choice of programming language won’t be a problem.

Existing Competition:

You are essentially trying to mix a recommendation system with an ad network. I know of no ad network which does this. A few ad networks have used the Ebay shopping API to build something similar, most notably Shopping Ads, but you still need to tell the system which area your site targets, and then it shows ads from that area. Asking the site owner which ads work best can be used to make a first guess, but an automated system can perform much better than asking humans about each page of a site.

Market Size:

As in my previous post, I am unable to find the exact size of the market here, but Google’s financial information is here. Also, the whole internet is based on advertising, so the market size is big.

Others:

None


This was part 2 of the series of 5 startup ideas. For the next five days we will publish a new idea a day. If you want to read all of them, please subscribe. Oh and have you seen the 42topics startup section? Or if you want you can create your own topic.

An idea a day - An automated Adwords optimizer

This is an article in the Five Startup Ideas series at the 42topics blog. In his essay, Ideas for Startups, Paul Graham argues that ideas are not a critical factor in the success of startups. Although I do not believe that ideas are worthless, as many people do, I believe that they are nowhere near as important as execution. So to prove my point, I am giving away 5 startup ideas in the next five days. All of them describe a problem, its solution, the technology involved, the competition and market size. If you are not a hacker and want to build any of these things, may I suggest Uswaretech.


Title:

An automated Adwords optimizer

The problem:

A lot of advertisers use Adwords for SEM. They have to manually keep track of the keywords they are bidding on, the ad sales copy, and the ROI on each keyword and ad copy combination. For advertisers with a large inventory, finding the keywords they want to advertise on, and keeping track of them, is a big challenge. This manual process is inefficient and full of drudgery, and it can be automated away.

The solution:

Take the example of a Google search for Buy books. A lot of advertisers are bidding on this keyword. Now take the results for Buy harry potter: no one is advertising on this keyword. Most of the advertisers would have Harry Potter in stock, but cannot afford to advertise on it, as keeping track of each keyword and measuring its ROI by hand is infeasible. So the software needs to do three things:

  1. Make it easy to track all items in the advertiser’s inventory. Mix these with commercial intent keywords and bid on them. For example if the advertiser’s inventory is [‘Harry Potter’, ‘LOTR’, ‘Lord of the Flies’, ‘Kamasutra’, … ‘Iacocca’] then with one click the advertiser bids on [‘Buy Harry Potter’, ‘Buy LOTR’, ‘Buy Lord of the Flies’, ‘Buy Kamasutra’, … ‘Buy Iacocca’]. The inventory is pulled from the advertiser’s database.
  2. Track the ROI on each keyword. For example Buy Harry Potter is profitable to the advertiser, but for some reason Buy LOTR is not. Automatically remove the keywords which do not perform. The costs are pulled from the advertiser’s database. The conversion ratio can be calculated by adding a JavaScript snippet to the ‘Thank you for the purchase’ page.
  3. There are a lot of other places where automated optimization can be done. For example, people who advertise on buy books would also want to advertise on buy boks, a misspelling, yet very few do. Automatically advertise on misspellings and track the ROI as in 2.
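The three steps can be sketched roughly as below. This is a minimal illustration in plain Python and does not call the Adwords API at all. The function names, the ROI formula (profit divided by ad spend) and the single-letter-deletion rule for misspellings are my own assumptions, chosen only to make the idea concrete.

```python
def expand_keywords(inventory, intent_words=("Buy",)):
    """Step 1: combine each inventory item with each commercial-intent word."""
    return [f"{intent} {item}" for item in inventory for intent in intent_words]


def roi(revenue, ad_spend):
    """Return on investment as a ratio; None when nothing has been spent yet."""
    if ad_spend == 0:
        return None
    return (revenue - ad_spend) / ad_spend


def prune_keywords(stats, min_roi=0.0):
    """Step 2: keep keywords whose ROI is at least min_roi.

    stats maps keyword -> (revenue, ad_spend). Keywords with no spend yet
    are kept, since there is no data to judge them on.
    """
    kept = []
    for keyword, (revenue, ad_spend) in stats.items():
        value = roi(revenue, ad_spend)
        if value is None or value >= min_roi:
            kept.append(keyword)
    return kept


def misspellings(keyword):
    """Step 3: variants with one letter dropped, e.g. 'buy books' -> 'buy boks'."""
    variants = set()
    for i, ch in enumerate(keyword):
        if ch == " ":
            continue  # keep the words separate
        variants.add(keyword[:i] + keyword[i + 1:])
    return sorted(variants)


if __name__ == "__main__":
    keywords = expand_keywords(["Harry Potter", "LOTR", "Lord of the Flies"])
    print(keywords)
    # Revenue and ad spend per keyword would come from the advertiser's database.
    stats = {"Buy Harry Potter": (120.0, 40.0), "Buy LOTR": (10.0, 30.0)}
    print(prune_keywords(stats))
    print(misspellings("buy books"))
```

A real system would probably generate many more kinds of misspellings and would wait for enough clicks before pruning a keyword, but the shape of the loop stays the same.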

Technologies involved:

You would want this to be a web based hosted service, so you can use any server side technology you want: ROR, Django, any would do. The interesting part is that Adwords has an API through which you can interact with an Adwords account, and most things are possible. Using the API is not free, but the pricing is liberal, so this will not be a barrier.

Existing Competition:

There is some software in this area, such as Adgooroo, but it still requires a lot of manual intervention. There is still a lot of room for optimization and automation in this space.

Market Size:

I am unable to get an exact breakdown of the market size of Adwords, but Google’s financial information is here. Of the 5,186.04 million in revenue for the 3 month period, a significant percentage is from the Adwords program. (If you can find a better source for market size please let me know.)

Others:

There is significant risk that if you build your software around the Adwords API, and your software leads to losses for Google, the API may be changed or removed. You need to find a way to make this software win-win for Google and the advertisers. This is possible if you find new areas the advertisers can advertise on.


This was part 1 of the series of 5 startup ideas. For the next five days we will publish a new idea a day. If you want to read all of them, please subscribe. Oh and have you seen the 42topics startup section? Or if you want you can create your own topic.

And if you have a question about this, or think this idea sucks, leave a comment and I will reply to your queries.

Parable of the single sheep - Or How Google is destroying the internet, and nobody seems to know.

This is a parable in two parts. Story and the Moral. If you are in a hurry you might want to skip ahead to the moral (But you miss the beautiful story).

The Story

Long ago was the kingdom of Foobr, a kingdom mostly of shepherds, who grazed their sheep under the benevolent but watchful eyes of their King Oggle. There were all types of shepherds in the kingdom, some had only a few sheep, and some had a few hundreds. The sheep too were of all types and varieties, some gave a ton of wool, and some only a few bales.

What man (webmaster) of you, having an hundred sheep (sites), if he lose one of them (gets one penalised), doth not leave the ninety and nine in the wilderness, and go after that which is lost, until he find it?
Luke 15:4

For a long time the kingdom, and its economy based on wool prospered well. It was not a very efficient marketplace, though, as the shepherds who had more sheep grazed more of the common fodder and pastures, and these shepherds got rich. The shepherds who had a single but more productive sheep could only make so much wool!

The king thought of himself as hard but fair. “Ah! I need to overcome this inefficiency in my kingdom. I need to reward those who use resources judiciously, and punish those who have a lot of inefficient sheep,” he thought. He asked his Sages to get to work to determine which sheep were inefficient, and needed to be killed! “This will increase the efficiency in my kingdom, and make the kingdom happier overall.”

The sages worked hard, analyzed millions of records and found out that red sheep were less efficient than other sheep. “If we kill all the red sheep, the efficiency of the kingdom will go up by 10% and in two years the production by 5%.” The king was duly impressed and ordered all the red sheep killed. As promised, the efficiency increased by 10% overnight!

****

Ramu was a simple shepherd. He had but one sheep. It was efficient, but as luck would have it red in color. The king’s decree left him without a means of income. “I guess this was my bad luck. I will not buy a red sheep next year, and buy only the most efficient sheep.”

For some reason, unexplained at that time, the quick gains in efficiency were not maintained. “I know, we need to kill the least efficient sheep again.” So the sages went back to their laboratories again, and found that now the pink sheep were the least efficient. The king ordered all pink sheep destroyed. Guess what was the color of Ramu’s sheep?

Ramu was not alone in this misery of his. Many other people who had but one sheep had their sheep killed, and were left starving. The real culprits, the people the king wanted to target, the people who had hundreds of sheep, lost many of their sheep as well, but only a percentage of their total. Everyone saw this happening, and even those people who had only one efficient sheep decided to hedge their bets, and started keeping many inefficient sheep. The more sheep the king killed, the more prudent it became to have many inefficient sheep. In no time efficiency had plummeted, and total production of wool was a fraction of what it had been.

The Moral

Spam in webpages is a major problem facing search engines. For a long time the search engines have tried to counter this problem with algorithms under which nobody knows which website will be classified as spam. If the algorithm decides that your website is spam, boo, it is toast.

With Google driving most of the traffic to most sites, no webmaster can afford to have his only source of income depend on Google’s whims. This means that they must hedge their bets against the vagaries of Google’s changing guidelines, and instead of building one kick-ass website must build a large number of websites. Bye-bye engaging content, welcome mediocrity.

For trade to flourish, and for wealth to be made, there must be a set of rules which everybody knows a priori, and if they hold themselves to these rules they must be assured of their safety. This is not the case on the internet. On the internet, rule of man, not rule of law, prevails. Google is the judge, jury and executioner. This leads to a wild west landscape where webmasters must hedge their bets by having a large number of so-so websites.

In security, we have long known that “Security through Obscurity” does not work. I postulate that even in fighting spam, “Security through Obscurity” does not work. After all, for ten years search engines have tried fighting spam through Security through Obscurity. Is it not time that we rethink the strategy?

One of the biggest inventions for English society, which allowed their citizens certain inviolable rights, and which allowed them to build a strong society on rule of law, not rule of men was the Magna Carta, which proclaimed,

NO Freeman shall be taken or imprisoned, or be disseised of his Freehold, or Liberties, … but by lawful judgment of his Peers, or by the Law of the Land….

We need a similar proclamation

No Website shall be taken or penalized, or be relieved of its Ranking, or Traffic, …. but by the lawful judgement of the Law of the Land which are known to all ….

This will give webmasters the peace of mind to focus their energies on one website, with deep and engaging content, instead of making them hedge their bets on many mediocre websites. It will give them the peace of mind that they will not be penalized by an ever changing law, which makes some of their tactics shady and takes away their only source of income.

Here is to a better internet.


If you liked this, you might also like Parable of the Captcha or Parable of the Nofollow. The name is of course stolen from Parable of the Lost Sheep

Oh and yes, 42topics is live now. Did you know we have an SEO section, and that you can create a topic about topics you care about?

An interview with Michael Trier

Michael Trier is a long time Django user and evangelist. He has worked with a number of technologies including Rails and .net. His insights on marketing Django to traditionally Enterprisey areas were extremely informative. He produces TWiD along with Brian Rosner, which is a great way to keep abreast of the latest happenings in the Django community. He graciously agreed to be interviewed by the 42topics blog.


Shabda: Would you tell a little about yourself, how did you get started with Django, what other projects have you used or are associated with?

Michael: Well, I’ve been programming ever since I can remember, probably around 11 years old. I grew up in Silicon Valley, and that whole story is a pretty interesting one. I did the usual thing of starting out with languages like ASM, C, C++ , Pascal. Moved on to things like Delphi, VB, and most recently I spent quite a bit of time with Ruby, Rails, and within the past year and a half dabbling with Django.
I came to Django for a particular reason. I was focussed on building a high content push type of site and it just seemed like Django was a much better fit for that than Rails. Obviously I could have done it with either language, but I believe in using the right tool for the job.

Shabda: You are using Django for your next venture. What are the specific areas where using Django has been a better way to develop for you, compared to any other choice you might have made, say Rails?

Michael: I hate to bring up the scaling issue, but that was certainly something I was seeing as a problem with Rails. The type of site I’m building I hope to see some pretty high traffic (don’t we all). So that was one thing. The other thing was that it seems like if you’re doing a large content driven site, Django just makes that very easy. Additionally, as most people say, one big win for Django is the built in Admin. It has allowed me to focus on the front end of the site, while giving the other people involved a way to immediately start working on the content part. Right away this helps us see where we may need to enhance functionality or perhaps we’ve built in too much flexibility.
That kind of feedback comes back to us quickly and so that is invaluable.

Shabda: How do you compare Rails to Django? In what areas is Django better than Rails (Apart from scaling/efficiency)? What does Django still need to learn from Rails?

Michael: That’s a good question and not something I’ve given a lot of thought to, but off the top of my head I would say that with something like the Admin, Django definitely wins over Rails. Now within the Rails community there are a lot of interesting third-party plugins that have attempted to mimic the Django admin, but up until now those that I’ve looked at have fallen short. It’s a huge effort as we’ve seen with the amount of work being put into the NewForms-Admin rewrite.
I also think the middleware stuff in Django is very nicely done, although it could use a bit more options in terms of request / response ordering of the middleware items. I also think Django templates are just perfect. I really can’t see how to enhance on those at all.
As far as Rails is concerned, there’s a lot of nice features in rails that Django could learn from, and most of it just has to do with the maturity of the two projects.
Rails caching takes caching one step further. Rails has model level validations which are very nice and quite frankly the right way to do it, in my opinion.
ActiveRecord, Rails’ ORM, has things like aggregation support, and supports a lot of flexibility in how you are able to filter the dependency relationships between your models with things like has_many :through (basically intermediate models).
I also like the before_filter and after_filter type of stuff in rails. It makes it real easy to invalidate cache or do other things in an event driven way.
Oh one thing back on the Django side, I think NewForms are done really well. It’s an elegant solution. And it is interesting because in the .net world I’m starting to see a copy of that through things like dynamic data controls.

Shabda: One specific area which I believe Django can learn from Rails is marketing. What would you say to this? With the 1.0 release coming soon, how can Django start to market itself better?

Michael: I would agree with you on this. I think the impending 1.0 release will definitely do a lot on its own to market the product. There are also at least 3 more books coming out in the next several months. The Google App Engine announcement that heavily featured the name of Django has also helped to bring it focus.
But there’s another element that I don’t know if you can fabricate and that is that DHH is a charismatic individual.
In terms of real things that can be done, there are obviously things like more screencasts just featuring the product. It’s interesting to note that consistently the number 1 screencast on iShowU is the Django one that features a simple tutorial on Django. Although a lot of people have no interest in screencasts, there are a lot of people that do. We all learn in different ways.
I also think that on the djangoproject.com weblog we can do a much better job of regular blogging on what is going on within the community.
There are times when you look at the weblog and it’s 2 or 3 months old. From the face of the website you would think that nothing is going on. The last release is over a year old. Meanwhile those of us that are in the community every day, we see that tons of stuff is happening. This needs to be communicated better.

Shabda: Talking of GAE, what effect do you see it having on Django? How can Django make the GAE-Django integration painless, should this even be done, and how much effort should the Django community spend on it instead of, say, focusing on the 1.0 release?

Michael: I think GAE helps Django, but only a little bit. It helps it from the standpoint of name recognition and will cause a lot of people to say to themselves, “what’s this Django thing all about.” I think GAE really helps Python in a real way.
In terms of making the integration painless, that’s going to be a very difficult task. I have a taste of that with the django-sqlalchemy project I’m working on, and I’m just mapping a relational model to another relational model. With GAE it’s quite different. That said, Google folks are working on a project to do “some” mapping of Django to GAE. I think over time it will expand in its focus. I think if they are willing they are the best people to do the work. I’d rather see Django focus on getting to that 1.0 point.
As far as GAE itself is concerned, I’m kind of on the fence on the real value there. I know that people like Jaiku are porting their stuff over to it, and Kevin Rose said that it would have been a great platform for something like Digg. Personally, I’m not really convinced of that, but it’s still early so we’ll all have to wait and see what comes of it.

Shabda: Your TWiD has been a great help for people to keep track of all the happenings in the community. Would you like to share some interesting tidbits you have learned with TWiD, and point to the more interesting ones?

Michael: Thank you, I’m glad that people find it helpful. I think one of the interesting things is the amount of work that goes into it. A lot of people are surprised because neither Brian nor I are very professional, so when we’re on the mic it sounds like we’re just sitting there talking about whatever. The reality is that it takes quite a bit of work in finding the stuff we want to discuss, attempting to understand enough about the topics to at least sound somewhat intelligent on them, and then there is the recording and post-production.
As far as actual topics the two recent episodes on Internationalisation have been a great learning experience for me personally. Going into it, I really didn’t even know enough about it to know what questions to ask. Thankfully Malcolm Tredinnick put the whole thing together. He’s been a huge help to the show.
Another show that was fascinating for me was the one on GeoDjango with Matthew Wensing. It’s a fascinating topic and I hope to put together more shows on that subject.
I think a lot of people like the interviews, and we do too, but we also don’t want to make the show just interviews every week. So we try to split it up.

Shabda: Would you tell a little about the venture you are working on?

Michael: Sure. The project is a hyper-local media site. We’re working on providing an online place for a mix of general media type of content (stories, events, etc…) with citizen journalism types of offerings. A lot of people are working in this space, trying to figure out how you bring the big media type of stuff down to a local level.
We then want to mix some of that with interesting datasets, a la EveryBlock, as well as provide some level of social interaction. The cool thing is that once we release we’re going to make the entire thing available on a New BSD license.
Frankly, initially it’s not going to be that interesting. A lot of what we’ll provide in the way of Classifieds, Marketplace, Aggregators, stories, etc.. will be similar to the types of offerings that you see in things like Ellington or just about any newspaper’s online presence these days. Down the road I hope that we can expand it into something quite useful.
The framework we’ll be releasing is called ArEyah. So expect to see something from me on that in the next several months. Timeframes are tough to nail down at this point because this is a side project in addition to some of the other things I’m involved in.

Shabda: For many people, such as me, the question is what business advantage remains for you after you are releasing all your secret sauce under an open license. For example even EveryBlock will have to release all their source under an open license, after the end of the Knight Grant expiry period. What would you say to this? What do you hope to achieve with releasing the project under a BSD license?

Michael: That’s really a good question. For me it was two-fold. First I wanted to take the “intellectual value” of it off the table. In other words, I did not want to be in a position with my partners where I had an extremely unfair advantage. Secondly I really think this is something that could benefit communities everywhere; of course that remains to be seen. If that is the case, then I think a lot of individuals would get involved and make it a better product for the benefit of all.
As far as competitive advantage, I really see that being in the execution. It comes down to how well we serve our community. If we are not doing a good job of it then someone else should be able to come along and beat us out there. There will certainly be some things that are specific to the community that we’re targeting that don’t have a place in the general framework. Those things will be our own and they will be highly tailored for our use, in order to serve a specific need we have here.

Shabda: We see a lot of comparison going on between Django and ROR, or Django and Turbogears. But we do not see enough comparison between Django and other traditionally ‘Enterprisey’ frameworks, say Java based frameworks like Struts+Hibernate, or ASP.NET. As you have worked with .net, how would you compare Django with these frameworks? What can Django do to make itself popular in the areas dominated by these frameworks?

Michael: The differences have less to do with feature set comparisons. A lot of what I do every day for corporate clients could be done much more quickly and actually often more robustly with something like Django. So it’s not a thing where you can say feature x is in .net but not in Django. That said like any framework / language there are edge case types of things where you might say “Java is the right tool for this job.” What it really comes down to is the corporate culture. .NET and Java own those corporate cultures. There are real reasons why it makes good business sense to build your corporate infrastructure on Microsoft products. You can pick up the phone and get three .NET developers tomorrow. Regardless of whether your company is based in Louisville, Kentucky or White Plains, New York.
In the case with Microsoft, much more than in the Java world, they provide a singular full-stack solution. No one, in the executive sense, needs to make any more decisions about which reporting tool, which database backend, or which IDE they are going to decide to use. Often that alone is motivation enough.

Shabda: Your last post on your blog was about the benefits of DVCS as compared to Centralized systems. What are the compelling benefits of DVCS over, say, SVN. In particular how can moving to a DVCS help Django?

Michael: Well I’m somewhat new to the DVCS world, as are a lot of people, so I’m no expert on the subject. To me though the benefits of a DVCS are at more of a personal level. I have seen tremendous personal benefit in being able to commit while sitting in a coffee shop, or being able to branch code locally and then merge that back in.
As far as benefit to a centralized project like Django, I guess it remains to be seen. I’ve been watching the shift of Rails to Git with great interest.
So I guess in summary, I just don’t know enough at this point to say that it would be beneficial. The shift is probably more psychological. I think, for a lot of projects, it would be just another way to have a centralized repository.

Shabda: You and Brian Rosner are working on django-sqlalchemy. How would a sqlalchemy based ORM be better than the current implementation? What is the status of this project?

Michael: Yes, Brian and I are probably the primary committers, but we have several other individuals that have pitched in on the project. I like to think about the benefits in two separate areas. First, there’s the approach where you just plug django-sqlalchemy in and continue to use Django with its filter syntax, etc… In that case the benefits you gain are not benefits at the ORM level (because you’re still using Django’s syntax, which doesn’t support things like aggregation). The benefits instead are things provided as a result of having SQLAlchemy as the backend. So this would be things like multi-database backends, sharding, additional database support like DB2 or Firebird.
The second approach is where you actually need things like aggregation or more complex queries, without resorting to raw sql. In that case we expose SQLAlchemy’s ORM right on your models. So the full power of SQLAlchemy gets exposed and available for you to use whenever you need.
So I could technically have 90% of my ORM code just using Django’s syntax but then realize that I need to do something a little outside of its capabilities. In that situation I might choose to just use the exposed properties to get what I need.
Finally there’s one more thing related to this that we still do not have clear at this point, but will come down the road, and that’s actually adding different functionality at the model level. This might be things like Intermediate Model support being made available through django-sqlalchemy.
As far as status of the project, it has been moving along very well. We have a few more filters to implement and a handful of management commands, plus lots more testing. One of the things I’ve done in the past week is to code up a test application using django-sqlalchemy to see in a “real-world” sense where some of the problems are. That has been really helpful.

Shabda: So if the extra features added by SQLAlchemy are not needed, then this aims to be backwards compatible with current Django syntax?

Michael: Yes, our aim is that you could plug it in and run all your stuff just as is. In other words with django-sqlalchemy as your backend db (that’s how it gets exposed) we should be able to pass all of Django’s tests. Once we’re able to do that we tag it 1.0 and get some people hammering on it.
We do have one big hurdle that I have not discussed. Currently we are using multiple inheritance to modify the Django classes. That doesn’t work for the contrib apps or third-party apps because that would require a change to their code base.
We took this approach originally just because we wanted to focus on the mapping issues first and prove the concept. The eventual plan is to use some class replacement techniques to inject our stuff right into the Django models at evaluation time. That will make it work across the board.

Shabda: Before we leave. Would you like to share a tip, or hard to find information about Django with our readers?

Michael: Not really a code tip, because we do those each week on TWiD, but more of an approach tip. The generic views stuff is extremely powerful, and quite often I see a lot of new users doing stuff in views that could easily be done with a generic view, or with a wrapped generic view. James Bennett has a great post on this, and I suggest everyone check it out. I think often, especially for people that are new to the community, the power of it is overlooked.
Finally one more thing, spend some time in IRC. IRC is a great way to get an education in Django very quickly. Reading the questions and the responses has been invaluable to me in learning how to use Django more effectively. It’s a great community.

Shabda: Thanks a ton for this great interview. It was extremely informative and interesting.

Michael: Thank you.


This was the interview with Michael Trier. This week I am not going to publish any more interviews, but stay tuned: next week we have even more Django interviews coming.

And of course, 42topics is live now. And we have a Django section. (How 42topics works?) So join now, and let’s get rolling.

Popularizing Django — Or Reusable apps considered harmful.

For all its technical merits, Django is still a very niche technology. It is my belief that the thing which is holding Django back the most is one of its strengths.

Making reusable apps is easy and simple in Django. In Django this is the correct way to do things. You take a few apps, mix them together in your project, and deploy to start your site.

Compare the installation steps of Wordpress and an imaginary blog software better than Wordpress called Djangopress.

Wordpress

  1. FTP wordpress to webserver.
  2. Point browser to site.com/blog
  3. Next-Next-Next done.

Djangopress

  1. Svn checkout Djangopress
  2. Svn checkout django-registration
  3. Svn checkout other Django apps Djangopress depends on. Maybe django-mptt, django-threadedcomments or a few others.
  4. Edit your settings.py to add all these apps to INSTALLED_APPS.
  5. Add database settings, and other changes if needed.
  6. Telnet to your server and do syncdb
  7. Create templates. Done.
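Steps 4 and 5 above might look something like the following settings.py fragment. This is only a sketch under assumptions: `djangopress` is the imaginary app from this post, the other app labels are the module names I believe those projects install under, and the `DATABASE_*` names are the old single-database settings (later Django versions use a `DATABASES` dictionary instead).

```python
# settings.py (fragment) for the imaginary Djangopress install.
# App labels and values below are illustrative assumptions.

# Step 5: database settings, in the old single-database style.
DATABASE_ENGINE = "sqlite3"          # or "mysql", "postgresql", ...
DATABASE_NAME = "djangopress.db"     # database name, or file path for sqlite3
DATABASE_USER = ""                   # not used with sqlite3
DATABASE_PASSWORD = ""               # not used with sqlite3

# Step 4: every checked-out app has to be listed by hand.
INSTALLED_APPS = (
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.sites",
    "django.contrib.admin",
    "registration",        # django-registration
    "mptt",                # django-mptt
    "threadedcomments",    # django-threadedcomments
    "djangopress",         # the hypothetical blog app itself
)
```

Each of those lines is something a new user has to get right before syncdb will even run, which is exactly the friction a Wordpress user never sees.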

This does not take into account the extra hoops Apache makes you jump through, compared to using a PHP app.

How I got started with web programming.

I wanted to run a forum. phpBB was free, and seemed to be the most widely used. I installed it, wanted to tinker with it, and so learnt PHP. If there had been a different forum software, which was technically superior, but which asked me to write templates for it before I could start a forum, guess which one I would have chosen?

So how to popularize Django.

In my interview with James Bennett, I asked what Django’s killer app is. He said there need not be a killer app for Django; reusable apps will do. I guess I will have to disagree. Even the internet needed a killer app to get breakthrough popularity. Let’s see what a killer app gives you.

  1. It fills a big niche, so people are forced to learn your language/framework.
  2. It forces the Hosting company to support your language/framework.
  3. If a large number of places use it, it gives your framework name recognition.

So to popularize Django, I propose setting up DjangoPackagedApps.com to distribute packaged Django apps, to complement reusable Django apps. A packaged Django app must have these properties:

  1. All dependencies must be included.
  2. Beautiful templates must be included out of the box.
  3. Users must not need to modify anything in settings.py apart from the database settings.

And installing the PackagedApp must take no more steps than installing Wordpress.

  1. Svn checkout/FTP DjangoPackagedApp
  2. Only thing to edit in settings.py is database settings.
  3. Do syncdb. done.

Do you use Django? Do you program? Find things which YOU will love reading at 42topics.com.