Paul Joseph Davis

Python in Bioinformatics
========================

Just read another interesting article [1] by Titus Brown [2]
referencing an earlier post [3] I'd read that lamented that
Python hasn't taken over the bioinformatics world. I'd spent some time
reflecting on why Perl seems to have gained such dominance in the world
of bioinformatics.

Titus argued that a lot of it had to do with Lincoln Stein getting things
rolling early on in the life of BioPerl. First, I have to admit that I don't know
a whole lot about the history of the rise of any of the Bio* projects. But I
can't say that I see one person being the main cause of such an
industrial/social rise, of BioPerl in particular.

Titus does make an observation I agree with though:

    However, I think the tide is shifting away from Perl: from the
    not-so-imminent release of a complex, backwardsly-incompatible Perl 6, to
    the massive quantities of completely non-reusable Perl code that have been
    flung in every direction, people are starting to get sick of Perl. also, a
    lot of people in academia are moving towards Python for bioinformatics, if
    not in a very coordinated way

This observation (although indirect) is at the very core of my theory on why
BioPerl rose to dominance.

Namespacing
-----------

If you think about it, the ideas behind namespacing in Perl and in pretty much
any other language are quite different. Perl uses an ad hoc namespacing system
that allows anyone to contribute to particular areas of a given 'project'.

Or think of it this way. For me to have code accepted into any of the other Bio*
projects, I have to go through the rigamarole of submitting patches to core
developers. This process involves not only writing the code, but navigating
project conventions, politics, and other random hurdles.

To contribute to BioPerl, I write some code and upload it to CPAN.

Without devolving into a rant on why I hate Perl and this model, I'll just end
it there. My theory is that BioPerl's gargantuan size has more to do with how
easily subprojects conglomerate into the overall Bio::* namespace.

Brief note
----------

James Casbon mentioned that Python supports namespace packages [4] to allow a
similar style of development. I knew of these via paste and they irritate
me to no end, but it's a fair point.
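
For anyone who hasn't run into them, here is a rough sketch of the idea. It is
not the setuptools mechanism from [4]; it uses the standard library's
``pkgutil.extend_path``, which gives a similar effect without any third-party
install. The package name ``biodemo``, the two "distributions" ``dist_a`` and
``dist_b``, and the modules ``seq`` and ``stats`` are all made up for
illustration. Two independently written directories each contribute a module to
the same top-level package, roughly the way separate CPAN uploads land under
Bio::*.

```python
import importlib
import os
import sys
import tempfile

# Each portion of the shared package carries this __init__.py.
# extend_path scans sys.path and adds every other "biodemo" directory it finds.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)

# Hypothetical module contributed by one author.
SEQ_SOURCE = '''
def reverse_complement(dna):
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(dna))
'''

# Hypothetical module contributed by a different author.
STATS_SOURCE = '''
def gc_content(dna):
    return (dna.count("G") + dna.count("C")) / float(len(dna))
'''


def make_distribution(root, dist_name, module_name, source):
    """Write <root>/<dist_name>/biodemo/<module_name>.py and return the dist dir."""
    dist_dir = os.path.join(root, dist_name)
    pkg_dir = os.path.join(dist_dir, "biodemo")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write(NAMESPACE_INIT)
    with open(os.path.join(pkg_dir, module_name + ".py"), "w") as fh:
        fh.write(source)
    return dist_dir


def load_demo():
    """Build two separate distributions and import both halves of biodemo."""
    root = tempfile.mkdtemp()
    dist_a = make_distribution(root, "dist_a", "seq", SEQ_SOURCE)
    dist_b = make_distribution(root, "dist_b", "stats", STATS_SOURCE)
    # Stand-in for installing the two distributions independently.
    sys.path[:0] = [dist_a, dist_b]
    importlib.invalidate_caches()
    seq = importlib.import_module("biodemo.seq")
    stats = importlib.import_module("biodemo.stats")
    return seq, stats


if __name__ == "__main__":
    seq, stats = load_demo()
    print(seq.reverse_complement("ATGC"))
    print(stats.gc_content("ATGC"))
```

Neither directory knows about the other, yet both modules import from the same
``biodemo`` package. The catch is that every contributor has to ship the same
boilerplate ``__init__.py``, which is part of why I find the approach
irritating.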

Two things to note though. The first mention of namespaces in the setuptools
change log is version 0.5a9. Googling for "setuptools 0.5a9" returns results
from 2005. I'd be willing to say BioPerl was dominant before 2005 and perhaps
even more dominant than it is now.

Secondly, the docs on namespace packaging [4] highlight the sad state
of Python package management. But that's a whole different story.

References
----------

[1]:  http://ivory.idyll.org/blog/sep-08/the-future-of-bioinformatics-part-1a.html
[2]:  http://ivory.idyll.org/
[3]:  http://igotgenes.blogspot.com/2008/08/not-biopythonista-i-thought-id-be.html
[4]:  http://peak.telecommunity.com/DevCenter/setuptools#namespace-packages


Copyright Notice
----------------

Copyright 2008-2010 Paul Joseph Davis

License
-------

http://creativecommons.org/licenses/by/3.0/