Thursday, November 25, 2010

Languages

A talk with my boss yesterday about programming languages got me thinking a bit more. Why is it that some languages become mainstream while others remain obscure or academic? Part of it is that a language has to be 'cool enough' to be worthwhile. By that I mean that programmers, by their nature, want to either work on something very cool or use a very interesting tool. Most have a domain they are passionate about; for me that is computational biology and artificial life. However, many of us don't work within our domain of specialty (in my case I don't have a PhD, so those jobs are very rare). So if a programmer is working on problems that are not in themselves exciting, what gives them drive?

I think the answer is the tools and techniques they can apply to the problem. As an example, I do not find web technology in itself to be particularly thrilling. I recognize that it is one of the most important technologies around right now, and that it changes the whole way client-server interactions take place (REST interfaces are beautiful). But something happens when I am using an interesting tool like Python together with test-driven development: I get excited every time a test passes. It is no longer a matter of focusing on the web technology; now I focus on the tests, and it becomes a game where my score is the number of unit tests I can get to pass. I think many people who program are the same way. They want to play with cool technology, but they also don't want to relearn everything.
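To make the 'game' concrete, here is a minimal sketch of that red-green loop using Python's built-in unittest module. The slugify function is just a made-up example, not from any real project; the point is that each passing test is another point on the scoreboard.

    import unittest

    def slugify(title):
        # Turn a page title into a URL-friendly slug.
        return "-".join(title.lower().split())

    class TestSlugify(unittest.TestCase):
        def test_lowercases(self):
            self.assertEqual(slugify("Hello"), "hello")

        def test_joins_words_with_hyphens(self):
            self.assertEqual(slugify("My First Post"), "my-first-post")

    if __name__ == "__main__":
        unittest.main()

Write the test, watch it fail, make it pass, repeat.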

Since they become productive with one set of tools, they are reluctant to move away from it unless they really have to. If a developer is a Java expert and is given a problem that is more amenable to a Perl or Python script, and they do not already know Perl or Python, they will likely solve the problem in Java rather than take it as a chance to learn a new tool. In many cases they will just 'get it done' with the tool they know and are comfortable with. Over time, though, this leads to situations where the tool they are expert in is no longer the current industry fad, and they have passed up all these small chances to delve into something new. For an individual developer this doesn't mean too much, but for the industry it means that fewer developers are willing to adopt any given new tool at any given time, which leads to the problem of how to find enough developers proficient in a new technology to actually use it.
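As a hypothetical illustration of the kind of problem I mean, here is a throwaway Python script that counts word frequencies in a text file. Doing this in a handful of lines is exactly the sort of small win a Java expert forgoes by sticking with what they know. (The filename is made up for the example.)

    from collections import Counter

    # Count how often each word appears in a text file.
    with open("notes.txt") as f:
        counts = Counter(f.read().lower().split())

    # Print the ten most common words.
    for word, n in counts.most_common(10):
        print(word, n)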

The other side of the coin, I think, is management's perception of 'academic'. Software companies exist to produce a product and to make money with it. To do so, the product has to work well enough that users actually want to buy it, which is what makes code quality, maintainability, and long-term support matter. To most managers, though, 'academic' seems to mean whimsical programming: throwaway code with no strict quality standards. That may well be true in the language design labs or OS labs, where things are proof of concept. Outside of those areas, though, things change very much. Go to a physics, engineering, genomics, or applied math group and the situation is likely very different.

For one, investments in equipment are taken seriously. In many cases, specialized instrumentation that interfaces with computers may be kept in use for decades because of the prohibitive cost of replacing it. Any custom code written for such an instrument must work, and must be maintained over a long lifetime by different people. The stakes are also high if there are errors in the system, since a bug could invalidate all the research done with that equipment; reagents are expensive, and rerunning experiments could be a huge financial drain on a research group. Such systems, however, are generally on the smaller side of academic programming projects.

Consider instead the code bases used for working on grand-challenge-type problems. Here code quality is probably more important than in most companies. A team might spend months writing code for their scheduled run on a supercomputer, and if their code doesn't work they essentially have to buy more time and wait for their slot to come up again. This is not the time to find a null pointer exception. So these systems have to be written by teams quickly, robustly, and in a maintainable way, just like commercial software. But because many of these problems are so hugely complex, the overhead of learning a new language for a project is just not a big deal if it means the problem goes from impossible to barely possible.

Thus I think commercial software houses should look not to the CS departments for what is up-and-coming in languages, but to applied math, physics, and bioinformatics/computational biology groups. Right now that seems to point to Python, Scheme, and R.
