Perl vs. Python

So here’s a blog post that NOBODY will want to read (or at least, none of the people who typically read my posts). The handful of people who would be interested don’t even know about my blog. So, in some sense, this post is just for me.

On the off chance that you’re still reading, I’ll add some background so that if you DO choose to read this post, at least you’ll have some chance of understanding it.

Back in the 1980’s, my best friend taught me to program. He taught me in a computer language named BASIC, and I really enjoyed it. When I went to college, I took several computer science classes and learned to program in Fortran (which I later used a great deal), Pascal (which I liked a lot, but never was in a position to use), COBOL (which I abandoned immediately after the class was over), and assembly language.

I started programming a lot, primarily for fun, and learned lisp (I became fluent in two different dialects of lisp), and a little prolog and ADA (just playing around with these).

Later, in graduate school, my research in theoretical chemistry allowed me to become proficient in Fortran. I didn’t really care for the language, but it was the one in use, so I got a lot of experience using it. At the same time that I was studying chemistry, I was also working as a system administrator in our research group. This led me to learn two scripting languages: bash and perl. I also had a small side job where I learned C. A few years later, I added ruby (to a small degree) and PHP (extensively). I’ve also used java, javascript, and python, though not to a huge extent.

Learning all of these languages, even though most of them I never became truly proficient in, has given me a pretty broad AND deep understanding of programming. I’ve been completely proficient in several languages: BASIC, Fortran, lisp, bash, perl, C, and PHP. By proficient, I mean that I have written many thousands of lines in each, and at the time, I was fluent with the syntax. Although I have moved on from most of these (and I would no longer consider myself fluent in the syntax), I remain proficient in bash and perl.

One side effect of having learned so many languages is that I am able to sit down and read code in pretty much any language, even ones that I have not seen before. On numerous occasions, I have debugged programs or written extensions to programs in languages that I was not proficient in. I may have to refer to a language reference to figure out exactly how to say something in that language, but I understand WHAT I need to say, so it’s just a matter of looking it up in a reference book to see how to say it in that particular language.

Now, even though I could program in any language that I want (and I could become proficient very quickly), I have a definite favorite language: perl. I’ve been programming in perl since around 1990. It was the language du jour for computer systems programmers, and became very popular. It remained on top for quite a while, until the language Python came along, and after a few years, Python became the most popular language in that niche. Although perl has not gone away, python has become so dominant that I have been asked several times if I would switch to python. My answer has always been no, so I want to record my justification.

If you look up a comparison of perl vs. python, you generally get the following statements (plus or minus a few others):

  • both are high-level, general purpose scripting languages
  • python is simpler to read and easier to understand
  • python code blocks are marked with whitespace; perl’s are marked with braces {}
  • both have a large set of libraries and extensions
  • python’s Object-oriented support is better than perl’s

So, let me address each of these briefly.

HIGH-LEVEL

The way that I like to think of a high-level language, is that it is less like a computer and more like a human.

When talking about how things are stored, computers think about bits and bytes, memory blocks, memory pointers, etc. These are concepts which are critical to how a computer ‘thinks’, but unrelated to how a human thinks. Humans would think about storing a date as a 4-digit year, a 2-digit month, and a 2-digit day. A low-level description of these (which is how a computer would think of them) is 3 different integers: the first would be a 2-byte integer (in order to cover the numbers 1-9999), and the other two would each be a single byte (which can easily handle the 1-12 or 1-31 required to store a month or day).

When manipulating the data, the computer thinks about byte operations like addition, shifting, xor’ing, etc. These operations (with the exception of addition) are very different from how a human thinks of things. A human is going to think of things in a way more natural to the type of data. For example, you still might talk about addition when dealing with dates (i.e. what is the date 5 days from now), but the operation is far more complex than simple addition.
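To make the contrast concrete, here is a small sketch in python (used here only because the example can be run as-is). The specific date and the byte layout are illustrative assumptions, not something any particular system requires. The first half stores a date the way the low-level description suggests; the second half lets a high-level date type answer "what is the date 5 days from now".

```python
import struct
from datetime import date, timedelta

# Low-level view: a 2-byte year plus one byte each for month and day.
# "<HBB" = little-endian unsigned short, then two unsigned bytes (an assumed layout).
packed = struct.pack("<HBB", 2024, 2, 27)
year, month, day = struct.unpack("<HBB", packed)

# High-level view: a date object that knows about month lengths and leap years.
start = date(year, month, day)
later = start + timedelta(days=5)

print(len(packed))   # 4 bytes in total
print(later)         # 2024-03-03 (the addition crosses the end of a leap-year February)
```

The "simple addition" of 5 days has to know that February 2024 has 29 days, which is exactly the kind of detail a high-level language handles for you.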

The point is, both python and perl are high-level which makes them relatively easier for a human to think of than some of the lower level languages. Although I have programmed in lower level languages (such as assembly language, C, and Fortran), I definitely prefer the high-level languages. In this respect, to be honest, neither perl nor python has a compelling superiority over the other.

READABILITY

Here’s one difference that I am 95% in disagreement with. The statement is ‘python code is more readable than perl code’. Bulls***. Here’s the correct statement: readable code is more readable than unreadable code.

I have perl code that I wrote 30 years ago that I can still pick up and read. On the other hand, I’ve been given perl code that I literally could not understand at all despite being fluent in the language. On a few occasions, I have needed to read python code. Once, I got a python program that I wanted to modify. I looked at it, and it was beautiful. Clear, organized, etc. I easily made the modifications I needed. Another time, I was given a piece of python code to modify and it was so unreadable that I threw it away and rewrote it from scratch (in perl). The C programming language has the reputation for being extremely cryptic and hard to read due to it being a lower-level language, and I can confirm that I have picked up many pieces of C code which were so cryptic that I would have had to decipher every single line to determine what they were doing. However, several years ago, I looked at the C code for a program called postfix (which delivers a huge amount of the email that you read), written by a wonderful programmer named Wietse Venema. It was so beautifully written that it was like reading a story.

The point is, you can write readable code in every single language that I know of. It’s a bit harder in a few languages such as assembly language (which is basically talking directly in a language that computers understand, so it’s as far from human thought as you can get), and lisp (which orders everything in a very non-human way), but almost all languages (including C) are sufficiently high-level that with even a small amount of care, you can write readable programs. Readability is 95% programmer, and only 5% language.

The tiny 5% edge that I’m willing to give to python is that python basically has one way to do things, where perl began with the motto “there’s more than one way to do it”. That means that many operations can be performed in multiple ways in perl where there may only be one reasonable way in python. This does tend to complicate things a bit, but it has long been my practice to write things in very much the same way and to stick to it. This is why I can read my code from 30 years ago.

One other thing that python people complain about is the use of punctuation. Every piece of data in perl has a piece of punctuation attached to it, where python does not. So, in perl, you might see the variables:

     %authors
     @books
     $author

I can tell that the first variable (%authors) is a type of data called a hash in perl or a dictionary in python (the ‘%’ tells me this). This is a list of data that has pairs. For example, one piece of the data probably consists of a book name and an author’s name and %authors is a list of all of the book/author pairs.

The second variable (@books) is a list of values (the ‘@’ tells me this). So this might be the list of book titles.

The third variable ($author) is a single value (the ‘$’ tells me this). So this is the author of one of the books.

In python, the variables would simply be:

     authors
     books
     author

Although I can still deduce a lot from the names, the first one is ambiguous. Is it just a list of authors (i.e. it would correspond to a perl variable @authors), or is it the hash/dictionary? With well-written python code, you can figure it out and it can be very easy to read, but make no mistake… the punctuation in perl does NOT make it unreadable. It imparts extra information.
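For what it is worth, python code can carry similar information if the programmer chooses to add type annotations. The sketch below is only an illustration: the titles and names are made up, and the dict[str, str] spelling assumes Python 3.9 or later. Annotations are optional, and python does not enforce them when the program runs, so this again comes down to the programmer rather than the language.

```python
# Annotations play roughly the role that perl's %, @ and $ play for a reader.
authors: dict[str, str] = {          # like %authors: book title -> author name
    "Book One": "A. Writer",
    "Book Two": "B. Author",
}
books: list[str] = list(authors)     # like @books: a list of titles
author: str = authors[books[0]]      # like $author: a single value

print(books)
print(author)
```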

WHITE SPACE (INDENTATION)

The next difference is one which almost everybody I’ve ever talked to has shrugged off as unimportant. I disagree 100%. As a matter of fact, this is the single reason which I will NOT be switching to python (all other differences I could and would be willing to ignore). Python treats whitespace as part of the program. To clarify, whitespace is the spaces and tabs that just create blank space between characters that you can’t really see and, more importantly here, the spaces used to indent lines of code.

Back in the days of Fortran, we learned that using whitespace as syntax was a nuisance. In Fortran, the first few characters in each line had special meaning, so the actual code had to be indented past these characters. It was always a nuisance to get the characters in exactly the right location, and if you didn’t, the code would not work. Fortran whitespace was a nuisance, but its use was so limited that we could pretty much ignore it.

More generally, spaces have been used to indent code so that it is visually easier to read. This has been done since very early on. But, with the exception of Python (and to a much lesser degree Fortran), the indentation was just an aesthetic consideration.

So here is an example of why python got it SOOOO wrong. Here are two very simple examples of python code:

     example 1:
        for book in book_list:
           author = book["author"]
           pages = book["pages"]

     example 2:
        for book in book_list:
           author = book["author"]
        pages = book["pages"]

Notice that in the second case, the ‘pages’ line is not indented? Although you might think that’s just an aesthetic issue, it is not. Python uses indentation to define code blocks, so the first example is a single code block, but the second is two blocks (the ‘for book’ line and the ‘author’ line form one code block but the ‘pages’ line is a separate block). As such, these two examples (despite being 100% identical with the exception of whitespace) perform two very different functions!
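Here is a runnable sketch of the same idea. The book data and the pages_seen list are made up for illustration; the only difference between the two functions is the indentation of a single line.

```python
book_list = [
    {"author": "A. Writer", "pages": 100},
    {"author": "B. Author", "pages": 250},
]

def collect_indented(books):
    pages_seen = []
    for book in books:
        author = book["author"]
        pages_seen.append(book["pages"])   # inside the loop: runs once per book
    return pages_seen

def collect_dedented(books):
    pages_seen = []
    for book in books:
        author = book["author"]
    pages_seen.append(book["pages"])       # outside the loop: runs once, after it ends
    return pages_seen

print(collect_indented(book_list))   # [100, 250]
print(collect_dedented(book_list))   # [250]
```

Both functions are perfectly valid python, so nothing warns you that the second one only looks at the last book.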

In perl, you might have:

     foreach $book (@books) {
        $author = $book->{"author"};
        $pages = $book->{"pages"};
     }

Here we have indentation too, but it is purely aesthetic. The code block is determined by braces ({}) and the indentation is unrelated to that. If one (or both) of the lines were not indented, the code would still function identically.

The problem is that in any language, you use many levels of indentation to assist in readability. As I write a program, I frequently take code from one location (where it might be indented to one level) and move it to another location (where it might be indented to a different level). I do this for several reasons, but mainly to just group related pieces of code. It’s good code organization (which makes for improved readability). In perl (and EVERY other language except for Fortran and Python), indentation doesn’t matter, so move the code and don’t worry about the indentation (most editors will fix it automatically). In python, it matters a great deal, and editors can’t fix it automatically, so you have to do so very carefully.

Every time I’ve programmed in python, I’ve been bitten by this. It is so annoying, so difficult to spot (because the difference between good code and broken code is so subtle), and so needless, that I just refuse to use python. Give me a language that allows me to reorganize code as desired without introducing bugs that are exceptionally difficult to locate. There are enough other interesting languages out there (or I can just stay with perl), that there’s no need for me to put up with this nonsense.
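To illustrate why these bugs can be so hard to spot, the sketch below uses python’s built-in compile() function to check two made-up snippets of the kind you get after moving code around. Indentation that is inconsistent gets rejected before the code ever runs, but a line that has merely landed at a different (still valid) level is accepted without complaint, and that second kind is the one that bites.

```python
# Two hypothetical snippets of the kind produced by moving code around.
inconsistent = "for n in [1, 2]:\n        a = n\n    b = n\n"   # 'b' matches no outer level
shifted = "for n in [1, 2]:\n    a = n\nb = n\n"                # 'b' silently left the loop

def compiles(source):
    """Return True if python accepts the source text, False on an indentation error."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except IndentationError:
        return False

print(compiles(inconsistent))   # False: python reports the problem
print(compiles(shifted))        # True: valid code, different meaning
```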

LIBRARIES

Both perl and python have a large repository of libraries/modules that other programmers can draw upon. So, if you want code to work with dates, it’s already been done. Just go to the repository (CPAN for perl; PyPI for python) and there’s a library already written for you.

Because of its popularity, the python repository is bigger (CPAN reports a bit more than 200,000 libraries, while PyPI reports just over 500,000 projects), but comparing those two numbers is a bit of an apples to oranges comparison. The python repository consists of both libraries and programs. A library is a set of functions that can be used in a program. So if you’re writing a program to do something, and you need to work on dates, you can use the pre-written library to do all that. A program is something you run to do one specific function. In general, CPAN only includes libraries. Since PyPI contains both libraries and programs that are written using those libraries, it’s difficult to know how many of the 500,000 projects are libraries and how many are programs. Even though I cannot determine the numbers, I’m pretty sure that PyPI has more libraries than CPAN. However, CPAN currently has over 200,000 libraries! To be honest, I’ve never once asked the question “Does CPAN have a module to do XXX?” and had the answer be no. So, even though PyPI may have more… CPAN has enough to handle every task I’ve ever undertaken.

But, CPAN does have one feature that PyPI does not have, and it’s a huge one. So huge that it would ALMOST be sufficient to drive me to use perl.

Every library can have a test suite. This is a set of programs that use the functions in the library to perform certain test operations. If the expected result is obtained, the test passes. If not, then the test fails. This allows you, at the time you install the library, to test to make sure it works on your system. Sometimes it doesn’t due to differences in various computer systems, different configurations on different systems, or missing prerequisites. Having a test suite for a library is an invaluable feature. It lets me know if a library is going to work for me. It also provides me with a set of sample software so that I can see exactly how to use the library.
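As a rough illustration of what such a test looks like, here is a minimal sketch using python’s standard unittest module. The function days_from is a hypothetical stand-in for something a date library might provide; it is not part of any real package.

```python
import unittest
from datetime import date, timedelta

def days_from(start, count):
    """Hypothetical library function: the date `count` days after `start`."""
    return start + timedelta(days=count)

class DaysFromTest(unittest.TestCase):
    def test_crosses_month_boundary(self):
        self.assertEqual(days_from(date(2023, 1, 30), 5), date(2023, 2, 4))

    def test_zero_days_is_same_date(self):
        self.assertEqual(days_from(date(2023, 1, 30), 0), date(2023, 1, 30))

# Load and run the tests explicitly so the result object can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DaysFromTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Each test doubles as a tiny usage example, which is the "sample software" benefit mentioned above.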

Both python and perl support test suites in their libraries, but the perl community firmly embraced this many years ago, whereas the python community has not. It has been years since I’ve worked with a perl module which did not have a test suite included, but when I (out of curiosity) browsed through a handful of the more popular python libraries, the majority did not include test suites.

Although python COULD be as good as perl in this respect, it isn’t (not even close). And it’s a significant weakness.

OBJECT-ORIENTED

One last thing to mention is that python is object-oriented by default, whereas perl is not.

Traditionally, programming was done in a procedural (function-based) way. You want to do task XXX. To do that, you break task XXX into smaller tasks, and these might be broken into even smaller tasks. Eventually, each smaller task is “small enough” to be easily understood and programmed. Once all of the small tasks have been programmed, the overall task is complete and you now have a working program.
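A small sketch of that top-down style, in python for convenience: the overall task (a hypothetical "total pages per author" report) is split into small functions, each simple enough to understand on its own. The data and function names are made up for illustration.

```python
def load_books():
    """Small task 1: supply the raw data (hard-coded here for the sketch)."""
    return [
        {"author": "A. Writer", "pages": 100},
        {"author": "B. Author", "pages": 250},
        {"author": "A. Writer", "pages": 300},
    ]

def total_pages_by_author(books):
    """Small task 2: add up the page counts for each author."""
    totals = {}
    for book in books:
        totals[book["author"]] = totals.get(book["author"], 0) + book["pages"]
    return totals

def format_report(totals):
    """Small task 3: turn the totals into printable lines."""
    return [f"{name}: {pages} pages" for name, pages in sorted(totals.items())]

def main():
    """The overall task is just the small tasks run in order."""
    return format_report(total_pages_by_author(load_books()))

print(main())   # ['A. Writer: 400 pages', 'B. Author: 250 pages']
```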

Object-oriented programming approaches things from a different perspective. Instead, you think of the type of data you’re working with. You then consider the types of things you might want to do with that data. For example, if you are working with dates, you might want to look at it and determine what day of the week it is, what the date will be 10 days from then, or how to display that date in a specific format. So you spend your time creating functions (methods) that work on a specific type of data.

Once you have libraries written that perform all of the desired operations on all of the data types you’re working with, the actual program should be pretty simple and just a matter of working with those methods.
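As a minimal sketch of this approach, here is a hypothetical SimpleDate class in python. The class and method names are my own invention for illustration, and it simply wraps python’s standard date type. The methods correspond to the three operations mentioned above.

```python
from datetime import date, timedelta

class SimpleDate:
    """Hypothetical date type: the methods are the things you can do with a date."""

    _DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday")

    def __init__(self, year, month, day):
        self._value = date(year, month, day)

    def day_of_week(self):
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        return self._DAY_NAMES[self._value.weekday()]

    def plus_days(self, count):
        moved = self._value + timedelta(days=count)
        return SimpleDate(moved.year, moved.month, moved.day)

    def formatted(self):
        return self._value.strftime("%Y-%m-%d")

# The "actual program" is then just a few method calls.
today = SimpleDate(2024, 2, 27)
print(today.day_of_week())              # Tuesday
print(today.plus_days(10).formatted())  # 2024-03-08
```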

Object-oriented programming is, as a rule, significantly more powerful than procedural programming (and it’s far easier to reuse the code in other programs). When python was created, it had full support for object-oriented code from the very start. Perl did not. Support for object-oriented programming was added on later. When you create an object-oriented library in perl, you have to start by adding a number of lines that, in effect, say “this is object-oriented”. In python, you don’t have to make that distinction.

I’ve used perl extensively to write object-oriented libraries, and to be honest, it’s good enough. Sure, it might be nice to have it native, but it only takes a minute to set it up and once it’s set up, you never have to think of it.

There are a number of more advanced object-oriented techniques that python supports better than perl, but to be honest, when I consider many of those techniques, I find them undesirably complex. The feature I like best about object-oriented programming is the organizational aspect, which keeps a library focused only on tasks related to one type of data. Understand the data and you mostly understand the library.

More advanced object-oriented techniques add power, but at the expense of simplicity. As a general rule, I choose to forgo that trade-off.

So, python does have a slight edge here, but it’s modest.

OVERVIEW

If I had to award points for how well each language fit the above, here’s how I’d rate them:

                   Perl   Python   Notes
High-level          10      10     I see no significant difference between perl and python.
Readability          8       9     I’ll give a slight edge to python due to its one-way-to-do-things. But also note that this value is really more a function of the programmer than the language.
Whitespace          10       1     No question here. Python did it wrong. Period!
Libraries            8       4     Python has a slight edge with respect to the number of libraries, but the lack of community support for test suites really hurts python.
Object-oriented      7       9     Python gets the edge here, but for simple stuff, perl is okay.

So the problem is that in the areas where python is the winner, perl is good enough. It holds its own. The reverse is not true. Python fails (miserably IMO) in the areas where perl leads.

Anyway, this is not meant to be a criticism of python programmers. I understand that the numbers in the chart will be different for others. If you choose to use python, more power to you. Just don’t expect me to be joining the python ranks.
