Monday, June 1, 2015

Write beautiful code

Writing code is much more than an engineering discipline; it is a true art form. Each line of code you write reflects your understanding of the language, your ability to communicate ideas to other engineers, and the lessons lessons from previous projects.


There is no "correct" code style, just as there is no "correct" solution to any given engineering task. The internet is littered with style guides written in absolutes, and these can be useful, but often forget that the context of a project is as important as the code itself. Mocking up a proof of concept for a hobby project is very different than implementing a base class to be used by your entire team.


When you write code, your first priority is to satisfy functional and performance requirements. This is universally requisite, but as engineers, we often forget the importance of ensuring that our code be easy to understand by other programmers (including your future self). Remember that time you looked at a project from six months ago and had no idea how it worked? It doesn't have to be that way.


Let's dispel some myths about writing beautiful code.


Myth #1 - Verbose comments make you a bad programmer


Many engineers profess that comments are unnecessary, and that a good engineer should be able to read code completely devoid of comments. This sounds cool, and conjures up an image of a badass motherfucking sorcerer-hacker wizard-god reading the matrix on 16 goddamn monitors at once. The thing is, as mere mortals, we are better at processing natural language than code. Certainly we need to be able to read and understand code quickly, but it is usually easier to parse a few sentences in your native language than to truly understand the code in an unfamiliar function.


Myth #2 - White space is bad


A common school of thought is that less code is better, and less white space is better. Surely if we remove all blank lines and keep our code as tiny as possible, everybody wins, right? I mean, classic Greek and Latin was written in all caps and without spaces, so we should apply that to our code, right?


Let's consider the following function, with and without comments and line breaks.


No comments, no line breaks


std::vector<std::complex<double>> plcp_data::encode(Rate rate, std::vector<unsigned char> payload, unsigned short service)
{
   RateParams rate_params = RateParams(rate);
   int num_symbols = std::ceil(double((16 + 8 * (payload.size() + 4) + 6)) / double(rate_params.dbps));
   int num_data_bits = num_symbols * rate_params.dbps;
   int num_data_bytes = num_data_bits / 8;
   int num_pad_bits = num_data_bits - (16 + 8 * (payload.size() + 4) + 6    std::vector<unsigned char> data(num_data_bytes+1, 0);
  memcpy(&data[0], &service, 2);
   memcpy(&data[2], payload.data(), payload.size());
  boost::crc_32_type crc;
   crc.process_bytes(&data[0], 2 + payload.size());
   unsigned int calculated_crc = crc.checksum();
   memcpy(&data[2 + payload.size()], &calculated_crc, 4);
  std::vector<unsigned char> scrambled(num_data_bytes+1, 0);
   int state = 93, feedback = 0;
   for(int x = 0; x < num_data_bytes; x++)
   {
      feedback = (!!(state & 64)) ^ (!!(state & 8));
      scrambled[x] = feedback ^ data[x];
      state = ((state << 1) & 0x7E) | feedback;
   }
   data.swap(scrambled);
std::vector<unsigned char> data_encoded(num_data_bits * 2, 0);
   viterbi v;
   v.conv_encode(&data[0], data_encoded.data(), num_data_bits-6);
   std::vector<unsigned char> data_punctured = puncturer::puncture(data_encoded, rate_params);
  interleaver inter;
   std::vector<unsigned char> data_interleaved = inter.interleave(data_punctured);
   std::vector<std::complex<double>> data_modulated = modulator::modulate(data_interleaved, rate);
  return data_modulated;
}


Verbose comments, liberal line breaks


// Encode the data portion of a PLCP frame given a rate, payload, and service field
std::vector<std::complex<double>> plcp_data::encode(Rate rate, std::vector<unsigned char> payload, unsigned short service)
{
   // Get the parameters associated with the given rate
   RateParams rate_params = RateParams(rate);

   // Calculate the number of symbols
   int num_symbols = std::ceil(
           double((16 /* service */ + 8 * (payload.size() + 4 /* CRC */) + 6 /* tail */)) /
           double(rate_params.dbps));

   // Calculate the number of data bits/bytes (including padding bits)
   int num_data_bits = num_symbols * rate_params.dbps;
   int num_data_bytes = num_data_bits / 8;

   // Calculate the number of padding bits (past the 6 tail bits)
   int num_pad_bits = num_data_bits - (16 /* service */ + 8 * (payload.size() + 4 /* CRC */) + 6 /* tail */);

   // Concatenate the service and payload
   std::vector<unsigned char> data(num_data_bytes+1, 0);
   memcpy(&data[0], &service, 2);
   memcpy(&data[2], payload.data(), payload.size());

   // Calcualate and append the CRC
   boost::crc_32_type crc;
   crc.process_bytes(&data[0], 2 + payload.size());
   unsigned int calculated_crc = crc.checksum();
   memcpy(&data[2 + payload.size()], &calculated_crc, 4);

   // Scramble (whiten) the data
   std::vector<unsigned char> scrambled(num_data_bytes+1, 0);
   int state = 93, feedback = 0;
   for(int x = 0; x < num_data_bytes; x++)
   {
      feedback = (!!(state & 64)) ^ (!!(state & 8));
      scrambled[x] = feedback ^ data[x];
      state = ((state << 1) & 0x7E) | feedback;
   }
   data.swap(scrambled);

   // Convolutionally encode the data
   std::vector<unsigned char> data_encoded(num_data_bits * 2, 0);
   viterbi v;
   v.conv_encode(&data[0], data_encoded.data(), num_data_bits-6);

   // Puncture the data
   std::vector<unsigned char> data_punctured = puncturer::puncture(data_encoded, rate_params);

   // Interleave the data
   interleaver inter;
   std::vector<unsigned char> data_interleaved = inter.interleave(data_punctured);

   // Modulate the data
   std::vector<std::complex<double>> data_modulated = modulator::modulate(data_interleaved, rate);
   
return data_modulated;
}

A good engineer can parse either function and figure out what's going on, and that makes sense, it's kinda what we do. The thing is, it takes at least an order of magnitude longer to parse and understand the uncommented version.

Programmers often argue that comments are redundant, but as projects grow in size and complexity, comments are like the notes you use to cram before a big exam. In a profession that values efficiency, it makes sense to comment early and often.

Difficult to decipher code might look cool, and obfuscated programming competitions reveal some pretty fucking rad language features. Unfortunately, we live in the real world, and in the real world we need to cater to every engineer who might look at your code in the future. Don't be a dick. Comment your code.
Myth #3 - Naming conventions matter


This is really a half-myth. In the ever raging battle between PascalCase, camelCase, underscore_case, I_LOVE_LAMP_CASE and the like, there will never be a winner. It is a pointless war. Everybody has a preference on naming conventions, and some languages overwhelmingly prefer one style over the other. The only thing that matters is consistency.


  • If you are working on an existing project, for the love of science, follow the conventions already in place.
  • If you are starting a new project, figure out what naming conventions you want to use, and stick with them.
  • If you are writing a project in a language with a style guide, it is good to follow the naming conventions defined in it. It might not matter in the scope of your project, but six months from now, your company is going to hire a C# veteran. This veteran is going to use camelCase and fuck up the underscore_case you lobbied so hard for. You are going to get sad, and it is better to be happy. Be happy.

Myth #4 - Using third party libraries is bad

Engineers like building things and can be hesitant to use others' code. I love the sense of accomplishment of implementing a complex algorithm in an efficient way. It feels great, and I think, "golly gee, I must be a good engineer, I implemented this complex mathsomethingorotherfunction(...) in only two days! Go me! I can do hard things!"

The thing is, most of the time, had I Googled around a bit, I would have found an MIT/BSD licensed library to do exactly what I needed and more.  If somebody else has already written the code you need, and it satisfies the licensing requirements of your current project, use it. As much fun as it is to have complete ownership over your code, you could have spent those two days building something new and original instead.

Get to the point already...

I could spend all day ranting about what not to do, but you would probably just get bored and go back to Reddit or Hacker News or Slashdot or whatever it is that you do at work, so it is about time to get to the point. If you take away one thing from this blog post, it is this:

Write your code so a freshman computer science student can read it. Keep it simple and don't over-complicate things.

The enlightened programmer will leave his or her ego at the door, write as little, well commented code as possible, and be more productive for it.

Write beautiful code.

8 comments:

  1. Myth 4 I think needs some additional rules – Use 3rd party Libraries – but check the code: read it, test it, looking for security holes. Sooner or later you’ll need to explain why you think it’s trustworthy.

    ReplyDelete
  2. I don't agree with verbose comments, it is not wise to write a comment that pretty much resembles the code that it comments, like in many places of your example above. However, algorithmic stuff should be heavily and verbosely commented.

    ReplyDelete
  3. It's not that comments are bad, it's just that often times code can be written such that comments are redundant. Thus any time you feel comments are necessary for the sake of comprehensibility, that's a good to time to stop and determine if it could be made more clear with some minor refactoring.

    For example (pardon the language departure, but I find c# a little less verbose):

    *** Before ***
    void UpdatePage(User currentUser){

    List permissions = _permissionService.GetUserPermissions(currentUser);

    //check if the user is allowed to edit content and ensure that the page is editable before saving changes
    if(permissions.Any(p => p.Name.Equals("Administrator", StringComparison.OrdinalIgnoreCase || p.Name.Equals("Editor", StringComparison.OrdinalIgnoreCase) && !this.ReadOnlyMode)
    {
    _contentService.Update(this.Content);
    }

    }

    *** After ***

    void UpdatePage(User currentUser){

    if(UserCanUpdatePage(currentUser)
    {
    _contentService.Update(this.Content);
    }
    }

    bool UserCanUpdatePage(User currentUser){
    return UserIsEditor(currentUser) && !this.ReadOnlyMode;
    }

    bool UserIsEditor(User currentUser){

    List permissions = _permissionService.GetUserPermissions(currentUser);

    return permissions.Any(p => p.Name.Equals("Administrator", StringComparison.OrdinalIgnoreCase || p.Name.Equals("Editor", StringComparison.OrdinalIgnoreCase)

    }


    I'm sure this could be refactored to make even more sense, but my point is that in the second example the motivation expressed in the comments has been expressed literally in the code. My trivial example really doesn't do justice to the ideas Uncle Bob Martin writes about in his book "Clean Code: A Handbook of Agile Software Craftsmanship".

    ReplyDelete
  4. Yes! This!

    The "rule" has been taking incorrectly: it's not that comments are EVIL, it's that they are indicative that you're not able to make your code expressive enough.
    The harsh truth is that sometimes you just can't -- your code caters for a situation which is just crazy (like working around an OS bug or dealing with a "super-wat" that comes out of a third-party library. But *mostly*, you can.

    Bearing in mind that it's been a while since I was in CPP-land and that I did the following refactor without the aid of a compiler to complain at me, the only comments left are:
    * to point out a redundant comment
    * to point out that my pointer logic isn't what it used to be so I hold temporary variables for dereferenced pointers so I can copy out your code into the refactored methods
    * TODO comments where a magic variable should be refactored into a constant which makes it readable:

    Code should be fully readable WITHOUT COMMENTS as maintaining comments adds maintenance overhead -- comments often rot out of sync with the code and just introduce a point of confusion.
    The burden isn't on the READER to be able to magically understand the writer's intent -- the burden is on the WRITER to write code which reads well -- preferably like a story (which is why I end up having words like "For" in method names).

    This comment and the code exceeded 4096 characters, so you can check out the refactor here:
    https://github.com/fluffynuts/write-beautiful-code-refactor-comments-must-die/blob/master/plcp_fragment.cpp
    Please bear in mind:
    1) I can't compile this to check for obvious errors
    2) I haven't lived in C++ in around 4 years, so I am rusty (hence the rusty comments where appropriate)

    I still think expressive code can be done without comments, except where absolutely necessary. The compiler doesn't care about the length of your identifier strings. Modern IDEs make long names easy to deal with. People appreciate readable code. So why not make the code self-expressive?

    ReplyDelete
  5. I agree with all of this.
    IMHO - comments should tell you the WHY - not the what.
    My most often-asked question when reading somebody else's code (including my past self from 6 months ago) is, "Why is it done like that? Is there something I don't know about and therefore something that will break if I change it?"

    Comments should warn of traps for the unwary or the un-initiated. Especially where something a bit out of the ordinary is done.
    Examples:
    // Need to read this register twice due to a fault in the silicon - see the Errata sheet
    ...
    // Deliberately using pointers and not references as we sometimes want to pass in 'nullptr'
    ...
    // Don't make this 'constexpr' as that's not supported by VS3013 yet - revisit if/when it is
    ...
    etc, etc

    ReplyDelete
  6. Technically, if that function were broken down into its sub-functions, you wouldn't really need the comments to figure out what each piece is doing:

    GetNumberOfSymbols();
    GetNumberOfBitsAndBytes();
    GetNumberOfPaddingBits();
    Concatenate();
    AppendCRC();
    Scramble();
    Encode();
    Puncture();
    Interleave();
    Modulate();

    ReplyDelete