Monday, June 1, 2015

Write beautiful code

Writing code is much more than an engineering discipline; it is a true art form. Each line of code you write reflects your understanding of the language, your ability to communicate ideas to other engineers, and the lessons lessons from previous projects.


There is no "correct" code style, just as there is no "correct" solution to any given engineering task. The internet is littered with style guides written in absolutes, and these can be useful, but often forget that the context of a project is as important as the code itself. Mocking up a proof of concept for a hobby project is very different than implementing a base class to be used by your entire team.


When you write code, your first priority is to satisfy functional and performance requirements. This is universally requisite, but as engineers, we often forget the importance of ensuring that our code be easy to understand by other programmers (including your future self). Remember that time you looked at a project from six months ago and had no idea how it worked? It doesn't have to be that way.


Let's dispel some myths about writing beautiful code.


Myth #1 - Verbose comments make you a bad programmer


Many engineers profess that comments are unnecessary, and that a good engineer should be able to read code completely devoid of comments. This sounds cool, and conjures up an image of a badass motherfucking sorcerer-hacker wizard-god reading the matrix on 16 goddamn monitors at once. The thing is, as mere mortals, we are better at processing natural language than code. Certainly we need to be able to read and understand code quickly, but it is usually easier to parse a few sentences in your native language than to truly understand the code in an unfamiliar function.


Myth #2 - White space is bad


A common school of thought is that less code is better, and less white space is better. Surely if we remove all blank lines and keep our code as tiny as possible, everybody wins, right? I mean, classic Greek and Latin was written in all caps and without spaces, so we should apply that to our code, right?


Let's consider the following function, with and without comments and line breaks.


No comments, no line breaks


std::vector<std::complex<double>> plcp_data::encode(Rate rate, std::vector<unsigned char> payload, unsigned short service)
{
   RateParams rate_params = RateParams(rate);
   int num_symbols = std::ceil(double((16 + 8 * (payload.size() + 4) + 6)) / double(rate_params.dbps));
   int num_data_bits = num_symbols * rate_params.dbps;
   int num_data_bytes = num_data_bits / 8;
   int num_pad_bits = num_data_bits - (16 + 8 * (payload.size() + 4) + 6    std::vector<unsigned char> data(num_data_bytes+1, 0);
  memcpy(&data[0], &service, 2);
   memcpy(&data[2], payload.data(), payload.size());
  boost::crc_32_type crc;
   crc.process_bytes(&data[0], 2 + payload.size());
   unsigned int calculated_crc = crc.checksum();
   memcpy(&data[2 + payload.size()], &calculated_crc, 4);
  std::vector<unsigned char> scrambled(num_data_bytes+1, 0);
   int state = 93, feedback = 0;
   for(int x = 0; x < num_data_bytes; x++)
   {
      feedback = (!!(state & 64)) ^ (!!(state & 8));
      scrambled[x] = feedback ^ data[x];
      state = ((state << 1) & 0x7E) | feedback;
   }
   data.swap(scrambled);
std::vector<unsigned char> data_encoded(num_data_bits * 2, 0);
   viterbi v;
   v.conv_encode(&data[0], data_encoded.data(), num_data_bits-6);
   std::vector<unsigned char> data_punctured = puncturer::puncture(data_encoded, rate_params);
  interleaver inter;
   std::vector<unsigned char> data_interleaved = inter.interleave(data_punctured);
   std::vector<std::complex<double>> data_modulated = modulator::modulate(data_interleaved, rate);
  return data_modulated;
}


Verbose comments, liberal line breaks


// Encode the data portion of a PLCP frame given a rate, payload, and service field
std::vector<std::complex<double>> plcp_data::encode(Rate rate, std::vector<unsigned char> payload, unsigned short service)
{
   // Get the parameters associated with the given rate
   RateParams rate_params = RateParams(rate);

   // Calculate the number of symbols
   int num_symbols = std::ceil(
           double((16 /* service */ + 8 * (payload.size() + 4 /* CRC */) + 6 /* tail */)) /
           double(rate_params.dbps));

   // Calculate the number of data bits/bytes (including padding bits)
   int num_data_bits = num_symbols * rate_params.dbps;
   int num_data_bytes = num_data_bits / 8;

   // Calculate the number of padding bits (past the 6 tail bits)
   int num_pad_bits = num_data_bits - (16 /* service */ + 8 * (payload.size() + 4 /* CRC */) + 6 /* tail */);

   // Concatenate the service and payload
   std::vector<unsigned char> data(num_data_bytes+1, 0);
   memcpy(&data[0], &service, 2);
   memcpy(&data[2], payload.data(), payload.size());

   // Calcualate and append the CRC
   boost::crc_32_type crc;
   crc.process_bytes(&data[0], 2 + payload.size());
   unsigned int calculated_crc = crc.checksum();
   memcpy(&data[2 + payload.size()], &calculated_crc, 4);

   // Scramble (whiten) the data
   std::vector<unsigned char> scrambled(num_data_bytes+1, 0);
   int state = 93, feedback = 0;
   for(int x = 0; x < num_data_bytes; x++)
   {
      feedback = (!!(state & 64)) ^ (!!(state & 8));
      scrambled[x] = feedback ^ data[x];
      state = ((state << 1) & 0x7E) | feedback;
   }
   data.swap(scrambled);

   // Convolutionally encode the data
   std::vector<unsigned char> data_encoded(num_data_bits * 2, 0);
   viterbi v;
   v.conv_encode(&data[0], data_encoded.data(), num_data_bits-6);

   // Puncture the data
   std::vector<unsigned char> data_punctured = puncturer::puncture(data_encoded, rate_params);

   // Interleave the data
   interleaver inter;
   std::vector<unsigned char> data_interleaved = inter.interleave(data_punctured);

   // Modulate the data
   std::vector<std::complex<double>> data_modulated = modulator::modulate(data_interleaved, rate);
   
return data_modulated;
}

A good engineer can parse either function and figure out what's going on, and that makes sense, it's kinda what we do. The thing is, it takes at least an order of magnitude longer to parse and understand the uncommented version.

Programmers often argue that comments are redundant, but as projects grow in size and complexity, comments are like the notes you use to cram before a big exam. In a profession that values efficiency, it makes sense to comment early and often.

Difficult to decipher code might look cool, and obfuscated programming competitions reveal some pretty fucking rad language features. Unfortunately, we live in the real world, and in the real world we need to cater to every engineer who might look at your code in the future. Don't be a dick. Comment your code.
Myth #3 - Naming conventions matter


This is really a half-myth. In the ever raging battle between PascalCase, camelCase, underscore_case, I_LOVE_LAMP_CASE and the like, there will never be a winner. It is a pointless war. Everybody has a preference on naming conventions, and some languages overwhelmingly prefer one style over the other. The only thing that matters is consistency.


  • If you are working on an existing project, for the love of science, follow the conventions already in place.
  • If you are starting a new project, figure out what naming conventions you want to use, and stick with them.
  • If you are writing a project in a language with a style guide, it is good to follow the naming conventions defined in it. It might not matter in the scope of your project, but six months from now, your company is going to hire a C# veteran. This veteran is going to use camelCase and fuck up the underscore_case you lobbied so hard for. You are going to get sad, and it is better to be happy. Be happy.

Myth #4 - Using third party libraries is bad

Engineers like building things and can be hesitant to use others' code. I love the sense of accomplishment of implementing a complex algorithm in an efficient way. It feels great, and I think, "golly gee, I must be a good engineer, I implemented this complex mathsomethingorotherfunction(...) in only two days! Go me! I can do hard things!"

The thing is, most of the time, had I Googled around a bit, I would have found an MIT/BSD licensed library to do exactly what I needed and more.  If somebody else has already written the code you need, and it satisfies the licensing requirements of your current project, use it. As much fun as it is to have complete ownership over your code, you could have spent those two days building something new and original instead.

Get to the point already...

I could spend all day ranting about what not to do, but you would probably just get bored and go back to Reddit or Hacker News or Slashdot or whatever it is that you do at work, so it is about time to get to the point. If you take away one thing from this blog post, it is this:

Write your code so a freshman computer science student can read it. Keep it simple and don't over-complicate things.

The enlightened programmer will leave his or her ego at the door, write as little, well commented code as possible, and be more productive for it.

Write beautiful code.