Encoding

Encoding is the simplest of the three to understand. In programming, an encoding defines how raw data should be interpreted so that it can be read, rendered, or displayed. An obvious example of this is character encoding. In our class, we often work with MySQL, which has historically defaulted to a Latin character set (latin1) for text; MySQL 8.0 changed that default to utf8mb4. Your web pages, on the other hand, are typically declared to be in UTF-8, which is the standard encoding for globally friendly web pages. You may have seen the following snippet in the source of HTML pages: <meta charset="UTF-8">. This tag should appear as early as possible inside the <head> element (browsers expect it within the first 1024 bytes of the document), and this short form belongs to HTML5.

Put simply, the difference between Latin and UTF-8 character encoding is how the bytes should be read by the parser or rendering agent. The same bytes can stand for different characters under each encoding, so a computer needs to be told which encoding is in use to show the text properly. This is the same reason why you can't simply rename a .wav file to .mp3: the encoding would be wrong, and a computer could not play it without errors.

The Latin character set works fine for the majority of websites in English; however, if you start using special characters, such as characters with accent marks, and the database and the page disagree about the encoding, you may begin to see empty squares or unexpected characters. If you've ever opened a binary file in Notepad, you'll have seen something like this.
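To make the "same bytes, different characters" point concrete, here is a minimal sketch in plain PHP. The helper latin1_to_utf8() is a hypothetical function written only for this illustration (in real code you would more likely reach for an extension such as mbstring or iconv). It shows that "é" is one byte in Latin-1 but two bytes in UTF-8, and what happens when UTF-8 bytes are mistakenly read as Latin-1.

```php
<?php
// Hypothetical helper for illustration: re-encode a Latin-1 (ISO-8859-1)
// byte string as UTF-8, one byte at a time.
function latin1_to_utf8($bytes)
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $code = ord($bytes[$i]);
        if ($code < 0x80) {
            // ASCII range: identical in both encodings.
            $out .= chr($code);
        } else {
            // 0x80-0xFF: becomes a two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
        }
    }
    return $out;
}

$latin1 = "\xE9";     // "é" stored as Latin-1: one byte
$utf8   = "\xC3\xA9"; // "é" stored as UTF-8: two bytes

echo strlen($latin1), " byte vs ", strlen($utf8), " bytes\n";

// Correct conversion: Latin-1 bytes read as Latin-1.
echo bin2hex(latin1_to_utf8($latin1)), "\n"; // c3a9

// Wrong assumption: UTF-8 bytes read as if they were Latin-1.
$garbled = latin1_to_utf8($utf8);
echo bin2hex($garbled), "\n"; // c383c2a9
echo $garbled, "\n";          // shows "Ã©" on a UTF-8 terminal
```

The last line is the classic symptom of a page and a database disagreeing about the encoding: the data is intact, but it is being read with the wrong rules.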

Encryption

Encryption is commonly mistaken for encoding. Encryption is not encoding! Encryption transforms a message in such a way that you need some sort of key to decrypt it. Any World War II buffs will know about the Enigma machine, which is a great example of encryption. There are many encryption algorithms and protocols used for a variety of purposes. Encryption can be secure; however, it has limitations. SSL (and its successor, TLS) can use RSA encryption, which relies on a pair of keys: one public and one private. RSA works because multiplying two large prime numbers is fast for a computer, while recovering those primes from their product (factoring) is extremely slow. You can find much more information about this subject online, and Loyola offers a mathematics course on encryption as well.
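The public/private key idea can be sketched with a toy RSA example. The numbers below are the small textbook values p = 61 and q = 53, chosen only so the arithmetic is easy to follow; real keys use primes hundreds of digits long, and mod_pow() is a hypothetical helper written for this sketch. Never use code like this to protect real data.

```php
<?php
// Hypothetical helper: modular exponentiation by repeated squaring,
// i.e. ($base ** $exp) % $mod without building huge intermediate numbers.
function mod_pow($base, $exp, $mod)
{
    $result = 1;
    $base = $base % $mod;
    while ($exp > 0) {
        if ($exp & 1) {
            $result = ($result * $base) % $mod;
        }
        $base = ($base * $base) % $mod;
        $exp >>= 1;
    }
    return $result;
}

// Toy key generation (assumed textbook values, far too small to be secure).
$p = 61;
$q = 53;
$n = $p * $q; // 3233: public modulus, easy to compute from p and q
$e = 17;      // public exponent
$d = 2753;    // private exponent: (e * d) % ((p - 1) * (q - 1)) == 1

$message = 65;
$cipher  = mod_pow($message, $e, $n); // anyone can encrypt with (n, e)
$plain   = mod_pow($cipher, $d, $n);  // only the holder of d can decrypt

echo "cipher: $cipher\n"; // cipher: 2790
echo "plain:  $plain\n";  // plain:  65
```

An attacker who sees only n = 3233 and e = 17 has to factor n back into 61 and 53 to find d. That is trivial here and impractical when n is thousands of bits long.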

Hashing

Hashing is similar to encryption; however, the point of hashing is that it works only one way. You cannot decrypt a hash, because a hash is not encrypted to start with; however, some hashing algorithms are built on encryption algorithms. You may ask the question, "But if I can't decrypt the password, how do I know if it matches?" Simple. You hash the submitted password the same way and check that the hashes "match." (I use this term lightly: with the better, more secure hashes, the same password produces a different hash string every time because of a random salt, so you verify by re-hashing with the stored salt rather than by comparing two independently generated strings.) There are many hashing algorithms. The most commonly known are MD5, SHA1, and Bcrypt (which is based on the Blowfish cipher). Bcrypt is one of the most popular options for hashing passwords.

So what is the difference between hashing and encryption, if hashing can use encryption algorithms? Why can you not decrypt a hash? Hashes use those algorithms for a different purpose. A hash is a digest: a relatively small, fixed-length string (block of text) that provides a unique, or close to unique, representation of the original input. (Thanks to Dr. Stephen Doty for his input here.) Because the digest is usually far smaller than the input it represents, information is thrown away along the way and there is nothing left to decrypt.

Password hashes almost always use a salt to secure the text. A salt is a random piece of data that is combined with the original input string before hashing, so that identical passwords do not produce identical hashes. One of the key properties of a hash is that changing a single bit in the original input string produces a completely different hash.
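Here is a minimal sketch of the digest properties described above, using PHP's built-in hash() function. SHA-256 is my choice for the illustration, not an algorithm named earlier in this article, and a fast hash like this is not a substitute for Bcrypt when storing passwords.

```php
<?php
// Two inputs that differ by a single character.
$first  = hash('sha256', 'correct horse');
$second = hash('sha256', 'correct horsf');

// The digest has a fixed length regardless of the input size.
echo strlen($first), "\n";                                 // 64 hex characters
echo strlen(hash('sha256', str_repeat('x', 10000))), "\n"; // still 64

// A tiny change in the input produces a completely different digest.
echo $first, "\n";
echo $second, "\n";
var_dump($first === $second); // bool(false)

// Hashing is deterministic: the same input always gives the same digest,
// which is what makes "check that the hashes match" possible.
var_dump($first === hash('sha256', 'correct horse')); // bool(true)
```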

Rainbow Tables

Rainbow tables are one of the many reasons that MD5 and SHA1 are not acceptable solutions for securing sensitive data. Wikipedia summarizes a rainbow table far better than I can: Wikipedia - Rainbow Tables. In short, a rainbow table is a precomputed table for a known algorithm, such as MD5, that lets an attacker look up a hash and recover a plaintext input that produces it.
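The idea can be sketched with a plain precomputed lookup table. To be clear, this is a simplification: a real rainbow table uses chains of hashes to trade lookup time for storage, and the password list and lookup_hash() helper below are assumptions made up for the illustration. The attack it enables is the same, though.

```php
<?php
// Assumed list of common passwords, for illustration only.
$commonPasswords = array('123456', 'password', 'letmein', 'qwerty');

// Precompute once: MD5 digest => plaintext.
$table = array();
foreach ($commonPasswords as $candidate) {
    $table[md5($candidate)] = $candidate;
}

// Hypothetical helper: return the known plaintext for a hash, or null.
function lookup_hash($table, $hash)
{
    return isset($table[$hash]) ? $table[$hash] : null;
}

// An unsalted MD5 hash "leaked" from a database is reversed instantly.
$leaked = md5('password');
echo lookup_hash($table, $leaked), "\n"; // password

// The same password with a salt prepended is no longer in the table.
$salted = md5('x9!k' . 'password');
var_dump(lookup_hash($table, $salted)); // NULL
```

The second lookup fails only because the table was built without that salt. A salt forces the attacker to rebuild the table for every salt value, which is why password hashes use them.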

Why you should never ever ever never ever use MD5 or SHA1 (for sensitive data)

MD5 was created in 1991. The first security flaw was discovered in 1996. Several more have been found since then. The bottom line is that, using rainbow tables, many unsalted MD5 password hashes can be cracked in less than one second. If you are a business storing passwords in your database using only MD5 or SHA1 to secure them, you are making a terrible mistake and may well be exposed to lawsuits for negligence if your database is hacked and the hashes are released. SHA1 has known vulnerabilities as well. Also keep in mind that governments, including the US government, have reportedly worked to weaken cryptographic standards.

An aside: there are some uses for these algorithms, such as checksums with MD5; however, you should never rely upon these to secure sensitive data.

How to implement secure hashes using PHP

Implementing secure hashing algorithms in PHP is incredibly easy as of PHP 5.5, which added a native password hashing API. If you are not yet running PHP 5.5 but are using 5.3.7 through 5.4.x, you can get the same behaviour through a compatibility library. If you are not using at least 5.3.7, then Bcrypt is not properly implemented in PHP and you should strongly consider upgrading your install. If you're unsure which PHP version you are running, enter the following code into a new PHP file and run it in your browser: <?php echo phpversion(); ?>.
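If you would rather have a script make that decision for you, the version ranges above can be sketched as a small check. The function name password_api_support() and its return labels are hypothetical, made up for this illustration; version_compare() and PHP_VERSION are built into PHP.

```php
<?php
// Hypothetical helper: classify a PHP version string according to the
// ranges described above.
function password_api_support($version)
{
    if (version_compare($version, '5.5.0', '>=')) {
        return 'native';      // password hashing API is built in
    }
    if (version_compare($version, '5.3.7', '>=')) {
        return 'compat';      // usable through the compatibility library
    }
    return 'unsupported';     // Bcrypt is not reliable here; upgrade
}

echo PHP_VERSION, ': ', password_api_support(PHP_VERSION), "\n";
```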

If you are running PHP 5.3.7 through 5.4.x, you should clone the following repository into your PHP working directory: PHP Hashing API for 5.3.7 through 5.4.x. This library is already included in the source code for our example. You do not need to disable it if you are running 5.5 or higher, as it will degrade gracefully.
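Before the fuller examples, here is a minimal sketch of the native API on PHP 5.5 or higher. The password string and the cost of 10 are assumptions chosen for the illustration; password_hash() and password_verify() are the built-in functions.

```php
<?php
$password = 's3cret-passphrase'; // assumed example input

// Hash the same password twice. Each call generates its own random salt.
$options = array('cost' => 10);
$hash    = password_hash($password, PASSWORD_BCRYPT, $options);
$again   = password_hash($password, PASSWORD_BCRYPT, $options);

echo $hash, "\n";            // 60 characters, starting with $2y$10$
var_dump($hash === $again);  // bool(false): different salts, different strings

// Verification re-hashes the input using the salt stored inside the hash.
var_dump(password_verify($password, $hash));  // bool(true)
var_dump(password_verify($password, $again)); // bool(true)
var_dump(password_verify('wrong', $hash));    // bool(false)
```

Notice that you never compare the two hash strings yourself. You store the output of password_hash() and let password_verify() do the matching.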

Next, let us view some examples.