Assignment title: Information

Cryptographic Hash Functions This problem introduces a hash function that is similar to the SHA1

cryptographic hashes, but it operates on letters instead of bytes, and it's feasible to compute by hand. It is called the Toy Tetra-graph Hash (TTH), and just like SHA, TTH computes a fixed-length value for any given message. Despite it's triviality, it uses the same basic ideas of the cryptographically "secure" SHA hashes, and it will teach

you how they operate, familiarize you with the concept of a hash collision, and lead you to think about their implications. Here's how TTH works: First, it divides a given message into blocks of 16 letters, ignoring spaces, and punctuation, and capitalizes the letters. The letters in this matrix are converted to indices in the range

[0,25]. If the message length is not divisible by 16, then the last block is padded with 'A's. A four-number tuple, called the state of the hash, is maintained that is initialized with the value (0,0,0,0) and it is input to the TTH "compression function" for processing the first block. The compression function consists of two rounds that are applied to every 16-character segment of the message, until the entire message has been processed.

If your message creates a set of 16-character blocks: B1, B2, B3, … , BN , then the hash of the message is the following sum: (0,0,0,0) + K(B1) + K (B2) + K(B3) + … + K(BN) where all operations are addition modulo 26. 1https://en.wikipedia.org/wiki/Secure_Hash_Algorithm

16 indices 4 indices Round 1 Round 2

Sum1 Sum2 + compressor

Round 1 Arrange the given block of text into a row-wise 4x4 matrix and finally convert it to indices (A=0, B=1, etc.). For example, for the block ABCDEFGHIJKLMNOP, we have: Text

A B C D E F G H I J K L M N O P

Indices 0 1 2 3 4 5 6 7 8 9 10 11

12 13 14 15 24 2 6 10 Sum = (24, 2, 6, 10), State = (24, 2, 6, 10) +

(0,0,0,0) Then, add each index column modulo 26 and add these results to the current state, modulo 26. In this example the state becomes (24,2,6,10).

Round 2 Using the index matrix result from Round 1, rotate the first row left by 1, the second row by 2, the third row by 3, and then reverse the order of row 4. B C D A

G H E F L I J K P O N M 1 2 3 0 6 7 4 5

11 8 9 10 15 14 13 12

7 5 3 1 Sum = (7, 5, 3, 1), State = (5, 7, 9, 11) =

(24,2,6,10) + (7, 5, 3, 1) Now, add each new column modulo 26 and then add these new results to the current state, modulo 26. The new state is now (5, 7, 9, 11)2

. (And, if there are subsequent blocks in the given message, because it's longer than 16 characters, the current state is used as the input to the first round of the compression function for the next block of text, etc.) After the final block is processed, convert the final state value to letters; this set of four letters is the TTH hash of your message. In this example, when the message is ABCDEFGHIJKLMNOP, the hash is FHJL. 2The value (5, 7, 9, 11) is the sum of the state from Round 1 (24, 2, 6, 10) and the sum from Round 2

(7, 5, 3, 1). (31 mod 26, 9 mod 26, 15 mod 26, 21 mod 26) Problem 1 (40 points): Calculate the hash value for the 22-letter message: "a stitch in time saves nine", or equivalently:

ASTITCHINTIMESAVESNINE, after removing whitespace, and capitalizing. Problem 2 (40 points):

Calculate the hash value for the 22-letter message: "a stitch in time saves none", or equivalently: ASTITCHINTIMESAVESNONE

At this point you may have noticed the hash values for the two messages are different by two characters, despite the fact that the messages differ by only one character. This is a desired property of a "good" hash function. However, it's mathematically impossible to ensure this: if two given messages

are identical, their hash values will be identical too. If two messages are different then it's "likely' their hash values will be different, but, it's not mathematically possible to design a hash function that has this property. Problem 3 (20 points):

To demonstrate the weakness of TTH, find a different 22-letter block that produces the same hash value as problem 1. Hint: Use lots of As. (The "message" that creates the collision doesn't needn't be intelligible, complete gibberish will suffice.) Discussion In problem 3, you created a hash collision: http://en.wikipedia.org/wiki/Collision_

%28computer_science%29. "Good" cryptographic hash functions have the property that creating collisions is "hard". (Obviously, the TTH is a poor quality cryptographic hash algorithm.) Read the following wikipedia article on cryptographic hash functions: and become aware that cryptographic hash functions are used for the following purposes, and much,

much more:http://en.wikipedia.org/wiki/Cryptographic_hash_function 1.) Data Integrity. For example, the signature of an X.509 certificate (http://en.wikipedia.org/wiki/X.509) is based on the hash value of its content. This implies that modifying the content of a certificate will (with very high probability) make the signature invalid. i.e.,

the signature can be used to detect tampering, and this is a powerful tool. 2.) Password verification. Passwords are never stored, only their hash values, thereby making the determination of actual passwords very difficult, especially if you add salt. Salty passwords anyone? https://en.wikipedia.org/wiki/Salt_(cryptography)

3.) Message integrity checks. Some network protocols use hash values to determine if messages have been modified. WiFi WPA is an example. http://www.tech-faq.com/mic-message-integrity-check.html 4.) Malware detection. Malware executables can be hashed and the hash values can be distributed (via a secure channel). Then, when an executable is detected in an email message (for example), it's hash

value can be compared to a table of the hash values of known malware. This technique ensures that malware cannot be hidden by changing the name of the file that contains it. 5.) Verifying the integrity and authenticity of software updates. If the hash value of an update is

computed by incorporating the private key of the distributor, then the recipient can use the known public key to compute the hash value. This ensures the update wasn't hacked. And, it verifies that it

came from a trusted source. The Linux kernel source code is signed with a PGP key. In September, 2011, their PGP key was stolen. (This is a different and more secure technique than hashing, as in the case of openssl, but it serves the same purpose.) You can read about this at:

https://www.kernel.org/category/signatures.html. Some software projects, e.g., openssl, use a simpler, less secure technique for fraud prevention. I visited: http://openssl.org/source/ and downloaded the SHA1 signature of the file oenssl-1.0.2a.tar.gz

and found the value: 46ecd325b8e587fa491f6bb02ad4a9fb9f382f5f Then, I downloaded the compressed tar file and computed the SHA1 hash myself and computed the same value:

shasum -a1 openssl-1.0.2a.tar.gz 46ecd325b8e587fa491f6bb02ad4a9fb9f382f5f

The command: "shasum" is available on macintosh computers and many Linux distributions. Notice that the two hexadecimal numbers are equal, and, unless someone has hacked the openSSL WWW site and replaced the hash value of the legitimate code with the hash value of modified version,

then, this equality ensures that the code I downloaded is legitimate, and for a super important library like openSSL that's extremely important. If you ever work as a system administrator for an organization that is run by competent people, then this is something you will be expected to do.

Be Informed: you will be asked practical questions about cryptographic hash

The following tables are provided for your convenience: a b c d e f g h i j k l m

0 1 2 3 4 5 6 7 8 9 10 11 12 n o p q r s t u v w x y z 13 14 15 16 17 18 19 20 21 22 23 24 25