#!/usr/bin/env/python """ Cryptography Tools by Kris Schnee WHAT IS THIS? A collection of basic methods for making and breaking coded messages. This file discusses various historical systems, and includes functions for working with (and against) them. WHY READ THIS? This file will teach you a bit about historical cryptography and cryptanalysis methods. The information is interesting in its own right and is helpful to understand if you have any interest in learning about more advanced codes. This is a Python code file, so there's working software in it, but you can also just read the comments to learn about cryptography. HOW APPLICABLE IS THIS NOW? Most of the systems described here are mainly useful today for historical and educational interest, and are not considered secure. They should not be used for anything truly important to you. A few of these codes are secure enough to be useful against amateur snooping, and the "one-time pad" is truly secure. But anyone using cryptography should be aware of "side-channel attacks"; see below. WHAT'S A GOOD MODERN CODE? The US National Security Agency considers the AES algorithm to be adequate for protecting Top Secret government data when large key sizes are used, so AES is a good choice. (Academic research has "broken" AES only to the extent of reducing its brute-forcing time from "absurdly long" to "1/4 of absurdly long", as of this writing in 2012.) For certain applications such as identifying a message's author, the PGP system is "pretty good" and still in commercial use. The ultimate in unbreakability is the one-time pad, but it is impactical to use properly on a large scale. DEFINITIONS TO KNOW? -Code: A method of replacing words/phrases with other words, pictures &c, for instance "by 'fox' I mean 'attack', and if you see two lanterns hanging in the church that means 'they're coming by sea'." -Cipher: A method of scrambling a message by rearranging or replacing parts of it. The word is often used interchangeably with "code". Most people don't bother with the distinction, and this file will not either. -Cryptography: The making of codes. -Cryptanalysis: The breaking of codes. -Brute force: Trying every possible combination for a code, at least within the set of keys possible for a particular type of code. Cryptanalysists strive to come up with smarter methods that require less work. -Breaking a code: Figuring out how to read a coded message or to greatly reduce the difficulty of brute-forcing. This is a large part of why computers were invented! -NSA: The United States National Security Agency. Lots of mathematicians, lots of computers, HQ a featureless black cube. Treatises on cryptography speak of them with awe and fear; we don't know how good their codebreaking is. If historical experience is any guide, their tech is years ahead of anything known to the public, quite possibly including quantum computers. British people should be aware of GCHQ (Government Communications HeadQuarters), which did crucial work in defeating the Nazis' Enigma cipher among other things. -Quantum Computer: A type of computer relying on manipulation of individual atoms. In theory it can make brute-forcing of any existing code practical, or create an unbreakable code. No one is using these yet, to public knowledge. -Side-Channel Attacks: Codebreaking methods that violate the assumptions by which the code was used. Eg. secretly getting information from the electro- magnetic ("Tempest") radiation given off by a computer as it operates, or detecting patterns in a computer's "random number generator". -Plaintext and Ciphertext: A message before and after it is encrypted. -Letter Frequency Analysis: The search for clues by looking at what letters are used most frequently in a message. Eg. "E" is common in English, so if "Y" appears often in a message it might be a substitute for "E". OTHER COMMENTS? See http://web.forret.com/tools/rot13.asp for a useful tool that does the Caesar/shift cipher and automatic letter frequency analysis. """ ##### METADATA ##### __author__ = "Kris Schnee" __license__ = "Public Domain" __version__ = "2012.5" __contact__ = "VDNSYPP@IPASPC.YPE" ## ...or something like that. ##### DEPENDENCIES ##### import random ## Used for One-Time Pad import time ## Used for One-Time Pad ##### CONSTANTS ##### STANDARD_ALPHABET = "abcdefghijklmnopqrstuvwxyz" """ Brute force data: When using this file on my own computer, I have text files in the same directory, containing 2, 3 and 4-letter words separated by commas. Eg. a file containing the text: "of,to,me,so"... You can make your own by looking up word lists online. This data is not needed for most of the code below. """ try: f=open("2lw.txt","r") t=f.read() f.close() BRUTE_FORCE_2_LETTER_WORDS = t.replace("\n","").split(",") f=open("3lw.txt","r") t=f.read() f.close() BRUTE_FORCE_3_LETTER_WORDS = t.replace("\n","").split(",") f=open("4lw.txt","r") t=f.read() f.close() BRUTE_FORCE_4_LETTER_WORDS = t.replace("\n","").split(",") except: print "Couldn't load word lists for building brute-force tables." ## Tables containing all combinations of 2 or 3 letters, eg. "abc" BRUTE_FORCE_ALL_2_LETTER_COMBOS = [] BRUTE_FORCE_ALL_3_LETTER_COMBOS = [] for a in STANDARD_ALPHABET: for b in STANDARD_ALPHABET: BRUTE_FORCE_ALL_2_LETTER_COMBOS.append(a+b) for c in STANDARD_ALPHABET: BRUTE_FORCE_ALL_3_LETTER_COMBOS.append(a+b+c) ##### CAESAR (SHIFT) CIPHER ##### """ The Caesar, or Shift, Cipher is one of the oldest known forms of code (cipher). It was supposedly used by (Gaius) Julius Caesar in his military campaigns. The idea is that each letter of a message is shifted forward by a certain number of places through the alphabet (in its usual ABC sequence), wrapping around from Z to A. With a shift value of 3, "cat" becomes "fdw". To decrypt a message, you only need to shift everything back by the same amount, or equivalently, shift forward by 26 minus the original amount. The cipher called ROT13 (Rotate-13) is simply this cipher with a value of 13. Note that the cipher as shown here is not exactly what Caesar would have used, because the Roman alphabet had only 23 letters (with no J, U or W). CRYPTANALYSIS: This cipher is very easy to defeat. There are only 26 possible shift values, and a human with average intelligence can try them all within minutes. The code is only secure if the enemy has no idea how it works, meaning that once someone performs "rubber hose cryptanalysis" (ie. torture) to learn how the cipher works, reading encrypted messages becomes trivially easy. The lesson cryptographers learned from the Caesar Cipher was that it's important to distinguish between the "algorithm" (the method of encryption) and the "key" (the data fed into the algorithm). In these terms, Caesar's method used a very simple but secret algorithm and only 23 possible keys. Modern codes use a vast number of keys but make no effort to keep the algorithm secret. In fact, algorithms are publicized and studied by academics in the hopes that any vulnerabilities will be exposed. """ def CaesarEncrypt(plaintext,n,forward=True): """Shift each letter ahead/back by n places.""" ciphertext = "" for char in plaintext: place = STANDARD_ALPHABET.find(char.lower()) if place == -1: ## Not found; is " " or punctuation ciphertext += char else: if forward: ciphertext += STANDARD_ALPHABET[(place+n) % 26] else: ciphertext += STANDARD_ALPHABET[(place-n) % 26] return ciphertext def BruteForceCaesar(text): """Et tu, Brute?""" print "Here are all possible decryptions of the text:" for n in range(26): print CaesarEncrypt(text,n) ##### SKYTALE (COLUMNAR) CIPHER ##### """ Questionable ancient sources, as told through the impeccable Wikipedia, describe a Spartan technique. A writer would wrap a strip of cloth or leather around a rod of a certain diameter, to form a continuous cylindrical surface along the rod. Then, the writer would write a message along the rod's length, and then unwind the strip, scrambling the letters. To read the message, you would need an equal-sized rod. Some similar codes are "rail fence ciphers" that involve writing a message along diagonal lines and weaving those lines together to disguise their proper order. I thought I was being clever and original when I came up with one in grade school. My version serves as an example of a rail fence cipher. Simply write your message in two zigzag patterns and add some junk to either end, eg: "WE HOLD THESE TRUTHS TO BE SELF EVIDENT" becomes JU W H L T E E R T S JU NK E O D H S T U H NK + O E E F V D N T B S L E I E T = JUWOHELETFEVEDRNTJSJU NKTEBOSDLHESITEUTHUNK As with the Caesar cipher, the simple algorithm makes it trivial to defeat. One way of implementing the Skytale code without a rod is to choose a key word like "acorn" and note the alphabetical order of all non-repeated letters, in this case "12453". Then chop the message into 5 columns of equal length, adding junk "padding" at the end if needed. Shuffle the columns into the sequence determined by your key, then write out the ciphertext in one line by reading the letters left-to-right, top-to-bottom. Eg. "THE QUICK BROWN FOX" with 5 columns looks like this before shuffling: THEQU ICKBR OWNFO X.... Which becomes: THQUE ICBRK OWFON X.... And finally: THQUEICBRKOWFONX.... CRYPTANALYSIS: This code is easily defeated by the following method: -Write out the whole ciphertext. -Guess at a column length. A good starting guess is 3-8. -Write the ciphertext as a series of columns of the chosen length. -Try reordering the columns to form words, as read horizontally. That is, if there are 6 columns and the top letters of them are "HETUQ", you might guess that they should be ordered "THEQU". If you're correct, the text should appear as a readable message in the usual left-to-right, top-to-bottom order, such as "THEQUICKBROWNFOX...". The tricky parts are (1) guessing a column length, and (2) spotting the order that will give readable text. Experience with anagrams is helpful. The lesson to learn from this code is that a message can be scrambled not just by changing the letters but by rearranging them. Some modern codes like AES use rearrangement along with substitution. """ ## Columns should be in Python list format: ["abcdef","ghijkl",..."xyzabc"]. def SkytaleShowColumnsInOrder(columns,seq): """Display a set of columns in a selected order.""" ## Establish column order. ordered_cols = [] for n in range(len(seq)): ordered_cols.append(cols[seq[n]-1]) ## For each row: for row in range(len(cols[0])): text = "" ## Add this row's character from each column. for n in range(len(cols)): text += ordered_cols[n][row] print text def SkytaleShowColumns(columns): """For convenience, run the other function repeatedly.""" while True: x = raw_input("Sequence?> ") if not x: return seq = [] for n in range(len(x)): seq.append(int(x[n])) Show(columns,seq) ##### VIGENERE CIPHER ##### """ The Vigenere Cipher was once touted as unbreakable. It improves on the Caesar Cipher by shifting letters by different amounts. Imagine for instance that every third letter is shifted by 4, 15 or 9. The code would have far more possible keys than the standard Caesar version, so it would take longer to test each one. Of course, if the enemy can recognize words that are only partly decrypted, then the rest of the process becomes much easier. This cipher is used by selecting a key word/phrase, and associating each letter with a shift value. Eg. A means "change A to A", ie. shift by 0. (One could also interpret the rule as "A is the first letter, so treat that as shifting by 1, not 0." I'm applying the rule with the famous assumption that A=A.) The first letter of the plaintext is shifted by the first letter of the key, the second letter is shifted based on the second letter of the key, and so on. The key is repeated as necessary. You could picture the message and the data used to shift each letter like this, if the key is "KEY": KEYKEYKEYKEYKEYKEYKEYKEY Thisisasecretmessageguys Note that it's important to agree on whether to delete spaces and punctuation. Generally speaking, deleting those is good, because it makes analysis harder. CRYPTANALYSIS: The Vigenere Cipher is substantially more secure than Caesar's, but still falls to skilled pen-and-paper analysis. Computer tools are helpful, though. This is the general method: -Look for repeated letter sequences in the ciphertext. For instance, the sequence "JZXR" might appear twice. -Find the exact distance in letters (most likely, ignoring any spaces or punctuation left in the ciphertext) between the start of each instance of a repeated sequence. Eg. does "JZXR" appear starting at positions 5 and 25? -Propose that the repeat was caused by the same plaintext being encrypted using the same key, *a certain number of multiples of its length away*. That is, if the key is 4 letters long, two plaintext sequnces will give the same ciphertext if they are 4, 8, 12, or 16 letters apart. If the distance is 24, we can guess that the key length is a factor of 24. -(Note: While the key could logically be 1 letter long, that would be stupid. It would be the equivalent of a Caesar Cipher. You could easily check, though.) -(Note: It's possible that two different plaintext sequences would give the same ciphertext by coincidence. But this is much less likely than the other reason. Comparing factors from multiple repeated sequences will help both to eliminate this possibility and to confirm possible key lengths.) -If the key is N letters long, then every Nth letter was shifted by a constant amount. You can now perform letter frequency analysis by grabbing every Nth letter and seeing which letters are most frequent in that sample. This is easier in this case than for a letter substitution cipher, because the whole alphabet has been shifted. Graph letter frequencies and look for spikes representing A and E, the RST peak, and the lull between U and Z. -Guess at each letter of the key using this data. Eg. if the analysis data shows peaks at B and F, it's likely that every Nth letter was shifted by 1. -Use the key letters to decrypt every Nth letter of the ciphertext. Once part of it is decrypted you may be able to guess the remaining key letters by identifying words in the text. One lesson to learn is that a code can be made more secure by encrypting different parts of the message in different ways. Another thing to note is that patterns in the data are chinks in the armor. A cryptanalyst will use any patterns to deduce information about the key. The longer the key, the more pieces a cryptanalyst must divide the ciphertext into, which makes letter frequency analysis more difficult. The ideal version of this cipher would involve a key as long as the message itself. When this key comes from a known text such as a particular book, it is known as a "book cipher". It can still be defeated by a method that involves guessing words in the ciphertext and looking for what key text would turn those words into the observed ciphertext. ("If this sequence stands for 'acorn', then the key for it would be 'ERTYA'. Could that be part of 'LIBERTYAND'? Let's see what the letters next to 'acorn' would be if that's true...) A better version of the cipher would involve not just using a key as long as the plaintext, but making the key random. This technique is called a "one-time pad" and is, according to current theory at least, truly unbreakable. The reason it's little used is that the key must not be re-used, and that distributing enough keys securely is impractical. Also note that a computer's "random number generator" isn't truly random, and that some people have fretted extensively about better ways to get random data. """ def VigenereDemo(): """Demonstrates encryption of a message using the Vigenere Cipher.""" DEMO_KEY = "rum" DEMO_PLAINTEXT = "His stories were what frightened people worst of all. Dreadful stories they were - about hanging, and walking the plank, and storms at sea, and the Dry Tortugas, and wild deeds and places on the Spanish Main. By his own account he must have lived his life among some of the wickedest men that God ever allowed upon the sea, and the language in which he told these stories shocked our plain country people almost as much as the crimes that he described. My father was always saying the inn would be ruined, for people would soon cease coming there to be tyrannized over and put down, and sent shivering to their beds; but I really believe his presence did us good. People were frightened at the time, but on looking back they rather liked it; it was a fine excitement in a quiet country life, and there was even a party of the younger men who pretended to admire him, calling him a \"true sea-dog\" and a \"real old salt\" and such like names, and saying there was the sort of man that made England terrible at sea. -Treasure Island, by Robert Louis Stevenson" ## Get a key. key = "" while not key: print "Enter an encryption key consisting of spaces (which are deleted) and letters. Leave blank for demo." key = raw_input("Key?> ") if not key: key = DEMO_KEY ## Convert key to lowercase, strip spaces, and check for validity. key = key.replace(" ","").lower() for letter in key: if letter not in STANDARD_ALPHABET: key = "" print "Error: Keys must consist of letters and spaces only." print "Key will be: " + key ## Get the message to encrypt. print "Enter the message to encrypt. Leave blank for demo." raw_plaintext = raw_input("Message?> ") if not raw_plaintext: raw_plaintext = DEMO_PLAINTEXT print "Strip spaces and punctuation and convert to lowercase? (y/n)" while True: strip = raw_input("> ").lower() if strip.lower() in "yn": break if strip.lower() == "y": ## Strip everything but letters and convert to lowercase. plaintext = "" for letter in raw_plaintext: letter = letter.lower() if letter in STANDARD_ALPHABET: plaintext += letter else: plaintext = raw_plaintext print "Plaintext will be: " + plaintext ciphertext = VigenereEncipher(plaintext,key) ## Done. Display the output. print "Ciphertext:" print ciphertext def VigenereEncipher(plaintext,key,keep_extra_chars=False): ## Build the Vigenere square, a grid like: ## ABCDEFGHIJKLMNOPQRSTUVWXYZ ## BCDEFGHIJKLMNOPQRSTUVWXYZA ## CDEFGHIJKLMNOPQRSTUVWXYZAB (and so on) ALPHABETS = [] for n in range(26): a = STANDARD_ALPHABET[n:] + STANDARD_ALPHABET[:n] ALPHABETS.append(a) ## Change the key to a sequence of only lowercase letters. key = key.lower().replace(" ","") new_key = "" for letter in key: if letter in STANDARD_ALPHABET: new_key += letter key = new_key ## Encrypt the message. key_position = 0 ## We will rotate through the letters of the key. ciphertext = "" ## Encrypted letters are added to this. for letter in plaintext: letter = letter.lower() if letter in STANDARD_ALPHABET: ## Select the next letter of the key. For "acorn" we first use "a". key_letter = key[key_position] ## Find the shifted alphabet we should use (0 if key_letter is "a"). alphabet_number = STANDARD_ALPHABET.find(key_letter) alphabet = ALPHABETS[alphabet_number] ## What numbered letter is this plaintext letter? plaintext_number = STANDARD_ALPHABET.find(letter) ## What letter in alphabet has thus number? ciphertext_letter = alphabet[plaintext_number] ## Add it to the ciphertext. ciphertext += ciphertext_letter ## Switch to the next letter of the key for next time. key_position += 1 if key_position == len(key): key_position = 0 ## Start over. else: ciphertext += letter ## Do not shift key letter return ciphertext def VigenereFindEveryNthLetter(text,key_length,shift): """Return a string holding every Nth letter, starting with (shift). Note that the first letter is numbered zero.""" letters = "" while True: letters += text[shift] shift += key_length if shift >= len(text): break return letters def VigenereDecipher(text,key_guess): """Return text with key_guess used for decryption. To supply a partial key, insert non-letter characters, eg. "ac??n".""" ## Build the Vigenere square... reversed. ALPHABETS = [] ALPHABETS.append(STANDARD_ALPHABET) for n in range(1,26): a = STANDARD_ALPHABET[-n:] + STANDARD_ALPHABET[:(26-n)] ALPHABETS.append(a) ## Change the key to a sequence of only lowercase letters. key_guess = key_guess.lower().replace(" ","") new_key = "" for letter in key_guess: if letter in STANDARD_ALPHABET: new_key += letter key_guess = new_key deciphered_text = "" key_position = 0 for n in range(len(text)): if text[n] in STANDARD_ALPHABET: key_letter = key_guess[key_position] if key_letter in STANDARD_ALPHABET: alphabet_number = STANDARD_ALPHABET.find(key_letter) alphabet = ALPHABETS[alphabet_number] plaintext_number = STANDARD_ALPHABET.find(text[n]) deciphered_text += alphabet[plaintext_number].upper() else: ## This part of the password is unknown. Don't decrypt. deciphered_text += text[n] key_position += 1 if key_position == len(key_guess): key_position = 0 else: ## This character is not a letter. Don't decrypt or advance the key. deciphered_text += text[n] return deciphered_text def VigenereBruteForce(text,key_list,print_all=True,watchlist=("THE","THIS")): flagged = [] ## Potentially valid output worth special attention. Should be all uppercase. for key in key_list: plaintext = Decipher(text,key) for word in watchlist: if word in plaintext: flagged.append(plaintext) if print_all: print plaintext if flagged: print "\n\nFlagged output containing words in the watchlist:" for item in flagged: print "\n"+item ##### ONE-TIME PAD ##### """ The One-Time Pad is the ultimate version of the Vigenere Cipher, using a random key as long as the message itself. It destroys all information that could be used to find patterns and crack the code. That is, the text "ERTIWNNGUIE" could be "IHAVEABOMB" or "ILIKEBACON" or "RAINBOWDASH" just as easily, and any of these "decryptions" are just as supported by the evidence. While it's quite powerful, OTP encryption is rarely used today because it's not practical to transport and manage stockpiles of key text. It's important that any key only be used once. (Not counting the same random text happening to be generated again someday.) Imagine supplying a whole army with identical pads and not letting a single copy fall into enemy hands. It's also necessary to coordinate its use, so that each user knows which pad to use for encryption and decryption. CRYPTANALYSIS: You're screwed. More seriously, the vulnerabilities here are side-channel attacks, such as forcing someone to tell you the message, intercepting the keys themselves, improper re-use of keys, or exploiting subtle patterns in a computer's supposedly random number generation. But if the code is used properly, modern theory says that the code is truly unbreakable. """ def OneTimePadEncipher(plaintext,pad_id): try: filename = "otp_" + str(pad_id) + ".txt" f = open(filename,"r") key = f.read() f.close() os.remove(filename) ## Delete key file. "One time", remember? except: raise SystemError("Couldn't find that pad file. Maybe it was already used?") if len(key) < len(plaintext): print "Error: The OTP system requires a key at least as long as the message. The one given is too short." return VigenereEncipher(plaintext,key,False) def OneTimePadDecipher(ciphertext,pad_id): try: filename = "otp_" + pad_id + ".txt" f = open(filename,"r") key = f.read() f.close() if os.path.exists("last_used_key.txt"): os.remove(filename,"last_used_key.txt") os.rename(filename,"last_used_key.txt") ## In case of errors. except: raise SystemError("Couldn't find that pad file. Maybe it was already used?") return VigenereDecipher(ciphertext,key) def CreateOneTimePad(seed_value=None,length=25000): """Creates a one-time pad text. WARNING: While the OTP system is theoretically unbeatable if used correctly, it assumes that the random text it generates is, in fact, random. IT IS NOT, if you use a typical computer Random Number Generator such as this function uses. While a typical RNG is pretty good, it isn't *quite* truly random in its output. In other words, this system is good, and might well be impractical even for professionals to beat, but I don't suggest trusting your life to it if you're running it on a computer without special RNG hardware or software. """ random.seed(seed_value) pad = "" for n in range(length): pad += STANDARD_ALPHABET[random.randint(0,25)] return pad def CreateOneTimePadFiles(howmany=1): for n in range(howmany): filename = "otp_" + str(n).zfill(5) + ".txt" f = open(filename,"w") pad = CreateOneTimePad(time.time()) f.write(pad) f.close() ##### PICTURE SUBSTITUTION CIPHER ##### ## IN PROGRESS ## """ Picture Substitution Cipher Sample file format to read from: 4s jh 9c kd 3c 10s 6s ad 9c kh ac jd 9c kh js 8c 9c ac js 10h qh qh 10h 7c kh jd 7c 3c 4c jd 3d 3c 4c 2c This example uses notation referencing pictures of cards. Any space-separated strings like "squiggle_1" or "otter" are okay.""" def ReadFromFile(filename,words_format=True): """Returns all text from a file, with selectable format. If words_format == True, the text is split into lines by \n and each line is split into "pics" by " ", creating a 2D list array. Otherwise the format is returned as a string with no formatting.""" f = open(filename,"r") text = f.read() f.close() words = text.split("\n") for n in range(len(words)): words[n] = words[n].split(" ") return words def AssignLetters(words): ALPHABET = "abcdefghijklmnopqrstuvwxyz" next_letter = 0 translation = {} for word in words: for pic in word: ## Convert to a letter. letter = translation.get(pic) if not letter: letter = ALPHABET[next_letter] print "Assigning " + letter + " to pic " + pic+"." next_letter += 1 if next_letter == 26: raise SystemError("Error: Too many distinct \"letters\" in this code. There may be nulls or duplicates involved, though.") translation[pic] = letter return translation def DisplayCipherAsLetters(words,translation): text = "" for word in words: for pic in word: text += translation[pic] text += " " return text[:-1] def AnalyzeFrequency(words,translation): freqs = {} if type(words) is list: for word in words: for letter in word: letter = translation.get(letter) ## Add to the recorded number of instances. freqs[letter] = freqs.get(letter,0) + 1 else: for letter in text: letter = translation.get(letter) ## Add to the recorded number of instances. freqs[letter] = freqs.get(letter,0) + 1 for k in freqs.keys(): print k + ": " + str(freqs[k]) def AnalyzeNeighborFrequency(words): pass def SubstitutionDecipher(ciphertext,substitution): """Replaces letters based on a provided dictionary.""" plaintext = "" for letter in ciphertext: l = substitution.get(letter,"").upper() if not l: l = letter plaintext += l return plaintext ##### AUTORUN ##### if __name__ == "__main__": ## Do the following if this code is run by itself: pass