Decrypting Cryptography Basics: Practical Exercises to Fathom Theory (Part 1)


This article is part of the Security Tech Blog Series: Spring Cleaning for Security. My name is Josh and I’m part of the Product Security team here at Mercari. In part one of this blog post, I will be introducing a set of basic cryptography challenges that can be used to build a better understanding of cryptography fundamentals. I recommend you take a crack at the challenges yourself, but if you’re interested you can check out my solutions on GitHub, too.

The images featured in this article are based on ones generated by OpenAI’s DALL·E.


As security professionals, we are frequently asked to provide security best practices to engineers. There are many publicly available and widely accepted guidelines we can often point to in these situations, but this isn’t always the case. What if you are tasked with making new guidelines? What if you need to justify your recommendations? No one is going to be happy with the answer: “these are the rules, just follow them.” We should strive to provide strong reasoning and an in-depth explanation of any recommendations we put forward. However, without a deeper understanding of why and how things can go wrong, it can be difficult for us to put forward the justification. Now, learning the theory behind a solution is one thing; but, hands-on experience can really help you break things down (sometimes literally when it comes to security), and further build the confidence you need to make strong, well-justified, logical solutions to problems.

The goal I set out for myself was to learn the inner workings of the Advanced Encryption Standard (AES). The simplest way to do this would be to research someone else’s implementation. However, I wanted to get a deeper understanding by getting into the details and implementing it myself. After all, in order to really understand how something works, you’ve got to try to build it yourself! I suggest you also try making your own implementation as a learning exercise, too. Try to do it by only reading the spec and without referencing actual implementations.

DISCLAIMER: Custom implementations such as those I have demonstrated here serve purely as a learning exercise. DO NOT make use of custom implementations of cryptographic libraries in actual production applications. There are plenty of existing, battle-tested software implementations for use in real-world systems. Additionally, most modern CPU chipsets now have hardware implementations of AES, which are lightning fast and much more robust.

*PSA: Do not make your own custom crypto algorithms or implementations for production—use well-known and vetted standards instead.

To warm up my crypto-muscles I went through the first set of the Cryptopals Crypto Challenges, which was released several years ago originally by Matasano Security. This time around, I decided to implement everything in Go, which is a language we use heavily at Mercari! I did this to refresh my understanding of basic cryptography concepts as well as to brush up my programming skills.

The first set of challenges starts out relatively simple. As you progress, you build a library of code to be used to solve subsequent challenges. For example, you will need to implement encoders and decoders for hex and Base64 right off the bat—something I imagine most engineers haven’t done in a long time, if ever. Next, you are introduced to the world of cryptanalysis, where you must actually crack the Vigenère cipher with your own code! Finally, having honed your skills with XOR, the set culminates in the implementation of your very own AES-128 encryption in ECB mode—getting a deeper understanding of AES was my original goal! Without further ado, let’s get into reviewing some basic concepts.

Back to the Basics: Encoding, Obfuscation, and Encryption

Before we get started I want to mention a very good article by Daniel Miessler. I have expanded on his excellent explanations here.


The purpose of encoding is to represent something in a certain, pre-defined way for transport, without any regard to secrecy or confidentiality. Everything processed by modern-day computers is “encoded” with 1s and 0s, usually grouped into eight-bit chunks called bytes. Some examples of encodings are: character encodings such as ASCII (English text), Braille, Morse code, and UTF-8; and, data encodings such as Base64, Huffman coding, and percent-encoding.


The purpose of obfuscation is to make something ambiguous or hard to understand, with the intent of confidentiality, without any auxiliary information. However, in reality, obfuscation merely creates an obstacle (or several obstacles) and thus is not a strong control.

As a side note, this concept is represented in Japanese as nandoku-ka (難読化, なんどくか), which aptly means “making something hard to read.”

If you have ever opened the developer tools in a browser to look at Google’s JavaScript code, you will have seen the result of obfuscation—helpful hints like names and structures have been replaced, rendering the code very difficult to understand for humans at a glance, but still keeping its semantics (or utility) intact. It is still possible, with a bit of time and effort, to recreate something resembling the original code.

Source: JS on

Igpay atinlay isyay anyay exampleyay ofyay obfuscationyay. Utbay, ityay isyay otnay eryvay effectiveyay…


The purpose of encryption is to prevent the disclosure of information to anyone other than the intended recipient(s). Encryption uses a well-defined, well-known (i.e., public) algorithm with some kind of secret (called a “key”). Symmetric encryption algorithms use one single key for both encryption and decryption. Some examples of symmetric encryption algorithms are AES and Twofish. Asymmetric algorithms, also known as public-key cryptography, use a pair of keys where the one used for encryption cannot be used for decryption. Some examples of asymmetric encryption algorithms are ECDSA and RSA.

Hex and Base64, two encodings

Now that we have reviewed the basic concepts, it’s time for the first challenge: implementing hex and Base64.


Hex is short for hexadecimal, which means that each place of a number has 16 different possibilities (as opposed to decimal where each place has only ten, or binary where each place has only two). Hexadecimal uses the ten numbers 0 through 9 and the six letters a through f (capitalization does not matter).

We all know that computers represent data with numbers, in particular, binary numbers, or 1s and 0s. Four digits of binary (also known as a nibble) map to one digit of hexadecimal, or the 16 distinct values zero through 15. Eight digits of binary (also known as a byte) map to two digits of hexadecimal, or the 256 distinct values zero through 255. Hexadecimal numbers are convenient to use because all the possible values of a byte can be represented with just two digits (as opposed to three in decimal)!

Text on computers is incidentally just data, too. For example, the character that stands for the capital letter H is represented with a single byte in hexadecimal as 0x48 (which is \(4 \cdot 16 + 8 = 72\) in decimal). The word Hello is represented by five bytes, which in hexadecimal are: 0x48, 0x65, 0x6c, 0x6c, and 0x6f. Treating all of those bytes as strings, and concatenating them together yields: 48656c6c6f.

Can you read that? Since we can’t read it, it must be encryption, right? Well, not so fast. This is an example of encoding (and, I suppose it could technically be considered as a form of obfuscation, too, since it is difficult to interpret at a glance). A number is used—in a one-to-one mapping—to represent a letter. But, by knowing the mapping, it is trivial to recover the original string. Hex encoding therefore provides no level of confidentiality.

Encoding and Decoding

Years ago, I once was asked in an interview to describe the fastest way to reverse the bits of a byte. I later learned that their “correct answer” was to use a lookup table—in other words, it has both a time complexity and auxiliary space complexity of O(1). When encoding and decoding hex, this strategy can be very helpful.

Say we are given the hex-encoded string 7468457265. We can decode it by hand using an ASCII table like below (the second column of numbers is in hex).

Source: Wikipedia

Split the string into two-character chunks (i.e., 74, 68, 45, 72, and 65) and replace each one with the corresponding character given by the table (i.e., t, h, E, r, and e). This yields the string thEre.

A computer can take the character representation of the number 7 (i.e., hexadecimal 0x37) and convert it into the actual hexadecimal number 0x07 very easily. The next character, 4 (i.e., hexadecimal 0x34), can also easily be converted to the hexadecimal number 0x04. By shifting the first number left by four bits, or one hexadecimal digit, and bitwise-ORing it together with the second, we can arrive at 0x70 | 0x04 = 0x74, which is exactly the number we want, and represents the character t. In this way, by looping over each character of input and creating a new byte for each pair, we can automatically convert from the hexadecimal string representations of the bytes to their equivalent values, thus recreating the actual string!

    // … ensure input length is multiple of 2 ...
    var digitBuf uint8
    for i, d := range in {
        v, ok := hexDigitToByte[d]
        if !ok {
            // ... handle bad input error ...

        // put the odd digits into the HI part of the byte
        if i%2 == 0 {
            digitBuf = (v & 0xF) << 4

        // and the even digits into the LO part of the byte
        digitBuf |= v & 0xF



The whole point of Base64 is to prevent data from being mangled during transmission. This is achieved by using an alphabet of guaranteed-printable characters. Back in the day, ASCII was defined as a standard for text on computers. It defined a group of 128 characters, the highest value of which was 7f, which is a full seven bits. Memory at the time was quite expensive; so, some email systems were built with the assumption that nobody would ever need more than seven bits. However, it turned out that more than text would be sent using email, non-textual data (e.g., attachment files) would unfortunately be mangled, since they use full-octet bytes. Hence, Base64 was devised as a clever way around the problem by representing octet data with ASCII.

In order for me to implement Base64 correctly, I had to refresh myself on the finer details by reading section 4 of RFC 4648. The output of Base64 is basically a string of four-byte blocks (24-bits), each representing three bytes of input. Each byte of the output block corresponds to six bits of the original three input bytes. A key point to understand is that the real value of each byte in the output block is determined by the index of the letter in the Base64 alphabet. In other words, the alphabet of Base64 serves merely as a key into a lookup table. For example, if you come across the letter A (ASCII value 0x41) in a Base64 string, that corresponds to the actual value of 0x00 (0). The letter B (ASCII value 0x42) corresponds to 0x01 (1), and the final character of its alphabet, / (ASCII value 0x2f), corresponds to to 0x3f (63).

When implementing the encoder, there are essentially three cases to consider for the length of each block of input: if there are eight bits (i.e., one byte) of input then there will be two bytes of output; when there are 16 bits (i.e., two bytes) of input then there will be three bytes of output; and, when there are 24 bits (i.e., three bytes) of input then there will be precisely four bytes of output. Since each output block is defined as having four bytes, the output will (either logically or explicitly) end with two, one, or no padding characters, respectively. From an implementation perspective, four bytes conveniently corresponds to exactly one unsigned 32-bit integer, which can be represented easily in Go with uint32. A uint32 block can be converted back and forth between Base64 and regular octets with some simple bit masking and shifting.

The ASCII representation of Hello was 0x48, 0x65, 0x6c, 0x6c, and 0x6f. Let’s manually convert this into Base64. The first 24 bits (i.e., three bytes) lined up are: 01001000_01100101_01101100. Splitting it up into six-bit chunks yields: 010010_000110_010101_101100. Each of these chunks can in turn be changed into a decimal representation: 18, 6, 21, and 44. Referencing the Base64 alphabet in Table 1 of the RFC, we get the first four values: S, G, V, and s.

Source: RFC 4648

There are two bytes remaining (0x6c and 0x6f) so our output block will end in one padding byte. Lining up the 16 bits and splitting into six-bit chunks yields 011011 (27 → b), 000110 (6 → G), and 111100 (60 → 8). Putting those all together into one string gives SGVsbG8=!

Is this string encrypted? We can see that the answer is a resounding “no”; it is merely encoded.

Lessons Learned

As should now be blatantly clear: encoding does not an encryption make. Encodings do not ensure any level of confidentiality. At best, they provide simple forms of obfuscation. It is trivial to convert an encoded string back to its original form, or even to a different encoding. Anyone can recover the plaintext, even if they are not the intended recipient—ergo, no confidentiality.

Simple XOR Cipher

Cracking Single-Byte Key XOR

For the third challenge, we are tasked with writing a program to crack some ciphertext that has been encrypted with single-byte key XOR. This cipher is very similar to, and can be broken with the same techniques as, the Caesar cipher.

Since the key is exactly one byte, we know that there are exactly 256 possible keys (i.e., eight bits per byte, hence \(2^{8} = 256\)). Since this is a relatively small number, we can perform an exhaustive search through each of the 256 keys sequentially, declaring the one that yields the highest-scoring result as the winner. I generalized the implementation to use the same byte repeated out to the size of the plaintext, which inadvertently helped solve challenge 5.

So, how can a key be scored? What makes one prospective key better than another? Thankfully, we are given a hint: score English text using character frequency analysis. We need to compare the expected frequencies of the English alphabet—the letters A through Z—to the actual frequencies observed. The key which results in the closest-to-expected frequencies should be the correct key.

My first, naïve attempt at comparison was to calculate the sum of the arithmetic difference between the two histograms, or more concisely: \(\displaystyle\sum{expected_i - actual_i}\). I was looking for the value closest to zero. However, this method of calculating distance did not give great results.

I expanded the histograms to cover all possible values of bytes (0x00 through 0xff). I hoped that would lead to an increase in accuracy, because strings with nonsensical bytes would be scored worse. It seemed to help a bit; but, I still could not get definitively good results. I generated my own expected frequency histogram using a randomly chosen book from Project Gutenberg.

I then focused on improving the distance calculation itself. I attempted but quickly abandoned the Levenshtein distance, since it didn’t yield adequate results. After a few days of research into probability and distributions, and various attempts at different “distances”, I discovered an approach from statistics for estimating the similarity (or overlap) of two probability distributions called the Bhattacharyya distance. When two distributions have been sampled into discrete buckets (i.e., histograms, like we have), the Bhattacharyya coefficient can be used to estimate the distance. It is defined as the sum of the square root of the product of each position in both histograms, or more concisely: \(\displaystyle\sum{\sqrt{expected_i \cdot actual_i}}\). However, I am still unsure as to why this gives better results; the math is above me. It feels somewhat reminiscent of the Euclidean distance, given the square root; but, at the same time it is quite different. Interestingly, I am not the first to use it for solving this particular challenge, as a quick search will show.

Detecting Single-Byte Key XOR

In order to find the single-byte key for a particular ciphertext, it is possible to sequentially iterate over all 256 possibilities in a reasonable amount of time. However, when searching for the one encrypted string in a list of 300+ 60-byte strings, serial processing starts to take an unacceptably long amount of time.

Luckily, Go allows us to easily achieve concurrency with goroutines. The final optimization of the single-byte key search was to internally utilize a sync.WaitGroup. There are four main steps: setup; fan out and search; wait then clean up; and process. Each goroutine spawned in ② scores a single key and returns its result on a channel. The goroutine in ③ waits for all the work in the sync.WorkGroup to finish, then closes the channel. Meanwhile, the loop in ④ reads from the channel, keeping only the best-scoring result. This is an idiomatic example of concurrent processing in Go.

    // ① setup 
    var wg sync.WaitGroup
    items := make(chan item)

    // ② loop for each possible byte
    for key := byte(0); key < 0xFF; key++ {
        // increment work count and start goroutine
        go func(key byte) {
            // decrement work count
            defer wg.Done()
            // ... calculate score for key ...
            // and then write result to channel
            items <- item{ /* ... */ }

    // ③ wait for all work to finish then close channel
    go func() {

    // ④ process results until channel closed, keeping only the best
    for m := range items {
        if m.score > best.score {
            best = m

My single-byte XOR key search function thus attained super awesomeness of the most crypto-cool kind.

Cracking Repeated Multi-byte Key XOR

This challenge is pretty straight forward since they give you the steps to solve it. However, I will take the liberty to expand on why those steps actually solve the challenge.

It turns out that the repeated multi-byte key XOR cipher is a special case of the Vigenère cipher, where the scrambling is performed at the bit level (i.e., using bitwise XOR). The Vigenère cipher reduces to an interleaved single-byte key cipher, which we already know how to crack! The Hamming distance of the bits between same-sized chunks of a ciphertext is effectively a special case of the index of coincidence. The lower the average, normalized Hamming distance across chunks, the higher the probability that the chunks were all encrypted with the same key, and therefore the probable key size can be inferred.

Once the key size has been narrowed down to a few possibilities, the same frequency analysis that was used when cracking single-byte key XOR can be used to determine the actual key. For each prospective key size, the ciphertext is split into chunks of the respective size. Then, the bytes are reorganized into groups respectively for each position in the key, to de-interleave the parts.

Take for example the ciphertext abcdefghijklmnopqrstuvwxy&z and let’s assume a prospective key size of three. When split into key-sized chunks (in this case, three), it can be thought of as the following three-by-nine matrix:

chunk1: abc
chunk2: def
chunk3: ghi
chunk4: jkl
chunk5: mno
chunk6: pqr
chunk7: stu
chunk8: vwx
chunk9: y&z

The columns of the matrix correspond to each interleaved, single-byte keyed chunk. Transposing the matrix yields the chunk for each key:

key0: adgjmpsvy
key1: behknqtw&
key2: cfiloruxz

Finally, each chunk can be evaluated in exactly the same way as we did for single-byte key XOR. The highest-scoring single-byte key for each row is then concatenated to form the multi-byte key.

Lessons Learned

As has been demonstrated, for any simple XOR cipher, once the key is repeated, no matter whether it’s one byte or multiple bytes, it can still be recovered. Furthermore, it turns out that each byte of the repeated multi-byte key can be broken by rearranging the ciphertext—or de-interleaving it—to drastically reduce the computational complexity required to recover the whole key.

Conclusion of Part 1

In this installment we went through some basic concepts of cryptography. We evaluated the confidentiality of encoding and discovered that it is nonexistent. We learned that encryption has the intent to conceal information for specific and intended recipient(s). We then analyzed two encryption schemes based on XOR: repeating single- and multi-byte keys. We showed that the multi-byte key case reduces to the single-byte case, and can be cracked almost effortlessly. We can therefore conclude that these two schemes are weak—do not actually guarantee confidentiality—and should be thus avoided if the risk is unacceptable.

In the second part of this article, we will tackle the industry-standard block cipher encryption scheme that heavily employs XOR: the Advanced Encryption Standard (AES). We will delve into its algorithm, and demonstrate the weaknesses of the most rudimentary block cipher mode of operation, called electronic code book (ECB).

Stay tuned for Part 2!

If this article has piqued your interest, and you’re interested in working with the Security & Privacy team at Mercari, we’re hiring! Check out our job postings here: NB2HI4DTHIXS6Y3BOJSWK4TTFZWWK4TDMFZGSLTDN5WS643FMFZGG2BNNJXWE4ZPH5SGK4B5ONSWG5LSNF2HSLLQOJUXMYLDPE======

  • X
  • Facebook
  • linkedin
  • このエントリーをはてなブックマークに追加