Jun 10, 2021 8 min read Tank

A Shallow Dive into Private-Key Encryptions | Tank

We have the illusion that, as long as the key exchange was done securely, Private-Key, also known as Symmetric Cryptography, is solidly safe; if we ever introduce Public-Key or Asymmetric Cryptography to exchange private keys, then we are unbreakable.

Our generation lives in a world where HTTPS is widely used, and end-to-end encryption messaging is mandatory. As tech giants like Apple use Privacy as their commercial golden key, we are so spoiled to think this comes easily, including me. We have the illusion that, as long as the key exchange was done securely, Private-Key, also known as Symmetric Cryptography, is solidly safe; if we ever introduce Public-Key or Asymmetric Cryptography, to exchange private keys then we are unbreakable. That's never the case, and this article would explain why.

Introduction

Before we start to dive into the world of encryption, we must first understand its goal. Well, we must not leak any valuable information, you may think. That's true, but to what degree?

In my last talk about E2EE, we describe a kind of historical cyphers, Caesar's cypher, that simply encrypts data by shifting the letters of some alphabet places forward, 3 for example, A was replaced with D, B with E, and so on. At the very end of the alphabet, the letters wrap around so that Z was replaced by C and Y with B, etc.

Note that the "keyspace", which is the shifted length, in this case, is only 26, and one can just guess all the keys and take the most "meaningful" outcome as the original text. A fix for this is not to shift the letters by the same gap - any bijection would do, and thus the keyspace is extended to $26!$.

Doing this, we can easily convert something directly human-readable to something that's not, and some information is certainly concealed during the process. However, we know that the frequency distribution of individual letters in the English language was thoroughly studied, and by shifting the letters, we are not modifying this distribution. Hence, the encrypted text concealed the "words", not the distribution, and we can easily use this information to solve multiple variants of this technic.

Perfectly Secret

Is there a way to, besides veiling the text itself, "even out" the distribution pattern (We will give out a formal definition of "even out" later)? Luckily, we already have some of those, and one is the one-time pad.

The One-Time Pad

Say we have some integers $m, k \in \{0,1\}^l$, where $\{0,1\}^l$ is the set of all binary strings of length $l$. Take $m$ to be our message and $k$ to be some randomly generated key. Our encryption algorithm is as elegant as the following.

$$\text{Given a key} \ k \ \text{and a message} \ m, \text{we output the cyphertext} \ c:=k\oplus m.$$

To explain the title "perfectly secret" further, we can now introduce some general concept.

Formal Definition

An private-key encryption scheme contains three parts $\text{Gen}$, $\text{Enc}$ and $\text{Dec}$, as well as a specification of a message space $\mathcal{M}$, a key space $\mathcal{K}$.

The key-generation algorithm $\text{Gen}$ is a probabilistic algorithm that outputs a key $k$ chosen according to some distribution, and the set of all possible keys is denoted by $\mathcal{K}$. The encryption algorithm $\text{Enc}$ takes a key $k \in \mathcal{K}$ and a message $m \in \mathcal{M}$, and outputs a cyphertext $c$. When this process is probabilistic, i.e., $\text{Enc}$ can output different $c$ when run multiple times, we write $c \gets \text{Enc}_k(m)$, and $c:= \text{Enc}_k(m)$ in case $\text{Enc}$ is deterministic. We let $\mathcal{C}$ denotes the set of all the possible $c$ outputs by $\text{Enc}_k(m)$ for all the possible $k$ and $m$. Lastly, we assume perfect correctness, then the decryption algorithm $\text{Dec}$, taking a key $k \in \mathcal{K}$ and a cyphertext $c \gets \text{Enc}_k(m)$, satisfies $\text{Dec}_k(c)=m$ with probability 1. We can then go ahead and write this.

$$\text{Dec}_k(\text{Enc}_k(m)) = m$$

Finally, we can give out the definition of perfectly secret. For any two messages $m, m' \in \mathcal{M}$ and every $c \in \mathcal{C}$,

$$\Pr[\text{Enc}_{K}(m)=c] = \Pr[\text{Enc}_{K}(m')=c]$$

(where the probabilities are over choice of $K$ and any randomness of $\text{Enc}$).

We can propose several equivalent formulations of perfect secrecy, and this is left for readers to find out.

Evaluate Secrecy

Perfect secrecy sounds, well, perfect, but it's not practically so. I'm giving out some strawman scenarios for readers to evaluate.

Someone uses the one-time pad scheme to send several messages using the same key.
A poll is collecting "yes" or "no" using a deterministic encryption scheme.
An attacker tried to flip some bits of the cypher text.

These scenarios indicate that our assumption, that we have some adversaries passively eavesdropping on the communication (and have the ability to read only one cypher text), is not sufficient. Worse still, we haven't presented an adequate definition of adversaries' ability, before now. Formally, let $\Pi = (\text{Gen}, \text{Enc}, \text{Dec})$ be an encryption scheme with message space $\mathcal{M}$. Let $\mathcal{A}$ be an adversary which is formally a stateful (in other words, not progress-free) algorithm. We define an experiment $\text{PrivK}^{\text{eav}}_{\mathcal{A}, \Pi}$ as follows.

The adversarial indistinguishability experiment $\text{PrivK}^{\text{eav}}_{\mathcal{A}, \Pi}$

The adversary $\mathcal{A}$ outputs a pair of messages $m_0, m_1 \in \mathcal{M}$.
A key $k$ is generated using $\text{Gen}$, and a uniform bit $b \in \{0,1\}$ is chosen. Cyphertext $c \gets \text{Enc}_{k}(m_b)$ is computed and given to $\mathcal{A}$. We refer to $c$ as the challenge cyphertext.
$\mathcal{A}$ outputs a bit $b'$.
The output of the experiment is defined to be $1$ if $b' = b$, and $0$ otherwise. We write $\text{PrivK}^{\text{eav}}_{\mathcal{A}, \Pi} = 1$ if the output of the experiment is $1$ and in this case we say that $\mathcal{A}$ succeeds.

Now, we can redefine prefect secrecy as follows. Encryption scheme $\Pi = (\text{Gen}, \text{Enc}, \text{Dec})$ with message space $\mathcal{M}$ is perfectly indistinguishable if for every $\mathcal{A}$ it holds that

$$\Pr[\text{PrivK}^{\text{eav}}_{\mathcal{A}, \Pi} = 1] = \frac{1}{2}.$$

And, some scheme is perfectly secret if and only if it is perfectly indistinguishable. We leave this for readers to prove.

Move into the Real World

In the real world, adversaries are not the "every adversary" we mentioned above. They are computationally bounded, i.e., have the limited time and computational power to attack the communication, yet more versatile in terms of attack measures.

Computationally Secure Encryption

When we take the limited power an adversary holds into account, perfect secrecy is not always necessary, and this is part of the motivation to define computationally secure encryption.

If you have some experience in algorithm analysis, the big-o notion must be familiar to you. Here, we introduce a similar idea.

A function $f$ from the natural numbers to the non-negative real numbers is negligible if for every positive polynomial $p$ there is an $N$ such that for all integers $n > N$ it holds that $f(n) < \frac{1}{p(n)}$. We typically denote an arbitrary negligible function by $\text{negl}$.

Now, let's recalled the adversarial indistinguishability experiment we just described, and introduce two critical modifications in the definition itself.

We now consider only adversaries running in polynomial time, rather than unbounded.
We now concede that the adversary might determine the encrypted message with probability negligibly better than $\frac{1}{2}$.

The other thing we must introduce the security parameter $n$, which acts as the "seed" for key generation. Formally, we say the key is generated by running $\text{Gen}(1^n)$. (Since $\mathcal{A}$ runs in polynomial time, the length of $m_0$ and $m_1$ must be some polynomial in $n$.)

Then, we can redefine the adversarial indistinguishability experiment as follows.

The adversary $\mathcal{A}$ is given input $1^n$, and outputs a pair of messages $m_0, m_1 \in \mathcal{M}$ with $|m_0| = |m_1|$.
A key $k$ is generated by running $\text{Gen}(1^n)$, and a uniform bit $b \in \{0,1\}$ is chosen. Cyphertext $c \gets \text{Enc}_{k}(m_b)$ is computed and given to $\mathcal{A}$. We refer to $c$ as the challenge cyphertext.
$\mathcal{A}$ outputs a bit $b'$.
The output of the experiment is defined to be $1$ if $b' = b$, and $0$ otherwise. If $\text{PrivK}^{\text{eav}}_{\mathcal{A}, \Pi}(n) = 1$, we say that $\mathcal{A}$ succeeds.

In the end, we have this refreshing definition.

A private-key encryption scheme $\Pi = (\text{Gen}, \text{Enc}, \text{Dec})$ has indistinguishable encryptions in the presence of an eavesdropper, or is EAV-secure, if for all probabilistic polynomial-time adversaries $\mathcal{A}$ there is a negligible function $\text{negl}$ such that for all $n$,

$$\Pr[\text{PrivK}^{\text{eav}}_{\mathcal{A}, \Pi} (n)= 1] \leq \frac{1}{2} + \text{negl}(n),$$

where the probability is taken over the randomness used by $\mathcal{A}$ and the randomness used in the experiment.

Stronger Notions for Security

Adversaries are not just passively eavesdropping on a single message. They may have access to multiple messages encrypted by the same key, and in this case, the one-time pad is no longer secure (which is suggested by the name). We are leaving the proof to the readers.

In more complex situations, $\mathcal{A}$ may have access to certain oracles. An oracle can be treated as a "dark box" with which one can interact, while not knowing the mechanism behind it.

When $\mathcal{A}$ have an encryption oracle, meaning that they can choose whatever plaintext they prefer, and see the corresponding cyphertexts using this oracle. This is called Chosen-Plaintext Attack, and they are actually possible in the real world.

A close example is the algorithm running on Christopher. During World War II, the team led by Alan Turing performed the successful attack on enigma, knowing that part of the cyphertext is "Heil Hitler". Furthermore, British Army would place mines at certain places, and Germans would thus encrypt these locations. This is another source of the "chosen-plaintext".

A more powerful oracle is a decryption oracle, and $\mathcal{A}$ can obtain the decryption of any desired cyphertext (except for the target cyphertext, of course). This is called Chosen-Cyphertext Attack.

Intuitively, a scheme that enable to maintain indistinguishability under chosen-plaintext attacks is CPA-Secure, or CCA-Secure under chosen-cyphertext attacks.

Epilogue

We discussed some fundamental concept of private-key encryption. The purpose of the article is to help the readers to have a clearer understand of these methods, rather than take them as guaranteed "oracles".

Constructions and implementations of many schemes, like CPA-Secure or CCA-Secure ones, are not discussed here, as they require more mathematical calculations and other advanced techniques thus deserve a whole new article.

Thanks for reading, and we encourage you to study more through textbooks and online resources.

Reference

Katz, J., & Lindell, Y. (2014). Introduction to modern cryptography (2nd ed.). CRC Press.