3. A more sophisticated approach using Markov chains.

1. Generalize the Procedure

The very simple Educated Guess Procedure is not only very simple, it is also rather unrealistic. In particular, the assumption that x_i, \, i=1,\dots,E are independent and identically distributed random variables is not satisfied in practice. Consider for example observing x^*_1=l, x^*_2=o, x^*_3=v: the probability that x^*_4=e should then be higher than the probability that x_4=a, even though, assuming independence, our analysis yields P(x_i= a|\mathbb{L})> P(x_i= e|\mathbb{L}). Another common expert recommendation for creating a good password is to use the first letters of some phrase. For example, “I like to eat 10 red apples in the morning” becomes “Ilte10raitm”. This looks pretty random at first glance; however, language is not random, and thus such passwords are weaker (and easier to detect by the following method) than truly random passwords. To model these issues we introduce

(1)   \begin{equation*} P(x_i= x^*_i|x_{i-1}= x^*_{i-1}, \dots,x_1= x^*_1 ) = P(x_i= x^*_i|x_{i-1}= x^*_{i-1}, \dots,x_{i-k}= x^*_{i-k} ) \end{equation*}

where the probability of observing a certain letter depends only on the previous k letters. In this context one usually speaks of a Markov chain of order k; further, it can be seen that our previous attempt is the special case of (1) with k=0. In linguistics, such models are known as k-grams.
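To see why conditioning on previous letters matters, here is a minimal sketch comparing a conditional estimate against the marginal letter frequency. The toy word list stands in for the training list \mathbb{L} and is purely illustrative; a real list would contain thousands of leaked passwords.

```python
from collections import Counter

# Toy word list standing in for the training list L (an assumption made
# purely for illustration).
words = ["love", "lover", "loves", "glove", "clover", "apple", "banana"]

k = 1  # order of the Markov chain; k = 0 recovers the independence model

# Count (context, next letter) pairs, where the context is the previous k letters.
pair_counts = Counter()
context_counts = Counter()
for w in words:
    for i in range(k, len(w)):
        ctx = w[i - k:i]
        pair_counts[(ctx, w[i])] += 1
        context_counts[ctx] += 1

# Conditional estimate P(x_i = 'e' | x_{i-1} = 'v') vs. the marginal frequency of 'e'
p_e_given_v = pair_counts[("v", "e")] / context_counts["v"]
marginal_e = sum(w.count("e") for w in words) / sum(len(w) for w in words)
print(p_e_given_v, marginal_e)  # the conditional estimate is much larger
```

In this toy list every “v” is followed by an “e”, so the conditional estimate is far above the marginal frequency of “e”, mirroring the “lov → e” intuition above.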

2. Generate better “guessed” Passwords

To obtain estimates we will thus rely on k-gram counting; to do so, we first need some additional notation. Let \pi_k \in \mathbb{A}^k be a sequence of length k over the alphabet (a permutation with repetition), and let \sum_{\pi_k(\mathbb{A}) } denote the sum over all such sequences. An estimator for the probability of a specific sequence \pi^*_{k,i}:= (x^*_i, \dots, x^*_{i-k}), with \pi^*_{k,i,-1}:= (x^*_{i-1}, \dots, x^*_{i-k}), can then be derived by

(2)   \begin{equation*} \hat{P}(x_i= x^*_i|x_{i-1}= x^*_{i-1}, \dots,x_{i-k}= x^*_{i-k} )  = \frac{ \sum_{i=k+1}^\infty \# (X \in \mathbb{L} |x_{i}\dots x_{i-k}=x^*_{i}\dots x^*_{i-k})}{ \sum_{i=k+1}^\infty \# (X \in \mathbb{L} |x_{i-1}\dots x_{i-k}=x^*_{i-1}\dots x^*_{i-k})} \end{equation*}

However, additional smoothing is needed, especially if we assume an alphabet with many special characters, because there is a chance that some k-grams are not observed in our training sample. Such k-grams would then be assigned probability 0 and would never be considered in any generation procedure, even though in reality these k-grams are perfectly possible. We use Good-Turing smoothing: for a specific sequence \pi^*_k= (x^*_1, \dots, x^*_k), with count r(\pi^*_k) =\#(X \in \mathbb{L} |x_{1}\dots x_{k}=x^*_{1}\dots x^*_{k}) and n_{r(\pi^*_k)}= \sum_{\pi_k(\mathbb{A}) } \mathbf{1}(r(\pi^*_k)=  r(\cdot) ) the number of k-grams observed r times, an adjusted count r^*(\pi^*_k) =(r(\pi^*_k) +1)\frac{n_{r(\pi^*_k) +1}}{n_{r(\pi^*_k)}} is computed. A Good-Turing smoothed version of (2) is then given by

(3)   \begin{equation*} \hat{P}_T(x_i= x^*_i|x_{i-1}= x^*_{i-1}, \dots,x_{i-k}= x^*_{i-k} ) = \frac{\sum_{i=k+1}^\infty r^*(\pi^*_{k,i})}{\sum_{i=k+1}^\infty r^*(\pi^*_{k,i,-1})} \end{equation*}
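The adjusted counts r^* = (r+1)\,n_{r+1}/n_r can be computed directly from a table of k-gram counts. The following is a minimal sketch; the function name and the fallback to the raw count when no k-gram was observed r+1 times are illustrative choices, not part of the original method.

```python
from collections import Counter

def good_turing_adjust(kgram_counts):
    """Good-Turing adjusted counts r* = (r + 1) * n_{r+1} / n_r."""
    # n[r] = number of distinct k-grams observed exactly r times
    n = Counter(kgram_counts.values())
    adjusted = {}
    for gram, r in kgram_counts.items():
        if n.get(r + 1):
            adjusted[gram] = (r + 1) * n[r + 1] / n[r]
        else:
            # no k-gram was seen exactly r+1 times: fall back to the raw count
            # (an assumption; other backoff schemes are possible)
            adjusted[gram] = float(r)
    return adjusted

counts = Counter({"ab": 3, "ba": 1, "aa": 1, "bb": 2})
print(good_turing_adjust(counts))
```

Note that the total mass shifted away from the singletons is exactly what Good-Turing reserves for unseen k-grams, which resolves the zero-probability problem described above.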

We are still facing possibly unrealistic assumptions, for example that the length of the password does not depend on the letters used. Another strong assumption is that we are always facing a fixed-order k-gram. A further generalization taking this issue into account would be a variable-order Markov model, which allows for different k‘s depending on a given observation.

To start the Markov chain we need an initial state; we therefore randomly draw k-1 letters according to the following probabilities

(4)   \begin{equation*} \hat{P}_T(x_{1}= x^*_{1},x_{2}= x^*_{2} \dots, x_{k-1}= x^*_{k-1} )  = \frac{ r^*(x^*_{1}, \dots, x^*_{k-1} )}{ \# (X \in \mathbb{L} )}. \end{equation*}

If we assume that |\mathbb{A}|= n, then this gives n^{k-1} possible initial states for a given alphabet \mathbb{A}. We also need to construct a list containing all possible transition probabilities given by (3) between the k-grams; combinatorics tells us that there exist n^k such sequences. Using these lists we can then construct a Markov chain with the estimated probabilities. The Markov chain is interrupted after E^*-(k-1) steps, where E^* is drawn, as in the previous approach, from \hat{f}_E(E^*)= \frac{\# (X \in \mathbb{L} |E=E^*)}{\# \mathbb{L}}.
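The whole generation procedure can be sketched end to end. This is a simplified version under two stated deviations: it uses raw counts rather than the Good-Turing smoothed probabilities of (3) to keep the example short, and it seeds the chain with the first k letters (rather than k-1) so that every step has a full context; the tiny training list again stands in for \mathbb{L}.

```python
import random
from collections import Counter, defaultdict

# Tiny training list standing in for L (illustrative only).
train = ["password", "passw0rd", "letmein", "love1234", "iloveyou"]
k = 2  # Markov order: the next letter depends on the previous k letters

start_counts = Counter(w[:k] for w in train if len(w) >= k)  # initial states, cf. (4)
length_counts = Counter(len(w) for w in train)               # length distribution f_E
trans = defaultdict(Counter)                                 # transition counts, cf. (2)
for w in train:
    for i in range(k, len(w)):
        trans[w[i - k:i]][w[i]] += 1

def draw(counter):
    # draw a key with probability proportional to its count
    keys, weights = zip(*counter.items())
    return random.choices(keys, weights=weights)[0]

def guess_password():
    length = draw(length_counts)  # draw E* from the estimated length distribution
    pw = draw(start_counts)       # draw the initial state
    while len(pw) < length and trans.get(pw[-k:]):
        pw += draw(trans[pw[-k:]])  # run the chain for the remaining steps
    return pw

print(guess_password())
```

Swapping the raw counts in `trans` for the smoothed estimates of (3), and enumerating all n^k contexts rather than only the observed ones, turns this sketch into the full procedure described above.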
