1. Generalize the Procedure
The Educated Guess Procedure is not only very simple, it is also very unrealistic. Above all, the assumption that the letters of a password are independent and identically distributed random variables is not satisfied in practice. Consider for example a password of which some letters have already been observed: the probability of the next letter should then depend on those letters, even though under the independence assumption it does not. Another recommendation often given by experts on “how to create a good password” is to use the first letter of each word of some phrase. For example, “I like to eat 10 red apples in the morning” becomes “Ilte10raitm”. This looks pretty random at first glance; however, language is not random, and thus such passwords are weaker (and easier to detect by the following method) than truly random passwords. To model these issues we introduce
$$P(X_t = x_t \mid X_{t-1} = x_{t-1}, \ldots, X_1 = x_1) = P(X_t = x_t \mid X_{t-1} = x_{t-1}, \ldots, X_{t-k} = x_{t-k}) \qquad (1)$$
where the probability to observe a certain letter depends on the previous $k$ letters. One usually speaks in this context of a Markov chain of order $k$; further, it can be seen that our previous attempt is a special case of (1) with $k = 0$. In linguistics, such models are known as $k$-grams.
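As a rough illustration, the following Python sketch estimates such conditional probabilities by simple counting; the function names and the toy training list are made up for this example and not part of any particular tool.

```python
import collections

def kgram_transition_counts(passwords, k):
    """Count how often each letter follows each k-letter context
    in a list of training passwords."""
    counts = collections.defaultdict(collections.Counter)
    for pw in passwords:
        for i in range(len(pw) - k):
            context = pw[i:i + k]          # the previous k letters
            counts[context][pw[i + k]] += 1
    return counts

def conditional_prob(counts, context, letter):
    """Estimate P(X_t = letter | previous k letters = context) by counting."""
    total = sum(counts[context].values())
    return counts[context][letter] / total if total else 0.0

# hypothetical toy training sample
training = ["password1", "passw0rd", "ilovecats", "password!"]
counts = kgram_transition_counts(training, k=3)
print(conditional_prob(counts, "wor", "d"))   # high: "d" often follows "wor"
print(conditional_prob(counts, "wor", "z"))   # 0 here, which motivates the smoothing below
```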
2. Generate better “guessed” Passwords
To get estimates we will thus rely on $k$-gram counting; to do so we first need some additional notation. Let $w = (x_1, \ldots, x_k)$ be a permutation with repetitions of length $k$ over the alphabet $\mathcal{A}$, let $C(w)$ denote how often $w$ is observed in the training sample, and let $\sum_{w'}$ denote the sum over all possible permutations. An estimator for the probability of a specific permutation $w$ can then be derived by
$$\hat{P}(w) = \frac{C(w)}{\sum_{w'} C(w')} \qquad (2)$$
however, additional smoothing is needed, especially if we assume an alphabet with many special characters, because there is a chance that some $k$-grams are not observed in our training sample. Such $k$-grams would then be assigned a probability of 0 and would never be considered in any generation procedure, although in reality these $k$-grams are certainly possible. We use Good-Turing smoothing: for a specific permutation $w$ with count $r = C(w)$, and with $N_r$ denoting the number of $k$-grams observed exactly $r$ times, an adjusted count $r^* = (r+1)\,N_{r+1}/N_r$ is computed. A Good-Turing smoothed version of (2) is then given by
$$\hat{P}_{GT}(w) = \frac{r^*}{\sum_{w'} C(w')} \qquad (3)$$
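A minimal sketch of this adjustment on hypothetical $k$-gram counts (the count dictionary and the function names are illustrative assumptions, not prescribed by the text):

```python
import collections

def good_turing_adjusted_counts(kgram_counts):
    """Replace each raw count r by r* = (r + 1) * N_{r+1} / N_r, where
    N_r is the number of distinct k-grams observed exactly r times."""
    n_r = collections.Counter(kgram_counts.values())
    adjusted = {}
    for kgram, r in kgram_counts.items():
        if n_r.get(r + 1):
            adjusted[kgram] = (r + 1) * n_r[r + 1] / n_r[r]
        else:
            adjusted[kgram] = float(r)   # keep the raw count when N_{r+1} = 0
    return adjusted

def smoothed_probabilities(kgram_counts):
    """Smoothed version of (2), as in (3): adjusted counts divided by the
    total number of observed k-grams."""
    total = sum(kgram_counts.values())
    return {w: c_star / total
            for w, c_star in good_turing_adjusted_counts(kgram_counts).items()}

# hypothetical k-gram counts from a training sample
counts = {"pas": 3, "ass": 2, "ssw": 2, "swo": 1, "wor": 1, "ord": 1}
print(smoothed_probabilities(counts))
```

In the full Good-Turing scheme the probability mass $N_1 / \sum_{w'} C(w')$ is additionally reserved for the $k$-grams that were never observed in the training sample.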
Still we are facing possibly unrealistic assumptions, for example that the length of the password does not depend on the letters used. Another hard assumption is that we always work with $k$-grams of one fixed order $k$. A further generalization taking this issue into account would be to use a variable-order Markov model, which allows for different $k$'s depending on a given observation.
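One simple way to mimic this, sketched under the assumption that transition tables have been built for several orders (e.g. with kgram_transition_counts above; the names are again ours), is to back off to a shorter context whenever the longer one was never observed:

```python
def backoff_prob(counts_by_order, context, letter):
    """Approximate a variable-order model by backoff: use the longest
    context that was actually observed, otherwise fall back to shorter ones.
    counts_by_order[k] maps a length-k context to a Counter of next letters."""
    for k in sorted(counts_by_order, reverse=True):
        ctx = context[-k:] if k else ""
        table = counts_by_order[k].get(ctx)
        if table:
            return table[letter] / sum(table.values())
    return 0.0
```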
To start the Markov chain we need an initial state; we therefore randomly draw the first $k$ letters according to the following probabilities
$$\hat{P}_{\text{init}}(x_1, \ldots, x_k) = \frac{C_{\text{init}}(x_1, \ldots, x_k)}{\sum_{w'} C_{\text{init}}(w')} \qquad (4)$$

where $C_{\text{init}}(x_1, \ldots, x_k)$ counts how often a password in the training sample starts with the $k$-gram $(x_1, \ldots, x_k)$.
If we assume a fixed order $k$, then this gives $|\mathcal{A}|^k$ possible initial states for a given alphabet $\mathcal{A}$. We also need to construct a list containing all possible transition probabilities, given by (3), between the $k$-grams; combinatorics then tells us that there exist $|\mathcal{A}|^{k+1}$ such permutations (one for each $k$-gram combined with each possible next letter). Using these lists we can then construct a Markov chain with the estimated probabilities. The Markov chain is interrupted after $n$ steps, where $n$ is drawn, like in the previous approach, from the estimated password-length distribution.
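Putting the pieces together, here is a minimal sketch of the generation step, assuming the initial distribution (4), the transition probabilities (3) and a password-length distribution have already been estimated; all tables, values and names below are illustrative toy inputs, not the exact estimates of the text.

```python
import random

def generate_password(init_probs, trans_probs, length_probs, k):
    """Draw an initial k-gram, draw a target length n, then extend the
    chain letter by letter using the transition probabilities."""
    n = random.choices(list(length_probs), weights=length_probs.values())[0]
    password = random.choices(list(init_probs), weights=init_probs.values())[0]
    while len(password) < n:
        nxt = trans_probs.get(password[-k:])
        if not nxt:                      # unseen context: stop the chain early
            break
        password += random.choices(list(nxt), weights=nxt.values())[0]
    return password

# toy tables for k = 2 (purely illustrative numbers)
init = {"pa": 0.6, "il": 0.4}
trans = {"pa": {"s": 1.0}, "as": {"s": 1.0}, "ss": {"w": 0.7, "1": 0.3},
         "sw": {"o": 1.0}, "wo": {"r": 1.0}, "or": {"d": 1.0},
         "il": {"o": 1.0}, "lo": {"v": 1.0}, "ov": {"e": 1.0}}
lengths = {6: 0.3, 8: 0.5, 10: 0.2}
print(generate_password(init, trans, lengths, k=2))
```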