1. Thougts about Passwords

1. Introduction

This Project is about making an educated guess to derive an unknown password based on hacked password lists (Google: “RockYou”). In this section we will introduce the mathematical framework to handle passwords and develop a simple brute force approach based on very unrealistic Assumptions. During this Project these Assumptions will then be weaken to  derive in the end a clever method to guess passwords.

1.1. Formulation

Let X \in \mathbb{X} \, ,X=x_1 x_2 x_3 \dots x_E be a password and the random variables x_i \in \mathbb{A}, E \in \mathbb{N}^+ the corresponding letters and password length. \mathbb{X} is the space of all passwords and \mathbb{A} corresponds to the used alphabet, for example if we allow only for lower case letters then \mathbb{A} =\{a,b,c, \dots, y,z \}. As usual this forms a discrete probability space given by (\mathbb{X}, P).

2. First Attempts

2.1 A first naive approach

Let X^* \in \mathbb{X} be the true password. The idea of a brute force search, where every possible password is successive tested, is then to assume that \mathbb{P}(X^*=X) = \mathbb{P}(X^*=Y) \;  \forall X^*,X,Y \in \mathbb{X}. This means that every possible password is equipropable and the testing order is thus irrelevant. It is obvious that this Assumption is not true in general.

2.2 A very simple Educated Guess Procedure

In the next step we will replace this assumption by some less strong assumptions:

  1. E is distributed with distribution f_E.
  2. x_i, \, i=1,\dots,E are independent and identically distributed random variables with distribution f_x.
  3. x_i and E are independent.

Under the given Assumptions the best way to “guess” the password is given by the following procedure. Since x_i and E are independent we can apply a two step procedure where we first draw some \hat{E} using f_E.Then each x_i, \; i=1,\dots,\hat{E} is independently drawn using f_x. Since f_E and f_x are unknown they have to be estimated from hacked passwords list which we will denote as \mathbb{L} \subset \mathbb{X}.

2.2.1 Checking the password security

Under the given Assumptions it is also possible to estimate the “safeness” of given password X^*. Mathematically this means 1 minus the probability that a certain password is “guessed” by the procedure. The first part, the probability that the “correct” password length is “guessed” is simply given by P(E=E^*)=f_E(E^*) . The second part can be understood as E urn draws with replacement of an urn containing the elements of \mathbb{A} where the probability to draw a certain element x is given by f_x(x). Thus P(x_1= x^*_1, \dots, x_E = x^*_E |E=E^*)= \prod_{i=1}^{E^*} f_x(x_i^*). Accordingly by Bayes’ theorem: 1 -P(x_1 \dots x_E = x^*_1 \dots x^*_E  \cap E=E^* )=1- f_E(E^*) \prod_{i=1}^{E^*} f_x(x_i^*) as security index.

2.2.2 Density Estimators

A very simple estimator \hat{f}_E is based on counting the length of all passwords in \mathbb{L}. The corresponding estimator is then given by \hat{f}_E(E^*)= \frac{\# (X \in \mathbb{L} |E=E^*)}{\# \mathbb{L}} and accordingly \hat{f}_x(x^*)= \frac{ \sum_{k=1}^\infty \# (X \in \mathbb{L} |x_k=x^*)}{ \sum_{k=1}^\infty k \cdot \# (X \in \mathbb{L} |E = k) }.

 

2 thoughts on “1. Thougts about Passwords

  1. Pingback: Coding the “Educated Guess Procedure” – The Big Data Blog

  2. Pingback: 3. A more sophisticated approach using Markov chains. – The Big Data Blog

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.