Introduction to Password Score

Update. Documentation of the API is now available.

Update. The documentation as well as this post has been updated recently.

Introduction

Password Score is a JavaScript library designed to give a realistic estimate of the strength of an arbitrary password. The strength of a password is measured in terms of entropy. To estimate the strength of a password, the library may rely on several data sources: dictionaries, common passwords, keyboards, first and last names and much more.

Password Score can be found on GitHub and includes documentation as well as a demo to score arbitrary passwords - try it out!

Password Score on GitHub Password Score Demonstration

Entropy

Entropy is a term used in information theory. Usually it is used to describe the uncertainty in a random variable - i.e. a random experiment. In the context of this article we will use entropy as a measure for the strength of a password. Given a password $p$ with length $|p|$, we will define the entropy as $\log(n^{|p|})$ where $n$ is the number of possible characters and $\log$ the base-2 logarithm.

Given the entropy $H(p) := \log(n^{|p|})$ of the password $p$, we can calculate the maximum number of attempts needed for brute-forcing the password: $2^{H(p)} = 2^{\log(n^{|p|})} = n^{|p|}$.
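As a small worked example of the two formulas above, consider a password of eight lower case letters, so $n = 26$ and $|p| = 8$. The function names below are made up for illustration and are not part of Password Score; the entropy comes out at roughly 37.6 bits.

```javascript
// Brute-force entropy in bits: log2(n^length) = length * log2(n).
function bruteForceEntropy(alphabetSize, length) {
  return length * (Math.log(alphabetSize) / Math.LN2);
}

// Maximum number of guesses needed: 2^H = n^length.
function maxAttempts(alphabetSize, length) {
  return Math.pow(alphabetSize, length);
}

var bits = bruteForceEntropy(26, 8);  // eight lower case letters
var attempts = maxAttempts(26, 8);

console.log(bits.toFixed(1) + ' bits, ' + attempts + ' attempts');
```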

Naive Approach

As described above, the naive approach to scoring a password - i.e. estimating its strength - is to calculate its entropy. This is what the following code snippet does:

/**
 * Calculates a naive score based on the brute force entropy.
 *
 * @param {string} password The password to score.
 * @return {number} The brute force entropy in bits.
 */
function(password) {
  var base = 0;

  if (this.regex['lower'].test(password)) {
    base += this.LOWER;
  }
  if (this.regex['upper'].test(password)) {
    base += this.UPPER;
  }
  if (this.regex['number'].test(password)) {
    base += this.NUMBER;
  }
  if (this.regex['punctuation'].test(password)) {
    base += this.PUNCTUATION;
  }
        
  return this.lg(base)*password.length;
}

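Here this.regex is an object containing regular expressions to test whether the password includes lower case characters, upper case characters, numbers or punctuation. The constants this.LOWER, this.UPPER, this.NUMBER and this.PUNCTUATION hold the sizes of these character classes, and this.lg computes the base-2 logarithm.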
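The snippet above is a method taken out of its surrounding object, so it does not run on its own. A self-contained sketch of the same idea might look as follows. The regular expressions and class sizes are assumptions chosen for illustration (printable ASCII), not the values used by the library, and the guard for an empty alphabet is an addition.

```javascript
// Assumed character classes and their sizes (printable ASCII).
var CLASSES = [
  { regex: /[a-z]/, size: 26 },
  { regex: /[A-Z]/, size: 26 },
  { regex: /[0-9]/, size: 10 },
  // Everything else is counted as one of the 33 remaining printable
  // ASCII characters (punctuation and space).
  { regex: /[^a-zA-Z0-9]/, size: 33 }
];

function naiveScore(password) {
  var base = 0;
  for (var i = 0; i < CLASSES.length; i++) {
    if (CLASSES[i].regex.test(password)) {
      base += CLASSES[i].size;
    }
  }
  // An empty password matches no class; avoid taking log2(0).
  if (base === 0) {
    return 0;
  }
  return password.length * (Math.log(base) / Math.LN2);
}

console.log(naiveScore('helloworld'));  // lower case letters only
console.log(naiveScore('david1992'));   // lower case letters and digits
```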

The problem with this approach is the user, who will not choose every character of the password independently of the previous ones. This means that, given a character sequence $p_1 \ldots p_k$, some characters are more likely to follow than others. As an example, consider the string helloworl. The probability that the next character is a d is pretty high - especially if the user is familiar with programming languages and the English language.

Assumption

Given the password $p$, we assume that we know the form of the password. That is, we know how the password is made up. For example, given the password david1992, we know that the first $5$ characters make up a first name while the following $4$ characters make up a year. With this knowledge the password is easily brute-forced because we would simply try all first names in combination with all years after 1900. So the entropy is given by the base-2 logarithm of the product of the number of first names and the number of years after 1900. Of course there are a lot of possible first names, but we could try the most common ones first and crack this password within minutes when using multiple cores in parallel.
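To get a feeling for the difference, the following sketch compares the naive entropy of david1992 with the entropy obtained when the form "first name followed by year" is known. The list sizes are hypothetical numbers picked for illustration only.

```javascript
function lg(x) {
  return Math.log(x) / Math.LN2;
}

// Hypothetical sizes, chosen for illustration only.
var firstNames = 5000;  // number of candidate first names
var years = 115;        // number of candidate years after 1900

// Entropy when the form of the password is known.
var patternEntropy = lg(firstNames * years);

// Naive entropy: 9 characters drawn from lower case letters and digits.
var naiveEntropy = 9 * lg(26 + 10);

console.log('pattern: ' + patternEntropy.toFixed(1) + ' bits');
console.log('naive:   ' + naiveEntropy.toFixed(1) + ' bits');
```

With these assumed sizes the pattern-based estimate is less than half of the naive one.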

Patterns

Knowing the form of the password $p$, we search $p$ for patterns. In the above example the first pattern is a first name, the second one is a year (or, more generally, a date). In the course of this article we will come across different kinds of patterns such as dates, English words, German words or country names.

Password Score will search a password for patterns. Using these patterns, Password Score tries to give a more realistic estimation of the password strength. The following patterns are considered:

Dictionary words: A dictionary may be any collection of words - or strings in general. Most commonly we will use an English dictionary (or German, or whatever language is used). But Password Score treats every dictionary the same way. Thus, we may use a list of common passwords, first names, last names, city names or country names, too.
Sequences: Sequences are substrings of the alphabet or of 0123456789.
Repetitions: Repetitions of single characters as well as repetitions of a group of characters easily increase the password's length - but not its strength.
Dates: Unfortunately, dates come in many formats. A date may be only a year, or consist of a year, a month and a day in some local format.
Keyboard patterns: On the list of the most common passwords, qwerty will be in the top 100. Why? - Because it is easy to remember on the keyboard. Using an adjacency matrix of an arbitrary keyboard, Password Score is able to identify these patterns.

Idea

After collecting all the patterns, Password Score will score them. As patterns may overlap, Password Score tries to minimize the overall score of the password by dividing the whole password into disjoint (that is, non-overlapping) patterns and taking the sum of their individual scores. As a result, the strength of the password will be underestimated. But as we want to encourage the user to choose a strong password, this can only be seen as an advantage of this method.
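One way to find such a minimal division is dynamic programming over the password's prefixes. The following is only a sketch under stated assumptions and not necessarily how Password Score implements it: matches are given as hypothetical objects with inclusive start and end indices and an entropy in bits, and every character not covered by a pattern is charged a fixed brute-force cost bitsPerChar.

```javascript
// best[i] holds the minimum entropy of the first i characters.
function minimumEntropy(length, matches, bitsPerChar) {
  var best = [0];
  for (var i = 1; i <= length; i++) {
    // Option 1: character i - 1 is not covered by any pattern.
    best[i] = best[i - 1] + bitsPerChar;
    // Option 2: some pattern ends exactly at character i - 1.
    for (var m = 0; m < matches.length; m++) {
      if (matches[m].end === i - 1) {
        best[i] = Math.min(best[i], best[matches[m].start] + matches[m].entropy);
      }
    }
  }
  return best[length];
}

// Hypothetical matches for 'david1992'; the entropies are made up.
var matches = [
  { start: 0, end: 4, entropy: 12.3 },  // first name 'david'
  { start: 3, end: 6, entropy: 20.0 },  // some overlapping pattern
  { start: 5, end: 8, entropy: 6.8 }    // year '1992'
];

console.log(minimumEntropy(9, matches, 5.17));
```

Here the overlapping pattern is discarded, because the first name followed by the year covers the whole password at a lower total cost.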

Basic Usage

Password Score has no dependencies; however, the example provided with the documentation uses jQuery for visualization. Include Password Score as follows:

<!-- Main JS file (necessary). -->
<script type="text/javascript" src="dist/js/password-score.js"></script>
<!-- This file provides the default options: -->
<script type="text/javascript" src="dist/js/password-score-options.js"></script>

password-score-options.js provides the default options of Password Score including several dictionaries. The sections below explain the available configuration options in detail; however, the default options are often sufficient. They include:

English and German dictionaries;
Lists of last names, female first names and male first names;
Lists of countries (English and German);
A list of cities;
All dictionaries are checked for leet speak;
Password Score searches for dates, sequences and repetitions.

To use Password Score, simply fetch a password and create a new Score():

var password = 'qwerty';
var score = new Score(password);

To get the actual entropy score using the default options, use the calculateEntropyScore() method:

console.log(score.calculateEntropyScore(options));

This method accepts two parameters: a list of options, and whether these options should be appended to the default options or replace them instead. After calling calculateEntropyScore(), Password Score will store some of the results in score.cache:

console.log(score.calculateEntropyScore(options));
// These are the patterns which contribute to the minimum entropy:
console.log(score.cache.minimumMatches);

Dictionaries

As mentioned above, Password Score can be configured to match against custom dictionaries in the following way:

var options = [
  // For using different dictionaries different keys can be used.
  {
    'type': 'dictionary',
    'leet': false,
    'dictionary': englishDictionary
  }
];
score.calculateEntropyScore(options);

A dictionary is an object whose keys are the words and whose values are scoring values used to calculate the entropy of a word when it occurs as a pattern within a password. Besides the usual English or German dictionaries, Password Score benefits from lists of common passwords, first and last names as well as country or city names:

var englishDictionary = {
  'you': 1,
  // ... 
  'housewife': 2154,
};
var commonPasswords = {
  'password': 1,
  // ... 
  'd9ebk7': 8603,
};

The scoring value will be used to determine the entropy by taking the base-2 logarithm. Therefore, the scoring value can be used to differentiate between common patterns and less common patterns - password is the most common password whereas d9ebk7 is not that common.
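As a quick illustration of this rule, the sketch below looks up a word and takes the base-2 logarithm of its scoring value. The helper dictionaryEntropy is hypothetical and not part of the Password Score API; returning null for unknown words is a choice made for this example.

```javascript
// Two entries from the example dictionary above.
var commonPasswords = {
  'password': 1,
  'd9ebk7': 8603
};

function dictionaryEntropy(word, dictionary) {
  if (!Object.prototype.hasOwnProperty.call(dictionary, word)) {
    return null;  // not a dictionary word
  }
  return Math.log(dictionary[word]) / Math.LN2;
}

console.log(dictionaryEntropy('password', commonPasswords));  // most common entry
console.log(dictionaryEntropy('d9ebk7', commonPasswords));    // much less common entry
```

The most common password contributes no entropy at all, whereas d9ebk7 contributes about 13 bits.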

Leet Speak

Using a leet speak translation table, Password Score can search dictionaries for words which occur in leet speak within the password. This translation table looks like this:

var leet = {
  '1': ['i', 'l'],
  '2': ['n', 'r', 'z'],
  '3': ['e'],
  '4': ['a'],
  '5': ['s'],
  '6': ['g'],
  '7': ['t'],
  '8': ['b'],
  '9': ['g', 'o'],
  '0': ['o'],
  '@': ['a'],
  '(': ['c'],
  '[': ['c'],
  '<': ['c'],
  '&': ['g'],
  '!': ['i'],
  '|': ['i', 'l'],
  '?': ['n', 'r'],        
  '$': ['s'],
  '+': ['t'],
  '%': ['x']
};

Given a word in leet speak, Password Score generates a list of all possible substitutions using the translation table. All possible substitutions are matched against a given dictionary. To use this feature for a specific dictionary use:

var options = [
  // For using different dictionaries different keys can be used.
  {
    'type': 'dictionary',
    'leet': true,
    'dictionary': englishDictionary
  }
];
score.calculateEntropyScore(options);
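The substitution step described above can be sketched as follows. This is an illustration and not the library's code: it assumes that every character found in the translation table is replaced by one of its candidate letters while all other characters are kept, and it uses a shortened table. The function name leetVariants is made up.

```javascript
// Shortened translation table for the example.
var leetTable = {
  '3': ['e'],
  '4': ['a'],
  '0': ['o'],
  '$': ['s'],
  '|': ['i', 'l']
};

// Returns every string obtained by replacing each leet character
// with one of its candidate letters.
function leetVariants(word, table) {
  var variants = [''];
  for (var i = 0; i < word.length; i++) {
    var c = word.charAt(i);
    var choices = Object.prototype.hasOwnProperty.call(table, c) ? table[c] : [c];
    var next = [];
    for (var v = 0; v < variants.length; v++) {
      for (var k = 0; k < choices.length; k++) {
        next.push(variants[v] + choices[k]);
      }
    }
    variants = next;
  }
  return variants;
}

console.log(leetVariants('p4$$w0rd', leetTable));
console.log(leetVariants('h3|lo', leetTable));
```

Each variant would then be looked up in the dictionary. Note that the number of variants grows exponentially with the number of ambiguous characters such as | in this table.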

Keyboard Patterns

qwerty will always be within the top ten of the most common passwords because it is easy to remember when using a QWERTY keyboard. A keyboard pattern is defined as a path on the keyboard when considered as an undirected graph. The QWERTY and QWERTZ keyboards are already provided. Password Score can be configured to search for keyboard patterns in the following way:

var options = [
  {
    'type': 'keyboard',
    'keyboard': customKeyboard
  }
];
score.calculateEntropyScore(options);

customKeyboard has to be an object of type Keyboard; see the API documentation.

The entropy of a keyboard pattern is given by the base-2 logarithm of the number of possible beginnings multiplied by the number of possible next characters for each character within the pattern.
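The following sketch shows one possible reading of this formula. It does not use the library's Keyboard type; instead it assumes a plain adjacency list for a hypothetical 2x3 keyboard. The number of possible beginnings is taken to be the number of keys, and every character except the last contributes its number of neighbours as a factor.

```javascript
// Hypothetical keyboard with two rows (q w e / a s d), connecting only
// horizontal and vertical neighbours.
var adjacency = {
  q: ['w', 'a'], w: ['q', 'e', 's'], e: ['w', 'd'],
  a: ['q', 's'], s: ['a', 'w', 'd'], d: ['s', 'e']
};

// Returns the entropy in bits, or null if the pattern is not a path
// on the given keyboard.
function keyboardPatternEntropy(pattern, adjacency) {
  if (pattern.length === 0) {
    return null;
  }
  var possibilities = Object.keys(adjacency).length;  // possible beginnings
  for (var i = 0; i < pattern.length; i++) {
    var key = pattern.charAt(i);
    if (!Object.prototype.hasOwnProperty.call(adjacency, key)) {
      return null;
    }
    if (i + 1 < pattern.length) {
      if (adjacency[key].indexOf(pattern.charAt(i + 1)) === -1) {
        return null;
      }
      possibilities *= adjacency[key].length;  // possible next characters
    }
  }
  return Math.log(possibilities) / Math.LN2;
}

console.log(keyboardPatternEntropy('qwe', adjacency));  // log2(6 * 2 * 3)
console.log(keyboardPatternEntropy('qe', adjacency));   // not a path
```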

Dates

By default, Password Score searches the given password for dates. However, to configure Password Score to do so manually, use:

var options = [
  {
    'type': 'dates'
  }
];
score.calculateEntropyScore(options);

Dates are hard to catch because they may occur in many different formats. Fortunately, regular expressions can be used to scan efficiently for the different formats of dates:

'(0?[1-9]|1[012])([\- \/.])?(0?[1-9]|[12][0-9]|3[01])([\- \/.])?([0-9]{4})' // Middle-endian, four digit year.
'(0?[1-9]|[12][0-9]|3[01])([\- \/.])?(0?[1-9]|1[012])([\- \/.])?([0-9]{2})' // Little-endian, two digit year.
// ...

The number of possible dates depends on the format. In general we take $31 \cdot 12 \cdot y$ where $y$ is the number of years being considered. If $y$ is chosen too large, a date is scored no differently from a random eight (or six) digit number. Therefore, choosing $y$ to be around $100$ to $200$ is a realistic choice.
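To illustrate both steps, the sketch below applies the middle-endian expression from above to a sample password and then scores the date. The function dateEntropy is hypothetical; it assumes 31 possible days, 12 possible months and $y = 100$ years.

```javascript
// Middle-endian date with a four digit year (expression from above).
var middleEndian = /(0?[1-9]|1[012])([\- \/.])?(0?[1-9]|[12][0-9]|3[01])([\- \/.])?([0-9]{4})/;

// Entropy of a date in bits, assuming 31 days and 12 months.
function dateEntropy(years) {
  return Math.log(31 * 12 * years) / Math.LN2;
}

var match = middleEndian.exec('anna12/24/1992');
if (match) {
  console.log('month ' + match[1] + ', day ' + match[3] + ', year ' + match[5]);
}

var asDate = dateEntropy(100);                    // scored as a date
var asDigits = 8 * (Math.log(10) / Math.LN2);     // scored as eight random digits
console.log(asDate.toFixed(1) + ' bits versus ' + asDigits.toFixed(1) + ' bits');
```

Scored as a date, the eight digits contribute roughly 15 bits instead of the roughly 27 bits of a random eight digit number.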

Sequences and Repetitions

Password Score is able to search for number sequences and substrings of the alphabet (and does so by default):

var options = [
  {
    'type': 'sequences'
  }
];
score.calculateEntropyScore(options);

Reversed sequences are checked as well. The entropy of a sequence is only influenced by the number of possibilities for the first character and the length of the sequence.
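Detecting such sequences amounts to a substring test. The following is a minimal sketch and not the library's implementation; the minimum length of three characters is an assumption, and only the lower case alphabet and the digits are considered.

```javascript
var ALPHABET = 'abcdefghijklmnopqrstuvwxyz';
var DIGITS = '0123456789';

function reverse(s) {
  return s.split('').reverse().join('');
}

// True if s is a substring of the alphabet or of the digits,
// read forwards or backwards.
function isSequence(s) {
  if (s.length < 3) {
    return false;  // assumed minimum length
  }
  var sources = [ALPHABET, DIGITS, reverse(ALPHABET), reverse(DIGITS)];
  for (var i = 0; i < sources.length; i++) {
    if (sources[i].indexOf(s) !== -1) {
      return true;
    }
  }
  return false;
}

console.log(isSequence('abcd'));  // forward substring of the alphabet
console.log(isSequence('987'));   // reversed substring of the digits
console.log(isSequence('abd'));   // not a sequence
```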

In addition, Password Score searches for repetitions (also by default):

var options = [
  {
    'type': 'repitition'
  }
];
score.calculateEntropyScore(options);
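Repetitions of a single character or of a group of characters can be recognized with a backreference. The sketch below is an illustration under the assumption that the whole candidate string has to consist of repetitions; the function name repeatedUnit is made up and not part of the library.

```javascript
// Returns the shortest unit whose repetition makes up the whole string,
// or null if the string is not a repetition.
function repeatedUnit(s) {
  var match = /^(.+?)\1+$/.exec(s);
  return match ? match[1] : null;
}

console.log(repeatedUnit('aaaa'));    // single character repeated
console.log(repeatedUnit('abcabc'));  // group of characters repeated
console.log(repeatedUnit('abcab'));   // no repetition
```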

Data Sources

Password Score works best when combining several different dictionaries:

Hermit Dave's "Frequency Word Lists" are quite useful as a replacement for usual dictionaries as they also include spoken language.
Mark Burnett's "10,000 Top Passwords" can be used to identify common passwords.
Lists of common female and male names as well as last names can be taken from the US Census Data from 1990.
Lists of countries in all languages can be found here.
GeoNames.org provides lists of cities here. As only a few city names are translated between languages, these lists work independently of language.

The source code on GitHub includes a PHP script to convert these dictionaries into JSON format.

References

Password Score was mainly inspired by "zxcvbn: realistic password strength estimation", Dropbox Tech Blog.

David Stutz