Status

This is a draft.  There is much research and refinement to be done.  This will bore you to tears if you are not interested in information security.

Summary

This page describes a proposal for a method of generating password hashes that would make it more difficult for an attacker to obtain the cleartext password from a password hash.  This is done by applying salts to the hash in dynamic and non-standard ways.  The goal is to create an system of hashing and storing passwords that will withstand dictionary attacks even for a weak password.

Hash

This proposal is not concerned with how a hash is generated, and assumes that a hash is generated using a sufficiently strong and irreversible hashing algorithm such as SHA1.

Salt

A salt is a pseudo-random string that is combined with the cleartext before it is hashed.  The hash then has the same salt combined with it (see next section).  When validating a password, the salt is extracted from the salted hash, applied to the password that is being tested, and the salted cleartext password is hashed.  The result is compared to the raw hash and tested for a match.  It looks something like this:

  // salted_hash is the hash of the correct password stored in a database somewhere.
  // guess is the cleartext password we are testing against salted_hash
  raw_hash = extract_hash(salted_hash)
  salt = extract_salt(salted_hash)
  salted_guess = apply_salt(guess, salt)
  hashed_guess = hash(salted_guess)
  is_match = (hashed_guess == raw_hash)

A few questions arise from the above: how are the salt and hash extracted from salted_hash?  See the next section.  How is the salt applied to the guess?  The reverse of extract_salt (see the next section).  How is the salted_guess hashed?  See the previous section (we don’t care, it’s not relevant to this proposal.  It’s a strong, irreversible hash).

The purpose of salting a hash, as opposed to just storing the raw hash of the cleartext password, is to render useless any pre-rendered hash dictionaries.  For an attacker to test a password against a dictionary of pre-rendered hashes, they would have to have rendered this dictionary using the same salt as your password.  If your salt is sufficiently random, the attacker is unlikely to have such a dictionary on hand.

Briefly, salting hashes slows down the attack process considerably, but does not make it any more computationally intensive.

Salting Algorithms

A salting algorithm in this context is an algorithm for combining a raw hash with a salt into a salted hash.  A hash algorithm must be reversible; given a salted hash it must be possible to programmatically extract the original raw hash and salt.

As an example, a common salting algorithm is as simple as:

  salt = generate_salt()                         // "1234"
  salted_cleartext = concatenate(salt, ":", cleartext_password) // "1234:password"
  raw_hash = hash(salted_cleartext)              // "2fa775528b939009d08f56fda4cd19cb9bcb557f"
  salted_hash = concatenate(salt, ":", raw_hash) // "1234:2fa775528b939009d08f56fda4cd19cb9bcb557f"

This is simple and reversible.  A cleartext password guess can be tested using:

  // Assume our guess is correct, "password"
  // password_hash is "1234:2fa775528b939009d08f56fda4cd19cb9bcb557f", from the previous example.
  salt = extract_salt(password_hash)             // "1234"
  raw_hash = extract_hash(password_hash)         // "2fa775528b939009d08f56fda4cd19cb9bcb557f"
  salted_guess = concatenate(salt, ":", guess)   // "1234:password"
  hashed_guess = hash(salted_guess)              // "2fa775528b939009d08f56fda4cd19cb9bcb557f"

The problem with this commonly used algorithm (and it’s practically similar cousins) is the ease with which an attacker can identify the salt from the raw hash.  More complex algorithms can be used in an attempt to obfuscate where the salt and the where the password lie.  Continuing the above example using the salt “1234” and the raw_hash “2fa775528b939009d08f56fda4cd19cb9bcb557f”, one could inject the salt into the hash as:

  2fa7175528b9329009d0384f56fda4cd19cb9bcb557f

Note that it’s much harder, given the above string, to determine where the salt lies.  If you look carefully you can see that the individual characters of the salt “1”, “2”, “3”, and “4” have been interspersed throughout the original hash.  This creates a situation wherein if the attacker doesn’t know the salting algorithm, doesn’t know the salt, doesn’t know the raw hash, and doesn’t know the original password, an attacker needs to not just test a dictionary of password guesses against the hash, but also a dictionary of salting algorithms per password.

If the attacker knows the salting algorithm, their job is no harder than if a simpler salting algorithm were used.  If the attacker knows the password, they only need to try various methods of extracting a salt from the hash to find out which method produces a salt which, when hashed with the known password, produces the raw hash.

In addition to the above challenges of cracking this hash, there is the theoretical possibility that the same salted hash can be produced by more than one combination of a cleartext password and a salting algorithm.  This is because there may be multiple sets of characters (the salt) which can be extracted from a hash that result in a valid raw hash (and thus a valid underlying cleartext password).  However, this theoritcal possibility likely becomes extremely slim when you only consider common passwords.

The above is just one example of a salting algorithm.  Neverminding practicality, there are infinite ways that a raw hash can have a salt injected into it.  Some more complex examples are:

  • Injecting the salt at specific character indexes, but including gibberish salt characters as well.
  • Modifying the salt before injecting (such as reversing capitalization, or mapping characters to other characters).
  • Modifying the salt during injection based on the preceding or following character in the raw hash.

Multiple Salting Algorithms

Using complex salting algorithms makes an attackers job harder if they don’t know your salting algorithm, but this relies entirely on a secret that (once exposed) renders your algorithm useless no matter how complex.  If this secret is leaked or already known by an inside attacker, an attackers job is no harder (except for possibly taking additional CPU cycles) than if you used the common “<salt>:<hash>” salting algorithm.  This is exacerbated by the nature of hashes which prevents the data maintainer from re-hashing existing passwords using a new algorithm.

To mitigate (not eliminate) this drawback, one can support multiple salting algorithms using salting algorithm codes (SAC).  An SAC would be an identifier attached to the beginning of the password hash that identifies which salting algorithm was used.  Checking a password guess against the known hash might change to look something like the algorithm below.  In this next example, the password_hash uses a more complicated salting algorithm of inserting the 4 character salt at indexes 4, 12, 18 and 19.  It also now begins with “001” which is a three digit code identifying which salting algorithm was used for this hash.  In other words, “001” tells us that the salt is at characters 4, 12, 18, and 19, and that the remaining characters are the raw hash.

  // Assume our guess is correct, "password"
  // password_hash is "0012fa7175528b9329009d0384f56fda4cd19cb9bcb557f".
  sac = extract_sac(password_hash)                 // "001"
  salted_hash = extract_salted_hash(password_hash) // "2fa7175528b9329009d0384f56fda4cd19cb9bcb557f"
  salt = extract_salt(salted_hash, sac)            // "1234"
  raw_hash = extract_hash(salted_hash, sac)        // "2fa775528b939009d08f56fda4cd19cb9bcb557f"
  salted_guess = concatenate(salt, ":", guess)     // "1234:password"
  hashed_guess = hash(salted_guess)                // "2fa775528b939009d08f56fda4cd19cb9bcb557f"

This allows us to create a large number of secret salting algorithms which are identified by a unique string attached to the password hash.  This means that if an attacker knows or is able to discover one SAC and its associated algorithm, their job of cracking the hash is only made easier for a subset of password hashes using that same algorithm.  Obviously, if they don’t know the meaning of the SAC codes this makes their already difficult job even more so.

The next section addresses the fact that this still relies on a secret (just a larger pool of secrets) and is easily undermined by an insider, source code leak, or other kind of exposure.

SAC Rotation

This proposal introduces a set of secrets, and that is all it will ever be.  As such, it will always be “vulnerable” (in that it doesn’t make the attackers job harder; it will never make their job easier than standard “<salt>:<hash>” mechanisms) to exposure of this secret.  This section proposes a means of mitigating the damage by rotating the set of SACs and their associated algorithms on a regular basis or following key events such as termination of an knowledgeable employee or a known security breach.

Think of a set of SACs and associated algorithms in generations.  When it makes sense to do so (following a breach, on a regular basis, etc), it is possible to rotate the current generation of SACs to produce a new generation of SACs.  Because it is not possible to obtain the original cleartext of the password hashes in the database, it is necessary to come up with a set of transitional SACs to create a bridge between the old and new generations of SAC.

At a high level, the process may work as follows:

  • A new set of SACs and associated algorithms is generated.
  • Going forward, all newly generated password hashes use the new generation of SACs.
  • For each existing hash in the database using an old generation SAC:
    • The salt and raw hash are extracted based on the old generation SAC, which is known by the SAC currently prefixing the hash.
    • The salt is injected back into the raw hash using any of the new generation SACs, and the SAC prefix on the stored hash is changed to the new generation SAC.

    The next time each user logs in following this rotation, use the opportunity to generate a new salt and re-hash their password using the current generation of SACs.

Known Issues

This presents a new weakness that must be address.  If an attacker knows the SACs and associated algorithms from a given generation (G1) and are able to obtain a set of password hashes when G1 is the current generation, they can use this information (the ability to derive the salt and raw hash from these G1 hashes) to assist in discovering the algorithm used to salt any later generation hashes they are able to obtain.  If this attacker steals a database of hashes from G9, they can systematically look for passwords from users that have not logged in between G1 and G9.  They can now use their knowledge of the raw hash and salt to determine how the salt was embedded into raw hash to produce the G9 hash, rendering any other hashes using that same G9 SAC as simple to break as if they used the common “<salt>:<hash>” mechanism.

The underlying problem is that if the raw hash and salt stay the same during a rotation, an attacker who knew the raw hash and salt during any previous generation can determine the meaning behind the new SAC so long as they are able to obtain the new password hash and the password owner has not logged in.