Clojure and MessageDigest

June 22nd, 2009

Last night I needed a quick and dirty SHA function for a Clojure-based web application. MessageDigest to the rescue!


(ns com.deskchecked.utils
  (:use clojure.contrib.str-utils)
  (:import [java.security MessageDigest]))

(defn sha
  "Generates a SHA-256 hash of the given input plaintext."
  [input]
  (let [md (MessageDigest/getInstance "SHA-256")]
    (. md update (.getBytes input))
    (let [digest (.digest md)]
      (str-join "" (map #(Integer/toHexString (bit-and % 0xff)) digest)))))

There’s obviously a dependency on clojure-contrib here which you can do away with if you don’t need it. And of course, you can pick a hashing algorithm to suit your needs.
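
As a rough sketch of both points, here is a variant that drops the clojure-contrib dependency and takes the algorithm name as an argument. The name hex-digest and the explicit UTF-8 encoding are my own assumptions for illustration, not part of the function above; it also formats every byte as two hex digits with "%02x".

```clojure
(import '[java.security MessageDigest])

(defn hex-digest
  "Hypothetical helper: hex-encodes the digest of the string `input`
  using the named algorithm, e.g. \"SHA-256\" or \"MD5\"."
  [algorithm input]
  (let [md (MessageDigest/getInstance algorithm)]
    ;; Assumption: the input is a string and should be encoded as UTF-8.
    (.update md (.getBytes input "UTF-8"))
    ;; Mask each signed byte to 0..255 and render it as two hex digits.
    (apply str (map #(format "%02x" (bit-and % 0xff)) (.digest md)))))

(hex-digest "MD5" "hello")
;; => "5d41402abc4b2a76b9719d911017c592"
```

Any algorithm name that MessageDigest/getInstance accepts on your JVM can be passed in; an unknown name throws NoSuchAlgorithmException.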

Nothing exciting, but I figure I can’t be the only person in need of this sort of stuff. :)

Categories: Functional Programming, Software Development

2 Comments

  1. the hermit

    I’ve found a flaw in your hash function. Most of the time, based upon my
    minimal amount of testing[1], it produces the wrong hash value. This is a
    direct result of the way in which it converts the decimal values, produced
    by the ‘bit-and’ function, into hexadecimal. Using the ‘toHexString’ method
    results in an erroneous output because the conversion isn’t padded with
    leading zeroes.
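
    To see why the missing padding matters beyond the string simply being
    too short, here is a small sketch with made-up byte values (not real
    digests). The helper names unpadded-hex and padded-hex are hypothetical.
    Without padding, two different byte sequences can collapse into the same
    string:

    ```clojure
    (defn unpadded-hex
      "Hex-encodes a sequence of byte values without leading zeroes."
      [bs]
      (apply str (map #(Integer/toHexString (bit-and % 0xff)) bs)))

    (defn padded-hex
      "Hex-encodes a sequence of byte values, two digits per byte."
      [bs]
      (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

    ;; Two different inputs, one ambiguous output.
    (unpadded-hex [0x01 0x23]) ;; => "123"
    (unpadded-hex [0x12 0x03]) ;; => "123"

    ;; With padding the two inputs stay distinguishable.
    (padded-hex [0x01 0x23])   ;; => "0123"
    (padded-hex [0x12 0x03])   ;; => "1203"
    ```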

    Here are some examples that illustrate the problem:

    First, I’m going to slightly modify the original function so it uses the
    much shorter (32-digit hex value) MD5 hash algorithm. Then I’m going to
    remove the ‘str-join’ function so we can get a look at the list of 2-digit
    strings produced by the map function.

    (defn hash-fn [input]
      (let [md (MessageDigest/getInstance "MD5")]
        (. md update (.getBytes input))
        (let [digest (.digest md)]
          (map #(Integer/toHexString (bit-and % 0xff)) digest))))

    Now, to demonstrate that not every ‘input’ is affected:

    user=> (hash-fn "hello")
    ("5d" "41" "40" "2a" "bc" "4b" "2a" "76" "b9" "71" "9d" "91" "10" "17" "c5" "92")
    user=> (every? #(= (count %) 2) (hash-fn "hello"))
    true
    user=> (count (apply str (hash-fn "hello")))
    32

    But most others are:

    user=> (hash-fn "hello.")
    ("d9" "4c" "10" "e4" "37" "d1" "85" "31" "e1" "22" "ed" "b" "45" "ba" "dd" "2a")
    user=> (every? #(= (count %) 2) (hash-fn "hello."))
    false
    user=> (filter #(= (count %) 1) (hash-fn "hello."))
    ("b")
    user=> (count (apply str (hash-fn "hello.")))
    31

    user=> (hash-fn "hello!")
    ("5a" "8d" "d3" "ad" "7" "56" "a9" "3d" "ed" "72" "b8" "23" "b1" "9d" "d8" "77")
    user=> (every? #(= (count %) 2) (hash-fn "hello!"))
    false
    user=> (filter #(= (count %) 1) (hash-fn "hello!"))
    ("7")
    user=> (count (apply str (hash-fn "hello!")))
    31

    user=> (def fruits ["apple" "banana" "grapefruit"
                        "melon" "orange" "strawberry"
                        "grape" "coconut" "kiwi"
                        "peach" "apricot" "plum"])
    #'user/fruits
    user=> (def fvals (map #(hash-fn %) fruits))
    #'user/fvals
    user=> (dotimes [x 12] (println
                             (count (apply str (nth fvals x)))
                             (nth fruits x)))
    31 apple
    30 banana
    29 grapefruit
    31 melon
    29 orange
    31 strawberry
    31 grape
    31 coconut
    32 kiwi
    31 peach
    32 apricot
    31 plum
    nil

    Ok, now that the flaw has been identified, it needs to be fixed. The best
    solution that I came up with exchanges the ‘toHexString’ method with Clojure’s
    ‘format’ function:

    user=> (doc format)
    -------------------------
    clojure.core/format
    ([fmt & args])
      Formats a string using java.lang.String.format, see java.util.Formatter for format
      string syntax
    nil

    ‘format’ can be used to convert the decimal value to hex, specify the desired
    string width, and pad the output with leading zeroes.

    user=> (map #(Integer/toHexString (bit-and % 0xff)) (range 6 16))
    ("6" "7" "8" "9" "a" "b" "c" "d" "e" "f")
    user=> (map #(format "%02x" (bit-and % 0xff)) (range 6 16))
    ("06" "07" "08" "09" "0a" "0b" "0c" "0d" "0e" "0f")

    Finally, the original function can be rewritten without the ‘toHexString’
    method and the ‘str-join’ function. ‘str-join’ was cut because it literally
    just wraps ‘interpose’ in ‘apply str’:

    (apply str (interpose separator sequence))

    and since the hash function only requires that the list of 2-digit character
    strings returned by ‘map’ be concatenated, using:

    (apply str coll)

    is more than sufficient.

    (defn hash-fn [input]
      (let [md (MessageDigest/getInstance "MD5")]
        (. md update (.getBytes input))
        (let [digest (.digest md)]
          (apply str (map #(format "%02x" (bit-and % 0xff)) digest)))))

    With ‘toHexString’ replaced by ‘format’ the accuracy of the hash function’s
    output should no longer be in question. It is time to test that assumption:

    user=> (def fvals (map #(hash-fn %) fruits))
    #'user/fvals
    user=> (dotimes [x 12] (println
                             (count (nth fvals x))
                             (nth fruits x)))
    32 apple
    32 banana
    32 grapefruit
    32 melon
    32 orange
    32 strawberry
    32 grape
    32 coconut
    32 kiwi
    32 peach
    32 apricot
    32 plum
    nil

    and with that, we have our answer.
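
    Strictly speaking, the length check only shows that the padding works,
    not that the digits themselves are right. A slightly stronger check,
    sketched below, compares the per-byte formatting against the "hello"
    digest shown earlier and against a second, independent encoding that
    goes through java.math.BigInteger. The helper names md5-bytes,
    hex-per-byte and hex-bigint are hypothetical and only serve this example.

    ```clojure
    (import '[java.security MessageDigest])

    (defn md5-bytes
      "Returns the raw 16-byte MD5 digest of the string `input`."
      [input]
      (.digest (MessageDigest/getInstance "MD5") (.getBytes input)))

    (defn hex-per-byte
      "Formats each byte as two hex digits, as in the fixed hash-fn."
      [bs]
      (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

    (defn hex-bigint
      "Independent encoding: treats the 16 bytes as one unsigned number
      and pads the result to 32 hex digits."
      [bs]
      (format "%032x" (BigInteger. 1 bs)))

    ;; Matches the 16 two-digit strings printed for "hello" above.
    (= "5d41402abc4b2a76b9719d911017c592" (hex-per-byte (md5-bytes "hello")))
    ;; => true

    ;; Both encodings should agree for every input, including the ones
    ;; that came out too short before the fix.
    (every? #(= (hex-per-byte (md5-bytes %)) (hex-bigint (md5-bytes %)))
            ["hello." "hello!" "apple" "banana"])
    ;; => true
    ```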

    In parting, I want to thank the author of this blog post. Your function
    formed the starting point for a file hashing function that I wrote.

    ————————————————————
    [1] http://www.ideone.com/MoNJ14Z8

  2. tom

    Thanks for that, hermit! Should’ve checked this a little more carefully. :)
