Clojure and MessageDigest
June 22nd, 2009
Last night I needed a quick and dirty SHA function for a Clojure-based web application. MessageDigest to the rescue!
(ns com.deskchecked.utils
  (:use clojure.contrib.str-utils)
  (:import [java.security MessageDigest]))

(defn sha
  "Generates a SHA-256 hash of the given input plaintext."
  [input]
  (let [md (MessageDigest/getInstance "SHA-256")]
    (. md update (.getBytes input))
    (let [digest (.digest md)]
      (str-join "" (map #(Integer/toHexString (bit-and % 0xff)) digest)))))
There’s obviously a dependency on clojure-contrib here which you can do away with if you don’t need it. And of course, you can pick a hashing algorithm to suit your needs.
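Both points can be sketched in a few lines. The version below is only an illustration: the name `hex-digest`, the algorithm argument, and the explicit UTF-8 encoding are assumptions rather than part of the function above. It joins the pieces with `apply str` instead of clojure-contrib, and formats each byte with `%02x` so that every byte occupies two hex digits.

```clojure
(import 'java.security.MessageDigest)

(defn hex-digest
  "Hypothetical helper: hashes the string `input` with the named algorithm
  (for example \"SHA-256\" or \"MD5\") and returns lowercase hex."
  [algorithm ^String input]
  (let [md     (MessageDigest/getInstance algorithm)
        ;; Assumption: encode as UTF-8 rather than the platform default.
        digest (.digest md (.getBytes input "UTF-8"))]
    ;; bit-and with 0xff turns each signed byte into a value from 0 to 255.
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

(hex-digest "SHA-256" "hello") ; 64 hex characters
(hex-digest "MD5" "hello")     ; 32 hex characters
```

Any algorithm name that the JVM's security providers support can be passed as the first argument.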
Nothing exciting, but I figure I can’t be the only person in need of this sort of stuff.
Categories: Functional Programming, Software Development



I’ve found a flaw in your hash function. Most of the time, based upon my
minimal amount of testing[1], it produces the wrong hash value. This is a
direct result of the way in which it converts each byte value, produced
by the ‘bit-and’ function, into hexadecimal. The ‘toHexString’ method does
not pad its result with leading zeroes, so any byte below 0x10 comes out as
one hex digit instead of two and the joined string ends up too short.
Here are some examples that illustrate the problem:
First, I’m going to slightly modify the original function so it uses the
much shorter (32-digit hex value) MD5 hash algorithm. Then I’m going to
remove the ‘str-join’ function so we can get a look at the list of 2-digit
strings produced by the map function.
(defn hash-fn [input]
  (let [md (MessageDigest/getInstance "MD5")]
    (. md update (.getBytes input))
    (let [digest (.digest md)]
      (map #(Integer/toHexString (bit-and % 0xff)) digest))))
Now, to demonstrate that not every ‘input’ is affected:
user=> (hash-fn "hello")
("5d" "41" "40" "2a" "bc" "4b" "2a" "76" "b9" "71" "9d" "91" "10" "17" "c5" "92")
user=> (every? #(= (count %) 2) (hash-fn "hello"))
true
user=> (count (apply str (hash-fn "hello")))
32
But most others are:
user=> (hash-fn "hello.")
("d9" "4c" "10" "e4" "37" "d1" "85" "31" "e1" "22" "ed" "b" "45" "ba" "dd" "2a")
user=> (every? #(= (count %) 2) (hash-fn "hello."))
false
user=> (filter #(= (count %) 1) (hash-fn "hello."))
("b")
user=> (count (apply str (hash-fn "hello.")))
31
user=> (hash-fn "hello!")
("5a" "8d" "d3" "ad" "7" "56" "a9" "3d" "ed" "72" "b8" "23" "b1" "9d" "d8" "77")
user=> (every? #(= (count %) 2) (hash-fn "hello!"))
false
user=> (filter #(= (count %) 1) (hash-fn "hello!"))
("7")
user=> (count (apply str (hash-fn "hello!")))
31
user=> (def fruits ["apple" "banana" "grapefruit"
                    "melon" "orange" "strawberry"
                    "grape" "coconut" "kiwi"
                    "peach" "apricot" "plum"])
#'user/fruits
user=> (def fvals (map #(hash-fn %) fruits))
#'user/fvals
user=> (dotimes [x 12] (println
                        (count (apply str (nth fvals x)))
                        (nth fruits x)))
31 apple
30 banana
29 grapefruit
31 melon
29 orange
31 strawberry
31 grape
31 coconut
32 kiwi
31 peach
32 apricot
31 plum
nil
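Why "most of the time"? A back-of-the-envelope sketch, assuming the bytes of a digest behave like uniformly random values (an assumption for illustration, not something measured here): each byte has a 16 in 256 chance of being below 0x10, so the chance that none of n bytes loses a digit is (15/16)^n. The function name `p-all-padded` is hypothetical.

```clojure
(defn p-all-padded
  "Chance that none of n uniformly random bytes is below 0x10,
  i.e. that the unpadded conversion happens to give the right length."
  [n]
  (Math/pow (/ 15.0 16.0) n))

(p-all-padded 16) ; MD5 has 16 bytes: roughly 0.36
(p-all-padded 32) ; SHA-256 has 32 bytes: roughly 0.13
```

Under that assumption only about a third of MD5 hashes, and about one in eight SHA-256 hashes, would come out at full length, which is in line with the 3 out of 12 fruits above.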
Ok, now that the flaw has been identified, it needs to be fixed. The best
solution that I came up with exchanges the ‘toHexString’ method with Clojure’s
‘format’ function:
user=> (doc format)
-------------------------
clojure.core/format
([fmt & args])
Formats a string using java.lang.String.format, see java.util.Formatter for format
string syntax
nil
‘format’ can be used to convert the decimal value to hex, specify the desired
string width, and pad the output with leading zeroes.
user=> (map #(Integer/toHexString (bit-and % 0xff)) (range 6 16))
("6" "7" "8" "9" "a" "b" "c" "d" "e" "f")
user=> (map #(format "%02x" (bit-and % 0xff)) (range 6 16))
("06" "07" "08" "09" "0a" "0b" "0c" "0d" "0e" "0f")
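To make the pieces of the format string explicit, here is a small sketch that varies one part at a time, using 11 (hex b) as the sample value. In "%02x" the `x` requests lowercase hexadecimal, the `2` is the minimum width, and the `0` flag pads with zeroes instead of spaces.

```clojure
(format "%x" 11)    ; => "b"   hex, no minimum width
(format "%2x" 11)   ; => " b"  width 2, padded with a space
(format "%02x" 11)  ; => "0b"  width 2, padded with a zero
(format "%02x" 255) ; => "ff"  already two digits, left unchanged
```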
Finally, the original function can be modified sans the ‘toHexString’
method and the ‘str-join’ function. ‘str-join’ was cut because it literally
just wraps ‘interpose’ in ‘apply str’:
(apply str (interpose separator sequence))
and since the hash function only requires that the list of 2-digit character
strings returned by ‘map’ be concatenated, using:
(apply str coll)
is more than sufficient.
(defn hash-fn [input]
  (let [md (MessageDigest/getInstance "MD5")]
    (. md update (.getBytes input))
    (let [digest (.digest md)]
      (apply str (map #(format "%02x" (bit-and % 0xff)) digest)))))
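The claim that ‘apply str’ alone is enough when the separator is empty can be checked in isolation. The sketch below uses a made-up vector named `parts`; the last line uses `clojure.string/join`, which ships with later Clojure releases and is mentioned only as an alternative, not as something the function above relies on.

```clojure
(require '[clojure.string :as string])

;; Hypothetical sample data: three already-padded byte strings.
(def parts ["0b" "45" "ba"])

(apply str (interpose "" parts)) ; => "0b45ba"  interpose with an empty separator
(apply str parts)                ; => "0b45ba"  same result, no interpose step
(string/join parts)              ; => "0b45ba"  clojure.string equivalent
```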
With ‘toHexString’ replaced by ‘format’ the accuracy of the hash function’s
output should no longer be in question. It is time to test that assumption:
user=> (def fvals (map #(hash-fn %) fruits))
#'user/fvals
user=> (dotimes [x 12] (println
                        (count (nth fvals x))
                        (nth fruits x)))
32 apple
32 banana
32 grapefruit
32 melon
32 orange
32 strawberry
32 grape
32 coconut
32 kiwi
32 peach
32 apricot
32 plum
nil
and with that, we have our answer.
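Equal lengths are necessary but not sufficient, so a slightly stronger test compares the output against a known value and against an independent conversion. The following is a sketch under stated assumptions rather than part of the test run above: the helper names are hypothetical, the input is encoded as UTF-8, and the second conversion goes through `java.math.BigInteger` with a fixed width of 32 hex digits (which fits MD5 only).

```clojure
(import 'java.security.MessageDigest)

(defn md5-bytes
  "Hypothetical helper: raw 16-byte MD5 digest of a string (UTF-8 assumed)."
  [^String s]
  (.digest (MessageDigest/getInstance "MD5") (.getBytes s "UTF-8")))

(defn hex-per-byte
  "Per-byte conversion, as in the corrected function."
  [bs]
  (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

(defn hex-via-bigint
  "Independent conversion: signum 1 reads the bytes as an unsigned number,
  and %032x pads the result to 32 hex digits."
  [^bytes bs]
  (format "%032x" (BigInteger. 1 bs)))

;; Known value: the 16 pairs printed for "hello" earlier, joined together.
(= "5d41402abc4b2a76b9719d911017c592" (hex-per-byte (md5-bytes "hello")))

;; Both conversions should agree for every input, including the ones
;; that previously came out short.
(every? #(= (hex-per-byte (md5-bytes %)) (hex-via-bigint (md5-bytes %)))
        ["hello" "hello." "hello!" "banana" "grapefruit"])
```

If both expressions evaluate to true, the padded output matches in content and not just in length.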
In parting, I want to thank the author of this blog post. Your function
formed the starting point for a file hashing function that I wrote.
————————————————————
[1] http://www.ideone.com/MoNJ14Z8
Thanks for that, hermit! Should’ve checked this a little more carefully.