<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Clojure and MessageDigest</title>
	<atom:link href="http://www.deskchecked.com/2009/06/22/clojure-and-messagedigest/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.deskchecked.com/2009/06/22/clojure-and-messagedigest/</link>
	<description>Thomas Lee's programming blog</description>
	<lastBuildDate>Sat, 27 Mar 2010 16:01:14 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: tom</title>
		<link>http://www.deskchecked.com/2009/06/22/clojure-and-messagedigest/comment-page-1/#comment-8473</link>
		<dc:creator>tom</dc:creator>
		<pubDate>Sat, 27 Mar 2010 16:01:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.deskchecked.com/?p=209#comment-8473</guid>
		<description>Thanks for that, hermit! Should&#039;ve checked this a little more carefully. :)</description>
		<content:encoded><![CDATA[<p>Thanks for that, hermit! Should&#8217;ve checked this a little more carefully. :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: the hermit</title>
		<link>http://www.deskchecked.com/2009/06/22/clojure-and-messagedigest/comment-page-1/#comment-8446</link>
		<dc:creator>the hermit</dc:creator>
		<pubDate>Tue, 23 Mar 2010 10:39:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.deskchecked.com/?p=209#comment-8446</guid>
		<description>I&#039;ve found a flaw in your hash function. Most of the time, based upon my
minimal amount of testing[1], it produces the wrong hash value. This is a
direct result of the way in which it converts the decimal values, produced
by the &#039;bit-and&#039; function, into hexadecimal. Using the &#039;toHexString&#039; method
results in an erroneous output because the conversion isn&#039;t padded with
leading zeroes.
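
As a rough illustration of the missing padding (a sketch only; the names
small-byte and negative-byte are made up for this example), here is what
happens to a single byte on its way to a hex string:

```clojure
;; Two sample bytes: one below 0x10, one that is negative as a signed byte.
(def small-byte (byte 0x0b))
(def negative-byte (byte -39)) ; bit pattern 0xd9

;; Masking with 0xff gives the unsigned value, but toHexString never pads.
(Integer/toHexString (bit-and small-byte 0xff))    ; "b", not "0b"
(Integer/toHexString (bit-and negative-byte 0xff)) ; "d9"

;; Without the mask, the sign is extended to 32 bits.
(Integer/toHexString (int negative-byte))          ; "ffffffd9"
```

So the mask is needed, and every byte below 0x10 loses its leading zero.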


Here are some examples that illustrate the problem:

First, I&#039;m going to slightly modify the original function so it uses the
much shorter (32-digit hex value) MD5 hash algorithm. Then I&#039;m going to
remove the &#039;str-join&#039; function so we can get a look at the list of 2-digit
strings produced by the map function.

  (defn hash-fn [input]
    (let [md (MessageDigest/getInstance &quot;MD5&quot;)]
      (. md update (.getBytes input))
      (let [digest (.digest md)]
        (map #(Integer/toHexString (bit-and % 0xff)) digest))))


Now, to demonstrate that not every &#039;input&#039; is affected:

  user=&gt; (hash-fn &quot;hello&quot;)
  (&quot;5d&quot; &quot;41&quot; &quot;40&quot; &quot;2a&quot; &quot;bc&quot; &quot;4b&quot; &quot;2a&quot; &quot;76&quot; &quot;b9&quot; &quot;71&quot; &quot;9d&quot; &quot;91&quot; &quot;10&quot; &quot;17&quot; &quot;c5&quot; &quot;92&quot;)
  user=&gt; (every? #(= (count %) 2) (hash-fn &quot;hello&quot;))
  true
  user=&gt; (count (apply str (hash-fn &quot;hello&quot;)))
  32


But most others are:

  user=&gt; (hash-fn &quot;hello.&quot;)
  (&quot;d9&quot; &quot;4c&quot; &quot;10&quot; &quot;e4&quot; &quot;37&quot; &quot;d1&quot; &quot;85&quot; &quot;31&quot; &quot;e1&quot; &quot;22&quot; &quot;ed&quot; &quot;b&quot; &quot;45&quot; &quot;ba&quot; &quot;dd&quot; &quot;2a&quot;)
  user=&gt; (every? #(= (count %) 2) (hash-fn &quot;hello.&quot;))
  false
  user=&gt; (filter #(= (count %) 1) (hash-fn &quot;hello.&quot;))
  (&quot;b&quot;)
  user=&gt; (count (apply str (hash-fn &quot;hello.&quot;)))
  31


  user=&gt; (hash-fn &quot;hello!&quot;)
  (&quot;5a&quot; &quot;8d&quot; &quot;d3&quot; &quot;ad&quot; &quot;7&quot; &quot;56&quot; &quot;a9&quot; &quot;3d&quot; &quot;ed&quot; &quot;72&quot; &quot;b8&quot; &quot;23&quot; &quot;b1&quot; &quot;9d&quot; &quot;d8&quot; &quot;77&quot;)
  user=&gt; (every? #(= (count %) 2) (hash-fn &quot;hello!&quot;))
  false
  user=&gt; (filter #(= (count %) 1) (hash-fn &quot;hello!&quot;))
  (&quot;7&quot;)
  user=&gt; (count (apply str (hash-fn &quot;hello!&quot;)))
  31


  user=&gt; (def fruits [&quot;apple&quot; &quot;banana&quot; &quot;grapefruit&quot;
                      &quot;melon&quot; &quot;orange&quot; &quot;strawberry&quot;
                      &quot;grape&quot; &quot;coconut&quot; &quot;kiwi&quot;
                      &quot;peach&quot; &quot;apricot&quot; &quot;plum&quot;])
  #&#039;user/fruits
  user=&gt; (def fvals (map #(hash-fn %) fruits))
  #&#039;user/fvals
  user=&gt; (dotimes [x 12] (println
                          (count (apply str (nth fvals x)))
                          (nth fruits x)))
  31 apple
  30 banana
  29 grapefruit
  31 melon
  29 orange
  31 strawberry
  31 grape
  31 coconut
  32 kiwi
  31 peach
  32 apricot
  31 plum
  nil


Ok, now that the flaw has been identified, it needs to be fixed. The best
solution that I came up with exchanges the &#039;toHexString&#039; method with Clojure&#039;s
&#039;format&#039; function:

  user=&gt; (doc format)
  -------------------------
  clojure.core/format
  ([fmt &amp; args])
    Formats a string using java.lang.String.format, see java.util.Formatter for format
    string syntax
  nil


&#039;format&#039; can be used to convert the decimal value to hex, specify the desired
string width, and pad the output with leading zeroes.

  user=&gt; (map #(Integer/toHexString (bit-and % 0xff)) (range 6 16))
  (&quot;6&quot; &quot;7&quot; &quot;8&quot; &quot;9&quot; &quot;a&quot; &quot;b&quot; &quot;c&quot; &quot;d&quot; &quot;e&quot; &quot;f&quot;)
  user=&gt; (map #(format &quot;%02x&quot; (bit-and % 0xff)) (range 6 16))
  (&quot;06&quot; &quot;07&quot; &quot;08&quot; &quot;09&quot; &quot;0a&quot; &quot;0b&quot; &quot;0c&quot; &quot;0d&quot; &quot;0e&quot; &quot;0f&quot;)


Finally, the original function can be modified sans the &#039;toHexString&#039;
method and the &#039;str-join&#039; function. &#039;str-join&#039; was cut because it literally
just wraps &#039;interpose&#039; in &#039;apply str&#039;:

  (apply str (interpose separator sequence))


and since the hash function only requires that the list of 2-digit character
strings returned by &#039;map&#039; be concatenated, using:

  (apply str coll)


is more than sufficient.

  (defn hash-fn [input]
    (let [md (MessageDigest/getInstance &quot;MD5&quot;)]
      (. md update (.getBytes input))
      (let [digest (.digest md)]
        (apply str (map #(format &quot;%02x&quot; (bit-and % 0xff)) digest)))))
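
The point about &#039;str-join&#039; can be checked directly. This is a minimal sketch,
assuming Clojure 1.2 or later so that clojure.string is available; hex-pairs
is a made-up sample, and clojure.string/join is shown only for comparison,
it is not used by the function above:

```clojure
(require '[clojure.string :as string])

;; Made-up sample of 2-digit strings, as 'map' would return them.
(def hex-pairs ["d9" "4c" "0b"])

;; Interposing an empty separator adds nothing to the result.
(apply str (interpose "" hex-pairs)) ; "d94c0b"
(apply str hex-pairs)                ; "d94c0b"

;; The core library's join with no separator gives the same string.
(string/join hex-pairs)              ; "d94c0b"
```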


With &#039;toHexString&#039; replaced by &#039;format&#039; the accuracy of the hash function&#039;s
output should no longer be in question. It is time to test that assumption:

  user=&gt; (def fvals (map #(hash-fn %) fruits))
  #&#039;user/fvals
  user=&gt; (dotimes [x 12] (println
                          (count (nth fvals x))
                          (nth fruits x)))
  32 apple
  32 banana
  32 grapefruit
  32 melon
  32 orange
  32 strawberry
  32 grape
  32 coconut
  32 kiwi
  32 peach
  32 apricot
  32 plum
  nil


and with that, we have our answer.
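
A length of 32 only shows that the padding is in place, not that the digits
are right. As a rough cross-check (a sketch, not part of the original
function), the output can be compared with the MD5 of &quot;hello&quot; listed earlier
and with an independent rendering through java.math.BigInteger. The names
md5-bytes, md5-hex and md5-hex-bigint, and the explicit UTF-8 charset, are
assumptions made for this example:

```clojure
(import '(java.security MessageDigest))

;; Hypothetical helper: digest the UTF-8 bytes of a string with MD5.
(defn md5-bytes [input]
  (.digest (MessageDigest/getInstance "MD5") (.getBytes input "UTF-8")))

;; Same byte-by-byte approach as hash-fn above.
(defn md5-hex [input]
  (apply str (map #(format "%02x" (bit-and % 0xff)) (md5-bytes input))))

;; Independent rendering: read the digest as an unsigned 128-bit number
;; and pad it to 32 hex digits.
(defn md5-hex-bigint [input]
  (format "%032x" (BigInteger. 1 (md5-bytes input))))

;; Matches the 16 pairs printed for "hello" earlier, joined together.
(= (md5-hex "hello") "5d41402abc4b2a76b9719d911017c592") ; true

;; Both renderings should agree on the inputs that used to come out short.
(every? #(= (md5-hex %) (md5-hex-bigint %)) ["hello" "hello." "hello!"])
```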

In parting, I want to thank the author of this blog post. Your function
formed the starting point for a file hashing function that I wrote.

------------------------------------------------------------
[1] http://www.ideone.com/MoNJ14Z8</description>
		<content:encoded><![CDATA[<p>I&#8217;ve found a flaw in your hash function. Most of the time, based upon my<br />
minimal amount of testing[1], it produces the wrong hash value. This is a<br />
direct result of the way in which it converts the decimal values, produced<br />
by the &#8216;bit-and&#8217; function, into hexadecimal. Using the &#8216;toHexString&#8217; method<br />
results in an erroneous output because the conversion isn&#8217;t padded with<br />
leading zeroes.</p>
<p>Here are some examples that illustrate the problem:</p>
<p>First, I&#8217;m going to slightly modify the original function so it uses the<br />
much shorter (32-digit hex value) MD5 hash algorithm. Then I&#8217;m going to<br />
remove the &#8216;str-join&#8217; function so we can get a look at the list of 2-digit<br />
strings produced by the map function.</p>
<p>  (defn hash-fn [input]<br />
    (let [md (MessageDigest/getInstance "MD5")]<br />
      (. md update (.getBytes input))<br />
      (let [digest (.digest md)]<br />
        (map #(Integer/toHexString (bit-and % 0xff)) digest))))</p>
<p>Now, to demonstrate that not every &#8216;input&#8217; is affected:</p>
<p>  user=&gt; (hash-fn "hello")<br />
  ("5d" "41" "40" "2a" "bc" "4b" "2a" "76" "b9" "71" "9d" "91" "10" "17" "c5" "92")<br />
  user=&gt; (every? #(= (count %) 2) (hash-fn "hello"))<br />
  true<br />
  user=&gt; (count (apply str (hash-fn "hello")))<br />
  32</p>
<p>But most others are:</p>
<p>  user=&gt; (hash-fn "hello.")<br />
  ("d9" "4c" "10" "e4" "37" "d1" "85" "31" "e1" "22" "ed" "b" "45" "ba" "dd" "2a")<br />
  user=&gt; (every? #(= (count %) 2) (hash-fn "hello."))<br />
  false<br />
  user=&gt; (filter #(= (count %) 1) (hash-fn "hello."))<br />
  ("b")<br />
  user=&gt; (count (apply str (hash-fn "hello.")))<br />
  31</p>
<p>  user=&gt; (hash-fn "hello!")<br />
  ("5a" "8d" "d3" "ad" "7" "56" "a9" "3d" "ed" "72" "b8" "23" "b1" "9d" "d8" "77")<br />
  user=&gt; (every? #(= (count %) 2) (hash-fn "hello!"))<br />
  false<br />
  user=&gt; (filter #(= (count %) 1) (hash-fn "hello!"))<br />
  ("7")<br />
  user=&gt; (count (apply str (hash-fn "hello!")))<br />
  31</p>
<p>  user=&gt; (def fruits ["apple" "banana" "grapefruit"<br />
                      "melon" "orange" "strawberry"<br />
                      "grape" "coconut" "kiwi"<br />
                      "peach" "apricot" "plum"])<br />
  #'user/fruits<br />
  user=&gt; (def fvals (map #(hash-fn %) fruits))<br />
  #'user/fvals<br />
  user=&gt; (dotimes [x 12] (println<br />
                          (count (apply str (nth fvals x)))<br />
                          (nth fruits x)))<br />
  31 apple<br />
  30 banana<br />
  29 grapefruit<br />
  31 melon<br />
  29 orange<br />
  31 strawberry<br />
  31 grape<br />
  31 coconut<br />
  32 kiwi<br />
  31 peach<br />
  32 apricot<br />
  31 plum<br />
  nil</p>
<p>Ok, now that the flaw has been identified, it needs to be fixed. The best<br />
solution that I came up with exchanges the &#8216;toHexString&#8217; method with Clojure&#8217;s<br />
&#8216;format&#8217; function:</p>
<p>  user=&gt; (doc format)<br />
  -------------------------<br />
  clojure.core/format<br />
  ([fmt &amp; args])<br />
    Formats a string using java.lang.String.format, see java.util.Formatter for format<br />
    string syntax<br />
  nil</p>
<p>&#8216;format&#8217; can be used to convert the decimal value to hex, specify the desired<br />
string width, and pad the output with leading zeroes.</p>
<p>  user=&gt; (map #(Integer/toHexString (bit-and % 0xff)) (range 6 16))<br />
  ("6" "7" "8" "9" "a" "b" "c" "d" "e" "f")<br />
  user=&gt; (map #(format "%02x" (bit-and % 0xff)) (range 6 16))<br />
  ("06" "07" "08" "09" "0a" "0b" "0c" "0d" "0e" "0f")</p>
<p>Finally, the original function can be modified sans the &#8216;toHexString&#8217;<br />
method and the &#8216;str-join&#8217; function. &#8216;str-join&#8217; was cut because it literally<br />
just wraps &#8216;interpose&#8217; in &#8216;apply str&#8217;:</p>
<p>  (apply str (interpose separator sequence))</p>
<p>and since the hash function only requires that the list of 2-digit character<br />
strings returned by &#8216;map&#8217; be concatenated, using:</p>
<p>  (apply str coll)</p>
<p>is more than sufficient.</p>
<p>  (defn hash-fn [input]<br />
    (let [md (MessageDigest/getInstance "MD5")]<br />
      (. md update (.getBytes input))<br />
      (let [digest (.digest md)]<br />
        (apply str (map #(format "%02x" (bit-and % 0xff)) digest)))))</p>
<p>With &#8216;toHexString&#8217; replaced by &#8216;format&#8217; the accuracy of the hash function&#8217;s<br />
output should no longer be in question. It is time to test that assumption:</p>
<p>  user=&gt; (def fvals (map #(hash-fn %) fruits))<br />
  #'user/fvals<br />
  user=&gt; (dotimes [x 12] (println<br />
                          (count (nth fvals x))<br />
                          (nth fruits x)))<br />
  32 apple<br />
  32 banana<br />
  32 grapefruit<br />
  32 melon<br />
  32 orange<br />
  32 strawberry<br />
  32 grape<br />
  32 coconut<br />
  32 kiwi<br />
  32 peach<br />
  32 apricot<br />
  32 plum<br />
  nil</p>
<p>and with that, we have our answer.</p>
<p>In parting, I want to thank the author of this blog post. Your function<br />
formed the starting point for a file hashing function that I wrote.</p>
<p>------------------------------------------------------------<br />
[1] <a href="http://www.ideone.com/MoNJ14Z8" rel="nofollow">http://www.ideone.com/MoNJ14Z8</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
