Hash Fusing

Exploring Hash Fusing via Upper Triangular Matrix Multiplication.
Author
Published

January 12, 2026

Keywords

data, hash, sha256, matrix, fusing, merkle-tree

This is an article about fusing hashes together while having the key properties of associativity and non-commutativity. I describe the basic approach of using matrix multiplication of Upper Triangle Matrices and then show the results of two experiments showing the quality and limits of this hash fusing approach.

The basic question this article tries to answer is: can fusing hashes together provide unique identifiers for arbitrary data? The answer is no. Specifically, the approach in this article can not provide unique identifiers for sufficiently large runs of repeated values. That is, sufficiently low entropy data is not uniquely represented by fusing hashes together.

The good news is that low entropy data may always be converted into high entropy data either via compression or injecting noise. A practical fusing operator can easily detect fuses of low entropy data and throw an exception when fusing produces those ‘bad’ hash values.

Introduction

I have begun working on implementing distributed immutable data structures as a way to efficiently share structured data between multiple clients. One of the key challenges in this area is being able to efficiently reference ordered collections of data. The usual approach is to use Merkle Trees which have the limitation that they are non-associative and the shape of the tree determines the final hash value at the root of the tree. But, since I plan on using Finger Trees to efficiently represent ordered collections of data, I need computed hashes that are insensitive to the tree shape. This means that I need to be able to fuse hashes together in a way that is associative and also non-commutative. As a starting point, I have been reading the HP paper describing hash fusing via matrix multiplication: https://www.labs.hpe.com/techreports/2017/HPE-2017-08.pdf

Hash Fusing via Upper Triangular Matrix Multiplication

The basic approach of fusing two hashes is to represent each hash as an upper triangular matrix and then to multiply the two matrices together. Since matrix multiplication is associative but not commutative. The following sections define hashes and build up the necessary functions to perform hash fusing via upper triangular matrix multiplication.

Hash Representation

A hash is represented as a string of 64 hexadecimal values (256 bits). To start with, here is a zero hash and a function to generate random hashes.

(def zero-hex (apply str (vec (repeat 64 0))))
"0000000000000000000000000000000000000000000000000000000000000000"
(defn random-hex
  "Generate a random hex string representing length `n` bytes."
  [n]
  (let [hex-chars "0123456789abcdef"]
    (apply str (repeatedly (* 2 n) #(rand-nth hex-chars)))))

Generate some random hashes for testing:

(def a-hex (random-hex 32))
"7cf5c332ae1eb7b3b907cf3ccd88a8b0f0300d0a4d6e12c0874e25293b868917"
(def b-hex (random-hex 32))
"027fe391a6aabd7601f1f4faae7662ea827b3690826787b0736d88f17bc49ae3"
(def c-hex (random-hex 32))
"64486ef5d61900b74c8885063f478e9b25e983cc4f7efb29611401e9a749e5c9"

To fuse two hashes, we convert each hash to an upper triangular matrix and then multiply the two matrices together. The result is another upper triangular matrix which can be converted back to a hash by taking the elements above the main diagonal. The following several sections defines this mapping between hashes and upper triangular matrices. For the experiments below four different bit sizes of cells and four corresponding matrices are defined.

8 bit cells in a 9x9 matrix:

\[%% Example: 9x9 Upper Triangular Matrix for 8 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_{31} ] \to {\begin{bmatrix} 1 & h_0 & h_8 & h_{15} & h_{21} & h_{26} & h_{29} & h_{31} & h_{25} \\ 0 & 1 & h_1 & h_9 & h_{16} & h_{22} & h_{27} & h_{30} & h_{20} \\ 0 & 0 & 1 & h_2 & h_{10} & h_{17} & h_{23} & h_{28} & h_{14} \\ 0 & 0 & 0 & 1 & h_3 & h_{11} & h_{18} & h_{24} & h_7 \\ 0 & 0 & 0 & 0 & 1 & h_4 & h_{12} & h_{19} & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & h_5 & h_{13} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & h_6 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}} \]

16 bit cells in a 7x7 matrix:

\[%% Example: 7x7 Upper Triangular Matrix for 16 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_{15} ] \to {\begin{bmatrix} 1 & h_0 & h_6 & h_{10} & h_{13} & h_{15} & h_5 \\ 0 & 1 & h_1 & h_7 & h_{11} & h_{14} & 0 \\ 0 & 0 & 1 & h_2 & h_8 & h_{12} & 0 \\ 0 & 0 & 0 & 1 & h_3 & h_9 & 0 \\ 0 & 0 & 0 & 0 & 1 & h_4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}} \]

32 bit cells in a 5x5 matrix:

\[%% Example: 5x5 Upper Triangular Matrix for 32 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_7 ] \to {\begin{bmatrix} 1 & h_0 & h_4 & h_7 & h_6 \\ 0 & 1 & h_1 & h_5 & h_3 \\ 0 & 0 & 1 & h_2 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}} \]

64 bit cells in a 4x4 matrix:

\[%% Example: 4x4 Upper Triangular Matrix for 64 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3 ] \to {\begin{bmatrix} 1 & h_0 & h_3 & h_2 \\ 0 & 1 & h_1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}} \]

Hex and Byte Vector Conversion Functions

The first type of conversion is between hex strings and byte vectors.

(defn hex->hash8
  "Convert a hex string to a byte vector."
  [s]
  (with-meta
    (->> s
         (partition 2)
         (map #(-> (apply str %)
                   (Integer/parseInt 16)))
         (vec))
    {:cell-size 8}))
(defn hash8->hex
  "Convert a byte vector to a hex string."
  [bytes]
  (->> bytes
       (map #(format "%02x" %))
       (apply str)))
(= a-hex (-> a-hex hex->hash8 hash8->hex))
true

Hash Conversion Functions

The next type of conversion is from bytes and larger word sizes.

(defn hash8->hash16
  "Convert a vector of 32 bytes into a vector of 16 2-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 2 i)) 8)
                   (hash8 (+ 1 (* 2 i)))))
              (range 16)))
    {:cell-size 16}))
(defn hash16->hash8
  "Convert a vector 2-byte words into a vector of bytes."
  [hash16]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash16))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash16 hash16->hash8 hash8->hex))
true
(defn hash8->hash32
  "Convert a vector of 32 bytes into a vector of 8 4-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 4 i)) 24)
                   (bit-shift-left (hash8 (+ 1 (* 4 i))) 16)
                   (bit-shift-left (hash8 (+ 2 (* 4 i))) 8)
                   (hash8 (+ 3 (* 4 i)))))
              (range 8)))
    {:cell-size 32}))
(defn hash32->hash8
  "Convert a vector of 4-byte words into a vector of bytes."
  [hash32]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 24) 0xFF)
                    (bit-and (bit-shift-right word 16) 0xFF)
                    (bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash32))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash32 hash32->hash8 hash8->hex))
true
(defn hash8->hash64
  "Convert a vector of 32 bytes into a vector of 4 8-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 8 i)) 56)
                   (bit-shift-left (hash8 (+ 1 (* 8 i))) 48)
                   (bit-shift-left (hash8 (+ 2 (* 8 i))) 40)
                   (bit-shift-left (hash8 (+ 3 (* 8 i))) 32)
                   (bit-shift-left (hash8 (+ 4 (* 8 i))) 24)
                   (bit-shift-left (hash8 (+ 5 (* 8 i))) 16)
                   (bit-shift-left (hash8 (+ 6 (* 8 i))) 8)
                   (hash8 (+ 7 (* 8 i)))))
              (range 4)))
    {:cell-size 64}))
(defn hash64->hash8
  "Convert a vector of 8-byte words into a vector of bytes."
  [hash64]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 56) 0xFF)
                    (bit-and (bit-shift-right word 48) 0xFF)
                    (bit-and (bit-shift-right word 40) 0xFF)
                    (bit-and (bit-shift-right word 32) 0xFF)
                    (bit-and (bit-shift-right word 24) 0xFF)
                    (bit-and (bit-shift-right word 16) 0xFF)
                    (bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash64))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash64 hash64->hash8 hash8->hex))
true

Matrix Conversion Functions

The following functions convert between byte vectors representing hashes and four different upper triangular matrix sizes based on cell bit size.

(defn hash8->utm8
  "Convert a vector of 32 bytes into a 9x9 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07
    h08 h09 h10 h11 h12 h13 h14 h15
    h16 h17 h18 h19 h20 h21 h22 h23
    h24 h25 h26 h27 h28 h29 h30 h31]]
  (with-meta
    [[1 h00 h08 h15 h21 h26 h29 h31 h25]
     [0   1 h01 h09 h16 h22 h27 h30 h20]
     [0   0   1 h02 h10 h17 h23 h28 h14]
     [0   0   0   1 h03 h11 h18 h24 h07]
     [0   0   0   0   1 h04 h12 h19   0]
     [0   0   0   0   0   1 h05 h13   0]
     [0   0   0   0   0   0   1 h06   0]
     [0   0   0   0   0   0   0   1   0]
     [0   0   0   0   0   0   0   0   1]]
    {:cell-size 8}))
(defn utm8->hash8
  "Convert a 9x9 upper triangular matrix back to a vector of 32 bytes."
  [[[_ h00 h08 h15 h21 h26 h29 h31 h25]
    [_   _ h01 h09 h16 h22 h27 h30 h20]
    [_   _   _ h02 h10 h17 h23 h28 h14]
    [_   _   _   _ h03 h11 h18 h24 h07]
    [_   _   _   _   _ h04 h12 h19   _]
    [_   _   _   _   _   _ h05 h13   _]
    [_   _   _   _   _   _   _ h06   _]
    [_   _   _   _   _   _   _   _   _]
    [_   _   _   _   _   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFF)
          [h00 h01 h02 h03 h04 h05 h06 h07
           h08 h09 h10 h11 h12 h13 h14 h15
           h16 h17 h18 h19 h20 h21 h22 h23
           h24 h25 h26 h27 h28 h29 h30 h31])
    {:cell-size 8}))
(defn hash16->utm16
  "Convert a vector of 16 2-byte words into a 7x7 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07
    h08 h09 h10 h11 h12 h13 h14 h15]]
  (with-meta
    [[1 h00 h06 h10 h13 h15 h05]
     [0   1 h01 h07 h11 h14   0]
     [0   0   1 h02 h08 h12   0]
     [0   0   0   1 h03 h09   0]
     [0   0   0   0   1 h04   0]
     [0   0   0   0   0   1   0]
     [0   0   0   0   0   0   1]]
    {:cell-size 16}))
(defn utm16->hash16
  "Convert a 7x7 upper triangular matrix back to a vector of 16 2-byte words."
  [[[_ h00 h06 h10 h13 h15 h05]
    [_   _ h01 h07 h11 h14   _]
    [_   _   _ h02 h08 h12   _]
    [_   _   _   _ h03 h09   _]
    [_   _   _   _   _ h04   _]
    [_   _   _   _   _   _   _]
    [_   _   _   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFFFF)
          [h00 h01 h02 h03 h04 h05 h06 h07
           h08 h09 h10 h11 h12 h13 h14 h15])
    {:cell-size 16}))
(defn hash32->utm32
  "Convert a vector of 8 4-byte words into a 5x5 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07]]
  (with-meta
    [[1 h00 h04 h07 h06]
     [0   1 h01 h05 h03]
     [0   0   1 h02   0]
     [0   0   0   1   0]
     [0   0   0   0   1]]
    {:cell-size 32}))
(defn utm32->hash32
  "Convert a 5x5 upper triangular matrix back to a vector of 8 4-byte words."
  [[[_ h00 h04 h07 h06]
    [_   _ h01 h05 h03]
    [_   _   _ h02   _]
    [_   _   _   _   _]
    [_   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFFFFFFFF)
          [h00 h01 h02 h03 h04 h05 h06 h07])
    {:cell-size 32}))
(defn hash64->utm64
  "Convert a vector of 4 8-byte words into a 4x4 upper triangular matrix."
  [[h00 h01 h02 h03]]
  (with-meta
    [[1 h00 h03 h02]
     [0   1 h01   0]
     [0   0   1   0]
     [0   0   0   1]]
    {:cell-size 64}))
(defn utm64->hash64
  "Convert a 4x4 upper triangular matrix back to a vector of 4 8-byte words."
  [[[_ h00 h03 h02]
    [_   _ h01   _]
    [_   _   _   _]
    [_   _   _   _]]]
  (with-meta
    [h00 h01 h02 h03]
    {:cell-size 64}))

Combined Conversion Functions

The following combined conversion functions convert between hex strings into upper triangular matrices and back for the four different cell bit sizes.

(def hex->utm8
  "Convert a hex string to an upper triangular matrix with 8-bit cells."
  (comp hash8->utm8 hex->hash8))
(def utm8->hex
  "Convert an upper triangular matrix with 8-bit cells to a hex string."
  (comp hash8->hex utm8->hash8))
(= a-hex (-> a-hex hex->utm8 utm8->hex))
true
(def hex->utm16
  "Convert a hex string to an upper triangular matrix with 16-bit cells."
  (comp hash16->utm16 hash8->hash16 hex->hash8))
(def utm16->hex
  "Convert an upper triangular matrix with 16-bit cells to a hex string."
  (comp hash8->hex hash16->hash8 utm16->hash16))
(= a-hex (-> a-hex hex->utm16 utm16->hex))
true
(def hex->utm32
  "Convert a hex string to an upper triangular matrix with 32-bit cells."
  (comp hash32->utm32 hash8->hash32 hex->hash8))
(def utm32->hex
  "Convert an upper triangular matrix with 32-bit cells to a hex string."
  (comp hash8->hex hash32->hash8 utm32->hash32))
(= a-hex (-> a-hex hex->utm32 utm32->hex))
true
(def hex->utm64
  "Convert a hex string to an upper triangular matrix with 64-bit cells."
  (comp hash64->utm64 hash8->hash64 hex->hash8))
(def utm64->hex
  "Convert an upper triangular matrix with 64-bit cells to a hex string."
  (comp hash8->hex hash64->hash8 utm64->hash64))
(= a-hex (-> a-hex hex->utm64 utm64->hex))
true
(defn hex->utm
  "Convert a hex string to an upper triangular matrix with the given cell size
  (8, 16, 32, or 64 bits)."
  [hex cell-size]
  (case cell-size
    8  (hex->utm8 hex)
    16 (hex->utm16 hex)
    32 (hex->utm32 hex)
    64 (hex->utm64 hex)
    (throw (ex-info "Unsupported cell size for upper triangular matrix."
                    {:cell-size cell-size}))))
(defn utm->hex
  "Convert an upper triangular matrix with the given cell size (8, 16, 32,
  or 64 bits) to a hex string."
  [utm cell-size]
  (case cell-size
    8  (utm8->hex utm)
    16 (utm16->hex utm)
    32 (utm32->hex utm)
    64 (utm64->hex utm)
    (throw (ex-info "Unsupported cell size for upper triangular matrix."
                    {:cell-size cell-size}))))
(= a-hex (-> a-hex (hex->utm 32) (utm->hex 32)))
true

Upper Triangular Matrix Multiplication

This is the core function that performs the multiplication of two upper triangular matrices. The multiplication ignores the lower triangular part of of the matrices since they are always 0 (and always 1 on the main diagonal).

Note that unchecked math is enabled to ignore integer overflow since the cells are treated as fixed size bit fields.

(set! *unchecked-math* true)
true
(defn utm-multiply
  "Multiply two upper triangular matrices `a` and `b`."
  [a b]
  (let [dim (count a)
        cell-size (-> a meta :cell-size)
        bit-mask (if (= cell-size 64)
                   -1
                   (dec (bit-shift-left 1 cell-size)))]
    (with-meta
      (vec (for [i (range dim)]
             (vec (for [j (range dim)]
                    (cond
                      (< j i) 0
                      (= j i) 1
                      :else
                      (reduce (fn [sum k]
                                (-> sum
                                    (+ (* (get-in a [i k])
                                          (get-in b [k j])))
                                    (bit-and bit-mask)))
                              0
                              (range i (inc j))))))))
      {:cell-size cell-size})))

Associativity & Non-Commutativity Properties

Show that upper triangular matrix multiplication is associative and non-commutative. Associativity is necessary for hash fusing to work with Finger Trees so that different tree shapes produce the same fused hash. Non-commutativity is necessary for sequences of data where the order of data affects the fused hash.

(-> (for [cell-size [8 16 32 64]]
      (let [a (hex->utm a-hex cell-size)
            b (hex->utm b-hex cell-size)
            c (hex->utm c-hex cell-size)
            ab (utm-multiply a b)
            ba (utm-multiply b a)
            bc (utm-multiply b c)
            ab*c (utm-multiply ab c)
            a*bc (utm-multiply a bc)]
        {:cell-size cell-size
         :associative? (= ab*c a*bc)
         :commutative? (= ab ba)}))
    (kind/table))
cell-size associative? commutative?
8 true false
16 true false
32 true false
64 true false

Experiment 1: Random Fuses

This experiment is run with two different hashes which are fused onto an accumulator iteratively. For each iteration, one of the two hashes is randomly selected and is fused to the accumulator. This random fusing is repeated many times and the quality of the accumulator is measured both by keeping track of global uniqueness after each fuse and by the uniform distribution of bit values.

(defn random-fuses
  "Perform random fuses of two hashes onto an accumulator keeping track of all
  unique hashes produced and all duplicate hashes produced. Repeat this for all
  four utm sizes based on cell bit size."
  []
  (let [hashes {:a (random-hex 32)
                :b (random-hex 32)}
        ;; convert hashes to utms for each bit size and store in a map
        utms {8  (update-vals hashes hex->utm8)
              16 (update-vals hashes hex->utm16)
              32 (update-vals hashes hex->utm32)
              64 (update-vals hashes hex->utm64)}
        results {8  {:acc (hex->utm8 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 16 {:acc (hex->utm16 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 32 {:acc (hex->utm32 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 64 {:acc (hex->utm64 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}}
        results (reduce
                 (fn [results _]
                   (reduce
                    (fn [results cell-size]
                      (let [curr-acc (get-in results [cell-size :acc])
                            ;; randomly select one of the two hashes
                            selected-hash (if (< (rand) 0.5)
                                            (get-in utms [cell-size :a])
                                            (get-in utms [cell-size :b]))
                            ;; fuse the selected hash onto the accumulator
                            new-acc (utm-multiply curr-acc selected-hash)]
                        ;; update results with new accumulator and uniqueness info
                        (if (contains? (get-in results [cell-size :uniques]) new-acc)
                          (update-in results [cell-size :duplicates] conj new-acc)
                          (-> results
                              (assoc-in [cell-size :acc] new-acc)
                              (update-in [cell-size :uniques] conj new-acc)))))
                    results
                    [8 16 32 64]))
                 results
                 (range 10000))]
    ;; convert final results to totals of unique and duplicate hashes
    (->> results
         (map (fn [[cell-size {:keys [uniques duplicates]}]]
                {:cell-size   cell-size
                 :uniques     (count uniques)
                 :duplicates (count duplicates)}))
         (kind/table))))

Random Fuses Results

(random-fuses)
cell-size uniques duplicates
8 10000 0
16 10000 0
32 10000 0
64 10000 0

These results show that high entropy data can be well represented by fusing hashes together. All four bit sizes for the cells of the upper triangular matrices performed well with high entroy data. I have rerun this experiment with millions of fuses and have never observed a duplicate hash being produced. Since 64 bit cells fit in the smallest matrix, this is the fasted to compute.

Experiment 2: Folded Fuses

This experiment is run with a hash fused together with itself (i.e. folding) and then the result in turn fused with itself and so on. This folding is repeated many times and the quality of the accumulator is measured both by keeping track of global uniqueness after each fuse and by the uniform distribution of bit values. Once the accumulator becomes zero the folding stops and the number of folds to reach this state is recorded. Also, the number of lower bits that are zero across all cells is recorded after each fold.

(defn calc-zero-lower-bits
  "Calculate the number of lower bits that are zero across all upper cells in
  the upper triangular matrix."
  [utm]
  (let [cell-size (-> utm meta :cell-size)]
    (loop [mask 1 zero-lower-bits 0]
      (if (and (< zero-lower-bits cell-size)
               (->> (for [i (range (count utm))
                          j (range (inc i) (count utm))]
                      (get-in utm [i j]))
                    (map #(bit-and mask %))
                    (every? zero?)))
        (recur (bit-shift-left mask 1)
               (inc zero-lower-bits))
        zero-lower-bits))))
(defn folded-fuses
  "Perform folded fuses starting from the fixed hash `a-hex`, converted to an
  upper triangular matrix (UTM) for each cell size. After each fold, track how
  many lower bits of each cell are zero across all upper cells using
  `calc-zero-lower-bits`. Stop when all those lower bits are zero (i.e.,
  `zero-lower-bits` equals the cell size) or when 1000 folds have been
  performed. Report, for each cell size and fold, the fold count, the number
  of zero lower bits, and the resulting hash."
  []
  (let [;; convert hashes to utms for each bit size and store in a map
        results (reduce
                 (fn [results cell-size]
                   (let [fold (hex->utm a-hex cell-size)
                         zero-lower-bits (calc-zero-lower-bits fold)
                         fold-result
                         (loop [folds [{:fold fold
                                        :zero-lower-bits zero-lower-bits}]]
                           (let [{:keys [fold zero-lower-bits]} (last folds)]
                             (if (or (= zero-lower-bits cell-size)
                                     (>= (count folds) 1000))
                               folds
                               (let [fold (utm-multiply fold fold)
                                     zero-lower-bits (calc-zero-lower-bits fold)]
                                 (recur (conj folds
                                              {:fold fold
                                               :zero-lower-bits zero-lower-bits}))))))]
                     (assoc results cell-size fold-result)))
                 {}
                 [8 16 32 64])]
    ;; format results into a table
    (->> results
         (mapcat (fn [[cell-size folds]]
                   (map-indexed (fn [index {:keys [fold zero-lower-bits]}]
                                  {:cell-size      cell-size
                                   :fold-count     index
                                   :zero-lower-bits zero-lower-bits
                                   :fold (-> [:pre (utm->hex fold cell-size)]
                                             (kind/hiccup))})

                                folds)))
         (kind/table))))

Folded Fuses Results

(folded-fuses)
cell-size fold-count zero-lower-bits fold
8 0 0
7cf5c332ae1eb7b3b907cf3ccd88a8b0f0300d0a4d6e12c0874e25293b868917
8 1 0
f8ea86645c3c6e661eadb474fe82a9af59c62c0f4713d8ca2d60b6b9210479bb
8 2 1
f0d40cc8b878dcccecd6c0d88cccb6aacef4c0faf6d20c9c061000c0b482ea90
8 3 2
e0a8189070f0b898989ce07058b8fc04cc0820e42c94c8987c009078b02ca438
8 4 3
c0503020e0e0703030f840e0b0f038c85890c08858e850b0b88060d080f80850
8 5 4
80a06040c0c0e06060f080c060e07090b0208010b0d0a0607000c02080701020
8 6 5
0040c0808080c0c0c0e00080c0c0e0206040002060a040c0e000804000e02040
8 7 6
008080000000808080c000008080c040c0800040c0408080c000008000c04080
8 8 7
0000000000000000008000000000808080000080808000008000000000808000
8 9 8
0000000000000000000000000000000000000000000000000000000000000000
16 0 0
7cf5c332ae1eb7b3b907cf3ccd88a8b0f0300d0a4d6e12c0874e25293b868917
16 1 0
f9ea86645c3c6f66720e9e78a1ea2d3c115a7af9d13c61f0a3180b7c18688a47
16 2 1
f3d40cc8b878decce41c3cf05f3cc9e8e69c79866228b0f09b78e48448acabb2
16 3 2
e7a8199070f0bd98c83879e02c185190dcd8015cf490f0a0a6d0f83818c89954
16 4 3
cf503320e1e07b309070f3c00eb09a20f8303bf83620f040e3209330d950d168
16 5 4
9ea06640c3c0f66020e0e780f7601040ea605cf00040bc80cc40c16041a055d0
16 6 5
3d40cc808780ecc041c0cf0056c09080bcc04de05080e90030806ec03f4037a0
16 7 6
7a8099000f00d98083809e004d80e1001980ebc0e1009200c1008d806e809f40
16 8 7
f50032001e00b30007003c001b00c200b3001780c20024000200db009d00fe80
16 9 8
ea0064003c0066000e0078003600840066002f00840048000400b6003a00fd00
16 10 9
d400c8007800cc001c00f0006c000800cc005e000800900008006c007400fa00
16 11 10
a8009000f00098003800e000d80010009800bc00100020001000d800e800f400
16 12 11
50002000e00030007000c000b000200030007800200040002000b000d000e800
16 13 12
a0004000c0006000e0008000600040006000f0004000800040006000a000d000
16 14 13
400080008000c000c0000000c0008000c000e000800000008000c0004000a000
16 15 14
800000000000800080000000800000008000c000000000000000800080004000
16 16 15
0000000000000000000000000000000000008000000000000000000000008000
16 17 16
0000000000000000000000000000000000000000000000000000000000000000
32 0 0
7cf5c332ae1eb7b3b907cf3ccd88a8b0f0300d0a4d6e12c0874e25293b868917
32 1 1
f9eb86645c3d6f66720f9e789b115160ca9c540aa17df074fc3b4cb21ba92006
32 2 2
f3d70cc8b87adecce41f3cf03622a2c03e298fec5d830cb8aef2a2e49c98760c
32 3 3
e7ae199070f5bd98c83e79e06c4545802016bf382522c8b037d56bc864f9b918
32 4 4
cf5c3320e1eb7b30907cf3c0d88a8b00cf3bfbf0f2b84e60d76b6f902e964e30
32 5 5
9eb86640c3d6f66020f9e780b1151600dab1ede0873b90c04dd93f209bb54c60
32 6 6
3d70cc8087adecc041f3cf00622a2c00a64bb3c095a2f18017bbfe40917758c0
32 7 7
7ae199000f5bd98083e79e00c45458001036c78047f523001f9dfc808a71b180
32 8 8
f5c332001eb7b30007cf3c0088a8b0002eeb0f0002a74600ffd3f900ad6f6300
32 9 9
eb8664003d6f66000f9e78001151600097cc1e00d0428c000207f200910ec600
32 10 10
d70cc8007adecc001f3cf00022a2c00017703c00cc5518000d8fe4009add8c00
32 11 11
ae199000f5bd98003e79e00045458000ce40780047ea3000411fc80018bb1800
32 12 12
5c332000eb7b30007cf3c0008a8b00001a00f0004cd460001a3f9000bd763000
32 13 13
b8664000d6f66000f9e78000151600002a01e0008da8c000947f2000aaec6000
32 14 14
70cc8000adecc000f3cf00002a2c00002c03c000eb518000a8fe400015d8c000
32 15 15
e19900005bd98000e79e000054580000b807800016a3000051fc80002bb18000
32 16 16
c3320000b7b30000cf3c0000a8b00000f00f00002d460000a3f9000057630000
32 17 17
866400006f6600009e78000051600000e01e00005a8c000047f20000aec60000
32 18 18
0cc80000decc00003cf00000a2c00000c03c0000b51800008fe400005d8c0000
32 19 19
19900000bd98000079e0000045800000807800006a3000001fc80000bb180000
32 20 20
332000007b300000f3c000008b00000000f00000d46000003f90000076300000
32 21 21
66400000f6600000e78000001600000001e00000a8c000007f200000ec600000
32 22 22
cc800000ecc00000cf0000002c00000003c0000051800000fe400000d8c00000
32 23 23
99000000d98000009e0000005800000007800000a3000000fc800000b1800000
32 24 24
32000000b30000003c000000b00000000f00000046000000f900000063000000
32 25 25
640000006600000078000000600000001e0000008c000000f2000000c6000000
32 26 26
c8000000cc000000f0000000c00000003c00000018000000e40000008c000000
32 27 27
9000000098000000e0000000800000007800000030000000c800000018000000
32 28 28
2000000030000000c000000000000000f0000000600000009000000030000000
32 29 29
40000000600000008000000000000000e0000000c00000002000000060000000
32 30 30
80000000c00000000000000000000000c00000008000000040000000c0000000
32 31 31
0000000080000000000000000000000080000000000000008000000080000000
32 32 32
0000000000000000000000000000000000000000000000000000000000000000
64 0 0
7cf5c332ae1eb7b3b907cf3ccd88a8b0f0300d0a4d6e12c0874e25293b868917
64 1 1
f9eb86655c3d6f66720f9e799b115160e0601a149adc25808ea8a3ce42d0d53e
64 2 2
f3d70ccab87adecce41f3cf33622a2c0c0c0342935b84b001d82ad8bb4b0b6bc
64 3 3
e7ae199570f5bd98c83e79e66c454580818068526b7096003bcaf2d4259d9e78
64 4 4
cf5c332ae1eb7b30907cf3ccd88a8b000300d0a4d6e12c007aac449b3c2c00f0
64 5 5
9eb86655c3d6f66020f9e799b11516000601a149adc2580001b205023c1b11e0
64 6 6
3d70ccab87adecc041f3cf33622a2c000c0342935b84b00034c9f933874263c0
64 7 7
7ae199570f5bd98083e79e66c454580018068526b70960002f2baf234ab5c780
64 8 8
f5c332ae1eb7b30007cf3ccd88a8b000300d0a4d6e12c00074b65137862f8f00
64 9 9
eb86655c3d6f66000f9e799b11516000601a149adc25800042e86e32cf6f1e00
64 10 10
d70ccab87adecc001f3cf33622a2c000c0342935b84b0000ebc00b74ab1e3c00
64 11 11
ae199570f5bd98003e79e66c454580008068526b709600006f3cd325873c7800
64 12 12
5c332ae1eb7b30007cf3ccd88a8b000000d0a4d6e12c00003d6c973bd278f000
64 13 13
b86655c3d6f66000f9e799b11516000001a149adc2580000f6a4f23ab4f1e000
64 14 14
70ccab87adecc000f3cf33622a2c00000342935b84b00000dc78f381a9e3c000
64 15 15
e199570f5bd98000e79e66c454580000068526b70960000075ae233453c78000
64 16 16
c332ae1eb7b30000cf3ccd88a8b000000d0a4d6e12c00000de4d372ca78f0000
64 17 17
86655c3d6f6600009e799b11516000001a149adc25800000885e31694f1e0000
64 18 18
0ccab87adecc00003cf33622a2c00000342935b84b0000003fcb6f129e3c0000
64 19 19
199570f5bd98000079e66c454580000068526b70960000003bd30f253c780000
64 20 20
332ae1eb7b300000f3ccd88a8b000000d0a4d6e12c0000006896e24a78f00000
64 21 21
6655c3d6f6600000e799b11516000000a149adc25800000094f0d494f1e00000
64 22 22
ccab87adecc00000cf33622a2c00000042935b84b000000038ede929e3c00000
64 23 23
99570f5bd98000009e66c454580000008526b70960000000ae0cd253c7800000
64 24 24
32ae1eb7b30000003ccd88a8b00000000a4d6e12c00000004cdda4a78f000000
64 25 25
655c3d6f66000000799b115160000000149adc25800000005ccb494f1e000000
64 26 26
cab87adecc000000f33622a2c00000002935b84b00000000c5d6929e3c000000
64 27 27
9570f5bd98000000e66c454580000000526b709600000000bcad253c78000000
64 28 28
2ae1eb7b30000000ccd88a8b00000000a4d6e12c000000003d5a4a78f0000000
64 29 29
55c3d6f66000000099b115160000000049adc258000000008ab494f1e0000000
64 30 30
ab87adecc000000033622a2c00000000935b84b000000000556929e3c0000000
64 31 31
570f5bd98000000066c454580000000026b7096000000000aad253c780000000
64 32 32
ae1eb7b300000000cd88a8b0000000004d6e12c00000000055a4a78f00000000
64 33 33
5c3d6f66000000009b115160000000009adc258000000000ab494f1e00000000
64 34 34
b87adecc000000003622a2c00000000035b84b000000000056929e3c00000000
64 35 35
70f5bd98000000006c454580000000006b70960000000000ad253c7800000000
64 36 36
e1eb7b3000000000d88a8b0000000000d6e12c00000000005a4a78f000000000
64 37 37
c3d6f66000000000b115160000000000adc2580000000000b494f1e000000000
64 38 38
87adecc000000000622a2c00000000005b84b000000000006929e3c000000000
64 39 39
0f5bd98000000000c454580000000000b709600000000000d253c78000000000
64 40 40
1eb7b3000000000088a8b000000000006e12c00000000000a4a78f0000000000
64 41 41
3d6f6600000000001151600000000000dc25800000000000494f1e0000000000
64 42 42
7adecc000000000022a2c00000000000b84b000000000000929e3c0000000000
64 43 43
f5bd98000000000045458000000000007096000000000000253c780000000000
64 44 44
eb7b3000000000008a8b000000000000e12c0000000000004a78f00000000000
64 45 45
d6f66000000000001516000000000000c25800000000000094f1e00000000000
64 46 46
adecc000000000002a2c00000000000084b000000000000029e3c00000000000
64 47 47
5bd98000000000005458000000000000096000000000000053c7800000000000
64 48 48
b7b3000000000000a8b000000000000012c0000000000000a78f000000000000
64 49 49
6f66000000000000516000000000000025800000000000004f1e000000000000
64 50 50
decc000000000000a2c00000000000004b000000000000009e3c000000000000
64 51 51
bd98000000000000458000000000000096000000000000003c78000000000000
64 52 52
7b300000000000008b000000000000002c0000000000000078f0000000000000
64 53 53
f66000000000000016000000000000005800000000000000f1e0000000000000
64 54 54
ecc00000000000002c00000000000000b000000000000000e3c0000000000000
64 55 55
d98000000000000058000000000000006000000000000000c780000000000000
64 56 56
b300000000000000b000000000000000c0000000000000008f00000000000000
64 57 57
6600000000000000600000000000000080000000000000001e00000000000000
64 58 58
cc00000000000000c00000000000000000000000000000003c00000000000000
64 59 59
9800000000000000800000000000000000000000000000007800000000000000
64 60 60
300000000000000000000000000000000000000000000000f000000000000000
64 61 61
600000000000000000000000000000000000000000000000e000000000000000
64 62 62
c00000000000000000000000000000000000000000000000c000000000000000
64 63 63
8000000000000000000000000000000000000000000000008000000000000000
64 64 64
0000000000000000000000000000000000000000000000000000000000000000

These results show that repeated folding devolves into a zero hash after only a few folds. The number of bits in a cell nearly exactly defines the number of folds needed before reaching a zero hash. Since each fold represents a doubling of the number of fuses of the same value, folding shows that fusing can not reliably represent long runs of repeated values.

Practical Usage

Based on the experiments above, here are recommendations for using hash fusing in practice.

Detecting Low-Entropy Failures

The key insight from Experiment 2 is that low-entropy data causes the lower bits to become zero progressively. We can detect this condition and fail fast rather than silently producing degenerate hashes.

The following function fuses two hashes and raises an error if the result shows signs of low-entropy degeneration (lower 32 bits all zero):

(defn high-entropy-fuse
  "Fuse two 256-bit hashes together via upper triangular matrix multiplication
  with 64-bit cells. Raise an error when the lower 32 bits of the result are
  all zero."
  [a-hex b-hex]
  (let [a (hex->utm64 a-hex)
        b (hex->utm64 b-hex)
        ab (utm-multiply a b)]
    (if (->> (for [i (range (count a))
                   j (range (inc i) (count a))]
               (get-in ab [i j]))
             (map #(bit-and 0xFFFFFFFF %))
             (every? zero?))
      (throw (ex-info "Low entropy data detected during hash fusing."
                      {:a a-hex :b b-hex :fused (utm64->hex ab)}))
      (utm64->hex ab))))

Example of high entropy data fusing successfully:

(high-entropy-fuse a-hex b-hex)
"7f75a6c454c97529baf9c4377bff0b9a72ab439acfd59a70a767a4e7f8339398"

Example of low entropy data causing an error:

(try
  (high-entropy-fuse zero-hex zero-hex)
  (catch Exception e
         (.getMessage e)))
"Low entropy data detected during hash fusing."

Handling Low-Entropy Data

When working with data that may contain long runs of repeated values, consider these strategies:

  1. Compression: Run-length encode or compress the data before hashing. This converts low-entropy patterns into high-entropy representations.

  2. Salting: Inject position-dependent noise into the hash sequence. For example, XOR each element’s hash with a hash of its index.

  3. Chunking: Break long sequences into smaller chunks and hash each chunk separately before fusing. This limits the damage from repetition.

Conclusion

This article described a method for fusing hashes together using upper triangular matrix multiplication. The method is associative and non-commutative.

Experiment 1 showed that high entropy data can be well represented by fusing hashes together and that the size of cell fields, from 8 to 64 bits, all performed well with high entropy data. Since 64 bit cells fit in the smallest matrix, this is the fastest to compute.

Experiment 2 showed that low entropy data, specifically long runs of repeated values, can not be well represented by fusing hashes together since they all approach a zero hash. The rate of approach is determined by the bit count in the cells. 64 bit cells were the most able to handle repeated values.

The Practical Usage section provides a working implementation with low-entropy detection, along with strategies for handling problematic data.

Appendix: Why Low Entropy Data Fails — Nilpotent Group Structure

The following explanation, based on interactions with a Claude-based AI assistant, is very well written and accurate.

The low-entropy failure observed in Experiment 2 is not an accident of implementation — it is a fundamental consequence of the algebraic structure we chose. Understanding this helps clarify both the limitations and the design space for alternatives.

Upper Triangular Matrices Form a Nilpotent Group

An n×n upper triangular matrix with 1s on the diagonal can be written as I + N, where I is the identity matrix and N is strictly upper triangular (all diagonal entries are 0). The key property of strictly upper triangular matrices is that they are nilpotent: repeated multiplication eventually yields zero.

Specifically, for an n×n strictly upper triangular matrix N:

N^n = 0 (the zero matrix)

This happens because each multiplication “shifts” the non-zero entries one diagonal further from the main diagonal. After n-1 multiplications, all entries have been shifted off the matrix.

How This Affects Hash Fusing

When we fuse a hash with itself (A × A), we are computing (I + N) × (I + N):

(I + N)² = I + 2N + N²

With our bit masking over w-bit cells, arithmetic is effectively done modulo 2^w. Multiplying by 2 shifts all bits left and, after masking back to w bits, forces the least significant bit of each cell to 0. After k squarings:

(I + N)(2k) = I + 2^k · N + (higher powers of N)

The coefficient 2^k means the lowest k bits of each cell are forced to zero under mod 2^w / bitmasking. This is exactly what Experiment 2 observed: each fold (squaring) zeros out one more bit, and after cell-size folds, all bits are zero.

Why This Is Fundamental

The nilpotent structure is intrinsic to upper triangular matrices. Any associative, non-commutative composition based on this structure will exhibit the same behavior. This is not a bug — it’s a mathematical property of the group we chose.

The Trade-off

We chose upper triangular matrices because they provide:

  • Associativity: Essential for Finger Trees
  • Non-commutativity: Essential for ordered sequences
  • Efficient representation: The matrix structure maps cleanly to hash bits

The cost is nilpotency, which manifests as the low-entropy failure mode. This trade-off is acceptable for high-entropy data (which is the common case for content hashes) and can be mitigated with detection and preprocessing for edge cases.

Alternative Structures

For applications where low-entropy data is common, one could explore non-nilpotent groups such as GL(n, F_p) — the general linear group over a finite field. However, this would sacrifice the sparse matrix efficiency and require more complex encoding. The upper triangular approach remains practical for most content-addressed data structure applications.

Appendix B: Alternatives Explored

Before settling on upper triangular matrices, several alternative approaches were investigated. This section documents those explorations and explains why they were not suitable.

Quaternion Multiplication

Quaternions are a natural candidate: they form a non-commutative algebra and have efficient 4-component representations. However, experiments with quaternion-based fusing revealed the same nilpotent-like behavior observed with upper triangular matrices.

When quaternions are encoded similarly to the UTM approach — with values close to the identity element (1 + εi + εj + εk where ε represents small perturbations) — repeated self-multiplication causes the perturbation terms to degenerate. While pure unit quaternions form a proper group (isomorphic to SU(2)), the encoding required for hash values reintroduces the same fundamental problem.

Modified Fusing Operations

Various modifications to the basic fusing operation were attempted, including:

  • XOR combined with rotations
  • Nonlinear mixing functions
  • Polynomial operations over finite fields

The consistent finding was that associativity is surprisingly fragile. Most intuitive “mixing” operations that might improve entropy preservation break associativity, which is essential for the Finger Tree use case.

Matrix multiplication is special precisely because it is one of the few operations that provides both associativity and non-commutativity naturally.

Why Upper Triangular Matrices Were Chosen

Despite the low-entropy limitation, upper triangular matrices remain the best practical choice because:

  1. Guaranteed associativity: Matrix multiplication is inherently associative — this cannot be broken by implementation choices.

  2. Guaranteed non-commutativity: For n > 1, matrix multiplication is non-commutative, preserving sequence order.

  3. Efficient sparse encoding: The upper triangular structure means only n(n-1)/2 cells carry information, mapping cleanly to hash bits.

  4. Predictable failure mode: The low-entropy degeneration is detectable and follows a known pattern (one bit per fold), making it possible to implement safeguards.

The alternatives explored either broke associativity (disqualifying them for Finger Trees) or exhibited the same degeneration behavior without the other benefits of the matrix approach.

source: src/math/hashing/hashfusing.clj