Hash Fusing

Exploring Hash Fusing via Upper Triangular Matrix Multiplication.
Author
Published

January 12, 2026

Keywords

data, hash, sha256, matrix, fusing, merkle-tree

This is an article about fusing hashes together while having the key properties of associativity and non-commutativity. I describe the basic approach of using matrix multiplication of Upper Triangle Matrices and then show the results of two experiments showing the quality and limits of this hash fusing approach.

The basic question this article tries to answer is: can fusing hashes together provide unique identifiers for arbitrary data? The answer is no. Specifically, the approach in this article can not provide unique identifiers for sufficiently large runs of repeated values. That is, sufficiently low entropy data is not uniquely represented by fusing hashes together.

The good news is that low entropy data may always be converted into high entropy data either via compression or injecting noise. A practical fusing operator can easily detect fuses of low entropy data and throw an exception when fusing produces those ‘bad’ hash values.

Introduction

I have begun working on implementing distributed immutable data structures as a way to efficiently share structured data between multiple clients. One of the key challenges in this area is being able to efficiently reference ordered collections of data. The usual approach is to use Merkle Trees which have the limitation that they are non-associative and the shape of the tree determines the final hash value at the root of the tree. But, since I plan on using Finger Trees to efficiently represent ordered collections of data, I need computed hashes that are insensitive to the tree shape. This means that I need to be able to fuse hashes together in a way that is associative and also non-commutative. As a starting point, I have been reading the HP paper describing hash fusing via matrix multiplication: https://www.labs.hpe.com/techreports/2017/HPE-2017-08.pdf

Hash Fusing via Upper Triangular Matrix Multiplication

The basic approach of fusing two hashes is to represent each hash as an upper triangular matrix and then to multiply the two matrices together. Since matrix multiplication is associative but not commutative. The following sections define hashes and build up the necessary functions to perform hash fusing via upper triangular matrix multiplication.

Hash Representation

A hash is represented as a string of 64 hexadecimal values (256 bits). To start with, here is a zero hash and a function to generate random hashes.

(def zero-hex (apply str (vec (repeat 64 0))))
"0000000000000000000000000000000000000000000000000000000000000000"
(defn random-hex
  "Generate a random hex string representing length `n` bytes."
  [n]
  (let [hex-chars "0123456789abcdef"]
    (apply str (repeatedly (* 2 n) #(rand-nth hex-chars)))))

Generate some random hashes for testing:

(def a-hex (random-hex 32))
"9797493849aa264bee226ed62dbaf9516dbb47e34a5639d96e20dd1dc26437b6"
(def b-hex (random-hex 32))
"f92b6017a947c02334112b6790b31f74fd33174d2e41037ea65aefede3ed012e"
(def c-hex (random-hex 32))
"98d4ae56cb3a929ad367c2cca3a9264f21d1d4edde552ea5adaf4afabf7ddff8"

To fuse two hashes, we convert each hash to an upper triangular matrix and then multiply the two matrices together. The result is another upper triangular matrix which can be converted back to a hash by taking the elements above the main diagonal. The following several sections defines this mapping between hashes and upper triangular matrices. For the experiments below four different bit sizes of cells and four corresponding matrices are defined.

8 bit cells in a 9x9 matrix:

\[%% Example: 9x9 Upper Triangular Matrix for 8 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_{31} ] \to {\begin{bmatrix} 1 & h_0 & h_8 & h_{15} & h_{21} & h_{26} & h_{29} & h_{31} & h_{25} \\ 0 & 1 & h_1 & h_9 & h_{16} & h_{22} & h_{27} & h_{30} & h_{20} \\ 0 & 0 & 1 & h_2 & h_{10} & h_{17} & h_{23} & h_{28} & h_{14} \\ 0 & 0 & 0 & 1 & h_3 & h_{11} & h_{18} & h_{24} & h_7 \\ 0 & 0 & 0 & 0 & 1 & h_4 & h_{12} & h_{19} & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & h_5 & h_{13} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & h_6 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}} \]

16 bit cells in a 7x7 matrix:

\[%% Example: 7x7 Upper Triangular Matrix for 16 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_{15} ] \to {\begin{bmatrix} 1 & h_0 & h_6 & h_{10} & h_{13} & h_{15} & h_5 \\ 0 & 1 & h_1 & h_7 & h_{11} & h_{14} & 0 \\ 0 & 0 & 1 & h_2 & h_8 & h_{12} & 0 \\ 0 & 0 & 0 & 1 & h_3 & h_9 & 0 \\ 0 & 0 & 0 & 0 & 1 & h_4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}} \]

32 bit cells in a 5x5 matrix:

\[%% Example: 5x5 Upper Triangular Matrix for 32 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_7 ] \to {\begin{bmatrix} 1 & h_0 & h_4 & h_7 & h_6 \\ 0 & 1 & h_1 & h_5 & h_3 \\ 0 & 0 & 1 & h_2 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}} \]

64 bit cells in a 4x4 matrix:

\[%% Example: 4x4 Upper Triangular Matrix for 64 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3 ] \to {\begin{bmatrix} 1 & h_0 & h_3 & h_2 \\ 0 & 1 & h_1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}} \]

Hex and Byte Vector Conversion Functions

The first type of conversion is between hex strings and byte vectors.

(defn hex->hash8
  "Convert a hex string to a byte vector."
  [s]
  (with-meta
    (->> s
         (partition 2)
         (map #(-> (apply str %)
                   (Integer/parseInt 16)))
         (vec))
    {:cell-size 8}))
(defn hash8->hex
  "Convert a byte vector to a hex string."
  [bytes]
  (->> bytes
       (map #(format "%02x" %))
       (apply str)))
(= a-hex (-> a-hex hex->hash8 hash8->hex))
true

Hash Conversion Functions

The next type of conversion is from bytes and larger word sizes.

(defn hash8->hash16
  "Convert a vector of 32 bytes into a vector of 16 2-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 2 i)) 8)
                   (hash8 (+ 1 (* 2 i)))))
              (range 16)))
    {:cell-size 16}))
(defn hash16->hash8
  "Convert a vector 2-byte words into a vector of bytes."
  [hash16]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash16))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash16 hash16->hash8 hash8->hex))
true
(defn hash8->hash32
  "Convert a vector of 32 bytes into a vector of 8 4-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 4 i)) 24)
                   (bit-shift-left (hash8 (+ 1 (* 4 i))) 16)
                   (bit-shift-left (hash8 (+ 2 (* 4 i))) 8)
                   (hash8 (+ 3 (* 4 i)))))
              (range 8)))
    {:cell-size 32}))
(defn hash32->hash8
  "Convert a vector of 4-byte words into a vector of bytes."
  [hash32]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 24) 0xFF)
                    (bit-and (bit-shift-right word 16) 0xFF)
                    (bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash32))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash32 hash32->hash8 hash8->hex))
true
(defn hash8->hash64
  "Convert a vector of 32 bytes into a vector of 4 8-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 8 i)) 56)
                   (bit-shift-left (hash8 (+ 1 (* 8 i))) 48)
                   (bit-shift-left (hash8 (+ 2 (* 8 i))) 40)
                   (bit-shift-left (hash8 (+ 3 (* 8 i))) 32)
                   (bit-shift-left (hash8 (+ 4 (* 8 i))) 24)
                   (bit-shift-left (hash8 (+ 5 (* 8 i))) 16)
                   (bit-shift-left (hash8 (+ 6 (* 8 i))) 8)
                   (hash8 (+ 7 (* 8 i)))))
              (range 4)))
    {:cell-size 64}))
(defn hash64->hash8
  "Convert a vector of 8-byte words into a vector of bytes."
  [hash64]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 56) 0xFF)
                    (bit-and (bit-shift-right word 48) 0xFF)
                    (bit-and (bit-shift-right word 40) 0xFF)
                    (bit-and (bit-shift-right word 32) 0xFF)
                    (bit-and (bit-shift-right word 24) 0xFF)
                    (bit-and (bit-shift-right word 16) 0xFF)
                    (bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash64))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash64 hash64->hash8 hash8->hex))
true

Matrix Conversion Functions

The following functions convert between byte vectors representing hashes and four different upper triangular matrix sizes based on cell bit size.

(defn hash8->utm8
  "Convert a vector of 32 bytes into a 9x9 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07
    h08 h09 h10 h11 h12 h13 h14 h15
    h16 h17 h18 h19 h20 h21 h22 h23
    h24 h25 h26 h27 h28 h29 h30 h31]]
  (with-meta
    [[1 h00 h08 h15 h21 h26 h29 h31 h25]
     [0   1 h01 h09 h16 h22 h27 h30 h20]
     [0   0   1 h02 h10 h17 h23 h28 h14]
     [0   0   0   1 h03 h11 h18 h24 h07]
     [0   0   0   0   1 h04 h12 h19   0]
     [0   0   0   0   0   1 h05 h13   0]
     [0   0   0   0   0   0   1 h06   0]
     [0   0   0   0   0   0   0   1   0]
     [0   0   0   0   0   0   0   0   1]]
    {:cell-size 8}))
(defn utm8->hash8
  "Convert a 9x9 upper triangular matrix back to a vector of 32 bytes."
  [[[_ h00 h08 h15 h21 h26 h29 h31 h25]
    [_   _ h01 h09 h16 h22 h27 h30 h20]
    [_   _   _ h02 h10 h17 h23 h28 h14]
    [_   _   _   _ h03 h11 h18 h24 h07]
    [_   _   _   _   _ h04 h12 h19   _]
    [_   _   _   _   _   _ h05 h13   _]
    [_   _   _   _   _   _   _ h06   _]
    [_   _   _   _   _   _   _   _   _]
    [_   _   _   _   _   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFF)
          [h00 h01 h02 h03 h04 h05 h06 h07
           h08 h09 h10 h11 h12 h13 h14 h15
           h16 h17 h18 h19 h20 h21 h22 h23
           h24 h25 h26 h27 h28 h29 h30 h31])
    {:cell-size 8}))
(defn hash16->utm16
  "Convert a vector of 16 2-byte words into a 7x7 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07
    h08 h09 h10 h11 h12 h13 h14 h15]]
  (with-meta
    [[1 h00 h06 h10 h13 h15 h05]
     [0   1 h01 h07 h11 h14   0]
     [0   0   1 h02 h08 h12   0]
     [0   0   0   1 h03 h09   0]
     [0   0   0   0   1 h04   0]
     [0   0   0   0   0   1   0]
     [0   0   0   0   0   0   1]]
    {:cell-size 16}))
(defn utm16->hash16
  "Convert a 7x7 upper triangular matrix back to a vector of 16 2-byte words."
  [[[_ h00 h06 h10 h13 h15 h05]
    [_   _ h01 h07 h11 h14   _]
    [_   _   _ h02 h08 h12   _]
    [_   _   _   _ h03 h09   _]
    [_   _   _   _   _ h04   _]
    [_   _   _   _   _   _   _]
    [_   _   _   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFFFF)
          [h00 h01 h02 h03 h04 h05 h06 h07
           h08 h09 h10 h11 h12 h13 h14 h15])
    {:cell-size 16}))
(defn hash32->utm32
  "Convert a vector of 8 4-byte words into a 5x5 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07]]
  (with-meta
    [[1 h00 h04 h07 h06]
     [0   1 h01 h05 h03]
     [0   0   1 h02   0]
     [0   0   0   1   0]
     [0   0   0   0   1]]
    {:cell-size 32}))
(defn utm32->hash32
  "Convert a 5x5 upper triangular matrix back to a vector of 8 4-byte words."
  [[[_ h00 h04 h07 h06]
    [_   _ h01 h05 h03]
    [_   _   _ h02   _]
    [_   _   _   _   _]
    [_   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFFFFFFFF)
          [h00 h01 h02 h03 h04 h05 h06 h07])
    {:cell-size 32}))
(defn hash64->utm64
  "Convert a vector of 4 8-byte words into a 4x4 upper triangular matrix."
  [[h00 h01 h02 h03]]
  (with-meta
    [[1 h00 h03 h02]
     [0   1 h01   0]
     [0   0   1   0]
     [0   0   0   1]]
    {:cell-size 64}))
(defn utm64->hash64
  "Convert a 4x4 upper triangular matrix back to a vector of 4 8-byte words."
  [[[_ h00 h03 h02]
    [_   _ h01   _]
    [_   _   _   _]
    [_   _   _   _]]]
  (with-meta
    [h00 h01 h02 h03]
    {:cell-size 64}))

Combined Conversion Functions

The following combined conversion functions convert between hex strings into upper triangular matrices and back for the four different cell bit sizes.

(def hex->utm8
  "Convert a hex string to an upper triangular matrix with 8-bit cells."
  (comp hash8->utm8 hex->hash8))
(def utm8->hex
  "Convert an upper triangular matrix with 8-bit cells to a hex string."
  (comp hash8->hex utm8->hash8))
(= a-hex (-> a-hex hex->utm8 utm8->hex))
true
(def hex->utm16
  "Convert a hex string to an upper triangular matrix with 16-bit cells."
  (comp hash16->utm16 hash8->hash16 hex->hash8))
(def utm16->hex
  "Convert an upper triangular matrix with 16-bit cells to a hex string."
  (comp hash8->hex hash16->hash8 utm16->hash16))
(= a-hex (-> a-hex hex->utm16 utm16->hex))
true
(def hex->utm32
  "Convert a hex string to an upper triangular matrix with 32-bit cells."
  (comp hash32->utm32 hash8->hash32 hex->hash8))
(def utm32->hex
  "Convert an upper triangular matrix with 32-bit cells to a hex string."
  (comp hash8->hex hash32->hash8 utm32->hash32))
(= a-hex (-> a-hex hex->utm32 utm32->hex))
true
(def hex->utm64
  "Convert a hex string to an upper triangular matrix with 64-bit cells."
  (comp hash64->utm64 hash8->hash64 hex->hash8))
(def utm64->hex
  "Convert an upper triangular matrix with 64-bit cells to a hex string."
  (comp hash8->hex hash64->hash8 utm64->hash64))
(= a-hex (-> a-hex hex->utm64 utm64->hex))
true
(defn hex->utm
  "Convert a hex string to an upper triangular matrix with the given cell size
  (8, 16, 32, or 64 bits)."
  [hex cell-size]
  (case cell-size
    8  (hex->utm8 hex)
    16 (hex->utm16 hex)
    32 (hex->utm32 hex)
    64 (hex->utm64 hex)
    (throw (ex-info "Unsupported cell size for upper triangular matrix."
                    {:cell-size cell-size}))))
(defn utm->hex
  "Convert an upper triangular matrix with the given cell size (8, 16, 32,
  or 64 bits) to a hex string."
  [utm cell-size]
  (case cell-size
    8  (utm8->hex utm)
    16 (utm16->hex utm)
    32 (utm32->hex utm)
    64 (utm64->hex utm)
    (throw (ex-info "Unsupported cell size for upper triangular matrix."
                    {:cell-size cell-size}))))
(= a-hex (-> a-hex (hex->utm 32) (utm->hex 32)))
true

Upper Triangular Matrix Multiplication

This is the core function that performs the multiplication of two upper triangular matrices. The multiplication ignores the lower triangular part of of the matrices since they are always 0 (and always 1 on the main diagonal).

Note that unchecked math is enabled to ignore integer overflow since the cells are treated as fixed size bit fields.

(set! *unchecked-math* true)
true
(defn utm-multiply
  "Multiply two upper triangular matrices `a` and `b`."
  [a b]
  (let [dim (count a)
        cell-size (-> a meta :cell-size)
        bit-mask (if (= cell-size 64)
                   -1
                   (dec (bit-shift-left 1 cell-size)))]
    (with-meta
      (vec (for [i (range dim)]
             (vec (for [j (range dim)]
                    (cond
                      (< j i) 0
                      (= j i) 1
                      :else
                      (reduce (fn [sum k]
                                (-> sum
                                    (+ (* (get-in a [i k])
                                          (get-in b [k j])))
                                    (bit-and bit-mask)))
                              0
                              (range i (inc j))))))))
      {:cell-size cell-size})))

Associativity & Non-Commutativity Properties

Show that upper triangular matrix multiplication is associative and non-commutative. Associativity is necessary for hash fusing to work with Finger Trees so that different tree shapes produce the same fused hash. Non-commutativity is necessary for sequences of data where the order of data affects the fused hash.

(-> (for [cell-size [8 16 32 64]]
      (let [a (hex->utm a-hex cell-size)
            b (hex->utm b-hex cell-size)
            c (hex->utm c-hex cell-size)
            ab (utm-multiply a b)
            ba (utm-multiply b a)
            bc (utm-multiply b c)
            ab*c (utm-multiply ab c)
            a*bc (utm-multiply a bc)]
        {:cell-size cell-size
         :associative? (= ab*c a*bc)
         :commutative? (= ab ba)}))
    (kind/table))
cell-size associative? commutative?
8 true false
16 true false
32 true false
64 true false

Experiment 1: Random Fuses

This experiment is run with two different hashes which are fused onto an accumulator iteratively. For each iteration, one of the two hashes is randomly selected and is fused to the accumulator. This random fusing is repeated many times and the quality of the accumulator is measured both by keeping track of global uniqueness after each fuse and by the uniform distribution of bit values.

(defn random-fuses
  "Perform random fuses of two hashes onto an accumulator keeping track of all
  unique hashes produced and all duplicate hashes produced. Repeat this for all
  four utm sizes based on cell bit size."
  []
  (let [hashes {:a (random-hex 32)
                :b (random-hex 32)}
        ;; convert hashes to utms for each bit size and store in a map
        utms {8  (update-vals hashes hex->utm8)
              16 (update-vals hashes hex->utm16)
              32 (update-vals hashes hex->utm32)
              64 (update-vals hashes hex->utm64)}
        results {8  {:acc (hex->utm8 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 16 {:acc (hex->utm16 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 32 {:acc (hex->utm32 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 64 {:acc (hex->utm64 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}}
        results (reduce
                 (fn [results _]
                   (reduce
                    (fn [results cell-size]
                      (let [curr-acc (get-in results [cell-size :acc])
                            ;; randomly select one of the two hashes
                            selected-hash (if (< (rand) 0.5)
                                            (get-in utms [cell-size :a])
                                            (get-in utms [cell-size :b]))
                            ;; fuse the selected hash onto the accumulator
                            new-acc (utm-multiply curr-acc selected-hash)]
                        ;; update results with new accumulator and uniqueness info
                        (if (contains? (get-in results [cell-size :uniques]) new-acc)
                          (update-in results [cell-size :duplicates] conj new-acc)
                          (-> results
                              (assoc-in [cell-size :acc] new-acc)
                              (update-in [cell-size :uniques] conj new-acc)))))
                    results
                    [8 16 32 64]))
                 results
                 (range 10000))]
    ;; convert final results to totals of unique and duplicate hashes
    (->> results
         (map (fn [[cell-size {:keys [uniques duplicates]}]]
                {:cell-size   cell-size
                 :uniques     (count uniques)
                 :duplicates (count duplicates)}))
         (kind/table))))

Random Fuses Results

(random-fuses)
cell-size uniques duplicates
8 10000 0
16 10000 0
32 10000 0
64 10000 0

These results show that high entropy data can be well represented by fusing hashes together. All four bit sizes for the cells of the upper triangular matrices performed well with high entroy data. I have rerun this experiment with millions of fuses and have never observed a duplicate hash being produced. Since 64 bit cells fit in the smallest matrix, this is the fasted to compute.

Experiment 2: Folded Fuses

This experiment is run with a hash fused together with itself (i.e. folding) and then the result in turn fused with itself and so on. This folding is repeated many times and the quality of the accumulator is measured both by keeping track of global uniqueness after each fuse and by the uniform distribution of bit values. Once the accumulator becomes zero the folding stops and the number of folds to reach this state is recorded. Also, the number of lower bits that are zero across all cells is recorded after each fold.

(defn calc-zero-lower-bits
  "Calculate the number of lower bits that are zero across all upper cells in
  the upper triangular matrix."
  [utm]
  (let [cell-size (-> utm meta :cell-size)]
    (loop [mask 1 zero-lower-bits 0]
      (if (and (< zero-lower-bits cell-size)
               (->> (for [i (range (count utm))
                          j (range (inc i) (count utm))]
                      (get-in utm [i j]))
                    (map #(bit-and mask %))
                    (every? zero?)))
        (recur (bit-shift-left mask 1)
               (inc zero-lower-bits))
        zero-lower-bits))))
(defn folded-fuses
  "Perform folded fuses starting from the fixed hash `a-hex`, converted to an
  upper triangular matrix (UTM) for each cell size. After each fold, track how
  many lower bits of each cell are zero across all upper cells using
  `calc-zero-lower-bits`. Stop when all those lower bits are zero (i.e.,
  `zero-lower-bits` equals the cell size) or when 1000 folds have been
  performed. Report, for each cell size and fold, the fold count, the number
  of zero lower bits, and the resulting hash."
  []
  (let [;; convert hashes to utms for each bit size and store in a map
        results (reduce
                 (fn [results cell-size]
                   (let [fold (hex->utm a-hex cell-size)
                         zero-lower-bits (calc-zero-lower-bits fold)
                         fold-result
                         (loop [folds [{:fold fold
                                        :zero-lower-bits zero-lower-bits}]]
                           (let [{:keys [fold zero-lower-bits]} (last folds)]
                             (if (or (= zero-lower-bits cell-size)
                                     (>= (count folds) 1000))
                               folds
                               (let [fold (utm-multiply fold fold)
                                     zero-lower-bits (calc-zero-lower-bits fold)]
                                 (recur (conj folds
                                              {:fold fold
                                               :zero-lower-bits zero-lower-bits}))))))]
                     (assoc results cell-size fold-result)))
                 {}
                 [8 16 32 64])]
    ;; format results into a table
    (->> results
         (mapcat (fn [[cell-size folds]]
                   (map-indexed (fn [index {:keys [fold zero-lower-bits]}]
                                  {:cell-size      cell-size
                                   :fold-count     index
                                   :zero-lower-bits zero-lower-bits
                                   :fold (-> [:pre (utm->hex fold cell-size)]
                                             (kind/hiccup))})

                                folds)))
         (kind/table))))

Folded Fuses Results

(folded-fuses)
cell-size fold-count zero-lower-bits fold
8 0 0
9797493849aa264bee226ed62dbaf9516dbb47e34a5639d96e20dd1dc26437b6
8 1 0
2e2e927092544c96ed53d4a4d4b0558e2cda827e69f340758a1f6faa80f8d7b3
8 2 0
5c5c24e024a8982c1ee2882890503630c024944cba32f0268c01bef0a8918c9e
8 3 1
b8b848c0485030584cb490d0c0609cd020086858b454203cf86e7cb0306690cc
8 4 2
7070908090a060b0d82820a000c0f860c010d0b068684038700cf8a0e05c0058
8 5 3
e0e020002040c060b05040400080f0c08020a060d0d08070e0d8f040c0f880b0
8 6 4
c0c04000408080c060a080800000e080004040c0a0a000e0c0b0e08080f00060
8 7 5
8080800080000080c04000000000c00000808080404000c08060c00000e000c0
8 8 6
00000000000000008080000000008000000000008080008000c0800000c00080
8 9 7
0000000000000000000000000000000000000000000000000080000000800000
8 10 8
0000000000000000000000000000000000000000000000000000000000000000
16 0 0
9797493849aa264bee226ed62dbaf9516dbb47e34a5639d96e20dd1dc26437b6
16 1 0
2f2e927093544c96dc44ddac937c89d2ac445fbcd1f7d855c0d4fc49b26dcfc4
16 2 1
5e5c24e026a8992cb888bb58071870649bc0ff508e5aef7611684a8250666e96
16 3 2
bcb849c04d503258711076b08eb053c84460fe0070643c1c4d5012e43dbcd6c4
16 4 3
797093809aa064b0e220ed601f607390bc40f9807f88fcf8a0a05648d538ade8
16 5 4
f2e027003540c960c440dac046c017204680e900fa108cf03940b69041701d50
16 6 5
e5c04e006a8092c08880b580ad80ee40c500aa00e02065e05280d5205ee040a0
16 7 6
cb809c00d500258011006b00db00dc806a00b4007040fbc025004a402dc09940
16 8 7
97003800aa004b002200d600b600b9005400e800a080b7804a0014801b809280
16 9 8
2e007000540096004400ac006c007200a800d00041006f00940029003700a500
16 10 9
5c00e000a8002c0088005800d800e4005000a0008200de00280052006e004a00
16 11 10
b800c000500058001000b000b000c800a00040000400bc005000a400dc009400
16 12 11
70008000a000b00020006000600090004000800008007800a0004800b8002800
16 13 12
e0000000400060004000c000c0002000800000001000f0004000900070005000
16 14 13
c00000008000c0008000800080004000000000002000e00080002000e000a000
16 15 14
80000000000080000000000000008000000000004000c00000004000c0004000
16 16 15
0000000000000000000000000000000000000000800080000000800080008000
16 17 16
0000000000000000000000000000000000000000000000000000000000000000
32 0 0
9797493849aa264bee226ed62dbaf9516dbb47e34a5639d96e20dd1dc26437b6
32 1 1
2f2e927093544c96dc44ddac5b75f2a250d7532e1552b06404075cf245ba99a6
32 2 2
5e5d24e026a8992cb889bb58b6ebe5447731b3fc2d3e5390a72544c4e9e5fff4
32 3 3
bcba49c04d513258711376b06dd7ca88446f9e7864e07240caa4b50822d05088
32 4 4
797493809aa264b0e226ed60dbaf9510e11016f0f350110086b218102bbad390
32 5 5
f2e927003544c960c44ddac0b75f2a2022e395e08cdcd400d306e8204025f120
32 6 6
e5d24e006a8992c0889bb5806ebe5440c8d4cbc0b2ac7000bc98b040a5490a40
32 7 7
cba49c00d513258011376b00dd7ca8809de01780c9240000d35ce080f066b480
32 8 8
97493800aa264b00226ed600baf951006c9a2f00217480000f67c100071fe900
32 9 9
2e927000544c960044ddac0075f2a2009c9c5e007f9b0000c18782001f89d200
32 10 10
5d24e000a8992c0089bb5800ebe5440046d8bc00f1fe00000def0400443ba400
32 11 11
ba49c000513258001376b000d7ca8800c4317800af1c0000475e08009d174800
32 12 12
74938000a264b00026ed6000af9510006262f0008ab800003cbc10008cae9000
32 13 13
e927000044c960004ddac0005f2a20002cc5e000c770000031782000635d2000
32 14 14
d24e00008992c0009bb58000be544000f98bc00056e0000042f04000eeba4000
32 15 15
a49c000013258000376b00007ca8800073178000cdc0000005e080007d748000
32 16 16
49380000264b00006ed60000f9510000e62f00001b8000000bc100007ae90000
32 17 17
927000004c960000ddac0000f2a20000cc5e00003700000017820000f5d20000
32 18 18
24e00000992c0000bb580000e544000098bc00006e0000002f040000eba40000
32 19 19
49c000003258000076b00000ca88000031780000dc0000005e080000d7480000
32 20 20
9380000064b00000ed6000009510000062f00000b8000000bc100000ae900000
32 21 21
27000000c9600000dac000002a200000c5e0000070000000782000005d200000
32 22 22
4e00000092c00000b5800000544000008bc00000e0000000f0400000ba400000
32 23 23
9c000000258000006b000000a880000017800000c0000000e080000074800000
32 24 24
380000004b000000d6000000510000002f00000080000000c1000000e9000000
32 25 25
7000000096000000ac000000a20000005e0000000000000082000000d2000000
32 26 26
e00000002c0000005800000044000000bc0000000000000004000000a4000000
32 27 27
c000000058000000b00000008800000078000000000000000800000048000000
32 28 28
80000000b00000006000000010000000f0000000000000001000000090000000
32 29 29
0000000060000000c000000020000000e0000000000000002000000020000000
32 30 30
00000000c00000008000000040000000c0000000000000004000000040000000
32 31 31
0000000080000000000000008000000080000000000000008000000080000000
32 32 32
0000000000000000000000000000000000000000000000000000000000000000
64 0 0
9797493849aa264bee226ed62dbaf9516dbb47e34a5639d96e20dd1dc26437b6
64 1 0
2f2e927093544c96dc44ddac5b75f2a2db768fc694ac73b291d64fbe545b8027
64 2 1
5e5d24e126a8992cb889bb58b6ebe544b6ed1f8d2958e764f9fef587e703433a
64 3 2
bcba49c24d513258711376b16dd7ca886dda3f1a52b1cec84d47433cc7379224
64 4 3
797493849aa264b0e226ed62dbaf9510dbb47e34a5639d90ffb3e72d73335308
64 5 4
f2e927093544c960c44ddac5b75f2a20b768fc694ac73b2093fd512a79776110
64 6 5
e5d24e126a8992c0889bb58b6ebe54406ed1f8d2958e76407a50ad933f31ae20
64 7 6
cba49c24d513258011376b16dd7ca880dda3f1a52b1cec803df9881faf6f0c40
64 8 7
97493849aa264b00226ed62dbaf95100bb47e34a5639d900a153c424230cd880
64 9 8
2e927093544c960044ddac5b75f2a200768fc694ac73b200d82a57db56d4b100
64 10 9
5d24e126a8992c0089bb58b6ebe54400ed1f8d2958e76400065fee02f0956200
64 11 10
ba49c24d513258001376b16dd7ca8800da3f1a52b1cec80064ecd536ecdac400
64 12 11
7493849aa264b00026ed62dbaf951000b47e34a5639d90002a8d8f3208758800
64 13 12
e927093544c960004ddac5b75f2a200068fc694ac73b2000d7eab174cbeb1000
64 14 13
d24e126a8992c0009bb58b6ebe544000d1f8d2958e764000bb13af2c83d62000
64 15 14
a49c24d513258000376b16dd7ca88000a3f1a52b1cec8000a3208f64b7ac4000
64 16 15
493849aa264b00006ed62dbaf951000047e34a5639d90000fa25e2f82f588000
64 17 16
927093544c960000ddac5b75f2a200008fc694ac73b20000c3ded6ab5eb10000
64 18 17
24e126a8992c0000bb58b6ebe54400001f8d2958e7640000c609f042bd620000
64 19 18
49c24d513258000076b16dd7ca8800003f1a52b1cec800008544ec357ac40000
64 20 19
93849aa264b00000ed62dbaf951000007e34a5639d900000ef4e072af5880000
64 21 20
27093544c9600000dac5b75f2a200000fc694ac73b20000071acc955eb100000
64 22 21
4e126a8992c00000b58b6ebe54400000f8d2958e764000002f9c7eabd6200000
64 23 22
9c24d513258000006b16dd7ca8800000f1a52b1cec8000009044ad57ac400000
64 24 23
3849aa264b000000d62dbaf951000000e34a5639d9000000e4b81aaf58800000
64 25 24
7093544c96000000ac5b75f2a2000000c694ac73b2000000da2b355eb1000000
64 26 25
e126a8992c00000058b6ebe5440000008d2958e764000000f7426abd62000000
64 27 26
c24d513258000000b16dd7ca880000001a52b1cec8000000fa34d57ac4000000
64 28 27
849aa264b000000062dbaf951000000034a5639d900000002329aaf588000000
64 29 28
093544c960000000c5b75f2a20000000694ac73b20000000015355eb10000000
64 30 29
126a8992c00000008b6ebe5440000000d2958e7640000000eea6abd620000000
64 31 30
24d513258000000016dd7ca880000000a52b1cec800000008d4d57ac40000000
64 32 31
49aa264b000000002dbaf951000000004a5639d900000000da9aaf5880000000
64 33 32
93544c96000000005b75f2a20000000094ac73b200000000b5355eb100000000
64 34 33
26a8992c00000000b6ebe544000000002958e764000000006a6abd6200000000
64 35 34
4d513258000000006dd7ca880000000052b1cec800000000d4d57ac400000000
64 36 35
9aa264b000000000dbaf951000000000a5639d9000000000a9aaf58800000000
64 37 36
3544c96000000000b75f2a20000000004ac73b20000000005355eb1000000000
64 38 37
6a8992c0000000006ebe544000000000958e764000000000a6abd62000000000
64 39 38
d513258000000000dd7ca880000000002b1cec80000000004d57ac4000000000
64 40 39
aa264b0000000000baf95100000000005639d900000000009aaf588000000000
64 41 40
544c96000000000075f2a20000000000ac73b20000000000355eb10000000000
64 42 41
a8992c0000000000ebe544000000000058e76400000000006abd620000000000
64 43 42
5132580000000000d7ca880000000000b1cec80000000000d57ac40000000000
64 44 43
a264b00000000000af95100000000000639d900000000000aaf5880000000000
64 45 44
44c96000000000005f2a200000000000c73b20000000000055eb100000000000
64 46 45
8992c00000000000be544000000000008e76400000000000abd6200000000000
64 47 46
13258000000000007ca88000000000001cec80000000000057ac400000000000
64 48 47
264b000000000000f95100000000000039d9000000000000af58800000000000
64 49 48
4c96000000000000f2a200000000000073b20000000000005eb1000000000000
64 50 49
992c000000000000e544000000000000e764000000000000bd62000000000000
64 51 50
3258000000000000ca88000000000000cec80000000000007ac4000000000000
64 52 51
64b000000000000095100000000000009d90000000000000f588000000000000
64 53 52
c9600000000000002a200000000000003b20000000000000eb10000000000000
64 54 53
92c000000000000054400000000000007640000000000000d620000000000000
64 55 54
2580000000000000a880000000000000ec80000000000000ac40000000000000
64 56 55
4b000000000000005100000000000000d9000000000000005880000000000000
64 57 56
9600000000000000a200000000000000b200000000000000b100000000000000
64 58 57
2c00000000000000440000000000000064000000000000006200000000000000
64 59 58
58000000000000008800000000000000c800000000000000c400000000000000
64 60 59
b000000000000000100000000000000090000000000000008800000000000000
64 61 60
6000000000000000200000000000000020000000000000001000000000000000
64 62 61
c000000000000000400000000000000040000000000000002000000000000000
64 63 62
8000000000000000800000000000000080000000000000004000000000000000
64 64 63
0000000000000000000000000000000000000000000000008000000000000000
64 65 64
0000000000000000000000000000000000000000000000000000000000000000

These results show that repeated folding devolves into a zero hash after only a few folds. The number of bits in a cell nearly exactly defines the number of folds needed before reaching a zero hash. Since each fold represents a doubling of the number of fuses of the same value, folding shows that fusing can not reliably represent long runs of repeated values.

Practical Usage

Based on the experiments above, here are recommendations for using hash fusing in practice.

Detecting Low-Entropy Failures

The key insight from Experiment 2 is that low-entropy data causes the lower bits to become zero progressively. We can detect this condition and fail fast rather than silently producing degenerate hashes.

The following function fuses two hashes and raises an error if the result shows signs of low-entropy degeneration (lower 32 bits all zero):

(defn high-entropy-fuse
  "Fuse two 256-bit hashes together via upper triangular matrix multiplication
  with 64-bit cells. Raise an error when the lower 32 bits of the result are
  all zero."
  [a-hex b-hex]
  (let [a (hex->utm64 a-hex)
        b (hex->utm64 b-hex)
        ab (utm-multiply a b)]
    (if (->> (for [i (range (count a))
                   j (range (inc i) (count a))]
               (get-in ab [i j]))
             (map #(bit-and 0xFFFFFFFF %))
             (every? zero?))
      (throw (ex-info "Low entropy data detected during hash fusing."
                      {:a a-hex :b b-hex :fused (utm64->hex ab)}))
      (utm64->hex ab))))

Example of high entropy data fusing successfully:

(high-entropy-fuse a-hex b-hex)
"90c2a94ff2f1e66e22339a3dbe6e18c56aee5f3078973d57f1910428987ea7e0"

Example of low entropy data causing an error:

(try
  (high-entropy-fuse zero-hex zero-hex)
  (catch Exception e
         (.getMessage e)))
"Low entropy data detected during hash fusing."

Handling Low-Entropy Data

When working with data that may contain long runs of repeated values, consider these strategies:

  1. Compression: Run-length encode or compress the data before hashing. This converts low-entropy patterns into high-entropy representations.

  2. Salting: Inject position-dependent noise into the hash sequence. For example, XOR each element’s hash with a hash of its index.

  3. Chunking: Break long sequences into smaller chunks and hash each chunk separately before fusing. This limits the damage from repetition.

Conclusion

This article described a method for fusing hashes together using upper triangular matrix multiplication. The method is associative and non-commutative.

Experiment 1 showed that high entropy data can be well represented by fusing hashes together and that the size of cell fields, from 8 to 64 bits, all performed well with high entropy data. Since 64 bit cells fit in the smallest matrix, this is the fastest to compute.

Experiment 2 showed that low entropy data, specifically long runs of repeated values, can not be well represented by fusing hashes together since they all approach a zero hash. The rate of approach is determined by the bit count in the cells. 64 bit cells were the most able to handle repeated values.

The Practical Usage section provides a working implementation with low-entropy detection, along with strategies for handling problematic data.

Appendix: Why Low Entropy Data Fails — Nilpotent Group Structure

The following explanation, based on interactions with a Claude-based AI assistant, is very well written and accurate.

The low-entropy failure observed in Experiment 2 is not an accident of implementation — it is a fundamental consequence of the algebraic structure we chose. Understanding this helps clarify both the limitations and the design space for alternatives.

Upper Triangular Matrices Form a Nilpotent Group

An n×n upper triangular matrix with 1s on the diagonal can be written as I + N, where I is the identity matrix and N is strictly upper triangular (all diagonal entries are 0). The key property of strictly upper triangular matrices is that they are nilpotent: repeated multiplication eventually yields zero.

Specifically, for an n×n strictly upper triangular matrix N:

N^n = 0 (the zero matrix)

This happens because each multiplication “shifts” the non-zero entries one diagonal further from the main diagonal. After n-1 multiplications, all entries have been shifted off the matrix.

How This Affects Hash Fusing

When we fuse a hash with itself (A × A), we are computing (I + N) × (I + N):

(I + N)² = I + 2N + N²

With our bit masking over w-bit cells, arithmetic is effectively done modulo 2^w. Multiplying by 2 shifts all bits left and, after masking back to w bits, forces the least significant bit of each cell to 0. After k squarings:

(I + N)(2k) = I + 2^k · N + (higher powers of N)

The coefficient 2^k means the lowest k bits of each cell are forced to zero under mod 2^w / bitmasking. This is exactly what Experiment 2 observed: each fold (squaring) zeros out one more bit, and after cell-size folds, all bits are zero.

Why This Is Fundamental

The nilpotent structure is intrinsic to upper triangular matrices. Any associative, non-commutative composition based on this structure will exhibit the same behavior. This is not a bug — it’s a mathematical property of the group we chose.

The Trade-off

We chose upper triangular matrices because they provide:

  • Associativity: Essential for Finger Trees
  • Non-commutativity: Essential for ordered sequences
  • Efficient representation: The matrix structure maps cleanly to hash bits

The cost is nilpotency, which manifests as the low-entropy failure mode. This trade-off is acceptable for high-entropy data (which is the common case for content hashes) and can be mitigated with detection and preprocessing for edge cases.

Alternative Structures

For applications where low-entropy data is common, one could explore non-nilpotent groups such as GL(n, F_p) — the general linear group over a finite field. However, this would sacrifice the sparse matrix efficiency and require more complex encoding. The upper triangular approach remains practical for most content-addressed data structure applications.

Appendix B: Alternatives Explored

Before settling on upper triangular matrices, several alternative approaches were investigated. This section documents those explorations and explains why they were not suitable.

Quaternion Multiplication

Quaternions are a natural candidate: they form a non-commutative algebra and have efficient 4-component representations. However, experiments with quaternion-based fusing revealed the same nilpotent-like behavior observed with upper triangular matrices.

When quaternions are encoded similarly to the UTM approach — with values close to the identity element (1 + εi + εj + εk where ε represents small perturbations) — repeated self-multiplication causes the perturbation terms to degenerate. While pure unit quaternions form a proper group (isomorphic to SU(2)), the encoding required for hash values reintroduces the same fundamental problem.

Modified Fusing Operations

Various modifications to the basic fusing operation were attempted, including:

  • XOR combined with rotations
  • Nonlinear mixing functions
  • Polynomial operations over finite fields

The consistent finding was that associativity is surprisingly fragile. Most intuitive “mixing” operations that might improve entropy preservation break associativity, which is essential for the Finger Tree use case.

Matrix multiplication is special precisely because it is one of the few operations that provides both associativity and non-commutativity naturally.

Why Upper Triangular Matrices Were Chosen

Despite the low-entropy limitation, upper triangular matrices remain the best practical choice because:

  1. Guaranteed associativity: Matrix multiplication is inherently associative — this cannot be broken by implementation choices.

  2. Guaranteed non-commutativity: For n > 1, matrix multiplication is non-commutative, preserving sequence order.

  3. Efficient sparse encoding: The upper triangular structure means only n(n-1)/2 cells carry information, mapping cleanly to hash bits.

  4. Predictable failure mode: The low-entropy degeneration is detectable and follows a known pattern (one bit per fold), making it possible to implement safeguards.

The alternatives explored either broke associativity (disqualifying them for Finger Trees) or exhibited the same degeneration behavior without the other benefits of the matrix approach.

source: src/math/hashing/hashfusing.clj