Hash Fusing

Exploring Hash Fusing via Upper Triangular Matrix Multiplication.
Author
Published

January 12, 2026

Keywords

data, hash, sha256, matrix, fusing, merkle-tree

This is an article about fusing hashes together while having the key properties of associativity and non-commutativity. I describe the basic approach of using matrix multiplication of Upper Triangle Matrices and then show the results of two experiments showing the quality and limits of this hash fusing approach.

The basic question this article tries to answer is: can fusing hashes together provide unique identifiers for arbitrary data? The answer is no. Specifically, the approach in this article can not provide unique identifiers for sufficiently large runs of repeated values. That is, sufficiently low entropy data is not uniquely represented by fusing hashes together.

The good news is that low entropy data may always be converted into high entropy data either via compression or injecting noise. A practical fusing operator can easily detect fuses of low entropy data and throw an exception when fusing produces those ‘bad’ hash values.

Introduction

I have begun working on implementing distributed immutable data structures as a way to efficiently share structured data between multiple clients. One of the key challenges in this area is being able to efficiently reference ordered collections of data. The usual approach is to use Merkle Trees which have the limitation that they are non-associative and the shape of the tree determines the final hash value at the root of the tree. But, since I plan on using Finger Trees to efficiently represent ordered collections of data, I need computed hashes that are insensitive to the tree shape. This means that I need to be able to fuse hashes together in a way that is associative and also non-commutative. As a starting point, I have been reading the HP paper describing hash fusing via matrix multiplication: https://www.labs.hpe.com/techreports/2017/HPE-2017-08.pdf

Hash Fusing via Upper Triangular Matrix Multiplication

The basic approach of fusing two hashes is to represent each hash as an upper triangular matrix and then to multiply the two matrices together. Since matrix multiplication is associative but not commutative. The following sections define hashes and build up the necessary functions to perform hash fusing via upper triangular matrix multiplication.

Hash Representation

A hash is represented as a string of 64 hexadecimal values (256 bits). To start with, here is a zero hash and a function to generate random hashes.

(def zero-hex (apply str (vec (repeat 64 0))))
"0000000000000000000000000000000000000000000000000000000000000000"
(defn random-hex
  "Generate a random hex string representing length `n` bytes."
  [n]
  (let [hex-chars "0123456789abcdef"]
    (apply str (repeatedly (* 2 n) #(rand-nth hex-chars)))))

Generate some random hashes for testing:

(def a-hex (random-hex 32))
"a6b453a2eaac05a2fd932ee955d53085a1f350b3da14de75d6e3cc3926ffc49c"
(def b-hex (random-hex 32))
"439d8b50c2ee48494af38fa3184386f78c6d0e2602a84a7858dd66f741465f7c"
(def c-hex (random-hex 32))
"fcb44172d409233e12683daae4ec7efa6851e1e045ab3971bcd5d06b4f36d1c0"

To fuse two hashes, we convert each hash to an upper triangular matrix and then multiply the two matrices together. The result is another upper triangular matrix which can be converted back to a hash by taking the elements above the main diagonal. The following several sections defines this mapping between hashes and upper triangular matrices. For the experiments below four different bit sizes of cells and four corresponding matrices are defined.

8 bit cells in a 9x9 matrix:

\[%% Example: 9x9 Upper Triangular Matrix for 8 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_{31} ] \to {\begin{bmatrix} 1 & h_0 & h_8 & h_{15} & h_{21} & h_{26} & h_{29} & h_{31} & h_{25} \\ 0 & 1 & h_1 & h_9 & h_{16} & h_{22} & h_{27} & h_{30} & h_{20} \\ 0 & 0 & 1 & h_2 & h_{10} & h_{17} & h_{23} & h_{28} & h_{14} \\ 0 & 0 & 0 & 1 & h_3 & h_{11} & h_{18} & h_{24} & h_7 \\ 0 & 0 & 0 & 0 & 1 & h_4 & h_{12} & h_{19} & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & h_5 & h_{13} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & h_6 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}} \]

16 bit cells in a 7x7 matrix:

\[%% Example: 7x7 Upper Triangular Matrix for 16 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_{15} ] \to {\begin{bmatrix} 1 & h_0 & h_6 & h_{10} & h_{13} & h_{15} & h_5 \\ 0 & 1 & h_1 & h_7 & h_{11} & h_{14} & 0 \\ 0 & 0 & 1 & h_2 & h_8 & h_{12} & 0 \\ 0 & 0 & 0 & 1 & h_3 & h_9 & 0 \\ 0 & 0 & 0 & 0 & 1 & h_4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}} \]

32 bit cells in a 5x5 matrix:

\[%% Example: 5x5 Upper Triangular Matrix for 32 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_7 ] \to {\begin{bmatrix} 1 & h_0 & h_4 & h_7 & h_6 \\ 0 & 1 & h_1 & h_5 & h_3 \\ 0 & 0 & 1 & h_2 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}} \]

64 bit cells in a 4x4 matrix:

\[%% Example: 4x4 Upper Triangular Matrix for 64 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3 ] \to {\begin{bmatrix} 1 & h_0 & h_3 & h_2 \\ 0 & 1 & h_1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}} \]

Hex and Byte Vector Conversion Functions

The first type of conversion is between hex strings and byte vectors.

(defn hex->hash8
  "Convert a hex string to a byte vector."
  [s]
  (with-meta
    (->> s
         (partition 2)
         (map #(-> (apply str %)
                   (Integer/parseInt 16)))
         (vec))
    {:cell-size 8}))
(defn hash8->hex
  "Convert a byte vector to a hex string."
  [bytes]
  (->> bytes
       (map #(format "%02x" %))
       (apply str)))
(= a-hex (-> a-hex hex->hash8 hash8->hex))
true

Hash Conversion Functions

The next type of conversion is from bytes and larger word sizes.

(defn hash8->hash16
  "Convert a vector of 32 bytes into a vector of 16 2-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 2 i)) 8)
                   (hash8 (+ 1 (* 2 i)))))
              (range 16)))
    {:cell-size 16}))
(defn hash16->hash8
  "Convert a vector 2-byte words into a vector of bytes."
  [hash16]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash16))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash16 hash16->hash8 hash8->hex))
true
(defn hash8->hash32
  "Convert a vector of 32 bytes into a vector of 8 4-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 4 i)) 24)
                   (bit-shift-left (hash8 (+ 1 (* 4 i))) 16)
                   (bit-shift-left (hash8 (+ 2 (* 4 i))) 8)
                   (hash8 (+ 3 (* 4 i)))))
              (range 8)))
    {:cell-size 32}))
(defn hash32->hash8
  "Convert a vector of 4-byte words into a vector of bytes."
  [hash32]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 24) 0xFF)
                    (bit-and (bit-shift-right word 16) 0xFF)
                    (bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash32))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash32 hash32->hash8 hash8->hex))
true
(defn hash8->hash64
  "Convert a vector of 32 bytes into a vector of 4 8-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 8 i)) 56)
                   (bit-shift-left (hash8 (+ 1 (* 8 i))) 48)
                   (bit-shift-left (hash8 (+ 2 (* 8 i))) 40)
                   (bit-shift-left (hash8 (+ 3 (* 8 i))) 32)
                   (bit-shift-left (hash8 (+ 4 (* 8 i))) 24)
                   (bit-shift-left (hash8 (+ 5 (* 8 i))) 16)
                   (bit-shift-left (hash8 (+ 6 (* 8 i))) 8)
                   (hash8 (+ 7 (* 8 i)))))
              (range 4)))
    {:cell-size 64}))
(defn hash64->hash8
  "Convert a vector of 8-byte words into a vector of bytes."
  [hash64]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 56) 0xFF)
                    (bit-and (bit-shift-right word 48) 0xFF)
                    (bit-and (bit-shift-right word 40) 0xFF)
                    (bit-and (bit-shift-right word 32) 0xFF)
                    (bit-and (bit-shift-right word 24) 0xFF)
                    (bit-and (bit-shift-right word 16) 0xFF)
                    (bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash64))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash64 hash64->hash8 hash8->hex))
true

Matrix Conversion Functions

The following functions convert between byte vectors representing hashes and four different upper triangular matrix sizes based on cell bit size.

(defn hash8->utm8
  "Convert a vector of 32 bytes into a 9x9 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07
    h08 h09 h10 h11 h12 h13 h14 h15
    h16 h17 h18 h19 h20 h21 h22 h23
    h24 h25 h26 h27 h28 h29 h30 h31]]
  (with-meta
    [[1 h00 h08 h15 h21 h26 h29 h31 h25]
     [0   1 h01 h09 h16 h22 h27 h30 h20]
     [0   0   1 h02 h10 h17 h23 h28 h14]
     [0   0   0   1 h03 h11 h18 h24 h07]
     [0   0   0   0   1 h04 h12 h19   0]
     [0   0   0   0   0   1 h05 h13   0]
     [0   0   0   0   0   0   1 h06   0]
     [0   0   0   0   0   0   0   1   0]
     [0   0   0   0   0   0   0   0   1]]
    {:cell-size 8}))
(defn utm8->hash8
  "Convert a 9x9 upper triangular matrix back to a vector of 32 bytes."
  [[[_ h00 h08 h15 h21 h26 h29 h31 h25]
    [_   _ h01 h09 h16 h22 h27 h30 h20]
    [_   _   _ h02 h10 h17 h23 h28 h14]
    [_   _   _   _ h03 h11 h18 h24 h07]
    [_   _   _   _   _ h04 h12 h19   _]
    [_   _   _   _   _   _ h05 h13   _]
    [_   _   _   _   _   _   _ h06   _]
    [_   _   _   _   _   _   _   _   _]
    [_   _   _   _   _   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFF)
          [h00 h01 h02 h03 h04 h05 h06 h07
           h08 h09 h10 h11 h12 h13 h14 h15
           h16 h17 h18 h19 h20 h21 h22 h23
           h24 h25 h26 h27 h28 h29 h30 h31])
    {:cell-size 8}))
(defn hash16->utm16
  "Convert a vector of 16 2-byte words into a 7x7 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07
    h08 h09 h10 h11 h12 h13 h14 h15]]
  (with-meta
    [[1 h00 h06 h10 h13 h15 h05]
     [0   1 h01 h07 h11 h14   0]
     [0   0   1 h02 h08 h12   0]
     [0   0   0   1 h03 h09   0]
     [0   0   0   0   1 h04   0]
     [0   0   0   0   0   1   0]
     [0   0   0   0   0   0   1]]
    {:cell-size 16}))
(defn utm16->hash16
  "Convert a 7x7 upper triangular matrix back to a vector of 16 2-byte words."
  [[[_ h00 h06 h10 h13 h15 h05]
    [_   _ h01 h07 h11 h14   _]
    [_   _   _ h02 h08 h12   _]
    [_   _   _   _ h03 h09   _]
    [_   _   _   _   _ h04   _]
    [_   _   _   _   _   _   _]
    [_   _   _   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFFFF)
          [h00 h01 h02 h03 h04 h05 h06 h07
           h08 h09 h10 h11 h12 h13 h14 h15])
    {:cell-size 16}))
(defn hash32->utm32
  "Convert a vector of 8 4-byte words into a 5x5 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07]]
  (with-meta
    [[1 h00 h04 h07 h06]
     [0   1 h01 h05 h03]
     [0   0   1 h02   0]
     [0   0   0   1   0]
     [0   0   0   0   1]]
    {:cell-size 32}))
(defn utm32->hash32
  "Convert a 5x5 upper triangular matrix back to a vector of 8 4-byte words."
  [[[_ h00 h04 h07 h06]
    [_   _ h01 h05 h03]
    [_   _   _ h02   _]
    [_   _   _   _   _]
    [_   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFFFFFFFF)
          [h00 h01 h02 h03 h04 h05 h06 h07])
    {:cell-size 32}))
(defn hash64->utm64
  "Convert a vector of 4 8-byte words into a 4x4 upper triangular matrix."
  [[h00 h01 h02 h03]]
  (with-meta
    [[1 h00 h03 h02]
     [0   1 h01   0]
     [0   0   1   0]
     [0   0   0   1]]
    {:cell-size 64}))
(defn utm64->hash64
  "Convert a 4x4 upper triangular matrix back to a vector of 4 8-byte words."
  [[[_ h00 h03 h02]
    [_   _ h01   _]
    [_   _   _   _]
    [_   _   _   _]]]
  (with-meta
    [h00 h01 h02 h03]
    {:cell-size 64}))

Combined Conversion Functions

The following combined conversion functions convert between hex strings into upper triangular matrices and back for the four different cell bit sizes.

(def hex->utm8
  "Convert a hex string to an upper triangular matrix with 8-bit cells."
  (comp hash8->utm8 hex->hash8))
(def utm8->hex
  "Convert an upper triangular matrix with 8-bit cells to a hex string."
  (comp hash8->hex utm8->hash8))
(= a-hex (-> a-hex hex->utm8 utm8->hex))
true
(def hex->utm16
  "Convert a hex string to an upper triangular matrix with 16-bit cells."
  (comp hash16->utm16 hash8->hash16 hex->hash8))
(def utm16->hex
  "Convert an upper triangular matrix with 16-bit cells to a hex string."
  (comp hash8->hex hash16->hash8 utm16->hash16))
(= a-hex (-> a-hex hex->utm16 utm16->hex))
true
(def hex->utm32
  "Convert a hex string to an upper triangular matrix with 32-bit cells."
  (comp hash32->utm32 hash8->hash32 hex->hash8))
(def utm32->hex
  "Convert an upper triangular matrix with 32-bit cells to a hex string."
  (comp hash8->hex hash32->hash8 utm32->hash32))
(= a-hex (-> a-hex hex->utm32 utm32->hex))
true
(def hex->utm64
  "Convert a hex string to an upper triangular matrix with 64-bit cells."
  (comp hash64->utm64 hash8->hash64 hex->hash8))
(def utm64->hex
  "Convert an upper triangular matrix with 64-bit cells to a hex string."
  (comp hash8->hex hash64->hash8 utm64->hash64))
(= a-hex (-> a-hex hex->utm64 utm64->hex))
true
(defn hex->utm
  "Convert a hex string to an upper triangular matrix with the given cell size
  (8, 16, 32, or 64 bits)."
  [hex cell-size]
  (case cell-size
    8  (hex->utm8 hex)
    16 (hex->utm16 hex)
    32 (hex->utm32 hex)
    64 (hex->utm64 hex)
    (throw (ex-info "Unsupported cell size for upper triangular matrix."
                    {:cell-size cell-size}))))
(defn utm->hex
  "Convert an upper triangular matrix with the given cell size (8, 16, 32,
  or 64 bits) to a hex string."
  [utm cell-size]
  (case cell-size
    8  (utm8->hex utm)
    16 (utm16->hex utm)
    32 (utm32->hex utm)
    64 (utm64->hex utm)
    (throw (ex-info "Unsupported cell size for upper triangular matrix."
                    {:cell-size cell-size}))))
(= a-hex (-> a-hex (hex->utm 32) (utm->hex 32)))
true

Upper Triangular Matrix Multiplication

This is the core function that performs the multiplication of two upper triangular matrices. The multiplication ignores the lower triangular part of of the matrices since they are always 0 (and always 1 on the main diagonal).

Note that unchecked math is enabled to ignore integer overflow since the cells are treated as fixed size bit fields.

(set! *unchecked-math* true)
true
(defn utm-multiply
  "Multiply two upper triangular matrices `a` and `b`."
  [a b]
  (let [dim (count a)
        cell-size (-> a meta :cell-size)
        bit-mask (if (= cell-size 64)
                   -1
                   (dec (bit-shift-left 1 cell-size)))]
    (with-meta
      (vec (for [i (range dim)]
             (vec (for [j (range dim)]
                    (cond
                      (< j i) 0
                      (= j i) 1
                      :else
                      (reduce (fn [sum k]
                                (-> sum
                                    (+ (* (get-in a [i k])
                                          (get-in b [k j])))
                                    (bit-and bit-mask)))
                              0
                              (range i (inc j))))))))
      {:cell-size cell-size})))

Associativity & Non-Commutativity Properties

Show that upper triangular matrix multiplication is associative and non-commutative. Associativity is necessary for hash fusing to work with Finger Trees so that different tree shapes produce the same fused hash. Non-commutativity is necessary for sequences of data where the order of data affects the fused hash.

(-> (for [cell-size [8 16 32 64]]
      (let [a (hex->utm a-hex cell-size)
            b (hex->utm b-hex cell-size)
            c (hex->utm c-hex cell-size)
            ab (utm-multiply a b)
            ba (utm-multiply b a)
            bc (utm-multiply b c)
            ab*c (utm-multiply ab c)
            a*bc (utm-multiply a bc)]
        {:cell-size cell-size
         :associative? (= ab*c a*bc)
         :commutative? (= ab ba)}))
    (kind/table))
cell-size associative? commutative?
8 true false
16 true false
32 true false
64 true false

Experiment 1: Random Fuses

This experiment is run with two different hashes which are fused onto an accumulator iteratively. For each iteration, one of the two hashes is randomly selected and is fused to the accumulator. This random fusing is repeated many times and the quality of the accumulator is measured both by keeping track of global uniqueness after each fuse and by the uniform distribution of bit values.

(defn random-fuses
  "Perform random fuses of two hashes onto an accumulator keeping track of all
  unique hashes produced and all duplicate hashes produced. Repeat this for all
  four utm sizes based on cell bit size."
  []
  (let [hashes {:a (random-hex 32)
                :b (random-hex 32)}
        ;; convert hashes to utms for each bit size and store in a map
        utms {8  (update-vals hashes hex->utm8)
              16 (update-vals hashes hex->utm16)
              32 (update-vals hashes hex->utm32)
              64 (update-vals hashes hex->utm64)}
        results {8  {:acc (hex->utm8 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 16 {:acc (hex->utm16 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 32 {:acc (hex->utm32 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 64 {:acc (hex->utm64 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}}
        results (reduce
                 (fn [results _]
                   (reduce
                    (fn [results cell-size]
                      (let [curr-acc (get-in results [cell-size :acc])
                            ;; randomly select one of the two hashes
                            selected-hash (if (< (rand) 0.5)
                                            (get-in utms [cell-size :a])
                                            (get-in utms [cell-size :b]))
                            ;; fuse the selected hash onto the accumulator
                            new-acc (utm-multiply curr-acc selected-hash)]
                        ;; update results with new accumulator and uniqueness info
                        (if (contains? (get-in results [cell-size :uniques]) new-acc)
                          (update-in results [cell-size :duplicates] conj new-acc)
                          (-> results
                              (assoc-in [cell-size :acc] new-acc)
                              (update-in [cell-size :uniques] conj new-acc)))))
                    results
                    [8 16 32 64]))
                 results
                 (range 10000))]
    ;; convert final results to totals of unique and duplicate hashes
    (->> results
         (map (fn [[cell-size {:keys [uniques duplicates]}]]
                {:cell-size   cell-size
                 :uniques     (count uniques)
                 :duplicates (count duplicates)}))
         (kind/table))))

Random Fuses Results

(random-fuses)
cell-size uniques duplicates
8 10000 0
16 10000 0
32 10000 0
64 10000 0

These results show that high entropy data can be well represented by fusing hashes together. All four bit sizes for the cells of the upper triangular matrices performed well with high entroy data. I have rerun this experiment with millions of fuses and have never observed a duplicate hash being produced. Since 64 bit cells fit in the smallest matrix, this is the fasted to compute.

Experiment 2: Folded Fuses

This experiment is run with a hash fused together with itself (i.e. folding) and then the result in turn fused with itself and so on. This folding is repeated many times and the quality of the accumulator is measured both by keeping track of global uniqueness after each fuse and by the uniform distribution of bit values. Once the accumulator becomes zero the folding stops and the number of folds to reach this state is recorded. Also, the number of lower bits that are zero across all cells is recorded after each fold.

(defn calc-zero-lower-bits
  "Calculate the number of lower bits that are zero across all upper cells in
  the upper triangular matrix."
  [utm]
  (let [cell-size (-> utm meta :cell-size)]
    (loop [mask 1 zero-lower-bits 0]
      (if (and (< zero-lower-bits cell-size)
               (->> (for [i (range (count utm))
                          j (range (inc i) (count utm))]
                      (get-in utm [i j]))
                    (map #(bit-and mask %))
                    (every? zero?)))
        (recur (bit-shift-left mask 1)
               (inc zero-lower-bits))
        zero-lower-bits))))
(defn folded-fuses
  "Perform folded fuses starting from the fixed hash `a-hex`, converted to an
  upper triangular matrix (UTM) for each cell size. After each fold, track how
  many lower bits of each cell are zero across all upper cells using
  `calc-zero-lower-bits`. Stop when all those lower bits are zero (i.e.,
  `zero-lower-bits` equals the cell size) or when 1000 folds have been
  performed. Report, for each cell size and fold, the fold count, the number
  of zero lower bits, and the resulting hash."
  []
  (let [;; convert hashes to utms for each bit size and store in a map
        results (reduce
                 (fn [results cell-size]
                   (let [fold (hex->utm a-hex cell-size)
                         zero-lower-bits (calc-zero-lower-bits fold)
                         fold-result
                         (loop [folds [{:fold fold
                                        :zero-lower-bits zero-lower-bits}]]
                           (let [{:keys [fold zero-lower-bits]} (last folds)]
                             (if (or (= zero-lower-bits cell-size)
                                     (>= (count folds) 1000))
                               folds
                               (let [fold (utm-multiply fold fold)
                                     zero-lower-bits (calc-zero-lower-bits fold)]
                                 (recur (conj folds
                                              {:fold fold
                                               :zero-lower-bits zero-lower-bits}))))))]
                     (assoc results cell-size fold-result)))
                 {}
                 [8 16 32 64])]
    ;; format results into a table
    (->> results
         (mapcat (fn [[cell-size folds]]
                   (map-indexed (fn [index {:keys [fold zero-lower-bits]}]
                                  {:cell-size      cell-size
                                   :fold-count     index
                                   :zero-lower-bits zero-lower-bits
                                   :fold (-> [:pre (utm->hex fold cell-size)]
                                             (kind/hiccup))})

                                folds)))
         (kind/table))))

Folded Fuses Results

(folded-fuses)
cell-size fold-count zero-lower-bits fold
8 0 0
a6b453a2eaac05a2fd932ee955d53085a1f350b3da14de75d6e3cc3926ffc49c
8 1 0
4c68a644d4580a44b282e2e6e206e663a07df6c17a2e8d645fbc084350d988bf
8 2 0
98d04c88a8b014884474dc1ca47ce4ca9846044eec4c2ec802e8e0ca729c5a93
8 3 1
30a098105060281008a81878c8b828a4903c68ccb898acd054d0a0e40c209c4a
8 4 2
60403020a0c050201050b0f09070d088a0385058f03098a0e8a0c008b8e0d824
8 5 3
c08060404080a04020a060e020e0a0104070a0b0e0603040d0408010f0403088
8 6 4
8000c080800040804040c0c040c0402080e04060c0c06080a0800020e0806010
8 7 5
0000800000008000808080808080804000c080c08080c00040000040c000c020
8 8 6
0000000000000000000000000000008000800080000080008000008080008040
8 9 7
0000000000000000000000000000000000000000000000000000000000000080
8 10 8
0000000000000000000000000000000000000000000000000000000000000000
16 0 0
a6b453a2eaac05a2fd932ee955d53085a1f350b3da14de75d6e3cc3926ffc49c
16 1 0
4d68a744d5580b44fb265dd28592a5e220bef76c6cc84edaa993b38dbdd2ce1a
16 2 1
9ad04e88aab01688f64cbba472c45f24b4dc46f09b900834627ae126e764baa8
16 3 2
35a09d1055602d10ec98774884080bc83738ee403b20d0680344297cbcc8ea40
16 4 3
6b403a20aac05a20d930ee9082104d90a4705e00664050d007c807b8e190dd40
16 5 4
d68074405580b440b260dd20ec20732020e0c2008c80e1a05490a27063204580
16 6 5
ad00e880ab00688064c0ba4078404640a1c09c001900c340bd2090e04640f700
16 7 6
5a00d1005600d100c980748070800c80c380980032008680ca4051c08c809e00
16 8 7
b400a200ac00a2009300e900e10019008700b00064000d00d48063801900fc00
16 9 8
68004400580044002600d200c20032000e006000c8001a00a900c7003200f800
16 10 9
d0008800b00088004c00a400840064001c00c0009000340052008e006400f000
16 11 10
a000100060001000980048000800c8003800800020006800a4001c00c800e000
16 12 11
40002000c00020003000900010009000700000004000d000480038009000c000
16 13 12
80004000800040006000200020002000e00000008000a0009000700020008000
16 14 13
0000800000008000c000400040004000c0000000000040002000e00040000000
16 15 14
0000000000000000800080008000800080000000000080004000c00080000000
16 16 15
0000000000000000000000000000000000000000000000008000800000000000
16 17 16
0000000000000000000000000000000000000000000000000000000000000000
32 0 0
a6b453a2eaac05a2fd932ee955d53085a1f350b3da14de75d6e3cc3926ffc49c
32 1 0
4d68a744d5580b44fb265dd2abaa610a8c7db7ea89c3f95c78ef6b9cb855db2d
32 2 1
9ad14e88aab01688f64cbba45754c2143b57c9e469f0e4801e7e23e07f71d6be
32 3 2
35a29d1055602d10ec997748aea984280020fc082d859020ef797a606562f38c
32 4 3
6b453a20aac05a20d932ee905d53085026079910c19a3cc0a8e7bf40bff92358
32 5 4
d68a74405580b440b265dd20baa610a0e325b6201d70eb8079a2a8802e7057b0
32 6 5
ad14e880ab01688064cbba40754c214022a57c40a3d39f009291f900e461f360
32 7 6
5a29d1005602d100c9977480ea984280b6b33880eb6e5e00a25692005310f6c0
32 8 7
b453a200ac05a200932ee900d53085003307710065f93c003977a40031962d80
32 9 8
68a74400580b4400265dd200aa610a007c92e2000864780046194800a2fd5b00
32 10 9
d14e8800b01688004cbba40054c214005335c4000290f000d8da9000d53eb600
32 11 10
a29d1000602d100099774800a98428000eab8800cc41e000e4552000678d6c00
32 12 11
453a2000c05a200032ee900053085000be571000b503c000932a4000c35ad800
32 13 12
8a74400080b4400065dd2000a610a00000ae2000dc0780005054800057b5b000
32 14 13
14e8800001688000cbba40004c214000115c4000800f000048a90000f36b6000
32 15 14
29d1000002d10000977480009842800062b88000201e000031520000f6d6c000
32 16 15
53a2000005a200002ee9000030850000c5710000c03c0000e2a400002dad8000
32 17 16
a74400000b4400005dd20000610a00008ae2000080780000c54800005b5b0000
32 18 17
4e88000016880000bba40000c214000015c4000000f000008a900000b6b60000
32 19 18
9d1000002d10000077480000842800002b88000001e00000152000006d6c0000
32 20 19
3a2000005a200000ee900000085000005710000003c000002a400000dad80000
32 21 20
74400000b4400000dd20000010a00000ae2000000780000054800000b5b00000
32 22 21
e880000068800000ba400000214000005c4000000f000000a90000006b600000
32 23 22
d1000000d10000007480000042800000b88000001e00000052000000d6c00000
32 24 23
a2000000a2000000e900000085000000710000003c000000a4000000ad800000
32 25 24
4400000044000000d20000000a000000e200000078000000480000005b000000
32 26 25
8800000088000000a400000014000000c4000000f000000090000000b6000000
32 27 26
1000000010000000480000002800000088000000e0000000200000006c000000
32 28 27
2000000020000000900000005000000010000000c000000040000000d8000000
32 29 28
400000004000000020000000a0000000200000008000000080000000b0000000
32 30 29
8000000080000000400000004000000040000000000000000000000060000000
32 31 30
00000000000000008000000080000000800000000000000000000000c0000000
32 32 31
0000000000000000000000000000000000000000000000000000000080000000
32 33 32
0000000000000000000000000000000000000000000000000000000000000000
64 0 0
a6b453a2eaac05a2fd932ee955d53085a1f350b3da14de75d6e3cc3926ffc49c
64 1 1
4d68a745d5580b44fb265dd2abaa610a43e6a167b429bcea9d383387f436d662
64 2 2
9ad14e8baab01688f64cbba55754c21487cd42cf685379d4f832d366814ae16c
64 3 3
35a29d1755602d10ec99774aaea984280f9a859ed0a6f3a8e76f5827660a9578
64 4 4
6b453a2eaac05a20d932ee955d5308501f350b3da14de750ab0575b859e87570
64 5 5
d68a745d5580b440b265dd2abaa610a03e6a167b429bcea0c6a60116eb1e14e0
64 6 6
ad14e8baab01688064cbba55754c21407cd42cf685379d404fb858c6b370d1c0
64 7 7
5a29d1755602d100c99774aaea984280f9a859ed0a6f3a80a9220bf0dbb44380
64 8 8
b453a2eaac05a200932ee955d5308500f350b3da14de75007909816f8ab30700
64 9 9
68a745d5580b4400265dd2abaa610a00e6a167b429bcea008d28a91662900e00
64 10 10
d14e8baab01688004cbba55754c21400cd42cf685379d40086a7eb09f9c81c00
64 11 11
a29d1755602d100099774aaea98428009a859ed0a6f3a800beaa3988c6303800
64 12 12
453a2eaac05a200032ee955d53085000350b3da14de7500042be00e4d6e07000
64 13 13
8a745d5580b4400065dd2abaa610a0006a167b429bcea0009b223916d7c0e000
64 14 14
14e8baab01688000cbba55754c214000d42cf685379d40008cdd4f625781c000
64 15 15
29d1755602d100009774aaea98428000a859ed0a6f3a8000741e13974f038000
64 16 16
53a2eaac05a200002ee955d53085000050b3da14de75000051c9fa791e070000
64 17 17
a745d5580b4400005dd2abaa610a0000a167b429bcea000049cb421c3c0e0000
64 18 18
4e8baab016880000bba55754c214000042cf685379d400002c73b8e0781c0000
64 19 19
9d1755602d100000774aaea984280000859ed0a6f3a80000bc5c4460f0380000
64 20 20
3a2eaac05a200000ee955d53085000000b3da14de7500000068bd341e0700000
64 21 21
745d5580b4400000dd2abaa610a00000167b429bcea000004464d083c0e00000
64 22 22
e8baab0168800000ba55754c214000002cf685379d40000065fe490781c00000
64 23 23
d1755602d100000074aaea984280000059ed0a6f3a80000040cf320f03800000
64 24 24
a2eaac05a2000000e955d53085000000b3da14de7500000054e8e41e07000000
64 25 25
45d5580b44000000d2abaa610a00000067b429bcea000000f6fbc83c0e000000
64 26 26
8baab01688000000a55754c214000000cf685379d4000000229f90781c000000
64 27 27
1755602d100000004aaea984280000009ed0a6f3a800000017df20f038000000
64 28 28
2eaac05a20000000955d5308500000003da14de7500000007a3e41e070000000
64 29 29
5d5580b4400000002abaa610a00000007b429bcea00000001e7c83c0e0000000
64 30 30
baab01688000000055754c2140000000f685379d40000000e4f90781c0000000
64 31 31
755602d100000000aaea984280000000ed0a6f3a8000000069f20f0380000000
64 32 32
eaac05a20000000055d5308500000000da14de750000000053e41e0700000000
64 33 33
d5580b4400000000abaa610a00000000b429bcea00000000a7c83c0e00000000
64 34 34
aab01688000000005754c21400000000685379d4000000004f90781c00000000
64 35 35
55602d1000000000aea9842800000000d0a6f3a8000000009f20f03800000000
64 36 36
aac05a20000000005d53085000000000a14de750000000003e41e07000000000
64 37 37
5580b44000000000baa610a000000000429bcea0000000007c83c0e000000000
64 38 38
ab01688000000000754c21400000000085379d4000000000f90781c000000000
64 39 39
5602d10000000000ea984280000000000a6f3a8000000000f20f038000000000
64 40 40
ac05a20000000000d53085000000000014de750000000000e41e070000000000
64 41 41
580b440000000000aa610a000000000029bcea0000000000c83c0e0000000000
64 42 42
b01688000000000054c21400000000005379d4000000000090781c0000000000
64 43 43
602d100000000000a984280000000000a6f3a8000000000020f0380000000000
64 44 44
c05a20000000000053085000000000004de750000000000041e0700000000000
64 45 45
80b4400000000000a610a000000000009bcea0000000000083c0e00000000000
64 46 46
01688000000000004c21400000000000379d4000000000000781c00000000000
64 47 47
02d100000000000098428000000000006f3a8000000000000f03800000000000
64 48 48
05a20000000000003085000000000000de750000000000001e07000000000000
64 49 49
0b44000000000000610a000000000000bcea0000000000003c0e000000000000
64 50 50
1688000000000000c21400000000000079d4000000000000781c000000000000
64 51 51
2d100000000000008428000000000000f3a8000000000000f038000000000000
64 52 52
5a200000000000000850000000000000e750000000000000e070000000000000
64 53 53
b44000000000000010a0000000000000cea0000000000000c0e0000000000000
64 54 54
688000000000000021400000000000009d4000000000000081c0000000000000
64 55 55
d10000000000000042800000000000003a800000000000000380000000000000
64 56 56
a200000000000000850000000000000075000000000000000700000000000000
64 57 57
44000000000000000a00000000000000ea000000000000000e00000000000000
64 58 58
88000000000000001400000000000000d4000000000000001c00000000000000
64 59 59
10000000000000002800000000000000a8000000000000003800000000000000
64 60 60
2000000000000000500000000000000050000000000000007000000000000000
64 61 61
4000000000000000a000000000000000a000000000000000e000000000000000
64 62 62
800000000000000040000000000000004000000000000000c000000000000000
64 63 63
0000000000000000800000000000000080000000000000008000000000000000
64 64 64
0000000000000000000000000000000000000000000000000000000000000000

These results show that repeated folding devolves into a zero hash after only a few folds. The number of bits in a cell nearly exactly defines the number of folds needed before reaching a zero hash. Since each fold represents a doubling of the number of fuses of the same value, folding shows that fusing can not reliably represent long runs of repeated values.

Practical Usage

Based on the experiments above, here are recommendations for using hash fusing in practice.

Detecting Low-Entropy Failures

The key insight from Experiment 2 is that low-entropy data causes the lower bits to become zero progressively. We can detect this condition and fail fast rather than silently producing degenerate hashes.

The following function fuses two hashes and raises an error if the result shows signs of low-entropy degeneration (lower 32 bits all zero):

(defn high-entropy-fuse
  "Fuse two 256-bit hashes together via upper triangular matrix multiplication
  with 64-bit cells. Raise an error when the lower 32 bits of the result are
  all zero."
  [a-hex b-hex]
  (let [a (hex->utm64 a-hex)
        b (hex->utm64 b-hex)
        ab (utm-multiply a b)]
    (if (->> (for [i (range (count a))
                   j (range (inc i) (count a))]
               (get-in ab [i j]))
             (map #(bit-and 0xFFFFFFFF %))
             (every? zero?))
      (throw (ex-info "Low entropy data detected during hash fusing."
                      {:a a-hex :b b-hex :fused (utm64->hex ab)}))
      (utm64->hex ab))))

Example of high entropy data fusing successfully:

(high-entropy-fuse a-hex b-hex)
"ea51def3ad9a4deb4886be8c6e18b77c2e605ed9dcbd28ed449bcbe988985f66"

Example of low entropy data causing an error:

(try
  (high-entropy-fuse zero-hex zero-hex)
  (catch Exception e
         (.getMessage e)))
"Low entropy data detected during hash fusing."

Handling Low-Entropy Data

When working with data that may contain long runs of repeated values, consider these strategies:

  1. Compression: Run-length encode or compress the data before hashing. This converts low-entropy patterns into high-entropy representations.

  2. Salting: Inject position-dependent noise into the hash sequence. For example, XOR each element’s hash with a hash of its index.

  3. Chunking: Break long sequences into smaller chunks and hash each chunk separately before fusing. This limits the damage from repetition.

Conclusion

This article described a method for fusing hashes together using upper triangular matrix multiplication. The method is associative and non-commutative.

Experiment 1 showed that high entropy data can be well represented by fusing hashes together and that the size of cell fields, from 8 to 64 bits, all performed well with high entropy data. Since 64 bit cells fit in the smallest matrix, this is the fastest to compute.

Experiment 2 showed that low entropy data, specifically long runs of repeated values, can not be well represented by fusing hashes together since they all approach a zero hash. The rate of approach is determined by the bit count in the cells. 64 bit cells were the most able to handle repeated values.

The Practical Usage section provides a working implementation with low-entropy detection, along with strategies for handling problematic data.

Appendix: Why Low Entropy Data Fails — Nilpotent Group Structure

The following explanation, based on interactions with a Claude-based AI assistant, is very well written and accurate.

The low-entropy failure observed in Experiment 2 is not an accident of implementation — it is a fundamental consequence of the algebraic structure we chose. Understanding this helps clarify both the limitations and the design space for alternatives.

Upper Triangular Matrices Form a Nilpotent Group

An n×n upper triangular matrix with 1s on the diagonal can be written as I + N, where I is the identity matrix and N is strictly upper triangular (all diagonal entries are 0). The key property of strictly upper triangular matrices is that they are nilpotent: repeated multiplication eventually yields zero.

Specifically, for an n×n strictly upper triangular matrix N:

N^n = 0 (the zero matrix)

This happens because each multiplication “shifts” the non-zero entries one diagonal further from the main diagonal. After n-1 multiplications, all entries have been shifted off the matrix.

How This Affects Hash Fusing

When we fuse a hash with itself (A × A), we are computing (I + N) × (I + N):

(I + N)² = I + 2N + N²

With our bit masking over w-bit cells, arithmetic is effectively done modulo 2^w. Multiplying by 2 shifts all bits left and, after masking back to w bits, forces the least significant bit of each cell to 0. After k squarings:

(I + N)(2k) = I + 2^k · N + (higher powers of N)

The coefficient 2^k means the lowest k bits of each cell are forced to zero under mod 2^w / bitmasking. This is exactly what Experiment 2 observed: each fold (squaring) zeros out one more bit, and after cell-size folds, all bits are zero.

Why This Is Fundamental

The nilpotent structure is intrinsic to upper triangular matrices. Any associative, non-commutative composition based on this structure will exhibit the same behavior. This is not a bug — it’s a mathematical property of the group we chose.

The Trade-off

We chose upper triangular matrices because they provide:

  • Associativity: Essential for Finger Trees
  • Non-commutativity: Essential for ordered sequences
  • Efficient representation: The matrix structure maps cleanly to hash bits

The cost is nilpotency, which manifests as the low-entropy failure mode. This trade-off is acceptable for high-entropy data (which is the common case for content hashes) and can be mitigated with detection and preprocessing for edge cases.

Alternative Structures

For applications where low-entropy data is common, one could explore non-nilpotent groups such as GL(n, F_p) — the general linear group over a finite field. However, this would sacrifice the sparse matrix efficiency and require more complex encoding. The upper triangular approach remains practical for most content-addressed data structure applications.

Appendix B: Alternatives Explored

Before settling on upper triangular matrices, several alternative approaches were investigated. This section documents those explorations and explains why they were not suitable.

Quaternion Multiplication

Quaternions are a natural candidate: they form a non-commutative algebra and have efficient 4-component representations. However, experiments with quaternion-based fusing revealed the same nilpotent-like behavior observed with upper triangular matrices.

When quaternions are encoded similarly to the UTM approach — with values close to the identity element (1 + εi + εj + εk where ε represents small perturbations) — repeated self-multiplication causes the perturbation terms to degenerate. While pure unit quaternions form a proper group (isomorphic to SU(2)), the encoding required for hash values reintroduces the same fundamental problem.

Modified Fusing Operations

Various modifications to the basic fusing operation were attempted, including:

  • XOR combined with rotations
  • Nonlinear mixing functions
  • Polynomial operations over finite fields

The consistent finding was that associativity is surprisingly fragile. Most intuitive “mixing” operations that might improve entropy preservation break associativity, which is essential for the Finger Tree use case.

Matrix multiplication is special precisely because it is one of the few operations that provides both associativity and non-commutativity naturally.

Why Upper Triangular Matrices Were Chosen

Despite the low-entropy limitation, upper triangular matrices remain the best practical choice because:

  1. Guaranteed associativity: Matrix multiplication is inherently associative — this cannot be broken by implementation choices.

  2. Guaranteed non-commutativity: For n > 1, matrix multiplication is non-commutative, preserving sequence order.

  3. Efficient sparse encoding: The upper triangular structure means only n(n-1)/2 cells carry information, mapping cleanly to hash bits.

  4. Predictable failure mode: The low-entropy degeneration is detectable and follows a known pattern (one bit per fold), making it possible to implement safeguards.

The alternatives explored either broke associativity (disqualifying them for Finger Trees) or exhibited the same degeneration behavior without the other benefits of the matrix approach.

source: src/math/hashing/hashfusing.clj