Relaunching tablecloth.time: Composability over Abstraction
I recently relaunched the experimental time series processing library tablecloth.time — this time without an index. Turns out that’s a feature, not a limitation. Here’s why, and a walkthrough of the composable primitives that replace it, using the Victoria electricity demand dataset.
Why No Index?
The original tablecloth.time was built around an index for two reasons: performance (tree-based indexes offer O(log n) lookups) and convenience (you don’t have to keep specifying which column is the time column). Anyone who has used the Python Pandas data processing library is likely familiar with this feature.
But when tech.ml.dataset removed its indexing mechanism in v7, it forced a rethink. And the rethink revealed that neither rationale held up.
On performance: Unlike Python DataFrames, Clojure’s datasets are immutable. They’re rebuilt on each transformation. Under these conditions, maintaining a tree-based index is pure overhead — you’d rebuild it constantly. As Chris Nuernberger (author of tech.ml.dataset) put it: “Just sorting the dataset and using binary search will outperform most/all tree structures in this scenario.” (This is the same conclusion that Polars, the fastest-growing Pandas alternative, reached — no index by design.)
On convenience: The index adds implicit state threaded through your data. Tablecloth’s API avoids this — you always say which columns you’re operating on. The pipeline reads like what it does. This aligns with Clojure’s broader preference for explicit, composable operations over hidden magic.
For the full discussion of this design shift, see Composability Over Abstraction on humanscodes.
Throughout these examples, tc refers to tablecloth.api, tct to tablecloth.time.api, tctc to tablecloth.time.column.api, dfn to tech.v3.datatype.functional, and plotly to Tableplot's plotly namespace.
Loading the Data
We’ll use the vic_elec dataset: half-hourly electricity demand from Victoria, Australia, spanning 2012-2014. Strings are parsed to datetime types on load:
```clojure
(def vic-elec
  (-> (tc/dataset "https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv"
                  {:key-fn keyword})
      (tc/convert-types :Time :local-date-time)))

(tc/head vic-elec)
```

https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv [5 5]:
| :Time | :Demand | :Temperature | :Date | :Holiday |
|---|---|---|---|---|
| 2011-12-31T13:00 | 4382.825174 | 21.40 | 2012-01-01 | True |
| 2011-12-31T13:30 | 4263.365526 | 21.05 | 2012-01-01 | True |
| 2011-12-31T14:00 | 4048.966046 | 20.70 | 2012-01-01 | True |
| 2011-12-31T14:30 | 3877.563330 | 20.55 | 2012-01-01 | True |
| 2011-12-31T15:00 | 4036.229746 | 20.40 | 2012-01-01 | True |
The dataset has half-hourly readings with :Time, :Demand (in MW), :Temperature, and other fields.
Time at the Column Level
Before diving into the high-level API, it’s worth understanding what’s underneath. tablecloth.time mirrors tablecloth’s two-level design: a dataset API and a column API. The column API is where the actual time manipulation happens, built on dtype-next’s vectorized operations.
Why does this matter? Because manipulating time data is notoriously fiddly. Clojure has excellent time libraries — tick, cljc.java-time — that tame java.time’s verbosity. But they operate on scalars. Working with columns of timestamps still means mapping functions over sequences. tablecloth.time’s column API gives you operations that work on entire columns at once, using the same fast, primitive-backed machinery as the rest of tech.ml.dataset.
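To make the contrast concrete, here is the scalar route in plain java.time: mapping an accessor over boxed objects one element at a time. This is only an illustration of the approach the column API replaces with a single vectorized pass.

```clojure
(import '[java.time LocalDateTime])

;; Scalar route: map a java.time accessor over a sequence of
;; boxed LocalDateTime objects, one element at a time.
(map #(.getHour ^LocalDateTime %)
     [(LocalDateTime/parse "2011-12-31T13:00")
      (LocalDateTime/parse "2011-12-31T13:30")])
;; => (13 13)
```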
The building blocks fall into three categories:
Parsing — tablecloth.time.parse/parse handles ISO-8601 strings and custom formats with cached formatters for performance. For now this is scalar (single value), but bulk parsing happens automatically when loading datasets with tc/convert-types.
Conversion — convert-time moves between representations (Instants, LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness. This is the workhorse for preparing time columns for different operations.
Flooring and extraction — down-to-nearest, floor-to-month, and field extractors like year, hour, day-of-week operate on columns using dtype-next’s vectorized arithmetic. These are column in, column out:
Extract just the hour from the Time column:
```clojure
(tctc/hour (:Time vic-elec))
```

```
#tech.v3.dataset.column<int64>[52608]
null
[13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22...]
```

Floor timestamps to hour buckets:

```clojure
(tctc/down-to-nearest (:Time vic-elec) 1 :hours {:zone "UTC"})
```

```
#tech.v3.dataset.column<local-date-time>[52608]
null
[2011-12-31T13:00, 2011-12-31T13:00, 2011-12-31T14:00, 2011-12-31T14:00, 2011-12-31T15:00, 2011-12-31T15:00, 2011-12-31T16:00, 2011-12-31T16:00, 2011-12-31T17:00, 2011-12-31T17:00, 2011-12-31T18:00, 2011-12-31T18:00, 2011-12-31T19:00, 2011-12-31T19:00, 2011-12-31T20:00, 2011-12-31T20:00, 2011-12-31T21:00, 2011-12-31T21:00, 2011-12-31T22:00, 2011-12-31T22:00...]
```

The key thing to notice: these operations work on primitive arrays under the hood, just like dtype-next's numeric operations. The result is a column that can be added directly to a dataset.
Building Up: add-time-columns
With these column-level tools in hand, the dataset-level API is mostly convenience. A core example is add-time-columns, a thin wrapper around the extractors we just saw.
Here’s what it does internally:
- Take the source time column from the dataset
- Look up extractor functions from a map (:year → tctc/year, etc.)
- Apply each extractor to produce new columns
- Add those columns back to the dataset
The “primitive” is just composition of lower-level pieces. This matters because it means you can drop down when the high-level API doesn’t quite fit. Need a custom computed field? Build it from the column tools and add it yourself.
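For instance, a custom flag for evening-peak readings can be built from an extractor plus a standard tablecloth operation. This is a hedged sketch: the :evening-peak? name and the 17:00-19:00 window are illustrative choices, not part of the library.

```clojure
;; Sketch: derive a custom boolean field from the extracted hour.
;; :evening-peak? and the 17-19 window are illustrative, not library API.
(-> vic-elec
    (tct/add-time-columns :Time [:hour])
    (tc/map-columns :evening-peak? [:hour] #(<= 17 % 19)))
```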
Let’s see it in action:
```clojure
(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/head 10))
```

https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv [10 7]:
| :Time | :Demand | :Temperature | :Date | :Holiday | :day-of-week | :hour |
|---|---|---|---|---|---|---|
| 2011-12-31T13:00 | 4382.825174 | 21.40 | 2012-01-01 | True | 6 | 13 |
| 2011-12-31T13:30 | 4263.365526 | 21.05 | 2012-01-01 | True | 6 | 13 |
| 2011-12-31T14:00 | 4048.966046 | 20.70 | 2012-01-01 | True | 6 | 14 |
| 2011-12-31T14:30 | 3877.563330 | 20.55 | 2012-01-01 | True | 6 | 14 |
| 2011-12-31T15:00 | 4036.229746 | 20.40 | 2012-01-01 | True | 6 | 15 |
| 2011-12-31T15:30 | 3865.597244 | 20.25 | 2012-01-01 | True | 6 | 15 |
| 2011-12-31T16:00 | 3694.097664 | 20.10 | 2012-01-01 | True | 6 | 16 |
| 2011-12-31T16:30 | 3561.623686 | 19.60 | 2012-01-01 | True | 6 | 16 |
| 2011-12-31T17:00 | 3433.035352 | 19.10 | 2012-01-01 | True | 6 | 17 |
| 2011-12-31T17:30 | 3359.468000 | 18.95 | 2012-01-01 | True | 6 | 17 |
The Resampling Pattern
With time fields extracted, standard tablecloth operations take over. Resampling, which in time series means aggregating to coarser time granularity, is just another pattern of composition: add time columns, group, aggregate, order.
Let’s break it into two steps. First, the data transformation:
```clojure
(def demand-by-day
  (-> vic-elec
      (tct/add-time-columns :Time [:day-of-week])
      (tc/group-by [:day-of-week])
      (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
      (tc/order-by [:day-of-week])))
```

Look at the aggregated data:

```clojure
(tc/head demand-by-day 7)
```

_unnamed [7 2]:
| :day-of-week | :Demand |
|---|---|
| 1 | 4848.65554632 |
| 2 | 4891.60069604 |
| 3 | 4884.86457635 |
| 4 | 4923.60046615 |
| 5 | 4617.70390105 |
| 6 | 4145.35140584 |
| 7 | 4346.43981711 |
Then visualize:
```clojure
(plotly/layer-bar demand-by-day
                  {:=x :day-of-week :=y :Demand})
```

Weekends (days 6 and 7) clearly have lower demand. The :day-of-week field came from add-time-columns; the group-by, aggregate, and order-by are pure tablecloth. tablecloth.time provides the time-specific pieces, then gets out of the way.
The same pattern scales to different granularities. Here are daily and monthly averages:
Daily averages (first 10 days):
```clojure
(-> vic-elec
    (tct/add-time-columns :Time [:year :month :day])
    (tc/group-by [:year :month :day])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))
                   :Temperature #(dfn/mean (:Temperature %))})
    (tc/order-by [:year :month :day])
    (tc/head 10))
```

_unnamed [10 5]:
| :year | :month | :day | :Demand | :Temperature |
|---|---|---|---|---|
| 2011 | 12 | 31 | 3751.44299627 | 21.04772727 |
| 2012 | 1 | 1 | 4745.38036050 | 26.57812500 |
| 2012 | 1 | 2 | 5739.39560171 | 31.75104167 |
| 2012 | 1 | 3 | 5394.90269629 | 24.56770833 |
| 2012 | 1 | 4 | 4454.00785304 | 18.19166667 |
| 2012 | 1 | 5 | 4397.21721979 | 17.81250000 |
| 2012 | 1 | 6 | 4277.88988792 | 19.51041667 |
| 2012 | 1 | 7 | 4181.10979787 | 24.09895833 |
| 2012 | 1 | 8 | 4167.95030704 | 20.22395833 |
| 2012 | 1 | 9 | 4504.20493425 | 19.16145833 |
Monthly averages — each bar is a month, colored by year:
```clojure
(-> vic-elec
    (tct/add-time-columns :Time [:year :month])
    (tc/group-by [:year :month])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:year :month])
    (plotly/layer-bar {:=x :month :=y :Demand :=color :year :=color-type :nominal}))
```

Note that tablecloth.time is a light layer here. You could do this with tablecloth alone by manually extracting the datetime components; add-time-columns adds concision and composes naturally with the tablecloth operations you're already using.
Slicing Time Ranges
slice selects rows within a time range. This is where we would previously have leaned on an index; instead, slice runs a binary search on the sorted time column. That gives the same O(log n) lookup without the overhead of maintaining a tree structure, with the caveat that unsorted data must be sorted first.
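The mechanics are easy to see with java.util's binary search, which works on any sorted list of Comparables (Clojure vectors implement java.util.List). This is an illustration of the lookup idea, not the library's internals:

```clojure
(import '[java.time LocalDateTime])

;; Binary search over a sorted vector of timestamps: a match returns its
;; index; a miss returns (-(insertion point) - 1), which still locates
;; the range boundary, exactly what a slice needs.
(def times (mapv #(LocalDateTime/parse %)
                 ["2012-01-01T00:00" "2012-01-01T00:30" "2012-01-01T01:00"]))

(java.util.Collections/binarySearch times (LocalDateTime/parse "2012-01-01T00:30"))
;; => 1
(java.util.Collections/binarySearch times (LocalDateTime/parse "2012-01-01T00:15"))
;; => -2  (would insert at index 1)
```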
```clojure
(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (tc/row-count))
```

289

That's 289 rows: the date endpoints are read as midnight, both inclusive, giving six full days of half-hourly readings (288) plus the final boundary observation. Let's visualize it:
```clojure
(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (plotly/layer-line {:=x :Time :=y :Demand}))
```

The daily oscillation is clearly visible: demand peaks during the day and drops at night.
Lag and Lead Columns
add-lag shifts column values by a fixed number of rows — useful for autocorrelation analysis. Note this is row-based, not time-aware: you need to know your data’s frequency and calculate the offset. Since this dataset has half-hourly readings, a lag of 48 rows equals 24 hours:
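The row-shift idea can be sketched on a plain vector. This is a sketch of the semantics, not the library's implementation (tablecloth.time fills the vacated slots with missing values, which is why tc/drop-missing appears in the examples):

```clojure
;; Row-shift semantics on a plain vector: lag vacates the front,
;; lead vacates the tail. Half-hourly data means 2 readings per hour,
;; so a 24-hour lag is 2 * 24 = 48 rows.
(defn lag-shift  [xs n] (concat (repeat n nil) (drop-last n xs)))
(defn lead-shift [xs n] (concat (drop n xs) (repeat n nil)))

(lag-shift  [1 2 3 4 5] 2) ;; => (nil nil 1 2 3)
(lead-shift [1 2 3 4 5] 2) ;; => (3 4 5 nil nil)
```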
```clojure
(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (tc/head 10))
```

https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv [10 6]:
| :Time | :Demand | :Temperature | :Date | :Holiday | :Demand_lag48 |
|---|---|---|---|---|---|
| 2012-01-01T13:00 | 4367.914468 | 21.60 | 2012-01-02 | True | 4382.825174 |
| 2012-01-01T13:30 | 4164.015958 | 21.40 | 2012-01-02 | True | 4263.365526 |
| 2012-01-01T14:00 | 3898.239882 | 21.20 | 2012-01-02 | True | 4048.966046 |
| 2012-01-01T14:30 | 3752.016756 | 20.95 | 2012-01-02 | True | 3877.563330 |
| 2012-01-01T15:00 | 3941.369156 | 20.70 | 2012-01-02 | True | 4036.229746 |
| 2012-01-01T15:30 | 3776.118154 | 20.55 | 2012-01-02 | True | 3865.597244 |
| 2012-01-01T16:00 | 3601.350916 | 20.40 | 2012-01-02 | True | 3694.097664 |
| 2012-01-01T16:30 | 3490.138704 | 20.35 | 2012-01-02 | True | 3561.623686 |
| 2012-01-01T17:00 | 3433.902124 | 20.30 | 2012-01-02 | True | 3433.035352 |
| 2012-01-01T17:30 | 3416.646580 | 20.50 | 2012-01-02 | True | 3359.468000 |
Let’s see if demand correlates with the same time yesterday:
```clojure
(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (plotly/layer-point {:=x :Demand_lag48
                         :=y :Demand
                         :=mark-opacity 0.3}))
```

The tight diagonal shows strong positive correlation — demand at any given time is highly predictive of demand at the same time the previous day.
add-lead works the same way but shifts values forward, aligning current demand with demand 24 hours ahead. This is useful when you need to pair past observations with future outcomes for predictive modeling:
```clojure
(-> vic-elec
    (tct/add-lead :Demand 48 :Demand_lead48)
    (tc/drop-missing)
    (plotly/layer-point {:=x :Demand
                         :=y :Demand_lead48
                         :=mark-opacity 0.3}))
```

Combining Primitives
Let’s do something more interesting: analyze the daily demand profile, comparing weekdays to weekends.
```clojure
(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/map-columns :weekend? [:day-of-week] #(>= % 6))
    (tc/group-by [:weekend? :hour])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:hour])
    (plotly/layer-line {:=x :hour
                        :=y :Demand
                        :=color :weekend?}))
```

Weekday demand shows the classic two-peak pattern (morning and evening), while weekend demand is flatter and lower overall.
What’s Next
tablecloth.time is experimental. The current release provides focused primitives built on solid foundations: parsing, conversion, and field extraction at the column level; convenient dataset-level wrappers that compose with standard tablecloth operations. My hope is that this provides a solid basis for building convenient abstractions that are just patterns of composition.
Planned additions include rolling windows, differencing, and higher-level patterns like resample that wrap the composable building blocks.
The repository is on GitHub. For more worked examples, see the fpp3 Chapter 2 notebook.