Relaunching tablecloth.time: Composability over Abstraction
I recently relaunched the experimental time series processing library tablecloth.time — this time without an index. Turns out that’s a feature, not a limitation. Here’s why, and a walkthrough of the composable primitives that replace it, using the Victoria electricity demand dataset.
Why No Index?
The original tablecloth.time was built around an index for two reasons: performance (tree-based indexes offer O(log n) lookups) and convenience (you don’t have to keep specifying which column is the time column). Anyone who has used the Python Pandas data processing library is likely familiar with this feature.
But when tech.ml.dataset removed its indexing mechanism in v7, it forced a rethink. And the rethink revealed that neither rationale held up.
On performance: Unlike Python DataFrames, Clojure’s datasets are immutable. They’re rebuilt on each transformation. Under these conditions, maintaining a tree-based index is pure overhead — you’d rebuild it constantly. As Chris Nuernberger (author of tech.ml.dataset) put it: “Just sorting the dataset and using binary search will outperform most/all tree structures in this scenario.” (This is the same conclusion that Polars, the fastest-growing Pandas alternative, reached — no index by design.)
On convenience: The index adds implicit state threaded through your data. Tablecloth’s API avoids this — you always say which columns you’re operating on. The pipeline reads like what it does. This aligns with Clojure’s broader preference for explicit, composable operations over hidden magic.
For the full discussion of this design shift, see Composability Over Abstraction on humanscodes.
Throughout these examples, tc refers to tablecloth.api, tct to tablecloth.time.api, tctc to tablecloth.time.column.api, dfn to tech.v3.datatype.functional, and plotly to Tableplot's plotly namespace.
Loading the Data
We’ll use the vic_elec dataset: half-hourly electricity demand from Victoria, Australia, spanning 2012-2014. Strings are parsed to datetime types on load:
```clojure
(def vic-elec
  (-> (tc/dataset "https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv"
                  {:key-fn keyword})
      (tc/convert-types :Time :local-date-time)))

(tc/head vic-elec)
```

https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv [5 5]:
| :Time | :Demand | :Temperature | :Date | :Holiday |
|---|---|---|---|---|
| 2011-12-31T13:00 | 4382.825174 | 21.40 | 2012-01-01 | True |
| 2011-12-31T13:30 | 4263.365526 | 21.05 | 2012-01-01 | True |
| 2011-12-31T14:00 | 4048.966046 | 20.70 | 2012-01-01 | True |
| 2011-12-31T14:30 | 3877.563330 | 20.55 | 2012-01-01 | True |
| 2011-12-31T15:00 | 4036.229746 | 20.40 | 2012-01-01 | True |
The dataset has half-hourly readings with :Time, :Demand (in MW), :Temperature, and other fields.
Time at the Column Level
Before diving into the high-level API, it’s worth understanding what’s underneath. tablecloth.time mirrors tablecloth’s two-level design: a dataset API and a column API. The column API is where the actual time manipulation happens, built on dtype-next’s vectorized operations.
Why does this matter? Because manipulating time data is notoriously fiddly. Clojure has excellent time libraries — tick, cljc.java-time — that tame java.time’s verbosity. But they operate on scalars. Working with columns of timestamps still means mapping functions over sequences. tablecloth.time’s column API gives you operations that work on entire columns at once, using the same fast, primitive-backed machinery as the rest of tech.ml.dataset.
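To make the contrast concrete, here is the scalar route in plain java.time: mapping an accessor over boxed objects one element at a time. This is only an illustration of the approach the column API replaces with a single vectorized pass.

```clojure
(import '[java.time LocalDateTime])

;; Scalar route: map a java.time accessor over a sequence of
;; boxed LocalDateTime objects, one element at a time.
(map #(.getHour ^LocalDateTime %)
     [(LocalDateTime/parse "2011-12-31T13:00")
      (LocalDateTime/parse "2011-12-31T13:30")])
;; => (13 13)
```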
The building blocks fall into three categories:
Parsing — tablecloth.time.parse/parse handles ISO-8601 strings and custom formats with cached formatters for performance. For now this is scalar (single value), but bulk parsing happens automatically when loading datasets with tc/convert-types.
Conversion — convert-time moves between representations (Instants, LocalDateTimes, LocalDates, epoch milliseconds) with timezone awareness. This is the workhorse for preparing time columns for different operations.
Flooring and extraction — down-to-nearest, floor-to-month, and field extractors like year, hour, day-of-week operate on columns using dtype-next’s vectorized arithmetic. These are column in, column out:
Extract just the hour from the Time column:
```clojure
(tctc/hour (:Time vic-elec))
```

```
#tech.v3.dataset.column<int64>[52608]
null
[13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22...]
```

Floor timestamps to hour buckets:

```clojure
(tctc/down-to-nearest (:Time vic-elec) 1 :hours {:zone "UTC"})
```

```
#tech.v3.dataset.column<local-date-time>[52608]
null
[2011-12-31T13:00, 2011-12-31T13:00, 2011-12-31T14:00, 2011-12-31T14:00, 2011-12-31T15:00, 2011-12-31T15:00, 2011-12-31T16:00, 2011-12-31T16:00, 2011-12-31T17:00, 2011-12-31T17:00, 2011-12-31T18:00, 2011-12-31T18:00, 2011-12-31T19:00, 2011-12-31T19:00, 2011-12-31T20:00, 2011-12-31T20:00, 2011-12-31T21:00, 2011-12-31T21:00, 2011-12-31T22:00, 2011-12-31T22:00...]
```

The key thing to notice: these operations work on primitive arrays under the hood, just like dtype-next's numeric operations. The result is a column that can be added directly to a dataset.
Building Up: add-time-columns
With these column-level tools in hand, the dataset-level API is mostly convenience. A core example is add-time-columns, a thin wrapper around the extractors we just saw.
Here’s what it does internally:
- Take the source time column from the dataset
- Look up extractor functions from a map (:year → tctc/year, etc.)
- Apply each extractor to produce new columns
- Add those columns back to the dataset
The “primitive” is just composition of lower-level pieces. This matters because it means you can drop down when the high-level API doesn’t quite fit. Need a custom computed field? Build it from the column tools and add it yourself.
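For instance, a custom flag for evening-peak readings can be built from an extractor plus a standard tablecloth operation. This is a hedged sketch: the :evening-peak? name and the 17:00-19:00 window are illustrative choices, not part of the library.

```clojure
;; Sketch: derive a custom boolean field from the extracted hour.
;; :evening-peak? and the 17-19 window are illustrative, not library API.
(-> vic-elec
    (tct/add-time-columns :Time [:hour])
    (tc/map-columns :evening-peak? [:hour] #(<= 17 % 19)))
```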
Let’s see it in action:
```clojure
(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/head 10))
```

https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv [10 7]:
| :Time | :Demand | :Temperature | :Date | :Holiday | :day-of-week | :hour |
|---|---|---|---|---|---|---|
| 2011-12-31T13:00 | 4382.825174 | 21.40 | 2012-01-01 | True | 6 | 13 |
| 2011-12-31T13:30 | 4263.365526 | 21.05 | 2012-01-01 | True | 6 | 13 |
| 2011-12-31T14:00 | 4048.966046 | 20.70 | 2012-01-01 | True | 6 | 14 |
| 2011-12-31T14:30 | 3877.563330 | 20.55 | 2012-01-01 | True | 6 | 14 |
| 2011-12-31T15:00 | 4036.229746 | 20.40 | 2012-01-01 | True | 6 | 15 |
| 2011-12-31T15:30 | 3865.597244 | 20.25 | 2012-01-01 | True | 6 | 15 |
| 2011-12-31T16:00 | 3694.097664 | 20.10 | 2012-01-01 | True | 6 | 16 |
| 2011-12-31T16:30 | 3561.623686 | 19.60 | 2012-01-01 | True | 6 | 16 |
| 2011-12-31T17:00 | 3433.035352 | 19.10 | 2012-01-01 | True | 6 | 17 |
| 2011-12-31T17:30 | 3359.468000 | 18.95 | 2012-01-01 | True | 6 | 17 |
The Resampling Pattern
With time fields extracted, standard tablecloth operations take over. Resampling, which in time series means aggregating to coarser time granularity, is just another pattern of composition: add time columns, group, aggregate, order.
Let’s break it into two steps. First, the data transformation:
```clojure
(def demand-by-day
  (-> vic-elec
      (tct/add-time-columns :Time [:day-of-week])
      (tc/group-by [:day-of-week])
      (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
      (tc/order-by [:day-of-week])))
```

Look at the aggregated data:

```clojure
(tc/head demand-by-day 7)
```

_unnamed [7 2]:
| :day-of-week | :Demand |
|---|---|
| 1 | 4848.65554632 |
| 2 | 4891.60069604 |
| 3 | 4884.86457635 |
| 4 | 4923.60046615 |
| 5 | 4617.70390105 |
| 6 | 4145.35140584 |
| 7 | 4346.43981711 |
Then visualize:
```clojure
(plotly/layer-bar demand-by-day
                  {:=x :day-of-week :=y :Demand})
```

Weekends (days 6 and 7) clearly have lower demand. The :day-of-week field came from add-time-columns; the group-by, aggregate, and order-by are pure tablecloth. tablecloth.time provides the time-specific pieces, then gets out of the way.
The same pattern scales to different granularities. Here are daily and monthly averages:
Daily averages (first 10 days):
```clojure
(-> vic-elec
    (tct/add-time-columns :Time [:year :month :day])
    (tc/group-by [:year :month :day])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))
                   :Temperature #(dfn/mean (:Temperature %))})
    (tc/order-by [:year :month :day])
    (tc/head 10))
```

_unnamed [10 5]:
| :year | :month | :day | :Demand | :Temperature |
|---|---|---|---|---|
| 2011 | 12 | 31 | 3751.44299627 | 21.04772727 |
| 2012 | 1 | 1 | 4745.38036050 | 26.57812500 |
| 2012 | 1 | 2 | 5739.39560171 | 31.75104167 |
| 2012 | 1 | 3 | 5394.90269629 | 24.56770833 |
| 2012 | 1 | 4 | 4454.00785304 | 18.19166667 |
| 2012 | 1 | 5 | 4397.21721979 | 17.81250000 |
| 2012 | 1 | 6 | 4277.88988792 | 19.51041667 |
| 2012 | 1 | 7 | 4181.10979787 | 24.09895833 |
| 2012 | 1 | 8 | 4167.95030704 | 20.22395833 |
| 2012 | 1 | 9 | 4504.20493425 | 19.16145833 |
Monthly averages — each bar is a month, colored by year:
```clojure
(-> vic-elec
    (tct/add-time-columns :Time [:year :month])
    (tc/group-by [:year :month])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:year :month])
    (plotly/layer-bar {:=x :month :=y :Demand :=color :year :=color-type :nominal}))
```

Note that tablecloth.time is a light layer here. You could do this with tablecloth alone by manually extracting the datetime components; add-time-columns adds concision and composes naturally with the tablecloth operations you're already using.
Slicing Time Ranges
slice selects rows within a time range. This is where we would previously have leaned on an index; instead, slice runs a binary search on the sorted time column. That gives the same O(log n) lookup without the overhead of maintaining a tree structure, with the caveat that unsorted data must be sorted first.
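The mechanics are easy to see with java.util's binary search, which works on any sorted list of Comparables (Clojure vectors implement java.util.List). This is an illustration of the lookup idea, not the library's internals:

```clojure
(import '[java.time LocalDateTime])

;; Binary search over a sorted vector of timestamps: a match returns its
;; index; a miss returns (-(insertion point) - 1), which still locates
;; the range boundary, exactly what a slice needs.
(def times (mapv #(LocalDateTime/parse %)
                 ["2012-01-01T00:00" "2012-01-01T00:30" "2012-01-01T01:00"]))

(java.util.Collections/binarySearch times (LocalDateTime/parse "2012-01-01T00:30"))
;; => 1
(java.util.Collections/binarySearch times (LocalDateTime/parse "2012-01-01T00:15"))
;; => -2  (would insert at index 1)
```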
```clojure
(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (tc/row-count))
```

289

That's 289 rows: the date endpoints are read as midnight, both inclusive, giving six full days of half-hourly readings (288) plus the final boundary observation. Let's visualize it:
```clojure
(-> vic-elec
    (tct/slice :Time "2012-01-09" "2012-01-15")
    (plotly/layer-line {:=x :Time :=y :Demand}))
```

The daily oscillation is clearly visible: demand peaks during the day and drops at night.
Lag and Lead Columns
add-lag shifts column values by a fixed number of rows — useful for autocorrelation analysis. Note this is row-based, not time-aware: you need to know your data’s frequency and calculate the offset. Since this dataset has half-hourly readings, a lag of 48 rows equals 24 hours:
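The row-shift idea can be sketched on a plain vector. This is a sketch of the semantics, not the library's implementation (tablecloth.time fills the vacated slots with missing values, which is why tc/drop-missing appears in the examples):

```clojure
;; Row-shift semantics on a plain vector: lag vacates the front,
;; lead vacates the tail. Half-hourly data means 2 readings per hour,
;; so a 24-hour lag is 2 * 24 = 48 rows.
(defn lag-shift  [xs n] (concat (repeat n nil) (drop-last n xs)))
(defn lead-shift [xs n] (concat (drop n xs) (repeat n nil)))

(lag-shift  [1 2 3 4 5] 2) ;; => (nil nil 1 2 3)
(lead-shift [1 2 3 4 5] 2) ;; => (3 4 5 nil nil)
```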
```clojure
(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (tc/head 10))
```

https://gist.githubusercontent.com/ezmiller/6edf3e0f41848f532436c15bc94c2f4d/raw/vic_elec.csv [10 6]:
| :Time | :Demand | :Temperature | :Date | :Holiday | :Demand_lag48 |
|---|---|---|---|---|---|
| 2012-01-01T13:00 | 4367.914468 | 21.60 | 2012-01-02 | True | 4382.825174 |
| 2012-01-01T13:30 | 4164.015958 | 21.40 | 2012-01-02 | True | 4263.365526 |
| 2012-01-01T14:00 | 3898.239882 | 21.20 | 2012-01-02 | True | 4048.966046 |
| 2012-01-01T14:30 | 3752.016756 | 20.95 | 2012-01-02 | True | 3877.563330 |
| 2012-01-01T15:00 | 3941.369156 | 20.70 | 2012-01-02 | True | 4036.229746 |
| 2012-01-01T15:30 | 3776.118154 | 20.55 | 2012-01-02 | True | 3865.597244 |
| 2012-01-01T16:00 | 3601.350916 | 20.40 | 2012-01-02 | True | 3694.097664 |
| 2012-01-01T16:30 | 3490.138704 | 20.35 | 2012-01-02 | True | 3561.623686 |
| 2012-01-01T17:00 | 3433.902124 | 20.30 | 2012-01-02 | True | 3433.035352 |
| 2012-01-01T17:30 | 3416.646580 | 20.50 | 2012-01-02 | True | 3359.468000 |
Let’s see if demand correlates with the same time yesterday:
```clojure
(-> vic-elec
    (tct/add-lag :Demand 48 :Demand_lag48)
    (tc/drop-missing)
    (plotly/layer-point {:=x :Demand_lag48
                         :=y :Demand
                         :=mark-opacity 0.3}))
```

The tight diagonal shows strong positive correlation — demand at any given time is highly predictive of demand at the same time the previous day.
add-lead works the same way but shifts values forward, aligning current demand with demand 24 hours ahead. This is useful when you need to pair past observations with future outcomes for predictive modeling:
```clojure
(-> vic-elec
    (tct/add-lead :Demand 48 :Demand_lead48)
    (tc/drop-missing)
    (plotly/layer-point {:=x :Demand
                         :=y :Demand_lead48
                         :=mark-opacity 0.3}))
```

Combining Primitives
Let’s do something more interesting: analyze the daily demand profile, comparing weekdays to weekends.
```clojure
(-> vic-elec
    (tct/add-time-columns :Time [:day-of-week :hour])
    (tc/map-columns :weekend? [:day-of-week] #(>= % 6))
    (tc/group-by [:weekend? :hour])
    (tc/aggregate {:Demand #(dfn/mean (:Demand %))})
    (tc/order-by [:hour])
    (plotly/layer-line {:=x :hour
                        :=y :Demand
                        :=color :weekend?}))
```

Weekday demand shows the classic two-peak pattern (morning and evening), while weekend demand is flatter and lower overall.
What’s Next
tablecloth.time is experimental. The current release provides focused primitives built on solid foundations: parsing, conversion, and field extraction at the column level; convenient dataset-level wrappers that compose with standard tablecloth operations. My hope is that this provides a solid basis for building convenient abstractions that are just patterns of composition.
Planned additions include rolling windows, differencing, and higher-level patterns like resample that wrap the composable building blocks.
The repository is on GitHub. For more worked examples, see the fpp3 Chapter 2 notebook.