Quickstart
pytest-benchmem is a drop-in for an existing pytest-benchmark suite: add one flag and
every benchmark(...) call records peak memory too — no test changes. The memray pass is
Linux/macOS only; timing works everywhere.
1. You already have a benchmark
A normal pytest-benchmark test — nothing pytest-benchmem-specific. If you have a suite already, you're done with this step:
# test_sortbench.py
import pytest


@pytest.mark.parametrize("n", [10_000, 100_000, 1_000_000])
def test_sort(benchmark, n):
    benchmark(sorted, list(range(n, 0, -1)))
2. Run it with --benchmark-memory
Add the flag, and peak memory is appended to pytest-benchmark's own table:
──────────────────────────────────────────────────────────────────────────────
test_sort[10000] 32.5830 (1.0) 41.2080 (1.0) │ 0.08
test_sort[100000] 321.2080 (9.86) 419.9160 (10.19) │ 0.76
test_sort[1000000] 3,669.2920 (112.61) 4,331.5421 (105.11) │ 7.63
That's it. Left of the divider is pytest-benchmark's timing, untouched; peak (right) is a
separate, untimed memray pass on the same call — so the allocator hooks cost the timing
nothing. It's opt-in at the run level: without the flag, your suite runs exactly as
before. Add --benchmark-json=run.json to save both metrics to one file.
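The two-pass idea described above can be sketched with the standard library alone. This is only an illustration, not the plugin's implementation: it swaps in `tracemalloc` for memray, and `measure` is a hypothetical helper name. It times the call with no tracing active, then repeats the same call untimed with tracing on to read the peak.

```python
import time
import tracemalloc


def measure(fn, *args):
    """Return (seconds, peak_bytes) for fn(*args) using two separate passes."""
    # Pass 1: timing only, with no allocation tracing active.
    start = time.perf_counter()
    fn(*args)
    seconds = time.perf_counter() - start

    # Pass 2: the same call again, untimed, with tracing on to capture the peak.
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return seconds, peak


seconds, peak = measure(sorted, list(range(100_000, 0, -1)))
print(f"time: {seconds * 1e3:.2f} ms, peak: {peak / 2**20:.2f} MiB")
```

Because the tracing hooks are only installed during the second pass, whatever overhead they add never shows up in the timed pass.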
3. Where to go next
- Want allocated/allocations too, or a different table layout? → Choosing a metric
- Want to diff two runs or fail CI on a regression? → Compare & gate CI
- Want to slice tables and plots by an axis (input size, op, …)? → Grouping by dims
Going further
Want memory on specific tests only? Use the benchmark_memory fixture
--benchmark-memory measures the whole suite. To opt in per test instead — no
run-level flag — swap benchmark for benchmark_memory on just those tests; it's
always measured:
The fixture also gives you the pedantic form
for explicit control (a setup that rebuilds fresh state each round, custom rounds).
Stateful benchmarks: reuse your setup
Memory is measured in a separate, untimed invocation that is sampled several times (adaptively). A stateful action, such as one that mutates a fixture or fills a cache on a carried-over object, therefore drifts across those samples: the first sample fills the state and later ones reuse it, so the samples no longer report the cold cost of each call.
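The drift is easy to reproduce outside the plugin. The sketch below is an illustration under stated assumptions: it uses `tracemalloc` as a stand-in for memray, and `peak_bytes` and `fill_cache` are hypothetical names. It samples the same stateful call three times against one carried-over dict.

```python
import tracemalloc


def peak_bytes(fn, *args):
    """Peak bytes traced while fn(*args) runs (tracemalloc stand-in)."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def fill_cache(cache, n=1_000):
    # Stateful action: allocates only for keys that are not cached yet.
    for i in range(n):
        if i not in cache:
            cache[i] = bytes(1_000)


shared = {}  # state carried over between samples
samples = [peak_bytes(fill_cache, shared) for _ in range(3)]
print(samples)  # the first sample pays the cold cost; later ones are far smaller
```

Only the first sample sees the allocations for the cache entries; the later samples find every key already present and allocate almost nothing.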
The fix is free if you already do it for timing: a setup passed to the
pedantic form — or to benchmark.pedantic
under --benchmark-memory — is reused, untracked, before each memory sample, so
setup-based suites stay accurate with no extra changes. Otherwise, benchmark a pure
call, or add such a setup.