
Quickstart

pytest-benchmem is a drop-in for an existing pytest-benchmark suite: add one flag and every benchmark(...) call records peak memory too — no test changes. The memray pass is Linux/macOS only; timing works everywhere.

1. You already have a benchmark

A normal pytest-benchmark test — nothing pytest-benchmem-specific. If you have a suite already, you're done with this step:

# test_sortbench.py
import pytest


@pytest.mark.parametrize("n", [10_000, 100_000, 1_000_000])
def test_sort(benchmark, n):
    benchmark(sorted, list(range(n, 0, -1)))

2. Run it with --benchmark-memory

Add the flag and peak memory is appended to pytest-benchmark's own table:

pytest test_sortbench.py --benchmark-only --benchmark-memory

Name (time in us)                     Min               Median │ peak (MiB)
───────────────────────────────────────────────────────────────────────────
test_sort[10000]            32.5830 (1.0)        41.2080 (1.0) │       0.08
test_sort[100000]         321.2080 (9.86)     419.9160 (10.19) │       0.76
test_sort[1000000]    3,669.2920 (112.61)  4,331.5421 (105.11) │       7.63

That's it. Everything left of the divider is pytest-benchmark's timing, untouched. The peak column on the right comes from a separate, untimed memray pass over the same call, so the allocator hooks add nothing to the timed runs. Memory measurement is opt-in at the run level: without the flag, your suite runs exactly as before. Add --benchmark-json=run.json to save both metrics to one file.
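If you save a run with --benchmark-json=run.json, a few lines of Python are enough to read it back. The sketch below assumes pytest-benchmark's usual JSON layout (a top-level "benchmarks" list whose entries carry "name" and a "stats" mapping in seconds). Where pytest-benchmem stores the peak value is not stated here, so the sketch reads only the timing fields; timing_rows is a hypothetical helper name.

```python
import json
from pathlib import Path


def timing_rows(report):
    """Return (name, min_seconds, median_seconds) for each benchmark entry.

    Assumes pytest-benchmark's JSON layout: report["benchmarks"] is a list
    of entries with "name" and "stats" keys.
    """
    rows = []
    for bench in report["benchmarks"]:
        stats = bench["stats"]
        rows.append((bench["name"], stats["min"], stats["median"]))
    return rows


# Only read the file if a previous run actually produced it.
path = Path("run.json")
if path.exists():
    report = json.loads(path.read_text())
    for name, fastest, median in timing_rows(report):
        # Stats are stored in seconds; print microseconds like the table.
        print(f"{name}: min {fastest * 1e6:.1f} us, median {median * 1e6:.1f} us")
```

To locate the memory figure in your own file, printing the keys of one entry in report["benchmarks"] is a quick way to see what the plugin added.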

3. Where to go next


Going further

Want memory on specific tests only? Use the benchmark_memory fixture

--benchmark-memory measures the whole suite. To opt in per test instead — no run-level flag — swap benchmark for benchmark_memory on just those tests; it's always measured:

import pytest


@pytest.mark.parametrize("n", [10_000, 100_000, 1_000_000])
def test_sort(benchmark_memory, n):
    benchmark_memory(sorted, list(range(n, 0, -1)))

The fixture also gives you the pedantic form for explicit control (a setup that rebuilds fresh state each round, custom rounds).

Stateful benchmarks: reuse your setup

Memory is measured in a separate, untimed invocation that is sampled several times (adaptively). A stateful action, one that mutates a fixture or fills a cache on a carried-over object, therefore drifts across those samples: the first sample fills the state and later ones reuse it, so the cold cost is not reported on each.
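The drift is easy to reproduce outside the plugin. The sketch below uses the standard library's tracemalloc, not memray, purely as a stand-in to show the effect; fill_cache and peak_bytes are hypothetical names, and the three samples are a fixed count rather than the plugin's adaptive sampling.

```python
import tracemalloc

_cache = {}


def fill_cache(n):
    """Stateful action: allocates on the first call, reuses the cache afterwards."""
    if n not in _cache:
        _cache[n] = list(range(n))
    return _cache[n]


def peak_bytes(func, *args):
    """Peak traced allocation, in bytes, during one call to func."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Sample the same stateful call three times without resetting its state.
samples = [peak_bytes(fill_cache, 200_000) for _ in range(3)]
print(samples)  # the first sample is large; the later ones are close to zero
```

Only the first sample pays for building the list. The later samples see a warm cache, which is the same pattern repeated memory samples of a stateful benchmark would show.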

The fix is free if you already do it for timing: a setup passed to the pedantic form (or to benchmark.pedantic under --benchmark-memory) is also run, untracked, before each memory sample, so setup-based suites stay accurate with no extra changes. Otherwise, benchmark a pure call, or add such a setup.