We compare USearch, FAISS-CPU, and sqlite-vec for CPU-bound M4 retrieval when your embedding model already runs locally or behind a gateway. Pair it with the local RAG chunking and embedding quota matrix, wire retriever calls as you would for Haystack on a remote Mac, and read the freelancer Mac mini rental notes before you offload the index. The matrix below is intentionally operational: it names starting batches, thread caps, and cache directories you can paste into runbooks today.
Pain points
- RAM cliffs: HNSW build RAM dwarfs steady query RAM until compaction; Activity Monitor lags reality.
- Bad paths: indexes placed under Downloads or cloud-synced folders corrupt when the sync client throttles or replays writes.
- Rental math: tiny batches plus oversubscribed threads stretch ingest wall clock past cheap hourly rates.
Decision matrix
| Dimension | USearch | FAISS-CPU | sqlite-vec |
|---|---|---|---|
| Best fit | Compact APIs | Index variety plus baselines | SQL deletes and single files |
| Ingest batch | 4096–16384 vectors (dim 768, fp32) | IVF: train first, then add in 8192–32768 chunks | 500–2000 rows per transaction (WAL, SSD) |
| Build threads | Performance cores minus one (≈9 on M4 Pro) | Same; set via faiss.omp_set_num_threads | Single writer for bulk load |
| Memory | ~2–3× raw vector bytes at peak build | IVF-PQ: smaller RAM, riskier recall | Low overhead; page-cache friendly |
| Disk paths | ~/Library/Caches/LlmMac/vec/usearch/<corpus> | ~/Library/Caches/LlmMac/vec/faiss/<corpus> + json | ~/Library/Caches/LlmMac/vec/sqlite/<t>.sqlite |
Executable environment defaults
Paste these into launchd plists, tmux sessions, or CI so the laptop and rental hosts match; tune the batch size only after measuring embedding throughput.
export VEC_INDEX_ROOT="${HOME}/Library/Caches/LlmMac/vec"
export VEC_CACHE_ROOT="${HOME}/Library/Caches/LlmMac/vec/tmp"
export VEC_INGEST_BATCH=8192
export VEC_BUILD_THREADS=9
export VEC_QUERY_THREADS=4
mkdir -p "${VEC_INDEX_ROOT}/usearch" "${VEC_INDEX_ROOT}/faiss" "${VEC_INDEX_ROOT}/sqlite" "${VEC_CACHE_ROOT}"
Co-locate embeddings and indexes on the internal SSD; external APFS volumes work if Spotlight indexing is disabled on that mount. Document the volume serial in your ops ticket so remote replays never mix disks.
Five-step rollout
Step one: hold out one thousand queries with brute-force nearest neighbors computed in the production dimension and dtype, so every later recall claim references the same frozen slice instead of a one-off notebook.
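A minimal sketch of that frozen slice, using only NumPy; the corpus and query shapes here are synthetic stand-ins for your production embeddings, and cosine is assumed via pre-normalization.

```python
import numpy as np

def brute_force_neighbors(corpus: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
    """Exact top-k neighbor ids by inner product; normalize rows first for cosine."""
    scores = queries @ corpus.T                       # (n_queries, n_corpus)
    # argpartition finds the k winners in O(n) per row; sort only those k
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    order = np.take_along_axis(scores, top, axis=1).argsort(axis=1)[:, ::-1]
    return np.take_along_axis(top, order, axis=1)

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 768), dtype=np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
# queries are perturbed corpus rows so ground truth is sanity-checkable
queries = corpus[:1000] + 0.01 * rng.standard_normal((1000, 768), dtype=np.float32)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
truth = brute_force_neighbors(corpus, queries, k=10)
# np.save("holdout_top10.npy", truth)  # persist the frozen slice beside the index
```

Persist the result once and never regenerate it; every engine and every rental host scores against the same file.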
Step two: match normalization and metric across engines; store a manifest hash beside each index.
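One way to make that manifest hash stable, using only the standard library; the field names are illustrative, not a fixed schema.

```python
import hashlib
import json

def manifest_hash(params: dict) -> str:
    """Stable short hash over metric, normalization, dtype, and build params."""
    # sort_keys makes the hash independent of dict insertion order
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

manifest = {
    "metric": "cosine", "normalized": True, "dtype": "float32",
    "dim": 768, "engine": "usearch", "ef_construction": 128,
}
print(manifest_hash(manifest))  # write this beside the index file
```

If the hash beside an index ever disagrees with the hash in a bench log, the comparison is void before anyone argues about recall.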
Step three: sweep the ingest batch from four thousand to thirty-two thousand at fixed threads; stop before macOS memory pressure turns yellow.
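A sweep harness sketch; `ingest_batch` is a hypothetical stand-in for whichever engine call you benchmark (USearch `add`, FAISS `add`, or a sqlite-vec insert transaction), and the watch for yellow memory pressure stays manual.

```python
import time

def sweep_ingest_batches(vectors, ingest_batch, sizes=(4096, 8192, 16384, 32768)):
    """Time a full ingest at each batch size with threads held fixed."""
    results = {}
    for size in sizes:
        start = time.perf_counter()
        for lo in range(0, len(vectors), size):
            ingest_batch(vectors[lo:lo + size])  # engine-specific add call
        results[size] = time.perf_counter() - start
    return results
```

Rebuild the index between sizes so each run starts cold; otherwise the later sizes inherit a warm structure and the comparison lies.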
Step four: warm for five minutes, then log p50 and p95 top-k latency plus recall against the brute-force slice, so procurement sees steady-state behavior rather than a cold-cache fantasy.
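The summary that step produces can be as small as this; it assumes you already collected per-query latencies and the engine's returned ids alongside the frozen brute-force slice.

```python
import numpy as np

def bench_summary(latencies_ms: np.ndarray, found: np.ndarray, truth: np.ndarray) -> dict:
    """p50/p95 latency plus mean recall@k against the frozen brute-force slice."""
    k = truth.shape[1]
    # per-query overlap between returned ids and exact ids
    hits = np.array([len(set(f) & set(t)) for f, t in zip(found, truth)])
    return {
        "p50_ms": float(np.percentile(latencies_ms, 50)),
        "p95_ms": float(np.percentile(latencies_ms, 95)),
        "recall_at_k": float(hits.mean() / k),
    }
```

Log the dict as JSON next to the manifest hash so the rental replay can diff numbers, not screenshots.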
Step five: tar the index, manifest, and bench JSON with checksums; replay on the rental Mac before signing off on cost.
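A small packaging sketch with the standard library; the directory layout is assumed to already hold the index, manifest, and bench JSON together.

```python
import hashlib
import tarfile
from pathlib import Path

def pack_with_checksum(artifact_dir: str, out_tar: str) -> str:
    """Tar the artifact directory and write a sha256 sidecar; return the digest."""
    with tarfile.open(out_tar, "w:gz") as tar:
        tar.add(artifact_dir, arcname=Path(artifact_dir).name)
    digest = hashlib.sha256(Path(out_tar).read_bytes()).hexdigest()
    # sidecar in `shasum -a 256` format so the rental host can verify natively
    Path(out_tar + ".sha256").write_text(f"{digest}  {Path(out_tar).name}\n")
    return digest
```

On the rental Mac, `shasum -a 256 -c bundle.tar.gz.sha256` before unpacking; a mismatch means the replay is already invalid.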
Remote node acceptance checklist
| Gate | Pass criteria |
|---|---|
| Cost realism | Twenty-four-hour soak without restarts; rent × wall clock beats DIY power plus engineer time |
| Latency parity | p95 within 10% of the laptop at the same top-k and efSearch/nprobe |
| Recall | Within two points of the brute-force holdout average |
| Thermal | No sustained clock drop under combined ingest and query replay |
| Audit | Logs carry corpus version, params, env snapshot, and tarball checksum |
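The two numeric gates in the table can be checked mechanically; this sketch assumes the `bench_summary`-style dicts from both hosts, with recall expressed in points (percent).

```python
def passes_gates(laptop: dict, rental: dict) -> dict:
    """Latency parity within 10% and recall within 2 points of the laptop run."""
    return {
        "latency_parity": rental["p95_ms"] <= 1.10 * laptop["p95_ms"],
        "recall": laptop["recall_pct"] - rental["recall_pct"] <= 2.0,
    }
```

Run it at the end of the soak and attach the result to the same ops ticket that carries the tarball checksum.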
Citable guardrails
- A batch of 8k–16k often saturates fp32 dim-768 ingest bandwidth on M4 before RAM does.
- Nine build threads matches performance cores minus one on a typical M4 Pro, leaving headroom for embedding workers.
- A twenty-four-hour soak surfaces leaks, launchd loops, and thermal sag that short demos miss.
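The batch and memory guardrails reduce to simple arithmetic; the one-million-vector corpus below is an example size, and the 2–3× peak factor comes from the matrix above.

```python
def vec_bytes(n: int, dim: int = 768, bytes_per_component: int = 4) -> int:
    """Raw bytes for n fp32 vectors at the given dimension."""
    return n * dim * bytes_per_component

batch = vec_bytes(8192)            # one ingest batch: 8192 * 768 * 4 bytes (~24 MiB)
raw = vec_bytes(1_000_000)         # raw corpus bytes for a 1M-vector example
peak = (2 * raw, 3 * raw)          # guardrail: expect 2-3x raw at peak HNSW build
print(batch / 2**20, raw / 2**30, [round(p / 2**30, 1) for p in peak])
```

For that example corpus the build peak lands in the mid-single-digit GiB range, which is why Activity Monitor's steady-state reading undersells the cliff.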
FAQ
When sqlite-vec? When per-tenant SQLite files with legal-friendly deletes matter more than peak QPS.
USearch vs FAISS? USearch for HNSW-class CPU services; FAISS when IVFADC or lab baselines matter.
Recall drop on the rental? Re-check efSearch, nprobe, and whether the host loaded float16 instead of float32.
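The float16 suspect is easy to quantify before blaming the engine; this synthetic check shows the score drift a half-precision load introduces on normalized dim-768 vectors, which is enough to flip near-tie neighbors.

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal((1000, 768)).astype(np.float32)
v /= np.linalg.norm(v, axis=1, keepdims=True)
q = v[0]
full = v @ q                                            # fp32 similarity scores
# simulate a rental host that loaded the index as float16
half = (v.astype(np.float16) @ q.astype(np.float16)).astype(np.float32)
drift = np.abs(full - half).max()                       # worst-case score shift
print(drift)
```

If observed recall drops by more than such drift can explain, look at efSearch/nprobe or a metric mismatch instead.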
Summary: pick the engine by ops needs; pin batch sizes, thread counts, and paths; run the remote acceptance checklist before betting rent on production retrieval.