turbopuffer is designed to be performant by default, but there are ways to optimize performance further. These suggestions aren't requirements for good performance--rather, they highlight opportunities for improvement when you have the flexibility to choose.
For example, while a single namespace with 100M documents works fine, splitting it into 10 namespaces of 10M documents each may yield better query performance if there's a natural way to group the documents.
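If documents carry a natural grouping key (a hypothetical tenant_id here), a deterministic hash can route each group to one of ten smaller namespaces. This is a minimal illustrative sketch of the routing idea, not part of the turbopuffer API:

```python
import hashlib

NUM_SHARDS = 10  # e.g. ten ~10M-document namespaces instead of one 100M namespace

def namespace_for(group_key: str, base: str = "docs") -> str:
    """Deterministically route a document group to one of NUM_SHARDS namespaces.

    group_key is whatever naturally partitions your data (tenant, project,
    corpus, ...); hashing keeps the assignment stable across processes.
    """
    shard = int(hashlib.sha256(group_key.encode()).hexdigest(), 16) % NUM_SHARDS
    return f"{base}-{shard}"

# All documents for one group always land in the same namespace,
# so queries scoped to that group only touch one small namespace.
assert namespace_for("tenant-42") == namespace_for("tenant-42")
```

The key property is that the mapping is stable and key-local: writers and queriers agree on the target namespace without any coordination.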
Reuse the same turbopuffer client instance for as
many requests as possible. This uses a connection pool behind the scenes to
avoid the overhead of a TCP and TLS handshake on every request.

For some embedding models, int8 output
matches f32 precision (benchmarks), so you can pass int8
values directly as JSON integers to an f16 namespace for f16 speed with no
precision loss: every integer in [-128, 127] is exactly representable in f16.

Keep large shared metadata out of the chunk documents themselves: store it
once, keyed by a unique ID (e.g., file_id). At query time, do a
vector search on chunks, then look up the metadata using the unique IDs from
your results. This way, patches to chunk-specific attributes never touch the
large metadata.

rank_by expressions can quickly become quite sophisticated. For best
performance, we recommend keeping the first-stage ranking function simple, with
only a few attributes being used to compute BM25 scores and/or attribute
scores, retrieving in the order of 100 to 1,000 hits, and then applying more
sophisticated ranking in the second stage.

Anchored patterns (turbo* or *puffer) are much more specific than unanchored patterns (*tpuf*),
and thus will perform better. Avoid overly broad patterns like [a-z]*, which require a full-table
scan.
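To see why anchoring matters, here is a minimal sketch of the underlying idea (the sorted term dictionary and function names are illustrative assumptions, not turbopuffer internals): a prefix-anchored pattern can be answered with a narrow range scan over sorted terms, while an unanchored pattern must examine every term.

```python
import bisect

# Hypothetical sorted term dictionary, as a text index might maintain.
terms = sorted(["puffer", "tpuf", "turbo", "turbopuffer", "turbulent", "vector"])

def prefix_match(prefix: str) -> list[str]:
    """Anchored pattern like turbo*: a range scan over the sorted terms.
    Only the narrow slice starting at the prefix is examined."""
    lo = bisect.bisect_left(terms, prefix)
    hi = bisect.bisect_left(terms, prefix + "\uffff")
    return terms[lo:hi]

def contains_match(infix: str) -> list[str]:
    """Unanchored pattern like *tpuf*: no useful ordering to exploit, so
    every term must be checked -- the analogue of a full-table scan."""
    return [t for t in terms if infix in t]

print(prefix_match("turbo"))   # ['turbo', 'turbopuffer']
print(contains_match("tpuf"))  # ['tpuf']
```

The range scan touches a number of terms proportional to the number of matches, while the unanchored scan is always proportional to the size of the whole dictionary.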