Go API
The types and functions the tatami library exports: writing and reading document-store files, predicates and lookups, collections, and the search-segment API.
The library lives at github.com/tamnd/tatami. The full, authoritative reference is on pkg.go.dev; this page is a guided tour of the surface. The search subpackage is at github.com/tamnd/tatami/search.
Schema
type Field struct {
	Name          string
	Type          LogicalType
	Nullable      bool
	SortKey       bool // the single column the file is sorted on
	BlobSeparated bool // keep this column's payload in the blob region
	BloomFilter   bool // build a membership filter on this column
	DictHint      bool // prefer dictionary encoding for this column
}
func NewSchema(fields ...Field) (*Schema, error)
The logical types are the constants TypeBool, TypeInt8 through TypeInt64, TypeUint8 through TypeUint64, TypeFloat32, TypeFloat64, TypeString, TypeBytes, TypeTimestampMicros, TypeList, and TypeBlobRef.
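The SortKey comment implies that a schema names at most one sort column, and NewSchema returns an error, so some validation presumably happens at construction. This page does not say which checks are made. The sketch below shows the kind of checks a constructor of this shape could perform; Field is a trimmed local stand-in, checkFields is a hypothetical helper, and every rule in it is an assumption rather than documented tatami behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Field is a trimmed local stand-in for the Field struct above.
type Field struct {
	Name    string
	SortKey bool
}

// checkFields is a hypothetical helper, not part of tatami. It rejects
// empty or duplicate names and more than one SortKey column. All three
// rules are assumptions made for illustration.
func checkFields(fields ...Field) error {
	if len(fields) == 0 {
		return errors.New("schema has no fields")
	}
	seen := make(map[string]bool, len(fields))
	sortKeys := 0
	for _, f := range fields {
		if f.Name == "" {
			return errors.New("field with empty name")
		}
		if seen[f.Name] {
			return fmt.Errorf("duplicate field %q", f.Name)
		}
		seen[f.Name] = true
		if f.SortKey {
			sortKeys++
		}
	}
	if sortKeys > 1 {
		return fmt.Errorf("%d sort keys, want at most 1", sortKeys)
	}
	return nil
}

func main() {
	// Two sort keys: the sketch reports an error.
	err := checkFields(
		Field{Name: "url", SortKey: true},
		Field{Name: "fetched_at", SortKey: true},
	)
	fmt.Println(err)
}
```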
Writing
func Create(path string, schema *Schema, opts WriterOptions) (*Writer, *os.File, error)
func NewWriter(w io.WriterAt, schema *Schema, opts WriterOptions) (*Writer, error)
func (w *Writer) Append(b Batch) error
func (w *Writer) Close() error
Create opens a file; NewWriter writes to any io.WriterAt. A Batch is a set of Column values, one per schema field, each holding a typed Data slice and an optional Valid slice for nullable columns.
type Batch struct {
	Columns []Column
}
type Column struct {
	Data  any    // a typed slice: []string, []int32, [][]byte, ...
	Valid []bool // optional, for nullable columns
}
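Because Data is an untyped `any`, a caller can check that a batch is well-formed before handing it to Append. The following is a minimal sketch using local copies of Batch and Column. rowCount is a hypothetical helper, and the rules it enforces (every column has the same length, and Valid is either nil or as long as Data) are assumptions about what a well-formed batch looks like, not documented library checks.

```go
package main

import (
	"fmt"
	"reflect"
)

// Column and Batch are local copies of the declarations above.
type Column struct {
	Data  any
	Valid []bool
}

type Batch struct {
	Columns []Column
}

// rowCount is a hypothetical helper. It returns the number of rows in b,
// or an error if a column is not a slice, if column lengths disagree, or
// if a non-nil Valid slice does not match its Data length.
func rowCount(b Batch) (int, error) {
	rows := -1
	for i, c := range b.Columns {
		v := reflect.ValueOf(c.Data)
		if v.Kind() != reflect.Slice {
			return 0, fmt.Errorf("column %d: Data is %T, want a typed slice", i, c.Data)
		}
		n := v.Len()
		if c.Valid != nil && len(c.Valid) != n {
			return 0, fmt.Errorf("column %d: %d values but %d validity flags", i, n, len(c.Valid))
		}
		if rows >= 0 && n != rows {
			return 0, fmt.Errorf("column %d: %d rows, want %d", i, n, rows)
		}
		rows = n
	}
	if rows < 0 {
		return 0, nil // no columns at all
	}
	return rows, nil
}

func main() {
	b := Batch{Columns: []Column{
		{Data: []string{"a", "b", "c"}},
		{Data: []int32{1, 0, 3}, Valid: []bool{true, false, true}},
	}}
	n, err := rowCount(b)
	fmt.Println(n, err)
}
```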
WriterOptions tunes the layout (RowGroupMaxRows, RowGroupMaxBytes, PageMaxValues, PageSizeHint, BlobRunTargetBytes) and the header metadata (UUID, CreatedMillis, CreatorID). The zero value is a good default.
Reading
func OpenFile(path string) (*Reader, *os.File, error)
func Open(r io.ReaderAt, size int64) (*Reader, error)
func (r *Reader) NumRowGroups() int
func (r *Reader) ReadColumn(group, col int) (Column, error)
OpenFile opens a path; Open reads from any io.ReaderAt. ReadColumn reads one column of one row group, decoding only that column.
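Reading a whole column means calling ReadColumn once per row group. The sketch below shows that loop against a small interface that lists the two Reader methods from the signatures above, so it can run against an in-memory fake. Column here is a local copy, and columnReader, readStrings, and fakeReader are hypothetical names; real code would use the tatami types directly.

```go
package main

import "fmt"

// Column is a local copy of the declaration in the Writing section.
type Column struct {
	Data  any
	Valid []bool
}

// columnReader lists the two Reader methods shown above (hypothetical
// interface, declared here only so the sketch is self-contained).
type columnReader interface {
	NumRowGroups() int
	ReadColumn(group, col int) (Column, error)
}

// readStrings reads column col from every row group and concatenates
// the values. It assumes the column decodes to []string.
func readStrings(r columnReader, col int) ([]string, error) {
	var out []string
	for g := 0; g < r.NumRowGroups(); g++ {
		c, err := r.ReadColumn(g, col)
		if err != nil {
			return nil, fmt.Errorf("group %d: %w", g, err)
		}
		vals, ok := c.Data.([]string)
		if !ok {
			return nil, fmt.Errorf("group %d: column %d is %T, not []string", g, col, c.Data)
		}
		out = append(out, vals...)
	}
	return out, nil
}

// fakeReader is an in-memory stand-in; groups[g][col] is one column.
type fakeReader struct{ groups [][]Column }

func (f fakeReader) NumRowGroups() int { return len(f.groups) }

func (f fakeReader) ReadColumn(group, col int) (Column, error) {
	if group < 0 || group >= len(f.groups) || col < 0 || col >= len(f.groups[group]) {
		return Column{}, fmt.Errorf("no column %d in group %d", col, group)
	}
	return f.groups[group][col], nil
}

func main() {
	r := fakeReader{groups: [][]Column{
		{{Data: []string{"a", "b"}}},
		{{Data: []string{"c"}}},
	}}
	vals, err := readStrings(r, 0)
	fmt.Println(vals, err)
}
```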
Predicates, scans, and lookups
func Eq(col string, val any) *Pred
func Ne(col string, val any) *Pred
func Lt(col string, val any) *Pred
func Le(col string, val any) *Pred
func Gt(col string, val any) *Pred
func Ge(col string, val any) *Pred
func Between(col string, lo, hi any) *Pred
func IsNull(col string) *Pred
func And(kids ...*Pred) *Pred
func Or(kids ...*Pred) *Pred
func (r *Reader) Scan(pred *Pred, projection ...string) (*ScanResult, error)
func (r *Reader) Lookup(key any) (RowRef, bool, error)
Scan projects the named columns and pushes the predicate down, returning the surviving rows along with counters (GroupsScanned against GroupsTotal) that show how much of the file the predicate pruned. Lookup on a sorted file returns a RowRef{Group, Row} with a bounded seek.
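This page does not describe how Scan decides which row groups to skip. One common approach, sketched here as an assumption, keeps a minimum and maximum per column per row group and skips any group whose range cannot overlap the predicate. groupStats, scanCounters, and pruneBetween are hypothetical names for illustration only.

```go
package main

import "fmt"

// groupStats is a hypothetical per-row-group summary of one int64 column.
type groupStats struct {
	Min, Max int64
}

// scanCounters mirrors the two counter names mentioned above.
type scanCounters struct {
	GroupsScanned int
	GroupsTotal   int
}

// pruneBetween returns the indices of row groups whose [Min, Max] range
// overlaps [lo, hi]. Groups outside that range cannot hold a matching
// row, so a scan would never need to decode them.
func pruneBetween(groups []groupStats, lo, hi int64) ([]int, scanCounters) {
	keep := make([]int, 0, len(groups))
	for i, g := range groups {
		if g.Max < lo || g.Min > hi {
			continue // no overlap: skip this group
		}
		keep = append(keep, i)
	}
	return keep, scanCounters{GroupsScanned: len(keep), GroupsTotal: len(groups)}
}

func main() {
	groups := []groupStats{{0, 99}, {100, 199}, {200, 299}, {300, 399}}
	keep, c := pruneBetween(groups, 150, 210)
	fmt.Println(keep, c.GroupsScanned, c.GroupsTotal) // prints "[1 2] 2 4"
}
```

On a file sorted by the filtered column the ranges do not interleave, which is why pruning of this kind tends to pay off most on the sort key.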
Collections
func OpenCollection(dir string) (*Collection, error)
func (c *Collection) Scan(pred *Pred, projection ...string) (*CollectionScan, error)
func (c *Collection) Lookup(key any) (CollHit, bool, int, error)
func (c *Collection) Merge(inRels []string, outRel string, opts WriterOptions, createdMillis uint64) error
A Collection is a directory of files cataloged by a manifest. Scan prunes files by their rollup before opening them and reports FilesScanned against FilesTotal. Lookup returns a CollHit (the member and the RowRef) and the fan-out count. Merge decodes several members and re-encodes them into one, swapping the manifest atomically.
Search segments
type SearchDoc struct {
	DocID  string // stable identity, e.g. sha256 of the url
	URL    string
	Title  string
	Body   string
	Anchor string
}
func NewSearchBuilder() *SearchBuilder
func (b *SearchBuilder) Add(doc SearchDoc)
func (b *SearchBuilder) Write(path string, opts WriterOptions) error
func OpenSearch(path string) (*SearchSegment, error)
func (s *SearchSegment) Search(query string, k int) ([]SearchResult, error)
func (s *SearchSegment) Query(query string, k int) []search.Hit
func (s *SearchSegment) Delete(docID string) (bool, error)
func (s *SearchSegment) NumDocs() int
func (s *SearchSegment) NumTerms() int
func (s *SearchSegment) NumDeleted() int
func (s *SearchSegment) Close() error
Search returns the top-k hits, each resolved to its URL, title, and score; Query is the retrieval-only hot path. Delete clears a document by its stable id and is honored at query time.
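"Honored at query time" suggests that a deleted document stays in the segment but is filtered out of results. A toy model of that idea, assuming a tombstone set keyed by stable id, might look like the sketch below. The segment and hit types are hypothetical stand-ins and say nothing about how SearchSegment stores deletions.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical scored candidate.
type hit struct {
	DocID string
	Score float64
}

// segment is a toy stand-in: candidate hits plus a tombstone set.
type segment struct {
	hits    []hit
	deleted map[string]bool
}

// Delete marks docID as deleted and reports whether it was live before.
func (s *segment) Delete(docID string) bool {
	if s.deleted[docID] {
		return false // already deleted
	}
	for _, h := range s.hits {
		if h.DocID == docID {
			if s.deleted == nil {
				s.deleted = make(map[string]bool)
			}
			s.deleted[docID] = true
			return true
		}
	}
	return false // unknown id
}

// NumDeleted returns the number of tombstoned documents.
func (s *segment) NumDeleted() int { return len(s.deleted) }

// TopK returns up to k live hits in descending score order. Deleted
// documents are skipped here, at query time; the stored data is untouched.
func (s *segment) TopK(k int) []hit {
	if k < 0 {
		k = 0
	}
	live := make([]hit, 0, len(s.hits))
	for _, h := range s.hits {
		if !s.deleted[h.DocID] {
			live = append(live, h)
		}
	}
	sort.SliceStable(live, func(i, j int) bool { return live[i].Score > live[j].Score })
	if len(live) > k {
		live = live[:k]
	}
	return live
}

func main() {
	s := &segment{hits: []hit{{"a", 1}, {"b", 3}, {"c", 2}}}
	s.Delete("b")
	fmt.Println(s.TopK(2), s.NumDeleted())
}
```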
Merging and serving
func MergeSegments(segs []*SearchSegment, outPath string, opts WriterOptions) error
func OpenIndex(paths []string) (*Index, error)
func NewIndex(segs []*SearchSegment) *Index
func (ix *Index) Search(query string, k int) ([]SearchResult, error)
func (ix *Index) Query(query string, k int) []IndexHit
func (ix *Index) SelectMerge(p search.MergePolicy) []int
func (ix *Index) Segments() []*SearchSegment
func (ix *Index) NumDocs() int
func (ix *Index) Close() error
MergeSegments folds segments into one, dropping deletions and re-deriving dense ids. An Index serves many segments behind one query with a global top-k and stable-id dedup. SelectMerge applies the tiered policy from the search subpackage (search.DefaultMergePolicy) and returns the segment indices to merge.
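The global top-k with stable-id dedup can be pictured as a merge over per-segment result lists. The sketch below is one way to do it, not the library's implementation: hit and mergeTopK are hypothetical names, and letting the highest-scoring copy of a duplicated DocID win is an assumption, since this page does not say which copy Index keeps.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical scored result carrying a stable document id.
type hit struct {
	DocID string
	Score float64
}

// mergeTopK combines per-segment hit lists into one global top-k with
// at most one entry per DocID. When a document appears in several
// segments, the highest score is kept (an assumption for this sketch).
func mergeTopK(perSegment [][]hit, k int) []hit {
	best := make(map[string]float64)
	for _, hits := range perSegment {
		for _, h := range hits {
			if s, ok := best[h.DocID]; !ok || h.Score > s {
				best[h.DocID] = h.Score
			}
		}
	}
	out := make([]hit, 0, len(best))
	for id, s := range best {
		out = append(out, hit{DocID: id, Score: s})
	}
	// Descending score; ties broken by DocID so the order is deterministic.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].DocID < out[j].DocID
	})
	if k < 0 {
		k = 0
	}
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	segA := []hit{{"doc1", 2.0}, {"doc2", 1.0}}
	segB := []hit{{"doc1", 3.0}, {"doc3", 2.5}}
	for _, h := range mergeTopK([][]hit{segA, segB}, 2) {
		fmt.Println(h.DocID, h.Score)
	}
}
```

A production merge would more likely keep a bounded heap of size k instead of sorting every candidate; the full sort is used here only to keep the sketch short.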