Architecture

This page describes ck’s modular Rust workspace architecture.

Workspace Structure

ck uses a Cargo workspace with specialized crates:

ck/
├── ck-cli/          # Command-line interface and MCP server
├── ck-core/         # Shared types, configuration, utilities
├── ck-engine/       # Search engine implementations
├── ck-index/        # File indexing and sidecar management
├── ck-embed/        # Text embedding providers
├── ck-chunk/        # Text segmentation and parsing
└── ck-models/       # Model registry and configuration

Crate Responsibilities

ck-cli

Purpose: User-facing CLI and MCP server

Key components:

Argument parsing (clap)
MCP JSON-RPC server
Output formatting
User interaction

Dependencies: All other crates

ck-core

Purpose: Shared types and utilities

Key components:

SearchResult types
Configuration structures
Error types (anyhow)
Common utilities

Dependencies: None (foundation crate)

ck-engine

Purpose: Search implementations

Key components:

RegexEngine: Pattern matching
SemanticEngine: Vector similarity search
HybridEngine: Reciprocal Rank Fusion
Result ranking and scoring

Dependencies: ck-core, ck-index, ck-embed

ck-index

Purpose: File indexing and management

Key components:

File discovery and traversal
Hash-based change detection
Incremental index updates
Sidecar file management
Exclusion pattern handling

Dependencies: ck-core, ck-chunk
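
Hash-based change detection and incremental updates can be pictured as a diff between two path-to-hash maps. The following is a minimal sketch, not ck-index's actual code: `content_hash`, `UpdatePlan`, and `plan_update` are hypothetical names, and `DefaultHasher` is only a stand-in for whatever hash the index really stores.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical content hash. `DefaultHasher` is a stand-in: its output is
/// not guaranteed stable across Rust releases, so a persisted index would
/// need a fixed algorithm instead.
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Work needed to bring the index up to date (hypothetical type).
#[derive(Debug, Default, PartialEq)]
pub struct UpdatePlan {
    pub to_index: Vec<String>,  // new or modified files
    pub to_remove: Vec<String>, // files that no longer exist
}

/// Compare the hashes recorded at the last index run with the current ones.
pub fn plan_update(
    previous: &HashMap<String, u64>,
    current: &HashMap<String, u64>,
) -> UpdatePlan {
    let mut plan = UpdatePlan::default();
    for (path, hash) in current {
        // A missing or different recorded hash means the file must be re-indexed.
        if previous.get(path) != Some(hash) {
            plan.to_index.push(path.clone());
        }
    }
    for path in previous.keys() {
        if !current.contains_key(path) {
            plan.to_remove.push(path.clone());
        }
    }
    // Sort for deterministic output; HashMap iteration order is arbitrary.
    plan.to_index.sort();
    plan.to_remove.sort();
    plan
}
```

Unchanged files appear in neither list, so they skip chunking and embedding entirely on a re-index.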

ck-embed

Purpose: Embedding generation

Key components:

FastEmbed integration
Multiple model support (BGE, Nomic, Jina)
Token-aware chunking
Embedding caching
Model download management

Dependencies: ck-core, ck-models
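
To illustrate the embedding caching item above, here is a small sketch of a cache that calls the embedding function only on a miss. `EmbeddingCache` and `get_or_embed` are hypothetical names, and keying by the raw text is an assumption; ck-embed may key and store its cache differently.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory cache from chunk text to its embedding vector.
pub struct EmbeddingCache {
    entries: HashMap<String, Vec<f32>>,
    pub hits: usize,
    pub misses: usize,
}

impl EmbeddingCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Return the cached vector for `text`, calling `embed` only on a miss.
    pub fn get_or_embed<F>(&mut self, text: &str, embed: F) -> &[f32]
    where
        F: FnOnce(&str) -> Vec<f32>,
    {
        if self.entries.contains_key(text) {
            self.hits += 1;
        } else {
            self.misses += 1;
            let vector = embed(text);
            self.entries.insert(text.to_string(), vector);
        }
        &self.entries[text]
    }
}
```

Passing the embedder as a closure keeps the sketch independent of any particular model backend.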

ck-chunk

Purpose: Intelligent code chunking

Key components:

Tree-sitter parsing (7+ languages)
Semantic boundary detection
Token counting (HuggingFace tokenizers)
Content-based text detection
Language detection

Dependencies: ck-core, ck-models
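
As a rough illustration of token-aware chunking, the sketch below greedily packs whole lines into chunks under a token budget. It is a simplification, not ck-chunk's algorithm: it ignores Tree-sitter boundaries, and `count_tokens` counts whitespace-separated words as a stand-in for a real HuggingFace tokenizer. `Chunk` and `chunk_by_token_budget` are hypothetical names.

```rust
/// Stand-in token counter (assumption): whitespace-separated words.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

#[derive(Debug, PartialEq)]
pub struct Chunk {
    pub start_line: usize, // 1-based, inclusive
    pub end_line: usize,   // 1-based, inclusive
    pub text: String,
}

/// Greedily pack whole lines into chunks of at most `max_tokens` tokens.
/// A single line larger than the budget becomes its own oversized chunk.
pub fn chunk_by_token_budget(source: &str, max_tokens: usize) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut current_tokens = 0;
    let mut start_line = 1;

    for (index, line) in source.lines().enumerate() {
        let line_tokens = count_tokens(line);
        // Close the current chunk if adding this line would exceed the budget.
        if !current.is_empty() && current_tokens + line_tokens > max_tokens {
            chunks.push(Chunk {
                start_line,
                end_line: index, // previous line, in 1-based numbering
                text: current.join("\n"),
            });
            current.clear();
            current_tokens = 0;
            start_line = index + 1;
        }
        current.push(line);
        current_tokens += line_tokens;
    }
    if !current.is_empty() {
        chunks.push(Chunk {
            start_line,
            end_line: start_line + current.len() - 1,
            text: current.join("\n"),
        });
    }
    chunks
}
```

A parser-driven chunker would replace the per-line loop with a walk over syntax nodes, but the budget check would stay the same.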

ck-models

Purpose: Model configuration

Key components:

Model registry (BGE, Nomic, Jina)
Token limits and dimensions
Model aliases
Chunking configuration

Dependencies: ck-core
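
A model registry with aliases, token limits, and dimensions could look roughly like the sketch below. `ModelSpec`, `REGISTRY`, `resolve`, and the alias strings are all hypothetical. The dimensions and token limits are the published figures for these three models, not values read from ck-models.

```rust
/// Hypothetical registry entry; field names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ModelSpec {
    pub name: &'static str,
    pub alias: &'static str, // assumed short name, not necessarily ck's
    pub dimensions: usize,
    pub max_tokens: usize,
}

/// Illustrative entries using the models' published specifications.
static REGISTRY: &[ModelSpec] = &[
    ModelSpec {
        name: "BAAI/bge-small-en-v1.5",
        alias: "bge-small",
        dimensions: 384,
        max_tokens: 512,
    },
    ModelSpec {
        name: "nomic-ai/nomic-embed-text-v1.5",
        alias: "nomic-v1.5",
        dimensions: 768,
        max_tokens: 8192,
    },
    ModelSpec {
        name: "jinaai/jina-embeddings-v2-base-code",
        alias: "jina-code",
        dimensions: 768,
        max_tokens: 8192,
    },
];

/// Look a model up by its full name or its alias.
pub fn resolve(name_or_alias: &str) -> Option<&'static ModelSpec> {
    REGISTRY
        .iter()
        .find(|m| m.name == name_or_alias || m.alias == name_or_alias)
}
```

Keeping token limits next to dimensions lets the chunker and the embedder read their settings from the same entry.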

Data Flow

Indexing Flow

User Command (ck --index .)
    ↓
ck-cli: Parse arguments
    ↓
ck-index: Discover files
    ↓
ck-chunk: Parse and segment code
    ↓
ck-embed: Generate embeddings
    ↓
ck-index: Save embeddings in sidecars under .ck/

Search Flow (Semantic)

User Query (ck --sem "pattern" .)
    ↓
ck-cli: Parse arguments
    ↓
ck-embed: Embed query
    ↓
ck-engine: Score stored embeddings (cosine) and rank results
    ↓
ck-cli: Format and display output
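
The scoring step in this flow can be sketched as a brute-force cosine ranking over the stored vectors. This is an illustration only: `cosine_similarity` and `rank` are hypothetical names, and ck-engine may use an ANN index rather than a full scan.

```rust
/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every stored embedding against the query and keep the best `top_k`.
pub fn rank<'a>(
    query: &[f32],
    stored: &'a [(String, Vec<f32>)],
    top_k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = stored
        .iter()
        .map(|(id, vector)| (id.as_str(), cosine_similarity(query, vector)))
        .collect();
    // Highest similarity first; `total_cmp` gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}
```

The full scan is linear in the number of stored chunks, which is the cost an ANN index is meant to avoid.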

Search Flow (Hybrid)

User Query (ck --hybrid "pattern" .)
    ↓
ck-cli: Parse arguments
    ↓
[Parallel]
├─ ck-engine (SemanticEngine): Semantic search
└─ ck-engine (RegexEngine): Keyword search
    ↓
ck-engine (HybridEngine): RRF fusion
    ↓
ck-cli: Format and display output
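
The RRF fusion step can be sketched as follows. Reciprocal Rank Fusion gives each result a score of 1 / (k + rank) in every list where it appears and sums those scores. `reciprocal_rank_fusion` is a hypothetical function name, and the constant k = 60 used in the usage note is the value conventional in the RRF literature, not one confirmed from ck's source.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of result IDs with Reciprocal Rank Fusion.
/// Each list is ordered best-first; ranks are 1-based.
pub fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (position, id) in ranking.iter().enumerate() {
            let rank = position as f64 + 1.0;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling `reciprocal_rank_fusion(&[semantic_ids, keyword_ids], 60.0)` favors results that rank well in both lists. Only ranks are used, so the semantic and keyword scores never need to be on a comparable scale.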

Key Design Patterns

Error Handling

Uses anyhow::Result consistently:

rust

use anyhow::Result;

pub fn search(query: &str) -> Result<Vec<SearchResult>> {
    // ...
}

Async/Await

Tokio runtime for I/O operations:

rust

#[tokio::main]
async fn main() -> Result<()> {
    // ...
}

Parallel Processing

Rayon for CPU-intensive tasks:

rust

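use rayon::prelude::*;

// Process files across Rayon's thread pool; `collect` needs a target type.
let results: Vec<_> = files
    .par_iter()
    .map(|f| process_file(f))
    .collect();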

Memory-Mapped Files

Efficient large file access:

rust

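use std::fs::File;
use memmap2::Mmap;

let file = File::open(path)?;
// SAFETY: the file must not be truncated or modified while it is mapped.
let mmap = unsafe { Mmap::map(&file)? };
// Access file contents as a byte slice without loading the whole file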

Storage Format

Index Structure

.ck/
├── manifest.json          # Index metadata
│   └── { model, dimensions, timestamp, ... }
├── embeddings.json        # Vector embeddings
│   └── { file_path: [vectors...], ... }
├── ann_index.bin          # ANN index (binary)
└── tantivy_index/         # Keyword search index
    ├── meta.json
    └── *.seg files

Sidecar Files

Each source file gets a sidecar:

src/
├── main.rs
└── .ck/
    └── main.rs.ck         # Sidecar with chunks and hashes

Sidecar contains:

File hash (for change detection)
Chunk boundaries
Embedding IDs
Metadata
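
The sidecar contents listed above might be modeled roughly as below. This is a sketch under assumptions: `Sidecar`, `ChunkRecord`, and their fields are hypothetical names, and the actual on-disk format of the `.ck` files is not shown on this page.

```rust
/// One chunk's boundaries and the ID of its stored embedding (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkRecord {
    pub start_line: usize, // 1-based, inclusive
    pub end_line: usize,   // 1-based, inclusive
    pub embedding_id: u64,
}

/// In-memory view of a sidecar file (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
pub struct Sidecar {
    pub file_hash: u64,
    pub chunks: Vec<ChunkRecord>,
}

impl Sidecar {
    /// The source file changed if its current hash differs from the recorded one.
    pub fn is_stale(&self, current_hash: u64) -> bool {
        self.file_hash != current_hash
    }

    /// Find the embedding that covers a given source line, if any.
    pub fn embedding_for_line(&self, line: usize) -> Option<u64> {
        self.chunks
            .iter()
            .find(|c| c.start_line <= line && line <= c.end_line)
            .map(|c| c.embedding_id)
    }
}
```

Storing embedding IDs instead of the vectors themselves would keep each sidecar small, with the vectors living in the shared index.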

Performance Considerations

Indexing Performance

Parallel file processing – Rayon thread pool
Incremental updates – Hash-based change detection
Efficient I/O – Memory-mapped files
Smart exclusions – Early filtering of non-code files

Search Performance

Vector search – Roughly O(log n) per query with the ANN index (approximate results)
Keyword search – Tantivy inverted index
Caching – Embedding cache, model cache
Streaming results – Iterator-based output for large result sets

Memory Management

Lazy loading – Files loaded only when needed
Streaming processing – Process files one at a time
Index compression – Binary format for vectors
Model caching – Reuse loaded models

Testing Strategy

Unit Tests

Each crate has unit tests:

bash

cargo test --workspace

Integration Tests

End-to-end testing in ck-cli:

bash

cargo test --package ck-cli

Feature Tests

Test each feature combination:

bash

cargo hack test --each-feature --workspace

Build Process

Development

bash

# Build all crates
cargo build --workspace

# Build release
cargo build --workspace --release

# Run tests
cargo test --workspace

# Lint
cargo clippy --workspace --all-features

# Format
cargo fmt --all

Release

bash

# Version bump (all crates)
# Update Cargo.toml in each crate

# Build and test
cargo test --workspace
cargo clippy --workspace --all-features
cargo fmt --all --check

# Publish to crates.io
cargo publish --package ck-core
cargo publish --package ck-models
# ... (publish in dependency order)
cargo publish --package ck-cli
