Known Limitations

Current limitations and constraints of ck. These are areas where ck has known constraints or where requested features haven’t been implemented yet.

Indexing Limitations

Partial Directory Indexing

Issue: Cannot index only a specific subdirectory.

Details:

bash

# This doesn't work as expected
ck --index ./docs  # Still indexes from repository root

Indexing always starts from the repository root, even when specifying a subdirectory path.

Workaround: Use .ckignore to exclude everything except your target directory:

txt

# Exclude everything
/*

# Include only docs
!/docs/

Tracking: GitHub issue #50

Full File Re-indexing for Large Files

Issue: Large files (26K+ LOC) are fully re-indexed on any change.

Details: When a file changes, ck re-indexes the entire file rather than using git diff to index only changed sections. This is particularly noticeable with large files like database schema dumps.

Why this happens:

Semantic chunking boundaries can change non-trivially based on file content
A diff might split one semantic chunk into two
Size-based chunking could theoretically use git diff, but semantic chunking cannot

Workaround:

Exclude very large files from indexing:
txt
```
# .ckignore
schema.sql
*.generated.sql
```
Use size-based chunking for large data files (semantic chunking not needed)
Split large files if possible

Future consideration: Git diff-based incremental indexing is being explored for size-based chunking.

Tracking: GitHub issue #69

Repository Root Requirement

Issue: Index operations require running from repository root.

Details: The .ck/ index directory is always created at the repository root, and all index operations assume this location.

Impact:

Cannot have multiple independent indexes in subdirectories
Must run ck commands from repository root for semantic search

Workaround: Use cd to navigate to repository root before running semantic searches, or use git to determine root:

bash

cd $(git rev-parse --show-toplevel)
ck --sem "pattern" .

Search Limitations

No File Type Filtering

Issue: No built-in --type flag like ripgrep for filtering by file type.

Details: You cannot currently do:

bash

ck --type rust "pattern"  # Not implemented

Workaround: Use glob patterns or path specifications:

bash

# Glob patterns
ck "pattern" **/*.rs
ck "pattern" **/*.{js,ts,jsx,tsx}

# Directory filtering
ck "pattern" src/

# Exclusions
ck --exclude "*.test.js" "pattern" src/

Tracking: GitHub issue #28

Single Embedding Model Per Index

Issue: Can only use one embedding model at a time for a repository.

Details: Each index is tied to a specific embedding model. You cannot simultaneously search with multiple models.

Impact:

Cannot compare models side-by-side without rebuilding
Switching models requires full index rebuild

Workaround:

bash

# Test model A
ck --switch-model bge-small .
ck --sem "pattern" src/

# Test model B (triggers rebuild)
ck --switch-model jina-code .
ck --sem "pattern" src/

Rebuilding takes time but is necessary when changing models.

Model & Embedding Limitations

Limited Embedding Model Options

Issue: Only models supported by the built-in providers are available.

Current models:

FastEmbed provider: bge-small (default), nomic-v1.5, jina-code
Mixedbread provider: mxbai-xsmall (embedding), mxbai (reranker)

Not supported:

Custom ONNX models (beyond Mixedbread)
External API-based models (OpenAI, Anthropic, HuggingFace Inference API)
Proprietary embedding services

Why: ck uses fastembed-rs and Mixedbread ONNX Runtime for fast local inference, which limits options to supported models. The provider abstraction allows adding new providers in the future.

Future consideration: External embedding API support is being considered for users who want to use specific models.

Tracking: GitHub issue #49

No Custom Model Fine-tuning

Issue: Cannot fine-tune embedding models on your specific codebase.

Details: Models are pre-trained and used as-is. Domain-specific fine-tuning is not supported.

Impact:

May not perform optimally on domain-specific code (medical, financial, etc.)
Cannot adapt to company-specific terminology
No way to improve performance on specific codebases

Workaround:

Try different available models to find best fit
Use hybrid search to combine semantic with keyword matching
Adjust threshold to filter irrelevant results

HuggingFace Cache Location

Issue: Model cache location is determined by fastembed/hf-hub defaults.

Details: While environment variables like $HF_HOME work, there’s no ck-specific configuration for cache location.

Workaround: Use HuggingFace environment variables:

bash

export HF_HOME=~/.cache/huggingface
export HF_HUB_CACHE=~/.cache/huggingface/hub

Language Support Limitations

Limited Language Parsers

Current support: 8 languages with tree-sitter parsing

Python, JavaScript/TypeScript, Rust, Go, Ruby, Haskell, C#, Zig

Not yet supported:

Java (#21)
C/C++ (#21)
Swift (#21)
Kotlin (#21)
PHP (roadmap)
Scala
Objective-C

Impact: Unsupported languages fall back to simple text chunking without semantic boundaries.

Workaround: Unsupported languages still work but with less intelligent chunking:

Chunks based on token count rather than function boundaries
May split functions/classes awkwardly
Still provides semantic search, just with less precise boundaries

Tracking: GitHub issue #21

PDF Support (Experimental)

Status: Basic PDF text extraction is supported but has limitations.

Details: ck can index and search PDF files by extracting text content, but uses a different code path than standard source code indexing.

Limitations:

Text extraction only (no semantic PDF parsing)
Doesn’t always work perfectly with all PDF formats
Not as sophisticated as cloud-based parsers like llamaparse
May produce lower-quality chunks compared to source code

Trade-off: This is a local/privacy-first approach vs cloud parsing quality. Your PDFs never leave your machine, but parsing quality is lower than cloud services like semtools.

Workaround:

txt

# .ckignore
# Exclude problematic PDFs
docs/large-spec.pdf
*.scanned.pdf

For mission-critical PDF search, consider cloud-based tools like semtools.

Feature Limitations

No Configuration File

Issue: All configuration must be done via CLI flags or .ckignore.

Missing:

No .ck.toml or ck.config.json
Cannot set default model, threshold, topk
Cannot configure per-project preferences

Workaround: Use shell aliases for common patterns:

bash

# .bashrc or .zshrc
alias cks='ck --sem --threshold 0.7 --topk 10'
alias ckh='ck --hybrid --scores'

Future: Configuration file support is on the roadmap (v0.6+).

No Bug Detection Category

Issue: No specialized “bug” search category or pattern.

Details: Cannot specifically search for potential bugs using semantic patterns.

Example not supported:

bash

ck --bugs src/  # Not implemented

Workaround: Use semantic queries describing bug patterns:

bash

ck --sem "null pointer dereference" src/
ck --sem "unchecked array access" src/
ck --sem "resource leak" src/
ck --hybrid "TODO|FIXME|XXX" src/

Tracking: GitHub issue #23

No Refactoring Assistance

Issue: MCP server is read-only; cannot modify code.

Details: Current MCP tools only search and read code. No tools for:

Writing code
Refactoring assistance
Automated fixes
Code generation

Future: Enhanced MCP tools with file writing capabilities are planned (v0.6+).

Performance Limitations

CPU-Only Inference

Issue: No GPU acceleration for embedding generation.

Details: Embedding models run on CPU via ONNX. This is generally fast enough but could be faster with GPU support.

Impact:

First-time indexing of very large codebases (>1M LOC) can take 5-10 minutes
Delta updates are still fast (only changed files)

Workaround:

Use smaller/faster model (bge-small) for large codebases
Exclude unnecessary files with .ckignore
Index overnight for initial build

Memory Usage for Large Codebases

Issue: Memory usage scales with codebase size during indexing.

Impact: Very large codebases (10M+ LOC) may require significant memory during initial indexing.

Workaround:

Index in smaller batches by temporarily excluding directories
Close other applications during indexing
Upgrade system memory if regularly working with massive codebases

Platform Limitations

Windows-Specific Considerations

Issue: Some features have Windows-specific behavior differences.

Details:

Path handling differences (backslashes vs forward slashes)
File permissions handled differently
Case sensitivity differences

Impact: Generally minimal, but edge cases may behave differently on Windows vs Unix.

No Pre-built Binaries (Yet)

Issue: Must install via cargo or build from source.

Missing:

No .deb packages for apt
No .rpm packages for yum/dnf
No homebrew formula for macOS (yet)
No chocolatey package for Windows

Workaround: Install via cargo (available on all platforms):

bash

cargo install ck-search

Future: Package manager distributions planned for v0.6+.

MCP Server Limitations

No File Writing

Issue: MCP server can only read/search code, not modify it.

Current capabilities:

Search (semantic, regex, hybrid)
Index management
Status checking

Not available:

Writing files
Refactoring code
Applying fixes
Code generation

Future: Enhanced MCP tools planned for v0.6+.

Pagination Required for Large Results

Issue: Large result sets must be paginated in MCP server.

Details: Default page size is 25 results. Getting more requires pagination cursors.

Impact: AI agents must implement pagination logic for comprehensive results.

Mitigation:

Adjust page_size parameter (max recommended: 100)
Use top_k to limit total results
Use threshold filtering to reduce result count

Workarounds & Alternatives

For most limitations, see the FAQ for workarounds and alternative approaches.

Reporting Limitations

Found a limitation not listed here?

Known Limitations ​

Indexing Limitations ​

Partial Directory Indexing ​

Full File Re-indexing for Large Files ​

Repository Root Requirement ​

Search Limitations ​

No File Type Filtering ​

Single Embedding Model Per Index ​

Model & Embedding Limitations ​

Limited Embedding Model Options ​

No Custom Model Fine-tuning ​

HuggingFace Cache Location ​

Language Support Limitations ​

Limited Language Parsers ​

PDF Support (Experimental) ​

Feature Limitations ​

No Configuration File ​

No Bug Detection Category ​

No Refactoring Assistance ​

Performance Limitations ​

CPU-Only Inference ​

Memory Usage for Large Codebases ​

Platform Limitations ​

Windows-Specific Considerations ​

No Pre-built Binaries (Yet) ​

MCP Server Limitations ​

No File Writing ​

Pagination Required for Large Results ​

Workarounds & Alternatives ​

Reporting Limitations ​

See Also ​

Known Limitations

Indexing Limitations

Partial Directory Indexing

Full File Re-indexing for Large Files

Repository Root Requirement

Search Limitations

No File Type Filtering

Single Embedding Model Per Index

Model & Embedding Limitations

Limited Embedding Model Options

No Custom Model Fine-tuning

HuggingFace Cache Location

Language Support Limitations

Limited Language Parsers

PDF Support (Experimental)

Feature Limitations

No Configuration File

No Bug Detection Category

No Refactoring Assistance

Performance Limitations

CPU-Only Inference

Memory Usage for Large Codebases

Platform Limitations

Windows-Specific Considerations

No Pre-built Binaries (Yet)

MCP Server Limitations

No File Writing

Pagination Required for Large Results

Workarounds & Alternatives

Reporting Limitations

See Also