Configuring ck for AI Coding Agents
Best practices for integrating ck with AI coding assistants like Claude Code, Cursor, and other agent-based development tools.
Overview
AI coding agents benefit significantly from semantic code search capabilities. This guide shows how to configure ck to work optimally with AI assistants, providing them with powerful code understanding and navigation features.
Benefits of ck for AI Agents
- Conceptual search - Agents can find code by describing functionality, not just keywords
- Context gathering - Quickly retrieve relevant code sections for agent analysis
- Codebase understanding - Help agents map out architecture and patterns
- Precise results - Tunable thresholds ensure high-quality matches
- Efficient output - Structured formats (JSON/JSONL) for easy parsing
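For example, a typical agent tool call combines several of these capabilities in one command; every flag used here is explained later in this guide, and the query string is only illustrative.
# One call an agent might issue: semantic search, bounded output, machine-readable results
ck --jsonl --sem --threshold 0.7 --limit 20 "where JWT tokens are validated" src/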
Claude Code Integration
Claude Code is Anthropic’s official CLI for Claude. Here’s how to configure it for optimal ck usage.
Tool Permissions
Add ck commands to your allowed tools in settings.local.json:
{
  "approvedCommandPatterns": [
    "Bash(ck:*)",
    "Bash(ck --index:*)",
    "Bash(ck --sem:*)",
    "Bash(ck --hybrid:*)",
    "Bash(ck --lex:*)"
  ]
}
This allows Claude Code to:
- Run any ck command without prompting
- Index your codebase automatically
- Perform semantic, hybrid, and lexical searches
- Access all ck features seamlessly
Usage Patterns Documentation
Create a .claude/ directory in your project with guidance for Claude:
# .claude/ck-semantic-search.md
## Hybrid Code Search with ck
Use `ck` for finding code by meaning, not just keywords.
### Search Modes
- `ck --sem "concept"` - Semantic search (by meaning)
- `ck --lex "keyword"` - Lexical search (full-text)
- `ck --hybrid "query"` - Combined regex + semantic
- `ck --regex "pattern"` - Traditional regex search
### Best Practices
::: tip Recommended Usage Patterns
1. **Index once per session**: Run `ck --index .` at project start
2. **Use semantic for concepts**: "error handling", "database queries"
3. **Use lexical for names**: "getUserById", "AuthController"
4. **Tune threshold**: `--threshold 0.7` for high-confidence results
5. **Limit results**: `--limit 20` for focused output
:::
### Example Workflows
# Find authentication logic
ck --sem "user authentication" src/
# Find all TODO comments
ck --lex "TODO" .
# Find error handling patterns with high confidence
ck --sem --threshold 0.8 "error handling" src/
Real-World Example
From BeaconBay/ck#36:
# Initial setup
ck --index .
# Semantic search for concepts
ck --sem "error handling" src/
ck --sem "database connection pooling" lib/
# High-confidence matches only
ck --sem --threshold 0.8 "authentication logic" src/
# Limited result set
ck --sem --limit 10 "caching strategy" .
General AI Agent Configuration
These practices apply to any AI coding assistant using ck.
Index Strategy
Index once, search many times:
# At project start or after significant changes
ck --index .
# Check index status
ck --status .
# Incremental updates happen automatically
# Manual rebuild only if needed
ck --clean . && ck --index .
Why: Indexing is expensive (1-2 minutes for 1M LOC). Searches are fast (<500ms). Let ck handle incremental updates automatically.
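If the project sees frequent pulls or merges, one lightweight way to keep the index current is a git hook. The sketch below assumes a POSIX shell and that ck is on the PATH; it only runs the `ck --index .` command shown above.
#!/bin/sh
# .git/hooks/post-merge - refresh the ck index after each merge or pull
# (make the hook executable: chmod +x .git/hooks/post-merge)
ck --index .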
Search Mode Selection
Choose the right mode for the task:
| Mode | Flag | Best For | Example |
|---|---|---|---|
| Semantic | --sem | Conceptual understanding | “retry logic”, “input validation” |
| Lexical | --lex | Exact terms, names | “AuthService”, “TODO”, “FIXME” |
| Hybrid | --hybrid | Balance precision/recall | “JWT token validation” |
| Regex | --regex | Pattern matching | class.*Handler, fn \w+Error |
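Concretely, the same investigation might touch all four modes; the queries below are illustrative and use only the flags from the table.
# Concept (search by meaning)
ck --sem "retry logic" src/
# Exact identifier or marker comment
ck --lex "AuthService" src/
# Keyword plus concept combined
ck --hybrid "JWT token validation" src/
# Structural pattern
ck --regex "class.*Handler" src/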
Performance Tuning
Relevance Thresholds
Control result quality with --threshold:
# Exploratory search (cast wide net)
ck --sem --threshold 0.3 "pattern matching" src/
# Balanced search (default)
ck --sem --threshold 0.5 "authentication" src/
# High-confidence only (precise results)
ck --sem --threshold 0.8 "error handling" src/
# Very strict (only closest matches)
ck --sem --threshold 0.9 "exact concept" src/
Threshold Sweet Spot for AI Agents
Start with 0.7 for focused, high-confidence results. If you get too few matches, adjust down to 0.5 for broader coverage.
Result Limiting
Prevent overwhelming output:
# Top 10 results (quick overview)
ck --sem --limit 10 "pattern" src/
# Top 20 results (balanced)
ck --sem --limit 20 "pattern" src/
# Top 50 results (comprehensive)
ck --sem --limit 50 "pattern" src/
Recommendation for AI agents: Use --limit 20 by default to keep context windows manageable.
Pagination for Large Results
For comprehensive analysis:
# First page (25 results)
ck --sem --page-size 25 "pattern" src/
# Next page (if needed)
ck --sem --page-size 25 --cursor "abc123" "pattern" src/
Output Formats for AI Consumption
JSONL (Recommended)
One JSON object per line - optimal for streaming and parsing:
# Full output
ck --jsonl --sem "pattern" src/
# Metadata only (smaller, faster)
ck --jsonl --no-snippet --sem "pattern" src/
# Custom snippet length
ck --jsonl --snippet-length 150 --sem "pattern" src/
Why JSONL for AI Agents
- Stream-friendly: Process results as they arrive
- Memory-efficient: Parse one result at a time
- Error-resilient: One malformed line doesn’t break everything
- Standard format: Used by OpenAI, Anthropic, modern ML pipelines
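As a sketch of consuming that stream, the pipeline below processes one result per line with jq. The `file` and `score` field names are assumptions for illustration; check the keys your ck version actually emits.
# Extract path and relevance score from each JSONL result as tab-separated values
# ("file" and "score" are assumed field names - verify against real output)
ck --jsonl --sem --limit 20 "error handling" src/ | jq -r '[.file, .score] | @tsv'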
JSON
Single array - good for small result sets:
ck --json --sem "pattern" src/ | jq '.[].file' | sort -u
Embedding Model Selection
Choose the right model for your use case:
| Model | Best For AI Agents | Trade-offs |
|---|---|---|
| bge-small (default) | Quick searches, fast indexing | Smaller chunks (400 tokens) |
| nomic-v1.5 | Large functions, documentation | Larger download (~500MB) |
| jina-code | Code-specialized understanding | Larger download (~500MB) |
Recommendation:
- Start with `bge-small` (default) - works well for most codebases
- Switch to `jina-code` if agent struggles with code-specific concepts
- Use `nomic-v1.5` for documentation-heavy projects
# Switch models
ck --switch-model jina-code .
Recommended .ckignore for AI Agents
Optimize indexing for AI-assisted development:
# .ckignore
# Build artifacts (not useful for understanding code)
target/
dist/
build/
*.o
*.so
*.dylib
# Dependencies (too large, low signal)
node_modules/
vendor/
venv/
.venv/
# Media files (not code)
*.png
*.jpg
*.gif
*.mp4
*.pdf
# Large data files
*.csv
*.json
*.xml
*.log
# Test fixtures (unless you search them)
fixtures/
__snapshots__/
*.snap
# Generated code (if not relevant)
*_pb2.py
*.generated.*
# Documentation (include if you want AI to reference docs)
# docs/
# *.md
Key principle: Exclude anything that doesn’t help the AI understand your codebase’s logic and architecture.
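If you edit .ckignore after the index already exists, the new exclusions may not apply to files that were indexed earlier (this depends on your ck version); a conservative fix is to rebuild with the commands covered above.
# Rebuild the index so updated .ckignore exclusions take effect
ck --clean . && ck --index .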
Common Workflows
Codebase Onboarding
Help AI agents understand new projects:
# Index the project
ck --index .
# Find entry points
ck --sem "main entry point" .
ck --sem "application initialization" .
# Understand architecture
ck --sem "dependency injection" src/
ck --sem "data flow" src/
# Map out key components
ck --sem "authentication" .
ck --sem "database queries" .
ck --sem "API endpoints" .Code Review Assistance
Find code patterns for review:
# Security checks
ck --sem "authentication logic" src/
ck --sem "input validation" src/
ck --sem "SQL queries" src/
# Code quality
ck --lex "TODO" .
ck --lex "FIXME" .
ck --sem "error handling" src/
# Performance
ck --sem "caching" src/
ck --sem "database connection" src/Refactoring Support
Find code to refactor:
# Find all implementations of a pattern
ck --sem "user validation" src/
# Find similar code for consistency
ck --sem "error response formatting" src/
# Find candidates for extraction
ck --sem --threshold 0.8 "duplicate logic" src/Documentation Generation
Gather code for documentation:
# Find public APIs
ck --sem "public api" src/
# Find configuration
ck --sem "configuration options" .
# Extract examples
ck --sem "example usage" tests/MCP Integration
For MCP-compatible AI agents (Claude Desktop, etc.), use the MCP server:
Setup
# Start MCP server
ck --serve
# Configure in Claude Desktop
claude mcp add ck-search -s user -- ck --serve
MCP-Specific Best Practices
When using ck via MCP protocol:
- Use pagination - Large result sets are paginated automatically
- Set reasonable page_size - Default 25 is good for most cases
- Use threshold parameter - Filter results for relevance
- Check index status - Call the `index_status` tool before heavy searches
- Reindex after bulk changes - Call the `reindex` tool after git pull
See MCP Integration for complete API reference.
Troubleshooting
Agent Gets Irrelevant Results
Solution 1: Increase threshold
ck --sem --threshold 0.8 "your query" src/Solution 2: Switch to hybrid search
ck --hybrid "your query" src/Solution 3: Try a different model
ck --switch-model jina-code .Agent Searches Take Too Long
Solution 1: Limit results
ck --sem --limit 10 "query" src/Solution 2: Exclude large directories
# Add to .ckignore
node_modules/
vendor/
dist/
Solution 3: Use metadata-only output
ck --jsonl --no-snippet --sem "query" src/
Agent Misses Obvious Matches
Solution 1: Lower threshold
ck --sem --threshold 0.3 "query" src/
Solution 2: Use hybrid search
ck --hybrid "query" src/
Solution 3: Use lexical search for exact terms
ck --lex "ExactFunctionName" src/
Near-Miss Threshold Hints
Feature: When no results are found above the specified threshold, ck provides helpful “near-miss” hints.
How it helps AI agents: When semantic search finds no matches above your threshold, ck will show the closest match that fell just below the threshold, along with its score. This gives AI agents a clear signal to adjust the threshold parameter intelligently.
Example output:
$ ck --sem --threshold 0.7 "retry logic" src/
No matches found above threshold 0.7
Near-miss:
src/network/client.rs:156 (score: 0.68)
"Implements exponential backoff for failed requests"
Hint: Try lowering threshold to 0.65 or use --threshold 0.6
AI agent workflow:
- Initial search with `--threshold 0.7` returns no results
- ck shows near-miss at 0.68
- Agent automatically retries with `--threshold 0.65`
- Finds relevant matches
This feedback loop helps agents converge on optimal thresholds without overshooting.
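A rough shell sketch of that loop is below. Rather than parsing the hint text, it simply treats empty output as "no matches above the threshold" and steps down toward the 0.3 floor recommended later in this guide; that empty-output assumption may need adjusting for your ck version.
# Try progressively lower thresholds until something comes back
# (assumes an empty stdout means no matches above the threshold)
for threshold in 0.7 0.6 0.5 0.4 0.3; do
  results=$(ck --jsonl --sem --threshold "$threshold" "retry logic" src/)
  if [ -n "$results" ]; then
    echo "matched at threshold $threshold"
    echo "$results"
    break
  fi
done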
Why This Matters for AI Agents
AI agents naturally understand threshold tuning. Near-miss hints provide actionable feedback, allowing agents to adjust parameters intelligently rather than guessing or giving up. This is one of ck’s “nice affordances” for AI-first usage.
Index Out of Date
Solution: Check status and reindex if needed
ck --status .
# If stale:
ck --index .
Performance Benchmarks
Typical performance for AI agent workflows:
| Operation | Time | Notes |
|---|---|---|
| Initial index (100K LOC) | ~10s | One-time cost |
| Semantic search | <500ms | Cached index |
| Hybrid search | <300ms | Leverages both indices |
| Lexical search | <100ms | Full-text index |
| Incremental update | <1s | Only changed files |
Best Practices Summary
✅ Do:
- Index once at project start
- Use `--threshold 0.7` as default for AI agents
- Use `--limit 20` to keep context manageable
- Prefer JSONL output for parsing
- Use semantic search for concepts
- Use lexical search for exact identifiers
- Tune thresholds based on result quality
❌ Don’t:
- Re-index unnecessarily (ck handles incremental updates)
- Use threshold above 0.9 (too restrictive)
- Use threshold below 0.3 (too noisy)
- Omit `--limit` for exploratory searches (too many results)
- Index huge binary/media/data files
- Mix search modes without understanding differences
Next Steps
- Read MCP Integration for API details
- See Choosing an Interface for workflow comparison
- Check Semantic Search for search fundamentals
- Review Advanced Usage for power-user techniques
- Explore CLI Reference for all flags and options
