Build a Code Assistant
Fine-tune a model on your codebase, conventions, and internal APIs.
This guide walks through creating a fine-tuned model that understands your codebase, follows your team's conventions, and knows your internal APIs.
What you'll need
- Source code — your actual codebase (or key parts of it)
- Documentation — READMEs, API docs, architecture docs, style guides
- Code review comments (optional) — examples of feedback your team gives
Step by step
Select your training data
Don't upload your entire monorepo. Be selective (a filtering sketch follows these lists):
High-value data:
- Core modules and libraries your team wrote
- Internal API documentation and OpenAPI specs
- Style guides and coding conventions
- Architecture decision records (ADRs)
- Well-written code with good comments
- Code review discussions (from PRs)
Skip:
- Generated code (build artifacts, lock files)
- Third-party dependencies
- Test fixtures and mock data (unless you want the model to write tests)
- Binary files
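To make the selection repeatable, you can script it. Here's a minimal sketch in Python; the skip lists and file extensions are assumptions to adjust for your own repo layout:
from pathlib import Path

# Hypothetical exclusions -- adjust these to your repo layout.
SKIP_DIRS = {"node_modules", "dist", "build", ".git", "__pycache__"}
SKIP_FILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}
INCLUDE_SUFFIXES = {".ts", ".tsx", ".md"}  # source and docs only

def candidate_files(root: str):
    """Yield files worth bundling, skipping generated code and dependencies."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if any(part in SKIP_DIRS for part in path.parts):
            continue  # third-party dependencies and build artifacts
        if path.name in SKIP_FILES or path.suffix not in INCLUDE_SUFFIXES:
            continue  # lock files, binaries, everything else
        yield path

for f in candidate_files("src"):
    print(f)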
Prepare the files
Concatenate related files into logical groups and save as .txt or .md:
# Example: bundle your API layer
find src/api -name "*.ts" -exec cat {} + > api-layer.txt
# Example: bundle your core library
find packages/core/src -name "*.ts" -exec cat {} + > core-lib.txt
Also include documentation separately:
- README.md files
- API documentation (.md or .txt)
- Style guide (if you have one)
Remove any secrets, API keys, credentials, or connection strings before uploading.
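A quick pattern scan catches obvious leaks before you upload. This is a minimal sketch; the regexes are illustrative, and a dedicated scanner such as gitleaks or trufflehog will catch far more:
import re
import sys
from pathlib import Path

# Illustrative patterns only -- extend with the key formats your stack uses.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|secret|password|token)\s*[:=]\s*\S+"),
    re.compile(r"(?i)[a-z]+://[^/\s:]+:[^@\s]+@"),      # URLs with embedded credentials
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private keys
]

def scan(path: Path) -> None:
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            print(f"{path}:{lineno}: possible secret: {line.strip()[:80]}")

for arg in sys.argv[1:]:
    scan(Path(arg))
Run it over each bundle before uploading, e.g. python scan_secrets.py api-layer.txt core-lib.txt (the script name is just an example).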
Create the fine-tune
Upload your files and describe the use case:
"Create a coding assistant specialized in our [language/framework] codebase. It should follow our coding conventions: [list key conventions, e.g., 'we use TypeScript with strict mode, prefer functional components, use Tailwind for styling']. It knows our internal API at [briefly describe your API structure]. When writing code, it should match the patterns in the training data."
Recommended base model: GPT-4.1 (best code understanding and generation quality)
Test with real tasks
Try the kinds of tasks your team does daily:
- "Write a new API endpoint for [feature] following our conventions"
- "Refactor this function to match our style guide: [paste code]"
- "How does our authentication middleware work?"
- "Write a unit test for the UserService.createUser method"
- "What's the right way to add a new database migration in our project?"
The model should produce code that looks like it was written by your team.
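You can turn these prompts into a repeatable smoke test. Here's a sketch using the OpenAI Python client, assuming your fine-tune is served behind an OpenAI-compatible chat completions endpoint; the model ID and prompts are placeholders:
from openai import OpenAI

client = OpenAI()

SMOKE_TESTS = [
    "Write a new API endpoint for password reset following our conventions",
    "How does our authentication middleware work?",
    "Write a unit test for the UserService.createUser method",
]

for prompt in SMOKE_TESTS:
    response = client.chat.completions.create(
        model="your-code-model-id",  # your fine-tuned model's ID
        messages=[{"role": "user", "content": prompt}],
    )
    # Eyeball each answer: does it match your conventions and real APIs?
    print(f"=== {prompt}\n{response.choices[0].message.content}\n")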
Integration ideas
IDE extension / CLI tool
Call the API from a script or custom extension:
from openai import OpenAI

client = OpenAI()

def ask_code_assistant(question: str, code_context: str = "") -> str:
    messages = [{"role": "user", "content": question}]
    if code_context:
        # Prepend the relevant code as a system message so the model can
        # ground its answer in the snippet you care about.
        messages.insert(0, {
            "role": "system",
            "content": f"Here's the relevant code context:\n\n{code_context}"
        })
    response = client.chat.completions.create(
        model="your-code-model-id",
        messages=messages,
    )
    return response.choices[0].message.content
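For example (the file path and endpoint name are hypothetical):
# Hypothetical file and endpoint -- substitute your own.
code_context = open("src/api/users.ts").read()
print(ask_code_assistant("Add pagination to the listUsers endpoint", code_context))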
Code review bot
Feed PR diffs to your model for automated review suggestions:
def review_code(diff: str) -> str:
    return ask_code_assistant(
        f"Review this code change and suggest improvements "
        f"based on our team's conventions:\n\n{diff}"
    )
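To try it on a real change, feed it a diff straight from git (the base branch here is an assumption; use whatever your team branches from):
import subprocess

# Diff the current branch against main; swap in your team's base branch.
diff = subprocess.run(
    ["git", "diff", "main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout
print(review_code(diff))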
Tips
- Update regularly — as your codebase evolves, retrain with current code
- Include examples of good and bad code — if you have style guide violations and corrections, include both
- Don't include everything — a focused model trained on well-written code is better than one trained on every file in the repo