Build a Code Assistant

Fine-tune a model on your codebase, conventions, and internal APIs.

This guide walks through creating a fine-tuned model that understands your codebase, follows your team's conventions, and knows your internal APIs.

What you'll need

  • Source code — your actual codebase (or key parts of it)
  • Documentation — READMEs, API docs, architecture docs, style guides
  • Code review comments (optional) — examples of feedback your team gives

Step by step

Select your training data

Don't upload your entire monorepo. Be selective:

High-value data:

  • Core modules and libraries your team wrote
  • Internal API documentation and OpenAPI specs
  • Style guides and coding conventions
  • Architecture decision records (ADRs)
  • Well-written code with good comments
  • Code review discussions (from PRs)

Skip:

  • Generated code (build artifacts, lock files)
  • Third-party dependencies
  • Test fixtures and mock data (unless you want the model to write tests)
  • Binary files

Prepare the files

Concatenate related files into logical groups and save as .txt or .md:

# Example: bundle your API layer
find src/api -name "*.ts" -exec cat {} + > api-layer.txt

# Example: bundle your core library
find packages/core/src -name "*.ts" -exec cat {} + > core-lib.txt
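
If you want the bundling step to honor the skip list above, a short Python script gives more control than find. A minimal sketch (the skip lists, paths, and the .ts default are illustrative, not exhaustive):

from pathlib import Path

# Adjust for your repo; these skip lists are illustrative, not exhaustive
SKIP_DIRS = {"node_modules", "dist", "build", ".git"}
SKIP_FILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}

def bundle(src: str, out: str, suffix: str = ".ts") -> None:
    """Concatenate source files under src into one training bundle,
    skipping generated code and dependencies."""
    with open(out, "w") as bundle_file:
        for path in sorted(Path(src).rglob(f"*{suffix}")):
            if set(path.parts) & SKIP_DIRS or path.name in SKIP_FILES:
                continue
            # A path header helps the model tie each snippet to its location
            bundle_file.write(f"\n\n// --- {path} ---\n\n")
            bundle_file.write(path.read_text(errors="ignore"))

bundle("src/api", "api-layer.txt")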

Also include documentation separately:

  • README.md files
  • API documentation (.md or .txt)
  • Style guide (if you have one)

Remove any secrets, API keys, credentials, or connection strings before uploading.
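
A quick scan over each bundle can catch obvious leaks before they leave your machine. A minimal sketch (the patterns are illustrative and far from exhaustive; a dedicated scanner such as gitleaks is more thorough):

import re
import sys
from pathlib import Path

# Illustrative patterns only; extend for your stack (cloud keys, JWTs, ...)
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),           # OpenAI-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key IDs
    re.compile(r"(?i)(password|secret|token)\s*[:=]\s*['\"][^'\"]{8,}"),
    re.compile(r"(?i)postgres(ql)?://\S+:\S+@"),  # credentials in connection strings
]

def scan_file(path: Path) -> list[str]:
    """Return lines in a bundle that look like leaked secrets."""
    hits = []
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append(f"{path}:{lineno}: {line.strip()[:80]}")
    return hits

if __name__ == "__main__":
    # Usage: python scan_secrets.py api-layer.txt core-lib.txt
    findings = [hit for f in sys.argv[1:] for hit in scan_file(Path(f))]
    print("\n".join(findings) or "No obvious secrets found.")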

Create the fine-tune

Upload your files and describe the use case:

"Create a coding assistant specialized in our [language/framework] codebase. It should follow our coding conventions: [list key conventions, e.g., 'we use TypeScript with strict mode, prefer functional components, use Tailwind for styling']. It knows our internal API at [briefly describe your API structure]. When writing code, it should match the patterns in the training data."

Recommended base model: GPT-4.1 (best code understanding and generation quality)

Test with real tasks

Try the kinds of tasks your team does daily:

  • "Write a new API endpoint for [feature] following our conventions"
  • "Refactor this function to match our style guide: [paste code]"
  • "How does our authentication middleware work?"
  • "Write a unit test for the UserService.createUser method"
  • "What's the right way to add a new database migration in our project?"

The model should produce code that looks like it was written by your team.
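
To make these checks repeatable, you can script them as a quick smoke test. A sketch assuming an OpenAI-compatible client and the placeholder model ID used in the integration examples below:

from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible SDK; adjust for your provider

# Tasks your team actually does; eyeball each answer for convention violations
SMOKE_TESTS = [
    "Write a new API endpoint for listing users following our conventions",
    "How does our authentication middleware work?",
    "Write a unit test for the UserService.createUser method",
]

for task in SMOKE_TESTS:
    response = client.chat.completions.create(
        model="your-code-model-id",  # your fine-tuned model's ID
        messages=[{"role": "user", "content": task}],
    )
    print(f"--- {task}\n{response.choices[0].message.content}\n")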

Integration ideas

IDE extension / CLI tool

Call the API from a script or custom extension:

# Assumes an OpenAI-compatible Python SDK; adjust the client for your provider
from openai import OpenAI

client = OpenAI()

def ask_code_assistant(question: str, code_context: str = "") -> str:
    """Ask the fine-tuned model a question, optionally with code context."""
    messages = [{"role": "user", "content": question}]
    if code_context:
        # Prepend the context as a system message so the model reads it first
        messages.insert(0, {
            "role": "system",
            "content": f"Here's the relevant code context:\n\n{code_context}"
        })

    response = client.chat.completions.create(
        model="your-code-model-id",  # your fine-tuned model's ID
        messages=messages,
    )
    return response.choices[0].message.content
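
A thin wrapper turns this into a command-line tool you can call from an editor task. Hypothetical wiring around the helper above:

import sys

# Usage: python assistant.py "Refactor this to match our style guide" < handler.ts
if __name__ == "__main__":
    context = sys.stdin.read() if not sys.stdin.isatty() else ""
    print(ask_code_assistant(sys.argv[1], code_context=context))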

Code review bot

Feed PR diffs to your model for automated review suggestions:

def review_code(diff: str) -> str:
    return ask_code_assistant(
        f"Review this code change and suggest improvements "
        f"based on our team's conventions:\n\n{diff}"
    )
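
In CI, one way to get the diff is the GitHub CLI's gh pr diff command. A sketch reusing review_code from above (the PR-number plumbing is illustrative):

import subprocess

def review_pr(pr_number: int) -> str:
    """Fetch a pull request's diff with the GitHub CLI, then review it."""
    diff = subprocess.run(
        ["gh", "pr", "diff", str(pr_number)],
        capture_output=True, text=True, check=True,
    ).stdout
    return review_code(diff)

print(review_pr(123))  # hypothetical PR number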

Tips

  • Update regularly — as your codebase evolves, retrain with current code
  • Include examples of good and bad code — if you have style guide violations and corrections, include both
  • Don't include everything — a focused model trained on well-written code is better than one trained on every file in the repo
