AI-Assisted Reviews with GitHub Copilot

In my last article, I argued that engineering teams should fix the review process before scaling code generation. AI review tools won’t solve the capacity problem, but they handle what humans struggle to sustain: speed, consistency, and pattern recognition.

This is the practical follow-up. Copilot code review has added path-specific instructions, agentic tool calling, a CLI code-review agent, and local IDE review since going GA. Most teams install it, leave the defaults, and wonder why it floods their PRs with noise. Configuration separates a tool the team ignores after a month from one that sticks.


Instruction Files: Two Layers

Copilot’s instruction system has two layers. Repository-wide instructions live in .github/copilot-instructions.md and apply to every review. Path-specific instructions live in .github/instructions/ and target file patterns through YAML frontmatter.

Encode your universal standards in the repository-wide file. Write short, imperative directives rather than paragraphs. Copilot processes “Flag hardcoded API keys or credentials” far more reliably than “Please be careful to look for any secrets that might have been accidentally committed.”

GitHub’s own guide is explicit: vague directives like “be more accurate” add noise that confuses the LLM.

<!-- .github/copilot-instructions.md -->

# Code Review Instructions

## Security
- Flag hardcoded API keys, tokens, or credentials
- Check that user input is validated before use
- Verify that sensitive data is not logged or exposed in error messages

## Error Handling
- Verify errors are handled, not silently ignored
- Check that error messages provide useful context for debugging
- Flag empty catch blocks

## Quality
- Flag functions longer than 40 lines
- Flag deeply nested logic (more than 3 levels)
- Check for missing nil/null guards on optional values

## Do Not Comment On
- Code formatting or style (handled by linters)
- Import ordering
- Trailing whitespace or line length

Keep this file under 1,000 lines. The “Do Not Comment On” section matters. Every low-value comment competes for attention with the ones that count. If your linters already handle formatting, tell Copilot explicitly.

Path-specific instructions target parts of the codebase where generic rules fall short. Each file uses an applyTo frontmatter property with glob patterns. The excludeAgent property controls which Copilot agent reads which file, so you can run different rules for the code review agent and the coding agent.

<!-- .github/instructions/views.instructions.md -->
---
applyTo: "**/Views/**/*.swift"
excludeAgent: "coding-agent"
---

# View Layer Review Standards

- Flag views exceeding 30 lines of body content
- Check that views do not make network calls directly
- Verify accessibility modifiers are present on interactive elements
- Flag any business logic in view code; it belongs in the view model
- Check that navigation is handled through the coordinator, not inline

<!-- .github/instructions/tests.instructions.md -->
---
applyTo: "**/Tests/**/*.swift"
---

# Test Review Standards

- Verify each test method tests a single behavior
- Check that test names describe the expected behavior
- Flag tests without assertions
- Flag any test that hits the network or file system without mocking
- Check that test data is created within the test, not shared across tests

Organize by concern: views, networking, models, tests, security. GitHub recommends separating topics into distinct instruction files rather than cramming everything into one. A typical iOS project:

.github/
├── copilot-instructions.md          # Repository-wide standards
└── instructions/
    ├── views.instructions.md         # View layer conventions
    ├── networking.instructions.md    # API and networking patterns
    ├── models.instructions.md        # Data model conventions
    ├── tests.instructions.md         # Testing standards
    └── security.instructions.md      # Security-specific checks

Linters Handle Rules, AI Handles Judgment

Don’t let Copilot duplicate what linters already catch. Linters enforce formatting, flag syntax errors, and check naming conventions deterministically. They’re fast, consistent, and produce zero false positives for well-defined rules.

Copilot’s strength is semantic analysis: logic correctness, edge cases, security vulnerabilities, and fuzzy checks that depend on context. Can a function name communicate its intent? Does the error handling account for this specific situation? Does this change duplicate logic from another module? Linters can’t make those calls.

The Goose project maintainers discovered this when they enabled Copilot code review and found the results too noisy. The fix was telling Copilot exactly what the CI pipeline already covers:

## CI Pipeline Context

Important: You review PRs before CI completes.
Do not flag issues that CI will catch.

### What Our CI Checks
- cargo fmt --check
- cargo test
- clippy lints
- npm run lint:check

## Skip These (Low Value)
Do not comment on:
- Style/formatting (handled by rustfmt, prettier)
- Clippy warnings
- Test failures
- Missing dependencies

They also set a confidence threshold: “Only comment when you have HIGH CONFIDENCE (>80%) that an issue exists. Be concise: one sentence per comment when possible.” That alone cut the noise dramatically.

The filtering pipeline looks like this: linters catch mechanical issues in CI, Copilot handles semantic analysis, human reviewers focus on architecture, business logic, and mentoring.

Tuning: Start Minimal and Iterate

Research from Cubic found that up to 40% of AI code review alerts get ignored. A developer who receives fifteen low-value comments on their first AI-reviewed PR will ignore comment sixteen, even if it’s the one that matters.

Start with five to ten rules that address your most common review feedback. Add rules one at a time and observe the results. If Copilot flags something your team doesn’t care about, add an explicit exclusion.

Show, don’t describe. Copilot is better at mimicry than interpretation. Telling it “prefer protocol-based dependency injection” may or may not flag violations. Show it a concrete example of the wrong approach alongside the right one, and accuracy improves noticeably.

## Dependency Injection

Prefer protocol-based dependency injection over concrete types.

Bad:
class ProfileViewModel {
    let service = UserService()
}

Good:
class ProfileViewModel {
    let service: UserServiceProtocol
    init(service: UserServiceProtocol) {
        self.service = service
    }
}

Watch for hallucinations. Copilot will invent concerns that don’t exist in the code, and vague instructions make this worse. The more specific your directives, the less room the model has to fabricate.

GitHub warns against including external links (Copilot won’t follow them) or requesting product behavior changes like blocking merges or altering comment formatting. Stick to what it can do: analyze code and leave comments.

Review Before the PR Exists

Most teams overlook Copilot’s ability to review locally, before code reaches a pull request.

In the CLI, the /review slash command analyzes staged or unstaged changes without leaving the terminal. Start an interactive copilot session in your project directory and run /review. Copilot delegates to a specialized code-review agent that focuses on surfacing genuine issues rather than style nitpicks. It reads the same .github/copilot-instructions.md from your repository.

# Start an interactive Copilot session in your project
$ copilot

# Review current changes (staged or unstaged)
> /review

# Target a specific branch diff and focus area
> /review Review changes in my current branch against main. Focus on security issues.

The /review command runs inside an interactive session, not as a standalone flag. You can specify what to focus on. The code-review agent can also run in parallel with other specialized agents (Explore, Task, Plan), so a complex debugging session might analyze code, run tests, and review changes concurrently.

In VS Code, open the Source Control view, hover over “Changes,” and click “Copilot Code Review.” Copilot reviews staged or unstaged changes and leaves inline comments using the same instruction files from your repository.

In JetBrains, open the Commit tool window and select “Copilot: Review Code Changes” to get feedback before committing.

This shifts feedback earlier, while the code is still fresh in the developer’s mind. Cleaner PRs follow, and human reviewers stop wasting time on issues Copilot could have caught locally.

Put Review Logic in Skills, Not Just Instructions

Agent Skills go deeper than instruction files. Skills are folders containing instructions, scripts, and resources that Copilot loads when relevant. They work across VS Code, the CLI, and the coding agent. Where instruction files provide guidelines, skills enable specialized workflows with procedural knowledge and deterministic scripts.

There’s a practical reason to prefer skills over instruction files for the bulk of your review logic: portability. Instruction files in .github/copilot-instructions.md are GitHub-specific. They work with Copilot and nothing else. Skills follow the agentskills.io open standard, which means the same skill folder works across any tool that supports the format. If your team uses Claude Code alongside Copilot, or switches between Cursor and the CLI, review standards encoded as skills travel with you. Instruction files don’t.

Keep the repository-wide instruction file thin. Use it for Copilot-specific behavior: what to skip, confidence thresholds, response format. Move the substantive review logic, your team’s conventions, security checks, and architecture rules into skills that any agentic tool can consume.

You can get the best of both worlds by using path-specific instruction files to point Copilot toward portable skills. Since instruction files and skills are both natural language that the LLM reads and follows, an instruction file can reference a skill by name or path, and Copilot will load and use it. This gives you path-scoped triggering from instruction files with portable review logic in skills:

<!-- .github/instructions/swiftui-review.instructions.md -->
---
applyTo: "**/Views/**/*.swift"
---
When reviewing SwiftUI views, use the swiftui-review skill
for team conventions and accessibility checks.
Flag any UIKit usage in SwiftUI view files.

The instruction file stays thin: a few lines of path-specific context. The skill holds the detailed review logic and works across Copilot, Claude Code, and any other tool that supports the agentskills.io standard.

A code review skill for your team might encode the checks that matter most:

code-review-skill/
├── SKILL.md
├── SECURITY.md
└── scripts/
    └── check_conventions.sh

<!-- code-review-skill/SKILL.md -->
---
name: team-code-review
description: >
  Review code changes against team conventions and security standards.
  Use when asked to review code, PRs, or diffs.
---

# Code Review Skill

## Review Process
1. Run scripts/check_conventions.sh on changed files
2. Review SECURITY.md for security-specific checks
3. Evaluate logic correctness and edge case handling
4. Check for duplicated logic across the codebase

## What to Flag
- Functions exceeding cyclomatic complexity of 10
- Missing error handling on network calls
- Force unwraps in production code (test code is fine)
- Direct dependency on concrete implementations

## What to Skip
- Formatting issues (handled by SwiftLint)
- Import ordering
- Minor naming preferences

The scripts/check_conventions.sh script handles deterministic checks that don’t need an LLM. Token generation is probabilistic; scripts are deterministic. Use scripts for validation (lint checks, schema conformance), transformation (reformatting, boilerplate updates), and integration (API calls with specific auth requirements).
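As a rough illustration, a script like check_conventions.sh could encode two of the checks the skill lists above. This is a hypothetical sketch, not a published script; the regex, thresholds, and file paths are assumptions:

```shell
# Hypothetical sketch of scripts/check_conventions.sh.
# The checks and thresholds below are illustrative assumptions.

check_file() {
  file="$1"
  # Deterministic check 1: force unwraps outside test code
  case "$file" in
    *Tests*) ;;  # the skill allows force unwraps in test code
    *) grep -nE '[A-Za-z0-9_]!' "$file" | sed "s|^|FORCE_UNWRAP $file:|" ;;
  esac
  # Deterministic check 2: naive length check for top-level functions
  awk -v f="$file" '
    /func /       { start = NR }
    /^}/ && start { if (NR - start > 40) print "LONG_FUNC", f ":" start
                    start = 0 }
  ' "$file"
}

# Demo on a small sample file
cat > /tmp/Sample.swift <<'EOF'
func load() {
    let user = cache.value!
}
EOF
check_file /tmp/Sample.swift
```

The point is the division of labor: the script flags mechanically detectable patterns with zero token cost, and the model spends its judgment on the fuzzy checks in the "What to Flag" list.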

Skills use progressive disclosure. At startup, the model sees only the skill’s name and description from the YAML frontmatter, roughly 100 tokens. The full instructions load only when the skill is relevant. Additional files like SECURITY.md load on demand, keeping context lean until expertise is needed.

For Copilot specifically, store skills in the repository (.github/skills/) for your team, or in your home directory (~/.copilot/skills/) for personal use across projects. For broader use, keep skills in a shared repository that any tool can reference. They’re just folders, so they work with git and can be shared through whatever mechanism your team already uses.

Automate PR Reviews

Repository rulesets trigger Copilot review automatically. Go to Settings > Rules > Rulesets and create a new branch ruleset. Under “Branch rules,” select “Automatically request Copilot code review.” Two subsettings control the behavior:

Review new pushes re-runs Copilot review when new commits land on the PR, so feedback stays current as the code evolves. Without this, Copilot only reviews once at PR creation.

Review draft pull requests triggers reviews on drafts so authors can iterate with Copilot before requesting human review. This pairs well with local review: catch what you can locally, push a draft, let Copilot do a full pass, then mark ready for review.

Organization owners can apply rulesets across multiple repositories using pattern matching (*feature matches all repository names ending in “feature”). This rolls out Copilot review consistently without per-repo configuration.

Copilot code review also integrates CodeQL for security analysis (enabled by default) and optionally ESLint and PMD. These tools run alongside the AI review, combining deterministic security scanning with probabilistic code analysis.

Know If It’s Working

Atlassian cut PR cycle time by 45% by making Copilot the automated first reviewer on every PR. Their 18-hour average wait for first feedback dropped to minutes. New engineers merged their first PR five days faster.

But faster individual throughput doesn’t guarantee better team outcomes. The 2025 DORA report found that individual developers merged 98% more PRs while organizational delivery stability decreased 7.2%. Cortex’s 2026 benchmark found incidents per PR up 24% and change failure rates up 30% with AI adoption.

What GitHub’s Dashboard Tells You

GitHub’s Copilot usage metrics dashboard, currently in public preview for Enterprise customers, tracks four categories: daily and weekly active users, code completion acceptance rates, chat interactions by mode (Ask, Edit, Agent), and agent adoption percentage. A separate code generation dashboard breaks down lines of code changed by users versus agents, grouped by model and language.

The usage metrics API provides user-level granularity through JSON exports of user_initiated_interaction_count, code_acceptance_activity_count, and lines of code suggested versus accepted. Organization-level analytics arrived in December 2025. Team-level data is accessible through the Copilot Metrics API. All of these derive from IDE telemetry, so users must have telemetry enabled to appear in reports.
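Once you have an export, turning those two counts into a per-user acceptance ratio is a one-line jq job. The field names below are the ones the API exposes; the surrounding array-of-users shape is an assumption about the export format, shown with sample data:

```shell
# Sketch: per-user acceptance ratio from a usage-metrics export.
# The count fields come from the API; the array shape and sample
# values are illustrative assumptions.
report=$(jq -r '.[]
    | "\(.login): \((.code_acceptance_activity_count
        / .user_initiated_interaction_count * 100 | floor))%"' <<'EOF'
[
  {"login":"ada","user_initiated_interaction_count":120,"code_acceptance_activity_count":54},
  {"login":"lin","user_initiated_interaction_count":40,"code_acceptance_activity_count":6}
]
EOF
)
echo "$report"
```

A user with many interactions and a low ratio, like the second one here, is the "interacts frequently but rarely accepts suggestions" case worth following up on.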

This data answers adoption questions well. You can see who’s using Copilot, how often, in which IDEs, and with which models. You can spot teams with low adoption or users who interact frequently but rarely accept suggestions. For a rollout, these are useful leading indicators.

What It Doesn’t Tell You

None of GitHub’s built-in metrics track code review activity. No dashboard shows how many PRs Copilot reviewed, how many comments it left, how often authors accepted or dismissed suggestions, or the ratio of actionable feedback to noise. Nothing links Copilot usage data to PR workflow outcomes such as cycle time, change failure rate, or reviewer load distribution.

That’s a significant gap. These metrics measure whether people use Copilot, not whether it helps. Acceptance rate for code completions tells you suggestions are relevant. It says nothing about whether the code that ships is better or whether reviewers spend less time on routine checks.

Measuring What Matters

For code review, you’ll need to instrument your own measurements. GitHub’s Pull Request API and Reviews API provide the raw data. Track these before enabling AI review and compare after.

PR cycle time. Time from PR creation to merge. This is the headline metric. If AI review works, the wait for first feedback drops and the overall cycle compresses.

Reviewer load distribution. How many reviews each team member performs. AI review should flatten the curve, reducing the burden on the one or two senior engineers who currently review everything.

Actionable comment rate. How many AI comments developers address versus dismiss. This is the best signal for instruction quality. If the team ignores most of Copilot’s feedback, the instructions need work, not the team.

Change failure rate. Deployment failures or incidents tied to merged PRs. If this increases alongside faster cycle times, you’re trading quality for speed. The DORA and Cortex findings suggest this is the default outcome without deliberate quality gates.

Combine GitHub’s usage metrics API with your PR data for a fuller picture. Correlating code_acceptance_activity_count with PR cycle time per user reveals whether developers who engage more with Copilot also ship faster, or just generate more code that sits in review.

Run a focused experiment. Pick a two-week sprint. Measure current cycle times and reviewer load. Enable Copilot review on routine code first, where a false positive costs little and the team can calibrate without pressure. If the numbers improve, expand. If they don’t, tune the instructions before scaling.
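The cycle-time baseline for that experiment can be sketched with the gh CLI and jq. The jq filter is the substantive part; the gh invocation is commented out because it needs an authenticated repository, and the sample data stands in for its --json output:

```shell
# Median and p90 PR cycle time, in hours, from createdAt/mergedAt pairs.
CYCLE_FILTER='
  map((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601))
  | map((. / 3600) | floor) | sort
  | {median_hours: .[(length / 2 | floor)],
     p90_hours:    .[((length * 0.9) | floor)]}
'

# Real usage (requires an authenticated gh session):
#   gh pr list --state merged --limit 200 --json createdAt,mergedAt \
#     | jq "$CYCLE_FILTER"

# Demo on sample data standing in for the gh output:
baseline=$(jq -c "$CYCLE_FILTER" <<'EOF'
[
  {"createdAt":"2025-01-06T09:00:00Z","mergedAt":"2025-01-06T15:00:00Z"},
  {"createdAt":"2025-01-07T09:00:00Z","mergedAt":"2025-01-08T09:00:00Z"},
  {"createdAt":"2025-01-08T10:00:00Z","mergedAt":"2025-01-08T13:00:00Z"}
]
EOF
)
echo "$baseline"
```

Run it before enabling Copilot review and again after the two-week pilot; comparing the two JSON snapshots is the whole before/after measurement.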

Start Simple

The instinct is to write exhaustive instruction files covering every convention your team has discussed. Resist it. Start with the repository-wide file and five to ten rules. Add path-specific instructions only after the base is stable. Introduce skills when workflows justify the complexity. Expand based on what Copilot gets wrong.

The teams getting consistent value from AI code review share one trait: they treat configuration as a living document, not a one-time setup. The bottleneck was never writing code. It was proving the code works.


Implementation Checklist

Baseline

Measure current PR cycle time across the team, from creation to merge. Record the median and 90th percentile.

Identify your reviewer load distribution. Who reviews the most PRs? How does your top reviewer’s load compare to the team average?

Document your current change failure rate. Without a baseline, you won’t know whether AI review improves or degrades quality.

Confirm your linting and CI pipeline catches formatting, syntax, and known anti-patterns. AI review should never duplicate what deterministic tools already handle.

Configure

Create .github/copilot-instructions.md with five to ten rules targeting your most common review feedback. Keep it under 1,000 lines. Include a “Do Not Comment On” section for anything linters already cover.

Add path-specific instruction files in .github/instructions/ only after the base file is stable. Organize by concern: views, networking, models, tests, security.

Include concrete code examples showing correct and incorrect patterns. Copilot is better at mimicry than interpretation.

Set a confidence threshold. Tell Copilot to comment only when confidence is high and to keep comments concise.

Pilot

Enable Copilot review on routine code first, where false positives cost little. Use repository rulesets to automate review on a subset of branches or repositories.

Run the pilot for at least two weeks. Collect feedback from PR authors and human reviewers on comment quality and noise.

Track the actionable comment rate. If the team ignores most feedback, revise the instructions before expanding.

Encourage local review in the IDE or CLI before pushing. Cleaner PRs mean less noise for both AI and human reviewers.

Expand

Once the pilot is stable, enable automatic review on draft PRs so authors iterate with AI feedback before requesting human review.

Add skills for workflows complex enough to justify them: security checks, architectural conventions, team-specific patterns. Store them in a shared repository so they work across Copilot, Claude Code, Cursor, and other agentic tools.

Apply rulesets across repositories using organization-level pattern matching.

Monitor GitHub’s usage metrics dashboard for adoption trends. Correlate code_acceptance_activity_count with your PR cycle time data to see whether engagement translates to faster delivery.

Sustain

Treat instruction files as living documents. When Copilot flags something your team ignores, add an exclusion. When it misses something important, add a directive.

Review metrics monthly against your baseline. If the change failure rate climbs alongside faster cycle times, tighten quality gates before scaling further.

Verify senior engineers spend less time on repetitive checks and more on architecture, business logic, and mentoring. That outcome justifies the investment.

Never commit code you can’t explain.
