MCPs, Agents, Skills. Oh My!
Understanding the emerging ecosystem of LLM building blocks
LLMs are, without question, an impressive achievement, built at a scale few companies have the resources to pull off. When first introduced, though, they had very limited practical applications. They’re unaware of the world after their training cutoff and unable to connect to tools that would let them accomplish real-world tasks. Ask a model to check your calendar, query a database, or search for recent news, and it can only apologize or hallucinate. The model has general intelligence but no hands to work with.
The ecosystem has since grown to include tools that connect models to the real world, moving them beyond a static understanding. Most notable among these are MCPs, agents, and skills. Each solves a specific limitation, building toward AI tools that can actually get work done.
MCP: Connecting Models to the World
Model Context Protocol gives LLMs the ability to connect to external tools and data sources. Through MCP, a model can access your file system, query databases, call APIs, and search the web. The protocol standardizes these connections, enabling tool integrations to work across different models and platforms. Build an MCP server once, and it works with Claude, GPT, and any other model that supports the protocol.
This was a game-changer. MCP quickly became a de facto standard adopted across model providers and tools. Suddenly, AI assistants could do things rather than just discuss them.
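To make that concrete, here’s a minimal MCP server sketch using the FastMCP helper from the official MCP Python SDK; the news-search tool is a hypothetical stub, not a real integration:

# Minimal MCP server sketch (FastMCP, from the official MCP Python SDK).
# The tool body is a hypothetical stub for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def search_news(query: str, limit: int = 5) -> list[str]:
    """Search recent headlines for a query (stubbed here)."""
    return [f"Result {i} for {query!r}" for i in range(limit)]

if __name__ == "__main__":
    mcp.run()  # serves the protocol over stdio; any MCP-capable client can connect

Because the protocol, not the model, defines the interface, the same server works with every client that speaks MCP.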
But MCP brought its own problems. Every tool definition loads into the context window upfront, whether you need it or not. Connect several MCP servers, and you might burn 50,000+ tokens before the conversation even starts. This exacerbates issues with context windows, leaving less room for the actual work and accelerating context rot, the degradation in model performance as token counts climb.
Agents: Isolating Context
Agents address the context window pressure by allowing subprocesses to work with isolated context windows. When a lead agent spawns a subagent, that worker gets a fresh context, separate from the main conversation. The subagent might consume 100,000 tokens investigating a problem, but returns only a condensed summary. The main thread stays clean.
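In code, the pattern looks roughly like this. A minimal sketch, assuming a complete callable that maps a chat-style message list to a reply (any provider’s API fits that shape):

# Sketch of context isolation between a lead agent and a subagent.
# `complete` is an assumed stand-in for any chat-completion API.
from typing import Callable

Message = dict[str, str]

def run_subagent(complete: Callable[[list[Message]], str], task: str) -> str:
    scratch: list[Message] = [{"role": "user", "content": task}]
    # ...tool calls and reasoning accumulate here, possibly 100k tokens...
    scratch.append({"role": "user", "content": "Condense your findings into a short summary."})
    return complete(scratch)  # only this summary leaves the isolated context

def lead_agent(complete: Callable[[list[Message]], str]) -> None:
    history: list[Message] = [{"role": "user", "content": "Investigate the flaky test suite."}]
    summary = run_subagent(complete, history[-1]["content"])  # fresh, separate context
    history.append({"role": "assistant", "content": summary})  # main thread stays lean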
This alleviates much of the pressure from context rot, but agents remain general-purpose rather than specialized. An agent knows how to reason and use tools, but it doesn’t know your company’s specific processes, conventions, or requirements. Every agent needs the same detailed instructions repeated, and there’s no good way to share that expertise across your team.
Skills: Lightweight Specialization
This is where skills come into play. Skills package procedural knowledge, the “how we do things here” that agents lack, into lightweight, reusable modules. A skill is just a folder containing instructions, workflows, and optional scripts.
code-review-skill/
├── SKILL.md # Main instructions with metadata
├── CHECKLIST.md # Review criteria for your team
└── scripts/
    └── lint_check.py # Deterministic validation

Skills are only fully loaded on demand. At startup, the model sees just the skill’s metadata, roughly 100 tokens describing what it does. The full instructions load only when the model determines the skill is relevant. This progressive disclosure keeps the context window lean until expertise is actually needed.
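Progressive disclosure is easy to picture in code. A rough sketch of a loader, assuming each SKILL.md begins with the YAML frontmatter shown later in this piece (the parsing is deliberately naive):

# Sketch of progressive disclosure: cheap metadata scan at startup,
# full instructions loaded only when a skill becomes relevant.
from pathlib import Path

def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Naively parse the YAML frontmatter block at the top of SKILL.md."""
    _, frontmatter, _body = skill_md.read_text().split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def scan_skills(skills_dir: Path) -> dict[str, dict[str, str]]:
    """Startup pass: only the ~100-token descriptions enter the context."""
    return {p.parent.name: read_frontmatter(p)
            for p in skills_dir.glob("*/SKILL.md")}

def load_skill(skills_dir: Path, name: str) -> str:
    """Deferred pass: the full instructions load on demand."""
    return (skills_dir / name / "SKILL.md").read_text()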
Skills can also leverage scripts for deterministic tasks, where running code is more efficient than general reasoning. When you need exact behavior, like applying a specific code transformation or validating against a checklist, the skill runs a script rather than asking the model to generate the logic each time. Token generation is probabilistic; scripts are deterministic.
Skills can be used in combination with MCPs and agents, but they provide significant value on their own. You don’t need a complex multi-agent setup to benefit. A single model with the right skills can handle sophisticated workflows that would otherwise require elaborate prompting or custom tooling.
Deep Dive: Working with Skills
Let’s look at how to derive practical value from skills in engineering workflows.
What Makes a Good Skill
The best skills encode knowledge that would take an outsider weeks to absorb: your team’s code review standards, your deployment procedures, your API design conventions. This institutional knowledge typically lives in wikis nobody reads or in the heads of senior engineers. Skills make it actionable.
A skill should have a clear, narrow purpose. “Code review” is better than “help with development.” The model needs to know when to activate the skill, and vague descriptions lead to false matches or missed opportunities.
Include enough context that the skill works without additional explanation. If your code review skill references your team’s error handling patterns, include those patterns in the skill rather than assuming the model knows them.
Structure and Organization
Every skill starts with a SKILL.md file containing YAML frontmatter and markdown instructions:
---
name: api-design
description: Design REST APIs following team conventions
version: 1.0.0
---
# API Design Skill
## Conventions
- Use plural nouns for resource names
- Version APIs in the URL path (/v1/resources)
- Return 201 for successful creation, 200 for updates
...

Additional markdown files can provide specialized guidance. A code review skill might use separate files for security, performance, and style reviews. The model loads these as needed based on the task.
Scripts go in a scripts directory. These handle operations where deterministic execution matters more than flexibility. A skill for database migrations might include a script that validates migration files against your schema conventions before the model even looks at the content.
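Such a script can be short. A hypothetical sketch, where the NNNN_description.sql naming rule is an invented convention standing in for your schema standards:

#!/usr/bin/env python3
# Hypothetical migration-file validator; the naming convention it
# enforces (NNNN_description.sql) is an invented team rule.
import re
import sys
from pathlib import Path

MIGRATION_NAME = re.compile(r"^\d{4}_[a-z0-9_]+\.sql$")

def validate(migrations_dir: str) -> int:
    bad = [p.name for p in Path(migrations_dir).glob("*.sql")
           if not MIGRATION_NAME.match(p.name)]
    for name in bad:
        print(f"FAIL: {name} does not match NNNN_description.sql")
    return 1 if bad else 0

if __name__ == "__main__":
    sys.exit(validate(sys.argv[1] if len(sys.argv) > 1 else "migrations"))

The exit code gives the model an unambiguous pass/fail signal before it spends any tokens reasoning about the file contents.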
code-review-skill/
├── SKILL.md ─────────── Instructions + YAML metadata
│                        (loaded when skill activates)
├── CHECKLIST.md ─────── Team-specific review criteria
│                        (loaded when needed)
├── SECURITY.md ──────── Security-focused guidance
│                        (loaded for security reviews)
└── scripts/
    └── lint_check.py ── Deterministic validation
                         (executed, not generated)

Where Scripts Add Value
Scripts shine for three types of tasks:
Validation: Check that the generated code meets structural requirements before presenting it. Lint checks, schema validation, and format verification.
Transformation: Apply consistent changes that would be tedious to describe in natural language. Reformatting imports, updating boilerplate, applying code style rules (sketched below).
Integration: Connect to external systems where precise interactions are required. API calls with specific authentication, database queries with exact syntax, and file operations with particular permissions.
The model handles reasoning and judgment. Scripts handle precision and reliability.
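As a sketch of the transformation category, here’s a deliberately simplified script that sorts and dedupes a file’s leading import block; it assumes the imports sit at the very top of the file:

# Simplified transformation sketch: normalize a leading import block
# deterministically instead of asking the model to regenerate it.
# Assumes imports sit at the very top of the file.
import sys

def sort_imports(path: str) -> None:
    lines = open(path).read().splitlines(keepends=True)
    i = 0
    while i < len(lines) and lines[i].startswith(("import ", "from ")):
        i += 1
    with open(path, "w") as f:
        f.writelines(sorted(set(lines[:i])) + lines[i:])

if __name__ == "__main__":
    sort_imports(sys.argv[1])

Run it twice and you get the same file both times; ask a model for the same rewrite and you don’t get that guarantee.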
Sharing and Versioning
Skills are just folders, so they work with your existing tools. Store them in Git for version control. Share them through Google Drive or your team’s documentation system. The agentskills.io open standard means skills you create work across platforms that support the format.
For team adoption, start with skills that codify processes you’re already documenting: onboarding checklists, incident response procedures, release processes. These have clear value, and the content already exists in some form.
Measuring Impact
Track token consumption before and after skill adoption. The theoretical gains are significant (skills loading ~5,000 tokens versus MCP tools loading 60,000+), but your specific workflows will vary.
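Back-of-envelope, using those illustrative numbers and an assumed 200k-token context window:

# Rough context-budget math; every number here is assumed, not measured.
CONTEXT_WINDOW = 200_000   # assumed model context size
MCP_TOOL_DEFS = 60_000     # tool schemas loaded upfront
SKILL_METADATA = 100 * 10  # ~100 tokens each for ten installed skills
ACTIVE_SKILL = 5_000       # one full skill loaded on demand

print(f"MCP-heavy setup: {CONTEXT_WINDOW - MCP_TOOL_DEFS:,} tokens left for work")
print(f"Skills setup:    {CONTEXT_WINDOW - SKILL_METADATA - ACTIVE_SKILL:,} tokens left")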
More importantly, track whether the model’s outputs improve. Are code reviews catching the issues your team cares about? Are the generated APIs following your conventions? Skills should produce noticeably better results for your specific context, not just save tokens.
Putting It Together
The full stack works like this:
- LLM: Provides reasoning and generation capabilities
- MCP: Connects the model to external tools and data sources
- Agents: Enable parallel processing with isolated context windows
- Skills: Supply domain expertise and deterministic procedures
You don’t need all layers for every workflow. A model with skills but no MCP connections can still provide substantial value for tasks that don’t require access to external tools. Skills layered on MCP give you both connectivity and expertise. Agents add parallel processing when tasks genuinely benefit from isolation.
For most engineering teams, skills offer the highest return for the lowest complexity. Start there. Add agents when you have workflows that clearly need parallel, isolated execution. Expand MCP connections as you identify tools the model needs to access.
Looking Ahead
In the course of a year, we’ve seen MCP, agents, and skills emerge as standards for getting more out of LLMs. Each solves a real limitation: MCP connects models to tools, agents manage context isolation, and skills provide lightweight specialization.
It will be interesting to see what this year brings.