Agent Code Academy
Week 7 of 12

Testing & Quality

Maintaining code quality at scale

Objective

Understand why testing matters and how to direct Claude to write tests.

Deliverable

Your task manager app with a test suite that runs automatically.

Topics

  • What are tests and why they matter
  • Unit tests, integration tests, end-to-end tests
  • The verification loop: instruct → code → test → verify
  • Using Claude to write tests
  • Reading test output
  • The /code-review workflow

Activities

  • Ask Claude to write tests for your task manager
  • Run the tests and read the output
  • Find a bug → write a test → fix the bug
  • Use /code-review to audit your project
  • Add test commands to CLAUDE.md

Skills You'll Gain

Testing concepts, verification loops, quality assurance


Learning Objectives

By the end of this week, you will be able to:

  1. Explain what a test is and why software projects need them
  2. Distinguish between unit tests, integration tests, and end-to-end tests
  3. Read test output to understand what passed, what failed, and why
  4. Direct Claude Code to write meaningful tests for your application
  5. Follow the red-green-refactor cycle to fix bugs systematically

Lesson

What Is a Test?

A test is code that checks if your other code works correctly. Think of it like a smoke detector — you install it once, and it alerts you whenever something goes wrong, even when you are not paying attention.

Without tests, the only way to know if your app works is to manually click through every feature after every change. That might work when your app has 2 features, but what about 20? Or 200? You will miss something. Tests check automatically, every time, without getting tired or distracted.
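Here is the smallest useful example, as a hedged sketch using Vitest (a test framework covered later in this lesson). The add function is a stand-in for any piece of your app:

import { it, expect } from 'vitest';

// The code being tested: a stand-in for any function in your app
function add(a, b) {
  return a + b;
}

// The test: run the code, then state what the result SHOULD be
it('add should sum two numbers', () => {
  expect(add(2, 3)).toBe(5);
});

If a later change breaks add, this test fails immediately instead of waiting for a user to notice.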

Why Code Breaks

Code breaks for three main reasons:

  1. Edge cases — Situations you did not think of. What happens if a user submits an empty task? What if they paste in 10,000 characters? What if they click "Delete" twice quickly?

  2. Regressions — You fix one thing and accidentally break something else. You change how tasks are saved, and now the search filter stops working. This is the most common type of bug in growing applications.

  3. Typos and logic errors — Simple mistakes like writing > when you meant <, or forgetting to handle the case where the task list is empty.

Tests catch all three. Once you write a test for a feature, every future change is checked against that test automatically.
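To make the edge-case idea concrete, here is a hedged sketch of two such tests. It assumes an addTask function that throws on an empty title; your implementation may handle this differently:

import { it, expect } from 'vitest';
// Hypothetical import path: adjust to match your project
import { addTask } from '../src/tasks';

// Edge case: empty input
it('should reject an empty task title', () => {
  expect(() => addTask([], '')).toThrow();
});

// Edge case: very long input
it('should accept a 10,000-character title', () => {
  const longTitle = 'x'.repeat(10000);
  expect(addTask([], longTitle).title).toBe(longTitle);
});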

Types of Tests

There are three main types, each checking your app at a different level:

Unit Tests — Testing One Brick

A unit test checks one small piece of code in isolation. Like testing a single LEGO brick to make sure it is the right shape and color.

Test: "formatDate should return a human-readable date"
Input: "2026-02-10T09:00:00Z"
Expected: "February 10, 2026"

Unit tests are small, fast, and numerous. A project might have hundreds of unit tests.
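In code, the formatDate test above might look like this (a sketch; the function and its import path are assumptions):

import { it, expect } from 'vitest';
// Hypothetical module: adjust the path to your project
import { formatDate } from '../src/format';

it('formatDate should return a human-readable date', () => {
  expect(formatDate('2026-02-10T09:00:00Z')).toBe('February 10, 2026');
});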

Integration Tests — Testing Bricks Together

An integration test checks that multiple pieces work correctly together. Like testing that two LEGO bricks actually snap together properly.

Test: "Adding a task should save to the database and appear in the task list"
Steps: Call the API to create a task → Query the database → Check the task exists

Integration tests are slower but catch problems that unit tests miss — like miscommunication between your front end and your API.
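A hedged sketch of the integration test described above. The api and db modules are hypothetical stand-ins for your real API layer and database client:

import { it, expect } from 'vitest';
import { api } from '../src/api'; // hypothetical API client
import { db } from '../src/db';   // hypothetical database client

it('adding a task saves it to the database', async () => {
  // Step 1: call the API to create a task
  const created = await api.createTask({ title: 'Buy groceries' });

  // Step 2: query the database directly
  const saved = await db.findTaskById(created.id);

  // Step 3: check the task exists with the right data
  expect(saved.title).toBe('Buy groceries');
});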

End-to-End Tests — Testing the Whole House

An end-to-end (E2E) test simulates a real user using your app. Like moving into the LEGO house and checking if the doors open, the lights turn on, and the plumbing works.

Test: "A user can sign up, create a task, mark it complete, and delete it"
Steps: Open the browser → Fill in signup form → Click Create Task → Check the checkbox → Click Delete → Verify it is gone

E2E tests are the slowest but most realistic. They catch bugs that only appear when everything runs together.
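E2E tests are written with a browser-automation tool. The sketch below assumes Playwright (an assumption; the lesson does not prescribe a tool), and the URL and selectors are hypothetical:

import { test, expect } from '@playwright/test';

test('a user can create, complete, and delete a task', async ({ page }) => {
  await page.goto('http://localhost:3000'); // assumed dev server address

  // Create a task through the real UI
  await page.fill('input[name="title"]', 'Buy groceries'); // hypothetical selector
  await page.click('button:has-text("Create Task")');

  // Mark it complete, then delete it
  await page.check('input[type="checkbox"]');
  await page.click('button:has-text("Delete")');

  // Verify the task is gone
  await expect(page.getByText('Buy groceries')).toHaveCount(0);
});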

The Testing Pyramid

          /\
         /  \       End-to-End
        / E2E\      (few, slow, realistic)
       /------\
      /        \    Integration
     / Integr.  \   (some, medium speed)
    /------------\
   /              \  Unit
  /   Unit Tests   \ (many, fast, focused)
 /__________________\

Most of your tests should be unit tests (fast, cheap), some should be integration (medium), and a few should be E2E (slow, expensive). This is called the "testing pyramid."

The Red-Green-Refactor Cycle

This is a systematic way to fix bugs and add features using tests:

  1. Red — Write a test that describes what SHOULD happen. Run it. It fails (red) because the feature does not exist or the bug is present. This proves your test actually detects the problem.

  2. Green — Write the minimum code to make the test pass. Run it. It passes (green). The feature works or the bug is fixed.

  3. Refactor — Clean up the code you just wrote, making it more readable or efficient. Run the tests again to make sure they still pass.

RED:      Write a failing test     →  ✗ FAIL (expected)
GREEN:    Write code to fix it     →  ✓ PASS (feature works)
REFACTOR: Clean up the code        →  ✓ PASS (still works)

Example: Your search filter does not handle uppercase letters. A user searches "BUY" but the task "Buy groceries" does not appear.

  1. Red: Write a test — search("BUY") should return "Buy groceries". Run it. It fails.
  2. Green: Fix the search to be case-insensitive. Run the test. It passes.
  3. Refactor: Clean up the search function code. Run the test again. Still passes.
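A sketch of steps 1 and 2 in code. How your search function is actually written is an assumption here; in a real project the fixed function would live in your app code, not the test file:

import { it, expect } from 'vitest';

// GREEN: the minimal fix is to compare lowercased strings
function search(tasks, term) {
  return tasks.filter((task) =>
    task.title.toLowerCase().includes(term.toLowerCase())
  );
}

// RED: this test fails while search is still case-sensitive
it('search should be case-insensitive', () => {
  const tasks = [{ title: 'Buy groceries', completed: false }];
  expect(search(tasks, 'BUY')).toEqual(tasks);
});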

Test Syntax Basics

Most JavaScript testing uses a test framework such as Jest or Vitest. A complete test file looks like this:

// Vitest needs explicit imports; Jest provides these functions as globals
import { describe, it, expect } from 'vitest';
// Hypothetical path: adjust to wherever these functions live in your project
import { addTask, completeTask } from '../src/tasks';

describe('Task Manager', () => {
  it('should add a new task', () => {
    const tasks = [];
    const newTask = addTask(tasks, 'Buy groceries');
    expect(newTask.title).toBe('Buy groceries');
    expect(newTask.completed).toBe(false);
  });

  it('should mark a task as completed', () => {
    const task = { title: 'Buy groceries', completed: false };
    const updated = completeTask(task);
    expect(updated.completed).toBe(true);
  });
});

Reading this:

  • describe — Groups related tests together ("Task Manager")
  • it — One individual test ("should add a new task")
  • expect — States what the result SHOULD be
  • .toBe() — Checks that the actual result matches the expected result
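.toBe() checks simple values. A few other matchers you will see often (these are standard in both Jest and Vitest):

import { it, expect } from 'vitest';

it('demonstrates common matchers', () => {
  const task = { title: 'Buy groceries', completed: false };
  const titles = ['Buy groceries', 'Walk dog', 'Write tests'];

  expect(task).toEqual({ title: 'Buy groceries', completed: false }); // deep equality for objects
  expect(titles).toContain('Walk dog');           // array contains a value
  expect(titles).toHaveLength(3);                 // array has exactly 3 items
  expect(() => JSON.parse('not json')).toThrow(); // the function call throws
});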

Reading Test Output

When you run tests, the output looks like this:

 FAIL  tests/tasks.test.js
  Task Manager
    ✓ should add a new task (3ms)
    ✓ should mark a task as completed (1ms)
    ✓ should delete a task (2ms)
    ✗ should filter tasks by search term (5ms)

  3 passed, 1 failed

  FAIL: should filter tasks by search term
    Expected: ["Buy groceries"]
    Received: []

Green ✓ means pass. Red ✗ means fail. The failure message tells you exactly what went wrong: the search returned an empty array instead of finding "Buy groceries."

How to Prompt Claude for Good Tests

When asking Claude to write tests, be specific:

Write tests for my task manager that cover:
1. Adding a new task (title, default completed=false)
2. Marking a task as completed
3. Deleting a task
4. Searching tasks (case-insensitive)
5. Filtering tasks (all, active, completed)
6. Edge cases: empty title, very long title, special characters

Use Vitest. Put tests in a __tests__ folder.

The more scenarios you describe, the better the tests will be.

Adding Test Commands to CLAUDE.md

Update your CLAUDE.md so Claude knows how to run tests:

## Commands
- `npm run dev` — Start the development server
- `npm run build` — Build for production
- `npm test` — Run all tests
- `npm test -- --watch` — Run tests in watch mode (re-runs on file changes)

This way, Claude can run tests as part of its workflow without you telling it every time.
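For these commands to work, your package.json needs matching scripts. Below is a minimal sketch assuming a Vite + Vitest setup (an assumption; your dev server and build tool may differ). With Vitest, vitest run executes the suite once and exits, while plain vitest stays in watch mode, so a separate test:watch script is one option:

{
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "test": "vitest run",
    "test:watch": "vitest"
  }
}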

Practice Exercises

Exercise 1 (Guided): Your First Test

  1. Ask Claude: "Set up Vitest for my project and write a simple test"
  2. Run the tests: npm test
  3. Read the output — identify which tests passed and which failed (if any)
  4. Ask Claude to add 3 more tests for your task creation function

Verification: Running npm test shows at least 4 passing tests. The output clearly shows test names and results.

Exercise 2 (Independent): Red-Green-Refactor

Goal: Use the red-green-refactor cycle to fix a real bug.

  1. Find something in your app that does not work correctly (or intentionally introduce a bug — for example, make the search case-sensitive)
  2. Write a test that exposes the bug — it should fail (RED)
  3. Fix the code — the test should now pass (GREEN)
  4. Clean up your code — the test should still pass (REFACTOR)
  5. Commit with a message like: "Fix case-sensitive search with test"

Hints:

  • If you cannot find a real bug, try these intentional ones: search does not handle uppercase, deleting the last task causes an error, empty task titles are allowed

Verification: Git log shows a commit with a test fix. Running npm test shows all tests passing.

Exercise 3 (Challenge): Comprehensive Test Suite

Ask Claude to write a comprehensive test suite covering:

  • All CRUD operations (create, read, update, delete tasks)
  • Search and filter functionality
  • Edge cases (empty inputs, special characters, very long inputs)
  • API route responses (correct status codes, error handling)
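If you are unsure what the last bullet asks for, here is a hedged sketch of one API route test. It assumes the dev server is already running on localhost:3000, a route that rejects empty titles with a 400, and Node 18+ (which provides fetch globally); your routes and validation rules may differ:

import { it, expect } from 'vitest';

it('POST /api/tasks rejects an empty title with a 400', async () => {
  const res = await fetch('http://localhost:3000/api/tasks', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ title: '' }),
  });
  expect(res.status).toBe(400);
});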

Run /code-review afterward and fix any issues Claude identifies.

Target: At least 10 tests covering different scenarios.

Self-Assessment Quiz

1. What is a unit test? Give an example from a to-do list app.

2. Explain the red-green-refactor cycle in your own words.

3. What is a regression, and how do tests prevent one?

4. In test output, what does a failing test tell you?

5. Why should you add test commands to your CLAUDE.md file?

Answers:

  1. A unit test checks one small piece of code in isolation. Example: testing that the addTask function creates a new task with the correct title and completed set to false.

  2. Red-green-refactor is a three-step cycle: (1) Write a test that fails, proving the bug exists or the feature is missing (red). (2) Write the minimum code to make the test pass (green). (3) Clean up the code while keeping the test passing (refactor).

  3. A regression is when fixing or changing one thing accidentally breaks something else. Tests prevent regressions because once you write a test for a feature, that test runs every time you make any change. If your change breaks the feature, the test immediately fails and alerts you.

  4. A failing test shows: (1) The test name — what was being tested, (2) The expected result — what should have happened, (3) The actual result — what actually happened. This tells you exactly where and how the code is wrong.

  5. Adding test commands to CLAUDE.md tells Claude how to run your tests. Claude can then automatically run tests as part of its workflow — for example, running tests after making changes to verify nothing broke. Without these commands in CLAUDE.md, you would need to tell Claude how to run tests every session.