TheFieldNotes
I Built a Local Multi-Agent AI Dev Team on a Budget GPU

by Akshay
AI Agents, Local LLMs, Qwen2.5-Coder

I built a local multi-agent AI development system to see whether small open-source models could collaborate on software tasks on low-end hardware. I ran three separate agents on an older NVIDIA GTX 1650 graphics card with 4GB of VRAM. Basic tasks compiled and recovered from their own errors, but the system failed entirely when pushed slightly past standard textbook logic.

The Multi-Agent Setup

The architecture relies on three specialized agents working in a sequential loop to execute feature requests. I restricted the code output specifically to Java to keep the testing environment predictable.

[Feature Request]
       │
       ▼
1. Product Agent  ──► Generates specifications
       │
       ▼
2. Developer Agent ◄─┐ Writes Java code
       │             │
       ▼             │ (Feedback loop: maximum 3 attempts)
3. QA Agent       ───┘ Reviews and verifies code
       │
       ▼
[Final Code / Mark Failed]

The system operates step-by-step:

  1. Product Agent: Defines the functional and technical specifications based on the initial feature prompt.

  2. Developer Agent: Interprets those specifications and writes the Java code.

  3. QA Agent: Reviews the code and checks for syntax or logical execution issues.

The Developer and QA agents communicate directly with each other. If the QA Agent detects a compilation error, it sends the error output back to the Developer Agent, which attempts a rewrite for the QA Agent to re-evaluate. To avoid infinite loops, the system stops and marks the task as failed if it cannot produce working code within three attempts.
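The retry logic described above can be sketched in a few lines of Java. This is a minimal illustration, not the project's actual orchestration code: the names `FeedbackLoopSketch`, `QaReview`, and `run`, and the idea of passing the agents in as plain functions, are all assumptions made for the example. In the real system each function would wrap a call to the local model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical result of one QA pass: accepted, or rejected with error text. */
final class QaReview {
    final boolean passed;
    final String feedback;

    QaReview(boolean passed, String feedback) {
        this.passed = passed;
        this.feedback = feedback;
    }
}

/** Hypothetical sketch of the Developer/QA loop with a three-attempt cap. */
final class FeedbackLoopSketch {
    static final int MAX_ATTEMPTS = 3; // the article's three-try limit

    /**
     * developer: (specification, errors seen so far) -> Java source text
     * qa:        Java source text -> review result
     * Returns the accepted source, or empty when every attempt was rejected.
     */
    static Optional<String> run(String specification,
                                BiFunction<String, List<String>, String> developer,
                                Function<String, QaReview> qa) {
        List<String> errorLog = new ArrayList<>();
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            // Hand the developer a copy so it cannot mutate the log.
            String code = developer.apply(specification, new ArrayList<>(errorLog));
            QaReview review = qa.apply(code);
            if (review.passed) {
                return Optional.of(code);
            }
            errorLog.add(review.feedback); // fed into the next rewrite
        }
        return Optional.empty(); // caller marks the task as failed
    }
}
```

Note that the error log grows with every rejected attempt, so each rewrite prompt is longer than the last. That growth is worth keeping in mind for small models with limited context.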

To make this run locally on budget hardware, I used the 3-billion-parameter Qwen2.5-Coder model. The model occupies roughly 2.4GB, so it fits within the card's 4GB of VRAM alongside the orchestration script.

What Worked

For fundamental data structures and algorithms, the local pipeline worked reliably and, running entirely offline, responded quickly.

When prompted with "Generate Bubble Sort", the Product Agent generated the specification, the Developer Agent wrote the array sorting function, and the QA Agent verified the execution path without errors. Manual verification confirmed that the generated class sorted an unsorted array correctly.
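For reference, this is the kind of class that task calls for. It is a standard textbook bubble sort written for this article's discussion, not the model's actual output, and the class name `BubbleSortSketch` is made up for the example.

```java
import java.util.Arrays;

/** Illustrative textbook bubble sort; not the code the agents generated. */
final class BubbleSortSketch {

    /** Sorts the array in place in ascending order. */
    static void bubbleSort(int[] values) {
        for (int pass = 0; pass < values.length - 1; pass++) {
            boolean swapped = false;
            // After each pass the largest remaining element has moved to the end.
            for (int i = 0; i < values.length - 1 - pass; i++) {
                if (values[i] > values[i + 1]) {
                    int tmp = values[i];
                    values[i] = values[i + 1];
                    values[i + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) {
                return; // no swaps means the array is already sorted
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        bubbleSort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 4, 5, 8]
    }
}
```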

The system also built a singly linked list successfully. During this run, a validation error occurred. The system routed the error back to the Developer Agent, which triggered the built-in error memorization routine. With the bug state logged for reference, the Developer Agent corrected the syntax on the next loop. The final single-file output included methods for head insertion and element removal, and it passed the internal testing phase.
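A single-file list with those two operations might look roughly like the sketch below. Again, this is an illustration written for this article rather than the generated output; the class and method names (`SinglyLinkedListSketch`, `insertAtHead`, `remove`) and the choice of `int` elements are assumptions.

```java
/** Illustrative singly linked list with head insertion and removal by value. */
final class SinglyLinkedListSketch {

    private static final class Node {
        final int value;
        Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node head;

    /** Inserts a value at the front of the list in constant time. */
    void insertAtHead(int value) {
        head = new Node(value, head);
    }

    /** Removes the first node holding the value; returns false if it is absent. */
    boolean remove(int value) {
        if (head == null) {
            return false;
        }
        if (head.value == value) {
            head = head.next;
            return true;
        }
        Node prev = head;
        while (prev.next != null) {
            if (prev.next.value == value) {
                prev.next = prev.next.next; // unlink the matching node
                return true;
            }
            prev = prev.next;
        }
        return false;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.value);
            if (n.next != null) {
                sb.append(", ");
            }
        }
        return sb.append("]").toString();
    }
}
```

Inserting 1, 2, and 3 at the head in that order leaves the list reading 3, 2, 1, since each new element goes in front of the previous ones.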

What Failed

The system collapsed when tasked with a basic, non-standard application. I asked it to build a text-based game called "Guess the Word" where the program selects a random word, strips out the consonants, shows only the vowels, and reads user input to evaluate the guess.
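To show how small the core of this game is, here is a hand-written sketch of its logic. It is not output from the agents. One assumption is labeled in the code: consonants are replaced with underscores so the word length stays visible, whereas the prompt only said to strip them out. The names `GuessTheWordSketch`, `maskConsonants`, `pickWord`, and `isCorrect` are invented for the example.

```java
import java.util.List;
import java.util.Random;

/** Illustrative core logic for the "Guess the Word" game; hand-written, not generated. */
final class GuessTheWordSketch {

    /**
     * Hides every character that is not a vowel.
     * Assumption: hidden letters become '_' rather than being deleted outright.
     */
    static String maskConsonants(String word) {
        StringBuilder masked = new StringBuilder();
        for (char c : word.toCharArray()) {
            boolean isVowel = "aeiouAEIOU".indexOf(c) >= 0;
            masked.append(isVowel ? c : '_');
        }
        return masked.toString();
    }

    /** Picks one word at random from a non-empty list. */
    static String pickWord(List<String> words, Random random) {
        return words.get(random.nextInt(words.size()));
    }

    /** Compares the guess to the secret, ignoring case and surrounding spaces. */
    static boolean isCorrect(String secret, String guess) {
        return secret.equalsIgnoreCase(guess.trim());
    }
}
```

A full program would wrap these in a loop that prints the masked word and reads guesses with `java.util.Scanner` on `System.in`. With this masking rule, `maskConsonants("planet")` yields `__a_e_`.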

The model stalled during generation and remained stuck on the Developer Agent step. I restarted the setup and submitted the exact same prompt multiple times, but it consistently failed to produce a complete program.

Even after I adjusted the model file to increase the context window size, the 3-billion-parameter model could not process the longer history of error logs. It began hallucinating code structures mid-generation because its context capacity was insufficient for the overhead of the agent dialogue loops.

Verdict

Running a multi-agent development team on a 3B local model is useful for introductory data structures, core algorithms, and simple conditional and loop logic. That is a narrow band: push the system past standard textbook logic and it collapses, because a model small enough to fit in 4GB of VRAM cannot carry the context that the agent loop generates. A larger model would likely handle it better, but that is a different hardware conversation entirely.

Resources

Full code in the YouTube video description. Run it, break it, see where your GPU hits its limit.