LocalLlama Server
Winter Project 2025/26
ALUMINUM AND PLASTIC
The frame is built from 4040 aluminum extrusion and 3D-printed brackets. I downloaded the corners from Thingiverse or Printables; the rest I designed for this project in TinkerCAD, so everything else is custom. A little way into the project I invested in a filament dryer for the PETG spools and boy oh boy, it's one of the best $50 I ever spent. You can see the difference in print quality if you look at the two photos of the feet, below.
CREATING THE INTERNAL STRUCTURE
All the parts are black PETG. Threaded inserts are brass heat-press. Feet are squash balls pressed into PETG holders. The EVGA PSU is just for sizing; the real PSU (Super Flower Leadex 2800W) was in use while the frame got built. The GPUs are hooked up via PCIe -> MCIO -> PCIe to maintain PCIe Gen5 x16 while running long lengths of cable. Risers would be useless here.
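For a rough sense of what "PCIe Gen5 x16" means in numbers, here is a back-of-the-envelope sketch. It only uses the published Gen5 signalling rate (32 GT/s per lane) and 128b/130b encoding; the function name is made up for illustration, and real-world throughput will be lower once protocol overhead is counted.

```python
# Theoretical PCIe bandwidth per direction. Illustrative only: this ignores
# packet (TLP) headers and flow-control overhead, so real numbers are lower.

GEN5_GT_PER_S = 32.0             # PCIe 5.0 raw signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie_peak_gb_s(lanes: int, gt_per_s: float = GEN5_GT_PER_S) -> float:
    """Peak one-way bandwidth in GB/s for a link with the given lane count."""
    usable_gbit_per_s = lanes * gt_per_s * ENCODING_EFFICIENCY
    return usable_gbit_per_s / 8  # bits -> bytes


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        print(f"Gen5 x{lanes}: {pcie_peak_gb_s(lanes):.1f} GB/s per direction")
```

A full x16 link works out to roughly 63 GB/s each way on paper, and an x8 link gets half of that.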
GPU MOUNTS
The idea is that the GPUs vent out the sides of the frame. To do that I made custom brackets.

The CPU is an AMD EPYC 9B45 128-core Zen5 with cooling from a Silverstone AIO system with upgraded Corsair RS120 Max fans. System RAM is 768GB in 12x 64GB DDR5 6400 MT/s Samsung ECC RDIMMs. SSDs are in RAID: 2x 4TB Samsung 9100 PRO and 2x 8TB Samsung 9100 PRO, plus a shitty old 512GB SATA SSD doing boot device duty. The big black SSD heatsinks are silly and I love them.

I added DRAM shrouds with 92mm Noctua Chromax fans because the RDIMMs were overheating and throttling. Now they stay chilly and fast. If you look carefully you can see the shroud on the 3D printer: the line where a new spool of shitty undried PETG was added is clearly visible!! It was only a prototype so it didn't matter.
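The RAM numbers above are easy to sanity-check. This sketch assumes one DIMM per memory channel (12 channels), a 64-bit (8-byte) data path per channel, and that the DIMMs actually run at their rated 6400 MT/s. None of that is confirmed here, so treat the result as a theoretical ceiling, not a measurement. The function name is hypothetical.

```python
# Theoretical capacity and peak bandwidth for the DDR5 setup.
# Assumptions (not confirmed by the build log): one DIMM per channel,
# 8 bytes per transfer per channel, DIMMs running at the rated speed.

DIMM_COUNT = 12
DIMM_SIZE_GB = 64
TRANSFERS_MT_S = 6400      # mega-transfers per second per channel
BYTES_PER_TRANSFER = 8     # 64-bit data bus per channel (ECC bits excluded)


def ddr_peak_gb_s(channels: int, mt_per_s: int,
                  bytes_per_transfer: int = BYTES_PER_TRANSFER) -> float:
    """Peak bandwidth in GB/s: channels x transfers/s x bytes per transfer."""
    return channels * mt_per_s * bytes_per_transfer / 1000  # MB/s -> GB/s


if __name__ == "__main__":
    print(f"Capacity: {DIMM_COUNT * DIMM_SIZE_GB} GB")
    print(f"Peak bandwidth: {ddr_peak_gb_s(DIMM_COUNT, TRANSFERS_MT_S):.1f} GB/s")
```

On paper that comes to 768 GB and about 614 GB/s. Throttling RDIMMs would fall short of that ceiling, which is a good reason to keep them cool.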
SSDs AND LED MATRIX
The SSDs (the ones without heatsinks) are mounted in the direct airflow of one of the Chromax 200mm fans.

The LED matrix is made from a pair of stacked 32x8 matrices bought from Amazon (https://www.amazon.com/dp/B0B771455N?th=1) with a vibe-coded library for controlling them. It can do scrolling text, static images, and even animated GIFs! The idea is to have it do WOPR-style graphics as it crunches LLMs.

Heat is an issue. When doing training runs this server is basically a 2800W heater, and the ambient temperature of my office quickly rises to the point where the system cannot keep itself cool simply by moving air. I added a portable minisplit to cool the office, for which I needed to design and build a window venting system! It is made of thick insulating board with 3D-printed panels custom-sized to fit my window. Keeps everything nice and temperate.
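To put the "2800W heater" in air-conditioner terms, here is a small conversion sketch. It assumes the worst case where the whole 2800W PSU rating ends up as heat in the room, which is an upper bound rather than a measured draw. The conversion factors are the standard ones (1 W is about 3.412 BTU/hr, and 1 ton of cooling is 12,000 BTU/hr); the function names are made up for illustration.

```python
# Convert an electrical load into the cooling capacity needed to remove it.
# Assumption: every watt drawn becomes heat in the room (worst case).

BTU_PER_HR_PER_WATT = 3.412142
BTU_PER_HR_PER_TON = 12_000


def watts_to_btu_per_hr(watts: float) -> float:
    """Heat output in BTU/hr for a given electrical load in watts."""
    return watts * BTU_PER_HR_PER_WATT


def watts_to_cooling_tons(watts: float) -> float:
    """Same load expressed in tons of refrigeration."""
    return watts_to_btu_per_hr(watts) / BTU_PER_HR_PER_TON


if __name__ == "__main__":
    load_w = 2800  # PSU rating, used here as an upper bound
    print(f"{load_w} W = {watts_to_btu_per_hr(load_w):.0f} BTU/hr "
          f"= {watts_to_cooling_tons(load_w):.2f} tons of cooling")
```

That is roughly 9,550 BTU/hr for the server alone, before counting people, monitors, or sun through the window.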
FINISHED
I'll post some specs of it running LLMs and doing training, batch inference, etc. For now... just server porn.