gameocr.com DECOMMISSIONED
Upload a screenshot of a game, get back JSON with the recognized text and per-game context.
What it did
You uploaded a screenshot of a game, and the API returned JSON with the recognized text plus per-game context (item affixes, stats, etc.). Only Diablo 2: Resurrected was supported.
Why I built it
I wanted a clean programmatic way to read in-game UI text. Off-the-shelf OCR struggled with stylized fonts and game backgrounds, so I trained my own model. I wrote about that part here: Training OCR for Diablo 2.
Architecture
This was built before AI coding tools were a thing, so the whole pipeline was hand-rolled. It also had a slightly weird shape, because the heavy work was running on my Synology NAS at home rather than in the cloud.
From memory (this is roughly right, not exact):
- Client uploads a screenshot.
- A Django web server on Hetzner takes the request.
- It forwards the image to proxyx, a small firewall and size gate I wrote, running on a (back then free) fly.io pod.
- proxyx pushes the image through a Cloudflare tunnel into my home network.
- A Celery worker on my Synology NAS picks it up, runs the OCR model, and returns the JSON back up the chain.
Routing the uploads to the NAS kept the compute bill at basically zero, which was the whole point of the architecture.
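As a rough illustration of the kind of check a size gate like proxyx could run before forwarding anything through the tunnel, here is a minimal sketch. This is not the original proxyx code: the 5 MiB cap, the PNG/JPEG signature check, and the names MAX_UPLOAD_BYTES, sniff_image_type, and gate are all assumptions made for the example.

```python
from typing import Optional, Tuple

# Hypothetical cap; the real limit used by proxyx is not documented here.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024

# Leading bytes of the two formats a screenshot is most likely to arrive in.
IMAGE_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}


def sniff_image_type(payload: bytes) -> Optional[str]:
    """Return 'png' or 'jpeg' if the payload starts with a known signature."""
    for name, signature in IMAGE_SIGNATURES.items():
        if payload.startswith(signature):
            return name
    return None


def gate(payload: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> Tuple[bool, str]:
    """Decide whether an upload may be forwarded to the home network."""
    if not payload:
        return False, "empty upload"
    if len(payload) > max_bytes:
        return False, "upload too large"
    if sniff_image_type(payload) is None:
        return False, "not a PNG or JPEG"
    return True, "ok"
```

Rejecting oversized or non-image payloads at this hop means junk traffic never reaches the tunnel or the NAS, which is the cheapest place to drop it.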
Under the hood
- Pay-per-OCR with three wallet types. Each user had a free, premium, and promo wallet. New signups got 15 credits via a signal handler on user creation. The transaction manager deducted credits at the moment a screenshot was queued, with rollback on processing failure so nobody got billed for OCR runs that crashed.
- Polymorphic per-game models. D2RScreenshot subclassed a base Screenshot via django-polymorphic, so adding another game would have meant a subclass and a parser, not surgery on the core code. The architecture was set up to expand beyond Diablo 2.
- Diablo 2 output structure. The API returned more than a flat string. A successful parse included character life and mana, mercenary info, item detection with bounding-box pixel coordinates, and the in-game clock time, all scoped per game.
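The charge-on-queue, refund-on-failure behaviour from the first bullet can be sketched in plain Python, without Django. This is an illustration rather than the original transaction manager: the spend order across wallets, the choice to put the 15 signup credits in the free wallet, and the names Account, debit, refund, and queue_and_process are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

SIGNUP_CREDITS = 15  # stated in the post
# Assumed spend order; the post does not say which wallet was drained first.
SPEND_ORDER = ("free", "promo", "premium")


class InsufficientCredits(Exception):
    """Raised when the three wallets together cannot cover the cost."""


def _new_wallets() -> Dict[str, int]:
    # Assumption: signup credits land in the free wallet.
    return {"free": SIGNUP_CREDITS, "promo": 0, "premium": 0}


@dataclass
class Account:
    wallets: Dict[str, int] = field(default_factory=_new_wallets)

    def balance(self) -> int:
        return sum(self.wallets.values())

    def debit(self, cost: int) -> List[Tuple[str, int]]:
        """Take credits in SPEND_ORDER and return what was taken from where."""
        if cost > self.balance():
            raise InsufficientCredits(f"need {cost}, have {self.balance()}")
        taken: List[Tuple[str, int]] = []
        remaining = cost
        for name in SPEND_ORDER:
            amount = min(self.wallets[name], remaining)
            if amount:
                self.wallets[name] -= amount
                taken.append((name, amount))
                remaining -= amount
        return taken

    def refund(self, taken: List[Tuple[str, int]]) -> None:
        """Put credits back into the wallets they came from."""
        for name, amount in taken:
            self.wallets[name] += amount


def queue_and_process(account: Account, run_ocr: Callable[[], Any], cost: int = 1) -> Any:
    taken = account.debit(cost)  # charged at the moment the job is queued
    try:
        return run_ocr()
    except Exception:
        account.refund(taken)  # a crashed OCR run is not billed
        raise
```

Recording which wallet each credit came from lets the refund restore the exact pre-charge state instead of dumping everything back into a single wallet.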
Why it’s dead
I shut it down slightly before the LLM/vision wave made this kind of pipeline trivial. By then I had also lost interest in figuring out the rest of the botting stack for D2:R, and an alternative solution came out shortly afterward that covered the same ground.