Bob is just getting started. Here's where we're hoping to take him — better animation, faster production, more interactivity, and eventually a proper studio setup running in a shed in South Australia.
The pipeline is running. Episodes are being generated end-to-end with no human involvement after the vote is counted. It works — but it's rough around the edges.
SadTalker is good for what it is, but it only animates the face. The next step is full-body animation — characters that move, gesture, and react physically to the dialogue.
Currently Bob's world is mostly silent except for voices. Real storytelling needs ambient sound — the creak of a pub, the wind across the Birdsville Track, the clunk of a blown tyre.
Right now each episode takes 30-45 minutes to render because every stage runs sequentially. With better hardware and some parallelisation, that should drop to under 10 minutes — meaning same-hour episode release after voting closes.
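The core idea is that scenes don't depend on each other, so they can render concurrently instead of one after another. Here's a minimal sketch of that — `render_scene` is a hypothetical stand-in for one scene's real pipeline work (lip sync, compositing, audio mux), not the actual code:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def render_scene(scene_id: int) -> str:
    """Stand-in for rendering one scene (lip sync, compositing, etc.)."""
    time.sleep(0.05)  # simulate the per-scene work
    return f"scene_{scene_id}.mp4"

def render_episode(scene_ids, workers: int = 4) -> list[str]:
    # Scenes are independent, so a pool can run several at once.
    # Threads are fine for this toy version; real GPU-bound stages
    # would instead be queued across processes or multiple devices.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_scene, scene_ids))

start = time.perf_counter()
clips = render_episode(range(8))
elapsed = time.perf_counter() - start
```

With 4 workers, the 8 simulated scenes finish in roughly a quarter of the sequential time — the same shape of win we're after for the full pipeline.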
The long-term vision is a fully interactive AI story universe. Bob is just the start.
Bob's existence depends on people watching. We're building in mechanics that make that dependence explicit and turn it into part of the story.
Every AI task in the pipeline — lip sync, background removal, image generation, voice synthesis — runs on a single NVIDIA GTX 1060 6GB from 2016. It's a remarkable machine that punches well above its weight, but it's showing its limits.
The GTX 1060 has 1280 CUDA cores and 6GB of VRAM. Newer cards have tensor cores specifically designed for the matrix operations that power these AI models — meaning the same task runs 5-10x faster on modern hardware.
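Tensor cores arrived with NVIDIA's Volta generation (compute capability 7.0); the GTX 1060 is Pascal (6.1), so it misses out. If you want to check what your own card supports, a quick sketch using PyTorch's device-query API looks like this (the function name is mine, not part of any library):

```python
def has_tensor_cores(device: int = 0) -> bool:
    """True if the CUDA device's compute capability is 7.0 or higher,
    the generation (Volta) that introduced tensor cores."""
    try:
        import torch
    except ImportError:
        return False  # no PyTorch installed, so nothing to query
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability(device)
    return major >= 7

print(has_tensor_cores())  # False on a GTX 1060 (compute capability 6.1)
```

On anything RTX-class this returns True, which is exactly the hardware jump that would unlock those 5-10x speedups.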