Why This Exists
Generative video has made it trivially easy to produce striking images. It hasn't made it any easier to tell a coherent story.
Working with these tools, I kept hitting the same problems:
- Visual style drifting across scenes
- Characters who subtly stop looking like themselves
- Emotional beats that never quite land
- Getting seduced by spectacle at the expense of narrative
- Burning days on iteration without structural progress
- Confusing "generating more options" with "editing"
Traditional film language took over a century to develop. Synthetic cinema is being figured out in real time, and there's no shared grammar yet.
This is my attempt to write one down — or at least, to write down the constraints that have actually helped me make better work. It's not anti-experimentation. It's a recognition that without some structure, generative tools default to noise.
Beyond Imitation
This isn't about cheaply replicating traditional filmmaking. I'm not interested in building cut-rate digital substitutes for real production.
What excites me is the stuff these tools can do that nothing else can:
- Impossible camera movement
- Environments that evolve in real time
- Audio-reactive imagery
- Interactive narrative structures
- Remixable worlds
- Fluid, unstable point of view
The goal isn't imitation — it's expansion. I use structure to protect the story so I can explore what conventional cinema physically can't do.
Principles
- Meaning over beauty.
  The machine can generate an endless supply of stunning visuals. I've learned that unless an image advances the story and serves the beat, it's worthless, however gorgeous it looks. The story is sovereign.
- The script has to carry intent.
  A traditional screenplay isn't enough. AI doesn't read subtext; it executes literally. I now write emotional and visual intent explicitly into the script, because if there's any ambiguity, the machine will find the least interesting interpretation.
- Human performance anchors emotion.
  When prompts fail to land a crucial emotional beat (and they will), I don't burn hours regenerating. I step in. A recorded vocal take or a physical performance reference can guide the machine in ways that no prompt manages alone.
- Stress-test the aesthetic early.
  A visual style is a technical commitment. Before I lock anything, I test it in motion, in wide shots, with multiple characters. If it can't be reproduced consistently and efficiently, I discard it. This has saved me from beautiful dead ends more times than I can count.
- Lock identity before you generate.
  In a medium where everything morphs, character identity has to be fixed. I fully design every character before a single scene is generated. No improvising identity mid-project: that way lie drift and continuity nightmares.
- Impose geography.
  AI doesn't understand space. I define camera position, character blocking, sightlines, and light direction before generation. Sketches, diagrams, 3D blockouts: the format doesn't matter. What matters is a coherent, repeatable spatial layout that the machine can't silently rearrange.
- One frame anchors the scene.
  Before I generate any motion, I lock a single definitive frame. This establishes lighting, tone, and the reality of the scene. Everything generated after has to obey that anchor.
- Singular authority, locked workflow.
  I've seen what happens when creative authority is diffused and everyone invents their own pipeline mid-production. It's chaos. The workflow gets standardized before production begins. Assets, models, prompt structures: all locked. No one invents new processes once we're rolling.
- Generation is deliberate.
  Generation is a production act, not a slot machine. Thinking happens before the machine is turned on. I plan rigorously so that when a meaningful accident appears, I can recognize it and use it instead of drowning in options.
- Resist the drift.
  Characters and worlds must not slowly mutate over time. Object permanence is a promise to the audience. Every time I've let it slide, thinking "it's close enough," I've regretted it.
- Truth beats accuracy.
  If an unpredictable generation captures the right emotion, it's valid. A flawed image that feels true is better than a flawless image that feels empty. This one took me a while to trust.
- Embrace meaningful chaos.
  The medium is inherently unstable. Texture and atmosphere will breathe and shift. That's fine; sometimes it's beautiful. But the narrative can't. The story stays locked even when the surface moves.
- Don't chase perfection.
  Endless regeneration kills momentum. If the beat lands, the shot is done. Move on. I've lost whole days to the "one more try" loop, and the final result was rarely better than the version I had at hour one.
- Sound is structural, not secondary.
  Sound carries emotion as much as image, sometimes more. I define dialogue, silence, score, and texture at the script stage, not as an afterthought. When AI audio fails (and it does), human performance anchors truth.
- Editing completes the work.
  Generations provide raw material. The edit defines rhythm, pacing, and emotional flow. My job during generation is to deliver clear, consistent footage that serves the script. The structural decisions (timing, momentum, the shape of the whole thing) happen in the edit.
- Explore what only this medium can do.
  This is the most important one. Synthetic cinema isn't a cost-saving replica of traditional film. It introduces capabilities that didn't exist before. I actively push into those spaces. Structure exists to protect the story, not to limit what the form can become.
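Several of the principles above (explicit intent, locked identity, imposed geography, the anchor frame, and a limit on regeneration) amount to writing decisions down before the machine is turned on. As a purely illustrative sketch, and not a description of any particular tool or pipeline, those decisions could be captured as an immutable shot specification in Python. Every name here (`Character`, `ShotSpec`, `unlocked_fields`, `keep_generating`, the file paths, and the default budget of five takes) is hypothetical and chosen only for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Character:
    name: str
    description: str      # full design, written before any scene is generated
    reference_image: str  # hypothetical path to an approved reference image


@dataclass(frozen=True)
class ShotSpec:
    beat: str             # explicit emotional intent, not left to subtext
    characters: tuple[Character, ...]
    camera_position: str
    blocking: str
    light_direction: str
    anchor_frame: str     # hypothetical path to the single locked frame
    max_takes: int = 5    # arbitrary example budget, not a recommendation


def unlocked_fields(spec: ShotSpec) -> list[str]:
    """Return the names of decisions that are still blank."""
    text_fields = ("beat", "camera_position", "blocking",
                   "light_direction", "anchor_frame")
    missing = [name for name in text_fields if not getattr(spec, name).strip()]
    if not spec.characters:
        missing.append("characters")
    return missing


def keep_generating(beat_lands: bool, takes_used: int, spec: ShotSpec) -> bool:
    """Continue only while the beat has not landed and takes remain."""
    return not beat_lands and takes_used < spec.max_takes


# Example with made-up content: one decision is deliberately left blank.
mara = Character("Mara", "mid-40s, grey coat, scar over left eyebrow",
                 "refs/mara_v3.png")
shot = ShotSpec(
    beat="Mara realizes she is alone; quiet dread, no tears",
    characters=(mara,),
    camera_position="low angle from the doorway, facing the window",
    blocking="Mara centre frame, back to camera",
    light_direction="",  # not decided yet
    anchor_frame="frames/sc04_anchor.png",
)
print(unlocked_fields(shot))  # ['light_direction']
```

Freezing the dataclasses is one way to make "locked" literal: changing a character or an anchor frame means building a new spec on purpose rather than editing one in passing. The `keep_generating` check encodes the "if the beat lands, the shot is done" rule alongside a hard ceiling on takes.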
Revision Clause
This medium is evolving fast. These principles are working constraints, not permanent law. As the tools change and I learn more, this framework should change too.
Version 1.1 is a starting point, not a conclusion.