
Media Creation Curriculum in the AI Era #3 — A Storyboard-Driven AI Video Curriculum

Updated: 2026-05

1. Introduction

This essay is a Q&A between the author and Claude AI about the author’s vision for redesigning a media-production course. As the third installment in the “Media Creation Curriculum in the AI Era” series, it proposes a curriculum that teaches AI video production starting from the storyboard. The materials are:

  • Kiyoshi Yamamoto’s “design-sheet method” (a 15-second short film made with Seedance 2.0)
  • Node-based workflows, exemplified by Higgsfield Canvas

The first installment used Andrew Price’s argument as a starting point to set the overall direction for “Visual Practicum I/II.” The second installment examined what the syllabus would look like if all 180 min × 15 sessions were built on ComfyCloud alone. Building on those, this installment analyzes how the storyboard-driven AI workflow maps onto the traditional director’s flow in film and animation — backed by academic and industry sources — and presents a 180-min × 15-session curriculum plus a compressed 13-session version that incorporates flipped-classroom delivery. The intended audience is art-school students, including beginners with no programming background.

1.1 References

1.2 Reference tutorial videos


2. The materials: the design-sheet method and the node-based canvas

These two are not separate phenomena. They are the same underlying idea at different granularities — one targeting per-shot consistency, the other targeting whole-pipeline reproducibility.

2.1 The design-sheet method

Reference: https://note.com/kiyoshi_yamamoto/n/nea762402afbb

On 2026-05-15, Yamamoto produced a 15-second, five-shot short film “Man and the Precog Dog (tentative)” by handing Seedance 2.0 (BytePlus) just two 4K reference images. The model preserved consistent characters, props, and locations across all shots. Total time from spec to finished cut was about an hour; total cost — including image generation — was $4.08 (~¥630). The three load-bearing components are:

  • Production design sheet: a single 4K image that consolidates a character turnaround, props, set design, and tone guidance.
  • Shot list: shot number / shot size / angle / action / beat, rendered as a tabular image.
  • English labels are mandatory: Seedance reads the English label text inside the image to match each element, so every element on the sheet must first be given a visible, legible label.

In other words: the conventional pre-production deliverables — art-direction packets plus storyboards — are used as-is as the reference input to the AI.
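The shot list described above is essentially a small table, so it can be drafted as data before it is rendered as an image. The following is a minimal Python sketch under stated assumptions: the column names mirror the ones listed above, the beat names follow the five-shot example quoted in Session 4, and the size, angle, and action values are hypothetical placeholders rather than Yamamoto's actual shot list.

```python
from dataclasses import dataclass, astuple, fields


@dataclass(frozen=True)
class Shot:
    number: int
    size: str    # shot size, e.g. WS, MS, CU, OTS
    angle: str   # camera angle, e.g. eye-level, high, low
    action: str  # short English description of what happens
    beat: str    # narrative function of the shot


# Placeholder rows for illustration only (not taken from the film).
SHOTS = [
    Shot(1, "WS", "eye-level", "character enters frame", "setup/anomaly"),
    Shot(2, "MS", "eye-level", "character notices something", "recognition"),
    Shot(3, "CU", "low", "character reacts", "disturbance"),
    Shot(4, "WS", "high", "event happens", "impact"),
    Shot(5, "CU", "eye-level", "final image", "reveal"),
]


def render_table(shots):
    """Render the shots as a fixed-width text table with English headers."""
    headers = [f.name.upper() for f in fields(Shot)]
    rows = [[str(value) for value in astuple(shot)] for shot in shots]
    widths = [
        max([len(header)] + [len(row[i]) for row in rows])
        for i, header in enumerate(headers)
    ]

    def fmt(cells):
        return " | ".join(cell.ljust(width) for cell, width in zip(cells, widths))

    separator = "-+-".join("-" * width for width in widths)
    return "\n".join([fmt(headers), separator] + [fmt(row) for row in rows])


print(render_table(SHOTS))
```

Keeping the shot list as data first makes it easy to revise a single row and re-render the table image, which is the kind of reuse the design-sheet method relies on.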

(Yamamoto notes that existing tools like Higgsfield CLI didn’t fit his needs — “the I2V mode is too strict, doesn’t support 15 seconds, and has no audio” — so he built his own CLI (fal-seedance) in Node.js. It calls the Seedance 2.0 API via fal.ai.)

2.2 Higgsfield Canvas

Reference: https://higgsfield.ai/canvas-intro

Higgsfield Canvas, released in early May 2026, is a workflow environment that lets you connect prompts, reference images, and outputs from various models on an “infinite node-based canvas” — shareable and re-runnable across a team.

As i-scoop.eu summarizes the situation: AI production is not linear. It branches — from reference images, into style trials, character generation, scene generation, motion adjustment, and multiple variations — so a canvas-style approach that visualizes decision-making is well-suited to the work.

Technically, it is the same “pipeline-as-a-node-graph” idea that ComfyUI established on individual local machines, here applied to cloud-hosted commercial models and extended with team-collaboration features.
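To make the "pipeline as a node graph" idea concrete, here is a small Python sketch that models a branching workflow as a dependency graph and derives an execution order from it. The node names are hypothetical and only loosely follow the branching stages listed above; this is not how Higgsfield Canvas or ComfyUI store workflows internally.

```python
from graphlib import TopologicalSorter

# Each key is a node; its value is the set of nodes it depends on.
# Node names are illustrative assumptions.
PIPELINE = {
    "reference_image": set(),
    "style_trial": {"reference_image"},
    "character": {"style_trial"},
    "scene": {"style_trial"},
    "motion": {"character", "scene"},
    "variation_a": {"motion"},
    "variation_b": {"motion"},
}


def run_order(graph):
    """Return one valid execution order (dependencies first).

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


def downstream(graph, changed):
    """Return the nodes that would need re-running after `changed` is edited."""
    stale = {changed}
    for node in run_order(graph):
        if graph[node] & stale:
            stale.add(node)
    return stale - {changed}


print(run_order(PIPELINE))
print(sorted(downstream(PIPELINE, "scene")))
```

The `downstream` helper illustrates why a graph is useful for re-running work: editing one node identifies exactly which later outputs are affected, instead of regenerating everything.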

2.3 What both share

Three things:

  • They treat the “design” — not the result — as the reusable asset.
  • They push consistency control (character, style, tone) upstream into the workflow.
  • They take “the whole pipeline,” not the individual output, as the deliverable.

3. Where we are now: how the traditional director’s workflow maps onto the AI era

The hypothesis worth testing is: “The high-level flow is the same as a traditional director’s workflow; only the material-creation steps have been replaced by AI.” Let me check that.

3.1 The traditional production flow

Reference: arXiv 2504.08296 https://arxiv.org/html/2504.08296v1

According to that paper, film production traditionally consists of three phases:

  • Pre-production: script, storyboard, character design
  • Production: directing, cinematography, and the collaborative work of all departments
  • Post-production: editing, VFX, sound design, mixing

The director is involved across all phases, but the pre-production decisions (script, shot design, look development) are what set the direction of the final product.

3.2 Why the high-level flow is largely unchanged

Three threads of evidence support the hypothesis.

Evidence 1: Process order doesn’t change; the inside of each phase compresses.

Frameo’s writeup observes that AI video workflows span the same phases as traditional production pipelines; what changes is not the order but the failure modes. In traditional film production, time and cost accumulate in logistics (crew, locations, equipment, scheduling); in AI-driven production, they accumulate in decision latency, regenerations, and inconsistencies.

Evidence 2: The director’s judgment stays; the execution compresses.

Drawstory’s previsualization article reports that work that used to take weeks can now be done in hours with AI while creative output quality is held constant. The reason given: “the director still controls every visual decision; AI handles the production work.”

The Monthly Film Festival, citing Adobe’s documentation, argues that AI generation tools fit best inside a traditional film workflow not as a replacement for the director, DP, production designer, or editor, but as a “speed layer” wrapped around them.

Evidence 3: The unit of output shifts from “the cut” to “ingredients + pipeline.”

Ability.ai’s working analysis describes professional workflows that use pre-generated “ingredients” — character reference sheets, depth maps — to control consistency and minimize AI hallucination. Rather than generating from text alone, the approach feeds high-quality inputs (character reference images, lighting-consistent keyframes, storyboard compositions) to the AI video generator — an “ingredients-to-video” approach.

Yamamoto’s design sheet is precisely this: the ingredients made visible as production-design documents.

3.3 But it isn’t quite the same flow

It would be inaccurate to call them identical. The following work items are new — none of them existed in the traditional director’s job:

  • Prompt design and retrieval design (the reference-image selection strategy)
  • Model selection (allocating across Seedance / Veo / Kling / Wan according to their characteristics)
  • Building and maintaining the node graph that defines the pipeline
  • Managing generation variance (quality assurance on the assumption that the same input produces varying results)

Frameo’s framing of the same point: a prompt doesn’t replace structure; it translates structure into a generable instruction. If you use prompts as a substitute for structure, output quality varies unpredictably; if you run prompts against a defined script, results are consistent and reproducible.

In short, the closest framing of the current situation is: “The director’s job hasn’t changed; a new AI-specialist role (prompt engineer / pipeline builder) has been added underneath.” Ability.ai’s description of actual team composition retains the role names — writer, director, cinematographer, animator, editor — while each person’s tools become AI-native.
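The "managing generation variance" item above can be illustrated with a best-of-N loop: generate several takes with different seeds, then keep the ones that pass a quality bar. The sketch below is an assumption-laden stand-in, not a real model call. The `generate` function is hypothetical and returns a pseudo-random score; in practice the score would come from human review or a consistency check.

```python
import random


def generate(prompt, seed):
    """Hypothetical stand-in for a model call.

    Returns a fake quality score in [0, 1) that varies with the seed,
    mimicking the fact that identical prompts yield different results.
    """
    rng = random.Random(f"{prompt}:{seed}")
    return {"seed": seed, "score": rng.random()}


def best_of(prompt, seeds, threshold):
    """Generate one take per seed; return the best take, the accepted takes, and all takes."""
    takes = [generate(prompt, seed) for seed in seeds]
    accepted = [take for take in takes if take["score"] >= threshold]
    best = max(takes, key=lambda take: take["score"])
    return best, accepted, takes


best, accepted, takes = best_of("medium shot, entrance hall", range(5), threshold=0.5)
print(f"kept {len(accepted)} of {len(takes)} takes; best seed = {best['seed']}")
```

Recording the seed alongside each take is the design choice that matters here: it is what lets a good result be traced and, where the model supports it, reproduced.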


4. Why a node-based workflow is worth teaching

Reference: https://dl.acm.org/doi/10.1145/3757372.3771864

The pedagogical case for teaching node-based environments (ComfyUI, Higgsfield Canvas, TouchDesigner, etc.) was made at the SIGGRAPH Asia 2025 Educator’s Forum. The argument runs as follows.

ComfyUI’s node-based interface is more than a technical tool: it functions as a visual language for thinking about the creative process. Students learn to decompose complex creative goals into modular, reusable components, developing artistic and computational thinking in parallel.

This is especially relevant for art-school students. Without learning a programming language, they pick up the software-engineering instincts of systems thinking, reproducibility, and modularity — visually.


5. Coverage comparison: where YouTube tutorials end

Since the Higgsfield Canvas release, operating-instruction tutorials have flooded YouTube. The question is whether they already cover the level a university course would target.

5.1 The typical scope of the videos

The walkthrough videos that have proliferated since early May 2026 have converged in structure and granularity. The typical table of contents looks like:

  • How to read the Canvas screen and the basic node operations (drag, wire connections, node deletion)
  • Soul ID / dropping reference images and generating characters
  • The minimum text-to-image → image-to-video workflow
  • Switching between models (Seedance, Veo, Kling, Wan, etc.)
  • Fork functionality and template sharing
  • Simple worked examples (fashion, product, short clip)

In other words, they’re operating tutorials — Canvas-101.

5.2 How this curriculum relates to them

Phase C of the curriculum below (sessions 8–11) covers essentially the same territory as those videos. At the level of operating proficiency, this curriculum already meets or exceeds what YouTube provides.

But most YouTube tutorials don’t cover the following, all of which this curriculum adds:

  • Structural mapping to the traditional film production flow (Phase A)
  • The discipline of pre-production — script, shot list, design sheet (Phase B)
  • The conceptual background: “designs as reusable assets”
  • A theoretical treatment of the consistency problem
  • Critique, iteration, and integrated production (Phase D)

5.3 The takeaway

Video tutorials teach “how to use Canvas.” They don’t teach “how to become someone who makes films.” The structure of this curriculum is: enclose the YouTube-tutorial level inside Phase C as a single phase, then surround it with its own pedagogical content on either side.

This observation is what justifies the compressed 13-session, flipped-classroom version below.


6. Is a 15-session, 180-minute course the right shape?

6.1 Verdict

Twelve sessions can work. Fifteen has its rationale. Adding flipped-classroom delivery makes thirteen the realistic answer.

6.2 Argument

180 min × 12 sessions (= 36 hours) can minimally cover the following:

  • Production-flow fundamentals (1 session)
  • Pre-production: script and storyboard (2 sessions)
  • The design-sheet method (2 sessions)
  • Node-based workflow fundamentals (2 sessions)
  • Generation and editing (2 sessions)
  • Production exercises (3 sessions)

For the intended audience — art-school students who are also programming beginners — the case for 15 sessions rests on the following:

  • Getting comfortable with a node-based UI itself takes at least two hands-on sessions
  • Consistency control (character, lighting, camera) requires substantial trial-and-error, so dedicated feedback sessions are non-negotiable
  • The final assignment (a 15–30 second short) needs iteration sessions across pre-production, production, and post-production
  • Without a standalone critique session, design education doesn’t reach closure

This essay therefore designs for 15 sessions, and Section 12 presents a 13-session compressed version using flipped classroom delivery.


7. Curriculum overall structure

7.1 Four phases

The 180 min × 15 sessions split into four phases:

  • Phase A: Foundations and orientation (sessions 1–3)
  • Phase B: Pre-production (sessions 4–7)
  • Phase C: Production and pipeline construction (sessions 8–11)
  • Phase D: Integrated production and presentation (sessions 12–15)

Each phase’s purpose:

  • Phase A: Understand the mapping between traditional and AI production flows; survey the tool landscape
  • Phase B: Build the skill of authoring the “designs” — script, shot list, design sheet
  • Phase C: Build pipelines on a node-based canvas; learn to control consistency
  • Phase D: Complete a self-contained short film, individually or in teams, and present it

7.2 Standard session block

Each session uses the following four blocks:

  • Lecture: 45 min
  • Demo: 45 min
  • Hands-on: 75 min
  • Critique: 15 min
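As a quick check on the time budget, the block lengths above can be summed and scaled to the course lengths discussed in this essay. This is a small arithmetic sketch; it assumes every session follows the standard block, which the later production and presentation sessions do not strictly do.

```python
# Minutes per block, as listed in the standard session structure.
SESSION_BLOCKS = {"lecture": 45, "demo": 45, "hands_on": 75, "critique": 15}
SESSION_MINUTES = sum(SESSION_BLOCKS.values())


def course_hours(sessions, minutes=SESSION_MINUTES):
    """Total contact hours for a course of `sessions` sessions."""
    return sessions * minutes / 60


def block_hours(block, sessions):
    """Total hours spent on one block type across the course."""
    return SESSION_BLOCKS[block] * sessions / 60


print(SESSION_MINUTES)                # minutes per session
for n in (12, 13, 15):
    print(n, course_hours(n))         # total hours per course length
print(block_hours("hands_on", 15))    # hands-on hours in the 15-session version
```

Under that assumption, hands-on work takes the largest share of each session, which matches the course's emphasis on trial-and-error.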

8. Phase A: Foundations and orientation (sessions 1–3)

8.1 Session 1 — Orientation and current state

Position the course (in relation to essays #1 and #2) and survey the current state of AI video production.

  • Lecture: The three traditional phases of film/animation production, and where AI now intervenes in each
  • Demo: Watch Yamamoto’s 15-second short and decompose how a $4.08 / one-hour production was assembled
  • Hands-on: Account setup for everyone (Higgsfield, fal.ai, Nano Banana, each student’s editor)
  • Critique: Survey students’ production backgrounds and align the baseline

8.2 Session 2 — Production flow, in full

Lay out the traditional flow in academic terms and make the structural correspondence to the AI flow explicit.

  • Lecture: Summarize arXiv 2504.08296 and map AI’s intervention points across all three phases
  • Demo: Watch 30 minutes of a making-of documentary (short animation or live-action) and identify each step
  • Hands-on: Each student picks a favorite work and writes a phase-decomposition report
  • Critique: Share the decomposition results and standardize terminology

8.3 Session 3 — Tool landscape

Classify the 2026 AI video tool landscape and understand each tool’s coverage.

  • Lecture: Tools by category
    • text-to-image: Nano Banana 2, Flux, Seedream
    • text-to-video / image-to-video: Seedance 2.0, Veo, Kling, Wan
    • node canvases: Higgsfield Canvas, ComfyUI
    • dedicated pre-production tools: Drawstory, Shai, Boords
  • Demo: Run the same prompt through three models and compare the outputs
  • Hands-on: Each student does their own three-model comparison and writes a characteristics table
  • Critique: Share the model-selection criteria across the class

9. Phase B: Pre-production (sessions 4–7)

9.1 Session 4 — Scripting and 15-second composition

Learn screenwriting techniques and “beat” design for short formats.

  • Lecture: Three-act structure, ki-shō-ten-ketsu, five-shot construction (Yamamoto’s example: setup·anomaly / recognition / disturbance / impact / reveal)
  • Demo: Live walk-through of building a five-shot composition from a one-line idea
  • Hands-on: Each student writes a one-line logline and a five-shot composition for a 15-second short
  • Critique: Feedback on the composition

9.2 Session 5 — Shot list and beat design

Learn to record shot size, angle, and action in tabular form.

  • Lecture: The basic notation — MS / CU / WS / OTS, eye-level / high / low — and the emotional effect of each
  • Demo: Decompose Yamamoto’s five-shot table and reverse-engineer why each size and angle was chosen
  • Hands-on: Convert the Session 4 composition into a shot list (tabular form)
  • Critique: Complete English labeling (required for the AI inputs in later sessions)

9.3 Session 6 — Production design sheet I: characters and props

Learn to consolidate a character turnaround and props into a single 4K image.

  • Lecture: The historical role of production design and the requirements as an AI reference image (resolution, English labels, clear separation)
  • Demo: Generate a character turnaround in Nano Banana 2 and composite it with props into a single sheet — the full workflow
  • Hands-on: Each student produces their own character turnaround + props sheet (4K)
  • Critique: Check label placement, resolution, legibility

9.4 Session 7 — Production design sheet II: sets and tone

Consolidate three location shots and a color/material specification onto one sheet.

  • Lecture: Tone-and-manner, mood boards, lighting design
  • Demo: Generate three locations (entrance, living room, exterior) with consistent lighting
  • Hands-on: Finish a design sheet that includes three set-design shots and a tone-specification panel
  • Critique: Verify integration with the Session 6 sheet

10. Phase C: Production and pipeline construction (sessions 8–11)

10.1 Session 8 — Higgsfield Canvas, introduction

Understand the basic operations of a node-based canvas and how nodes connect.

  • Lecture: The four concepts — Node / Wire / Reference / Output
  • Demo: Build the minimum text-to-image → image-to-video workflow on the Canvas
  • Hands-on: Each student reproduces the minimum workflow
  • Critique: Confirm node naming and organization rules

The minimum workflow structure:

  • Text Prompt Node
    • prompt: A woman walking in a Tokyo street at dusk
    • aspect ratio: 16:9
  • Image Generation Node
    • model: Seedream 4.5
    • reference: none
  • Image-to-Video Node
    • model: Seedance 2.0 Fast
    • duration: 5s
    • motion: subtle camera push-in

(Because the audience is programming beginners, introduce this session through the metaphor of “replace your sense of files and folders with nodes.”)
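For instructors who want to hand out the minimum workflow as text, it can be written down as plain data and checked for wiring mistakes. The sketch below uses the parameter values listed above; the dictionary layout, the key names, and the `check_wiring` helper are assumptions made for illustration and do not correspond to any Higgsfield Canvas export format.

```python
# The three nodes of the minimum workflow, in wiring order.
MINIMUM_WORKFLOW = [
    {
        "id": "prompt",
        "type": "text_prompt",
        "inputs": [],
        "params": {
            "prompt": "A woman walking in a Tokyo street at dusk",
            "aspect_ratio": "16:9",
        },
    },
    {
        "id": "image",
        "type": "image_generation",
        "inputs": ["prompt"],
        "params": {"model": "Seedream 4.5", "reference": None},
    },
    {
        "id": "video",
        "type": "image_to_video",
        "inputs": ["image"],
        "params": {
            "model": "Seedance 2.0 Fast",
            "duration_s": 5,
            "motion": "subtle camera push-in",
        },
    },
]


def check_wiring(nodes):
    """Return a list of problems: duplicate ids, or wires from nodes not defined earlier."""
    problems = []
    seen = set()
    for node in nodes:
        if node["id"] in seen:
            problems.append(f"duplicate id: {node['id']}")
        for source in node["inputs"]:
            if source not in seen:
                problems.append(f"{node['id']} reads from undefined node {source}")
        seen.add(node["id"])
    return problems


print(check_wiring(MINIMUM_WORKFLOW))
```

A written version like this also gives students a checklist to compare against when their own canvas does not produce output.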

10.2 Session 9 — Consistency control: Soul ID and design-sheet integration

Learn techniques for securing character and style consistency.

  • Lecture: How Higgsfield Canvas’s Soul ID relates to the design-sheet method. As Drawstory frames it: generative video models can produce beautiful results, but they fail at identity — the face changes between shots, costume details drift, the “same” character looks subtly different in each scene. For campaign work this becomes a problem fast.
  • Demo: Create a Soul ID and generate the same character across multiple scenes
  • Hands-on: Feed the Session 6–7 design sheets into the Canvas, combine with Soul ID, and generate three shots
  • Critique: Identify the points where consistency breaks down — as a group

The node structure:

  • Soul ID Node
    • character name: protagonist_male_40s
    • reference images: 3–5 (front, side, back)
  • Design Sheet Reference Node
    • image: production_design_sheet_4K.png
  • Image Generation Node
    • model: Seedream 4.5
    • reference: wire from the two nodes above
    • prompt: medium shot, entrance hall, eye-level, holding red leash
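The prompt in the node structure above reads like one row of a shot list: size, location, angle, action. A minimal sketch of deriving the prompt from those structured fields follows. The function name and the field order are assumptions; the point is only that the prompt is generated from the shot list rather than written freehand.

```python
def shot_prompt(size, location, angle, action):
    """Build a comma-separated prompt from one shot-list row.

    Raises ValueError if any field is empty, so an incomplete shot-list
    row is caught before it reaches the generator.
    """
    names = ("size", "location", "angle", "action")
    values = (size, location, angle, action)
    missing = [name for name, value in zip(names, values) if not value.strip()]
    if missing:
        raise ValueError(f"missing shot-list fields: {', '.join(missing)}")
    return ", ".join(value.strip() for value in values)


print(shot_prompt("medium shot", "entrance hall", "eye-level", "holding red leash"))
```

Generating prompts this way keeps wording consistent across shots, which removes one avoidable source of drift when diagnosing consistency failures.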

10.3 Session 10 — Multi-shot generation and reference-to-video

Learn to batch-generate multiple shots with reference-to-video models like Seedance 2.0.

  • Lecture: The input specification for reference-to-video and how to hand off a shot-list image
  • Demo: Full workflow — hand Seedance 2.0 two reference images (design sheet + shot list) via fal.ai and generate five shots in a single call. Walk through Yamamoto’s CLI example as well.
  • Hands-on: Each student tries a five-shot generation with their own design sheet
  • Critique: Evaluate inter-shot consistency in the generated output

For reference, the form of Yamamoto’s public fal-seedance CLI:

fal-seedance r2v \
  --image design-sheet/precog_dog_design_sheet_4K.png \
  --image shotlist/shotlist_4K.png \
  --duration 15 \
  --fast \
  --download output/precog_dog_v1.mp4

(Yamamoto records two gotchas:

  • The fal-ai/ prefix in the endpoint name is unnecessary (bytedance/seedance-2.0/reference-to-video is correct).
  • When passing multiple reference images, the --image argument must be repeated (space-separated doesn’t work).)
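The second gotcha is easy to get wrong when the command is assembled by a script. The following Python sketch builds the argument list so that `--image` is repeated once per reference image. It uses only the subcommand and flags visible in the command shown above; it has not been checked against the CLI itself, and the helper names are hypothetical.

```python
import shlex


def r2v_command(images, duration, output, fast=True):
    """Build the argv list for a reference-to-video call.

    Each reference image gets its own --image flag, because a single
    space-separated --image value is reported not to work.
    """
    if not images:
        raise ValueError("at least one reference image is required")
    argv = ["fal-seedance", "r2v"]
    for path in images:
        argv += ["--image", path]
    argv += ["--duration", str(duration)]
    if fast:
        argv.append("--fast")
    argv += ["--download", output]
    return argv


def as_shell(argv):
    """Render argv as a single shell-quoted line for display or logging."""
    return shlex.join(argv)


argv = r2v_command(
    ["design-sheet/precog_dog_design_sheet_4K.png", "shotlist/shotlist_4K.png"],
    duration=15,
    output="output/precog_dog_v1.mp4",
)
print(as_shell(argv))
```

If the CLI is installed, the resulting list could be passed to `subprocess.run(argv, check=True)`; building a list rather than a string avoids quoting problems with file paths.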

10.4 Session 11 — Saving and reusing pipelines

Learn to save and share the workflows you’ve built as reusable assets.

  • Lecture: Higgsfield Canvas’s sharing features. The design lets Soul ID characters, uploaded products, brand references, and prior outputs all be brought in as nodes — leverage that for asset management.
  • Demo: Fork your own workflow and repurpose it for a different project
  • Hands-on: Name and organize each student’s workflow into a shareable state
  • Critique: Standardize the workflow naming convention

11. Phase D: Integrated production and presentation (sessions 12–15)

11.1 Session 12 — Final-assignment brief and pre-production wrap

Lock in the final assignment (a 15–30 second short or commercial) and complete the design sheet.

  • Lecture: Explain the critique rubric (concept / consistency / editing — three axes)
  • Demo: Walk through one reference proposal as the instructor’s demonstration
  • Hands-on: Each student completes the full pre-production pipeline (script → shot list → design sheet)
  • Critique: Individual feedback for everyone

11.2 Session 13 — Generation

Generate the locked-in proposal from Session 12 and finish the first cut.

  • Lecture (brief): Common failure patterns (Yamamoto’s fal-ai/ prefix problem, the --image repetition problem, and similar pitfalls)
  • Hands-on: Generation work for the rest of the session; the instructor rotates as support
  • Critique: First-cut review; lock in the regeneration plan

11.3 Session 14 — Post-production and audio

Bring footage into an editor (Premiere / DaVinci / CapCut), do color grading, add audio.

  • Lecture: Pitfalls of post-producing AI-generated material (frame rate, resolution, audio mismatches)
  • Demo: Edit one sample from scratch
  • Hands-on: Each student edits their own piece
  • Critique: Mid-edit review
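The frame-rate and resolution pitfalls named in the lecture can be caught before editing by comparing clip metadata. The sketch below assumes the metadata has already been collected into dictionaries (for example by reading it off in the editor or with a probing tool). The clip names and values are hypothetical.

```python
def find_mismatches(clips, keys=("fps", "width", "height")):
    """Return {key: set of distinct values} for every key the clips disagree on."""
    mismatches = {}
    for key in keys:
        values = {clip[key] for clip in clips}
        if len(values) > 1:
            mismatches[key] = values
    return mismatches


# Hypothetical metadata for three generated shots.
CLIPS = [
    {"name": "shot_01.mp4", "fps": 24, "width": 1920, "height": 1080},
    {"name": "shot_02.mp4", "fps": 30, "width": 1920, "height": 1080},
    {"name": "shot_03.mp4", "fps": 24, "width": 1280, "height": 720},
]

for key, values in find_mismatches(CLIPS).items():
    print(f"{key} differs across clips: {sorted(values)}")
```

Running a check like this at the start of Session 14 turns a vague warning into a concrete list of clips to conform or regenerate.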

11.4 Session 15 — Final presentation and synthesis

Present the finished work and integrate the learning through critique.

  • Lecture (brief): The outlook for AI video production
  • Presentation: 10–15 minutes each (screening + process explanation)
  • Critique: Peer critique and instructor critique
  • Synthesis: Confirm the skill set the course has built

12. Compressed version: a flipped-classroom, 13-session structure

Given how well YouTube tutorials already cover operating proficiency, the operating-skill portion of Phase C can be flipped: students watch the videos before class, which frees class time for application and problem-solving. Together with merging the original sessions 2 and 3 in Phase A, that compresses the whole course to 13 sessions.

12.1 Flipping the operating-skill sessions

Within Phase C (sessions 8–11), merge the basic-operations sessions (8 and 11), have students watch the assigned YouTube videos beforehand and reproduce the minimum workflow on their own accounts, and use class time for troubleshooting and application.

  • Pre-class assignment: Watch the assigned YouTube videos (e.g., R7GegCn8SbU, TEYITeWXRJo) and reproduce the minimum workflow on your own account
  • Class time: Troubleshooting, application exercises, workflow-design discussion

This compresses Phase C from four sessions to three.

12.2 The 13-session structure

With operating-skill sessions flipped, the structure becomes:

  • Phase A: Foundations and orientation (sessions 1–2) — merging the original sessions 2 and 3
  • Phase B: Pre-production (sessions 3–6) — same four sessions
  • Phase C: Pipeline construction (sessions 7–9) — compressed from four to three
  • Phase D: Integrated production and presentation (sessions 10–13) — same four sessions

180 min × 13 sessions = 39 hours total. The final presentation and critique sessions must stay.
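The compression can be checked by writing down which original sessions land in each new session. The sketch below encodes the merges described above (original sessions 2 and 3; original sessions 8 and 11). The position of the merged 8+11 session within Phase C is an assumption, since the text does not fix the order.

```python
# Each inner list is one session of the 13-session version, holding the
# original session numbers it absorbs. Order within Phase C is assumed.
COMPRESSED = {
    "A": [[1], [2, 3]],
    "B": [[4], [5], [6], [7]],
    "C": [[8, 11], [9], [10]],
    "D": [[12], [13], [14], [15]],
}


def session_count(plan):
    """Number of sessions in the compressed plan."""
    return sum(len(phase) for phase in plan.values())


def covered(plan):
    """Sorted list of original session numbers that the plan absorbs."""
    return sorted(n for phase in plan.values() for merged in phase for n in merged)


def new_numbering(plan):
    """Map each phase to its (first, last) session number in the new plan."""
    ranges, start = {}, 1
    for phase, sessions in plan.items():
        ranges[phase] = (start, start + len(sessions) - 1)
        start += len(sessions)
    return ranges


print(session_count(COMPRESSED), session_count(COMPRESSED) * 180 / 60)
print(new_numbering(COMPRESSED))
```

The check confirms that no original session is dropped: every one of the 15 appears exactly once in the compressed plan.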

12.3 Further compression options

If 15 sessions can’t be secured:

  • 12-session version: Merge design-sheet I and II (sessions 6 and 7) into one session; merge sessions 10 and 11 into one; merge sessions 13 and 14 into one
  • 10-session version: Compress Phase A into one session; Phase B becomes three sessions, Phase C three, Phase D three
  • 8-session version: Drop the foundations exercises and redesign as an “implementation-focused” course centered on Higgsfield Canvas

For art-school students, however, removing the critique session means design education doesn’t reach closure — so the final presentation session in Phase D must remain regardless.


13. Summary

The hypothesis — “The high-level flow is the same as a traditional director’s workflow; only the material-creation steps have been replaced by AI” — is consistent with the current industry and academic synthesis. The three-phase structure (pre / production / post), the primacy of the director’s judgment, and the importance of the designs (script, shot list, design sheet) are all being inherited into the AI era and, if anything, reinforced.

Yamamoto’s design-sheet method repurposes conventional production-design documents as AI reference images. Higgsfield Canvas productizes the ComfyUI-style node graph as a team-oriented tool. These are not separate things; they are different implementations of the same two underlying concerns: “design reuse” and “pipeline visualization.”

A 180-min × 15-session curriculum is reasonable for art-school students who are also programming beginners, on the grounds that feedback sessions, iteration sessions, and critique sessions all need to be preserved. That said, given the abundance of YouTube tutorials, a flipped-classroom 13-session version is the more pragmatic real-world choice. The right structure should be selected from the options presented here, based on the student level and the time available.