Media Creation Curriculum in the AI Era #2 — A ComfyUI-Based Curriculum

Updated: 2026-05

1. Introduction

This essay is a Q&A between the author and Claude AI about the author’s vision for redesigning a media-production course. As the second installment in the “Media Creation Curriculum in the AI Era” series, it considers what the syllabus would look like if all 180 min × 15 sessions were built on ComfyCloud (the official hosted version of ComfyUI) alone.

The first installment used Andrew Price’s argument as a starting point to set the overall direction for “Visual Practicum I/II” and proposed a 30-session redesign integrating AI image/video generation, live-action, and 3DCG. In that proposal, ComfyCloud appeared only as Session 3 in the first semester plus an optional deep-dive slot (Session 8 in the second semester), which leaves little room for depth. This installment goes in the other direction: it asks what the course would look like if all 15 sessions went to ComfyCloud, and sketches the outline of a node-based curriculum.

The reason for this thought experiment is that the learning curves of Runway-style and ComfyUI-style tools are fundamentally different in character. A Runway-style integrated web service is something you can run with once you learn the basic UI and a few recipes; after that, you make progress through repeated production work, and 180 min × 15 sessions is more than enough. ComfyUI/ComfyCloud, by contrast, lets you customize the generation process itself by recombining nodes, and the distance from “run a workflow someone handed you” to “search for your own mode of expression” is long. How far to go and what to skip within 15 sessions has to be designed deliberately.

1.1 What this piece covers

Premises and justification for building a course entirely on ComfyCloud
The difference in learning curves between Runway-style and ComfyUI-style tools
A 180 min × 15 session curriculum proposal
Assessment design and operational caveats

2. Premises for a ComfyCloud-only design

I’ll lay out the case for using ComfyCloud alone for 15 sessions along three axes: tool properties, learning structure, and operational conditions.

2.1 The learning-curve gap between Runway-style and ComfyUI-style tools

A Runway-style integrated web service has a vendor-fixed processing pipeline. Learners can only touch the inputs (text, reference images, seed) and a limited set of output-control parameters (motion strength, camera control, length, etc.). The list of recipes is largely exhausted by text-to-video / image-to-video / video-to-video / motion brush / camera control / lip sync / in- and outpainting. UI fluency really only needs 1–2 sessions; the remaining time can go into prompt-design precision, source-material selection, cut construction, and the round trip between intent and output. The student’s cognitive resources flow into the work, not the tool.

ComfyUI is the opposite. You build the entire workflow by placing processing units — Load Checkpoint, CLIP Text Encode, KSampler, VAE Decode — as nodes and connecting them with wires. The minimal txt2img setup needs 7 nodes; once you start combining control systems and video generators like ControlNet, IPAdapter, LoRA, AnimateDiff, Wan2.x, or HunyuanVideo, node counts run from a few dozen to over a hundred. You can intervene at every stage of generation, which makes it possible to draw different expressions from the same source material — but it also means understanding the structure of the tool itself takes ongoing time.
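As a rough illustration of the 7-node minimum mentioned above, the graph can be written out as data. This is a sketch in the API-style JSON layout that ComfyUI can export, where each link is a pair of source node id and output index. The class_type strings follow my understanding of ComfyUI's built-in node identifiers and should be checked against an actual export; the checkpoint file name, prompts, and the `dangling_links` helper are hypothetical.

```python
# Minimal txt2img graph as a Python dict (a sketch, not an official sample).
# Keys are node ids; a link is [source_node_id, output_index].
# The checkpoint name and prompts are placeholders, not real assets.
import json

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "example_model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a lighthouse at dusk", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "session02"}},
}


def dangling_links(graph):
    """Return (node_id, input_name) pairs whose link points at a missing node."""
    bad = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            is_link = isinstance(value, list) and len(value) == 2
            if is_link and value[0] not in graph:
                bad.append((node_id, name))
    return bad


print(len(workflow), "nodes; dangling links:", dangling_links(workflow))
print(json.dumps(workflow["5"], indent=2))
```

Every wire on the canvas corresponds to one of the two-element lists here, which is why adding a control system means adding both nodes and links.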

If Runway-style tools have a “learn recipes and iterate” character, ComfyUI-style tools have an “observe materials and reassemble” character. This is not a relationship of better and worse; it is a difference in where the student’s cognitive resources end up.

2.2 Five layers between “running a workflow” and “your own expression”

Between “running a workflow someone handed you” and “searching for your own mode of expression,” there are roughly five layers of learning stacked on top of each other. When designing the syllabus, each session’s target needs to be set with this layer structure in mind.

Layer 1: Mechanical fluency with node operations. Wire connections, Load/Save, reading and writing JSON, executing Queue Prompt.
Layer 2: Reading standard workflows. Being able to verbalize what each node does in the txt2img / img2img / inpaint workflows.
Layer 3: Understanding the internals of the diffusion model. The roles of Load Checkpoint, CLIP Text Encode, KSampler, VAE Decode, and how sampler / scheduler / cfg / denoise / seed actually behave.
Layer 4: Understanding the model ecosystem. The differences between SD1.5 / SDXL / FLUX as base models; the division of labor between LoRA, ControlNet, and IPAdapter.
Layer 5: Reassembly into your own workflow. With all of the above in place, swapping, adding, or branching nodes to fit your own production intent.

Dropping a workflow JSON onto the canvas, rewriting the prompt, and pressing Queue Prompt is achievable at Layer 1, reachable within Session 1. But getting from there to Layer 5 means climbing Layers 2–4 in order. The “long distance” referred to above is not node-operation fluency; it is the time required to stack this layer structure step by step.
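The Layer 1 operation described above (load a workflow, rewrite the prompt, queue it) can also be sketched outside the canvas. The example assumes the API-style export, in which nodes are keyed by id and links are [node id, output index] pairs; the UI-saved workflow format is laid out differently. The JSON fragment and the `set_positive_prompt` helper are hypothetical and only cover the nodes needed for the illustration.

```python
import json

# A trimmed fragment of an exported workflow (API-style layout, assumed).
# Node "1" (the checkpoint loader) is omitted to keep the example short.
EXPORTED = """
{
  "2": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "a lighthouse at dusk", "clip": ["1", 1]}},
  "3": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
  "5": {"class_type": "KSampler",
        "inputs": {"positive": ["2", 0], "negative": ["3", 0], "seed": 42}}
}
"""


def set_positive_prompt(graph, new_text):
    """Follow each KSampler's `positive` link and overwrite that node's text."""
    changed = []
    for node in graph.values():
        if node["class_type"] != "KSampler":
            continue
        source_id = node["inputs"]["positive"][0]
        source = graph[source_id]
        if source["class_type"] == "CLIPTextEncode":
            source["inputs"]["text"] = new_text
            changed.append(source_id)
    return changed


graph = json.loads(EXPORTED)
changed = set_positive_prompt(graph, "a lighthouse in heavy fog")
print(changed, graph["2"]["inputs"]["text"])
```

Even this small edit needs one piece of Layer 2 knowledge: the positive and negative prompts use the same node type, and only the wiring into KSampler tells them apart.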

2.3 What ComfyCloud solves and doesn’t

ComfyCloud solves the introduction-cost side. Preparing a local GPU, setting up drivers and Python environments, installing custom nodes — many of the things that trip students up before they even reach Layer 1 are avoided. The operational benefit of “open a browser and everyone starts from the same state” is significant.

What it does not solve is the conceptual learning cost. Node-based thinking, the internal structure of diffusion models, the criteria for choosing a model — Layers 2–4 — are not compressed by moving to the cloud. Adopting ComfyCloud lets you shorten the “introduction session,” but the time required to climb the middle layers is unchanged. That needs to be understood up front.

2.4 ComfyCloud’s current features and operating conditions

ComfyCloud is Comfy Org’s own official hosted service for ComfyUI, an open-source node-based image/video generation interface. Its biggest feature is that users without a local GPU can access nearly all of ComfyUI’s functionality from a browser.

A December 10, 2025 update consolidated the pricing model into a unified credit system. A monthly subscription grants a fixed amount of credits, which are consumed by workflow execution time in the cloud and by Partner Nodes (formerly API Nodes — the node group that calls models like Seedream or Kling via external APIs). If credits run out, you top up with additional purchases.

The properties that matter for classroom use are:

No high-end GPU environment required on the student side.
No variation in environment setup, by construction (open a browser, everyone is in the same state).
Workflows and generated outputs are kept private inside the account.
Workflow JSON is exportable, so students can take it with them to a local ComfyUI after graduation.
Students can upload LoRAs they have trained themselves (Bring Your Own LoRA).
The latest commercial models (Seedream, Kling, Veo, etc.) are reachable via Partner Nodes.

Note: credit consumption and pricing are in flux. When designing a course, confirm the latest official information.

2.5 What it means to spend all 15 sessions on one tool

Devoting all of 180 min × 15 = 2,700 min (45 hours) to ComfyCloud makes sense for three reasons.

First, ComfyUI has evolved into a general-purpose platform that handles image generation, video generation, 3D (generation and editing), and audio. One tool covers most of the major steps in media production. There is no switching cost from moving between tools mid-class.

Second, fluency with node-based operations requires repetition. Spending 15 sessions in the same environment lets node naming conventions, data types, and wire-connection rules sink in naturally. You can reach a depth that’s unreachable in a “different tool every week” design.

Third, in the second half students can spend their time modifying and writing their own workflows, which turns into a real production period for searching out a personal mode of expression. That’s hard to make work in a Runway-centered design.

The limits are equally clear. Live-action shooting, editing (DaVinci Resolve, etc.), compositing (After Effects, etc.), and 3DCG (Blender, etc.) are not covered by ComfyCloud alone. This curriculum closes the loop “from source-material generation up to cut-level video generation” inside ComfyCloud; editing and overall sequencing have to live in another course or in the student’s own time.

3. Curriculum design principles

3.1 Which layer to set as the target

Given the five-layer structure above, the first design decision is which layer to set as the 15-session target. Setting it cleanly at Layer 5 (searching for personal expression through a self-built workflow) is tight on time. That design only works if students repeat the work outside class every week; if they only touch the tool in class, it is inevitable that some will be stuck at Layers 3–4 at the final session.

Strategic options:

High setting: Layer 5 as the final target. Weekly self-study is mandatory. The final assignment requires submission of the workflow JSON.
Middle setting: Layer 4 as the final target. The goal is “can read and modify an existing workflow.” Time loosens up and more of it can go to critiques and exploration of expression.
Compromise line: Layer 5 as the nominal target, but the final assignment is framed realistically — “you may reference an existing workflow, but must change at least 3 things yourself.”

Note: for art-school / design-school students, the middle setting is often more realistic. Tying “exploration of expression” too tightly to “building workflows from scratch” steals time from the actual expression work. This curriculum proposal adopts the compromise line in the later sections.

3.2 Four phases

The 15 sessions break into four phases:

Phase A (Sessions 1–4): Foundations. UI operations and constructing txt2img / img2img workflows.
Phase B (Sessions 5–9): Control and expression. Structural and stylistic control via ControlNet / IPAdapter / LoRA.
Phase C (Sessions 10–12): Video generation. AnimateDiff and the latest video models via Partner Nodes.
Phase D (Sessions 13–15): Production and critique. Short pieces using self-built workflows.

The last session of each phase carries either a small assignment or an interim critique, to make progress visible. Session 15 is reserved as the final critique.

3.3 The shift from “workflow consumption” to “workflow authoring”

ComfyUI has a large body of publicly available workflow JSON that students can download, drop onto the canvas, and run as-is. Asking them to build from scratch from the start invites burnout; letting them stay on other people’s workflows until the end isn’t exploration. The curriculum walks them through it: in Phase A they run existing workflows while learning what each node does; in Phase B they start modifying parts; from Phase C they reconstruct whole workflows.

Concretely, Phase A uses official samples and the teaching workflows from AICU and PERSC as a base. From Phase B, students add and remove nodes themselves. The final assignment in Phase D imposes the condition: “you may reference an existing workflow, but you must change at least 3 things yourself.”

3.4 Time split between lecture and practice

ComfyUI is the kind of tool you understand more by touching than by listening, so expanding lecture time doesn’t deepen understanding much. The standard 180-minute split is: lecture/demo 30 min / individual practice 120 min / sharing & Q&A 30 min. Only on sessions that introduce a new family of nodes (Sessions 5, 7, 10, 12, etc.) does lecture extend to 60 min, with practice shortened to 90 min to compensate.
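The split above is easy to sanity-check with a few lines of arithmetic. The sketch assumes that exactly Sessions 5, 7, 10, and 12 take the extended lecture (the text says "etc.", so this is a simplification) and covers only Sessions 1 to 13, since the production day and final critique follow a different format. The `time_split` helper is hypothetical.

```python
SESSION_MINUTES = 180
# Assumption: exactly these four sessions use the 60-minute lecture.
EXTENDED_LECTURE_SESSIONS = {5, 7, 10, 12}


def time_split(session):
    """Return (lecture, practice, sharing) minutes for a regular session."""
    lecture = 60 if session in EXTENDED_LECTURE_SESSIONS else 30
    sharing = 30
    practice = SESSION_MINUTES - lecture - sharing
    return lecture, practice, sharing


regular_sessions = range(1, 14)  # Sessions 14 and 15 use a different format
for s in regular_sessions:
    assert sum(time_split(s)) == SESSION_MINUTES

# Column-wise totals: lecture, practice, sharing minutes across Sessions 1-13.
totals = [sum(col) for col in zip(*(time_split(s) for s in regular_sessions))]
print(dict(zip(("lecture", "practice", "sharing"), totals)))
```

Under these assumptions, hands-on practice takes up well over half of the regular class time, which matches the intent of the split.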

3.5 Assessment design

Assessment runs on three components:

Interim assignment (as of Session 9): 1 world-bible sheet + a 5–8 image series of the same character / world.
Final assignment (as of Session 15): a 30 sec – 1 min short video + the workflow JSON used to produce it.
Class participation: submission of each session’s practice output, and comments on other students’ workflows.

The final assignment requires the workflow JSON, which folds the generation process itself into the assessment. The design looks at “how the nodes were assembled,” not just the polish of the finished video.
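One way to make "how the nodes were assembled" inspectable is to compare the submitted workflow JSON against the workflow the student started from. The sketch below is a rough first-pass aid, not a grading rule: it assumes API-style workflow dicts, matches nodes by id, and treats any difference in a node as one change. What actually counts as a meaningful change remains the instructor's call. The `count_changes` helper and the sample data are hypothetical.

```python
def count_changes(reference, submitted):
    """Compare two API-style workflow dicts node by node (matched by id)."""
    added = [n for n in submitted if n not in reference]
    removed = [n for n in reference if n not in submitted]
    modified = [n for n in submitted
                if n in reference and submitted[n] != reference[n]]
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical, heavily trimmed workflows for illustration only.
reference = {
    "1": {"class_type": "KSampler", "inputs": {"steps": 20, "cfg": 7.0}},
    "2": {"class_type": "CLIPTextEncode", "inputs": {"text": "a portrait"}},
}
submitted = {
    "1": {"class_type": "KSampler", "inputs": {"steps": 30, "cfg": 7.0}},
    "2": {"class_type": "CLIPTextEncode", "inputs": {"text": "a portrait"}},
    "3": {"class_type": "LoraLoader", "inputs": {"strength_model": 0.8}},
    "4": {"class_type": "ControlNetApplyAdvanced", "inputs": {"strength": 0.5}},
}

diff = count_changes(reference, submitted)
total = sum(len(ids) for ids in diff.values())
print(diff, "total changes:", total, "meets minimum of 3:", total >= 3)
```

A count like this only flags that something changed; the workflow explanation at the critique is where the student shows why.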

4. The full 15-session curriculum

4.1 Phase A: Foundations (Sessions 1–4)

The aim is Layers 1–3. Through UI operations and reading standard workflows, students get a feel for how the main diffusion-model parameters behave.

Session 1: Orientation and ComfyCloud onboarding

The lecture frames ComfyUI / ComfyCloud’s position, the node-based mindset, and the difference from Runway. The practice portion then covers UI fundamentals.

Lecture: 30 min. The overall picture of ComfyCloud, pricing, contrast with Runway-style tools.
Practice: 120 min. Account setup, UI operations (pan/zoom, adding nodes, wire connections, executing Queue Prompt), running the default txt2img workflow.
Sharing: 30 min. Sharing outputs and Q&A.

Target: students can navigate the canvas, execute Queue Prompt, and save generated images.

Session 2: The 7-node txt2img structure

Rebuilding the minimal workflow from a blank canvas to internalize what each node does.

Lecture: 30 min. The roles of Load Checkpoint, CLIP Text Encode (placed twice, once each for the positive and negative prompts), Empty Latent Image, KSampler, VAE Decode, Save Image.
Practice: 120 min. Delete Session 1’s workflow entirely and rebuild txt2img by placing and connecting all 7 nodes from scratch. Then vary KSampler parameters and observe the output differences.
- KSampler
  - seed: run with a fixed value multiple times to confirm reproducibility
  - steps: compare 10 / 20 / 40
  - cfg: compare 3.0 / 7.0 / 15.0
  - sampler_name: compare euler / dpmpp_2m / dpmpp_sde, etc.
Sharing: 30 min

Target: students can build the minimal txt2img configuration from blank. They can explain what each of KSampler’s main parameters does.
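The parameter comparisons in this session work best when only one value changes per run. As a sketch, the run list can be generated rather than typed by hand. The baseline values are assumptions chosen from the middle of each comparison range, and the helper names are hypothetical.

```python
from itertools import product

# Assumed baseline; the seed stays fixed so that differences between
# outputs come from the parameter being varied, not from the noise.
BASE = {"seed": 42, "steps": 20, "cfg": 7.0, "sampler_name": "euler"}
SWEEPS = {
    "steps": [10, 20, 40],
    "cfg": [3.0, 7.0, 15.0],
    "sampler_name": ["euler", "dpmpp_2m", "dpmpp_sde"],
}


def one_at_a_time(base, sweeps):
    """Vary one parameter per run, keep the rest at baseline, skip repeats."""
    runs = []
    for name, values in sweeps.items():
        for value in values:
            run = {**base, name: value}
            if run not in runs:
                runs.append(run)
    return runs


def full_grid(base, sweeps):
    """Every combination of every value, for comparison of the run count."""
    names = list(sweeps)
    return [{**base, **dict(zip(names, combo))}
            for combo in product(*(sweeps[n] for n in names))]


print(len(one_at_a_time(BASE, SWEEPS)), "runs when varying one parameter at a time")
print(len(full_grid(BASE, SWEEPS)), "runs for the full grid")
```

The one-at-a-time plan fits comfortably in a 120-minute practice block; the full grid is mostly useful as a reminder of how quickly combinations grow.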

Session 3: Prompts and model selection

How prompt design and model selection affect the workflow as a whole.

Lecture: 30 min. Basics of positive/negative prompts, token weighting, recommended parameters by model family (SDXL, FLUX, etc.).
Practice: 120 min. Generate the same prompt across multiple models (at least one each from the SDXL and FLUX families) and compare. Verify negative-prompt effects.
- Note: when using FLUX-family models, fix KSampler’s cfg to 1.0 and adjust guidance strength through a separate FluxGuidance node.
Sharing: 30 min

Target: students can reliably generate images close to their intended style and can verbalize the characteristics of each model.

Session 4: img2img and inpaint, Phase A mini-assignment

Generation starting from an existing image, plus a Phase A summary mini-assignment.

Lecture: 30 min. The roles of VAE Encode / VAE Decode, the meaning of the denoise parameter, the structure of an inpaint workflow.
Practice: 120 min. Use img2img to transform a personal photo into a different style; use inpaint to fix part of an image; mini-assignment “a workflow that produces a self-introduction visual in under 5 minutes.”
- KSampler (in img2img)
  - denoise: compare 0.4 / 0.6 / 0.8
Sharing: 30 min. Share the created workflows and outputs.

Target: students can generate work that starts from an existing image and can design a workflow for a stated intent.
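Structurally, img2img differs from txt2img in one place: the empty latent is replaced by an image that has been passed through VAE Encode, and denoise drops below 1.0. The sketch below shows that swap on a trimmed API-style graph. The class_type and input names follow my understanding of ComfyUI's built-in nodes and should be verified against an export; the file names and the `to_img2img` helper are hypothetical, and the prompt nodes are omitted for brevity.

```python
import copy

# Trimmed txt2img fragment (prompt, decode, and save nodes omitted).
txt2img = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "example_model.safetensors"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "latent_image": ["4", 0],
                     "denoise": 1.0}},
}


def to_img2img(graph, image_name, denoise):
    """Return a copy where the sampler starts from an encoded input image."""
    g = copy.deepcopy(graph)
    sampler = next(n for n in g.values() if n["class_type"] == "KSampler")
    loader_id = next(i for i, n in g.items()
                     if n["class_type"] == "CheckpointLoaderSimple")
    new_id = max(int(k) for k in g) + 1
    load_id, encode_id = str(new_id), str(new_id + 1)

    # Remove the empty latent and wire in LoadImage -> VAEEncode instead.
    del g[sampler["inputs"]["latent_image"][0]]
    g[load_id] = {"class_type": "LoadImage", "inputs": {"image": image_name}}
    g[encode_id] = {"class_type": "VAEEncode",
                    "inputs": {"pixels": [load_id, 0], "vae": [loader_id, 2]}}
    sampler["inputs"]["latent_image"] = [encode_id, 0]
    sampler["inputs"]["denoise"] = denoise
    return g


variants = [to_img2img(txt2img, "my_photo.png", d) for d in (0.4, 0.6, 0.8)]
print([v["5"]["inputs"]["denoise"] for v in variants])
```

Seeing the change as a two-node substitution makes the role of VAE Encode concrete: it is the only path by which pixels enter the latent space the sampler works in.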

4.2 Phase B: Control and expression (Sessions 5–9)

The aim is Layer 4. Students come to understand the division of labor among ControlNet, IPAdapter, and LoRA, and reach the point of combining them to land closer to an intended output.

Session 5: ControlNet (structural control), part 1

Stepping past prompt-only methods to control composition, pose, and structure.

Lecture: 60 min. ControlNet’s principles, the roles of preprocessors (pose / depth / canny / lineart, etc.), strength and timestep control.
Practice: 90 min. For a fixed prompt, extract OpenPose from a posed human image and Depth from a landscape photo, and feed each into ControlNet to generate.
- ControlNetApplyAdvanced
  - strength: compare 0.5 / 0.8 / 1.0
  - start_percent: 0.0
  - end_percent: 1.0
Sharing: 30 min

Target: students can feed structural information into the model and reach compositions unreachable from prompt alone.

Session 6: ControlNet, part 2 (combining multiple)

Designing with multiple ControlNets stacked.

Lecture: 30 min. Multi-ControlNet design, interference between preprocessors, how to weight them.
Practice: 120 min. Try combinations such as OpenPose + Depth, Canny + Lineart, observing the effects and limits.
Sharing: 30 min

Target: students can combine multiple structural constraints to land closer to an intended output.
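Stacking ControlNets amounts to wiring the apply nodes in series: the conditioning that comes out of the first apply node becomes the input of the second. The sketch below builds that chain in API-style form. It assumes that ControlNetApplyAdvanced exposes positive and negative conditioning as outputs 0 and 1, which matches my understanding but should be verified against an export; some versions also accept additional optional inputs. The model and image file names and the `chain_controlnets` helper are hypothetical, and the control maps are assumed to be preprocessed already.

```python
def chain_controlnets(positive, negative, controls, first_id=20):
    """Build loader + image + apply node triples wired in series."""
    nodes = {}
    next_id = first_id
    for c in controls:
        loader_id, image_id, apply_id = (str(next_id + k) for k in range(3))
        nodes[loader_id] = {"class_type": "ControlNetLoader",
                            "inputs": {"control_net_name": c["model"]}}
        nodes[image_id] = {"class_type": "LoadImage",
                           "inputs": {"image": c["map"]}}
        nodes[apply_id] = {"class_type": "ControlNetApplyAdvanced",
                           "inputs": {"positive": positive,
                                      "negative": negative,
                                      "control_net": [loader_id, 0],
                                      "image": [image_id, 0],
                                      "strength": c["strength"],
                                      "start_percent": 0.0,
                                      "end_percent": 1.0}}
        # The next ControlNet receives this node's conditioning outputs.
        positive, negative = [apply_id, 0], [apply_id, 1]
        next_id += 3
    return nodes, positive, negative


# Placeholder model names and preprocessed control maps.
controls = [
    {"model": "example_openpose.safetensors", "map": "pose_map.png", "strength": 0.8},
    {"model": "example_depth.safetensors", "map": "depth_map.png", "strength": 0.5},
]
nodes, pos, neg = chain_controlnets(["2", 0], ["3", 0], controls)
print(len(nodes), "nodes; KSampler should read positive from", pos)
```

Because the constraints are applied in sequence on the same conditioning, the per-node strength values are the main lever for deciding which constraint wins when they conflict.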

Session 7: IPAdapter (style / composition reference)

Introducing the technique of treating a reference image as “another prompt.”

Lecture: 60 min. IPAdapter’s principles, the difference between style transfer and composition transfer, combined use with ControlNet.
Practice: 90 min. Specify a single reference image and combine it with the student’s own prompt. Then combine with ControlNet to control both structure and style simultaneously.
- IPAdapterAdvanced
  - weight: compare 0.5 / 0.8
  - weight_type: compare style transfer / composition / etc.
Sharing: 30 min

Target: students can leverage reference images to reproduce styles that are hard to specify by text alone.

Session 8: LoRA and the use of custom data

Layering specific styles or characters on top of a general-purpose model.

Lecture: 30 min. LoRA principles, stacking multiple LoRAs, ComfyCloud’s Bring Your Own LoRA feature.
Practice: 120 min. Try 2–3 public LoRAs, varying strength to compare effects. (Optional) Upload a LoRA trained on a handful of personally collected images and test it.
- LoraLoader
  - strength_model: compare 0.6 / 0.8 / 1.0
  - strength_clip: compare 0.6 / 0.8 / 1.0
Sharing: 30 min

Target: students can choose and combine LoRAs according to their goal.
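Stacking LoRAs follows a similar series pattern, except that each LoraLoader passes along the model and CLIP rather than the conditioning. A minimal sketch, assuming the LoraLoader input names listed above and that its outputs are the patched model (index 0) and CLIP (index 1); the LoRA file names and the `stack_loras` helper are hypothetical.

```python
def stack_loras(model, clip, loras, first_id=30):
    """Chain LoraLoader nodes; each one patches the previous model and CLIP."""
    nodes = {}
    for offset, (name, strength) in enumerate(loras):
        node_id = str(first_id + offset)
        nodes[node_id] = {"class_type": "LoraLoader",
                          "inputs": {"model": model, "clip": clip,
                                     "lora_name": name,
                                     "strength_model": strength,
                                     "strength_clip": strength}}
        model, clip = [node_id, 0], [node_id, 1]
    return nodes, model, clip


# Placeholder LoRA files: one style LoRA and one character LoRA.
loras = [("example_style.safetensors", 0.8),
         ("example_character.safetensors", 0.6)]
nodes, model_out, clip_out = stack_loras(["1", 0], ["1", 1], loras)
print(model_out, clip_out)
```

The final model link goes to KSampler and the final CLIP link goes to the CLIP Text Encode nodes, so a stacked LoRA affects both how the prompt is read and how the image is denoised.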

Session 9: Character consistency and interim assignment

As a Phase B summary, combine ControlNet, IPAdapter, and LoRA to generate multiple cuts of the same character and world.

Lecture: 30 min. Strategies for maintaining consistency (fixed reference, fixed LoRA, varying only the pose, etc.).
Practice: 120 min. Start the interim assignment: 1 world-bible sheet + 5–8 images of the same character in different poses / expressions / compositions.
Sharing: 30 min

Target: students can generate not just one-off images but a consistent set of source images.

4.3 Phase C: Video generation (Sessions 10–12)

The aim is to extend Layer 4 into the video domain. Students come to understand the difference between still and video generation and to touch the techniques for controlling inter-frame consistency.

Session 10: AnimateDiff basics

Stepping into temporal control. Covering the basics of txt2video and image-to-video.

Lecture: 60 min. AnimateDiff principles, motion modules, the role of context options, the relationship between frame count and fps.
Practice: 90 min. Generate short 16–32 frame videos from prompts. Then run image-to-video starting from one of the images in the Session 9 series.
- ADE_AnimateDiffLoaderWithContext
  - model_name: v3 or AnimateLCM motion module
- ADE_AnimateDiffUniformContextOptions
  - context_length: 16
Sharing: 30 min

Target: students understand the difference between still and video generation and can reliably produce short videos.
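The relationship between frame count and fps is plain division, but working it through shows how short these clips are next to the 30 sec to 1 min final piece. The sketch assumes a playback rate of 8 fps, which is a placeholder rather than a recommendation, and ignores frame interpolation and editing. Both helper functions are hypothetical.

```python
import math

FPS = 8  # assumed playback rate, for illustration only


def clip_seconds(frames, fps):
    """Duration of a clip in seconds."""
    return frames / fps


def clips_needed(target_seconds, frames_per_clip, fps):
    """How many clips of a given length are needed to fill a target duration."""
    return math.ceil(target_seconds * fps / frames_per_clip)


for frames in (16, 32):
    print(frames, "frames at", FPS, "fps =", clip_seconds(frames, FPS), "s")

print("16-frame clips for 30 s:", clips_needed(30, 16, FPS))
print("32-frame clips for 60 s:", clips_needed(60, 32, FPS))
```

Under these assumptions a single generation yields only a few seconds, which is one reason the later sessions treat the cut, not the whole piece, as the unit of generation.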

Session 11: Applied video generation (pose-to-video, video-to-video)

Moving on to video generation that starts from live-action footage or existing video.

Lecture: 30 min. Frame extraction, video-version preprocessors (DWPose etc.), and using the video-version ControlNet.
Practice: 120 min. Have students bring in short live-action footage (a phone-shot clip is fine); transform into a different character via pose extraction; convert the same footage with video-to-video into a different style.
Sharing: 30 min

Target: students can transform live-action material with AI while respecting the original motion.

Session 12: Latest video models via Partner Nodes

Covering access to the latest commercial models through Partner Nodes, which ComfyCloud makes reachable from the same browser environment used in the earlier sessions.

Lecture: 60 min. The Partner Nodes mechanism and pricing; comparing the characteristics of Wan2.x / HunyuanVideo / Seedream / Kling / Veo, etc.
Practice: 90 min. Run the same prompt across multiple models and observe differences in image quality, motion naturalness, and camera controllability.
Sharing: 30 min

Target: students can choose a model to fit the project’s requirements and get a feel for Partner Nodes’ credit consumption.

4.4 Phase D: Production and critique (Sessions 13–15)

The aim is Layer 5. Students integrate the individual skills from Phases A–C and gain experience reassembling a workflow to fit their own expressive intent.

Session 13: Final-assignment planning and workflow design

Planning a 30 sec – 1 min short video and designing the workflow for it.

Lecture: 30 min. Composition of a short piece, designing the generated cuts, the idea of a “one-cut to few-cuts” piece that doesn’t depend on editing.
Practice: 120 min. Produce a project sheet (A4, 1 page) plus the skeletal design of the planned workflow.
Sharing: 30 min. Cross-review of plans.

Target: students can articulate the correspondence between their expressive intent and the workflow structure.

Session 14: Production day

A production-focused session built around individual work and Q&A.

Practice: 150 min. Each student works on their piece. Instructor and TA circulate and advise individually.
Sharing: 30 min. Share work-in-progress and exchange advice.

Target: students can resolve production issues on their own or with peer advice.

Session 15: Final critique

Screening and critique of finished pieces.

Screening and critique: 150 min. Each piece is screened, followed by the author’s workflow explanation and peer critique.
Wrap-up: 30 min. Reviewing Phases A–D; guidance on migrating to a local environment and next steps.

Target: students can explain a short piece they generated with a self-built workflow — together with its production process — to a third party.

5. Operational risks and caveats

A ComfyCloud-only design carries some structural risks.

If the target layer (Layers 1–5 from §2.2) is left vague as the course proceeds, students will hit the final assignment having only “modified someone else’s workflow,” and assessment criteria will drift away from reality. The syllabus needs to state explicitly which layer is the target.
Credit-based billing makes total cost hard to see, because it scales with the number of students × average consumption per student. When designing the course, estimate expected credit consumption per session and decide up front whether the university covers it in bulk or whether it falls on individual students.
Partner Nodes (nodes that call external APIs) see frequent additions and discontinuations of the underlying models. Specify which models the course covers by category (fast I2V, long-form T2V, 3D generation, etc.) rather than by specific model name, to reduce the revision burden.
ComfyCloud’s terms of service and copyright policy are in flux. If students will publish final pieces outside the school, confirm the rules in effect at that moment.
A fraction of students will not take to node-based interfaces. Prepare an explicit fallback in Phase A — supplemental sessions, additional TAs, ready-made sample workflows.
This curriculum alone does not produce skills in live-action shooting, editing, 3DCG, or compositing. Make this explicit to students and design with the assumption that another course or self-study covers the gap.
Note: As of April 2026, ComfyUI is in the middle of a migration to a new node-definition specification (V3 schema). Workflows distributed to students should be re-tested against the latest version before each academic year begins.
Note: On a 5–10 year horizon, the design of node-based environments themselves may change. Anchoring the syllabus at the “UI-independent” level (principles of node-based work, decomposition of the generation process, how to impose constraints) makes it more resilient to UI revisions.
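The credit estimate mentioned in the list above can start as a very small calculation. Every number in this sketch is a made-up placeholder: the class size, the per-session credit figures, and the safety margin all need to be replaced with values measured on the current pricing. The `estimate_credits` helper is hypothetical.

```python
import math


def estimate_credits(students, credits_per_session, margin_percent=20):
    """Return (per-student total, class total, class total with margin)."""
    per_student = sum(credits_per_session)
    class_total = per_student * students
    with_margin = math.ceil(class_total * (100 + margin_percent) / 100)
    return per_student, class_total, with_margin


# Placeholder figures only: still-image sessions are assumed cheaper than
# video sessions and production sessions. Measure real usage before budgeting.
per_session = [50] * 9 + [150] * 3 + [200] * 3   # Sessions 1-9, 10-12, 13-15
per_student, class_total, with_margin = estimate_credits(20, per_session)
print(per_student, class_total, with_margin)
```

Running one pilot pass through each session's workflow and recording actual consumption is what turns these placeholders into a budget.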

6. Summary

Building 180 min × 15 sessions around ComfyCloud alone has a very different character from building it around Runway. The former puts “recombine nodes to search for your own expression” at the center; the latter puts “iterate on recipes to ship work at volume” at the center. Neither is superior; the course’s position needs to be set with both characters in view.

The four-phase structure proposed here — foundations / control and expression / video generation / production and critique — is staged along ComfyUI’s actual learning curve. The skeleton is the intentional progression: Phase A internalizes node roles by running existing workflows; Phase B introduces partial modification; Phase C introduces the latest video models; Phase D produces a piece using a self-built workflow. Including the workflow JSON in the final-assignment criteria means the generation process itself is assessed, not just the polish of the finished video.

If this 15-session block is to be embedded into the 30-session, 90-hour “Visual Practicum I/II” framework from Installment #1, a realistic option is to use part or all of the second semester (Practicum II). After live-action, editing, and basic AI generation in the first semester, going deep on ComfyCloud in the second semester is consistent with creating the substrate for the “judgment” practice Andrew Price talks about. An equally valid alternative is to run it as a standalone 15-session elective.
