Media Creation Curriculum in the AI Era #4 — Media Programming Practice with TouchDesigner — Expression and Method

Updated: 2026-05

1. Introduction

This essay is a Q&A between the author and Claude AI about a redesign of the author’s media production course. As the fourth installment in the “Media Creation Curriculum in the AI Era” series, it surveys what can be expressed through media programming in TouchDesigner (henceforth TD), and what lineages and underlying principles its methods carry.

Installment #1 used Andrew Price’s argument as a starting point to set the overall direction for “Visual Practicum I/II.” Installment #2 examined a node-based syllabus built entirely on ComfyCloud. Installment #3 considered a short-film curriculum centered on generative-video AI. All three target works whose final output is “video that lives inside the screen.”

TD, the subject of this installment, belongs to a different lineage. It addresses territory that neither generative-video AI nor Premiere/After Effects can replace: work that requires real-time behavior, interactivity, and connection to physical space. The aim of this essay is to frame that territory as “media programming” and to survey the expressions and methods that TD makes possible. The course structure (mapping onto 15 sessions of 180 minutes each, evaluation design, operational notes) and the implementation details for physical-space staging will be handled in subsequent essays. This piece serves as the prerequisite: “a reference for what TouchDesigner can do.”

1.1 References

1.2 What this essay covers

The two main lineages of moving-image technology — time-based vs. real-time
Where TouchDesigner sits, and how it relates to the other real-time options
Generative imagery (filter chains, feedback, spatiotemporal manipulation)
Audio-visual synchronization, and how AI music fits in
Sensing and interaction (camera, body sensors, data)
Real-time AI generation and programmatic integration
Connecting to output and exhibition (multi-display, projection mapping, an overview of lighting control)
A map of existing learning resources

2. A bird’s-eye view of moving-image technology — where TouchDesigner sits

2.1 Two large lineages

The most fundamental axis on which to divide moving-image technology is “when the image is computed.” Along that axis, contemporary moving-image technology splits into two large lineages.

Time-based: the image is computed once, at production time, and saved as a fixed video file. Playback shows the same image every time
Real-time: while the system runs, the image is computed every frame. The same piece produces different imagery for each viewer’s encounter

This split is consistent with the way US MFA programs separate their departments (Time-Based Media vs. Interactive/Computational Media) and with the curatorial axis MoMA uses for its Time-Based Media collection — a distinction that is well established in the media-art field.

TouchDesigner belongs to the real-time lineage. The tools covered in Installments #1–#3 (Veo, Kling, Higgsfield, ComfyCloud, Premiere, After Effects, etc.) are all on the time-based side. This essay shifts the series’ center of gravity from time-based to real-time.

2.2 Inside the time-based lineage

Time-based production has two stages: generating the raw material, and assembling it.

Generation methods

Live-action shooting: a camera records the physical world
2D animation: hand-drawn, TVPaint, Toon Boom, After Effects animation
3DCG rendering: Blender, Maya, Cinema 4D, Houdini, 3ds Max, etc. — building a 3D world and rendering it to image
AI generation: Veo, Kling, Seedance, Sora, Higgsfield, Runway, Wan, etc. — text-/image-to-video

Assembly (NLE — Non-Linear Editing)

Premiere Pro, After Effects, DaVinci Resolve, Final Cut Pro, etc. — laying generated material on a timeline, compositing, color, titles, audio integration

Final output

A single video file (mp4, mov, etc.)
Main delivery venues: cinema, television, streaming platforms, social-media video, music videos, commercials, web ads

Time-based expression is stable in that the same content plays back regardless of where, when, or how the audience views it. The strength of careful storytelling and dramatic staging derives from that stability.

2.3 Inside the real-time lineage

Real-time is a paradigm built around “designing a system that keeps running.” Its methods and tools come from several lineages, each with its own context of origin.

Visual programming languages: TouchDesigner, Max/MSP/Jitter, vvvv, Pure Data, Cables.gl — wire nodes together to build the program
Creative-coding frameworks: openFrameworks (C++), Processing (Java), p5.js / three.js (JavaScript) — write the program in code directly
Game engines: Unity, Unreal Engine, Godot, Notch — 3D space and physics simulation as first-class objects
VJ / performance tools: Resolume, Vidvox VDMX, MadMapper — optimized for live performance and projection-mapping work

Final output

A system that keeps running (a program, a scene, a patch)
Main delivery venues: installations, live performance, projection mapping, retail and architectural environments, games, websites, signage, virtual production

The defining characteristic of real-time expression is its ability to change in response to audience, environment, and data.

Audience interaction: sensors, cameras, microphones capture the audience’s state and reflect it in the image
Connection to physical space: imagery is placed in the world through projectors, LED walls, monitor arrays — corresponding to the shape of the space
Multi-channel output: simultaneous output to multiple displays builds a vast canvas or an immersive environment
Direct visualization of external data: CSV, JSON, API calls, serial input all turn into image immediately
Bidirectional communication with other systems: lighting consoles (DMX), audio (OSC, MIDI), robotics, ML inference servers, etc.

These are possibilities that exist precisely because the final output is not a video file but “a system that keeps running.”

2.4 Why TouchDesigner within real-time?

There are several options inside the real-time lineage, each with its own strengths. The reasons for picking TD for this series:

Learning curve and reach: node-based work has a gentler entry than code-based frameworks (openFrameworks, Processing), making it easier to deliver to art-school and design students
Breadth of coverage: imagery (TOP), audio (CHOP), 3D (SOP), data/text (DAT), and components (COMP) are integrated in a single environment. The alternatives each lean one way: Max/MSP toward audio, and vvvv is Windows-only. TD is the most balanced as an image-centric general-purpose environment
Strength in physical output: multiple-projector, LED-wall, and multi-monitor output; a range of video I/O (NDI over the network, SDI through Blackmagic and AJA cards, etc.); built-in projection-mapping features
Sensor and data integration: direct connection to Azure Kinect, Leap Motion, various serial devices, OSC, MIDI, Arduino, etc.
Industry-standard status: in events, live production, and projection-mapping work, TD is one of the de facto standard tools. It is used as core infrastructure at Rhizomatiks Research, WOW, teamLab, and similar studios
Free non-commercial license: education and personal use can run on the Non-Commercial build (capped at 1280×1280 output)

Unity / Unreal Engine, Notch, openFrameworks and similar have strengths that TD does not, and they matter as later destinations in a learning path.

2.5 The two lineages are complementary

Time-based and real-time are not opposing choices. The current reality is that a single artist moves between them, picking the output format that fits the job.

Material crosses between them routinely.

Time-based → real-time: 3DCG renders or shot footage used as material that TD or Unity controls in real time
Real-time → time-based: imagery generated in TD recorded out, finished in Premiere/AE, delivered to social media or film festivals
Their merger: with StreamDiffusion and similar tools, AI image generation has begun crossing from time-based into real-time

The time-based work covered in Installments #1–#3 and the real-time work covered here are complementary. By experiencing both, students become artists who can make both “the image inside the screen” and “the image in physical space.”

3. Generative imagery

What lies at TD’s core is not playing back a fixed video, but generating imagery as a program that recomputes every frame. This chapter covers three method groups at that core.

3.1 Filter chains as generative graphics

A method in which multiple filter, transform, and color-transform operators are chained to generate organic or geometric abstract imagery. Still output becomes posters; moving output becomes VJ material, signage, and screensaver-style ambient imagery.

Reference artists and works

Manfred Mohr, Vera Molnar: algorithmic art from the 1960s–70s — the earliest body of computer-driven geometric composition
Joshua Davis: praystation, Hype Framework — popularizing generative graphics through Flash and Processing
Refik Anadol (still-image works): the data sculpture series
Javier Casadidio: TD tutorials published on YouTube — organic, fluid imagery built by stacking Optical Flow, Particles, and Displace

Implementation core

Chaining the basic TOP operators (Composite, Blur, Displace, Lookup, Noise, Level, Ramp, Edge, etc.) and moving the parameters at each stage produces unpredictable textures
The chain-of-filters mindset is a direct descendant of the analog video synthesizer lineage: the Sandin Image Processor (designed 1971–74), the Paik-Abe Synthesizer (1969–71), and the Vasulkas’ work
Exemplifies “exploratory making”: trying and discovering rather than designing up front. It has the same shape as a culinary apprenticeship: start by reproducing existing recipes, then discover your own taste through countless prototypes
A sound strategy for high-dimensional parameter spaces (dozens of parameters in combination): exploration followed by selection, not blind chance
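
The chain-and-explore idea can be sketched outside TD in plain Python. This is only an illustration under stated assumptions: the image is a small grayscale grid held in nested lists, the functions `noise`, `blur_h`, `level`, and `chain` are hypothetical stand-ins for TOP operators (not TD's API), and `contrast` is a crude numeric substitute for the human judgment that does the selecting in real practice.

```python
import random

def noise(width, height, seed):
    """Grayscale noise image as nested lists, values in [0, 1]."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(width)] for _ in range(height)]

def blur_h(img, radius):
    """Horizontal box blur; the window is clipped at the image edges."""
    out = []
    for row in img:
        blurred = []
        for x in range(len(row)):
            window = row[max(0, x - radius):x + radius + 1]
            blurred.append(sum(window) / len(window))
        out.append(blurred)
    return out

def level(img, gain, offset):
    """Scale and shift brightness, then clamp to [0, 1]."""
    return [[min(1.0, max(0.0, v * gain + offset)) for v in row] for row in img]

def chain(img, stages):
    """Run the image through each (function, params) stage in order."""
    for fn, params in stages:
        img = fn(img, **params)
    return img

def contrast(img):
    """Variance of pixel values, a stand-in for the maker's taste."""
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    return sum((v - mean) ** 2 for v in flat) / len(flat)

def explore(trials, seed=0):
    """Try random parameter sets and keep the highest-scoring one."""
    rng = random.Random(seed)
    source = noise(16, 16, seed=1)
    best_score, best_params = -1.0, None
    for _ in range(trials):
        params = {"radius": rng.randint(0, 3),
                  "gain": rng.uniform(0.5, 4.0),
                  "offset": rng.uniform(-1.0, 0.5)}
        img = chain(source, [
            (blur_h, {"radius": params["radius"]}),
            (level, {"gain": params["gain"], "offset": params["offset"]}),
        ])
        score = contrast(img)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

Calling `explore(25)` returns the best score and the parameter set that produced it. In an actual TD session the scoring step is the maker looking at the screen; the loop structure (vary, look, keep) is the part that carries over.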

Learning resources

3.2 Feedback and time-evolving patterns

Self-running pattern generation via feedback. A simple structure — apply an operation to the previous frame and write it back each frame — gives rise to complex patterns that evolve over time.

Historical placement

Central technique of 1970s analog video synthesizers: Sandin Image Processor, Paik-Abe Synthesizer
Mathematically related to dynamical systems like cellular automata and reaction-diffusion
A canonical case of the generative-art idea that “the combination of nodes itself is the structure of the work”

Implementation core

Built around the Feedback TOP, combined with Transform TOP (rotation, scaling), Blur TOP, Level TOP, Displace TOP
Typical patterns: “rotation + slight scale-up” produces spiral evolution, “Displace driven by noise” produces fluid evolution, “Level for tone compression” produces limit cycles
Combined with the filter chains of §3.1, you get layered, distinctive textures
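
The feedback structure itself is small enough to write out. The following is a minimal sketch in plain Python, not TD code: the frame is a nested list of brightness values, and `feedback_step` plays the role of a Transform (rotation plus slight scale-up) followed by a Level-style decay. The parameter values (3 degrees, 1.02, 0.95) and the grid size are arbitrary assumptions chosen for the example.

```python
import math

def sample_bilinear(img, x, y):
    """Read img at a fractional position; outside the frame counts as black."""
    height, width = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    total = 0.0
    for yy, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xx, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= xx < width and 0 <= yy < height:
                total += img[yy][xx] * wx * wy
    return total

def feedback_step(prev, angle_deg=3.0, scale=1.02, decay=0.95):
    """Rotate and slightly enlarge the previous frame about its center, then fade it."""
    height, width = len(prev), len(prev[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse mapping: where in the previous frame does this pixel come from?
            dx, dy = (x - cx) / scale, (y - cy) / scale
            src_x = cos_a * dx + sin_a * dy + cx
            src_y = -sin_a * dx + cos_a * dy + cy
            out[y][x] = sample_bilinear(prev, src_x, src_y) * decay
    return out

def run(frames=30, size=33):
    """Write a single bright dot into the loop every frame and let feedback smear it."""
    frame = [[0.0] * size for _ in range(size)]
    for _ in range(frames):
        frame = feedback_step(frame)
        frame[size // 2][size // 2 + 6] = 1.0  # fresh input composited on top
    return frame
```

After `run()`, the single dot has left a fading, rotating trail: one input pixel and one repeated operation are enough to produce structure that evolves over time.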

Learning resources

Feedback

3.3 Spatiotemporal manipulation — Time Displacement / Slit Scan

A paradigm that swaps the time axis with a spatial axis. Assigning the time axis to the vertical, horizontal, or diagonal axis of the image produces uncanny, characteristically graphical textures.

Reference artists and works

Daniel Crooks: Static No. 12 (seek stillness in movement, 2009–10) from the Time Slice series. Footage of a man practicing tai chi in a Shanghai park is sliced along time, so that the motion of the body extends into space — a singular spatiotemporal expression
Adam Magyar: Stainless (2010–11), Urban Flow (2007–15). Using slit-scan, he scans subway platforms and busy intersections along the time axis
Federico Solmi: painterly spatiotemporal manipulation

Implementation core

Time Displacement: a Texture 3D TOP holds a buffer of past frames; the Time Machine TOP reads a per-pixel time offset from a Displacement Map (second input). A vertical time gradient via Ramp TOP, or random delays via Noise TOP, are standard starting points
Slit Scan: a Cache TOP holds past frames; a Crop TOP extracts each time-line and a Composite TOP reassembles them
These produce graphical textures that you cannot get from frame-by-frame stop motion or After Effects’ Time Warp
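
The per-pixel lookup described above can be sketched in a few lines of plain Python. This is not TD's implementation: `frames` is a hypothetical list of past frames (oldest first) standing in for the frame buffer, and `delay_map` stands in for the displacement map, holding for each pixel the number of frames to look back.

```python
def time_displace(frames, delay_map):
    """Build one output frame by reading each pixel from a different moment.

    frames: list of frames, oldest first, newest last (each a nested list).
    delay_map[y][x]: how many frames back to read for that pixel.
    """
    newest = len(frames) - 1
    out = []
    for y, delay_row in enumerate(delay_map):
        row = []
        for x, delay in enumerate(delay_row):
            d = min(max(int(delay), 0), newest)  # clamp to the buffer length
            row.append(frames[newest - d][y][x])
        out.append(row)
    return out

def vertical_ramp(width, height, max_delay):
    """Delay map with no delay on the top row and max_delay on the bottom row."""
    return [[round(max_delay * y / (height - 1))] * width for y in range(height)]
```

With a vertical ramp as the delay map, each row of the output comes from a different moment, which is the slit-scan look. Swapping the ramp for a noise image gives the random-delay variant.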

Learning resources

4. Sound and image

In real-time work, sound and image are inseparable. This chapter covers three methods for designing the correspondence between them.

4.1 Audio Reactive

The most classical form of media-programming expression: map the results of audio analysis to image parameters. It is also the foundational technique of VJ culture.

Implementation core

Audio capture via Audio File In or Audio Device In CHOP
FFT analysis via Audio Spectrum CHOP, splitting into low/mid/high bands
Peak detection and amplitude/period measurement via Analyze CHOP
The design problem is to map bands and features to image parameters (hue, scale, density, position, etc.)
Typical starting points: low band drives the size of a Circle TOP; high band changes the seed of a Noise TOP
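
To make the analysis-then-mapping step concrete, here is a rough sketch in plain Python rather than CHOPs. It uses a naive DFT in place of a real FFT (fine for a short block of samples), and the band boundaries, input ranges, and output ranges are all assumptions picked for the example.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0 .. n/2 - 1 (slow, but dependency-free)."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags

def band_levels(mags, low_end, mid_end):
    """Average magnitude per band; bin 0 (DC offset) is skipped."""
    def mean(values):
        return sum(values) / len(values) if values else 0.0
    return {"low": mean(mags[1:low_end]),
            "mid": mean(mags[low_end:mid_end]),
            "high": mean(mags[mid_end:])}

def map_range(value, in_lo, in_hi, out_lo, out_hi):
    """Linear remap with clamping, the workhorse of audio-reactive mapping."""
    t = (value - in_lo) / (in_hi - in_lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)

def visual_params(samples):
    """Low band drives a circle radius; high band picks a noise seed."""
    levels = band_levels(magnitude_spectrum(samples), low_end=4, mid_end=16)
    return {"circle_radius": map_range(levels["low"], 0.0, 0.25, 0.1, 0.5),
            "noise_seed": int(levels["high"] * 1000)}
```

Feeding in a block of 64 samples of a low sine tone raises `circle_radius` while leaving `noise_seed` at 0. The interesting design work is in choosing which feature drives which parameter and over what range.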

Learning resources

Audio Reactive ver.2

4.2 Audio-Visual expression

Beyond mere reaction to sound — designing structural, philosophical correspondences between sound and image. From the 1990s onward, it established itself as an independent art form alongside the development of digital tools.

Reference artists and works

Ryoji Ikeda: test pattern, datamatics, supersymmetry, data-verse — sonifying sine waves, white noise, and data, then strictly synchronizing them with black-and-white patterns, grids, and scan lines
Ryoichi Kurokawa: rheo, ad/ab Atom, syn_ — particle imagery and noise/drone sound linked at high resolution
Carsten Nicolai (alva noto): unitxt, xerrox — exploring the correspondence between sonic glitch and geometric imagery
Robert Henke: lumière, CBM 8032 AV — laser and custom synthesis for geometric performance

What these works share is the idea that “sound and image have the same source.” The acoustic data itself is used as the image source; both are generated from the same mathematical structure. The visual vocabulary — black and white, geometry, sine waves, grids, scan lines — follows naturally from that idea.

Implementation core

Map the Audio Spectrum CHOP waveform directly into a Render TOP (turning sound itself into image)
Strict synchronization of pure sine tones with geometric patterns
Use no color (or extremely limited color) to purify the correspondence with sound
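
The “same source” idea can be shown with a toy example in plain Python: one list of samples is both the audio signal and, folded into rows, a black-and-white image. The sample rate, frequency, row width, and the small threshold `eps` (which keeps floating-point residue at the zero crossings from flickering) are assumptions for the sketch.

```python
import math

def sine_samples(freq_hz, sample_rate, count):
    """Samples of a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]

def samples_to_scanlines(samples, width, eps=1e-9):
    """Fold a 1-D signal into rows of `width` pixels: 1 (white) where the
    sample is clearly positive, 0 (black) otherwise."""
    rows = []
    for start in range(0, len(samples) - width + 1, width):
        rows.append([1 if s > eps else 0 for s in samples[start:start + width]])
    return rows
```

With a 1000 Hz tone at an 8000 Hz sample rate and rows 8 pixels wide, every row lands on the same phase and the image is a set of static vertical bars. Detuning the frequency slightly makes the bars drift diagonally, so a change you can hear is also a change you can see.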

Learning resources

Audio Reactive ver.2

4.3 Connecting to AI music

AI music generation (Suno, Udio, Stable Audio, etc.) developed rapidly in 2024–2025 and has become a viable source material for TD’s audio-visual work.

What Suno offers

Generates full tracks (melody, accompaniment, vocals, lyrics) from text prompts
Style specification (ambient, electronica, hardcore, noise, etc.) is available
Per-part stem export (bass, drums, vocals, etc.) is supported

How to use it from TD

Read the generated track with Audio File In CHOP, then feed it into a normal audio-reactive patch
With stem export you can map each part to a different visual element (bass → particle weight, drums → flashes, melody → hue, etc.)
Students who cannot write their own music can still get a track that fits their theme immediately

Caveats

Copyright and commercial-use terms for AI-generated music change quickly — always check the latest terms of service
Crediting (e.g., disclosing AI involvement) depends on the exhibition or screening venue’s rules

5. Sensing and interaction

The heart of real-time work is that the imagery changes in response to something external. This chapter organizes the field by input source.

5.1 Optical Flow and ParticlesGPU

A method that extracts motion vectors from camera footage and uses them to drive a GPU particle system. It is the signature of Javier Casadidio’s style and a paradigm case of organic, fluid, “TD-feeling” generative expression.

Implementation core

Optical Flow TOP: estimates motion vectors between consecutive frames
ParticlesGPU (a Compute Shader-based particle component): takes Optical Flow as a velocity field and computes particle trajectories every frame
Key parameters: particle count (thousands to millions), particle lifetime, color mapping, trail strength, velocity sensitivity
Beyond camera footage, audio spectrum or pre-generated video can also serve as the velocity field
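
What the particle component does each frame can be approximated on the CPU in plain Python. This is a sketch of the general velocity-field technique, not the ParticlesGPU internals: `field` is a hypothetical grid of (vx, vy) vectors standing in for the optical-flow output, and the respawn rule (random position when a particle dies or leaves the frame) is an assumption.

```python
import random

def make_particles(count, width, height, max_life, rng):
    """Scatter particles over the field with staggered lifetimes."""
    return [{"x": rng.uniform(0, width), "y": rng.uniform(0, height),
             "life": rng.uniform(0, max_life)} for _ in range(count)]

def advect(particles, field, dt, sensitivity, max_life, rng):
    """Move every particle along the velocity stored in its grid cell."""
    height, width = len(field), len(field[0])
    for p in particles:
        cell_x = min(max(int(p["x"]), 0), width - 1)
        cell_y = min(max(int(p["y"]), 0), height - 1)
        vx, vy = field[cell_y][cell_x]
        p["x"] += vx * sensitivity * dt
        p["y"] += vy * sensitivity * dt
        p["life"] -= dt
        outside = not (0 <= p["x"] < width and 0 <= p["y"] < height)
        if p["life"] <= 0 or outside:
            # Respawn at a random position with a full lifetime.
            p["x"], p["y"] = rng.uniform(0, width), rng.uniform(0, height)
            p["life"] = max_life
```

The GPU version runs the same update for a very large number of particles in parallel. `sensitivity` corresponds to the velocity-sensitivity parameter above, and `max_life` to particle lifetime.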

Reference artists and works

Javier Casadidio: various tutorial works
Memo Akten: Forms (with Quayola) and other particle/fluid pieces
Sougwen Chung: correspondence between particles and bodily motion

Learning resources

5.2 Body landmarks via MediaPipe

Acquire face, hand, and full-body joint positions in real time using a single webcam. Even without dedicated hardware like Azure Kinect, this enables body-driven expression.

Implementation core

MediaPipe: Google’s lightweight ML inference library
Face Mesh (478 points), Hand Tracking (21 points × 2 hands), Pose (33 points), Selfie Segmentation, Object Detection, etc.
The TDMediaPipe component (and similar) carries webcam → landmark coordinates → CHOP all the way through
Feed the joint positions into Geometry COMP instancing to implement body-following particles, gesture detection, face filters, etc.
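
As an example of what gesture detection on landmark data might look like, here is a small pinch detector in plain Python. It assumes MediaPipe's hand-landmark ordering (index 0 wrist, 4 thumb tip, 8 index fingertip, 9 middle-finger base) and normalized image coordinates; the 0.25 threshold is a guess that would need tuning against real input.

```python
import math

WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_BASE = 0, 4, 8, 9

def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def pinch_amount(landmarks):
    """Thumb-to-index distance divided by hand size.

    Dividing by the wrist-to-middle-base length keeps the measure stable
    when the hand moves closer to or further from the camera.
    """
    hand_size = distance(landmarks[WRIST], landmarks[MIDDLE_BASE])
    if hand_size == 0:
        return float("inf")
    return distance(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) / hand_size

def is_pinching(landmarks, threshold=0.25):
    """True when thumb tip and index tip are close relative to hand size."""
    return pinch_amount(landmarks) < threshold
```

In a TD patch the same arithmetic can be done on the landmark channels with Math and Logic CHOPs or in a Script CHOP. The point of the sketch is the normalization step, which is easy to forget.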

Reference artists and works

Performance work by Rhizomatiks Research and its body-tracking lineage
Daito Manabe’s body-driven work

Learning resources

5.3 3D body tracking with Azure Kinect

Expression backed by 3D body tracking with depth. Where MediaPipe estimates landmarks from a 2D image, Kinect measures depth directly, enabling precise 3D understanding of the body.

Implementation core

Azure Kinect hardware: a time-of-flight (ToF) depth camera with IR illumination, an RGB camera, a microphone array, and an IMU
Body Tracking: outputs 32 joint positions in 3D space
Depth map (the depth image from the Kinect Azure TOP): per-frame distance map, the foundation of point-cloud expression
Differences from MediaPipe: 2D estimation vs. 3D measurement; behavior outdoors and at distance; multi-person detection stability
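
The step from depth map to point cloud is the standard pinhole-camera unprojection, sketched here in plain Python. The intrinsics `fx`, `fy`, `cx`, `cy` are placeholders: real values come from the device calibration. The sketch assumes depth is given in millimetres with 0 meaning “no measurement”.

```python
def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Turn a depth image into a list of (x, y, z) points in metres.

    depth_mm: nested list of depth values in millimetres (0 = invalid).
    fx, fy:   focal lengths in pixels; cx, cy: principal point in pixels.
    """
    points = []
    for v, row in enumerate(depth_mm):
        for u, d in enumerate(row):
            if d <= 0:
                continue  # skip pixels with no depth reading
            z = d / 1000.0
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Each valid pixel becomes one point, so a depth frame turns directly into instance positions for a point-cloud render.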

Reference artists and works

teamLab: their immersive installations (spatial designs that respond to audience bodies)
Daito Manabe / Rhizomatiks: the lineage of Kinect-based work from the early years to the present

Learning resources

Azure Kinect integration

5.4 Other sensors and open data

Body sensors (EEG, heart rate, respiration), environmental sensors of various kinds, and open data (via APIs) — all of it can come into TD.

Examples of body sensors

EEG: NeuroSky MindWave, InteraXon Muse, OpenBCI
Heart rate / pulse (PPG, ECG): Polar H10, Pulse Sensor, Empatica E4
Respiration, EMG, EDA (electrodermal activity)

Bring these in over OSC or serial, and you enable “self-mirror” installations where the audience wears the sensor and experiences their own data as imagery. That is a class of experience that generative-video AI and editing software cannot produce in principle.

Examples of open data

Government statistics (e-Stat, etc.), weather APIs (JMA), earthquake data (USGS API), demographics, stock prices, social-media data
Web Client DAT for API calls, Table DAT for tabulation, DAT to CHOP to get numbers out, Geometry COMP Instancing for 3D visualization
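
The data path (response text, then table rows, then numeric channels for instancing) can be sketched in plain Python. The embedded `SAMPLE` is made-up data shaped like a GeoJSON earthquake feed, so nothing is fetched from the network; the field names and the normalization constants are assumptions for the example.

```python
import json

# Invented sample in the shape of a GeoJSON earthquake feed (not real data).
SAMPLE = """{"features": [
  {"properties": {"mag": 4.0, "place": "sample A"},
   "geometry": {"coordinates": [140.0, 35.0, 10.0]}},
  {"properties": {"mag": 6.0, "place": "sample B"},
   "geometry": {"coordinates": [-120.0, 40.0, 5.0]}},
  {"properties": {"mag": null, "place": "sample C"},
   "geometry": {"coordinates": [0.0, 0.0, 1.0]}}
]}"""

def to_rows(text):
    """Flatten the nested JSON into simple rows, dropping incomplete records."""
    rows = []
    for feature in json.loads(text)["features"]:
        mag = feature["properties"].get("mag")
        if mag is None:
            continue
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append({"lon": lon, "lat": lat, "mag": mag})
    return rows

def to_instances(rows, max_mag=10.0):
    """Normalize rows into per-instance attributes (position and scale)."""
    return [{"tx": r["lon"] / 180.0,
             "ty": r["lat"] / 90.0,
             "scale": r["mag"] / max_mag} for r in rows]
```

`to_instances(to_rows(SAMPLE))` yields two instances, since the record without a magnitude is dropped. Real feeds contain gaps like this routinely, so the filtering step belongs in the pipeline from the start.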

Reference artists and works

Ryoji Ikeda: the data-verse series
Refik Anadol: machine hallucinations
Aaron Koblin: flight patterns
Stamen Design

Learning resources

Python integration (foundations of API fetching and sensor communication)

6. Real-time AI and programmatic integration

6.1 Real-time AI generation via StreamDiffusion

A domain that developed rapidly in 2024–2025. Acceleration techniques like LCM (Latent Consistency Model) and SDXL Turbo made image generation fast enough to run per frame.

Implementation core

StreamDiffusion: combines Stable Diffusion / SDXL with acceleration techniques like LCM and sd-turbo, and uses a batch-parallelization architecture called Stream Batch to achieve real-time generation. In image-to-image mode, it can translate input footage (camera, TD-generated imagery, etc.) into a different style in real time
ControlNet: generation under constraints from composition, pose, edges, etc.
DotSimulate’s StreamDiffusionTD (Patreon) or olegchomp’s TouchDiffusion (open source) bridges TD imagery to real-time AI translation

Applications

Installations that translate audience body motion into AI-generated imagery in real time
Blending camera footage with re-painted imagery
Style switching driven by gesture, voice, or sensor input

This is precisely where the boundary between time-based AI generation (offline video file output) and real-time generation (per-frame generation that keeps running) is dissolving — the intersection of Installments #1–#3 and this one.

Learning resources

6.2 Python and LLM integration

TD embeds a Python 3 interpreter, so you can write any processing that operators cannot reach. Combined with LLM APIs (Claude, GPT, etc.), it enables natural-language control and context-responsive behavior.

Implementation core

Python execution via Execute-family DATs, Script CHOP/TOP, the textport
API calls via Web Client DAT or the requests module
Typical pipeline: microphone → Whisper API for transcription → Claude/GPT API for response → Text TOP for caption display
Applications: hand audio features or body state to an LLM and “consult” it for visual parameters; have the imagery’s tone shift in response to what the audience says
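
When an LLM is “consulted” for visual parameters, its reply is untrusted text and needs checking before it drives anything. The sketch below shows one way to do that in plain Python. `fake_llm` is a hypothetical stand-in for the real API call (no network request is made), and the parameter names and ranges in `PARAM_RANGES` are invented for the example.

```python
import json

# Hypothetical visual parameters and their allowed ranges.
PARAM_RANGES = {"hue": (0.0, 1.0), "speed": (0.0, 2.0), "density": (0.0, 1.0)}

def build_prompt(transcript, audio_level):
    """Ask for machine-readable output so the reply can be parsed."""
    return ("Return only JSON with numeric keys hue, speed, density. "
            f"Audience said: {transcript!r}. Audio level: {audio_level:.2f}.")

def parse_params(reply_text, defaults):
    """Parse the reply, keep only known numeric keys, clamp to safe ranges."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return dict(defaults)
    if not isinstance(data, dict):
        return dict(defaults)
    params = dict(defaults)
    for key, (lo, hi) in PARAM_RANGES.items():
        value = data.get(key)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            params[key] = min(hi, max(lo, float(value)))
    return params

def fake_llm(prompt):
    """Stand-in for a real LLM call; returns a deliberately messy reply."""
    return '{"hue": 0.6, "speed": 5, "density": "high"}'
```

With defaults of `{"hue": 0.0, "speed": 1.0, "density": 0.5}`, the messy reply above comes out as hue 0.6, speed clamped to 2.0, and density left at its default. A reply that is not JSON at all simply falls back to the defaults, so the installation keeps running.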

Learning resources

7. Connecting to output and exhibition

This chapter surveys the entry points for taking TD imagery into physical space. Implementation details (coordinate alignment across multiple projectors, mapping procedures for solid primitives, production-grade DMX lighting, etc.) will be handled in a subsequent essay on physical-space implementation. Here we only sketch the map of choices.

7.1 Multi-display and projection

TD handles simultaneous output to multiple displays and projectors out of the box.

Window COMP for output control: monitor index, resolution, window position
Supports a range of video I/O: NDI over the network, SDI through Blackmagic and AJA cards, and HDMI or DisplayPort straight from the GPU
A huge canvas split across multiple projectors (edge blending), independent imagery to multiple surfaces, 360-degree panoramic deployment, and similar are all possible
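
The arithmetic behind splitting one canvas across overlapping projectors is simple enough to write down. This plain-Python sketch assumes identical projectors side by side and a linear blend ramp; production edge blending normally uses a gamma-corrected curve, which is left out here. The function names are hypothetical.

```python
def projector_spans(count, proj_width, overlap):
    """Canvas column range [start, end) covered by each projector."""
    step = proj_width - overlap
    return [(i * step, i * step + proj_width) for i in range(count)]

def canvas_width(count, proj_width, overlap):
    """Total canvas width once the overlaps are accounted for."""
    return count * proj_width - (count - 1) * overlap

def blend_weight(local_x, proj_width, overlap, has_left, has_right):
    """Brightness multiplier for a projector's own pixel column.

    Ramps up across the left overlap and down across the right overlap,
    so that two overlapping projectors always sum to 1.
    """
    weight = 1.0
    if has_left and local_x < overlap:
        weight = min(weight, (local_x + 0.5) / overlap)
    if has_right and local_x >= proj_width - overlap:
        weight = min(weight, (proj_width - local_x - 0.5) / overlap)
    return weight
```

For three 1920-pixel projectors with a 200-pixel overlap, the canvas is 5360 pixels wide and the spans start at columns 0, 1720, and 3440.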

7.2 Projection mapping overview

Expression that pastes imagery onto physical space through a projector. TD’s strength is handling generation, mapping, and output in a single environment.

Warp deformation: the basic technique of warping the image’s four corners to the shape of the projection surface. Palette components such as kantanMapper and camSchnappr are the standard tools
Detailed mapping: per-brick imagery on a brick wall, per-face imagery on a 3D object — fine-grained alignment. Implemented with Replicator COMP, Container COMP + Layout TOP, etc.
Projection onto solid primitives: an assembly of white cubes, spheres, triangular prisms etc. mapped with imagery — a spatially compositional approach
Photograph-first calibration: photograph the projection surface first and place imagery against that reference, simplifying on-site alignment
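
The four-corner warp can be illustrated with a bilinear corner pin in plain Python. Note the substitution: mapping tools generally use a projective transform (a homography), which keeps straight lines straight under perspective; the bilinear version below is a simpler approximation chosen to keep the sketch short. Corner order and coordinate units are assumptions.

```python
def lerp2(a, b, t):
    """Linear interpolation between two (x, y) points."""
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

def corner_pin(u, v, bottom_left, bottom_right, top_right, top_left):
    """Map a point (u, v) in the unit square onto the quad given by four corners.

    u runs left to right, v runs bottom to top, both in [0, 1].
    """
    bottom = lerp2(bottom_left, bottom_right, u)
    top = lerp2(top_left, top_right, u)
    return lerp2(bottom, top, v)
```

Dragging one corner in a mapping tool amounts to changing one of the four corner arguments; every interior point then follows.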

Reference artists and works

Pablo Valbuena: augmented sculpture series
Joanie Lemercier
1024 Architecture
AntiVJ

7.3 DMX lighting control overview

You can drive lighting at the console level directly from TD.

Supports DMX-over-Ethernet protocols including Art-Net and sACN
DMX Out CHOP sends per-channel values to the fixtures on a DMX universe
Synchronize color, intensity, pan, and tilt of full-color LED fixtures (moving heads, LED bars, flood lights, etc.) with the imagery
Unifying audio, image, and lighting into a single TD project gives you integrated control
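
To show what goes over the wire, here is a sketch that builds an Art-Net ArtDmx packet by hand in plain Python. In TD the DMX Out CHOP takes care of this; the code is only meant to demystify the protocol. The function name `artdmx_packet` is hypothetical, and the packet is built but not sent.

```python
import struct

def artdmx_packet(universe, channels, sequence=0):
    """Build one ArtDmx packet carrying DMX channel values (0-255 each)."""
    if not 0 <= universe <= 0x7FFF:
        raise ValueError("universe must fit in 15 bits")
    data = bytes(channels)  # raises ValueError if any value is outside 0-255
    if len(data) % 2:
        data += b"\x00"  # the data length must be an even number
    if not 2 <= len(data) <= 512:
        raise ValueError("an ArtDmx packet carries 2 to 512 channels")
    header = b"Art-Net\x00"                  # fixed 8-byte ID string
    header += struct.pack("<H", 0x5000)      # OpCode for ArtDmx, low byte first
    header += struct.pack(">H", 14)          # protocol version, high byte first
    header += bytes([sequence & 0xFF, 0])    # sequence number, physical port
    header += struct.pack("<H", universe)    # 15-bit port address, low byte first
    header += struct.pack(">H", len(data))   # data length, high byte first
    return header + data
```

A packet like this would normally be sent over UDP to port 6454 on the Art-Net node. For example, `artdmx_packet(1, [255, 0, 128])` produces a 22-byte packet that sets the first three channels of universe 1, which on a simple RGB fixture addressed at channel 1 would be red, green, and blue levels.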

Use cases

Coupling imagery and lighting in live performance
Controlling atmosphere in galleries and exhibition spaces
Integrated imagery-and-lighting staging in architectural environments

7.4 Implementation details deferred to a subsequent essay

To actually run the methods of §§7.1–7.3 in a real exhibition space, many site-specific judgments come up: coordinate alignment across projectors, lighting placement design, sync with audio, production-grade redundancy. These will be collected as “an implementation guide for physical-space staging” in a subsequent essay.

8. Learning resources and existing materials

The materials already published on lecture.nakayasu.com are listed here, organized along this essay’s chapter structure. Use them as the entry points for hands-on practice with each method.

TouchDesigner basics: prerequisites for Chapter 2 and §3.1 — the operator families and wire connections
Poster made with Touchdesigner TUTORIAL 001: §3.1 — poster production via filter chains
Feedback: §3.2 — feedback structures
Time Displacement: §3.3 — Time Displacement
Slit Scan: §3.3 — Slit Scan
Audio Reactive ver.2: §§4.1, 4.2 — Audio Reactive and Audio-Visual expression
Optical Flow and ParticlesGPU: §5.1 — Optical Flow
Azure Kinect integration: §5.3 — Azure Kinect
Python integration: foundation for §§5.2, 5.4, and Chapter 6
Claude integration: §6.2 — LLM integration
MadMapper materials: §7.2 — a comparison reference for projection mapping

Major external resources

Derivative official documentation: primary source for operators and Python
Javier Casadidio YouTube channel: tutorials on filter chains and particle expression
Bileam Tschepe (elekktronaut) YouTube channel: broad coverage from foundations to advanced
Matthew Ragan’s articles (matthewragan.com): deep dives into Python and data-flow design

9. Closing

This essay has surveyed, across six domains, the expressions and methods that media programming in TouchDesigner makes possible: generative imagery; sound and image; sensing and interaction; real-time AI; programmatic integration; and connection to output and exhibition.

The framework Andrew Price laid out in Installment #1 — that what survives in the AI era is judgment, carried out by high-agency individuals — applies here too. In an era when generative-video AI spits out video in seconds, the meaning of working in a tool like TD — wiring up nodes, pulling in data, designing the correspondence with space, building the audience’s experience as a continuous series of judgments — has, if anything, become clearer.

Subsequent installments in this series are planned to cover:

An implementation guide for physical-space staging, using the Tokyo Metropolitan University production studio as a case study (multi-projector wall and floor coverage, DMX lighting, mapping onto solid primitives, synchronization with AI music from Suno and similar)
An essay on shooting technique: brain-play camerawork (compositional shooting in the manner of futa.729s), specialty shooting (microscope lenses, motion control with Edelkrone, Small Planet / Tilt-Shift work, light-trail photography), and slow-motion expression (the kind of time manipulation done by aaa_tsushi, plus the latest Premiere / After Effects workflows)

Across the series as a whole, the structure will go through the main lineages of moving-image production in order: AI generation (time-based, Installments #1–#3), media programming (real-time, this installment), and shooting technique (time-based, subsequent installments).
