Media Creation Curriculum in the AI Era #4 — Media Programming Practice with TouchDesigner — Expression and Method
Updated: 2026-05
1. Introduction
This essay is a Q&A between the author and Claude AI about a redesign of the author’s media production course. As the fourth installment in the “Media Creation Curriculum in the AI Era” series, it surveys what can be expressed through media programming in TouchDesigner (henceforth TD), and what lineages and underlying principles its methods carry.
Installment #1 used Andrew Price’s argument as a starting point to set the overall direction for “Visual Practicum I/II.” Installment #2 examined a node-based syllabus built entirely on ComfyCloud. Installment #3 considered a short-film curriculum centered on generative-video AI. All three target works whose final output is “video that lives inside the screen.”
The TD covered here is a tool of a different lineage. It addresses territory that neither generative-video AI nor Premiere/After Effects can replace — work that requires real-time behavior, interactivity, and connection to physical space. The aim of this essay is to frame that territory as “media programming” and to survey the expressions and methods that TD makes possible. The structure as a course (mapping onto 180-min × 15 sessions, evaluation design, operational notes) and the implementation details for physical-space staging will be handled in subsequent essays. This piece is positioned as the prerequisite — “a reference for what TouchDesigner can do.”
1.1 References
- TouchDesigner (Derivative)
- Derivative official documentation
- TouchDesigner basics (lecture.nakayasu.com)
- Poster made with Touchdesigner TUTORIAL 001 (lecture.nakayasu.com Scrapbox)
- Optical Flow and ParticlesGPU (lecture.nakayasu.com Scrapbox)
- Feedback (lecture.nakayasu.com)
- Audio Reactive ver.2 (lecture.nakayasu.com)
- Slit Scan (lecture.nakayasu.com)
- Time Displacement (lecture.nakayasu.com)
- Azure Kinect integration (lecture.nakayasu.com)
- Python integration (lecture.nakayasu.com)
- Claude integration (lecture.nakayasu.com)
- MadMapper materials (lecture.nakayasu.com)
- Javier Casadidio (YouTube channel)
- bileam tschepe (elekktronaut) (YouTube channel)
- Matthew Ragan’s articles (matthewragan.com)
- StreamDiffusionTD (DotSimulate, Patreon)
- TouchDiffusion (olegchomp, GitHub)
- TDMediaPipe (torinmb, GitHub)
- Suno (official)
- Ryoji Ikeda official site
- Ryoichi Kurokawa official site
- Media Creation Curriculum in the AI Era series: Installment #1, Installment #2, Installment #3
1.2 What this essay covers
- The two main lineages of moving-image technology — time-based vs. real-time
- Where TouchDesigner sits, and how it relates to the other real-time options
- Generative imagery (filter chains, feedback, spatiotemporal manipulation)
- Audio-visual synchronization, and how AI music fits in
- Sensing and interaction (camera, body sensors, data)
- Real-time AI generation and programmatic integration
- Connecting to output and exhibition (multi-display, projection mapping, an overview of lighting control)
- A map of existing learning resources
2. A bird’s-eye view of moving-image technology — where TouchDesigner sits
2.1 Two large lineages
The most fundamental axis on which to divide moving-image technology is “when the image is computed.” Along that axis, contemporary moving-image technology splits into two large lineages.
- Time-based: the image is computed once, at production time, and saved as a fixed video file. Playback shows the same image every time
- Real-time: while the system runs, the image is computed every frame. The same piece produces different imagery for each viewer’s encounter
This split is consistent with the way US MFA programs separate their departments (Time-Based Media vs. Interactive/Computational Media) and with the curatorial axis MoMA uses for its Time-Based Media collection — a distinction that is well established in the media-art field.
TouchDesigner belongs to the real-time lineage. The tools covered in Installments #1–#3 (Veo, Kling, Higgsfield, ComfyCloud, Premiere, After Effects, etc.) are all on the time-based side. This essay shifts the series’ center of gravity from time-based to real-time.
2.2 Inside the time-based lineage
Time-based production has two stages: generating the raw material, and assembling it.
Generation methods
- Live-action shooting: a camera records the physical world
- 2D animation: hand-drawn, TVPaint, Toon Boom, After Effects animation
- 3DCG rendering: Blender, Maya, Cinema 4D, Houdini, 3ds Max, etc. — building a 3D world and rendering it to image
- AI generation: Veo, Kling, Seedance, Sora, Higgsfield, Runway, Wan, etc. — text-/image-to-video
Assembly (NLE — Non-Linear Editing)
- Premiere Pro, After Effects, DaVinci Resolve, Final Cut Pro, etc. — laying generated material on a timeline, compositing, color, titles, audio integration
Final output
- A single video file (mp4, mov, etc.)
- Main delivery venues: cinema, television, streaming platforms, social-media video, music videos, commercials, web ads
Time-based expression is stable in that the same content plays back regardless of where, when, or how the audience views it. The strength of careful storytelling and dramatic staging derives from that stability.
2.3 Inside the real-time lineage
Real-time is a paradigm built around “designing a system that keeps running.” Its methods and tools come from several lineages, each with its own context of origin.
- Visual programming languages: TouchDesigner, Max/MSP/Jitter, vvvv, Pure Data, Cables.gl — wire nodes together to build the program
- Creative-coding frameworks: openFrameworks (C++), Processing (Java), p5.js / three.js (JavaScript) — write the program in code directly
- Game engines: Unity, Unreal Engine, Godot, Notch — 3D space and physics simulation as first-class objects
- VJ / performance tools: Resolume, Vidvox VDMX, MadMapper — optimized for live performance and projection-mapping work
Final output
- A system that keeps running (a program, a scene, a patch)
- Main delivery venues: installations, live performance, projection mapping, retail and architectural environments, games, websites, signage, virtual production
Real-time expression has, as its defining characteristic, the ability to change in response to audience, environment, and data.
- Audience interaction: sensors, cameras, microphones capture the audience’s state and reflect it in the image
- Connection to physical space: imagery is placed in the world through projectors, LED walls, monitor arrays — corresponding to the shape of the space
- Multi-channel output: simultaneous output to multiple displays builds a vast canvas or an immersive environment
- Direct visualization of external data: CSV, JSON, API calls, serial input all turn into image immediately
- Bidirectional communication with other systems: lighting consoles (DMX), audio (OSC, MIDI), robotics, ML inference servers, etc.
These are possibilities that exist precisely because the final output is not a video file but “a system that keeps running.”
2.4 Why TouchDesigner within real-time?
There are several options inside the real-time lineage, each with its own strengths. The reasons for picking TD for this series:
- Learning curve and reach: node-based work has a gentler entry than code-based frameworks (openFrameworks, Processing), making it easier to deliver to art-school and design students
- Breadth of coverage: imagery (TOP), audio (CHOP), 3D (SOP), data/text (DAT), and components (COMP) are integrated in a single environment. Max/MSP leans toward audio and vvvv is Windows-only, so each alternative has its own bias. TD is the most balanced as an image-centric general-purpose environment
- Strength in physical output: multiple-projector, LED-wall, and multi-monitor output; video I/O over NDI and through SDI capture cards (Blackmagic, AJA, etc.); built-in projection-mapping features
- Sensor and data integration: direct connection to Azure Kinect, Leap Motion, various serial devices, OSC, MIDI, Arduino, etc.
- Industry-standard status: in events, live production, and projection-mapping work, TD is one of the de facto standard tools. It is used as core infrastructure at Rhizomatiks Research, WOW, teamLab, and similar studios
- Free non-commercial license: education and personal use can run on the Non-Commercial build (capped at 1280×1280 output)
Unity / Unreal Engine, Notch, openFrameworks and similar have strengths that TD does not, and they matter as later destinations in a learning path.
2.5 The two lineages are complementary
Time-based and real-time are not opposing choices. The current reality is that a single artist moves between them, picking the output format that fits the job.
Material crosses between them routinely.
- Time-based → real-time: 3DCG renders or shot footage used as material that TD or Unity controls in real time
- Real-time → time-based: imagery generated in TD recorded out, finished in Premiere/AE, delivered to social media or film festivals
- Their merger: with StreamDiffusion and similar tools, AI image generation has begun crossing from time-based into real-time
The time-based work covered in Installments #1–#3 and the real-time work covered here are complementary. By experiencing both, students become artists who can make both “the image inside the screen” and “the image in physical space.”
3. Generative imagery
What lies at TD’s core is not playing back a fixed video, but generating imagery as a program that recomputes every frame. This chapter covers three method groups at that core.
3.1 Filter chains as generative graphics
A method where multiple filter, transform, and color-transform operators are chained to generate organic or geometric abstract imagery. Still output yields posters; motion output yields VJ material, signage, and screensaver-style ambient imagery.
Reference artists and works
- Manfred Mohr, Vera Molnar: algorithmic art from the 1960s–70s — the earliest body of computer-driven geometric composition
- Joshua Davis: praystation, Hype Framework — popularizing generative graphics through Flash and Processing
- Refik Anadol (still-image works): the data sculpture series
- Javier Casadidio: TD tutorials published on YouTube — organic, fluid imagery built by stacking Optical Flow, Particles, and Displace
Implementation core
- Chaining the basic TOP operators (Composite, Blur, Displace, Lookup, Noise, Level, Ramp, Edge, etc.) and moving the parameters at each stage produces unpredictable textures
- The chain-of-filters mindset is a direct descendant of the analog video synthesizer lineage: the Sandin Image Processor (designed 1971–74), the Paik-Abe Synthesizer (1969–71), and the Vasulkas’ work
- It exemplifies “exploratory making”: not designing up front, but trying and discovering. The shape is that of a culinary apprenticeship: start by reproducing existing recipes, then discover your own taste through countless prototypes
- This is a sound strategy for a high-dimensional parameter space (dozens of parameters in combination): the work proceeds by exploration and selection, not by blind chance
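The chain-of-filters idea can be sketched outside TD in plain NumPy. This is a minimal illustration, not TD's implementation: the function names (`noise`, `blur`, `displace`, `level`, `chain`) are hypothetical stand-ins for the Noise, Blur, Displace, and Level TOPs, and the parameter defaults are arbitrary.

```python
import numpy as np

def normalize(img):
    """Rescale an image to the 0..1 range."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def noise(size, seed):
    """Uniform random field, standing in for a Noise TOP."""
    return np.random.default_rng(seed).random((size, size))

def blur(img, radius):
    """Box blur by averaging shifted copies (wraps at the edges)."""
    shifts = range(-radius, radius + 1)
    acc = sum(np.roll(img, (dy, dx), axis=(0, 1)) for dy in shifts for dx in shifts)
    return acc / (len(shifts) ** 2)

def displace(img, disp, amount):
    """Shift each pixel's lookup position by a value read from `disp`."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    off = np.rint((disp - 0.5) * amount).astype(int)
    return img[(ys + off) % h, (xs + off) % w]

def level(img, contrast, gamma):
    """Contrast around mid-grey, then gamma, clamped to 0..1."""
    return np.clip((img - 0.5) * contrast + 0.5, 0.0, 1.0) ** gamma

def chain(seed, radius=2, amount=12.0, contrast=3.0, gamma=0.8, size=64):
    """noise -> blur -> displace (by a second noise) -> level."""
    base = normalize(blur(noise(size, seed), radius))
    disp = normalize(blur(noise(size, seed + 1), radius))
    return level(displace(base, disp, amount), contrast, gamma)
```

Calling `chain(1)` and `chain(1, amount=30.0)` gives two different textures from the same seed, which is the kind of one-parameter-at-a-time exploration described above.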
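Learning resources
- TouchDesigner basics (lecture.nakayasu.com)
- Poster made with Touchdesigner TUTORIAL 001 (lecture.nakayasu.com Scrapbox)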
3.2 Feedback and time-evolving patterns
Self-running pattern generation via feedback. A simple structure — apply an operation to the previous frame and write it back each frame — gives rise to complex patterns that evolve over time.
Historical placement
- Central technique of 1970s analog video synthesizers: Sandin Image Processor, Paik-Abe Synthesizer
- Mathematically related to dynamical systems like cellular automata and reaction-diffusion
- A canonical case of the generative-art idea that “the combination of nodes itself is the structure of the work”
Implementation core
- Built around the Feedback TOP, combined with Transform TOP (rotation, scaling), Blur TOP, Level TOP, Displace TOP
- Typical patterns: “rotation + slight scale-up” produces spiral evolution, “Displace driven by noise” produces fluid evolution, “Level for tone compression” produces limit cycles
- Combined with the filter chains of §3.1, you get layered, distinctive textures
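The feedback structure itself is small enough to sketch in NumPy. The code below is an illustration under stated assumptions rather than a model of the Feedback TOP: `transform`, `feedback_step`, and `run` are hypothetical names, the transform uses nearest-neighbour sampling, and the angle, scale, and decay values are arbitrary.

```python
import numpy as np

def transform(img, angle, scale):
    """Rotate and scale about the centre (nearest-neighbour, inverse mapping)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dy, dx = (ys - cy) / scale, (xs - cx) / scale
    c, s = np.cos(-angle), np.sin(-angle)
    src_y = np.rint(cy + s * dx + c * dy).astype(int)
    src_x = np.rint(cx + c * dx - s * dy).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)
    out[inside] = img[src_y[inside], src_x[inside]]
    return out

def feedback_step(prev, source, angle=0.2, scale=1.08, decay=0.95):
    """One frame: transform the previous output, fade it, lay the source on top."""
    return np.maximum(source, transform(prev, angle, scale) * decay)

def run(source, n_frames, **kwargs):
    """Feed each output frame back in as the next frame's input."""
    frame = np.zeros_like(source)
    frames = []
    for _ in range(n_frames):
        frame = feedback_step(frame, source, **kwargs)
        frames.append(frame)
    return frames
```

With a small off-centre bar as `source`, the first frame equals the source and later frames carry a fading, outward-spiralling trail: the "rotation + slight scale-up" case from the list above.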
Learning resources
- Feedback (lecture.nakayasu.com)
3.3 Spatiotemporal manipulation — Time Displacement / Slit Scan
A paradigm that swaps the time axis with a spatial axis. Assigning the time axis to the vertical, horizontal, or diagonal axis of the image produces uncanny, characteristically graphical textures.
Reference artists and works
- Daniel Crooks: Static No. 12 (seek stillness in movement) (2009–10), from the Time Slice series. Footage of a man practicing tai chi in a Shanghai park is sliced along time, so that the motion of the body extends into space, a singular spatiotemporal expression
- Adam Magyar: Stainless (2010–11), Urban Flow (2007–15). Using slit-scan, he scans subway platforms and busy intersections along the time axis
- Federico Solmi: painterly spatiotemporal manipulation
Implementation core
- Time Displacement: a Texture 3D TOP holds a buffer of past frames; the Time Machine TOP reads a per-pixel time offset from a Displacement Map (second input). A vertical time gradient via Ramp TOP, or random delays via Noise TOP, are standard starting points
- Slit Scan: a Cache TOP holds past frames; a Crop TOP extracts each time-line and a Composite TOP reassembles them
- These produce graphical textures that you cannot get from frame-by-frame stop motion or After Effects’ Timewarp effect
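Both techniques reduce to indexing into a buffer of past frames. The following NumPy sketch assumes a greyscale buffer of shape (T, H, W) with the newest frame last; `time_displace` and `slit_scan` are hypothetical helper names, not TD operators.

```python
import numpy as np

def time_displace(frames, offset_map):
    """Per-pixel time lookup.

    frames: (T, H, W) buffer, frames[-1] is the newest frame.
    offset_map: (H, W) in 0..1, where 0 means "now" and 1 means
    "oldest buffered frame" (the role of the displacement map).
    """
    t, h, w = frames.shape
    delay = np.rint(np.clip(offset_map, 0.0, 1.0) * (t - 1)).astype(int)
    ys, xs = np.mgrid[0:h, 0:w]
    return frames[(t - 1) - delay, ys, xs]

def slit_scan(frames, column):
    """Take one fixed column from every frame and lay them side by side."""
    return np.stack([f[:, column] for f in frames], axis=1)
```

A vertical ramp as `offset_map` makes the top of the image show the present and the bottom show the oldest frame, which corresponds to the Ramp TOP starting point; a noise field as `offset_map` gives the random-delay variant.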
Learning resources
- Time Displacement (lecture.nakayasu.com)
- Slit Scan (lecture.nakayasu.com)
4. Sound and image
In real-time work, sound and image are inseparable. This chapter covers three methods for designing the correspondence between them.
4.1 Audio Reactive
The most classical form of media-programming expression: map the results of audio analysis to image parameters. It is also the foundational technique of VJ culture.
Implementation core
- Audio capture via Audio File In or Audio Device In CHOP
- FFT analysis via Audio Spectrum CHOP, splitting into low/mid/high bands
- Peak detection and amplitude/period measurement via Analyze CHOP
- The design problem is to map bands and features to image parameters (hue, scale, density, position, etc.)
- Typical starting points: low band drives the size of a Circle TOP; high band changes the seed of a Noise TOP
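The analysis-then-mapping step can be sketched with NumPy's FFT. This is an illustration of the idea, not of how the Audio Spectrum CHOP works internally: the band edges, the parameter names `circle_radius` and `noise_seed`, and the mapping ranges are all assumptions chosen for the example.

```python
import numpy as np

def band_energies(samples, sample_rate, edges=(20.0, 250.0, 4000.0, 16000.0)):
    """Return [low, mid, high] mean spectral magnitude for one audio block."""
    window = np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(samples * window))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(float(spectrum[mask].mean()) if mask.any() else 0.0)
    return bands

def to_params(bands):
    """Map band balance to two image parameters (hypothetical names)."""
    total = sum(bands) or 1.0
    low, _mid, high = (b / total for b in bands)
    return {
        "circle_radius": 0.1 + 0.4 * low,   # low band drives size
        "noise_seed": int(high * 1000),     # high band changes the seed
    }
```

Feeding a 100 Hz tone yields a large `circle_radius`, while an 8 kHz tone leaves the radius near its minimum and moves `noise_seed` instead.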
Learning resources
- Audio Reactive ver.2 (lecture.nakayasu.com)
4.2 Audio-Visual expression
This goes beyond mere reaction to sound: it designs structural, philosophical correspondences between sound and image. From the 1990s onward, audio-visual work established itself as an independent art form alongside the development of digital tools.
Reference artists and works
- Ryoji Ikeda: test pattern, datamatics, supersymmetry, data-verse — sonifying sine waves, white noise, and data, then strictly synchronizing them with black-and-white patterns, grids, and scan lines
- Ryoichi Kurokawa: rheo, ad/ab Atom, syn_ — particle imagery and noise/drone sound linked at high resolution
- Carsten Nicolai (alva noto): unitxt, xerrox — exploring the correspondence between sonic glitch and geometric imagery
- Robert Henke: lumière, CBM 8032 AV — laser and custom synthesis for geometric performance
What these works share is the idea that “sound and image have the same source.” The acoustic data itself is used as the image source; both are generated from the same mathematical structure. The visual vocabulary — black and white, geometry, sine waves, grids, scan lines — follows naturally from that idea.
Implementation core
- Convert the Audio Spectrum CHOP output directly into a texture, for example with a CHOP to TOP (turning sound itself into image)
- Strict synchronization of pure sine tones with geometric patterns
- Use no color (or extremely limited color) to purify the correspondence with sound
Learning resources
- Audio Reactive ver.2 (lecture.nakayasu.com)
4.3 Connecting to AI music
AI music generation (Suno, Udio, Stable Audio, etc.) developed rapidly in 2024–2025 and has become a viable source of material for TD’s audio-visual work.
What Suno offers
- Generates full tracks (melody, accompaniment, vocals, lyrics) from text prompts
- Style specification (ambient, electronica, hardcore, noise, etc.) is available
- Per-part stem export (bass, drums, vocals, etc.) is supported
How to use it from TD
- Read the generated track with Audio File In CHOP, then feed it into a normal audio-reactive patch
- With stem export you can map each part to a different visual element (bass → particle weight, drums → flashes, melody → hue, etc.)
- Students who cannot write their own music can still get a track that fits their theme immediately
Caveats
- Copyright and commercial-use terms for AI-generated music change quickly — always check the latest terms of service
- Crediting (e.g., disclosing AI involvement) depends on the exhibition or screening venue’s rules
5. Sensing and interaction
The heart of real-time work is that the imagery changes in response to something external. This chapter organizes the field by input source.
5.1 Optical Flow and ParticlesGPU
A method that extracts motion vectors from camera footage and uses them to drive a GPU particle system. It is the signature of Javier Casadidio’s style and a paradigm case of organic, fluid, “TD-feeling” generative expression.
Implementation core
- Optical Flow TOP: estimates motion vectors between consecutive frames
- ParticlesGPU (a Compute Shader-based particle component): takes Optical Flow as a velocity field and computes particle trajectories every frame
- Key parameters: particle count (thousands to millions), particle lifetime, color mapping, trail strength, velocity sensitivity
- Beyond camera footage, audio spectrum or pre-generated video can also serve as the velocity field
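The particle side of this pipeline, moving points through a velocity field and respawning them when they expire, can be sketched on the CPU with NumPy. This is only a conceptual illustration: ParticlesGPU runs on the GPU, the flow field here is supplied by the caller rather than estimated from video, and `step_particles` and its parameters are hypothetical.

```python
import numpy as np

def step_particles(pos, age, flow, sensitivity=1.0, lifetime=60, rng=None):
    """Advance particles by one frame.

    pos:  (N, 2) float array of (x, y) pixel coordinates.
    age:  (N,) int array of frames lived so far.
    flow: (H, W, 2) array of per-pixel (vx, vy), e.g. from optical flow.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = flow.shape
    xi = np.clip(pos[:, 0].astype(int), 0, w - 1)
    yi = np.clip(pos[:, 1].astype(int), 0, h - 1)
    pos = pos + flow[yi, xi] * sensitivity      # sample the field, then move
    age = age + 1
    dead = (
        (age >= lifetime)
        | (pos[:, 0] < 0) | (pos[:, 0] >= w)
        | (pos[:, 1] < 0) | (pos[:, 1] >= h)
    )
    # Respawn expired or out-of-frame particles at random positions.
    pos[dead] = rng.random((int(dead.sum()), 2)) * [w, h]
    age[dead] = 0
    return pos, age
```

The `sensitivity` and `lifetime` arguments correspond to the velocity-sensitivity and particle-lifetime parameters listed above; swapping `flow` for a field derived from an audio spectrum changes the driver without touching the particle logic.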
Reference artists and works
- Javier Casadidio: various tutorial works
- Memo Akten: Forms (with Quayola) and other particle/fluid pieces
- Sougwen Chung: correspondence between particles and bodily motion
Learning resources
- Optical Flow and ParticlesGPU (lecture.nakayasu.com Scrapbox)
5.2 Body landmarks via MediaPipe
Acquire face, hand, and full-body joint positions in real time using a single webcam. Even without dedicated hardware like Azure Kinect, this enables body-driven expression.
Implementation core
- MediaPipe: Google’s lightweight ML inference library
- Face Mesh (478 points), Hand Tracking (21 points × 2 hands), Pose (33 points), Selfie Segmentation, Object Detection, etc.
- The TDMediaPipe component (and similar) carries webcam → landmark coordinates → CHOP all the way through
- Feed the joint positions into Instance COMP to implement body-following particles, gesture detection, face filters, etc.
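As one example of gesture detection on top of landmark data, a pinch can be read from the distance between thumb tip and index fingertip. The sketch below assumes the 21 hand landmarks arrive as (x, y) pairs in MediaPipe's index order (0 wrist, 4 thumb tip, 8 index fingertip, 9 middle-finger MCP); the function names and the 0.3 threshold are assumptions for illustration.

```python
import math

# MediaPipe hand landmark indices used here.
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_MCP = 0, 4, 8, 9

def pinch_amount(landmarks):
    """Thumb-to-index distance, divided by hand size so it is scale-invariant.

    landmarks: sequence of 21 (x, y) pairs in normalized image coordinates.
    """
    hand_size = math.dist(landmarks[WRIST], landmarks[MIDDLE_MCP]) or 1e-6
    return math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) / hand_size

def is_pinching(landmarks, threshold=0.3):
    """True when the fingertips are close relative to the size of the hand."""
    return pinch_amount(landmarks) < threshold
```

Dividing by the wrist-to-knuckle distance keeps the gesture stable as the hand moves toward or away from the camera. In a patch, `pinch_amount` would typically become one more channel alongside the landmark coordinates.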
Reference artists and works
- Performance work by Rhizomatiks Research and its body-tracking lineage
- Daito Manabe’s body-driven work
Learning resources
- TDMediaPipe (torinmb, GitHub)
- Python integration (lecture.nakayasu.com)
5.3 3D body tracking with Azure Kinect
Expression backed by 3D body tracking with depth. Where MediaPipe estimates 2D, Kinect measures depth directly, enabling precise 3D understanding of the body.
Implementation core
- Azure Kinect hardware: a time-of-flight (ToF) depth camera with infrared illumination, an RGB camera, a 7-microphone array, and an IMU
- Body Tracking: outputs 32 joint positions in 3D space
- Depth map (Kinect Azure TOP): a per-frame distance map, the foundation of point-cloud expression
- Differences from MediaPipe: 2D estimation vs. 3D measurement; behavior outdoors and at distance; multi-person detection stability
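The step from a depth map to a point cloud is a pinhole-camera back-projection. The NumPy sketch below assumes depth in millimetres with 0 marking invalid pixels, and takes the camera intrinsics (`fx`, `fy`, `cx`, `cy`) as arguments; real values come from the sensor's calibration, and `depth_to_points` is a hypothetical helper name.

```python
import numpy as np

def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Back-project a depth image into an (N, 3) array of points in metres.

    depth_mm: (H, W) depth in millimetres, 0 where the sensor has no reading.
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed known).
    """
    h, w = depth_mm.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(float) / 1000.0
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop invalid (zero-depth) pixels
```

The resulting rows map directly onto instance positions for point-cloud rendering.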
Reference artists and works
- teamLab: their immersive installations (spatial designs that respond to audience bodies)
- Daito Manabe / Rhizomatiks: the lineage of Kinect-based work from the early years to the present
Learning resources
- Azure Kinect integration (lecture.nakayasu.com)
5.4 Other sensors and open data
Body sensors (EEG, heart rate, respiration), environmental sensors of various kinds, and open data (via APIs) — all of it can come into TD.
Examples of body sensors
- EEG: NeuroSky MindWave, InteraXon Muse, OpenBCI
- Heart rate / pulse (PPG, ECG): Polar H10, Pulse Sensor, Empatica E4
- Respiration, EMG, EDA (electrodermal activity)
Bring these in over OSC or serial, and you enable “self-mirror” installations where the audience wears the sensor and experiences their own data as imagery. That is a class of experience that generative-video AI and editing software cannot produce in principle.
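To make the OSC side concrete, here is a minimal sketch of how a float-only OSC message is laid out on the wire: a null-padded address, a null-padded type-tag string, then big-endian 32-bit floats. It handles only float arguments, the address `/heart/bpm` in the usage note is made up, and in practice a sensor bridge or TD's own OSC operators would do this work.

```python
import struct

def _pad(raw):
    """Null-terminate and pad to a multiple of 4 bytes, as OSC strings require."""
    return raw + b"\x00" * (4 - len(raw) % 4)

def encode(address, *values):
    """Build an OSC message whose arguments are all float32."""
    tags = "," + "f" * len(values)
    body = struct.pack(f">{len(values)}f", *values)
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + body

def _read_string(packet, offset):
    """Read one padded OSC string; return it with the offset that follows."""
    end = packet.index(b"\x00", offset)
    size = end - offset
    return packet[offset:end].decode("ascii"), offset + (size // 4 + 1) * 4

def decode(packet):
    """Parse a float-only OSC message into (address, values)."""
    address, offset = _read_string(packet, 0)
    tags, offset = _read_string(packet, offset)
    count = tags.count("f")
    return address, struct.unpack_from(f">{count}f", packet, offset)
```

For example, `decode(encode("/heart/bpm", 72.0))` returns the address and a one-element tuple, which is the shape of data a heart-rate sensor bridge might send once per beat.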
Examples of open data
- Government statistics (e-Stat, etc.), weather APIs (JMA), earthquake data (USGS API), demographics, stock prices, social-media data
- Web Client DAT for API calls, Table DAT for tabulation, DAT to CHOP to get numbers out, Geometry COMP Instancing for 3D visualization
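The fetch, tabulate, and instance chain can be sketched in plain Python up to the point where numbers are ready for instancing. The example assumes the GeoJSON shape used by the USGS earthquake feeds (a `features` list with `properties.mag` and `geometry.coordinates` as longitude, latitude, depth in km). The sample data is made up for illustration, and the channel names `tx`, `ty`, `tz`, `scale` and the normalization constants are assumptions.

```python
import json

# Made-up sample in the shape of a USGS GeoJSON feed (not real events).
SAMPLE = """
{"features": [
  {"properties": {"mag": 5.0, "place": "sample A"},
   "geometry": {"coordinates": [90.0, 45.0, 35.0]}},
  {"properties": {"mag": null, "place": "sample B"},
   "geometry": {"coordinates": [10.0, 10.0, 5.0]}},
  {"properties": {"mag": 2.5, "place": "sample C"},
   "geometry": {"coordinates": [-180.0, -90.0, 0.0]}}
]}
"""

def to_instances(geojson_text, max_depth_km=700.0):
    """Turn each event into one row of normalized instance attributes."""
    rows = []
    for feature in json.loads(geojson_text)["features"]:
        mag = feature["properties"]["mag"]
        if mag is None:           # skip events without a magnitude
            continue
        lon, lat, depth_km = feature["geometry"]["coordinates"]
        rows.append({
            "tx": lon / 180.0,    # -1..1 across the map
            "ty": lat / 90.0,     # -1..1 from pole to pole
            "tz": -min(depth_km, max_depth_km) / max_depth_km,
            "scale": mag / 10.0,
        })
    return rows
```

Each returned row corresponds to one table row, and each key to one channel that instancing can read.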
Reference artists and works
- Ryoji Ikeda: the data-verse series
- Refik Anadol: machine hallucinations
- Aaron Koblin: flight patterns
- Stamen Design
Learning resources
- Python integration (foundations of API fetching and sensor communication)
6. Real-time AI and programmatic integration
6.1 Real-time AI generation via StreamDiffusion
A domain that developed rapidly in 2024–2025. Acceleration techniques like LCM (Latent Consistency Model) and SDXL Turbo brought image generation down to a level where it runs per frame.
Implementation core
- StreamDiffusion: combines Stable Diffusion / SDXL with acceleration techniques like LCM and sd-turbo, and uses a batch-parallelization architecture called Stream Batch to achieve real-time generation. In image-to-image mode, it can translate input footage (camera, TD-generated imagery, etc.) into a different style in real time
- ControlNet: generation under constraints from composition, pose, edges, etc.
- DotSimulate’s StreamDiffusionTD (Patreon) or olegchomp’s TouchDiffusion (open source) bridges TD imagery to real-time AI translation
Applications
- Installations that translate audience body motion into AI-generated imagery in real time
- Blending camera footage with re-painted imagery
- Style switching driven by gesture, voice, or sensor input
This is precisely where the boundary between time-based AI generation (offline video file output) and real-time generation (per-frame generation that keeps running) is dissolving — the intersection of Installments #1–#3 and this one.
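Learning resources
- StreamDiffusionTD (DotSimulate, Patreon)
- TouchDiffusion (olegchomp, GitHub)
- Python integration (lecture.nakayasu.com)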
6.2 Python and LLM integration
TD embeds a Python 3 interpreter, so you can write any processing that operators cannot reach. Combined with LLM APIs (Claude, GPT, etc.), it enables natural-language control and context-responsive behavior.
Implementation core
- Python execution via Execute-family DATs, Script CHOP/TOP, the textport
- API calls via Web Client DAT or the requests module
- Typical pipeline: microphone → Whisper API for transcription → Claude/GPT API for response → Text TOP for caption display
- Applications: hand audio features or body state to an LLM and “consult” it for visual parameters; have the imagery’s tone shift in response to what the audience says
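When an LLM is "consulted" for visual parameters, the fragile part is turning its free-text reply into safe numbers. The sketch below leaves the API call out entirely and shows only prompt construction and reply validation. The parameter names and ranges in `PARAM_RANGES` are hypothetical, and the approach (ask for JSON, clamp, fall back to defaults) is one reasonable design, not a prescribed one.

```python
import json

# Hypothetical parameter names and ranges; a real patch would define its own.
PARAM_RANGES = {"hue": (0.0, 1.0), "speed": (0.0, 2.0), "density": (0.0, 1.0)}

def build_prompt(features):
    """Ask for a JSON object only, so the reply can be parsed mechanically."""
    names = ", ".join(PARAM_RANGES)
    return (
        "Current input features: " + json.dumps(features, sort_keys=True) + "\n"
        "Reply with one JSON object containing the numeric keys: " + names + "."
    )

def parse_reply(reply_text, defaults):
    """Return validated parameters; keep defaults for anything unusable."""
    result = dict(defaults)
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return result
    if not isinstance(data, dict):
        return result
    for name, (lo, hi) in PARAM_RANGES.items():
        value = data.get(name)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            result[name] = min(hi, max(lo, float(value)))   # clamp to range
    return result
```

Falling back to defaults matters in an installation: a malformed or slow reply should leave the imagery running on its previous values rather than stall the patch.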
Learning resources
- Python integration (lecture.nakayasu.com)
- Claude integration (lecture.nakayasu.com)
7. Connecting to output and exhibition
This chapter surveys the entry points for taking TD imagery into physical space. Implementation details (coordinate alignment across multiple projectors, mapping procedures for solid primitives, production-grade DMX lighting, etc.) will be handled in a subsequent essay on physical-space implementation. Here we only sketch the map of choices.
7.1 Multi-display and projection
TD handles simultaneous output to multiple displays and projectors out of the box.
- Window COMP for output control: monitor index, resolution, window position
- Supports video I/O over NDI, SDI, HDMI, and DisplayPort, including capture and playback cards from Blackmagic, AJA, etc.
- A huge canvas split across multiple projectors (edge blending), independent imagery to multiple surfaces, 360-degree panoramic deployment, and similar are all possible
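The arithmetic behind splitting one canvas across overlapping projectors is short. This sketch assumes identical projectors arranged in a horizontal row and a linear blend ramp. Real edge blending also compensates for projector gamma, which is left out here, and `slice_starts` and `blend_mask` are hypothetical helper names.

```python
import numpy as np

def slice_starts(canvas_w, proj_w, count):
    """Left edge of each projector's slice, plus the overlap width in pixels.

    Assumes the overlap divides evenly; otherwise the last slice falls short.
    """
    if count == 1:
        return [0], 0
    overlap = (count * proj_w - canvas_w) // (count - 1)
    if overlap < 0:
        raise ValueError("projectors are too narrow to cover the canvas")
    step = proj_w - overlap
    return [i * step for i in range(count)], overlap

def blend_mask(proj_w, overlap, fade_left, fade_right):
    """Per-column brightness for one projector: 1 in the middle, ramps at seams."""
    mask = np.ones(proj_w)
    if overlap > 0:
        ramp = np.linspace(0.0, 1.0, overlap)
        if fade_left:
            mask[:overlap] = ramp
        if fade_right:
            mask[-overlap:] = ramp[::-1]
    return mask
```

For a 3600-pixel canvas on two 1920-pixel projectors, `slice_starts(3600, 1920, 2)` gives slices starting at 0 and 1680 with a 240-pixel overlap, and the two masks sum to full brightness across the seam.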
7.2 Projection mapping overview
Expression that pastes imagery onto physical space through a projector. TD’s strength is handling generation, mapping, and output in a single environment.
- Warp deformation: the basic technique of warping the image’s four corners to the shape of the projection surface. Palette components such as camSchnappr and kantanMapper are standard
- Detailed mapping: per-brick imagery on a brick wall, per-face imagery on a 3D object — fine-grained alignment. Implemented with Replicator COMP, Container COMP + Layout TOP, etc.
- Projection onto solid primitives: an assembly of white cubes, spheres, triangular prisms etc. mapped with imagery — a spatially compositional approach
- Photograph-first calibration: photograph the projection surface first and place imagery against that reference, simplifying on-site alignment
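Four-corner warping is a perspective transform (homography) determined by four point pairs. The NumPy sketch below solves for it directly. It illustrates the mathematics behind a corner-pin warp and makes no claim about how any particular TD component implements it; `homography` and `warp_point` are hypothetical names.

```python
import numpy as np

def homography(src, dst):
    """3x3 matrix mapping four source corners onto four destination corners.

    src, dst: four (x, y) pairs each, listed in the same order.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(matrix, x, y):
    """Apply the homography to one point (with the perspective divide)."""
    u, v, w = matrix @ np.array([x, y, 1.0])
    return u / w, v / w
```

With `src` as the corners of the unit square and `dst` as the measured corners of the projection surface, every point of the source image can be placed through `warp_point`. Photograph-first calibration amounts to reading `dst` off the reference photo.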
Reference artists and works
- Pablo Valbuena: augmented sculpture series
- Joanie Lemercier
- 1024 Architecture
- AntiVJ
7.3 DMX lighting control overview
You can drive lighting at the console level directly from TD.
- Supports DMX-over-Ethernet protocols including Art-Net and sACN
- DMX Out CHOP sends a value for each DMX channel, so TD can stand in for a lighting console
- Synchronize color, intensity, pan, and tilt of full-color LED fixtures (moving heads, LED bars, flood lights, etc.) with the imagery
- Unifying audio, image, and lighting into a single TD project gives you integrated control
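To show what travels over the network when DMX values are sent via Art-Net, here is a sketch that builds a single ArtDmx packet by hand. It follows the published packet layout (ID string, OpCode 0x5000, protocol version 14, sequence, physical, 15-bit port address, length, data). In TD the DMX Out CHOP does this for you, so `artdmx_packet` is only an illustrative helper.

```python
import struct

def artdmx_packet(universe, channels, sequence=0):
    """Build one ArtDmx packet carrying up to 512 channel values (0..255)."""
    data = bytes(channels)
    if len(data) % 2:
        data += b"\x00"                         # data length must be even
    if not 2 <= len(data) <= 512:
        raise ValueError("ArtDmx carries 2 to 512 channel bytes")
    return (
        b"Art-Net\x00"                          # fixed 8-byte ID
        + struct.pack("<H", 0x5000)             # OpCode ArtDmx, low byte first
        + struct.pack(">H", 14)                 # protocol version, high byte first
        + bytes([sequence & 0xFF, 0])           # sequence, physical port
        + struct.pack("<H", universe & 0x7FFF)  # SubUni (low), Net (high)
        + struct.pack(">H", len(data))          # data length, high byte first
        + data
    )
```

A packet such as `artdmx_packet(1, [255, 0, 128])` would be sent as a UDP datagram to port 6454 on the Art-Net node. For an RGB fixture patched at channels 1 to 3, those three bytes are its red, green, and blue levels.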
Use cases
- Coupling imagery and lighting in live performance
- Controlling atmosphere in galleries and exhibition spaces
- Integrated imagery-and-lighting staging in architectural environments
7.4 Implementation details deferred to a subsequent essay
To actually run the methods of §§7.1–7.3 in a real exhibition space, many site-specific judgments come up: coordinate alignment across projectors, lighting placement design, sync with audio, production-grade redundancy. These will be collected as “an implementation guide for physical-space staging” in a subsequent essay.
8. Learning resources and existing materials
The materials already published on lecture.nakayasu.com, organized along this essay’s chapter structure. Use them as the entry points for hands-on practice with each method.
- TouchDesigner basics: prerequisites for Chapter 2 and §3.1 — the operator families and wire connections
- Poster made with Touchdesigner TUTORIAL 001: §3.1 — poster production via filter chains
- Feedback: §3.2 — feedback structures
- Time Displacement: §3.3 — Time Displacement
- Slit Scan: §3.3 — Slit Scan
- Audio Reactive ver.2: §§4.1, 4.2 — Audio Reactive and Audio-Visual expression
- Optical Flow and ParticlesGPU: §5.1 — Optical Flow
- Azure Kinect integration: §5.3 — Azure Kinect
- Python integration: foundation for §§5.2, 5.4, and Chapter 6
- Claude integration: §6.2 — LLM integration
- MadMapper materials: §7.2 — a comparison reference for projection mapping
Major external resources
- Derivative official documentation: primary source for operators and Python
- Javier Casadidio YouTube channel: tutorials on filter chains and particle expression
- bileam tschepe (elekktronaut) YouTube channel: broad coverage from foundations to advanced
- Matthew Ragan’s articles (matthewragan.com): deep dives into Python and data-flow design
9. Closing
This essay has surveyed, across six domains, the expressions and methods that media programming in TouchDesigner makes possible: generative imagery; sound and image; sensing and interaction; real-time AI; programmatic integration; and connection to output and exhibition.
The framework Andrew Price laid out in Installment #1 — that what survives in the AI era is judgment, carried out by high-agency individuals — applies here too. In an era when generative-video AI spits out video in seconds, the meaning of working in a tool like TD — wiring up nodes, pulling in data, designing the correspondence with space, building the audience’s experience as a continuous series of judgments — has, if anything, become clearer.
Subsequent installments in this series are planned to cover:
- An implementation guide for physical-space staging, using the Tokyo Metropolitan University production studio as a case study (multi-projector wall and floor coverage, DMX lighting, mapping onto solid primitives, synchronization with AI music from Suno and similar)
- An essay on shooting technique: brain-play camerawork (compositional shooting in the manner of futa.729s), specialty shooting (microscope lenses, motion control with Edelkrone, Small Planet / Tilt-Shift work, light-trail photography), and slow-motion expression (the kind of time manipulation done by aaa_tsushi, plus the latest Premiere / After Effects workflows)
Across the series as a whole, the structure will go through the main lineages of moving-image production in order: AI generation (time-based, Installments #1–#3), media programming (real-time, this installment), and shooting technique (time-based, subsequent installments).