In my last post I talked about the decision process I used when exploring which video platform to use for a series of online office hours for ecommerce. In this post I focus on some of the underlying technologies. It was interesting to see how quickly the complexity can blow out. There are lots of interesting things you can do, but they don’t always all work together as you plan.
Input and Output Streams
It may be obvious to most, but it took me a moment to realize that while applications can read from the microphone or camera, and can send output to the speakers or screen, that does not mean you can pipe between them like a UNIX command. That is, on Mac and Windows you cannot just hook the audio/video output of program 1 to the input of program 2.
There are some third-party utilities around that make this possible, such as “Virtual Cable”. BlackHole and Soundflower are example applications on the Mac that can connect the audio output of one program to the microphone input of another, provided you can tell the applications to use different “speakers”. (BlackHole and the like appear alongside speakers and headphones as an output device. You select it, and the audio output goes to BlackHole. The second application then selects a microphone called BlackHole, from which it gets the audio stream.)
Similarly, there are programs that can do this with video, such as CamTwist on the Mac. They can create a virtual camera for other applications to use as an input video stream. This allows, for example, the output of CamTwist to be fed into the camera for Google Meet, Skype, etc. The application thinks it’s just using a different video camera.
There is also OBS Studio (and others) that can watch a part of the screen and capture it, turning it into a video feed. This is useful, for example, when streaming gameplay directly to YouTube, Twitch, etc. But they can also be used to generate a camera source.
I am not trying to exhaustively list such tools, just to point out that they exist. There are many options out there, and I would suggest reading some knowledgeable reviews before selecting one. Also be aware that cameras and microphones are separate devices, so you may need to connect up video and audio separately.
One example of wanting to stream audio through another application is to improve audio quality. Using applications such as Audacity (free) or Adobe Audition (paid) you can apply various audio processing techniques like filtering and noise reduction. This can improve the quality of the sound coming from your microphone (e.g. to remove fan noise, or adjust bass and treble levels). There is some skill to doing this, so it is better to do whatever you can to avoid the noise in the first place.
There are lots of instructions out there on how to do sound processing. For example, I watched a really interesting one on removing hum. It used a notch filter where you could control which frequency to amplify or silence. By setting the notch to amplify first, you could drag it up and down the frequency range, and when it boosted the hum you had found the frequency the hum was at. You then drag that point down so that, instead of amplifying, it silences that frequency. Once explained, it was pretty easy to do. There are lots of people out there sharing their knowledge like this if you look.
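To make the mechanism concrete, here is a minimal sketch of the kind of notch filter such tools apply: a biquad using the standard audio EQ cookbook coefficients. The hum frequency, sample rate and Q below are just illustrative values, not anything a specific tool uses.

```python
import math

def notch_coeffs(f0, fs, q):
    """Biquad notch filter coefficients (audio EQ cookbook form).

    f0: centre frequency to silence (Hz), fs: sample rate (Hz),
    q: quality factor (higher = narrower notch)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1.0, -2 * math.cos(w0), 1.0]
    a0 = 1 + alpha
    a = [a0, -2 * math.cos(w0), 1 - alpha]
    # Normalise so a[0] == 1.
    return [x / a0 for x in b], [x / a0 for x in a]

def biquad(samples, b, a):
    """Direct form I: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

# Kill a 50 Hz mains hum while leaving a 440 Hz tone essentially untouched.
fs = 8000
b, a = notch_coeffs(50, fs, q=2)
hum = [math.sin(2 * math.pi * 50 * n / fs) for n in range(fs)]
filtered = biquad(hum, b, a)
```

Dragging the notch around, as in the tutorial, corresponds to changing `f0` until the filter lands on the hum.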
As a hobby I play with Adobe Character Animator, animation software for simple cartoons. It’s a bit of fun. But it also introduced me to NDI. Character Animator on Windows can generate a live video stream in NDI format. Other tools can then convert NDI into a camera source (e.g. using the NDI utility “Virtual Input” from NewTek).
On the Mac there is a standard for video streaming called Syphon, but it only works between programs on the same machine. Character Animator 3.2 (at least) can output Syphon directly on the Mac, so running Character Animator and the video conferencing software on the same machine does not require NDI. If they run on different machines, then NDI can be used to stream the video between the two.
There is also a useful tool for the Mac called “NDI Syphon” which can listen to an NDI stream and convert it to a Syphon stream. The reason I explored Syphon was that CamTwist can read a Syphon stream and turn it into a virtual camera, allowing most camera-aware software to read the stream as if it were a physical camera. For example, it allows me to feed the Character Animator video stream into a camera to use on a call (Character Animator -> NDI Syphon -> CamTwist -> Skype).
Skype also supports NDI. On the Mac you can turn a flag on and generate a good quality video stream for other programs to read from. (I believe this produces better quality than using something like OBS to screen-capture the Skype window.) On Windows you need to make sure you get the “Skype for Content Creators” edition, as the default version does not include NDI support.
One of the nice (and painful) things about NDI is that it uses mDNS (multicast DNS, aka Bonjour on the Mac) to let NDI streams broadcast their existence, so other machines on the same network learn about them and automatically display them in lists of NDI sources. Just start an NDI Test Pattern application on your Windows laptop and programs on your Mac (on the same subnet) will see it appear (after a few seconds).
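Under the hood this discovery is just an mDNS service query; NDI advertises itself under the `_ndi._tcp.local` service type, as far as I can tell. As an illustrative sketch (assuming that service name), this is roughly what such a query packet looks like; the packet would be sent to multicast group 224.0.0.251 on UDP port 5353, and any NDI source on the subnet answers with its name, host and port:

```python
import struct

def mdns_ptr_query(service):
    """Build a one-question mDNS PTR query for a service type
    such as "_ndi._tcp.local"."""
    # DNS header: ID=0, flags=0, QDCOUNT=1 (mDNS queries use ID 0).
    header = struct.pack(">HHHHHH", 0, 0, 1, 0, 0, 0)
    # QNAME: each dot-separated label is length-prefixed, then a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in service.split(".")
    ) + b"\x00"
    # QTYPE 12 = PTR, QCLASS 1 = IN.
    return header + qname + struct.pack(">HH", 12, 1)

packet = mdns_ptr_query("_ndi._tcp.local")
print(packet.hex())
# To actually browse, send this to 224.0.0.251:5353 over UDP and
# parse the responses (not shown here).
```

Tools like the NDI sources list are essentially doing this continuously and caching the answers.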
If you need cross subnet support, you can still do it, but you start needing to manually register IP addresses of other hosts, or have a directory server permanently running. This can be useful for more serious video processing shops, like a TV station.
And this is where I hit a blocker, for longer than I care to remember, in trying to make this work for my office hours.
MacOS NDI Streams Not Discoverable !?#?#!
My problem was that I could start applications generating NDI streams on Windows and they would be visible on both my Mac and Windows laptops. However, when I created NDI streams on my Mac, it seemed to work one day, then a day or two later it stopped working.
One useful utility for debugging the problem was “Discovery”, which displays the contents of the local mDNS server. (The application is available for Windows and iOS as well.) Using it, I discovered that the Mac programs were creating an entry in mDNS, but the entry seemed to be empty! The Windows entries included the host and port from which to read the NDI stream.
Why the strange “AJK=MAC” hostname? That was a typo on my part. My work laptop had been given a long domain name, including labels like “roam”, that did not exist in any DNS server or in /etc/hosts — it was just a name given to my laptop. (The hostname was set to a fully qualified domain name, which is generally not recommended, so I tried shortening the domain name.)
My final solution, which started working, was to change the hostname to “ajk-mac.local” and then manually enter that name, with the IP address allocated by my wifi, into my /etc/hosts file. (There might be a better option, but this worked for me.)
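For reference, this is roughly what that fix looks like on the Mac. The hostname and IP address here are my example values; substitute your own.

```shell
# Set a short .local hostname (the name that gets advertised):
sudo scutil --set HostName ajk-mac.local

# Then add a line to /etc/hosts mapping that name to the IP address
# your wifi allocated (192.168.1.20 is just an example):
# 192.168.1.20   ajk-mac.local
```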
My theory is that the code tried to resolve the host/domain name into an IP address but failed. As a result the mDNS entry was incomplete, and sources on my Mac were not discoverable.
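If you suspect the same issue, a quick sanity check is whether your machine’s own hostname actually resolves. This small Python snippet is just a sketch of that idea:

```python
import socket

def hostname_resolves(name):
    """Return True if `name` can be resolved to an IP address from this machine."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False

# If this prints False for your own hostname, anything that resolves the
# local hostname (my theory for NDI above) may publish an incomplete entry.
print(hostname_resolves(socket.gethostname()))
```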
Of course using all these tools adds no CPU overhead or lag to your streams.
Scratch that!!! It can actually be very painful using these programs: everything almost works, except the audio and video no longer quite line up. This is where OBS Studio has extra power. You can insert delays into the different streams to line them up again. This can be acceptable on a live stream where people are just watching you, but it is less acceptable on a video conference call where multiple people are talking and interacting. The lag can be annoying.
For example, many audio effects need to buffer up some sound in order to analyze it for frequencies. Each program the content streams through reads a buffer at a time, processes it, then passes the buffer on. The time taken to fill each buffer creates latency.
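The arithmetic is simple: one buffer of latency is the buffer length divided by the sample rate, and each buffered stage in the chain adds at least that much. A quick illustration (the buffer sizes and the 48 kHz rate are just typical values, not specific to any tool):

```python
def buffer_latency_ms(buffer_samples, sample_rate):
    """Time in milliseconds to fill one audio buffer."""
    return buffer_samples / sample_rate * 1000

# Each buffered stage adds at least one buffer of delay, so three
# chained stages with 1024-sample buffers at 48 kHz add roughly 64 ms.
for size in (256, 1024, 2048):
    print(f"{size} samples at 48 kHz = {buffer_latency_ms(size, 48000):.1f} ms")
```

This is why every extra hop in the chain makes lip sync a little worse.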
My personal experience so far is to stay away from audio processing. It is much better to avoid noise in the first place than try to remove it with audio processing software (if you are live). This is different if you are editing video where it is fine to post process the audio track to clean it up.
The other problem with audio processing is that utilities like Ecamm Live have only basic audio controls (volume), so you either have to accept a processing delay in the audio track (causing it to fall out of sync with the video) or you need to use something like OBS Studio, which is harder to use but gives you the control to resync things.
And then there is the good old problem of the more moving parts the more likely something will break. So keeping it simple is good.
My Final Setup
I am trying to avoid OBS Studio, as some of the paid products are easier to use and I don’t want to have to be an expert just to run a session. That means I am looking for a microphone that does not pick up the fan noise in the first place: using a headset or similar where the mic is close to my mouth.
My first setup then was to just use Google Meet to record the session. No special tricks.
If I want to include an animated character superimposed over the video later, on Windows I would probably use Character Animator via NDI into OBS Studio (with the NDI plugin), so OBS Studio can superimpose my camera and the Character Animator video stream, plus delay the audio/video streams as required to get them to line up nicely. Then take the output and feed it into a virtual camera (which OBS supports). There would be lag, but that may be unavoidable simply because Character Animator is in the chain.
On the Mac I would do the same, except NDI could probably be dropped by just using Syphon instead (no network connection required).
My second setup, which I am considering moving to, uses Ecamm Live. (There are other similar products that I did not evaluate, so this is not an endorsement of Ecamm Live over other products.) I plan to use Skype to connect to the invited guest, then use Ecamm Live to merge the Skype video stream (made available via NDI) and my webcam into a single stream to send to YouTube. This appears to work, but I want to try it out for a few weeks to make sure the approach is robust.
It is very interesting to see what is possible on Mac and Windows for streaming video content. In this post I summarized a few of the more commonly used technologies. My goal is to minimize the number of hops in processing, as each hop can introduce lag. But you can do some quite funky things by chaining the different tools together.