VR performance stuff
Today was the second day of VRelium Enchanted, where I performed and had a really good time. I’m going to talk a bit about how VRChat performances work, and share some thoughts about my most recent one in particular.
UPDATE: This is obsolete! It might still be useful for some folks (especially those working on a budget or trying to get ideas of how to cobble something together), but my current setup is completely different and much easier to work with.
VR venues
Performing live music in VR has a very different set of considerations from real life. It’s very difficult to get an ensemble together due to lag between people1, and the way that audio is usually sent to the world adds even more lag and makes it impossible for two performers to both share audio into the world.
There are two major styles of performance space: mic boosted, and streamed. Some worlds, such as Transitions Club (which I’ve used for most of my solo shows, such as the recent Radio Free Fedi Fest) support both modes of operation, but most I’ve encountered only support streaming.
Mic boosting
Mic boosting is typically used by open mics or other situations where a lot of performers are playing just one or two songs; for example, Trans Academy does this for the short Moonlit performances (but not for DJ sets or full concerts), and most open mics do this as well. The setup is pretty simple and easy; from the performer’s point of view2, there’s a special region in the world which amplifies the volume and/or decreases the attenuation of anyone standing in it, so anything they say over their microphone gets put in everyone’s ears.
This has a number of advantages:
- It’s super easy for people to set up since there’s plenty of ways of getting arbitrary audio fed into the VRChat microphone input
- The performer can also simultaneously stream to the outside world including audience noise/reactions
- Interactions with the audience are more or less immediate
- A show can be put on by a single person with no support team
But it has some pretty hefty limitations:
- People in the audience generally can’t change the volume level of the performance
- VRChat’s own audio transport is pretty low-quality and is meant for real-time speech, not for music (for example, this is what it sounds like on my performance in-world, compared to the quality of my local signal, and the more complex the music, the worse it gets). UPDATE: This has actually changed; as of December 2025 they’ve vastly improved the mic audio by switching to Steam Audio.
- You can only get audio through, rather than any extra visuals
- Doing more complex things with the audio gets incredibly finicky and error-prone
- The audience size is limited to however many can fit in a single world instance
Streaming
A streaming approach is much more commonly used for music festivals such as Furality, VRelium, CMFS, and so on. In this setup, the performer is sending their audio and visuals through streaming software (typically OBS) to a streaming provider. VRCDN is very popular but any number of things work, including Twitch, YouTube Live, and owncast, the last of which is what I use for my independent shows3.
Streaming has a bunch of advantages over mic boosting:
- The audio quality is way better
- There’s much better control over the audio signal, and in particular since VRChat always uses its view of the mic input as the lipsync source, being able to separate your vocal audio from your other instruments makes for a cleaner performance
- The performer can provide visuals that will appear in the world in some way, usually on a large projection screen behind the stage, and some worlds are built to support multi-screen visuals (by mapping different parts of the screen image to different surfaces in the world)
- The audience can be way larger since multiple instances can reference the same video stream (and some venues can also do holographic projection of performers between instances, which is also done using clever video and shader tricks)
- You also automatically have a stream that can be viewed from outside VRChat or recorded for later editing
- Various compositing proxies can be used to add even more visuals to the outside world stream/recording (for example, having multiple camera operators streaming to video sources that are then assembled by someone else); for example it’s common for music festivals to map the performer’s visuals to one screen while providing audio visualizations for the audience and a countdown timer for the performer
But it also has disadvantages:
- You generally need a support team managing the stream (it’s not super feasible for solo shows — it’s possible and I’ve done it, but it’s much more of a hassle)
- There’s a lot of lag between the world and the stream (usually on the order of 5-15 seconds), so interactions with the audience are confusing or on a delay, and people who are looking directly at the performer (rather than the screen) will have extremely bad lip sync4
- Because the performer is streaming audio and video to the world, extra considerations are necessary for a livestream that includes audience reactions
- If the venue is small and doesn’t have a means of muting on-stage performers, the audience may hear doubled audio with considerable delay between them. UPDATE: Along with the improvements to mic audio, VRChat now allows you to set your microphone output level to 0%, which has the effect of preserving lip sync without broadcasting any audio into the instance.
Due to these disadvantages, and since my solo shows tend to be pretty small, if I’m going solo it’s usually using a mic boost.
A typical setup
Most people who perform in VR are exclusively using backing tracks (often ones obtained from karaoke videos, if they’re doing covers of songs), and use Voicemeeter, a virtual mixer that combines multiple audio inputs into a single virtual audio source.
For a mic-boosted performance, they’ll be using Voicemeeter to combine the backing track with their microphone, and also monitor the audio back to their own ears.
They may also be doing it that way for a streamed performance, although for streaming the better option is to use OBS’s audio mixing and monitoring for the backing track.
My typical setup
Because I am a masochist coming from the tradition of conventional live performances, I have a tendency to want to perform as much live as possible, and have previously never used a backing track5.
Typically I have Voicemeeter capturing my vocal microphone (which is usually the mic built-in to my Bigscreen Beyond) and my computer’s onboard line input, which I then have various amps/pedals/mixers connected to for my performance instruments; my typical loadout is:
- My guitar
- A cheap reverb pedal from AliExpress
- A cheap distortion pedal, also from AliExpress
- A Boss RC-20 looper pedal
- A dynamic mic, for percussion and such going through the looper
- A small lunchbox amplifier, which is a simple way of amplifying that mess to line-level and giving me basic monitoring
I then also have VRChat’s voice input set to Voicemeeter.
When I’m performing, regardless of whether it’s mic-boost or streamed, I have a bunch of OBS scenes with different setups for different purposes, but generally-speaking my scene setup is:
- A Spout2 capture source (which allows OBS to capture VRChat’s camera output directly rather than having to make a round trip through the display compositor)
- Audio sources for Voicemeeter and then direct inputs for my headset mic, the computer line input, and a music player source for backing tracks
- Also an audio output capture for my headset speakers (to capture in-game audio)
- Whatever visuals I want to overlay on the stream (Waveform is especially useful)
I make heavy use of OBS’s multichannel audio functionality for this. I set channel 1 to be whatever I want to go to the stream. What goes on here depends on the kind of performance I’m doing:
- If I’m doing a mic boosted performance, it’ll typically be my performance audio (either Voicemeeter, or a mix of mic+line input) mixed with the in-game audio so the livestream audience can hear the in-game audience as well
- If I’m doing a streamed performance, it’ll be just my performance audio as a mix (usually the raw inputs, not Voicemeeter, as Voicemeeter adds a bit of lag that throws the timing of the backing track somewhat off)
Regardless of this, I also record each separate thing to separate audio channels:
- Channel 2: vocal mic
- Channel 3: line input
- Channel 4: backing track
- Channel 5: game audio
- Channel 6: Voicemeeter
I rarely actually use the Voicemeeter channel anymore (since I switched to directly mixing the live inputs to the stream) but I might as well keep it around just in case (although keep in mind that using it will require latency compensation). It’s been helpful a few times; for example, my OBS settings once got messed up and I didn’t end up recording my raw instrument channel.
Whether I’m using Voicemeeter or my raw microphone as my vocal input affects what sort of lipsync issues I have in the stream and/or recorded video. So I try to keep them the same as each other, but it’s easy to forget, and it’s not a huge deal to fix since it’s usually only offset by 1-2 frames (and VRChat’s lip sync isn’t all that great anyway).
My usual performances
So, when I perform, I’m typically playing guitar or piano and singing, and occasionally making use of my effect pedals (especially the looper). Because I need to be able to see my guitar’s fretboard and/or piano’s keys while I perform, I built a custom headset gasket (based on the Slimterface, modified with a big cutout on the bottom, which I really should get around to uploading the files for at some point, oh here we go).
I still haven’t figured out a good way of tracking my hand movements while playing guitar in-headset. I’ve tried a bunch of things and so far the best results I’ve gotten have been with a Leapmotion 1 and Leapify, but it’s still been pretty inconsistent and fiddly (but I have a few more things to try before I give up entirely). But for now, when I’m playing guitar, I just signify it by swapping to a version of my avatar that’s wearing a guitar and then people can just see my arms hang limply by my sides.
Also, because VRChat doesn’t really have any way of bringing printouts into the world with you, to keep my setlist at the ready I open it up in a text editor or similar (usually synced with iCloud Notes or SyncThing or whatever) and then attach the window to my VR playspace from the SteamVR overlay. This can be a little confusing, since it blocks a large part of my field of view and audience members can’t tell that they’re behind it (since it only exists locally and not in VRChat itself).
Since I have VRC+ I could also theoretically make an image of my setlist and print it into the world, but that makes it a manipulable object that people in the audience could steal and mess with, so I haven’t ever actually done that (also music festivals tend to disable or force-remove prints for performance reasons).
There’s probably some clever thing I could do with OSC to provide myself a local avatar HUD or something, but the SteamVR overlay works well enough for me.
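For the record, the OSC route isn’t much code: VRChat listens for OSC messages on UDP port 9000 and exposes avatar parameters under the /avatar/parameters/ address space. Here’s a minimal stdlib-only sketch of sending an int parameter; `SetlistPage` is a hypothetical parameter name I made up, and you’d need an avatar that actually defines something like it (and a HUD prop driven by it) for this to do anything.

```python
import socket
import struct

def osc_message(address: str, value: int) -> bytes:
    """Pack a minimal OSC message carrying a single int32 argument."""
    def pad(b: bytes) -> bytes:
        # OSC strings are null-terminated and padded to a 4-byte boundary
        b += b"\x00"
        return b + b"\x00" * (-len(b) % 4)
    return pad(address.encode()) + pad(b",i") + struct.pack(">i", value)

def send_to_vrchat(param: str, value: int, host: str = "127.0.0.1") -> None:
    """Fire an avatar parameter at VRChat's OSC input port (UDP 9000)."""
    packet = osc_message(f"/avatar/parameters/{param}", value)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(packet, (host, 9000))
    sock.close()

# e.g. send_to_vrchat("SetlistPage", 3)  # "SetlistPage" is hypothetical
```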
What I did tonight
I have a chronic pain disorder which flares up sometimes. On the days leading up to this weekend it was flaring up pretty badly, so I figured I’d play it safe and use backing tracks for once. Since all my music is original, I bounced out vocal-free versions of the songs I was going to perform, then combined them into a single file with a suitable gap between each song. I did leave some time at the end for an acoustic performance of Strategies to Live, because I’m not at all happy with the quality of the album recording, and I also wanted a bit more flexibility in how I performed it.
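The bounce-and-combine step is easy to script, too. This is a sketch using only Python’s standard-library wave module (I actually did mine in my DAW); it assumes every input WAV shares the same channel count, sample width, and rate, and `concat_with_gaps` is just a name for illustration.

```python
import wave

def concat_with_gaps(paths, gap_seconds, out_path):
    """Join WAV tracks into one file, inserting gap_seconds of silence
    between consecutive tracks. Assumes every input shares the same
    channel count, sample width, and sample rate."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    # One frame of silence is nchannels * sampwidth zero bytes
    gap = b"\x00" * (int(params.framerate * gap_seconds)
                     * params.nchannels * params.sampwidth)
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for i, path in enumerate(paths):
            if i:
                out.writeframes(gap)
            with wave.open(path, "rb") as track:
                out.writeframes(track.readframes(track.getnframes()))
```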
It turned out that my pain flareup had subsided by performance time, but I decided to go with the backing track anyway, as a bit of an experiment.
My overarching philosophy is that the live version of a song should be special, and a reinterpretation of the album version. Using the album recording as a backing track flies in the face of that. However, this only really makes sense in a context where the audience already knows my music; generally when I’m performing at a larger show, most of the audience is not familiar with my music, and as a result doesn’t actually know that there’s anything special about an acoustic guitar-folk performance of a sex-positive hip-hop song or a 90s alt-rock song or a hardstyle dancepop track.
Further, genre bias is a thing, and when people get the impression that all of my music is acoustic indie rock, if they aren’t into acoustic indie rock they have no reason to check my music out to hear how it really is.
So, at least for this large show I decided it’d be interesting to see how the audience reacts to the songs as they originally were, rather than the “special” renditions I do as a treat. And judging by the audience reaction, it went over pretty well! It was also nice having my hands free, so I could actually move around and dance as I performed, and also could do appropriate avatar switches during some songs (particularly demonstrating some of the titular material changes in Material Change).
But I also ended up realizing that for my live performances, I put a lot of muscle memory into things, and there were a few spots where I came close to forgetting how the words went because I needed the shape of the chords I was making to cue me on them.
A baked backing track had another nice bonus in that I didn’t need to display my setlist, because I could just use audio cues to know which song I was performing. But on the downside, some of the intra-song timings weren’t quite right. I have a particular transition I like to do from You’re Never Around to Better Than Before, and while I worked embarrassingly long on trying to nail the timing in my backing track, I still feel like it didn’t line up with how I wanted it to. It’s probably a thing that only I would ever notice, though.
This also meant I was basically on-rails with no ability to adjust timing based on mood or tripping up on things or whatever. This was, ultimately, more good than bad, since it meant my set length was timed out perfectly (aside from Strategies to Live, which I was able to perform a bit faster than usual to make up for some previous schedule slippage), and also it kept me from rushing my songs, which in turn made tripping up much less likely.
In most of my festival sets I’ve ended up doing one big flub and just powering through it, and tonight I only did a couple of minor flubs which nobody but me would have noticed anyway (even if they were super familiar with my songs which, let’s face it, nobody else is).
I also loved being able to be way more expressive. My avatar is set up to allow me to puppet the eye and mouth shapes through hand gestures, which is pretty common, and I designed my puppetry setup specifically around doing karaoke, which this was just sort of an ascended version of anyway. Having the ability to make hand gestures at all is a huge improvement.
(That said, what’d be even better is getting proper eye and mouth tracking support so I don’t have to do puppetry to begin with, but I’m not ready to spend that kind of money right now.)
Even without the puppetry aspect, friends and fans who arrived in the audience would wave at me, and I was able to wave back, which made a couple of them absolutely delighted!
So, yeah, it gave me a much better audience connection, much better expressiveness in general, a much less stressful performance experience, and me just plain sounding better.
But in doing so I also gave up an amount of spontaneity, and also a feeling that I was actually performing and not just doing, y'know, karaoke. Which just feels weird to me. It feels like I’m lowering my standards, but the audience doesn’t care at all, and if anything the audience seems to prefer the “karaoke” form.
Future plans
I think I’ll go with something like this:
- For small shows (like Moonlit) and solo shows (where my existing fans will be the majority of the audience) I’ll continue doing things mostly live on instruments, although I’ll probably start to incorporate more backing tracks in as well to expand my repertoire, since many of my songs aren’t super feasible to have a guitar rendition of (I’d love to be able to do a live version of A Long Plastic Hallway, for example)
- For larger shows I’ll continue to do a mix of studio-version backing tracks and acoustic versions as appropriate
- For major shows (like if I ever get accepted into Furality) I’ll possibly make some custom backing tracks just for the show
Also, I can still be proud that I’m using my own backing tracks for my own songs. Not that there’s anything wrong with people using existing karaoke versions for their covers, and obviously audiences don’t mind it at all (and are just there to hear amazing singers singing songs they like). And who knows, if I keep uploading karaoke tracks, maybe someday I’ll hear someone doing a cover of one of my songs at one of these things.