Voice Recognition For Unreal Engine, Our Big Hurdle


We know our game is probably not as sleek as some of the other entries in the most recent UE4 Game Jam, but our goal was to make something unique while pushing the limits of the engine.  So far we have succeeded, but only when running the non-packaged build.  Our next goal is a version that can be packaged and played on Windows, and there are a couple of key hurdles we'll be quietly working through to get to a playable release.

Generalized Voice Recognition in Unreal Engine

After the Jam we cloned a fresh copy of our game and removed Python from the equation.  The first step was working with the latest audio plugin for UE, which includes the "Audio Capture" component.  At this time documentation on the audio capture component is still sparse, but after much tinkering we were finally able to reliably capture voice samples as a WAV file entirely through Blueprints.
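
For anyone walking the same path, a quick way to confirm what the capture actually produced is to read the WAV header back outside the engine.  This is just a small sanity-check sketch in Python; the file path is a placeholder for wherever your project writes its recordings.

```python
# Sanity check: print the format of a WAV captured in UE.
# The path below is a placeholder; point it at wherever your
# project writes its recorded files.
import wave

def describe_wav(path):
    """Print channel count, sample rate, bit depth, and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
        bits = wav.getsampwidth() * 8
        print(f"{path}: {channels} channel(s), {rate} Hz, {bits}-bit, "
              f"{frames / rate:.2f} s")

if __name__ == "__main__":
    describe_wav("Saved/BouncedWavFiles/voice_capture.wav")  # placeholder path
```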

This brought about a new challenge: we want to use DeepSpeech for voice recognition, and it requires the captured WAV to be mono with a 16 kHz (16,000 Hz) sample rate.  At this stage we are still working with the new submix effect and sound bus system to make sure the submix recording is saved in that format.  If that doesn't work, our other option is to modify the plugin's source code so it saves the format we need.
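
A third option would be to fix the file after the fact, outside the engine.  Below is a rough fallback sketch in Python, assuming the recording comes out as 16-bit PCM (stereo at 48 kHz is a typical engine output); it uses only the standard library's wave and audioop modules, and the file names are placeholders.

```python
# Fallback: downmix a captured WAV to mono and resample it to 16 kHz
# so DeepSpeech will accept it. Assumes 16-bit PCM input; file names
# are placeholders.
import audioop
import wave

TARGET_RATE = 16000  # DeepSpeech expects mono 16 kHz audio

def to_mono_16k(src_path, dst_path):
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        width = src.getsampwidth()      # bytes per sample (2 for 16-bit PCM)
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    if channels == 2:
        # Average left and right into a single channel.
        frames = audioop.tomono(frames, width, 0.5, 0.5)
    if rate != TARGET_RATE:
        # Resample to the target rate; the conversion state is not needed here.
        frames, _ = audioop.ratecv(frames, width, 1, rate, TARGET_RATE, None)

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(width)
        dst.setframerate(TARGET_RATE)
        dst.writeframes(frames)

if __name__ == "__main__":
    to_mono_16k("voice_capture.wav", "voice_capture_16k.wav")
```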

Voice interaction is becoming more and more common, and people have been asking about this kind of integration on the UE forums for the past five years or so.  The new audio plugin actually works very well for capturing the voice data we need (once we figured out how to do it without ear-piercing feedback), but it doesn't have clear options or documentation for controlling the format of a recorded submix.

This brings us to our next biggest challenge.  Now that we can capture voice data natively in UE, we need to send it to some kind of speech recognition service.  Our original Game Jam concept relied on Wit.ai, but its real-world accuracy in our testing was...  not inspiring.

Introducing DeepSpeech, an open source, general-purpose speech recognition engine maintained by Mozilla.  Yeah, that Mozilla, and given their track record on privacy we felt it was a great option, even if it is still a young project.  Rather than trying to integrate DeepSpeech into UE itself, we opted to set up a DeepSpeech server that handles speech-to-text requests sent over HTTP (the kind of request you could make with cURL).  On the game side we are using the VaRest plugin, and at this point we are close to handling the whole STT round trip through UE Blueprints.
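
To give a sense of how small the server side can stay, here is a minimal sketch, assuming the DeepSpeech 0.7-era Python API and Flask; the /stt route, the "audio" field name, and the model filename are placeholder choices of ours, not anything DeepSpeech or VaRest dictates.

```python
# Minimal speech-to-text endpoint sketch around Mozilla DeepSpeech.
# Assumes the DeepSpeech 0.7-era Python API; route, field name, and
# model filename are placeholders.
import io
import wave

import numpy as np
from deepspeech import Model
from flask import Flask, jsonify, request

MODEL_PATH = "deepspeech-0.7.4-models.pbmm"  # placeholder: use your model file

app = Flask(__name__)
model = Model(MODEL_PATH)

@app.route("/stt", methods=["POST"])
def stt():
    # Accept the WAV either as an uploaded "audio" file or as the raw request body.
    if "audio" in request.files:
        data = request.files["audio"].read()
    else:
        data = request.get_data()
    with wave.open(io.BytesIO(data), "rb") as wav:
        # DeepSpeech expects mono 16 kHz 16-bit samples as an int16 buffer.
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)
    return jsonify({"text": model.stt(audio)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

From the game, the VaRest request just POSTs the mono 16 kHz WAV to that endpoint and reads the text field out of the JSON reply; the same endpoint is easy to smoke-test with cURL before wiring up the Blueprints.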

We feel that our process of developing Hill Top Trivia highlights how difficult it still is to integrate voice interaction with Unreal Engine.  Obviously not every game needs it, but for certain games and other types of applications (a digital kiosk avatar, a training app, etc.), easy-to-use tools for this would probably go over very well with the community.

Hopefully the soon-to-be-released toolkit for voice interaction in Unreal Engine from PhonalAI will make our work so far look trivial.  Either way, our team gained a much deeper working knowledge of the UE audio system by implementing the functionality natively through Blueprints, as well as by testing Unreal.js and UnrealEnginePython versions of our core voice interaction mechanics.

Stay tuned as we move towards a fully playable version of Hill Top Trivia - powered by your voice!
