Let's face it: talking to a computer is an old dream. The thing is, the technology is here today. We can dictate, and the computer can write down our musings (it would be so much better if I were using one of those programs right now). OK, so it does take some time to set up, and I'm sure there's some overhead.
But let's not fixate on a full recognition package; we don't need all that for an MMO. Instead, we need a fairly short list of commands, as few as 20 and certainly under 200. How do I come up with that range? Simple: 20 words should be plenty for controlling a pet, and 200 words would leave room for 180 spells (or any other verbal abilities) on top of the 20 pet commands. If you had the time and resources, you could make the list more dynamic: each player could have an individual list (saved client side) that draws from a larger pool. At that point you can shrink the client's active list way down again, because it is unlikely any class would use more than 75 words.
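To make the "personal list drawn from a larger pool" idea concrete, here is a minimal Python sketch. The names MASTER_POOL, ACTIVE_LIMIT, and ActiveVocabulary are hypothetical, the pool is cut down to ten words, and the 75-word cap is just the estimate from the paragraph above. Nothing here does actual speech recognition; it only manages which words the recognizer would have to listen for.

```python
import json

# Assumed cap, taken from the estimate that no class needs more than ~75 words.
ACTIVE_LIMIT = 75

# Hypothetical shared pool. A real game would hold every pet command and
# verbal ability here (possibly a few hundred words); only a handful shown.
MASTER_POOL = frozenset({
    "attack", "flank", "follow", "stay", "guard",        # pet commands
    "fireball", "frostbolt", "heal", "shield", "blink",  # verbal abilities
})


class ActiveVocabulary:
    """A player's personal word list, drawn from MASTER_POOL and saved client side."""

    def __init__(self, limit: int = ACTIVE_LIMIT) -> None:
        self.limit = limit
        self.words: list[str] = []

    def add(self, word: str) -> bool:
        """Add a pool word to the active list; return False if it is unknown,
        already present, or the list is full."""
        word = word.lower()
        if word not in MASTER_POOL or word in self.words:
            return False
        if len(self.words) >= self.limit:
            return False
        self.words.append(word)
        return True

    def to_json(self) -> str:
        """Serialize the list so the client can save it locally."""
        return json.dumps(self.words)

    @classmethod
    def from_json(cls, data: str, limit: int = ACTIVE_LIMIT) -> "ActiveVocabulary":
        """Rebuild a saved list, re-checking every word against the pool."""
        vocab = cls(limit)
        for word in json.loads(data):
            vocab.add(word)
        return vocab
```

The point of the cap is that the recognizer only ever has to compare speech against `vocab.words`, not against the whole pool, which keeps the matching problem small no matter how many abilities the game eventually ships.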
Under this scheme, the 20 pet commands would use simple recognition; that is, the computer would only need to know the word and execute it ("attack", "flank"). Spell casting (and any other vocal skills) would go one step further and use the match percentage to determine the effect. The game would have an "ideal" file to match against, and the closer you get, the better. Low matches would be very likely to be resisted and high matches more likely to crit; other penalties and rewards could be layered on as well (changes to mana cost or cooldown length come to mind).
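Here is a rough Python sketch of the two tiers. It assumes a hypothetical recognizer that hands the game either a recognized word (for pet commands) or a match score between 0.0 and 1.0 against the spell's "ideal" file. PET_COMMANDS, SpellOutcome, handle_pet_command, resolve_spell, and every threshold and multiplier below are made up for illustration; they are not taken from any real speech library.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical pet command set (the post suggests about 20; five shown).
PET_COMMANDS = frozenset({"attack", "flank", "follow", "stay", "guard"})


@dataclass(frozen=True)
class SpellOutcome:
    resisted: bool
    crit: bool
    mana_multiplier: float      # applied to the spell's base mana cost
    cooldown_multiplier: float  # applied to the spell's base cooldown


def handle_pet_command(word: str) -> Optional[str]:
    """Tier 1: plain recognition. The word is either a known command or ignored."""
    word = word.lower()
    return word if word in PET_COMMANDS else None


def resolve_spell(match: float, rng: random.Random) -> SpellOutcome:
    """Tier 2: turn the match score (0.0 to 1.0 against the 'ideal' file)
    into an outcome. All thresholds are illustrative guesses."""
    if not 0.0 <= match <= 1.0:
        raise ValueError("match must be between 0.0 and 1.0")
    # Resist chance falls from 100% at a 0.0 match to 0% at 0.5 and above.
    resist_chance = max(0.0, 0.5 - match) * 2.0
    # Crit chance rises from 0% at 0.8 and below to about 50% at a perfect 1.0.
    crit_chance = max(0.0, match - 0.8) * 2.5
    resisted = rng.random() < resist_chance
    crit = not resisted and rng.random() < crit_chance
    return SpellOutcome(
        resisted=resisted,
        crit=crit,
        # Sloppy casts cost up to 50% more mana and 25% more cooldown.
        mana_multiplier=1.5 - 0.5 * match,
        cooldown_multiplier=1.25 - 0.25 * match,
    )
```

Passing in a `random.Random` instance keeps the dice rolls reproducible for testing. The curves themselves are the easy part to tune; the hard part would be getting a match score players trust as fair.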
Now, as selling points: it's unique and fairly intuitive. The user only has to buy a mic, which is far cheaper than any other "non-standard" input device (and something most people already have anyway). It's something of a guess, but with a short list to match against, I can't imagine it would be all that CPU hungry. It should also add to the immersion. And there is already a lot of speech-recognition code out there to build on, which should cut development time and costs.
The biggest question: would people find it fun and play it?
