We have entered our final week of the Kickstarter campaign and thanks to everyone we're very close to our goal. With your help we can make it over the finish line. So let's get the word out one more time!
Today I’d like to tell you more about server architecture in massively multiplayer online games. Many MMO companies keep this knowledge to themselves, mostly because developing such games is a very expensive enterprise, and why should anyone get that knowledge without paying the price they had to pay? I can understand that, but I think the knowledge actually worth protecting lies in all the small details that make a game unique, for example the awesome massive battles in Planetside 2, which are impressive from a technical point of view. So this article is directed at those who consider creating an MMO one day, or those who have wondered how it all comes together. Everything I will tell you comes from actual experience, not theory.
At the beginning, a developer must take a look at what the game is trying to accomplish. Is it a game with point-and-click combat, or does the player aim with the mouse and shoot without auto-targeting? Is it an RTS-type game where the action is a series of commands and events? All this defines what our server architecture needs to be able to do: the more data that must be synchronized between client and server, the higher the load on the server will be. I will focus on a game like Face of Mankind, with twitch-based combat, lots of activity and epic battles.
So, an MMO, or any online game for that matter, consists of the following basic elements: a server and many clients. But you knew that much. Let’s put it into a simple diagram.
Of course, many important pieces are missing from this setup. I don’t want to focus too much on the IT side, so I won’t cover firewalls, DoS protection or backup solutions. First, you will need a dedicated database server, which protects you in case of hardware failure and also increases your system’s scalability. Next, we should have a login server and another server that handles updates and patching. Some might argue that a dedicated login server is unnecessary overhead, but the way I see it, a login is a process that is perfect to “outsource”. With a dedicated server, many logins can be processed in parallel without affecting the performance of the other servers, making it a great shield during high-load situations (launch, start of open beta). Another advantage is that logins can easily be prioritized if too many players try to log in at the same time.
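To make the prioritization idea concrete, here is a minimal sketch of a prioritized login queue. Everything here is illustrative (the class, the priority values, the account names are my own assumptions, not code from any real game): the point is simply that a heap lets the login server pick the most important pending login first while preserving fairness within a priority level.

```python
import heapq

# Minimal sketch of a prioritized login queue; lower priority number
# means "served first". Names and values are illustrative assumptions.
class LoginQueue:
    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps FIFO order within a priority

    def enqueue(self, account, priority=10):
        # e.g. returning players or subscribers could get a lower number
        heapq.heappush(self._heap, (priority, self._counter, account))
        self._counter += 1

    def next_login(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

queue = LoginQueue()
queue.enqueue("new_player")              # default priority
queue.enqueue("vip_player", priority=1)  # jumps the queue
queue.enqueue("another_player")
print(queue.next_login())  # vip_player is processed first
```

Under a login storm, a queue like this also gives you a natural place to cap throughput, so the rest of the cluster never sees more logins per second than it can handle.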
Each arrow signifies a connection and data flow between the different elements. In the case of the connection between login server and game server the data flow is one-way.
So, how does this simple setup work?
First, clients connect to the patch server to make sure their applications are up to date.
Then they connect to the login server, where their credentials are checked against the database. Upon success, it asks the Game Server whether a gaming session is still open. If yes, the client is told to try again later while the Game Server cleans up the old session. If no, a session is established and the Game Server is told to accept connections from that client. The connection between Client and Login Server can then be terminated, as it is no longer needed. From then on, data is synchronized between Client and Game Server, and the Game Server takes care of persistence (keeping the database up to date). Please note that the “servers” in the boxes are not necessarily dedicated physical machines, but server applications. In fact, under low-load conditions it makes a lot of sense to run all of these applications on a single machine to save money. If the load increases, you can move them onto dedicated machines. Don’t make the mistake of over-allocating resources at the beginning of development that you don’t need.
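The login handshake above can be sketched in a few lines. The database and game server here are in-memory stand-ins with made-up method names; a real login server would of course reach both over the network.

```python
# Hedged sketch of the login flow: check credentials, then ask the
# Game Server about an existing session. FakeDB and FakeGameServer are
# illustrative stand-ins, not a real API.
class FakeDB:
    def __init__(self, accounts):
        self.accounts = accounts  # {username: password}

    def check_credentials(self, user, password):
        return user if self.accounts.get(user) == password else None

class FakeGameServer:
    def __init__(self):
        self.sessions = set()

    def has_open_session(self, account):
        return account in self.sessions

    def cleanup_session(self, account):
        self.sessions.discard(account)

    def accept_connection(self, account):
        self.sessions.add(account)

def handle_login(user, password, db, game):
    account = db.check_credentials(user, password)
    if account is None:
        return "REJECTED"
    if game.has_open_session(account):
        game.cleanup_session(account)   # stale session: clean it up,
        return "RETRY_LATER"            # client tries again shortly
    game.accept_connection(account)     # session established
    return "OK"

db = FakeDB({"alice": "secret"})
game = FakeGameServer()
print(handle_login("alice", "wrong", db, game))   # REJECTED
print(handle_login("alice", "secret", db, game))  # OK
print(handle_login("alice", "secret", db, game))  # RETRY_LATER (old session)
```

Note how the "retry later" path keeps the cleanup work on the Game Server's side, so the login server never blocks waiting for it.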
This is probably a very common setup for online games. But it’s not very good.
A single Game Server will probably not be enough for long. Also, having clients directly connected to the game’s command centre is not a good idea either. But if we have multiple game servers to spread the workload, we will need one more central application that manages them all. This changes a lot in the architecture.
Now we have added a new World Server, which manages all the Game Servers. When a Game Server is added (at runtime) or fails, the World Server knows about it and acts accordingly. Clients keep a constant connection to the World Server and to one of the Game Servers; the grey connections are temporary ones. The server cluster is now able to distribute the load better among a number of Game Servers, and that number can change while the game is running.
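One common way the World Server can notice an added or failed Game Server is through heartbeats. The sketch below is an assumption on my part (class names, the timeout value and the timestamp-based API are all illustrative): each Game Server registers and then pings regularly, and any server that misses its deadline is treated as failed.

```python
import time

# Illustrative heartbeat tracking: a Game Server that hasn't reported
# within `timeout` seconds is considered failed. Names are assumptions.
class WorldServer:
    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.game_servers = {}  # server_id -> last heartbeat timestamp

    def register(self, server_id, now=None):
        self.game_servers[server_id] = now if now is not None else time.time()

    def heartbeat(self, server_id, now=None):
        if server_id in self.game_servers:
            self.game_servers[server_id] = now if now is not None else time.time()

    def alive_servers(self, now=None):
        now = now if now is not None else time.time()
        return [sid for sid, last in self.game_servers.items()
                if now - last <= self.timeout]

world = WorldServer(timeout=5.0)
world.register("gs-1", now=0.0)
world.register("gs-2", now=0.0)
world.heartbeat("gs-1", now=4.0)
print(world.alive_servers(now=7.0))  # gs-2 missed its heartbeat -> ['gs-1']
```

When a server drops off this list, the World Server can reassign its part of the world to the remaining Game Servers, which is exactly the "act accordingly" from the paragraph above.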
But we still have a major problem: clients are still directly connected to the core components, the ones we really need to protect to ensure the game runs smoothly at all times.
To achieve this, we can add so-called Proxy Servers. These take over the load caused by managing client connections, packet encryption and packet compression from the World and Game Servers, improving their performance significantly. Another huge advantage is that you can easily change the number of Proxy Servers if the load demands it. Scalability is a very important aspect of accommodating population growth.
Clients now keep only a single constant connection, to one of the Proxy Servers. These forward all packets to and from the clients, encrypting and decrypting them as well as compressing and decompressing them. As a result, the number of connections to the servers doing the actual game-related work is very low.
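The per-packet work the proxy does can be sketched as a small pipeline. I'm using zlib for the compression step; real encryption would come from a proper crypto or TLS library, so the XOR step below is only a placeholder showing where encryption would sit, not actual security.

```python
import zlib

# Sketch of the proxy's per-packet pipeline: compress then "encrypt"
# on the way out, reverse on the way in. The XOR step is a placeholder
# for real encryption (e.g. TLS), NOT a secure cipher.
KEY = 0x5A  # illustrative single-byte key for the placeholder step

def outbound(payload: bytes) -> bytes:
    compressed = zlib.compress(payload)
    return bytes(b ^ KEY for b in compressed)  # placeholder "encryption"

def inbound(packet: bytes) -> bytes:
    decrypted = bytes(b ^ KEY for b in packet)
    return zlib.decompress(decrypted)

packet = outbound(b"player_position:1024,768")
print(inbound(packet) == b"player_position:1024,768")  # round-trips: True
```

Because this per-packet CPU work scales with the number of clients, offloading it to the proxies is exactly what lets the World and Game Servers spend their cycles on game logic instead.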
In this scenario, clients are assigned to one of the Proxy Servers during the login process. The World Server acts as a kind of master server in the cluster, so it also handles communication that goes beyond managing one part of the world, such as chat, faction management and other universe-wide features. If we wanted, we could split the World Server into multiple functional modules, such as a chat module and a faction module, but that only becomes necessary if the load gets really, really high.
This is already quite a nice setup, but we can still improve one part that we haven’t really looked at so far: the database. Constant access can bring even the best database server to its knees. In the early days of Face of Mankind we did almost all database synchronization right when the changes happened. That created lots of delays, affected the gameplay and even caused lag. Later we had each game server perform a synchronization task every once in a while, but this can still be improved massively. Check out the architecture now.
Many say: “You shouldn’t try to improve a database. It is very efficient as it is and it supports caching. So trying to cache the data yourself is not really useful.”
I disagree. Relational databases such as MySQL are great; I adore them. However, even they have overhead. They have to interpret the queries, search for the data, process it and send it back. On top of that, the applications accessing the database need to prepare the queries and then interpret the data when it returns. All this takes a lot of time and often involves non-cached access to the data (hard drive access).
With the newly added Synchronization Server we can save a lot of time and turn database synchronization into an asynchronous process. Changes to players or other objects are made quickly in-memory on the Synch Server, and the server then takes care of applying all the changes to the database: regularly, but not necessarily immediately. In this process we can also include a range of optimizations, like combining queries into batches or even skipping changes that happen very often. The Synch Server always keeps the current in-memory representation of a player or object, ready to be transferred to another server without further processing. On login, for example, it would be checked whether the Synch Server already has a player’s data loaded, in which case the database doesn’t need to be touched at all; if it isn’t loaded yet, it can be loaded in a streamlined way. After all, it’s not important that a login completes within a second. This reduces the load on the entire cluster enormously.
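The pattern described here is essentially a write-behind cache. A minimal sketch, assuming an in-memory dictionary as the cache and a callback standing in for the real batched SQL (all names here are my own, illustrative ones):

```python
# Hedged sketch of the Synch Server's write-behind caching: changes land
# in memory immediately and are flushed to the database in batches later.
class SyncServer:
    def __init__(self, flush_to_db):
        self.cache = {}        # object_id -> latest in-memory state
        self.dirty = set()     # ids changed since the last flush
        self.flush_to_db = flush_to_db

    def update(self, object_id, state):
        # Repeated updates to the same object collapse into one write:
        # this is how very frequent changes effectively get "skipped".
        self.cache[object_id] = state
        self.dirty.add(object_id)

    def load(self, object_id, load_from_db):
        # On login, serve from memory if the player is already cached.
        if object_id not in self.cache:
            self.cache[object_id] = load_from_db(object_id)
        return self.cache[object_id]

    def flush(self):
        batch = {oid: self.cache[oid] for oid in self.dirty}
        if batch:
            self.flush_to_db(batch)  # one batched query instead of many
        self.dirty.clear()

writes = []
sync = SyncServer(flush_to_db=writes.append)
sync.update("player_1", {"hp": 90})
sync.update("player_1", {"hp": 85})  # overwrites the pending write
sync.flush()
print(writes)  # [{'player_1': {'hp': 85}}]
print(sync.load("player_1", load_from_db=lambda oid: None))  # from memory: {'hp': 85}
```

The trade-off to be aware of: anything sitting in memory between flushes is lost on a crash, so the flush interval is a balance between database load and how much progress you are willing to lose.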
If you think these concepts through further, you will probably notice many more small optimizations you could implement, depending on the type of game. I hope this article helped you understand MMO server architecture a little better and that you enjoyed this little journey into the more technical aspects of the game. I am just very excited about all this and wanted to share it with everyone.