WebSockets - A Conceptual Deep-Dive

The WebSocket protocol elevated the possibilities of communication over the internet and paved the way for a truly realtime web. This article is all about building a deeper understanding of what WebSockets are, how they came to be and what’s actually going on under the hood of applications using WebSockets. The details in this article are accurate as of its time of writing, i.e., October 2018.

First, a bit of background

WebSockets have only been generally available for a few years. Before they came along, the “real-time” web existed, but it was difficult to achieve, typically slower, and delivered by hacking existing web technologies that were not designed for real-time applications.

The web has been built on the back of the HTTP protocol, which was originally designed entirely as a request-response mechanism. Open a connection, describe what you want, get back a response, then close the connection. This was fine in the early days of the web because, back then, we were really just dealing with a text document and maybe a few additional assets (usually images).

Scripting the web with JavaScript

In 1995, Netscape Communications hired Brendan Eich with the goal of embedding scripting capabilities into their Netscape Navigator browser, and thus JavaScript was born. Initially, JavaScript was kind of weird and couldn’t do a whole lot (especially with the extremely limited browser DOM that JavaScript had at its disposal), but it was useful for a few things, such as simple validation of input fields before submitting an HTML form to the server.

Microsoft soon entered the arena with Internet Explorer, which is where the original browser wars really began. Both companies were competing with each other to have the best browser, so features and capabilities were inevitably added on a regular basis to both Netscape and Internet Explorer as each company strove to outdo the other.


[Figure: Browser wars]

The birth of XMLHttpRequest and AJAX

Two of the most significant capabilities that were soon introduced at that time were the ability to embed Java applets into a page, and Microsoft’s own offering – ActiveX controls. These were essentially precompiled components that could optionally present an embedded user interface of their own within a web page. More than that though, they allowed a whole host of additional possibilities beyond JavaScript’s own (at the time) meager suite of scripting capabilities.

While there were a few comparable networking capabilities available via Java, the most significant background communication feature first appeared in 1999, with the Microsoft XMLHTTP ActiveXObject interface. It was available natively in Internet Explorer 5.0 without installing plugins, it could be instantiated with a single line of JavaScript, and it didn’t require any of the usual friction involved when dealing with Java applets. The XMLHTTP object made it possible to silently issue a request to a server and receive a response – all without reloading the page or otherwise interrupting the user’s experience. JavaScript code could then decipher the response and construct modifications to the page, enabling a whole host of rich experiences to be integrated into a website. Common early use cases included things like allowing a drop-down box to be populated with options based on a user’s prior input, and “instant” validation of username availability while filling out a user registration form.

Example JavaScript code that would have been used to instantiate an XMLHTTP object:

// Note: this code only works in old versions of Internet Explorer
var xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
xmlhttp.open("GET", "/api/example", true);
xmlhttp.onreadystatechange = function () {
  if (xmlhttp.readyState == 4) {
    alert(xmlhttp.responseText);
  }
};
xmlhttp.send(null);

XMLHTTP later became the XMLHttpRequest de facto standard due to its adoption by other browsers. This was about the time that the term “AJAX” was coined, standing for “Asynchronous JavaScript and XML”. The JSON standard later came along and made everything better, but the ‘X’ in AJAX (not to mention the ‘XML’ in XMLHttpRequest) never really went away, despite actual XML-formatted data having largely disappeared from standard messaging payloads.

Comparative example of modern use of the XMLHttpRequest object:

const req = new XMLHttpRequest();
// An arrow function has no `this` of its own, so refer to `req` directly
req.addEventListener("load", () => console.log(req.responseText));
req.open("GET", "/api/example");
req.send();

As you can see, the modern analog is much the same as the original, albeit with a few additions to make things a little more concise.

A world of new possibilities

In any case, XMLHttpRequest was still following the same HTTP request-response model used to retrieve the original HTML document. There was no real notion of allowing a server to contact the user proactively, or to establish any kind of general two-way connection for more sophisticated use cases. All the while though, JavaScript was gaining new features and capabilities as time went on, and browsers were enhancing the document object model (DOM), leading to greater and greater potential concerning how JavaScript could be used to enrich the user’s experience of interacting with a web page.


[Figure: HTTP request-response cycle]

As the potential for vibrant experiences started to become apparent, developers naturally gravitated toward the idea of client-server applications implemented directly in the browser. Before this, the standard paradigm for anything non-trivial was to build a dedicated software application, package it up with an installer, have the user download the installer and then install it natively on their machine. Needless to say, this was quite a barrier to entry for all but the tech-savvy user. Keeping the application up to date with fixes and enhancements was a challenge all of its own. It is easy to understand, then, how alluring it was to be able to build an application that neither required installation before it could be accessed nor user training and nagging to iterate on the software’s implementation.

Hatching the real-time web

“Innovation is taking two things that already exist and putting them together in a new way.”
- Tom Freston

When something appears to be technically possible, and the potential reward is worth the effort, we’ll usually go to great lengths to bend what is available into a shape that serves our needs. Thus, developers took XMLHttpRequest and abused it to start emulating real-time back-and-forth communication between the web page and the server. Some of the techniques to do this became commonplace – “standard” even – and eventually started being referred to with the umbrella term “Comet”.

Probably the most common of these techniques was (is) long polling. This involves opening an XMLHttpRequest connection to the server and leaving it open until ongoing communication is no longer required. Under normal circumstances when making an HTTP request, the server’s response is streamed back to a client via the connection on which the request was made, the intent being to allow a browser to start rendering an HTML page while waiting for the next part of the document to be delivered by the server. With long polling, the effect of leaving an HTTP connection open is that the server can continue to deliver response data for as long as the connection remains open, and there is no technical requirement that the data be in one format or another, or that the request be closed after sending data to the client. The same applies to the HTTP request payload sent by the client. A server may begin delivering its response before the client’s request data has arrived in its entirety, and the client is not strictly required to stop sending request data until it chooses to do so. This means that, just as the server may continue delivering response data for the life of the connection, the client may do the same, resulting in a de facto two-way communication stream between server and client.


[Figure: Long polling]

As enabling as it has been for the web application developer community in the absence of other suitable tools, long polling is tricky to do correctly and is fraught with unexpected complications that must be managed. The same goes for other Comet techniques, which are beyond the scope of this article. All in all, long polling is really just a case of repurposing the tools available to do something they weren’t really designed to do.
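To make the idea a little more concrete, here is a minimal sketch of the most common form of a long-polling client: the client issues a request, the server holds it open until it has something to say, and the client immediately asks again. The function name `longPoll`, the JSON message format, and the use of a 200 status to signal "data available" are all assumptions made for illustration, not part of any standard.

```javascript
// Sketch of a long-polling client loop. `fetchFn` defaults to the global
// fetch; the message format (JSON) and status handling are assumptions.
async function longPoll(url, onMessage, options = {}) {
  const {
    fetchFn = globalThis.fetch,
    shouldContinue = () => true, // return false to stop polling
    retryDelayMs = 1000,         // arbitrary pause after a network error
  } = options;

  while (shouldContinue()) {
    try {
      // The server is assumed to hold this request open until it has data.
      const response = await fetchFn(url);
      if (response.status === 200) {
        onMessage(await response.json());
      }
      // Any other status (e.g. a server-side timeout) simply polls again.
    } catch (err) {
      // Network failure: wait briefly so we don't spin in a tight loop.
      await new Promise(resolve => setTimeout(resolve, retryDelayMs));
    }
  }
}
```

Even this small loop hints at the complications mentioned above: every message costs a full HTTP request, and timeouts, retries and lost messages between polls are all left for the application to deal with.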

A real solution was needed – something which would empower developers with proper TCP/IP socket-style capabilities in a web environment. Such a solution would need to be built for the web, and it would need to address all of the concerns that arise when operating in a web environment.

Enter, WebSockets!

Around the middle of 2008, the pain and limitations of using Comet when implementing anything truly robust were being felt particularly keenly by developers Michael Carter and Ian Hickson. Through collaboration on IRC and W3C mailing lists, they hatched a plan to introduce a new standard for modern real-time, bi-directional communication on the web, and thus the name ‘WebSocket’ was coined.

The idea made its way into the W3C HTML draft standard and, shortly after, Michael Carter wrote an article introducing the Comet community to WebSockets. In 2010, Google Chrome 4 was the first browser to ship full support for WebSockets, with other browser vendors following suit over the course of the next few years. In 2011, RFC 6455 – The WebSocket Protocol – was published to the IETF website.

Today, all major browsers have full support for WebSockets, even including Internet Explorer 10 and 11. In addition, browsers on both iOS and Android have supported WebSockets since 2013, which means that all in all, the modern landscape for WebSocket support is very healthy. Much of the “internet of things” or IoT runs on some version of Android as well, so WebSocket support on other types of devices is reasonably pervasive too, as of 2018.

So what exactly are WebSockets anyway?

In a nutshell, WebSockets are a thin transport layer built on top of a device’s TCP/IP stack. The intent is to provide what is essentially an as-close-to-raw-as-possible TCP communication layer to web application developers while adding a few abstractions to eliminate certain friction that would otherwise exist concerning the way the web works. They also cater to the fact that the web has additional security considerations that must be taken into account to protect both consumers and service providers.

You may have heard WebSockets simultaneously referred to both as a “transport” and as a “protocol”. The former is more accurate, because while they are a protocol in the sense that a strict set of rules for establishing communication and enveloping the transmitted data must be adhered to, the standard does not take any stance regarding how the actual data payload is structured within the outer message envelope. In fact, part of the specification includes the option for the client and server to agree on a protocol with which the transmitted data will be formatted and interpreted. The standard refers to these as “subprotocols”, in order to avoid issues of ambiguity in the nomenclature. Examples of subprotocols are JSON, XML, MQTT, WAMP, et al. and these can ensure agreement not only about the way the data is structured but also the way communication must commence, continue and eventually terminate. As long as both parties understand what the protocol entails, anything goes. The WebSocket provides merely a transport layer over which that messaging process can be implemented, which is why most common subprotocols are not exclusive to WebSocket-based communications.

A quick note about authentication and authorization

Seeing as WebSockets is a thin layer built on top of TCP/IP, anything beyond the basic handshake and specification for message framing is really something that needs to be handled either on a per-application or per-library basis. Quoting the RFC:

This protocol doesn’t prescribe any particular way that servers can authenticate clients during the WebSocket handshake. The WebSocket server can use any client authentication mechanism available to a generic HTTP server, such as cookies, HTTP authentication, or TLS authentication.

In a nutshell, use the HTTP-based authentication methods you’d use anyway, or use a subprotocol such as MQTT or WAMP, both of which offer approaches for authentication and authorization.
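As a rough illustration of the cookie-based option, the sketch below checks a session cookie on the incoming upgrade request before the handshake is allowed to proceed. The cookie name `session` and the `isValidSession` lookup are hypothetical; they stand in for whatever session mechanism your application already uses.

```javascript
// Parse a Cookie header ("a=1; b=2") into a plain object.
function parseCookies(cookieHeader = '') {
  const cookies = {};
  for (const pair of cookieHeader.split(';')) {
    const index = pair.indexOf('=');
    if (index === -1) continue; // skip malformed or empty segments
    cookies[pair.slice(0, index).trim()] = pair.slice(index + 1).trim();
  }
  return cookies;
}

// Decide whether an upgrade request may proceed. `isValidSession` is a
// hypothetical application-supplied lookup; 'session' is an assumed name.
function authenticateUpgrade(request, isValidSession) {
  const { session } = parseCookies(request.headers.cookie);
  return Boolean(session) && isValidSession(session);
}
```

In a Node.js HTTP server, a check like this would typically be called from the server's `upgrade` event handler, writing a plain `HTTP/1.1 401 Unauthorized` response and destroying the socket when it returns false.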

Getting the ball rolling with HTTP

One of the early considerations when defining the WebSocket standard was to ensure that it would “play nicely” with the web. This meant recognizing that the web is generally addressed using URLs, not IP addresses and port numbers, and that a WebSocket connection should be able to take place with the same initial HTTP-based handshake used for any other type of web request.

Here’s what happens in a simple HTTP GET request.

Let’s say there’s an HTML page hosted at http://www.example.com/index.html. Without getting too deep into the HTTP protocol itself, it is enough to know that a request must start with what is referred to as the Request-Line, followed by a sequence of key-value pair header lines, each telling the server something about what to expect in the subsequent request payload that will follow the header data, and what it can expect from the client regarding the kinds of responses it will be able to understand.

The very first token in the request is the HTTP method, which tells the server the type of operation that the client is attempting with respect to the referenced URL. The GET method is used when the client is merely requesting that the server deliver it a copy of the resource that is referenced by the specified URL.

A barebones example of a request header, formatted according to the HTTP RFC, looks like this:

GET /index.html HTTP/1.1
Host: www.example.com

Having received the request header, the server then formats a response header starting with a Status-Line, followed by a set of key-value header pairs that provide the client with complementary information from the server, with respect to the request that the server is responding to. The “Status-Line” tells the client the HTTP status code (usually 200 if there were no problems) and provides a brief “reason” text description explaining the status code. Key-value header pairs appear next, followed by the actual data that was requested (unless the status code indicated that the request was unable to be fulfilled for some reason).

HTTP/1.1 200 OK
Date: Wed, 1 Aug 2018 16:03:29 GMT
Content-Length: 291
Content-Type: text/html
(additional headers...)

(response payload continues here...)
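To show how little structure there really is at this level, here is a simplified sketch that splits the head of a raw HTTP/1.1 request into its Request-Line and headers. The function name `parseRequestHead` is made up for this example, and the parser deliberately ignores the body, repeated headers and other edge cases that a real HTTP implementation has to handle.

```javascript
// Parse the head of an HTTP/1.1 request (Request-Line plus headers).
// Simplified sketch: ignores the body and repeated headers.
function parseRequestHead(raw) {
  const [head] = raw.split('\r\n\r\n'); // a blank line ends the header block
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, target, version] = requestLine.split(' ');

  const headers = {};
  for (const line of headerLines) {
    const index = line.indexOf(':');
    if (index === -1) continue;
    // Header names are case-insensitive, so normalize them to lower case.
    headers[line.slice(0, index).trim().toLowerCase()] = line.slice(index + 1).trim();
  }
  return { method, target, version, headers };
}

const request = parseRequestHead('GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n');
console.log(request.method, request.target, request.headers.host);
// prints: GET /index.html www.example.com
```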

So what’s this got to do with WebSockets, you might ask?

Ditching HTTP for something more appropriate

When making an HTTP request and receiving a response, the actual two-way network communication involved takes place over an active TCP/IP socket. The web URL that was requested in the browser is mapped via the global DNS system to an IP address, and the default port for HTTP requests is 80. This means that although a web URL was entered into the browser, the actual communication occurs via TCP/IP, using an IP address and port combination that looks something like, for example, 123.11.85.9:80.

As we now know, WebSockets are built on top of the TCP stack as well, which means all we need is a way for the client and the server to jointly agree to hold the socket connection open and repurpose it for ongoing communication. If they do this, then there is no technical reason why they can’t continue to use the socket to transmit any kind of arbitrary data, as long as they have both agreed as to how the binary data being sent and received should be interpreted.

To begin the process of repurposing the TCP socket for WebSocket communication, the client can include a standard request header that was invented specifically for this kind of use case:

GET /index.html HTTP/1.1
Host: www.example.com
Connection: Upgrade
Upgrade: websocket


[Figure: WebSockets]

The Connection header tells the server that the client would like to negotiate a change in the way the socket is being used. The accompanying value Upgrade indicates that the transport protocol currently in use via TCP should change. Now that the server knows that the client wants to upgrade the protocol currently in use over the active TCP socket, the server knows to look for the corresponding Upgrade header, which will tell it which transport protocol the client wants to use for the remaining lifetime of the connection. As soon as the server sees websocket as the value of the Upgrade header, it knows that a WebSocket handshake process has begun.
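On the server, the check described above might be sketched as follows. It assumes the headers arrive as an object with lower-cased keys (which is how Node.js presents them); the function name `isWebSocketUpgrade` is just an illustrative choice.

```javascript
// Returns true when a set of request headers (keys lower-cased) asks to
// upgrade the connection to the WebSocket protocol.
function isWebSocketUpgrade(headers) {
  // Connection is a comma-separated token list, e.g. "keep-alive, Upgrade".
  const connectionTokens = (headers.connection || '')
    .split(',')
    .map(token => token.trim().toLowerCase());
  // Header values for Upgrade are compared case-insensitively.
  const upgrade = (headers.upgrade || '').trim().toLowerCase();
  return connectionTokens.includes('upgrade') && upgrade === 'websocket';
}
```

Note that both headers are needed: Connection says that a change is being requested, and Upgrade says what to change to.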

Note that the handshake process (along with everything else) is outlined in RFC 6455, if you’d like to go into more detail than is covered in this article.

Avoiding funny business

The first part of the WebSocket handshake, other than what is described above, involves proving that this is actually a proper WebSocket upgrade handshake and that the process is not being circumvented or emulated via some kind of intermediate trickery either by the client or perhaps by a proxy server that sits in the middle.

When initiating an upgrade to a WebSocket connection, the client must include a Sec-WebSocket-Key header whose value is a randomly generated 16-byte nonce, base64-encoded, and unique to that handshake. Here’s the sample value used in RFC 6455:

Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==

The above is automatic and handled for you if using the WebSocket class provided in modern browsers. You need only look for it on the server side and produce a response.

When responding, the server must append the special GUID value 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to the key, generate a SHA-1 hash of the resultant string, then include it as the base-64-encoded value of a Sec-WebSocket-Accept header that it includes in the response. For the sample key above, that is:

Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

In a Node.js WebSocket server, we could write a function to generate this value like so:

const crypto = require('crypto');

function generateAcceptValue (acceptKey) {
  return crypto
    .createHash('sha1')
    .update(acceptKey + '258EAFA5-E914-47DA-95CA-C5AB0DC85B11', 'binary')
    .digest('base64');
}

We’d then need only call this function, passing the value of the Sec-WebSocket-Key header as the argument, and set the function return value as the value of the Sec-WebSocket-Accept header when sending the response.

To complete the handshake, write the appropriate HTTP response headers to the client socket. A bare-bones response would look something like this:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
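Putting the pieces together, a sketch of a function that builds this bare-bones response from the client's key could look like the following. The function name `buildHandshakeResponse` is an assumption for illustration; the hashing steps are the ones described above.

```javascript
const crypto = require('crypto');

const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Build the raw HTTP response that completes a basic WebSocket handshake.
function buildHandshakeResponse(secWebSocketKey) {
  const accept = crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');

  // Header lines end with CRLF; an empty line terminates the header block.
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${accept}`,
    '',
    '',
  ].join('\r\n');
}

// Sample key from RFC 6455; the expected accept value is
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
console.log(buildHandshakeResponse('dGhlIHNhbXBsZSBub25jZQ=='));
```

The sample key and accept value from RFC 6455 make a handy sanity check when writing your own implementation.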

We’re not actually quite finished with the handshake at this point – there are a couple more things to think about.

Subprotocols – Agreeing upon a shared dialect

The client and server generally need to agree on a compatible strategy with respect to how they format, interpret and organize the data itself, both within a given message and over time from one message to the next. This is where subprotocols (mentioned earlier) come in. If the client knows that it can deal with one or more specific application-level protocols (such as WAMP, MQTT, etc.), it can include a list of the protocols it understands when making the initial HTTP request. If it does so, the server is then required to either select one of those protocols and include it in a response header or to otherwise fail the handshake and terminate the connection.

Example subprotocol request header:

Sec-WebSocket-Protocol: mqtt, wamp

Example reciprocal header issued by the server in the response:

Sec-WebSocket-Protocol: wamp

Note that the server must select precisely one protocol from the list provided by the client. Selecting more than one would mean that the server cannot reliably or consistently interpret the data in subsequent WebSocket messages. An example of this would be if both json-ld and json-schema were selected by the server. Both are data formats built on the JSON standard, and there would be numerous edge cases where one could be interpreted as the other, leading to unexpected errors when processing the data. While admittedly not messaging protocols per se, the example is still applicable.

When both the client and server are implemented to use a common messaging protocol from the outset, the Sec-WebSocket-Protocol header can be omitted in the initial request, in which case the server can ignore this step. Subprotocol negotiation is most useful when implementing general-purpose services, infrastructure, and tools where there can be no forward guarantee that both the client and server will understand each other once the WebSocket connection has been established.

Standardized names for common protocols should be registered with the IANA registry for WebSocket Subprotocol Names, which, at the time of this article, has 36 names already registered, including soap, xmpp, wamp, mqtt, et al. Though the registry is the canonical source for mapping a subprotocol name to its interpretation, the only strict requirement is that the client and server agree upon what their mutually-selected subprotocol actually means, irrespective of whether or not it appears in the IANA registry.

Note that if the client has requested use of a subprotocol but hasn’t provided any that the server can support, the server must send a failure response and close the connection.
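A server-side selection step along these lines might be sketched as below. The `supported` list is an assumed, application-defined array of the subprotocol names the server implements, ordered by the server's own preference; that ordering is a design choice for this example rather than something the standard dictates.

```javascript
// Choose at most one subprotocol from the client's Sec-WebSocket-Protocol
// header. `supported` is the (assumed) list this server implements, in
// order of preference. Returns null when there is no match.
function selectSubprotocol(headerValue, supported) {
  if (!headerValue) return null; // the client did not request a subprotocol
  const requested = headerValue.split(',').map(name => name.trim());
  // Pick the server's most preferred protocol that the client also offered.
  return supported.find(name => requested.includes(name)) || null;
}

console.log(selectSubprotocol('mqtt, wamp', ['wamp', 'soap'])); // prints: wamp
```

A null result for a request that did list subprotocols is the case in which the handshake should not be allowed to succeed.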

WebSocket Extensions

There is also a header for defining extensions to the way the data payload is encoded and framed, but at the time of this article, only one standardized extension type exists, and it provides a kind of WebSocket-equivalent to gzip compression in messages. Another example of where extensions might come into play is multiplexing – the use of a single socket to interleave multiple concurrent streams of communication.

WebSocket extensions are a somewhat advanced topic, and are really beyond the scope of this article. For now, it is enough to know what they are, and how they fit into the picture.
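Just to give a feel for what the negotiation looks like on the wire, the header in question is Sec-WebSocket-Extensions, and the compression extension mentioned above is named permessage-deflate. The sketch below parses such a header value into a list of extension names and parameters. The function name `parseExtensions` is made up for this example, and it does not handle quoted parameter values.

```javascript
// Parse a Sec-WebSocket-Extensions header value into
// [{ name, params }]. Parameters without a value are stored as `true`.
// Simplified sketch: quoted parameter values are not handled.
function parseExtensions(headerValue = '') {
  return headerValue
    .split(',')
    .map(entry => entry.trim())
    .filter(Boolean)
    .map(entry => {
      const [name, ...paramParts] = entry.split(';').map(part => part.trim());
      const params = {};
      for (const part of paramParts) {
        const [key, value = true] = part.split('=').map(s => s.trim());
        params[key] = value;
      }
      return { name, params };
    });
}

console.log(parseExtensions('permessage-deflate; client_max_window_bits'));
```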

The client-side – Using WebSockets in the browser

The WebSocket API is defined in the WHATWG HTML Living Standard and is actually pretty trivial to use. Constructing a WebSocket takes one line of code:

const ws = new WebSocket('ws://example.org');

Note the use of ws where you’d normally have the http scheme. There’s also the option to use wss where you’d normally use https. These URL schemes were introduced in tandem with the WebSocket specification, and are designed to represent an HTTP connection that includes a request to upgrade the connection to use WebSockets.
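Because the two pairs of schemes line up so neatly, a WebSocket URL can be derived from the page's own address. A small sketch, with `toWebSocketUrl` being a name invented for this example:

```javascript
// Derive a WebSocket URL from an HTTP(S) URL: http -> ws, https -> wss.
function toWebSocketUrl(httpUrl) {
  return httpUrl.replace(/^http/i, 'ws');
}

console.log(toWebSocketUrl('https://example.org/updates')); // prints: wss://example.org/updates

// In a browser this could be combined with the current page origin, e.g.
// new WebSocket(toWebSocketUrl(window.location.origin) + '/socket');
// where '/socket' is a hypothetical endpoint path.
```

This keeps secure pages on wss automatically, which matters because browsers generally refuse insecure ws connections from pages served over https.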

Creating the WebSocket object doesn’t do a lot by itself. The connection is established asynchronously, so you’d need to listen for the completion of the handshake before sending any messages, and also include a listener for messages received from the server:

ws.addEventListener('open', () => {
  // Send a message to the WebSocket server
  ws.send('Hello!');
});

ws.addEventListener('message', event => {
  // The `event` object is a typical DOM event object, and the message data sent
  // by the server is stored in the `data` property
  console.log('Received:', event.data);
});

There are also the error and close events. WebSockets don’t automatically recover when connections are terminated – this is something you need to implement yourself, and is part of the reason why there are many client-side libraries in existence. While the WebSocket class is straightforward and easy to use, it really is just a basic building block. Support for different subprotocols or additional features such as messaging channels must be implemented separately.
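As a rough idea of what "implementing it yourself" involves, here is a minimal reconnection sketch using exponential backoff. The names `backoffDelay` and `connectWithRetry`, and the default timings, are arbitrary choices for illustration; a production client would also want jitter, a retry limit, and a way to stop reconnecting deliberately.

```javascript
// Delay before reconnect attempt number `attempt` (0-based): doubles each
// time, capped at maxMs. The default values are arbitrary assumptions.
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// `createSocket` defaults to the browser's WebSocket constructor.
function connectWithRetry(url, onMessage, createSocket = u => new WebSocket(u)) {
  let attempt = 0;

  function connect() {
    const ws = createSocket(url);
    // A successful connection resets the backoff.
    ws.addEventListener('open', () => { attempt = 0; });
    ws.addEventListener('message', onMessage);
    // `close` also fires after a failed connection attempt, so it is the
    // one place where a retry needs to be scheduled.
    ws.addEventListener('close', () => {
      setTimeout(connect, backoffDelay(attempt++));
    });
  }

  connect();
}
```

Note that anything sent while the connection is down is simply lost here; queuing and replaying messages is another of the features that client-side libraries tend to add.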

Generating and parsing WebSocket message frames

Once the handshake response has been sent to the client, the client and server are free to begin communicating using their chosen subprotocol (if any). Take a quick look at section 5 of the RFC to get a sense of what’s involved.

WebSocket messages are delivered in packages called “frames”, which begin with a message header, and conclude with the “payload” – the message data for this frame. Large messages may split the data over several frames, in which case you need to keep track of what you’ve received so far and piece the data together once it has all arrived.
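To get a sense of what a frame looks like in practice, the sketch below decodes a single frame from a Node.js Buffer, following the layout in section 5 of the RFC. It assumes the whole frame is already in the buffer and does not reassemble fragmented messages or validate reserved bits; `parseFrame` is a name chosen for this example.

```javascript
// Decode a single WebSocket frame held entirely in `buffer`.
function parseFrame(buffer) {
  const fin = (buffer[0] & 0x80) !== 0;    // final fragment of a message?
  const opcode = buffer[0] & 0x0f;         // 0x1 = text, 0x2 = binary, 0x8 = close
  const masked = (buffer[1] & 0x80) !== 0; // frames sent by clients are masked
  let length = buffer[1] & 0x7f;
  let offset = 2;

  if (length === 126) {
    // Payload length is carried in the next 2 bytes.
    length = buffer.readUInt16BE(offset);
    offset += 2;
  } else if (length === 127) {
    // Payload length is carried in the next 8 bytes.
    length = Number(buffer.readBigUInt64BE(offset));
    offset += 8;
  }

  let payload;
  if (masked) {
    const mask = buffer.subarray(offset, offset + 4);
    offset += 4;
    // Copy before unmasking so the caller's buffer is left untouched.
    payload = Buffer.from(buffer.subarray(offset, offset + length));
    for (let i = 0; i < payload.length; i++) payload[i] ^= mask[i % 4];
  } else {
    payload = buffer.subarray(offset, offset + length);
  }
  return { fin, opcode, payload };
}

// Masked text frame containing "Hello" (example bytes from RFC 6455).
const frame = Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
console.log(parseFrame(frame).payload.toString('utf8')); // prints: Hello
```

When `fin` is false, the payload is only one piece of a larger message, and the following continuation frames (opcode 0x0) need to be appended until a frame with `fin` set arrives.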

Moving forward – Other things you might consider

Basic implementation of a WebSocket server is really just the first stage of the process. Here are just a few of the things you’d want to think about when taking things to the next level:

  • What framing extensions will you support, such as per-message deflation?
  • What degree of client interoperability are you aiming for?
  • Are messages being received in the same order they were sent, and if not, how can you prevent this from putting your application into an invalid state?
  • Do you need guarantees on message delivery, and if so, what strategies can you implement to this end?
  • How many connections are active on your server?
  • Are any connections hogging all of the server’s resources?
  • Are any connections idle and should ideally be dropped?
  • What is the duration of the average connection lifespan?
  • Are connections being dropped prematurely/unexpectedly, and if so, how can you retain diagnostic data to explain why?
  • Are you experiencing brief connection spikes ever, and if so, what is the performance impact on your server?
  • How much bandwidth is being used overall, and how is it impacting your budget?
  • Is your server’s capacity near its limit, and if not, how soon will that threshold be reached?
  • How will you automatically add additional server capacity if and when it is needed?
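
Several of these questions (how many connections are active, which ones are idle) come down to keeping a little bookkeeping per connection. The following library-agnostic sketch shows one way that might look; the `ConnectionTracker` class, the connection ids and the timeout value are all assumptions for illustration.

```javascript
// Tracks the last time each connection was active so that idle ones can
// be identified. Connection ids are application-defined.
class ConnectionTracker {
  constructor() {
    this.lastSeen = new Map(); // id -> timestamp (ms) of last activity
  }

  // Call whenever a message or pong is received from a connection.
  touch(id, now = Date.now()) {
    this.lastSeen.set(id, now);
  }

  // Call when a connection closes.
  remove(id) {
    this.lastSeen.delete(id);
  }

  get activeCount() {
    return this.lastSeen.size;
  }

  // Ids of connections with no activity for longer than `timeoutMs`.
  idleConnections(timeoutMs, now = Date.now()) {
    return [...this.lastSeen]
      .filter(([, seenAt]) => now - seenAt > timeoutMs)
      .map(([id]) => id);
  }
}
```

A periodic timer could then call `idleConnections` and close whatever it returns, while `activeCount` feeds whatever monitoring you already have in place.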

Think about the messaging protocols available, such as MQTT, WAMP, etc., and whether they can provide a solution to some of these questions. Consider existing libraries and frameworks, and the additional features they offer beyond simple, bare-bones management of WebSocket connections. If you have a particular need to scale, and limited workforce or expertise to do so effectively, consider leveraging cloud-based realtime messaging solutions that have already solved these problems for you.

Some of the open source WebSocket libraries you can use right now

There are two primary classes of WebSocket libraries: those that implement the protocol and leave the rest to the developer, and those that build on top of the protocol with various additional features commonly required by realtime messaging applications, such as restoring lost connections, pub/sub and channels, authentication, authorization, etc. The latter variety often requires that their own libraries be used on the client side, rather than just using the raw WebSocket API provided by the browser. As such, it becomes crucial to make sure you’re happy with how they work and what they’re offering. You may find yourself locked into your chosen solution’s way of doing things once it has been integrated into your architecture, and any issues with reliability, performance, and extensibility may come back to bite you.

I’ll start with a list of those that fall into the first of the above two categories.

Note: All of the following are open-source libraries.

ws

ws is a “simple to use, blazing fast and thoroughly tested WebSocket client and server for Node.js”. It is definitely a barebones implementation, designed to do all the hard work of implementing the protocol. However, additional features such as connection restoration, pub/sub, and so forth, are concerns you’ll have to manage yourself.

Client (Node.js; ws does not run in the browser, where the native WebSocket class is used instead):

const WebSocket = require('ws');

const ws = new WebSocket('ws://www.host.com/path');

ws.on('open', function open() {
  ws.send('something');
});

ws.on('message', function incoming(data) {
  console.log(data);
});

Server (Node.js):

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', function connection(ws) {
  ws.on('message', function incoming(message) {
    console.log('received: %s', message);
  });

  ws.send('something');
});

μWebSockets

μWS is a drop-in replacement for ws, implemented with a particular focus on performance and stability. To the best of my knowledge, μWS is the fastest WebSocket server implementation available by a mile. It’s actually used under the hood by SocketCluster, which I’ll talk about below.

There has been a little controversy around μWS recently due to the author having attempted to pull it from NPM for philosophical reasons, but the latest working version remains on NPM and can be installed by specifying that version explicitly when installing from NPM. That said, the author is working on a new version, with accompanying Node.js bindings also in development.

Server (Node.js):

var WebSocketServer = require('uws').Server;
var wss = new WebSocketServer({ port: 3000 });

function onMessage(message) {
  console.log('received: ' + message);
}

wss.on('connection', function(ws) {
  ws.on('message', onMessage);
  ws.send('something');
});

faye-websocket

faye-websocket is a standards-compliant WebSocket implementation for both the client and server, and originated from the Ruby-on-Rails community as part of the Faye project. As per the GitHub project:

It does not provide a server itself, but rather makes it easy to handle WebSocket connections within an existing Node application. It does not provide any abstraction other than the standard WebSocket API.

In the sample code below for the server, you can see that all of the work of handling the connection upgrade and translating messaging frames from inbound socket buffers is handled by the WebSocket class provided by the library. As with other minimal solutions, this is a no-frills implementation – you’ll need to handle application-specific concerns yourself.

Client (Browser, before bundling):

var WebSocket = require('faye-websocket');
var ws = new WebSocket.Client('ws://www.example.com/');

ws.on('open', function(event) {
  console.log('open');
  ws.send('Hello, world!');
});

ws.on('message', function(event) {
  console.log('message', event.data);
});

ws.on('close', function(event) {
  console.log('close', event.code, event.reason);
  ws = null;
});

Server (Node.js):

var WebSocket = require('faye-websocket');
var http = require('http');

var server = http.createServer();

server.on('upgrade', function(request, socket, body) {
  if (WebSocket.isWebSocket(request)) {
    var ws = new WebSocket(request, socket, body);

    ws.on('message', function(event) {
      ws.send(event.data);
    });

    ws.on('close', function(event) {
      console.log('close', event.code, event.reason);
      ws = null;
    });
  }
});

server.listen(8000);

Socket.io

Socket.io has been around for a while now and could be thought of as the “jQuery” of WebSockets. It uses long polling and WebSockets for its transports, by default starting with long polling and then upgrading to WebSockets if available. Given the waning relevance of long polling, Socket.io’s main drawcards these days are its other features, such as restoring dropped connections, automatic support for JSON, and “namespaces”, which are essentially isolated messaging channels multiplexed over the same client connection.

Socket.io isn’t actually interchangeable with generic WebSockets solutions – either on the server side or on the client side – and attempting to connect to something other than a Socket.io client or server will fail. It has its own additional handshake protocol, and some additional metadata included in each message.

Should you use it? On the plus side, in addition to the simplicity of getting up and running, it’s well-established and there is a wealth of learning material out there if you get stuck. On the other hand, there have been plenty of reports of memory leaks and general performance issues over time, and despite there being a mountain of issues in the GitHub repository, responses are few, and commits and updates these days seem relatively infrequent. Just like jQuery, Socket.io is largely a product of a bygone era, and for new projects, you’d be better off with something a little more modern. jQuery does still have its fans though, so…

Client (Browser, before bundling):

const io = require('socket.io-client');
const socket = io();
socket.emit('chat message', 'Hello there');

Server (Node.js):

var app = require('express')();
var http = require('http').Server(app);
var io = require('socket.io')(http);

app.get('/', function(req, res){
  res.sendFile(__dirname + '/index.html');
});

io.on('connection', function(socket){
  console.log('a user connected');
});

http.listen(3000, function(){
  console.log('listening on *:3000');
});

SocketCluster

SocketCluster is a full-featured client-server messaging framework built entirely around WebSockets, and uses μWebSockets under the hood. From the website:

SocketCluster is an open source real-time framework for Node.js. It supports both direct client-server communication and group communication via pub/sub channels. It is designed to easily scale to any number of processes/hosts and is ideal for building chat systems. See SocketCluster Design Patterns for Chat.

Unlike simpler solutions such as Socket.io, SocketCluster requires slightly more installation, but it is generally very easy to get up and running.

The following examples are taken from the getting started guide in the SocketCluster documentation.

Client (Browser, before bundling):

var socket = socketCluster.create();
socket.emit('sampleClientEvent', {message: 'This is an object with a message property'});

Server (Node.js):

var SocketCluster = require('socketcluster');
var socketCluster = new SocketCluster({
  workers: 1, // Number of worker processes
  brokers: 1, // Number of broker processes
  port: 8000, // The port number on which your server should listen
  appName: 'myapp', // A unique name for your app

  // Switch wsEngine to 'sc-uws' for a MAJOR performance boost (beta)
  wsEngine: 'ws',

  /* A JS file which you can use to configure each of your
   * workers/servers - This is where most of your backend code should go
   */
  workerController: __dirname + '/worker.js',

  /* JS file which you can use to configure each of your
   * brokers - Useful for scaling horizontally across multiple machines (optional)
   */
  brokerController: __dirname + '/broker.js',

  // Whether or not to reboot the worker in case it crashes (defaults to true)
  rebootWorkerOnCrash: true
});

SockJS

SockJS’s primary feature is WebSocket emulation, which puts it into an increasingly outdated class of solutions, given that support for WebSockets is pretty pervasive these days. From their client repository:

SockJS is a browser JavaScript library that provides a WebSocket-like object. SockJS gives you a coherent, cross-browser, Javascript API which creates a low latency, full duplex, cross-domain communication channel between the browser and the web server.

It supports numerous fallback transports, giving it a fairly robust suite of support for Comet techniques. Again though, how relevant is this in the modern web landscape?
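
As a rough illustration of what "fallback" means in practice, the sketch below walks an ordered list of transports and picks the first one the environment supports. The transport names are borrowed from those SockJS lists, but pickTransport, the env object and the blocked list are hypothetical, and this is not how the SockJS client is actually implemented.

```javascript
// Ordered from most to least capable. The names are borrowed from SockJS's
// transport list; the selection logic itself is an illustrative assumption.
var TRANSPORTS = [
  { name: 'websocket',     needs: 'WebSocket' },
  { name: 'xhr-streaming', needs: 'XMLHttpRequest' },
  { name: 'xhr-polling',   needs: 'XMLHttpRequest' }
];

// `env` is any object exposing the constructors a transport depends on
// (in a browser this would be `window`). `blocked` lists transports to skip,
// e.g. because a proxy is known to break them.
function pickTransport(env, blocked) {
  blocked = blocked || [];
  for (var i = 0; i < TRANSPORTS.length; i++) {
    var t = TRANSPORTS[i];
    if (typeof env[t.needs] === 'function' && blocked.indexOf(t.name) === -1) {
      return t.name;
    }
  }
  return null; // nothing usable in this environment
}

var modernBrowser = { WebSocket: function() {}, XMLHttpRequest: function() {} };
var legacyBrowser = { XMLHttpRequest: function() {} };

console.log(pickTransport(modernBrowser));                // websocket
console.log(pickTransport(legacyBrowser));                // xhr-streaming
console.log(pickTransport(modernBrowser, ['websocket'])); // xhr-streaming
```

In a browser that supports WebSockets and sits behind a well-behaved network, the first entry always wins, which is precisely why the fallback machinery matters less than it once did.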

See SockJS: web messaging ain’t easy and SockJS: WebSocket emulation done right for some insight into the philosophy behind the SockJS suite of libraries.

On the server side, check out the SockJS-node project.

Client (Browser, before bundling):

var sock = new SockJS('https://mydomain.com/my_prefix');
sock.onopen = function() {
  console.log('open');
  sock.send('test');
};

sock.onmessage = function(e) {
  console.log('message', e.data);
  sock.close();
};

sock.onclose = function() {
  console.log('close');
};

Server (Node.js):

var http = require('http');
var sockjs = require('sockjs');

var echo = sockjs.createServer({ sockjs_url: 'http://cdn.jsdelivr.net/sockjs/1.0.1/sockjs.min.js' });
echo.on('connection', function(conn) {
  conn.on('data', function(message) {
    conn.write(message);
  });
  conn.on('close', function() {});
});

var server = http.createServer();
echo.installHandlers(server, {prefix:'/echo'});
server.listen(9999, '0.0.0.0');

Scaling beyond a single server

The number of concurrent connections a server can handle is rarely the bottleneck when it comes to server load. Most decent WebSocket servers can support thousands of concurrent connections, but what’s the workload required to process and respond to messages once the WebSocket server process has handled receipt of the actual data? Typically there will be all kinds of potential concerns, such as reading and writing to and from a database, integration with a game server, allocation and management of resources for each client, and so forth. As soon as one machine is unable to cope with the workload, you’ll need to start adding additional servers, which means now you’ll need to start thinking about load-balancing, synchronization of messages among clients connected to different servers, generalized access to client state irrespective of connection lifespan or the specific server that the client is connected to – the list goes on and on.
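
To give a feel for the message synchronization problem, here is a minimal sketch of one common approach: each server publishes incoming messages to a shared broker instead of writing directly to its own sockets, and every server relays what the broker delivers to its local clients. An in-process EventEmitter stands in for the broker purely so the example runs on its own; in a real deployment it would be a separate networked service. The createNode function and the client objects are hypothetical.

```javascript
var EventEmitter = require('events');

// Stand-in for a shared message broker (a pub/sub service, a message queue, etc.).
// Assumption: in production this is a networked service, not an in-process object.
var broker = new EventEmitter();

// Hypothetical server node: tracks its own local clients and relays
// everything through the broker so that other nodes see it too.
function createNode(name) {
  var clients = [];

  // Deliver anything published on the broker to this node's local clients.
  broker.on('message', function(msg) {
    clients.forEach(function(client) {
      client.received.push(msg);
    });
  });

  return {
    name: name,
    connect: function(clientId) {
      var client = { id: clientId, received: [] };
      clients.push(client);
      return client;
    },
    // Called when a local client sends a message: publish to the broker
    // rather than writing directly to local sockets.
    publish: function(msg) {
      broker.emit('message', msg);
    }
  };
}

var nodeA = createNode('A');
var nodeB = createNode('B');
var alice = nodeA.connect('alice');
var bob = nodeB.connect('bob');

nodeA.publish('hello from alice');
console.log(bob.received); // [ 'hello from alice' ]
```

Bob is connected to a different node than Alice, yet still receives her message. The sketch says nothing about ordering, delivery guarantees or what happens when the broker itself becomes the bottleneck, which is where the real difficulty starts.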

Such concerns are deserving of an article all of their own, and you’ll find plenty in the Engineering section of the Ably blog – see in particular Matt O’Riordan’s write-up on some of the complexities that were dealt with when building Ably.

Ably – Realtime messaging as a service

As you have seen, there’s a lot involved in implementing support for the WebSocket protocol, not just in terms of client and server implementation details, but also with respect to support for other transports to ensure robust support for different client environments, as well as broader concerns such as authentication and authorization, guaranteed message delivery, reliable message ordering, historical message retention, and so forth.

Ably is a battle-tested, enterprise-ready, cloud-based messaging platform with native support for WebSockets, long polling and other fallback transports, in addition to support for other open protocols such as MQTT. We’ve already done all of the above work for you, which means you can get on with the real task of building your business’s application platform and leave the mammoth task of implementing robust, reliable realtime messaging to us.

Find out more about the Ably Realtime platform.
