How Ably works
Ably’s global realtime service enables Internet enabled devices, such as a browser, phone, server or IoT sensor, to stream data in realtime between to any other Internet connected device in milliseconds. The Ably platform brings enterprise scale messaging to developers by delivering 100% service availability, message delivery guarantees- and low-latencies globally (typically less than 60ms).
At its simplest, Ably is a cloud-based pub/sub Platform-as-a-Service ensuring any device publishing messages to Ably will be received in real time by any number of subscribing devices. But it is more than that. Ably makes it possible for developers to build apps and infrastructure that can communicate in realtime without the worries of managing scale, latency, data durability, integrity and storage, seamless connection recovery, device interopability, network outages, encryption, security and authentication, throttling, and denial of service attacks, to name a few.
In order to understand how Ably works, and why pub/sub is only one part of the problem we solve for developers, we have have broken down the key concepts below.
The Ably service organizes the traffic within any application into named channels. Clients connected to Ably can either be publishers (they push messages with data to Ably), subscribers (they wait for messages to be pushed from Ably to them), or both. Messages are always published over a named channel. Channels are used to filter messages, so whilst billions of messages may be delivered by Ably, subscribers will only receive the messages on the channels they subscribe to.
Channels are uniquely identified by a string specified when publishing to or attaching to a channel. Publishers and subscribers are completely decoupled: a publisher can publish a message without any subscribers on the channel; subscribers can listen on channels that don’t yet have publishers; arbitrarily many subscribers can receive a single message published on a channel. In other words, Ably channels support one-to-many (fan-out), many-to-one (fan-in), and many-to-many.
The following diagram illustrates a very simple use case for channels. The server and vehicle are publishing messages on channels without any knowledge of subscribers. One mobile device is both a subscriber and publisher and is publishing its location, but also subscribing for alerts. All other devices are subscribed to just one channel.
All messages received by Ably are immediately stored in RAM in three separate physical locations for redundancy. They are then persisted as follows:
- In server RAM for 2 minutes in every location that the channel is active
- On disk in three locations for 24 – 72 hours if the channel is configured to persist messages
Whilst Ably is used primarily by clients to receive messages in real time, Ably provides a history API that allows clients to retrieve older messages from memory and/or disk.
Presence provides awareness of other clients that are connected to Ably and present on a channel. Each member present on a channel has a unique client identifier and an optional payload that can be used to describe the member’s status or attributes. Presence allows you to quickly build apps such as chat rooms or multiplayer games as Ably will automatically keep track of who is present in real time across any device. Clients can also subscribe to the presence events and members on a channel without being present themselves.
There are three presence operations,
enter for new members,
update allowing the payload data associated with a member to be updated and announced to everyone, and
leave for members that have requested to leave or who have left implicitly as a result of their connection being disconnected.
The complete set of members present and their optional payload is stored by Ably in server RAM in at least three locations. Like messages, presence events such as
leave are persisted in RAM for 2 minutes, and optionally to disk in three locations for 24 – 72 if the channel is configured to persist messages
Ably supports two forms of authentication described below. For an in-depth explanation, view the authentication documentation.
Basic authentication is the simplest form of authentication, allowing clients to communicate with Ably by including the complete private API key within the URL or request headers. To mitigate the risks of sending a private key over the Internet, basic authentication is only permitted over an encrypted TLS connection. Private API keys and their capabilities (permissions) are managed within an app’s dashboard. In most cases, we do not recommend that basic authentication is used as it requires that you share your private API key with the client that is connecting to Ably. Our recommendation is to only use basic authentication on your trusted servers.
Token authentication provides a means for clients to connect to Ably without a private API key. Tokens have a time-to-live (they expire) ensuring that any compromise of a token will have a limited impact. Token authentication is also used to identify clients and provide fine-grained access control on a per-client basis.
Most often, token authentication is implemented as follows: a token request is generated by your server; it is signed using your private API key and then passed to the client; the realtime client uses this signed token request to request a token from Ably directly with the trusted token request. This approach is recommended as at no point does your server need to communicate directly with Ably and ever communicate the secret API key.
Client identity and access control
A client using token authentication is considered to either anonymous, or identified if a client ID exists in the token. All messages, presence state and connection state for identified clients contain the trusted client ID and are accessible by other clients. As a private key is needed to generate a token for a client with a client ID, the client ID can be trusted by other clients. Find out more on identified clients.
Tokens are limited to the capabilities of the API key used to create the token. When requesting a token, the capabilities may be limited further using fine-grained permissions using a combination of the operation (such as publish, subscribe, presence) and the channel(s). Find out more on token capabilities.
Global message routing
The Ably service, running in over 24 data-centers globally, provides a true mesh distributed system ensuring that there is both no single point of congestion and no single point of failure. Ably is designed to always route messages using the least number of network hops minimising latency and ensuring maximum performance for clients no matter their location.
The diagram below explains how Ably solves the challenge of efficient global routing at all times:
- The publish only server located in New York is routed to the nearest data center (US East) using our latency based routing;
Msg Apublished to US East is routed directly to clients in US East, and once to every other data center with clients subscribed for messages; Clients in all other regions subscribed for messages receive
Msg Adirectly from the data center they are connected to
- Publish & subscribe client in London is routed to the nearest data (EU West) using our latency based routing;
Msg Bpublished to EU West is routed directly to clients in EU West, and once to every other data center with clients subscribed for messages; Clients in all other regions subscribed for messages receive
Msg Bdirectly from the data center they are connected to
Connection state recovery
It is common for devices to have constantly changing network conditions as a result of moving from a mobile data network to a wifi network, being in a tunnel for a short period, or perhaps due to intermittent network issues. We believe that developers shouldn’t have to worry about service continuity between brief connection loss. As such, Ably ensures that the connection state for a client is retained on our servers whenever a client is disconnected abruptly. By retaining a client’s connection state on our servers, this allows us to replay all channel activity back to a client as soon as it reconnects.
Ably provides the following assurances in regards to connection state recovery:
- Any client that is able to reestablish a connection within 2 minutes will remain attached on all channels and will not lose any messages published by other clients whilst disconnected;
- All messages published on channels whilst the client was disconnected will be replayed to the recovered client in the order they were published;
- There is no upper limit in regards to the number of messages that are queued for disconnected clients, however if a threshold to ensure system stability is exceeded then an error is emitted;
- Connection state recovery provides certainty and is thus binary in its behavior – it either succeeds and all operations continue as if the connection was never disconnected, or it fails by detaching the channel or emitting an error on the channel so that the developer is made aware of the state loss and is able to respond accordingly
The Ably global platform is designed to provide an industry first 100% uptime guarantee. This is possible because redundancy has been addressed in not just every area of our systems, but also within the client libraries used by our customers. Our redundancy is best depicted in the following diagram:
- Connection state recovery
- ensures that clients disconnected abruptly from Ably can resume their connection
- Proactive health checked DNS
- our DNS TTL is kept very low allowing us to route traffic away from unhealthy data-centers in 60s once our monitoring systems detect an issue
- Secondary domain endpoints
- we operate a completely isolated backup domain that is used by our client libraries if the primary
ably.ioendpoint is failing
- Host fallback
- clients that are unable to connect to their closest data center using the primary
ably.iodomain will automatically fallback to an alternative data center using the secondary domain
- Auto scaling
- as load on the system increases, more servers are automatically provisioned in every part of the system that requires additional capacity
- Load balancers
- our load balancers are elastic and scale to meet demand, but are also responsible for intelligently routing traffic to existing and new frontends that are coming online
- our frontends do not store any state thus ensuring that frontends can come online quickly and service new requests, but also go offline easily without data loss
- Self-healing cluster
- as problems are detected in the system, they are isolated or remedied by our automated health servers
- Data replicas
- all data is stored in at least three data-centers across at least two regions ensuring data availability through any imaginable failure
- Multiple availability zones
- in every region our our data is replicated between servers in at least two independent data-centers ensuring outages in one data center cannot cause data loss
- Multiple regions
- ensuring that data is always stored in at least two regions protects against complete region outage or network partitioning
Security and Encryption
By default all of our client libraries use TLS when communicating with Ably over REST or via our Realtime transports such as Websockets. We do not charge customers for using TLS and actively encourage everyone to always use TLS as it provides a secure transport for communication with Ably with very little practical overhead.
Whilst TLS encryption ensures that messages in transit to and from Ably cannot be intercepted, inspected, or tampered with, it does not ensure that the Ably service itself is unable to (in theory) inspect your messages and their content. If you want to ensure that all messages are encrypted and complete opaque and inaccessible to the Ably service, we provide built-in private key symmetric encryption within our client libraries.
Any internet enabled device support
Most of our client libraries use a WebSocket to establish a realtime connection to Ably, and use a simple HTTP request for all other REST operations including authentication.
- WebSockets (supported by 88% of browsers globally as of Nov 2015)
- XHR streaming
- XDR streaming
- XHR polling
- XDR polling
- iFrame XHR polling
- JSONP polling