Real Time Data Methods

I have spent a few years building and rebuilding a live-data service in my free time, and along the way I have moved steadily toward a solution that works for me. The goal was a wide-coverage webapp that provided real-time updates and could scale with little extra effort. This was my journey.

MeteorJS

A couple of years ago, I got all twitterpated for MeteorJS. You could make live-data apps in no time. It seemed like the best thing since *Insert very cool invention*. The concept was great, the API and documentation were well written, and you could get real-time data into your app! But there was a low-volume grumble about some of the limitations of the framework.

The Good

  • Live-data to the app on updates to the central database.
  • Authentication/permissions built in.
  • MongoDB drives the backend.
  • Good Documentation

 

The Ugly

  • Blaze templating.
  • Difficult to scale.
  • Meteor framework has its tentacles in every aspect.
  • They have their own package manager.
  • Weird project structure.

 

CouchDB/PouchDB

CouchDB is an Apache project that has three main driving motivations.

The first is that the DB can be fully used through its RESTful interface. Just simple GETs, POSTs, PUTs… That’s awesome, right? No drivers or abstractions that you have to hand build!
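To make the "no driver" point concrete, here is a minimal sketch that talks to CouchDB with nothing but `fetch`. The server address `http://localhost:5984`, the database name `notes`, and the helper names are assumptions for illustration, and authentication is left out.

```javascript
// Hypothetical CouchDB location and database name; adjust for your setup.
const COUCH_URL = 'http://localhost:5984';
const DB_NAME = 'notes';

// Build the URL of a single document: /{db}/{docId}
function docUrl(docId) {
  return `${COUCH_URL}/${DB_NAME}/${encodeURIComponent(docId)}`;
}

// Create or update a document with a plain PUT.
// When updating, doc._rev must hold the current revision,
// otherwise CouchDB rejects the write with a 409 conflict.
async function putDoc(docId, doc) {
  const res = await fetch(docUrl(docId), {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(doc),
  });
  return res.json();
}

// Read a document back with a plain GET.
async function getDoc(docId) {
  const res = await fetch(docUrl(docId));
  return res.json();
}
```

Calling `putDoc('first-note', { text: 'hello' })` and then `getDoc('first-note')` would be the whole round trip, assuming a CouchDB instance is listening at that address.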

The second feature is replication. With CouchDB, I can make one-way, two-way, snapshot or even continuous replication of data to some other CouchDB on the internet. It’s great for uptime, backups, and moving production data to a development environment where you can play around without worrying about messing up the production data.
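Replication is also driven over HTTP, by posting a small JSON document to the `_replicate` endpoint. The sketch below builds those request bodies; the function names and the example database URLs are hypothetical, and a two-way sync is simply two one-way replications with source and target swapped.

```javascript
// Body for CouchDB's POST /_replicate endpoint (one direction only).
function replicationRequest(source, target, continuous = false) {
  return { source, target, continuous };
}

// Two-way continuous replication is a pair of one-way replications.
function twoWayReplication(a, b) {
  return [
    replicationRequest(a, b, true),
    replicationRequest(b, a, true),
  ];
}

// Send one replication request to the server at serverUrl.
async function replicate(serverUrl, request) {
  const res = await fetch(`${serverUrl}/_replicate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(request),
  });
  return res.json();
}
```

Copying production data into a development database would then be a single one-way, non-continuous call such as `replicate('http://localhost:5984', replicationRequest('https://prod.example.com/app', 'app_dev'))`, where both database locations are placeholders.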

The third feature is revisioning. Every document carries a revision id, and until the database is compacted you can fetch earlier states of a document simply by twiddling with that id. Just don’t treat it as permanent version history: compaction discards old revision bodies, and keeping them all around costs disk space.
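As a rough illustration of twiddling with the revision id: CouchDB accepts `?revs_info=true` to list a document’s known revisions and `?rev=...` to fetch a specific one. The helper names, server address, and sample data below are assumptions, and only revisions whose bodies have not been removed by compaction can be fetched.

```javascript
// Build a URL that either fetches one specific revision (when rev is given)
// or asks for the list of known revisions (when it is not).
function revisionUrl(baseUrl, db, docId, rev) {
  const url = new URL(`${baseUrl}/${db}/${encodeURIComponent(docId)}`);
  if (rev) {
    url.searchParams.set('rev', rev);
  } else {
    url.searchParams.set('revs_info', 'true');
  }
  return url.toString();
}

// From a document fetched with revs_info=true, keep only the revision ids
// whose bodies are still stored and can therefore be retrieved.
function availableRevs(doc) {
  return (doc._revs_info || [])
    .filter((info) => info.status === 'available')
    .map((info) => info.rev);
}
```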

Remember how I said you don’t need a driver? It is true, but some amazingly smart people, Nolan Lawson among them, built a lightweight in-browser (or NodeJS) take on CouchDB called PouchDB, which can replicate with a remote CouchDB database. With this, you can two-way-sync data, and even use your local DB offline. Truly amazing work.
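A minimal sketch of that two-way sync, assuming PouchDB is available either as the browser global or via `require('pouchdb')`. The function name `startSync` and the database names are made up for illustration; `sync` with `live` and `retry` is PouchDB’s documented way to keep replicating and to reconnect after going offline.

```javascript
// PouchDB is passed in so the same code works with the browser global
// or with the NodeJS package.
function startSync(PouchDB, localName, remoteUrl, onChange) {
  const local = new PouchDB(localName);   // lives in the browser's storage
  const remote = new PouchDB(remoteUrl);  // plain CouchDB over HTTP

  const handle = local
    .sync(remote, { live: true, retry: true }) // keep syncing, retry when back online
    .on('change', onChange)                    // fires whenever documents move either way
    .on('error', (err) => console.error('Sync failed', err));

  return { local, handle }; // handle.cancel() stops the sync
}
```

Reads and writes go to `local`, so the app keeps working offline, and the sync catches up once the connection returns.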

The Good

  • Two-way syncing and live-data.
  • Offline data, and syncing when you get back online.
  • Great API and documentation.

 

The Ugly

  • Relies on WebSQL, localStorage, or IndexedDB, which Apple doesn’t really ever want you to use.
  • Syncing can be slow if there are lots of documents to sync.
  • You can’t two-way sync with any more than 5 databases at once. Max concurrent connection limit is reached in most browsers.
  • No real indexes in CouchDB or PouchDB, so queries can be slow.

 

Horizon.io

Horizon is a framework built around the idea of using RethinkDB (a very neat, very well planned and executed database) to create “serverless” apps. Really, the most important thing about Horizon is the database, so let me tell you about that first.

RethinkDB is a super-scalable database (scale out with clusters) that features a Mongo-like query language, tables and table-join ability, and live-data. The only reason that this is not the best DB for live-data is the need for a driver to use it. For some reason, the developers created drivers for all sorts of languages, and stopped short of a browser one.

This is where Horizon comes in. Horizon is very much like Meteor in the way that it provides an ecosystem for a developer to work inside of. This scares me a bit because of the lock-in and limitations that Meteor created with this approach, but also doesn’t scare me away completely because it is backed by such an amazing database. Full disclosure, I have not yet built a webapp using Horizon, but I will give you my opinion of what I see.

The Good

  • RethinkDB backed (more scaling than an Alaskan fisherman)
  • Built in authentication/permissions
  • Live-data
  • No special package manager

 

The Ugly

  • RethinkDB and Horizon development have been halted for a few months now
  • Certain powerful features like GeoSpatial index queries are not available
  • Big ol’ framework, like Meteor

 

My Current Solution

So, what do I use for my live-data webapp? Well, none of these currently. In building my app I progressed from Meteor to PouchDB, and now to a little something I lashed together that works perfectly for my needs. Turns out, some genius out there created that missing RethinkDB browser driver, with a WebSocket transport layer underneath. His libraries are rethinkdb-websocket-client and rethinkdb-websocket-server, the two packages required in the code further down. Using his client and server code, I am able to achieve live-data with no giant or opinionated framework and with very wide browser coverage (looking in your direction, PouchDB).

Things to note about this solution:

By default, the driver wants to run on its own non-standard port, which many paranoid IT folks don’t allow on their networks. The way around this is to serve both the webapp and the live-data connection on the same standard port (80, or 443 with TLS). To accomplish this, let ExpressJS serve the webapp and listen for websocket connections on the same port, with Nginx proxying to it. You will also need to configure Nginx not to time out idle websocket connections after 60 seconds, which is the default proxy_read_timeout (see below).

server.js
// Serve the webapp and the RethinkDB websocket endpoint from one HTTP server.
var express = require('express');
var http = require('http');
var wsListen = require('rethinkdb-websocket-server').listen;

var webServer = express();
webServer.use('/', express.static('public'));
var httpServer = http.createServer(webServer);

wsListen({
    httpServer: httpServer,
    httpPath: '/rethinkApi', // must match `path` in client.js
    // Lets any client run any query: fine for a demo, not for production.
    unsafelyAllowAnyQuery: true,
    loggingMode: 'none'
});
httpServer.listen(8000); // Nginx proxies ports 80 and 443 to this port

client.js
var RethinkdbWebsocketClient = require('rethinkdb-websocket-client');
var RethinkDB = RethinkdbWebsocketClient.rethinkdb;
// `dev` was not defined in the original snippet; deriving it from the page
// protocol is an assumption.
var dev = location.protocol !== 'https:';
var rethinkOptions = {
    host: location.hostname, // hostname of the websocket server
    port: dev ? 80 : 443,
    path: '/rethinkApi', // HTTP path to websocket route
    wsProtocols: ['base64'], // sub-protocols for websocket, required for websockify
    secure: !dev, // true to use secure TLS websockets
    db: 'testDB', // default database, passed to rethinkdb.connect
};
RethinkdbWebsocketClient.connect(rethinkOptions).then((conn) => {
    console.log('Connected!');
}).catch((err) => {
    console.log('Not Connected :(', err);
});

Using the variable RethinkDB in client.js, you can run queries and set up live-data listeners which you can use to update your app in real time. When mixed with a centralized state library such as Vuex or Redux, it becomes almost trivial to provide live-data to any webapp, even if you have already built it.
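Here is one way such a live-data listener might look, as a sketch rather than the exact code from my app. It assumes `r` is the RethinkDB variable and `conn` the connection from client.js; the function names and the `id` primary key are illustrative. `applyChange` is the kind of pure function you would call from a Vuex mutation or Redux reducer.

```javascript
// Apply one changefeed message to an array of rows and return a new array.
// RethinkDB sends { old_val, new_val }: no old_val means insert (or initial row),
// a null new_val means delete, and both present means update.
function applyChange(rows, change) {
  const oldVal = change.old_val;
  const newVal = change.new_val;
  if (oldVal === null || oldVal === undefined) {
    return [...rows, newVal];
  }
  if (newVal === null || newVal === undefined) {
    return rows.filter((row) => row.id !== oldVal.id);
  }
  return rows.map((row) => (row.id === oldVal.id ? newVal : row));
}

// Subscribe to a table and hand the up-to-date rows to onRows on every change.
function watchTable(r, conn, tableName, onRows) {
  let rows = [];
  r.table(tableName)
    .changes({ includeInitial: true }) // also emit the rows that already exist
    .run(conn, (err, cursor) => {
      if (err) { console.error('Could not open changefeed', err); return; }
      cursor.each((feedErr, change) => {
        if (feedErr) { console.error('Changefeed error', feedErr); return; }
        rows = applyChange(rows, change);
        onRows(rows);
      });
    });
}
```

Inside the `connect(...).then((conn) => ...)` callback, something like `watchTable(RethinkDB, conn, 'messages', (rows) => store.commit('setMessages', rows))` would keep a store in step with the table; the table, store, and mutation names there are placeholders.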

nginx.conf
server {
    listen 80;
    listen 443 ssl;
 
    ssl_certificate /path/to/crt.crt;
    ssl_certificate_key /path/to/key.key;
 
    server_name something.com;
 
    if ($ssl_protocol = "") {
        rewrite ^ https://$host$request_uri? permanent;
    }
 
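    location / {
        # Forward everything, websocket traffic included, to the Express server.
        proxy_pass http://127.0.0.1:8000;
        # HTTP/1.1 plus the Upgrade/Connection headers let websocket handshakes through.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        # Keep idle websocket connections open for up to a day instead of the 60s default.
        proxy_read_timeout 86400s;
    }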
}

I like this particular solution because it isn’t opinionated, and in no way dictates the other frontend technologies that I am allowed to use. I can program with the stuff I like, and live-data is loosely coupled into the mix. If I ever need to scale, doing so would be a simple matter of spawning some duplicate servers, configuring RethinkDB to distribute and share its load to those other instances, and setting up round-robin load balancing in my Nginx configuration.

Good Luck and Gosh Speed

About the Author

Corey Webster

Sr. Consultant