Node.js + Twitter Streaming API + Socket.io = Twitter Cashtag Heatmap

My friends and I have always tossed around the idea of using real-time Tweets to determine the popularity of a specific company. The vision, in it’s entirety, is pretty grandiose but today I made a small step towards that goal. With the introduction of Twitter’s Streaming API comes a world of possibilities! When I saw this I knew I had to create a project with it. At the same time I’ve been ramping up on Node.js so I thought: what better than to create a project with both technologies, plus Socket.io for good measure. Thus, I present the Twitter Cashtag Heatmap!

 

Demo and Source Code

As always, I’ve provided the demo code here: Demo which is generously (freely) hosted by Heroku. In addition, the project is open sourced so please feel free to take a look at the code! Remember, if you clone the repository make sure you run an ‘npm install’ to download all the dependencies.

 

Creating the Back-end

Let’s first start by covering what is happening in the back-end of the code. The entire server is coded up using Node.js with Express. The server is responsible for serving up a single ‘index.html’ which will include our only page with the necessary Javascript required to communicate to the server to retrieve live data. In addition, the server opens a streaming connection with Twitter and requests that it be informed whenever any of the cash tags we specified are mentioned. When the server receives data from Twitter it processes it sends it to every client via Socket.io.

Lets look at some code:

/**
 * Module dependencies.
 */
var express = require('express')
  , io = require('socket.io')
  , http = require('http')
  , twitter = require('ntwitter')
  , cronJob = require('cron').CronJob
  , _ = require('underscore')
  , path = require('path');

//Create an express app
var app = express();

//Create the HTTP server with the express app as an argument
var server = http.createServer(app);

// Twitter symbols array
var watchSymbols = ['$msft', '$intc', '$hpq', '$goog', '$nok', '$nvda', '$bac', '$orcl', '$csco', '$aapl', '$ntap', '$emc', '$t', '$ibm', '$vz', '$xom', '$cvx', '$ge', '$ko', '$jnj'];

//This structure will keep the total number of tweets received and a map of all the symbols and how many tweets received of that symbol
var watchList = {
    total: 0,
    symbols: {}
};

//Set the watch symbols to zero.
_.each(watchSymbols, function(v) { watchList.symbols[v] = 0; });

//Generic Express setup
app.set('port', process.env.PORT || 3000);
app.set('views', __dirname + '/views');
app.set('view engine', 'jade');
app.use(express.logger('dev'));
app.use(express.bodyParser());
app.use(express.methodOverride());
app.use(app.router);
app.use(require('stylus').middleware(__dirname + '/public'));
app.use(express.static(path.join(__dirname, 'public')));

//We're using bower components so add it to the path to make things easier
app.use('/components', express.static(path.join(__dirname, 'components')));

// development only
if ('development' == app.get('env')) {
  app.use(express.errorHandler());
}

I won’t go into too much detail here since it’s commented pretty heavily, but just note for those whom are shaky at Node.js/Express, this is a typical setup for the two. If you generated a new Express project you’d get most of this code except for the ‘watchSymbols’ and ‘watchList’ variables which I use to hold data for the application. Everything else is pretty standard.

Up until now, we have created a server, configured it, and setup some structures that we’ll use to hold our data. Lets start serving up some data.

//Our only route! Render it with the current watchList
app.get('/', function(req, res) {
	res.render('index', { data: watchList });
});

Again, nothing special here. We only have one page to serve up: index.html. The second render parameter is passing the data to the view layer so we can render the current ‘watchList’. While data is streamed live to the client sometimes it’s nice to have immediate gratification. Also, it allows client’s with the inability to use websockets to see the fruits of our labor.

Next, we’ll setup Socket.io to work with our application.

//Start a Socket.IO listen
var sockets = io.listen(server);

//Set the sockets.io configuration.
//THIS IS NECESSARY ONLY FOR HEROKU!
sockets.configure(function() {
  sockets.set('transports', ['xhr-polling']);
  sockets.set('polling duration', 10);
});

//If the client just connected, give them fresh data!
sockets.sockets.on('connection', function(socket) { 
    socket.emit('data', watchList);
});

Wow, that was easy. One thing to note here is that I had to make some adjustments for Heroku since it doesn’t support websockets. However, if you’re running it locally you can simply remove the sockets.configure call entirely. It’s only needed for Heroku. Finally, the last function tells Socket.io to send the ‘watchList’ whenever a new connection is established.

Next, lets hook up Twitters Streaming API:

//Instantiate the twitter component
//You will need to get your own key. Don't worry, it's free. But I cannot provide you one
//since it will instantiate a connection on my behalf and will drop all other streaming connections.
//Check out: https://dev.twitter.com/
var t = new twitter({
    consumer_key: '',           // <--- FILL ME IN
    consumer_secret: '',        // <--- FILL ME IN
    access_token_key: '',       // <--- FILL ME IN
    access_token_secret: ''     // <--- FILL ME IN
});

//Tell the twitter API to filter on the watchSymbols 
t.stream('statuses/filter', { track: watchSymbols }, function(stream) {

  //We have a connection. Now watch the 'data' event for incomming tweets.
  stream.on('data', function(tweet) {

    //This variable is used to indicate whether a symbol was actually mentioned.
    //Since twitter doesnt why the tweet was forwarded we have to search through the text
    //and determine which symbol it was ment for. Sometimes we can't tell, in which case we don't
    //want to increment the total counter...
    var claimed = false;

    //Make sure it was a valid tweet
    if (tweet.text !== undefined) {

      //We're gunna do some indexOf comparisons and we want it to be case agnostic.
      var text = tweet.text.toLowerCase();

      //Go through every symbol and see if it was mentioned. If so, increment its counter and
      //set the 'claimed' variable to true to indicate something was mentioned so we can increment
      //the 'total' counter!
      _.each(watchSymbols, function(v) {
          if (text.indexOf(v.toLowerCase()) !== -1) {
              watchList.symbols[v]++;
              claimed = true;
          }
      });

      //If something was mentioned, increment the total counter and send the update to all the clients
      if (claimed) {
          //Increment total
          watchList.total++;

          //Send to all the clients
          sockets.sockets.emit('data', watchList);
      }
    }
  });
});

When you run this locally, make sure you acquire a Twitter API key for yourself. I can’t divulge mine since Twitter only allows one streaming connection for API key and drops all other connections if multiples are detected.

Finally, lets setup a ‘reset’ cron, and start the server.


//Reset everything on a new day!
//We don't want to keep data around from the previous day so reset everything.
new cronJob('0 0 0 * * *', function(){
    //Reset the total
    watchList.total = 0;

    //Clear out everything in the map
    _.each(watchSymbols, function(v) { watchList.symbols[v] = 0; });

    //Send the update to the clients
    sockets.sockets.emit('data', watchList);
}, null, true);

//Create the server
server.listen(app.get('port'), function(){
  console.log('Express server listening on port ' + app.get('port'));
});

That’s it for the server. That’s all that’s needed! We can now run ‘node app.js’ and watch as the server begins execution. Now we need a client.

 

Enter The Client

The client could not be easier. We’ve already done most of the heavy lifting in the server so our client is pretty thin. Infact, it only consists of a single page and some Javascript. Let’s take a look at the markup for the index page first which is coded in the Jade template engine.

doctype 5
html
    head
        title Node.js, Twitter Stream, Socket.io
        link(rel='stylesheet', href='/stylesheets/style.css')
        script(src="http://code.jquery.com/jquery-1.9.1.min.js")
        script(src="components/socket.io-client/dist/socket.io.min.js")
        script(src="/javascripts/script.js")
    body
        .container
            .header
                h1 Twitter Symbol Heatmap
                small 
                    | This application uses Node.js to create a streaming connection between it and Twitter. 
                    | Any mentions of the following symbols are received by the Node application and broadcasted
                    | to any clients using Socket.io.
                small
                    i
                        | Last updated:
                        span(id="last-update") Never
            ul
                each val, key in data.symbols
                    li(data-symbol="#{key}")= key

Remember in the server when we request the ‘/’ route we render the ‘index’ and some data? Well this is where that data comes into play. The last two lines loop through every symbol and create a list item for it. This will create a HTML element for each one with a data tag so we can refer to it from our Javascript. Speaking of which, let’s look at that code now.

$(function() {
    var socket = io.connect(window.location.hostname);
    socket.on('data', function(data) {
        var total = data.total;
        for (var key in data.symbols) {
            var val = data.symbols[key] / total;
            if (isNaN(val)) {
                val = 0;
            }

            $('li[data-symbol="' + key + '"]').each(function() {
                $(this).css('background-color', 'rgb(' + Math.round(val * 255) +',0,0)');
            });
        }
        $('#last-update').text(new Date().toTimeString());
    });
});

Here’s where things get interesting. First thing to node is that I’m using jQuery here with the Socket.io client library. In the first line I instantiate a connection to the remote host (the server). Next, whenever data arrives on the connection we’ll loop through all of the data’s symbols and pull out the number of times they’ve been tweeted about. We’ll do a little math to create a normalized value based on how often that symbol was mentioned over the total amount for every symbol listed within the app. Finally, we’ll set the background color for each HTML element to the calculated normalized value. This produces the heatmap effect.

 

Conclusion

The application is far from perfect but I think provides a much cooler example than the generic chat program with Node.js.

7 thoughts on “Node.js + Twitter Streaming API + Socket.io = Twitter Cashtag Heatmap

  1. Thanks for this – nicely explained.

    I’ve got it up and running, but I keep getting this repeated in the logs:


    debug - xhr-polling writing 8::
    debug - set close timeout for client BvVNABHYOTVFm3jizfxr
    debug - xhr-polling closed due to exceeded duration
    debug - setting request GET /socket.io/1/xhr-polling/BvVNABHYOTVFm3jizfxr?t=1373144763416
    debug - setting poll timeout
    debug - discarding transport
    debug - cleared close timeout for client BvVNABHYOTVFm3jizfxr
    debug - clearing poll timeout
    debug - xhr-polling writing 8::

    Every once in a while it works and some data comes in correctly, but mostly its like the above. Any ideas?

    1. Hey Phil,
      thanks for the feedback. What browser are you using? I ask because the debug prints you are seeing are typically due to some issue between the browser and the node server. If you’re not using Heroku, you can try removing the following and running again.

      //Set the sockets.io configuration.
      //THIS IS NECESSARY ONLY FOR HEROKU!
      sockets.configure(function() {
      sockets.set(‘transports’, ['xhr-polling']);
      sockets.set(‘polling duration’, 10);
      });

      The above code is in the app.js file and is only needed if you’re deploying to Heroku.

  2. Oh, silly me! Yes, if I comment out those lines (I’m just running the server locally in Mac OS X) then it works fine! I noticed those lines while reading your post yesterday, and then forgot all about them when I tried to get it working today. Many thanks for the speedy response!

  3. Dillon, great twitter streaming API tut!

    I was also checking out Peepcode’s full stack node.js screencast at the time, and refactored your project to gel with their architecture, use coffeescript, and break up socket and twitter functionality into other files. Finally, I got rid of bower components in lieu of serving up static content with connect-assets (as in the Peepcode thing).

    Let me know what you think, or if it even works :)

    https://github.com/adambmedia/twitter-cashtag-heatmap/tree/master/apps

  4. Hi Dillon,

    Very useful article, I have a question for u, we can’t use “xhr-polling” to implement long polling in socket.io???? Do i need Heroku????

    sockets.set('transports', ['xhr-polling']);
    sockets.set('polling duration', 10);

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>