Categories
analytics chat chatting google google-analytics javascript live live-users long-polling nginx node.js performance persistent persistent-connection php5-fpm Proxy real-time response reverse-proxy socket socket.io socks tcp upstream web-server websocket websocket-handshake

How To Make Your Own Analytics Tool?

An analytics tool is a tool that you can use to get the information about how is your website performance, how are the users using it, what are the number of users at anytime(live users), live users for different urls, and other infos. One such example of the tool is the google analytics.

Analytics tool

Analytics tool gives a lot of information about your website with the nice looking graphs and all. In this blog, we will be discussing how we can make such analytics tool(digging the data).

Question 1: Why should we make such thing if it already exists?

Answer : One reason is that it is not free for always. The other reason is that the process of making it is very sexy and you learn technologies that you can use to make your own other real time apps like chatting app. Now that is more a valid reason.

Question 2: What are we talking about?

Answer : This is an example of a real time application. You see, you are tracking live users’ data. The data can be anything like total number of users anytime or number of users for different urls or different referral or different user agent or different visited ips, or anything. The other example of a real time application is a chatting application where a lot of users are communicating at the same time.

Question 3: What technologies are used for such an application?

Answer : So we need a continuous stream of response from the server about their performance as only servers know what is hitting them. It can only be achieved if there is some persistent connection between client and server so that server can continuously push data to the client. With persistent connection I would mean one for literally persistent connection(web sockets) or else timed persistent connection(long polling) or else chunk persistent connection(polling). By timed persistent connection, I mean persistent connection only for certain time interval after which it is closed and again opened. And by chunk persistent connection, I mean lots of request with no persistency, but the requests are so frequent that altogether they seem persistent(my bad).

Among all, I prefer web sockets, as I can’t logically satisfy myself why to use others and more of it that they are the outdated styles and only existed when sockets were not there.

So in this tutorial we will be using socket.io library with node.js to create web sockets. Below are the other libraries with other language which I found from here.

Analytics tool

And yes, we are going to use Nginx as a reverse proxy for web sockets. A very great tutorial regarding this can be found here. You can skip Nginx if you wish and directly test for your node.js server.

For Nginx configuration, as mentioned in tutorial,

server {
    server_name app.domain.com;
    location / {
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_pass http://socket_nodes;
    }
}

Proper explanation is given in that tutorial for how this will work for sockets. Now we setup our Node server(server.js), a very nice tutorial can be found here.

var data = {};
var http = require(“http”);
var url = require(‘url’);
var fs = require(‘fs’);
var io = require(‘socket.io‘);
var path;
var server = http.createServer(function(request, response){
        console.log(‘Request Accepted’);
        path = url.parse(request.url).pathname;
        switch(path){
            case ‘/’:
                response.writeHead(200, {‘Content-Type’: ‘text/html’});
                response.write(‘hello world’);
                response.end();
                break;
            case ‘/live’:
                fs.readFile(__dirname + ‘/socket_io.html’, function(error, data){
                    if (error){
                        response.writeHead(404);
                        response.write(“opps this doesn’t exist – 404”);
                        response.end();
                    }
                    else{
                        response.writeHead(200, {“Content-Type”: “text/html”});
                        response.write(data, “utf8”);
                        response.end();
                    }
                });
                break;
            default:
                response.writeHead(404);
                response.write(“opps this doesn’t exist – 404  –“+path);
                response.end();
                break;
        }
    });
 server.listen(8001);
 var listener = io.listen(server);
 listener.sockets.on(‘connection’, function(socket){
    console.log(‘Socket Made.’);
    setInterval(function(){
        socket.emit(‘showdata’, {‘data’: data});
    }, 1000);
 socket.on(‘disconnect’, function(){
    console.log(“Socket Closed.”);
 });
});

Load socket.io.js on the client side in your socket_io.html which will be used to create event to create a persistent connection. (Please don’t bang your head for how would it find socket.io.js)

/socket.io/socket.io.js

     var socket = io.connect(); // your initialization code here.
     socket.on('showdata', function(data){
          console.log(data);
     });

Now you can see all the data in your browser’s console. You can put this data at the right place of your analytics html. Now you can easily send continuous data from node server to your client.

Are you still wondering how would we gather data? Suppose we want to track number of live users, for that we will use a key “live” in data variable and initialize with 0. We will increment it as soon as there is a socket made and decrement it as soon as the socket is closed(Make sure io.connect() is called once and only once for a page load). So the data[‘live’] will determine the number of live users.

In server.js,

listener.sockets.on(‘connection’, function(socket){
    console.log(‘Socket Made.’);
    data[‘live’] +=1;
    setInterval(function(){
        socket.emit(‘showdata’, {‘data’: data});
    }, 1000);
 socket.on(‘disconnect’, function(){
 data[‘live’]-=1;
    console.log(“Socket Closed.”);
 });

As simple as that. It is just the logics in node to be done and you can achieve any data. Now for temporary data(by temporary I mean the machine hosting node server is until rebooted), we don’t need any extra storage.

If we want to use some durable data, or store some information, we can choose database of our choice and send data from node to database.

Now we are also using Nginx as a reverse proxy which is helpful in load handling and better scaling. We can extract other information like user ip, upstream responses(php), and other valuable information and send to node.

So yes, it is really interesting to make such tool, I told you so. You are surely going to stuck in somewhere or make it up or for any suggestion or for confusion, we can always discuss.

Great to see you people with your own analytics tool. 🙂

Categories
ab-testing caching configuration hhvm loader.io mastering-nginx microcaching nginx optimization php5-fpm

Super Nginx Configuration : Feeling the 100000 r/s capability of Nginx.

In this post, I will share my experience of how I managed to reduce ten fold the number of servers for our web service and for three fold more traffic. So, actually I could increase our systems efficiency 30 times. For people who can’t scale it, I tell you, this much improvement is tremendous. We are able to save a lot of money. There are still more improvements that can be done. We have seen a lot of posts, articles of how to tweak your Nginx configuration to make it serve lot of requests, I have now actually felt that.

Earlier, for 8000 users/sec, we used to have 23-25 servers running. Now, we have served more than 25000 users/sec with only four machines (Though we also went with one server for around 12000 users/sec, but it gave little high cpu utilization). Even we have served around 45000 users/sec for just 6 servers. Also, the machines used earlier were of higher configuration, than now.

How did we modify our nginx  configuration?

Earlier, we had used a normal configuration with most of the parameters having default or recommended values. My team told me that it used to handle a lot of requests earlier but something happened recently which gave all of us nightmares. We were trying hard to figure out the problem in all possible domains. We improved our code, we tried to modify nginx parameters’ values, db parameters’ values. We were still running on 22 machines for not much load.

Honestly, I didn’t care much about the old configuration anymore, and wanted to write a whole new one. I started with this book Mastering Nginx by Dimitri Aivaliotis. This is a great book. I read mainly 2nd, 4th and 5th chaptersThen I also read these threads, http://tweaked.io/guide/nginx/,                                                                                http://www.aosabook.org/en/nginx.html,                                                                                   https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/,      http://highscalability.com/blog/2014/4/30/10-tips-for-optimizing-nginx-and-php-fpm-for-high-traffic-si.html .

I started writing my own Nginx configuration. I used loader.io and ab testing tools all through the way to set correct values for the parameters. I increased timeout values to 300s from 15s. I also altered other values. Then I also implemented caching(microcaching) in Nginx. And caching of file descriptors for caching. I also changed the memory for caching to RAM. Then, I went for TCP connection from Unix socket connection to upstream servers. I observed drastic improvement on loader.io and ab tools tests. Then we went live with changed configuration and we had actually improved a lot. Surprisingly, we could handle the same load with just four machines(we even went with one machine but more cpu usage).

We were so happy, we were shouting like anything.

Further, I implemented hhvm to replace php5-fpm which actually has brought down the number of machine to just one.

Now we all are flying. We see yet more improvement to be done.

Thank You everyone. 🙂