
ELK: Configure Elasticsearch, Logstash, and Kibana to visualize and analyze logs.

This post is about configuring Elasticsearch, Logstash, and Kibana together to visualize and analyze logs. There are situations when you want to see what's happening on your server, so you switch on your access log. Of course you can tail -f it or grep through it, but I can tell you that analyzing logs that way quickly becomes cumbersome.

What if I told you there is a super cool tool, Kibana, which lets you analyze your logs and gives you all kinds of information in just a few clicks? Kibana is a GUI tool designed for analyzing large log files, and when used properly with Logstash and Elasticsearch it can be a boon for developers.

Logstash is a tool used to move logs from one place (a file, or storage such as S3) to another, transforming them on the way. Elasticsearch is a search server used to store and index the logs.

Here is the picture in short: Logstash pulls the logs from S3, formats them, and sends them to Elasticsearch. Kibana fetches the logs from Elasticsearch and displays them to you. With that, let's get to the part that actually matters, the code. I assume you have everything installed (the full ELK stack) and that S3 is the source of our logs.

First we configure Logstash. A Logstash configuration has three main blocks: input, filter, and output. In the input block we specify the source of the logs, in the filter block we format the logs the way we want them stored in Elasticsearch, and in the output block we specify the destination.

Code:

Open up a terminal:

nano /etc/logstash/conf.d/logstash.conf

and edit it to contain the following configuration:

input {
  s3 {
    bucket => "bucket_name_containing_logs"
    credentials => ["ACCESS_KEY", "SECRET_KEY"]
    region_endpoint => "whichever_probably_us-east-1"
    codec => json {
      charset => "ISO-8859-1"
    }
  }
}

filter {
  grok {
    match => { "message" => "grok_pattern" }
  }
}

output {
  # stdout {
  #   codec => json
  # }
  elasticsearch_http {
    host => "localhost"
    port => "9200"
    codec => json {
      charset => "ISO-8859-1"
    }
  }
}

Explanation:

In the input block, we specify that our logs come from S3, provide the credentials needed to access the S3 bucket, and specify the charset for the input.

In the filter block, we use the grok filter to create custom fields in Kibana by writing a pattern that matches the input logs. You can use the Grok Debugger to test your pattern against a sample input.
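For example, assuming the bucket holds nginx access logs in the default "combined" format (an assumption on my part; adjust to whatever your log lines actually look like), the built-in COMBINEDAPACHELOG pattern is a reasonable starting point:

filter {
  grok {
    # COMBINEDAPACHELOG ships with logstash and matches the combined
    # access-log format, splitting each line into fields such as
    # clientip, verb, request, response and bytes.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}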

In the output block, we specify Elasticsearch as the destination along with its address and port, and again specify the charset for the output.

You can uncomment the stdout block inside the output block to also print the data to the console.

Elasticsearch

We don't need to change anything in the Elasticsearch configuration for now, though if you are curious you can find it at /etc/elasticsearch/elasticsearch.yml. One thing to keep in mind is that this ELK setup needs a fairly powerful machine; otherwise you may run into various errors once Elasticsearch fills up. One workaround is simply to clear Elasticsearch out whenever it gets full.

The following command removes all indices and brings Elasticsearch back to its initial state:

curl -XDELETE 'http://localhost:9200/_all/'

You can read here about optimizing Elasticsearch for memory.
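Before wiping everything, it can help to see which indices are actually eating the space. The _cat/indices API lists every index along with its document count and size, and since Logstash writes one index per day (named logstash-YYYY.MM.DD by default) you can delete only the oldest days instead of the whole store. A sketch, assuming the default index naming (the date below is just an example):

curl 'http://localhost:9200/_cat/indices?v'
curl -XDELETE 'http://localhost:9200/logstash-2015.06.01'   # any old index name from the listing above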

Kibana

You don't have to do much in Kibana's configuration file; just make sure it is listening on the correct port. You can find the configuration file at /opt/kibana/config/kibana.yml.
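The key names below are from the Kibana 4.x kibana.yml (later versions renamed them), so treat this as a sketch and adjust it to your installation:

# port kibana listens on and the address it binds to
port: 5601
host: "0.0.0.0"

# where kibana finds elasticsearch
elasticsearch_url: "http://localhost:9200"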

Now go ahead and point your browser at the IP of the machine where Kibana is set up plus the port number (or whatever URL you specified in kibana.yml).

Now you should see the Kibana dashboard.

You can explore a lot from here: visualize the logs, compare fields, and so on. Go ahead and check out the different settings in Kibana.

That’s it for now.

Do let me know how it goes when you configure the Elasticsearch, Logstash, Kibana combo pack yourself.

Cheerio 🙂


The Bot: How To Make A Simple Anonymous Web Scraper?

A web scraper is a program that automates the process of accessing websites, with or without a browser. In this post, by web scraping I mean only accessing the web pages, and the scraper built here accesses them through a web browser.

An anonymous web scraper is one that keeps the identity of the program hidden, that is, a program that accesses a website without revealing its information (its IP address).

Usage (though this should be used at one's own risk)
There are websites that offer you money for bringing traffic to them. Using such anonymous web scrapers (the bots), you can send fake traffic to those websites and earn some money.

Note: These programs were tested on Ubuntu 14.04. Similar programs can be made on other platforms as well.
We will discuss two ways of making such a web scraper here, using Python.
For the first method, we will use Python's webbrowser library to open the web pages and Tor to provide the anonymity, with the subprocess library automating the browser and Tor restarts.

Bot

import time
import subprocess as sp
import webbrowser

urls = ["http://www.example.com/p", "http://www.example.com/q"]
count = 10000
while count >= 0:
    for url in urls:
        webbrowser.open(url)  # opens the url in the default browser (firefox)
        time.sleep(2)
    time.sleep(4)  # give the pages time to finish loading
    sp.call(["sudo", "killall", "firefox"])          # close the browser
    sp.call(["sudo", "/etc/init.d/tor", "restart"])  # restart tor to get a new exit ip
    count -= 1

Explanation

1- urls contains the list of URLs; bringing traffic to them earns you the money.
2- The while loop repeats the whole process, iterating over the list of urls each time.
3- Each url is opened with webbrowser (Firefox by default), followed by a two-second pause.
4- After all the urls are opened, it waits another four seconds to let the pages load properly (both delays can be tuned to how long the websites take to load).
5- Firefox is then forcefully closed.
6- Tor is restarted so that the user's IP address changes.

Note
Before running the program, configure Firefox to use a SOCKS5 proxy on localhost, port 9050, so that its requests go through Tor.
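To sanity-check that traffic really goes out through Tor, you can ask an external what-is-my-IP service which address it sees via the SOCKS proxy. A minimal sketch (it assumes curl is installed and uses ifconfig.me, which is just one such service):

import subprocess as sp

# Ask an external service for our apparent IP, routed through tor's
# SOCKS5 proxy on localhost:9050 (assumes curl is installed).
ip = sp.check_output(
    ["curl", "--silent", "--socks5-hostname", "localhost:9050",
     "https://ifconfig.me"]
).decode().strip()
print(ip)  # run again after restarting tor; the address should change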

For the second method, we simply open the browser through subprocess:

import time
import subprocess as sp

urls = ["http://www.example.com/p", "http://www.example.com/q"]
count = 10000
while count >= 0:
    for url in urls:
        child = sp.Popen("firefox %s" % url, shell=True)  # open the url in firefox as a subprocess
        time.sleep(2)
    time.sleep(4)
    sp.call(["sudo", "killall", "firefox"])
    sp.call(["sudo", "/etc/init.d/tor", "restart"])
    count -= 1

Explanation
1- The only difference is that the web pages are opened as subprocesses running Firefox directly, instead of going through the webbrowser library; everything else works as in the first method.

Now you are all set up with your anonymous web scraper.

Note
Do set up Firefox for the proxy as in the first method.
Such bots must execute JavaScript for the traffic to be counted, which is why they drive a real browser.
The other technique for creating the bot, using PhantomJS and bash, can be found here.
Read the websites' terms, take your own risks, and enjoy the free money. 🙂