Proxy the Proxy: How to set a SOCKS5 proxy in dryscrape (Python library)?

Dryscrape is a Python library that can be used to execute JavaScript code when a webpage is opened. You can read more about it here.

We can set an HTTP proxy for a dryscrape session in the following way:

import dryscrape
# port_no is the port your HTTP proxy listens on
sess = dryscrape.Session(base_url="http://www.example.com")
sess.set_proxy(host="localhost", port=port_no)

The problem is that a dryscrape session accepts only an HTTP proxy.
Sometimes, though, we want the session to go through a SOCKS proxy. I wanted to route mine through Tor to build an anonymous web scraper, but Tor only accepts connections over the SOCKS protocol.

There is an interesting tool for this called Polipo.
Polipo is a web proxy that supports the SOCKS protocol. The more interesting thing (maybe just to me and other novice explorers) is that it can accept a request over one protocol and pass it on to another proxy server that speaks a different protocol.

In simpler terms, we can configure Polipo to listen for requests on one protocol and route them to another server over another protocol.

So we will configure our Polipo server to listen for HTTP requests and forward them over the SOCKS protocol. In effect, Polipo acts as a proxy in front of the SOCKS proxy, hence "proxy the proxy" (which might be puzzling at first).

Solution:

Add the following configuration to your /etc/polipo/config file:

socksParentProxy = "localhost:9050"
socksProxyType = socks5

proxyAddress = "localhost"
proxyPort = 8118

Explanation:

socksParentProxy sets the host and port of the next server that incoming requests are forwarded to (here, Tor running on my machine on its default port, 9050).
socksProxyType sets the protocol spoken by that next server.
proxyAddress sets the address the Polipo server listens on.
proxyPort sets the port the Polipo server listens on (8118 in this setup; Polipo's own built-in default is 8123).
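
If you run more than one such setup, it can help to generate these four settings from a script instead of editing the file by hand. The following is only a minimal sketch: the function name render_polipo_config and its parameters are made up for illustration and are not part of Polipo or dryscrape. It only builds the text and leaves writing it to /etc/polipo/config up to you.

```python
import sys


def render_polipo_config(socks_host="localhost", socks_port=9050,
                         listen_host="localhost", listen_port=8118):
    """Build Polipo config text that forwards HTTP requests to a SOCKS5 parent.

    Hypothetical helper: the defaults mirror the values used in this post.
    """
    lines = [
        # Where Polipo forwards requests (Tor's SOCKS port in this post)
        'socksParentProxy = "{0}:{1}"'.format(socks_host, socks_port),
        # Protocol spoken by the parent proxy
        'socksProxyType = socks5',
        # Address and port Polipo itself listens on for HTTP requests
        'proxyAddress = "{0}"'.format(listen_host),
        'proxyPort = {0}'.format(listen_port),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the config so it can be reviewed before being copied into place
    sys.stdout.write(render_polipo_config())
```

Remember to restart Polipo after changing its config file so the new settings take effect.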

Now we can point dryscrape's requests at the Polipo server in the following way:
sess = dryscrape.Session(base_url = "http://www.example.com")
sess.set_proxy(host = "localhost", port = 8118)

Now, whenever dryscrape requests a webpage, the request first goes to Polipo and is then forwarded to Tor (or whichever other server you configured).
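
Since the chain has two hops, it can be useful to check that something is actually listening on the Polipo port and the Tor port before starting a scrape. Here is a small sketch using only the standard library. The helper name proxy_is_listening is my own, and it only tests that a TCP connection succeeds, not that the proxy behaves correctly.

```python
import socket


def proxy_is_listening(host="localhost", port=8118, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it checks reachability only, not proxy behaviour.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Connection refused, timed out, or host not resolvable
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # 8118 is the Polipo port and 9050 the Tor port used in this post
    for name, port in (("polipo", 8118), ("tor", 9050)):
        state = "up" if proxy_is_listening("localhost", port) else "down"
        print("{0} on port {1}: {2}".format(name, port, state))
```

If either port reports "down", fix that service first; otherwise dryscrape's requests will fail at the proxy rather than at the target site.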

Cheers. Using dryscrape, we can now make requests over the SOCKS protocol (through a SOCKS5 proxy).

Some related questions on Stack Overflow can be found here.
