Multithreaded Crawler While Using Tor Proxy
I am trying to build a multithreaded crawler that uses Tor proxies. I am using the following to establish the Tor connection: from stem import Signal from stem.control import Controller con
Solution 1:
This is a perfect example of why monkey patching socket.socket is bad.
It replaces the socket used by all socket connections (which is almost everything) with the SOCKS socket.
When you go to connect to the controller later, it attempts to use the SOCKS protocol to communicate instead of establishing a direct connection.
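To make the failure mode concrete, here is a minimal sketch that uses only the standard library. TracingSocket is a hypothetical stand-in for socks.socksocket (it is not part of SocksiPy), and port 9051 is assumed to be the Tor control port. The point is that once socket.socket is replaced, unrelated code asking for a plain TCP connection gets the patched class too.

```python
import socket


class TracingSocket(socket.socket):
    """Hypothetical stand-in for socks.socksocket: records every connect attempt."""

    attempts = []

    def connect(self, address):
        TracingSocket.attempts.append(address)
        # A real SOCKS socket would talk to the proxy here instead of the target.
        raise ConnectionRefusedError("stand-in proxy socket never connects directly")


original_socket = socket.socket
socket.socket = TracingSocket  # the global monkey patch
try:
    try:
        # Unrelated code that only wants a direct connection (e.g. to a control port).
        socket.create_connection(("127.0.0.1", 9051), timeout=1)
    except OSError:
        pass  # the "direct" connection was hijacked by the patched class
finally:
    socket.socket = original_socket  # always undo the patch

print(TracingSocket.attempts)
```

The control-port connection never had a chance to be direct, which is what happens to the stem controller in the question.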
Since you're already using requests, I'd suggest getting rid of SocksiPy and the socket.socket = socks.socksocket code and using the SOCKS proxy functionality built into requests:
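# socks5h (rather than socks5) also resolves hostnames through the proxy
proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050',
}
# r, url and request_headers are taken from the question's code
response = r.get(url, headers=request_headers, proxies=proxies)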
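Putting this together for the multithreaded case, a rough sketch might look like the following. It assumes requests is installed with SOCKS support (pip install requests[socks]), that Tor listens on its default SOCKS port 9050 and control port 9051, and that stem is available. The names TOR_PROXIES, tor_fetch, renew_identity and crawl are illustrative and do not come from the question. Because nothing is monkey patched, the stem controller connects directly while only the HTTP requests go through Tor.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed default Tor SOCKS address; adjust to your torrc.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}


def tor_fetch(url, timeout=30):
    """Fetch one URL through Tor and return its HTTP status code."""
    import requests  # third-party; imported lazily so crawl() is usable without it

    response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
    return response.status_code


def renew_identity(control_port=9051, password=None):
    """Ask Tor for a new circuit over a direct (unproxied) control connection."""
    from stem import Signal
    from stem.control import Controller

    with Controller.from_port(port=control_port) as controller:
        controller.authenticate(password=password)
        controller.signal(Signal.NEWNYM)


def crawl(urls, fetch=tor_fetch, max_workers=4):
    """Run fetch(url) on a thread pool; map each URL to its result or exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if a single URL fails
                results[url] = exc
    return results
```

For example, crawl(["https://check.torproject.org/"]) would fetch that page through Tor, and renew_identity() could be called between batches. Passing fetch as a parameter keeps the threading logic independent of the network, so it can be exercised with any callable.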