
Multithreaded Crawler While Using Tor Proxy

I am trying to build a multithreaded crawler that uses Tor proxies. I am using the following to establish the Tor connection: from stem import Signal from stem.control import Controller con

Solution 1:

This is a perfect example of why monkey patching socket.socket is bad.

This replaces the socket used by all socket connections (which is almost everything) with the SOCKS socket.

When you go to connect to the controller later, it attempts to use the SOCKS protocol to communicate instead of establishing a direct connection.

Since you're already using requests, I'd suggest getting rid of SocksiPy and the socket.socket = socks.socksocket code and using the SOCKS proxy functionality built into requests:

proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050',
}

response = r.get(url, headers=request_headers, proxies=proxies)
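For the multithreaded case, one possible arrangement is sketched below. It assumes Tor's default SOCKS port 9050 (as in the snippet above) and that requests is installed with SOCKS support (pip install "requests[socks]"). The names get_session, fetch, and crawl are hypothetical helpers for illustration, not part of requests. Each worker thread keeps its own Session configured with the proxies, and nothing is patched globally, so a stem Controller can still connect directly to the control port.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed: Tor's default SOCKS port. socks5h lets Tor resolve hostnames too.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

_local = threading.local()  # holds one Session per worker thread


def get_session():
    """Return this thread's requests.Session, creating it on first use."""
    if not hasattr(_local, "session"):
        import requests  # needs the SOCKS extra: pip install "requests[socks]"

        session = requests.Session()
        session.proxies.update(TOR_PROXIES)
        _local.session = session
    return _local.session


def fetch(url):
    """Fetch one URL through Tor and return (url, status_code)."""
    response = get_session().get(url, timeout=30)
    return url, response.status_code


def crawl(urls, fetch_one=fetch, workers=4):
    """Run fetch_one over urls on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, urls))
```

Calling crawl(list_of_urls) with a running Tor instance would return one (url, status_code) pair per URL. The fetch_one parameter exists only so the pool logic can be exercised without a network connection.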
