Python Urllib2 Automatic Form Filling And Retrieval Of Results
Solution 1:
If you absolutely need to use urllib2, the basic gist is this:
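import urllib
import urllib2

# The URL the form submits to (its action attribute), not necessarily the page showing the form.
url = 'http://whatever.foo/form.html'
# Map each form field's name attribute to the value to submit.
form_data = {'field1': 'value1', 'field2': 'value2'}
# URL-encode the fields into a request body such as 'field1=value1&field2=value2'.
params = urllib.urlencode(form_data)
# Passing a body as the second argument makes this a POST.
response = urllib2.urlopen(url, params)
data = response.read()  # the HTML of the result page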
If you send along POST data (the 2nd argument to urlopen()), the request method is automatically set to POST.
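To see the method switch without touching the network, here is a minimal sketch that builds Request objects and inspects them. The URL and field names are the placeholders used above. The try/except import is an addition of mine so the snippet also runs on Python 3, where urllib2 was split into urllib.request and urllib.parse.

```python
try:  # Python 2
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3: same functions under new module names
    from urllib.parse import urlencode
    from urllib.request import Request

url = 'http://whatever.foo/form.html'  # placeholder URL
form_data = {'field1': 'value1', 'field2': 'value2'}

# Encode to bytes: Python 3 requires a bytes body, Python 2 accepts it as well.
params = urlencode(form_data).encode('ascii')

get_request = Request(url)           # no body, so the method is GET
post_request = Request(url, params)  # body supplied, so the method is POST

print(get_request.get_method())
print(post_request.get_method())
```

Nothing is sent here; passing post_request to urlopen() is what would actually submit the form.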
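I suggest you do yourself a favor and use mechanize, a full-blown urllib2 replacement that acts much like a real browser (it does not run JavaScript, though). A lot of sites use hidden fields, cookies, and redirects. urllib2 follows redirects by default, but it does not keep cookies unless you add an HTTPCookieProcessor, and it knows nothing about hidden form fields; mechanize handles all three for you.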
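If you stay with urllib2 anyway, cookie support can be switched on by hand. The following is a rough sketch, not a substitute for mechanize: it builds an opener with a cookie jar attached. The compatibility import and the name request_mod are my own additions so the snippet runs on Python 2 and 3.

```python
try:  # Python 2
    import cookielib
    import urllib2 as request_mod
except ImportError:  # Python 3
    import http.cookiejar as cookielib
    import urllib.request as request_mod

# The jar stores cookies from responses and sends them back on later requests.
cookie_jar = cookielib.CookieJar()
opener = request_mod.build_opener(request_mod.HTTPCookieProcessor(cookie_jar))

# build_opener() also installs the default handlers, including the one
# that follows redirects.
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```

Calling opener.open(url, params) instead of urllib2.urlopen(url, params) would then carry a session cookie from a login form over to the next request. Hidden fields still have to be read out of the form's HTML yourself.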
Check out "Emulating a browser in Python with mechanize" for a good example.
Solution 2:
Using urllib and urllib2 together:
import urllib
import urllib2
data = urllib.urlencode([('field1', val1), ('field2', val2)])  # list of two-element tuples
content = urllib2.urlopen('post-url', data).read()  # urlopen() returns a response object; read() gets the body
content will then hold the page source.
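A quick look at what urlencode produces from such a list of tuples. The two values are made-up samples chosen to show the escaping, and the fallback import is only there so the snippet also runs on Python 3.

```python
try:  # Python 2
    from urllib import urlencode
except ImportError:  # Python 3
    from urllib.parse import urlencode

val1 = 'hello world'  # hypothetical sample value
val2 = 'a&b=c'        # hypothetical sample value

# A list of tuples keeps the fields in the order given.
data = urlencode([('field1', val1), ('field2', val2)])
print(data)  # field1=hello+world&field2=a%26b%3Dc
```

Spaces become +, and characters that have a meaning in a query string (& and =) are percent-encoded, so the values survive the trip intact.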
Solution 3:
I’ve only done a little bit of this, but:
- You’ve got the HTML of the form page. Extract the name attribute for each form field you need to fill in.
- Create a dictionary mapping the name of each form field to the value you want to submit.
- Use urllib.urlencode to turn the dictionary into the body of your POST request.
- Include this encoded data as the second argument to urllib2.Request(), after the URL that the form should be submitted to.
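The steps above can be sketched roughly as follows. The sample HTML, the FieldCollector helper, and the submit URL are all invented for illustration. The sketch only looks at input tags, so select and textarea fields would need extra handling. The fallback imports are there so it also runs on Python 3.

```python
try:  # Python 2
    from HTMLParser import HTMLParser
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urlencode
    from urllib.request import Request


class FieldCollector(HTMLParser):
    """Hypothetical helper: collect name/value pairs from <input> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            attrs = dict(attrs)
            if 'name' in attrs:
                # Keep any preset value (e.g. a hidden token), else empty.
                self.fields[attrs['name']] = attrs.get('value') or ''


# Made-up form standing in for the page you downloaded.
sample_html = '''
<form action="/submit" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="field1">
  <input type="text" name="field2">
</form>
'''

# Step 1: extract the field names (and any preset values).
collector = FieldCollector()
collector.feed(sample_html)
collector.close()

# Step 2: fill in the values you want to submit.
form_data = dict(collector.fields)
form_data.update({'field1': 'value1', 'field2': 'value2'})

# Step 3: encode the fields; sorting just makes the output predictable.
body = urlencode(sorted(form_data.items())).encode('ascii')

# Step 4: build the request against the form's (assumed) submit URL.
request = Request('http://whatever.foo/submit', body)
print(body)
print(request.get_method())
```

Passing request to urlopen() would send it. Starting from the fields found in the page means hidden inputs such as token are submitted along with the ones you fill in.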
The server will either return a resulting web page, or return a redirect to a resulting web page. In the latter case a GET request has to go to the URL given in the redirect response; urllib2.urlopen() issues that request for you by default, and response.geturl() tells you where you ended up.
I hope that makes some sort of sense?