Parse Custom URIs with urlparse (Python)

February 27, 2024

My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses

Solution 1:

You can also register a custom handler with urlparse:

import urlparse

def register_scheme(scheme):
    # Add the scheme to every urlparse.uses_* list (uses_netloc, uses_query, ...)
    for method in filter(lambda s: s.startswith('uses_'), dir(urlparse)):
        getattr(urlparse, method).append(scheme)

register_scheme('moose')

This will append your url scheme to the lists:

uses_fragment
uses_netloc
uses_params
uses_query
uses_relative

The URI is then treated like an http URL, so the parser correctly returns the path, fragment, username/password, etc.

urlparse.urlparse('moose://username:password@hostname:port/path?query=value#fragment')._asdict()
=> {'fragment': 'fragment', 'netloc': 'username:password@hostname:port', 'params': '', 'query': 'query=value', 'path': '/path', 'scheme': 'moose'}
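
On Python 3 the `urlparse` module lives at `urllib.parse`, which still exposes the `uses_*` lists. The following is a minimal sketch of the same registration idea under that assumption; the `isinstance` guard, the duplicate check, and the numeric port `8080` are additions for illustration and are not part of the original answer. Current Python 3 versions already split unknown schemes into netloc, path, and fragment, so there the registration mainly matters for behavior such as relative resolution with `urljoin`.

```python
import urllib.parse


def register_scheme(scheme):
    """Append `scheme` to every urllib.parse.uses_* list (sketch, Python 3)."""
    for name in dir(urllib.parse):
        if not name.startswith("uses_"):
            continue
        schemes = getattr(urllib.parse, name)
        # Guard added for this sketch: only touch lists, and avoid duplicates.
        if isinstance(schemes, list) and scheme not in schemes:
            schemes.append(scheme)


register_scheme("moose")

parts = urllib.parse.urlparse(
    "moose://username:password@hostname:8080/path?query=value#fragment"
)
print(parts.netloc, parts.path, parts.query, parts.fragment)

# With "moose" registered in uses_relative and uses_netloc,
# relative references resolve against a moose:// base.
print(urllib.parse.urljoin("moose://hostname/a/b", "c"))
```

Because the lists are module-level globals, the registration affects every later caller of `urllib.parse` in the same process, which is worth keeping in mind in larger applications.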

Solution 2:

I think the problem is that URIs don't all share a common format after the scheme. For example, mailto: URLs aren't structured the same way as http: URLs.

I would use the results of the first parse, then synthesize an http url and parse it again:

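import urlparse

# Assumes the first parse leaves the unparsed remainder ('//base/id#hint') in the path, parts[2]
parts = urlparse.urlparse("qqqq://base/id#hint")
fake_url = "http:" + parts[2]
parts2 = urlparse.urlparse(fake_url)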
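
The same two-step idea can be wrapped in a small helper. This is only a sketch: `parse_as_http` is a hypothetical name, it splits the scheme off by hand instead of relying on the first parse, and it uses Python 3's `urllib.parse` rather than the Python 2 `urlparse` module used above.

```python
from urllib.parse import urlsplit


def parse_as_http(url):
    """Parse `url` as if it were an http URL, then restore its real scheme."""
    scheme, sep, rest = url.partition(":")
    if not sep:
        raise ValueError("no scheme found in %r" % url)
    # Borrow the http parsing rules for the part after the scheme.
    parts = urlsplit("http:" + rest)
    # SplitResult is a namedtuple, so _replace returns a copy
    # carrying the original scheme.
    return parts._replace(scheme=scheme)


result = parse_as_http("qqqq://base/id#hint")
print(result)
```

Restoring the scheme on the result keeps the caller from having to track it separately, at the cost of pretending every custom URI follows the http layout.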

Solution 3:

There is also a library called furl, which gives you the result you want:

>>> import furl
>>> f = furl.furl("qqqq://base/id#hint")
>>> f.scheme
'qqqq'
>>> f.host
'base'
>>> f.path
Path('/id')
>>> f.path.segments
['id']
>>> f.fragment
Fragment('hint')
>>> f.fragmentstr
'hint'

Solution 4:

The question appears to be out of date. Since at least Python 2.7 there are no issues.

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
>>> import urlparse
>>> urlparse.urlparse("qqqq://base/id#hint")
ParseResult(scheme='qqqq', netloc='base', path='/id', params='', query='', fragment='hint')
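
For readers on Python 3, a quick check along the same lines might look like the sketch below. It assumes the Python 3 module name `urllib.parse`; the session above uses Python 2's `urlparse`.

```python
from urllib.parse import urlparse

parts = urlparse("qqqq://base/id#hint")
# Each component is available as a named attribute on the result.
print(parts.scheme, parts.netloc, parts.path, parts.fragment)
```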

Solution 5:

Try removing the scheme entirely, and start with //netloc, i.e.:

>>> SCHEME = "qqqq"
>>> url = "qqqq://base/id#hint"[len(SCHEME)+1:]
>>> url
'//base/id#hint'
>>> urlparse.urlparse(url)
('', 'base', '/id', '', '', 'hint')

You won't have the scheme in the urlparse result, but you know the scheme anyway.

Also note that Python 2.6 seems to handle this url just fine (aside from the fragment):

$ python2.6 -c 'import urlparse; print urlparse.urlparse("qqqq://base/id#hint")'
ParseResult(scheme='qqqq', netloc='base', path='/id#hint', params='', query='', fragment='')
