George V. Reilly

Obfuscating Passwords in URLs in Python

[Previously published at the now-defunct MetaBrite Dev Blog.]

RFC 1738 allows passwords in URLs, in the form <scheme>://<username>:<password>@<host>:<port>/<url-path>. Although passwords in URLs are deprecated by RFC 3986 and other newer RFCs, they are occasionally useful. Several important packages in the Python world allow such URLs, including SQLAlchemy ('postgresql://scott:tiger@localhost:5432/mydatabase') and Celery ('amqp://guest:guest@localhost:5672//'). It’s also useful to be able to log such URLs without exposing the password.

Python 2 has urlparse.urlparse (known as urllib.parse.urlparse in Python 3 and six.moves.urllib_parse.urlparse in the Six compatibility library) to split a URL into six components: scheme, netloc, path, parameters, query, and fragment. The netloc corresponds to <user>:<password>@<host>:<port>.
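As a quick illustration of those six components, here is a minimal sketch that parses the SQLAlchemy URL from earlier. It assumes Python 3's urllib.parse; under Python 2 or Six, only the import line should need to change.

```python
from urllib.parse import urlparse

parts = urlparse("postgresql://scott:tiger@localhost:5432/mydatabase")

# The six components of the parse result.
print(parts.scheme)    # postgresql
print(parts.netloc)    # scott:tiger@localhost:5432
print(parts.path)      # /mydatabase
print(repr(parts.params), repr(parts.query), repr(parts.fragment))  # '' '' ''

# Convenience attributes that urlparse derives from the netloc.
print(parts.username, parts.password, parts.hostname, parts.port)
# scott tiger localhost 5432
```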

Unfortunately, neither Python 2’s nor Python 3’s urlparse properly handles the userinfo (the username and optional password in the netloc), which must be percent-encoded. RFC 1738: “Within the user and password field, any :, @, or / must be encoded [as %3A, %40, and %2F respectively].” (Not said: % also needs to be encoded as %25.) Consider a username like fred@example.com or a password like b@d:/st%ff, which would create ambiguous URLs if they were not encoded.
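To make the encoding problem concrete, here is a small sketch using the awkward username and password just mentioned. It assumes Python 3's urllib.parse, and the host, port, and path are purely illustrative. It encodes the userinfo with quote(..., safe=""), then shows that the username and password attributes of the parse result come back still percent-encoded, so the caller has to unquote them.

```python
from urllib.parse import quote, unquote, urlparse

username = "fred@example.com"
password = "b@d:/st%ff"

# safe="" makes quote() escape "/" too; ":", "@", and "%" are escaped regardless.
userinfo = quote(username, safe="") + ":" + quote(password, safe="")
url = "postgresql://" + userinfo + "@localhost:5432/mydatabase"
print(url)
# postgresql://fred%40example.com:b%40d%3A%2Fst%25ff@localhost:5432/mydatabase

parts = urlparse(url)
print(parts.username)  # fred%40example.com   (not decoded)
print(parts.password)  # b%40d%3A%2Fst%25ff   (not decoded)

# Decoding is left to the caller.
print(unquote(parts.username), unquote(parts.password))
# fred@example.com b@d:/st%ff
```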

The following demonstrates how to obfuscate a password (if present) in a URL, as well as how to encode and decode the username and password correctly.

import re

from six.moves.urllib_parse import urlparse, urlunparse, unquote

def obfuscate_url_password(url):
    """Obfuscate password in URL for use in logging"""
    parts = urlparse(url)
    if parts.password:
        # Rebuild the netloc with the password replaced by a fixed mask.
        url = urlunparse(
            (parts.scheme,
             make_netloc(parts.hostname, parts.port, netloc_username(parts.netloc), '***'),
             parts.path, parts.params, parts.query, parts.fragment))
    return url

def netloc_username(netloc):
    """Extract decoded username from `netloc`."""
    if "@" in netloc:
        # Everything before the last "@" is the userinfo.
        userinfo = netloc.rsplit("@", 1)[0]
        if ":" in userinfo:
            userinfo = userinfo.split(":", 1)[0]
        return unquote(userinfo)
    return None

def netloc_password(netloc):
    """Extract decoded password from `netloc`."""
    if "@" in netloc:
        userinfo = netloc.rsplit("@", 1)[0]
        if ":" in userinfo:
            return unquote(userinfo.split(":", 1)[1])
    return None

def make_netloc(host, port=None, username=None, password=None):
    """Make a netloc for URL."""
    if username:
        userinfo = rfc_1738_quote(username)
        if password is not None:
            userinfo += ':' + rfc_1738_quote(password)
        userinfo += '@'
    else:
        userinfo = ''

    if ':' in host:
        netloc = '[' + host + ']'  # IPv6 literal
    else:
        netloc = host
    if port:
        netloc += ':' + str(port)
    return userinfo + netloc

def rfc_1738_quote(text):
    # RFC 1738: Within the user and password field, any ":", "@", or "/" must be encoded.
    # (Also "%" must be encoded.) Adapted from SQLAlchemy
    return re.sub(r'[:@/%]', lambda m: "%%%X" % ord(m.group(0)), text)
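
For comparison, here is a shorter, Python 3-only sketch of the same idea. It is a different technique from the listing above, not a drop-in replacement: rather than decoding and re-encoding the userinfo, it leaves the already-encoded username alone and swaps only the password inside the netloc. The function name obfuscate_netloc_password and the mask parameter are hypothetical, introduced just for this illustration.

```python
from urllib.parse import urlparse, urlunparse

def obfuscate_netloc_password(url, mask="***"):
    """Hypothetical variant: mask the password, leaving the rest of the netloc as-is."""
    parts = urlparse(url)
    if not parts.password:
        return url  # nothing to hide; return the URL unchanged
    # Split at the last "@": userinfo on the left, host[:port] on the right.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    username = userinfo.split(":", 1)[0]  # still percent-encoded, so reused verbatim
    return urlunparse(parts._replace(netloc=username + ":" + mask + "@" + hostport))

print(obfuscate_netloc_password("postgresql://scott:tiger@localhost:5432/mydatabase"))
# postgresql://scott:***@localhost:5432/mydatabase
print(obfuscate_netloc_password("amqp://guest:guest@localhost:5672//"))
# amqp://guest:***@localhost:5672//
```

Because the username is never decoded, this variant sidesteps the quoting step, at the cost of not normalizing a badly encoded username the way make_netloc would.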