Obfuscating Passwords in URLs in Python
[Previously published at the now defunct MetaBrite Dev Blog.]
RFC 1738 allows passwords in URLs, in the form <scheme>://<username>:<password>@<host>:<port>/<url-path>. Although passwords in URLs are deprecated by RFC 3986 and other newer RFCs, they are occasionally useful. Several important packages in the Python world accept such URLs, including SQLAlchemy ('postgresql://scott:tiger@localhost:5432/mydatabase') and Celery ('amqp://guest:guest@localhost:5672//'). It’s also useful to be able to log such URLs without exposing the password.
Python 2 has urlparse.urlparse (known as urllib.parse.urlparse in Python 3 and six.moves.urllib_parse.urlparse in the Six compatibility library) to split a URL into six components: scheme, netloc, path, parameters, query, and fragment. The netloc corresponds to <user>:<password>@<host>:<port>.
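As a quick illustration of that split, here is a minimal sketch using the Python 3 import path (an assumption made for brevity; the Python 2 and Six names listed above behave the same way for this input). The URL is the SQLAlchemy example from earlier.

```python
from urllib.parse import urlparse  # urlparse.urlparse in Python 2

parts = urlparse("postgresql://scott:tiger@localhost:5432/mydatabase")

# Two of the six components, plus the netloc.
print(parts.scheme)    # postgresql
print(parts.netloc)    # scott:tiger@localhost:5432
print(parts.path)      # /mydatabase

# Convenience attributes derived from the netloc.
print(parts.username)  # scott
print(parts.password)  # tiger
print(parts.hostname)  # localhost
print(parts.port)      # 5432
```

Note that port comes back as an integer, while the other attributes are strings.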
Unfortunately, neither Python 2’s nor Python 3’s urlparse handles the userinfo (the username plus optional password in the netloc) properly: the userinfo must be percent-encoded in the URL, but urlparse returns the username and password without decoding them. RFC 1738: “Within the user and password field, any :, @, or / must be encoded [as %3A, %40, and %2F respectively].” (Not said: % also needs to be encoded as %25.) Consider a username like fred@example.com or a password like b@d:/st%ff, which would create ambiguous URLs if they were not encoded.
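The effect of encoding can be seen with the example username and password above. This is only a sketch, assuming Python 3 and the standard library's quote with safe="" (which encodes more characters than the four RFC 1738 requires, but that is harmless); the https scheme and the example.com host are placeholders chosen for illustration.

```python
from urllib.parse import quote, unquote, urlparse

username = "fred@example.com"
password = "b@d:/st%ff"

# safe="" makes quote() encode "/" too; ":", "@" and "%" are always encoded.
userinfo = quote(username, safe="") + ":" + quote(password, safe="")
print(userinfo)  # fred%40example.com:b%40d%3A%2Fst%25ff

parts = urlparse("https://" + userinfo + "@example.com/path")
print(parts.hostname)  # example.com

# urlparse hands back the still-encoded values; decoding is left to the caller.
print(parts.password)           # b%40d%3A%2Fst%25ff
print(unquote(parts.password))  # b@d:/st%ff

# Without encoding, the "/" in the password ends the netloc early,
# so the parsed host is no longer example.com.
ambiguous = urlparse("https://" + username + ":" + password + "@example.com/path")
print(ambiguous.hostname == "example.com")  # False
```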
The following code demonstrates how to obfuscate a password (if present) in a URL, and how to encode and decode the username and password correctly.
```python
import re

from six.moves.urllib_parse import urlparse, urlunparse, unquote


def obfuscate_url_password(url):
    """Obfuscate password in URL for use in logging"""
    parts = urlparse(url)
    if parts.password:
        # Rebuild the netloc with the decoded username and a masked password.
        url = urlunparse(
            (parts.scheme,
             make_netloc(parts.hostname, parts.port, netloc_username(parts.netloc), '***'),
             parts.path, parts.params, parts.query, parts.fragment))
    return url


def netloc_username(netloc):
    """Extract decoded username from `netloc`."""
    if "@" in netloc:
        # The userinfo is everything before the last "@".
        userinfo = netloc.rsplit("@", 1)[0]
        if ":" in userinfo:
            userinfo = userinfo.split(":", 1)[0]
        return unquote(userinfo)
    return None


def netloc_password(netloc):
    """Extract decoded password from `netloc`."""
    if "@" in netloc:
        userinfo = netloc.rsplit("@", 1)[0]
        if ":" in userinfo:
            return unquote(userinfo.split(":", 1)[1])
    return None


def make_netloc(host, port=None, username=None, password=None):
    """Make a netloc for URL."""
    if username:
        userinfo = rfc_1738_quote(username)
        if password is not None:
            userinfo += ':' + rfc_1738_quote(password)
        userinfo += '@'
    else:
        userinfo = ''

    if ':' in host:
        netloc = '[' + host + ']'  # IPv6 literal
    else:
        netloc = host
    if port:
        netloc += ':' + str(port)
    return userinfo + netloc


def rfc_1738_quote(text):
    # RFC 1738: Within the user and password field, any ":", "@", or "/" must be encoded.
    # (Also "%" must be encoded.) Adapted from SQLAlchemy
    return re.sub(r'[:@/%]', lambda m: "%%%X" % ord(m.group(0)), text)
```
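When only masking is needed and the username never has to be decoded, a shorter variant is possible. The sketch below is not the code above but an alternative under stated assumptions: Python 3 only, a hypothetical helper name mask_password, and a purely textual replacement of the password inside the netloc, so the username's existing percent-encoding and the host are left exactly as written.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_password(url, mask="***"):
    """Replace the password in `url` with `mask`; hypothetical helper."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # no password present, nothing to hide
    # Split on the last "@": any "@" inside the userinfo must be encoded.
    userinfo, _, hostinfo = parts.netloc.rpartition("@")
    # The first ":" separates the username from the password.
    username = userinfo.partition(":")[0]
    netloc = username + ":" + mask + "@" + hostinfo
    return urlunsplit(parts._replace(netloc=netloc))


print(mask_password("postgresql://scott:tiger@localhost:5432/mydatabase"))
# postgresql://scott:***@localhost:5432/mydatabase
```

The trade-off is that this variant cannot repair a badly encoded userinfo; it only hides whatever follows the first colon.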