2016-01-31

URLs from Unicode Strings

[Previously published at the now defunct MetaBrite Dev Blog.]

Some time ago, we made an ill-considered decision to use recipe names for image URLs, which simplified image management with our then-rudimentary tools. For example, the recipe named "Twisted Pasta With Browned Butter, Sage, and Walnuts" becomes a URL ending in "Twisted%20Pasta%20With%20Browned%20Butter%2C%20Sage%2C%20and%20Walnuts.jpg".

Life becomes more interesting when you escape the confines of 7-bit ASCII and use Unicode. How should u"Sautéed crème fraîche Provençale" be handled? The only reasonable thing to do is to first encode the Unicode string as UTF-8 and then percent-encode those octets: "Saut%C3%A9ed%20cr%C3%A8me%20fra%C3%AEche%20Proven%C3%A7ale".
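A minimal sketch of that encoding in Python 3, using the standard library's urllib.parse.quote, which encodes a str as UTF-8 by default before percent-encoding the octets. The helper name image_url_suffix and the ".jpg" default are illustrative assumptions, not the code we actually ran.

```python
from urllib.parse import quote, unquote


def image_url_suffix(recipe_name, extension=".jpg"):
    """Hypothetical helper: turn a recipe name into the tail of an image URL."""
    # quote() encodes the str as UTF-8, then percent-encodes every octet that
    # is not an unreserved character. safe="" means "/" is escaped as well.
    return quote(recipe_name, safe="") + extension


ascii_suffix = image_url_suffix("Twisted Pasta With Browned Butter, Sage, and Walnuts")
unicode_suffix = image_url_suffix("Sautéed crème fraîche Provençale")

# unquote() reverses the transformation, decoding the octets as UTF-8.
round_trip = unquote(unicode_suffix)

print(ascii_suffix)
print(unicode_suffix)
print(round_trip)
```

Passing safe="" is a design choice for this sketch: a recipe name is a single path segment, so a literal "/" in a name should not be read as a path separator.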

That seems reasonable, but it was giving us inconsistent results when the images were uploaded to an S3 bucket. When …continue.

2016-01-04

Obfuscating Passwords in URLs in Python

[Previously published at the now defunct MetaBrite Dev Blog.]

RFC 1738 allows passwords in URLs, in the form <scheme>://<username>:<password>@<host>:<port>/<url-path>. Although passwords in URLs are deprecated by RFC 3986 and other newer RFCs, they're occasionally useful. Several important packages in the Python world allow such URLs, including SQLAlchemy ('postgresql://scott:tiger@localhost:5432/mydatabase') and Celery ('amqp://guest:guest@localhost:5672//'). It's also useful to be able to log such URLs without exposing the password.

Python 2 has urlparse.urlparse (known as urllib.parse.urlparse in Python 3 and six.moves.urllib_parse.urlparse in the Six compatibility library) to split a URL into six components: scheme, netloc, path, parameters, query, and fragment. The netloc corresponds to <user>:<password>@<host>:<port>.
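As a rough illustration, here is how the six components look in Python 3 for the SQLAlchemy-style URL above, followed by a naive way to mask the password before logging. The function obfuscate_password and its mask argument are hypothetical names made up for this sketch. It relies on urlparse's own idea of where the password is, so it should be read as a starting point rather than a robust solution.

```python
from urllib.parse import urlparse, urlunparse

url = "postgresql://scott:tiger@localhost:5432/mydatabase"
parts = urlparse(url)

# The six components: scheme, netloc, path, params, query, fragment.
# Here parts.scheme is "postgresql", parts.netloc is
# "scott:tiger@localhost:5432" and parts.path is "/mydatabase";
# params, query and fragment are empty strings.
print(tuple(parts))


def obfuscate_password(url, mask="****"):
    """Hypothetical helper: replace the password in a URL with a mask."""
    parts = urlparse(url)
    if parts.password is None:
        # No password present, so there is nothing to hide.
        return url
    # Split the netloc at the last "@" into userinfo and host:port, then
    # split the userinfo at the first ":" into username and password.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    username, _, _password = userinfo.partition(":")
    netloc = "{}:{}@{}".format(username, mask, hostport)
    return urlunparse(parts._replace(netloc=netloc))


safe_url = obfuscate_password(url)
print(safe_url)
```

Rebuilding only the netloc and passing everything else back through urlunparse leaves the path, query, and fragment as they were parsed, which also preserves the double slash at the end of the Celery example.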

Unfortunately, urlparse in neither Python 2 nor Python 3 properly handles the userinfo (username + optional password in the netloc), as …continue.