[Previously published at the now defunct MetaBrite Dev Blog.]
Some time ago,
we made an ill-considered decision to use recipe names for image URLs,
which simplified image management with our then-rudimentary tools.
For example, the recipe named
"Twisted Pasta With Browned Butter, Sage, and Walnuts"
becomes a URL ending in
"Twisted%20Pasta%20With%20Browned%20Butter%2C%20Sage%2C%20and%20Walnuts.jpg".
Life becomes more interesting when you escape the confines of 7-bit ASCII and use Unicode.
How should u"Sautéed crème fraîche Provençale" be handled?
The only reasonable thing to do is to first encode the Unicode string as UTF-8
and then percent-encode those octets:
"Saut%C3%A9ed%20cr%C3%A8me%20fra%C3%AEche%20Proven%C3%A7ale".
That seems straightforward, but it gave us inconsistent results
when the images were uploaded to an S3 bucket.
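The encode-as-UTF-8-then-percent-encode step can be sketched in Python 3 with the standard library's urllib.parse.quote. This is a minimal illustration, not the code we actually ran: the helper name recipe_image_name and the choice of safe="" are assumptions made for the example.

```python
from urllib.parse import quote, unquote


def recipe_image_name(recipe_name):
    """Hypothetical helper: build the final URL segment for a recipe image."""
    # quote() encodes the str as UTF-8 by default, then percent-encodes
    # every octet outside the unreserved set. safe="" is an assumption here;
    # it makes quote() escape "/" as well, so the name stays a single segment.
    return quote(recipe_name, safe="") + ".jpg"


ascii_name = recipe_image_name(
    "Twisted Pasta With Browned Butter, Sage, and Walnuts")
unicode_name = recipe_image_name("Sautéed crème fraîche Provençale")

print(ascii_name)
print(unicode_name)

# Decoding reverses both steps: percent-decoding, then UTF-8 decoding.
print(unquote(unicode_name))
```

Both printed names match the encoded strings quoted above, with ".jpg" appended.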
When …continue.
[Previously published at the now defunct MetaBrite Dev Blog.]
RFC 1738 allows passwords in URLs,
in the form <scheme>://<username>:<password>@<host>:<port>/<url-path>.
Although passwords in URLs are deprecated by RFC 3986 and other newer RFCs,
they are occasionally useful.
Several important packages in the Python world allow such URLs,
including SQLAlchemy ('postgresql://scott:tiger@localhost:5432/mydatabase')
and Celery ('amqp://guest:guest@localhost:5672//').
It's also useful to be able to log such URLs without exposing the password.
Python 2 has urlparse.urlparse
(known as urllib.parse.urlparse in Python 3
and six.moves.urllib_parse.urlparse in the Six compatibility library),
which splits a URL into six components:
scheme, netloc, path, parameters, query, and fragment.
The netloc corresponds to <username>:<password>@<host>:<port>.
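As a quick illustration of those six components, here is a small Python 3 sketch that parses the two example URLs mentioned earlier. The variable names are only for this example.

```python
from urllib.parse import urlparse

db = urlparse("postgresql://scott:tiger@localhost:5432/mydatabase")
broker = urlparse("amqp://guest:guest@localhost:5672//")

# The result is a six-field named tuple:
# (scheme, netloc, path, params, query, fragment)
print(db.scheme)    # postgresql
print(db.netloc)    # scott:tiger@localhost:5432
print(db.path)      # /mydatabase
print(tuple(db))

# The netloc is returned as one opaque string; the userinfo
# (including the password) is still embedded in it.
print(broker.netloc)  # guest:guest@localhost:5672
print(broker.path)    # //
```

Note that the password is still sitting inside netloc, so logging the parsed result verbatim would expose it.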
Unfortunately, neither Python 2's nor Python 3's urlparse
properly handles the userinfo
(username + optional password in the netloc),
as …continue.