Unnecessary percent-encoding in URLs

Draft: started Tagged , ,

I’m on an expotition to serve each page at only one URL.

I’ve taken care of query strings; Caddy doesn’t seem to respond at https://chrismorgan.info./; so I think percent-encoding is the only uncertain bit left.

I’ll be drawing from a few specs here: RFC 9112 (HTTP/1.1), RFC 9110 (HTTP), RFC 3986 (URI), and the WHATWG URL Standard.

I’m going to follow the URL Standard’s lead, and just call everything a URL.

I’m only considering https:// URLs. These are special URLs, a category that defines the whole path/query/fragment thing (which is not used across all URL schemes).

Paths

RFC 9112 (HTTP/1.1):

    origin-form = absolute-path [ "?" query ]
    absolute-path = <defined in RFC 9110>

RFC 9110 (HTTP):

    absolute-path = 1*( "/" segment )
    segment = <defined in RFC 3986>

RFC 3986 (URI):

    segment = *pchar
    pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
    unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    pct-encoded = "%" HEXDIG HEXDIG
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    query = *( pchar / "/" / "?" )

(This applies to `absolute-form` too: `absolute-form → absolute-URI → hier-part → path-abempty = *( "/" segment )`; you can follow the chain yourself if you wish.)

The full set of literal characters this allows in a path segment, sorted:

    !$&'()*+,-.0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
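The sorted set can be derived mechanically from the grammar rather than by eye. The following is a minimal Python sketch of that derivation; the names `UNRESERVED`, `SUB_DELIMS`, `PCHAR_LITERALS`, `QUERY_LITERALS` and `sorted_chars` are made up for illustration and come from no spec or library.

```python
import string

# Character classes as written in the RFC 3986 grammar.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# pct-encoded is a three-character sequence ("%" HEXDIG HEXDIG), so it
# contributes no standalone literal characters to this set.
PCHAR_LITERALS = UNRESERVED | SUB_DELIMS | {":", "@"}

# query = *( pchar / "/" / "?" )
QUERY_LITERALS = PCHAR_LITERALS | {"/", "?"}


def sorted_chars(chars):
    """Join a set of ASCII characters in code point order."""
    return "".join(sorted(chars))


print(sorted_chars(PCHAR_LITERALS))
# → !$&'()*+,-.0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
```

That is 79 characters for a path segment; the query grammar only adds "/" and "?" on top of them.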

Query string

TODO.

Non-special URLs

TODO.

These characters need not be encoded in *paths*:
!$&'()*+,-.0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
All others should be.
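To make that rule concrete, here is a rough Python sketch that undoes percent-encoding wherever the encoded byte is one of the characters listed above. It assumes the server treats the encoded and literal forms of every one of those characters as the same page; RFC 3986 only guarantees that equivalence for the unreserved characters, so check that assumption before using anything like this. `PATH_SAFE`, `decode_unnecessary` and `has_unnecessary_encoding` are hypothetical names.

```python
import re

# Literal characters allowed in a path segment (pchar minus pct-encoded).
PATH_SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~!$&'()*+,;=:@"
)

# One percent-encoded byte: "%" followed by two hex digits.
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unnecessary(path: str) -> str:
    """Replace each %XX whose byte is in PATH_SAFE with the literal character."""

    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # "/" and "%" are not in PATH_SAFE, so %2F and %25 stay encoded,
        # as do all non-ASCII bytes.
        return char if char in PATH_SAFE else match.group(0)

    # A single left-to-right pass, so decoded output is never decoded again.
    return PCT_TRIPLET.sub(replace, path)


def has_unnecessary_encoding(path: str) -> bool:
    """True if the path percent-encodes something that did not need it."""
    return decode_unnecessary(path) != path


print(decode_unnecessary("/%7Euser/caf%C3%A9/a%2Fb"))
# → /~user/caf%C3%A9/a%2Fb
```

A server chasing the one-URL-per-page goal could redirect whenever `has_unnecessary_encoding` is true, though whether to redirect or reject is a design choice this sketch does not settle.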
Passing that set through the template `urlencode` filter:
{{"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~!$&'()*+,;=:@" | urlencode}}
gives:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._%7E%21%24%26%27%28%29%2A%2B%2C%3B%3D%3A%40
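For contrast, here is a sketch of an encoder that leaves the whole path-safe set alone. It uses Python's `urllib.parse.quote` purely as an illustration; it is not the template filter shown above, and `PATH_EXTRA_SAFE` and `encode_segment` are hypothetical names. `quote` never encodes ASCII letters, digits, or `_.-`, and its `safe` parameter lists further characters to leave as they are.

```python
from urllib.parse import quote

# Characters to leave literal beyond the letters, digits and "_.-" that
# quote() never encodes. "~" is listed explicitly so the result does not
# depend on the Python version.
PATH_EXTRA_SAFE = "~!$&'()*+,;=:@"


def encode_segment(segment: str) -> str:
    """Percent-encode one path segment, encoding only what the grammar requires."""
    # "/" is not in the safe list, so a slash inside a segment becomes %2F.
    return quote(segment, safe=PATH_EXTRA_SAFE)


print(encode_segment("a b/c?d#e%"))
# → a%20b%2Fc%3Fd%23e%25
print(encode_segment("-._~!$&'()*+,;=:@"))
# → -._~!$&'()*+,;=:@
```

Non-ASCII text is encoded as UTF-8 bytes by default, so `encode_segment("café")` yields `caf%C3%A9`.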