Teepee design: the HTTP method

This article is part of the series on the rust-http redesign, Teepee.

An apology

Teepee has stood with nothing much happening for a while; this is my fault and I’m sorry to have caused it. Things are starting to move again and will keep moving now; I have the low‐level design of the HTTP/1 message reading parts of Teepee almost done and expect to post it within a few days. But I don’t blame you if you don’t trust my “few days”; I wouldn’t either.

There are also quite a number of people who have emailed me asking how they can help on Teepee. Sorry, it’s a little hard to find ways just at present, for the project is not really mature enough for that yet. It will come, though. If you really want fun, start looking at HTTP/2 and figure out a sensible implementation strategy, especially for its stream dependencies and prioritisation…

Unless stated otherwise, all grammar comes from RFC 7230.

Relevant definitions

Here’s what rust-http has:

pub enum Method {
    Options,
    Get,
    Head,
    Post,
    Put,
    Delete,
    Trace,
    Connect,
    Patch,  // RFC 5789
    ExtensionMethod(String),
}

Here is what RFC 2616 had:

   The Method  token indicates the method to be performed on the
   resource identified by the Request-URI. The method is case-sensitive.

       Method         = "OPTIONS"                ; Section 9.2
                      | "GET"                    ; Section 9.3
                      | "HEAD"                   ; Section 9.4
                      | "POST"                   ; Section 9.5
                      | "PUT"                    ; Section 9.6
                      | "DELETE"                 ; Section 9.7
                      | "TRACE"                  ; Section 9.8
                      | "CONNECT"                ; Section 9.9
                      | extension-method
       extension-method = token

Here is what RFC 7230 has:

   The method token indicates the request method to be performed on the
   target resource.  The request method is case-sensitive.

     method         = token

   The request methods defined by this specification can be found in
   Section 4 of [RFC7231], along with information regarding the HTTP
   method registry and considerations for defining new methods.

The contents of RFC 7231 are largely immaterial here, except that its method registration procedure (Section 8.1.1) requires every method to specify two of the common method properties: whether it is safe and whether it is idempotent. Representing these properties seems desirable to me.
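For the methods RFC 7231 itself defines, Sections 4.2.1 and 4.2.2 settle both properties. A minimal sketch of that classification as a lookup on raw method bytes; `rfc7231_properties` is a hypothetical name, and returning `None` for every other method is an assumption for illustration:

```rust
/// Safety and idempotency of the eight methods defined in RFC 7231, Section 4.3,
/// as classified by Sections 4.2.1 (safe) and 4.2.2 (idempotent).
/// Returns `Some((safe, idempotent))`, or `None` for a method RFC 7231 does not define.
pub fn rfc7231_properties(method: &[u8]) -> Option<(bool, bool)> {
    match method {
        // Every safe method is also idempotent.
        b"GET" | b"HEAD" | b"OPTIONS" | b"TRACE" => Some((true, true)),
        b"PUT" | b"DELETE" => Some((false, true)),
        b"POST" | b"CONNECT" => Some((false, false)),
        // Method names are case-sensitive, so b"get" lands here as well.
        _ => None,
    }
}
```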

The `token` type

By the way, if you’re wondering what characters are valid in a method, here is the relevant grammar:

     method         = token
     token          = 1*tchar
     tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                    / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
                    / DIGIT / ALPHA

In other words, a method (or anything else defined as token in the grammar) is a non‐empty sequence of letters, digits or any of those other fancy characters; yes, !#$%&'*+-.^_`|~ is a valid HTTP method name.
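The tchar rule translates directly into a byte check. A minimal sketch, assuming a plain function-based validator; `is_tchar` and `is_token` are hypothetical names, not Teepee API:

```rust
/// Returns true if `b` is a `tchar`: an ASCII letter or digit,
/// or one of the fifteen listed punctuation characters.
pub fn is_tchar(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"!#$%&'*+-.^_`|~".contains(&b)
}

/// Returns true if `bytes` is a valid `token`, i.e. `1*tchar`:
/// at least one byte, and every byte a tchar.
pub fn is_token(bytes: &[u8]) -> bool {
    !bytes.is_empty() && bytes.iter().all(|&b| is_tchar(b))
}
```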

This is the sort of thing that should probably be encoded in the type system, using a separate token type instead of String or Vec<u8>, &str or &[u8]. We will need both owned and borrowed versions of it: I am imposing a rule that the low‐level API must not allocate, while owned tokens are the only sensible route for the higher‐level API.

The solution is an equivalent to std::str::MaybeOwned<'a>, which applies to strings and is an owned string or a string slice (possibly even of static duration): then the owned form can be MaybeOwnedToken<'static>, and the unowned form MaybeOwnedToken<'a>. (Incidentally, we really need a general solution to this problem.)
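std::str::MaybeOwned did not survive to Rust 1.0; its role is now filled by std::borrow::Cow. A minimal sketch, assuming modern Rust, of a MaybeOwnedToken built on Cow<'a, [u8]>; the type and its methods are hypothetical:

```rust
use std::borrow::Cow;

/// Hypothetical token type: bytes either borrowed from an input buffer or owned.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MaybeOwnedToken<'a>(Cow<'a, [u8]>);

impl<'a> MaybeOwnedToken<'a> {
    /// Borrows a token from a buffer without allocating (the low-level API).
    pub fn borrowed(bytes: &'a [u8]) -> Self {
        MaybeOwnedToken(Cow::Borrowed(bytes))
    }

    /// The raw bytes, whichever form holds them.
    pub fn bytes(&self) -> &[u8] {
        &self.0
    }

    /// Converts to the owned form (the higher-level API);
    /// this copies the bytes only if they are currently borrowed.
    pub fn into_owned(self) -> MaybeOwnedToken<'static> {
        MaybeOwnedToken(Cow::Owned(self.0.into_owned()))
    }
}
```

The borrowed form costs nothing to construct, which suits the non‐allocating low‐level API, while into_owned produces a MaybeOwnedToken<'static> that can outlive the buffer it was read from.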

On second thoughts, I doubt that separate owned and slice token types would be of much value by themselves, so we might as well eliminate them altogether and put their contents straight into the variants of a single enum.

Here’s the gist of what it’ll end up as:

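pub enum Token<'a> {
    Owned {
        #[doc(hidden)]
        _bytes: Vec<u8>,
    },
    Slice {
        #[doc(hidden)]
        _bytes: &'a [u8],
    },
}

impl<'a> Token<'a> {
    /// The raw bytes of the token, whichever variant holds them.
    pub fn bytes(&self) -> &[u8] {
        match *self {
            Token::Owned { ref _bytes } => &_bytes[..],
            Token::Slice { _bytes } => _bytes,
        }
    }
    /// Converts to the owned form, copying the bytes only if they are borrowed.
    pub fn into_owned(self) -> Token<'static> {
        match self {
            Token::Owned { _bytes } => Token::Owned { _bytes },
            Token::Slice { _bytes } => Token::Owned { _bytes: _bytes.to_vec() },
        }
    }
    /// Borrows this token as a Slice, without allocating.
    pub fn as_slice(&self) -> Token<'_> {
        Token::Slice { _bytes: self.bytes() }
    }
}

(The #[doc(hidden)] attributes keep the fields out of the documentation; the fields themselves are public, as enum variant fields always are, which is what allows statics to be defined outside the defining module. You know, we really need a general solution to this problem also.)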

While a little cumbersome, unifying these imposes certain constraints on the solution space for handling methods.

Design one: a struct and statics

Nothing like a bit of code.

pub struct Method<'a> {
    name: Token<'a>,
    safe: bool,
    idempotent: bool,
}

pub static GET: Method<'static> = Method {
    name: Token::Slice { _bytes: b"GET" },
    safe: true,
    idempotent: true,
};
// Et cetera.

Method::from_token would then produce one of those statics where possible, thus automatically filling in the details with regard to safety and idempotency, which are important to model. (I don’t think I made this clear earlier: I have seen too much code comparing method names in an ad‐hoc way when what it really wanted to examine was whether the method was safe or idempotent.)
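A minimal sketch of how Method::from_token might reuse the statics, assuming modern Rust and a cut‐down Token; only GET and POST are defined here, and treating an unknown method as neither safe nor idempotent is an assumption for illustration, not something the design above specifies:

```rust
/// Cut-down token: either owned bytes or a borrowed slice.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Token<'a> {
    Owned { _bytes: Vec<u8> },
    Slice { _bytes: &'a [u8] },
}

impl<'a> Token<'a> {
    pub fn bytes(&self) -> &[u8] {
        match *self {
            Token::Owned { ref _bytes } => &_bytes[..],
            Token::Slice { _bytes } => _bytes,
        }
    }
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Method<'a> {
    name: Token<'a>,
    safe: bool,
    idempotent: bool,
}

pub static GET: Method<'static> = Method {
    name: Token::Slice { _bytes: b"GET" },
    safe: true,
    idempotent: true,
};

pub static POST: Method<'static> = Method {
    name: Token::Slice { _bytes: b"POST" },
    safe: false,
    idempotent: false,
};

impl<'a> Method<'a> {
    /// Hypothetical: return a clone of the matching static if the name is known,
    /// otherwise wrap the token, assuming it is neither safe nor idempotent.
    pub fn from_token(token: Token<'a>) -> Method<'a> {
        for known in [&GET, &POST] {
            if known.name.bytes() == token.bytes() {
                // A static cannot be moved out of, so a hit must be cloned.
                return known.clone();
            }
        }
        Method { name: token, safe: false, idempotent: false }
    }

    pub fn name(&self) -> &[u8] {
        self.name.bytes()
    }

    pub fn is_safe(&self) -> bool {
        self.safe
    }

    pub fn is_idempotent(&self) -> bool {
        self.idempotent
    }
}
```

Note the known.clone(): a static can be borrowed but never moved out of, so every lookup hit has to clone the Method.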

This looks very well, doesn’t it? Unfortunately, it simply won’t do because of its ergonomics. Because of how we unified the token type (the Owned variant holds a Vec<u8>), Method<'static> is no longer Copy, and something like request.method = GET will not work: you can’t move out of a static. We are left with request.method = GET.clone(), and the hypothetical request(url, GET.clone()), a construct which is all in all rather an untidy thing. Is this a reasonable trade‐off to make? I am not convinced it is.

Design two: back to the enum

The second option is a mere extension of what is in rust-http: simple variants (which avoids the Copy problem) plus one at the end for custom things. Something like this:

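macro_rules! method_enum {
    ($(
        $ident:ident
        $bytes:literal
        $safe:literal
        $idempotent:literal
        #[$doc:meta];
    )*) => {
        /// …
        pub enum Method<'a> {
            $(#[$doc] $ident,)*
            /// A method not in the IANA HTTP method registry.
            UnregisteredMethod {
                token: Token<'a>,
                safe: bool,
                idempotent: bool,
            },
        }

        impl<'a> Method<'a> {
            /// The method name; registered methods borrow a static byte string.
            pub fn token(&self) -> Token<'_> {
                match *self {
                    $(Method::$ident => Token::Slice { _bytes: $bytes },)*
                    Method::UnregisteredMethod { ref token, .. } => token.as_slice(),
                }
            }

            pub fn is_safe(&self) -> bool {
                match *self {
                    $(Method::$ident => $safe,)*
                    Method::UnregisteredMethod { safe, .. } => safe,
                }
            }

            pub fn is_idempotent(&self) -> bool {
                match *self {
                    $(Method::$ident => $idempotent,)*
                    Method::UnregisteredMethod { idempotent, .. } => idempotent,
                }
            }
        }
    }
}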

method_enum! {
    // Variant name   method name          safe  idempotent
    Acl               b"ACL"               false true  #[doc = "`ACL`, defined in RFC3744, Section 8.1"];
    BaselineControl   b"BASELINE-CONTROL"  false true  #[doc = "`BASELINE-CONTROL`, defined in RFC3253, Section 12.6"];
    Bind              b"BIND"              false true  #[doc = "`BIND`, defined in RFC5842, Section 4"];
    Checkin           b"CHECKIN"           false true  #[doc = "`CHECKIN`, defined in RFC3253, Section 4.4, Section 9.4"];
    Checkout          b"CHECKOUT"          false true  #[doc = "`CHECKOUT`, defined in RFC3253, Section 4.3, Section 8.8"];
    Connect           b"CONNECT"           false false #[doc = "`CONNECT`, defined in RFC7231, Section 4.3.6"];
    Copy              b"COPY"              false true  #[doc = "`COPY`, defined in RFC4918, Section 9.8"];
    Delete            b"DELETE"            false true  #[doc = "`DELETE`, defined in RFC7231, Section 4.3.5"];
    Get               b"GET"               true  true  #[doc = "`GET`, defined in RFC7231, Section 4.3.1"];
    Head              b"HEAD"              true  true  #[doc = "`HEAD`, defined in RFC7231, Section 4.3.2"];
    Label             b"LABEL"             false true  #[doc = "`LABEL`, defined in RFC3253, Section 8.2"];
    Link              b"LINK"              false true  #[doc = "`LINK`, defined in RFC2068, Section 19.6.1.2"];
    Lock              b"LOCK"              false false #[doc = "`LOCK`, defined in RFC4918, Section 9.10"];
    Merge             b"MERGE"             false true  #[doc = "`MERGE`, defined in RFC3253, Section 11.2"];
    MkActivity        b"MKACTIVITY"        false true  #[doc = "`MKACTIVITY`, defined in RFC3253, Section 13.5"];
    MkCalendar        b"MKCALENDAR"        false true  #[doc = "`MKCALENDAR`, defined in RFC4791, Section 5.3.1"];
    MkCol             b"MKCOL"             false true  #[doc = "`MKCOL`, defined in RFC4918, Section 9.3"];
    MkRedirectRef     b"MKREDIRECTREF"     false true  #[doc = "`MKREDIRECTREF`, defined in RFC4437, Section 6"];
    MkWorkspace       b"MKWORKSPACE"       false true  #[doc = "`MKWORKSPACE`, defined in RFC3253, Section 6.3"];
    Move              b"MOVE"              false true  #[doc = "`MOVE`, defined in RFC4918, Section 9.9"];
    Options           b"OPTIONS"           true  true  #[doc = "`OPTIONS`, defined in RFC7231, Section 4.3.7"];
    OrderPatch        b"ORDERPATCH"        false true  #[doc = "`ORDERPATCH`, defined in RFC3648, Section 7"];
    Patch             b"PATCH"             false false #[doc = "`PATCH`, defined in RFC5789, Section 2"];
    Post              b"POST"              false false #[doc = "`POST`, defined in RFC7231, Section 4.3.3"];
    PropFind          b"PROPFIND"          true  true  #[doc = "`PROPFIND`, defined in RFC4918, Section 9.1"];
    PropPatch         b"PROPPATCH"         false true  #[doc = "`PROPPATCH`, defined in RFC4918, Section 9.2"];
    Put               b"PUT"               false true  #[doc = "`PUT`, defined in RFC7231, Section 4.3.4"];
    Rebind            b"REBIND"            false true  #[doc = "`REBIND`, defined in RFC5842, Section 6"];
    Report            b"REPORT"            true  true  #[doc = "`REPORT`, defined in RFC3253, Section 3.6"];
    Search            b"SEARCH"            true  true  #[doc = "`SEARCH`, defined in RFC5323, Section 2"];
    Trace             b"TRACE"             true  true  #[doc = "`TRACE`, defined in RFC7231, Section 4.3.8"];
    Unbind            b"UNBIND"            false true  #[doc = "`UNBIND`, defined in RFC5842, Section 5"];
    Uncheckout        b"UNCHECKOUT"        false true  #[doc = "`UNCHECKOUT`, defined in RFC3253, Section 4.5"];
    Unlink            b"UNLINK"            false true  #[doc = "`UNLINK`, defined in RFC2068, Section 19.6.1.3"];
    Unlock            b"UNLOCK"            false true  #[doc = "`UNLOCK`, defined in RFC4918, Section 9.11"];
    Update            b"UPDATE"            false true  #[doc = "`UPDATE`, defined in RFC3253, Section 7.1"];
    UpdateRedirectRef b"UPDATEREDIRECTREF" false true  #[doc = "`UPDATEREDIRECTREF`, defined in RFC4437, Section 7"];
    VersionControl    b"VERSION-CONTROL"   false true  #[doc = "`VERSION-CONTROL`, defined in RFC3253, Section 3.5"];
}
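To see the ergonomics the enum buys, here is a cut‐down, self‐contained sketch with three registered methods written out by hand rather than through method_enum!. Two simplifications are assumptions for illustration: UnregisteredMethod holds an owned Vec<u8> instead of a Token, and from_bytes is a hypothetical constructor that treats unregistered names as neither safe nor idempotent.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Method {
    Get,
    Post,
    Put,
    /// A method not in the IANA HTTP method registry.
    UnregisteredMethod { token: Vec<u8>, safe: bool, idempotent: bool },
}

impl Method {
    /// Hypothetical constructor: registered names map onto their own variant.
    pub fn from_bytes(bytes: &[u8]) -> Method {
        match bytes {
            b"GET" => Method::Get,
            b"POST" => Method::Post,
            b"PUT" => Method::Put,
            other => Method::UnregisteredMethod {
                token: other.to_vec(),
                safe: false,
                idempotent: false,
            },
        }
    }

    pub fn token(&self) -> &[u8] {
        match *self {
            Method::Get => &b"GET"[..],
            Method::Post => &b"POST"[..],
            Method::Put => &b"PUT"[..],
            Method::UnregisteredMethod { ref token, .. } => &token[..],
        }
    }

    pub fn is_safe(&self) -> bool {
        match *self {
            Method::Get => true,
            Method::Post | Method::Put => false,
            Method::UnregisteredMethod { safe, .. } => safe,
        }
    }

    pub fn is_idempotent(&self) -> bool {
        match *self {
            Method::Get | Method::Put => true,
            Method::Post => false,
            Method::UnregisteredMethod { idempotent, .. } => idempotent,
        }
    }
}
```

Here Method::Get is an ordinary value built on the spot, so something like request.method = Method::Get needs no clone(), unlike the static GET of the first design.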

Incidentally, Teepee will support all the methods in the IANA HTTP method registry.

In its favour, this has good ergonomics for the standard case. It does, however, have a slight weakness: when a new method is added to the registry, code that was already using that method may break, depending on how it was written. Pattern matching is the obvious example: code that matched the method as an UnregisteredMethod will stop matching once it gets its own variant, and an exhaustive match with no wildcard arm will no longer compile. But this is a rare thing, so I’m not terribly distressed about it. It does still count against the design, though: strict semver would require a new major release to add a new variant to the method enum.

Summary

In the end, I think that what rust-http does at present is sound; it just needs to be expanded to cover the entire HTTP method registry and to keep track of whether each method is safe and/or idempotent.

Pending a suitable outcome to the discussion on Reddit, the second approach is what I intend to implement.