Teepee design: a careful look at the HTTP/1.1 Status-Line

This article is part of the series on the rust-http redesign, Teepee.

I say Status-Line, but actually I don’t care about part of it; Status-Line is defined in RFC 2616 (HTTP/1.1) as HTTP-Version SP Status-Code SP Reason-Phrase CRLF, but all I care about in this article is the Status-Code (e.g. 200) and its corresponding Reason-Phrase (e.g. OK).
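To make the grammar concrete, here is a minimal sketch of pulling those parts out of a raw line. It is written in present-day Rust rather than the dialect used in the rest of this article, and `StatusLine` and `parse_status_line` are hypothetical names for illustration, not part of rust-http or Teepee.

```rust
/// The parts of a Status-Line, borrowed from the input line.
#[derive(Debug, PartialEq)]
pub struct StatusLine<'a> {
    pub version: &'a str,
    pub code: u16,
    pub reason: &'a str,
}

/// Splits `HTTP-Version SP Status-Code SP Reason-Phrase CRLF` into its parts.
/// Returns `None` if the line does not have that shape.
/// (Hypothetical helper; the HTTP-Version itself is not validated.)
pub fn parse_status_line(line: &str) -> Option<StatusLine<'_>> {
    // The grammar requires a trailing CRLF.
    let line = line.strip_suffix("\r\n")?;
    // Split on the first two spaces only: the Reason-Phrase may contain spaces.
    let mut parts = line.splitn(3, ' ');
    let version = parts.next()?;
    let code = parts.next()?;
    let reason = parts.next()?;
    // The Status-Code is exactly three digits.
    if code.len() != 3 || !code.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some(StatusLine {
        version,
        code: code.parse().ok()?,
        reason,
    })
}
```

Given `"HTTP/1.1 404 File Not Found\r\n"`, this yields the code 404 and the reason `File Not Found`, untouched.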

The current position

At present, rust-http has http::status::Status. Here’s the implementation, distilled:

#[deriving(Eq, Clone)]
pub enum Status {
    // 1xx Informational
    Continue,
    SwitchingProtocols,
    Processing,  // WebDAV; RFC 2518

    // 2xx Success
    Ok,
    Created,
    ...,

    // 3xx Redirection
    MultipleChoices,
    MovedPermanently,
    ...,

    // 4xx Client Error
    BadRequest,
    Unauthorized,
    ...,

    // 5xx Server Error
    InternalServerError,
    NotImplemented,
    ...,

    UnregisteredStatus(u16, ~str),
}

impl Status {
    /// Get the status code
    pub fn code(&self) -> u16 {
        match *self {
            Continue => 100,
            ...,

            UnregisteredStatus(code, _)   => code,
        }
    }

    /// Get the reason phrase
    pub fn reason(&self) -> StrBuf {
        match *self {
            Continue => StrBuf::from_str("Continue"),
            ...,

            UnregisteredStatus(_, ref reason) => (*reason).clone(),
        }
    }

    /// Get a status from the code and reason
    pub fn from_code_and_reason(status: u16, reason: ~str) -> Status {
        let reason_lower = reason.to_ascii_lower();
        match (status, reason_lower.as_slice()) {
            (100, "continue") => Continue,
            ...,

            (_, _) => UnregisteredStatus(status, reason),
        }
    }
}

impl FromPrimitive for Status {
    /// Get a *registered* status code from the number of its status code.
    ///
    /// This will return None if an unknown code is passed.
    ///
    /// For example, `from_u64(200)` will return `Some(Ok)`.
    fn from_u64(n: u64) -> Option<Status> {
        Some(match n {
            100 => Continue,
            ...,

            _   => { return None }
        })
    }
}

On the surface this approach looks sound; unfortunately, it turns out to have some problems. For now, the main thing to note is that it treats 404 Not Found as the same as 404 not found, but as distinct from 404 File Not Found.
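That behaviour can be reproduced with a cut-down sketch of the lookup, written in present-day Rust (`String` in place of `~str`) and with only two registered variants. It mirrors the shape of from_code_and_reason above but is not the actual rust-http code.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Ok,
    NotFound,
    // Catch-all for any (code, reason) pair not matched below.
    UnregisteredStatus(u16, String),
}

impl Status {
    /// Looks up a status by code and lowercased reason, as the current code does.
    pub fn from_code_and_reason(code: u16, reason: &str) -> Status {
        let reason_lower = reason.to_ascii_lowercase();
        match (code, reason_lower.as_str()) {
            (200, "ok") => Status::Ok,
            (404, "not found") => Status::NotFound,
            // A known code with an unexpected reason lands here too.
            _ => Status::UnregisteredStatus(code, reason.to_string()),
        }
    }
}
```

With this, `404 Not Found` and `404 not found` both become `NotFound`, while `404 File Not Found` becomes `UnregisteredStatus(404, "File Not Found")`, which does not compare equal to `NotFound`.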

The specification

Later on we’ll look at the practical applications of these things; for now let’s take a look at what the spec says. RFC 2616 (HTTP/1.1), section 6.1.1:

6.1.1 Status Code and Reason Phrase

   The Status-Code element is a 3-digit integer result code of the
   attempt to understand and satisfy the request. These codes are fully
   defined in section 10. The Reason-Phrase is intended to give a short
   textual description of the Status-Code. The Status-Code is intended
   for use by automata and the Reason-Phrase is intended for the human
   user. The client is not required to examine or display the Reason-
   Phrase.

   The first digit of the Status-Code defines the class of response. The
   last two digits do not have any categorization role. There are 5
   values for the first digit:

      - 1xx: Informational - Request received, continuing process

      - 2xx: Success - The action was successfully received,
        understood, and accepted

      - 3xx: Redirection - Further action must be taken in order to
        complete the request

      - 4xx: Client Error - The request contains bad syntax or cannot
        be fulfilled

      - 5xx: Server Error - The server failed to fulfill an apparently
        valid request

   The individual values of the numeric status codes defined for
   HTTP/1.1, and an example set of corresponding Reason-Phrase's, are
   presented below. The reason phrases listed here are only
   recommendations -- they MAY be replaced by local equivalents without
   affecting the protocol.

      Status-Code    =
            "100"  ; Section 10.1.1: Continue
          | "101"  ; Section 10.1.2: Switching Protocols
          | "200"  ; Section 10.2.1: OK
[many fields omitted for brevity]
          | extension-code

      extension-code = 3DIGIT
      Reason-Phrase  = *<TEXT, excluding CR, LF>

   HTTP status codes are extensible. HTTP applications are not required
   to understand the meaning of all registered status codes, though such
   understanding is obviously desirable. However, applications MUST
   understand the class of any status code, as indicated by the first
   digit, and treat any unrecognized response as being equivalent to the
   x00 status code of that class, with the exception that an
   unrecognized response MUST NOT be cached. For example, if an
   unrecognized status code of 431 is received by the client, it can
   safely assume that there was something wrong with its request and
   treat the response as if it had received a 400 status code. In such
   cases, user agents SHOULD present to the user the entity returned
   with the response, since that entity is likely to include human-
   readable information which will explain the unusual status.
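The last paragraph of that section describes a fallback rule that is easy to state in code. Here is a rough sketch in present-day Rust; `effective_code` and the `is_known` predicate are hypothetical names of my own, and the sketch does not cover the separate requirement that unrecognised responses must not be cached.

```rust
/// Returns the code whose semantics should be applied for `code`.
///
/// `is_known` is a caller-supplied (hypothetical) predicate saying which
/// codes the application understands. Unknown codes fall back to the x00
/// code of their class, as RFC 2616 section 6.1.1 requires.
pub fn effective_code(code: u16, is_known: impl Fn(u16) -> bool) -> Option<u16> {
    // Only the classes 1xx to 5xx are defined.
    if !(100..=599).contains(&code) {
        return None;
    }
    if is_known(code) {
        Some(code)
    } else {
        // Integer division drops the last two digits: 431 / 100 * 100 == 400.
        Some(code / 100 * 100)
    }
}
```

For an application that only understands 400 and 404, a 431 response is handled as a 400, which is the example the RFC itself gives.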

The problem

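The key phrases which the current implementation does not take properly into account are that the Reason-Phrase “is intended for the human user”, that the client is not required to examine or display it, and that the listed reason phrases “are only recommendations” which “MAY be replaced by local equivalents without affecting the protocol”. Simply, they boil down to this: the reason phrase is absolutely meaningless (sigh). I didn’t notice or didn’t pay attention to this first time around.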

I also treated the field as case insensitive, having observed things written in lowercase sometimes, and having looked at how something else, I forget what, did it. This is incorrect; while in various places the spec defines things as case insensitive, the Reason-Phrase does not fall under such a category. The Reason-Phrase, in all its uselessness, must thus be considered case sensitive.

Well then—can we just drop the reason phrase?

The answer is both yes and no.

Yes: the integrity of the protocol itself is not affected if the reason phrase is altered; the semantics of the protocol lie purely in the Status-Code.

No: at the lowest level, we must provide the Reason-Phrase intact, because it may be meaningful. For example, someone might be writing an HTTP inspection tool for which they wish to report this value, or someone else might be writing a proxy, where changing the value would be a highly suspect move.

No: although the specification declares the number to be all that matters to a machine, there are cases (mostly with unregistered status codes) where people have used the same status code with multiple distinct meanings, and one may need to figure out what that meaning is. For example, the status code 451 has been used by Microsoft as “Redirect” in Exchange ActiveSync (niche, granted), and as “Unavailable for Legal Reasons” now.

So then, how do we reconcile this?

The possible consequences

By the way, this is a genuine problem and must be fixed. As things stand, people will compare statuses directly and then find their comparisons failing when a server sends a different reason phrase. Those who know of this deficiency might instead compare codes, as status.code() == 200. I do not want either of these to happen.

Some solutions

One solution to the primary symptoms is to change equality checking on a Status (the Eq implementation) to just compare the Status-Code and not the Reason-Phrase. Another method can be provided to check strict equality, inclusive of Reason-Phrase:

impl Eq for Status {
    fn eq(&self, other: &Status) -> bool {
        self.code() == other.code()
    }
}

impl Status {
    fn strict_eq(&self, other: &Status) -> bool {
        self.code() == other.code() && self.reason() == other.reason()
    }
}

This doesn’t sit especially well with me (in Rust, people expect equality comparison to check everything), but it would serve the purpose.
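To make the trade-off concrete, here is roughly how that would behave, sketched in present-day Rust (where the trait to implement is `PartialEq`, and `String` stands in for `~str`) with a cut-down, two-variant Status. Treat it as an illustration of the idea under those assumptions, not as the proposed implementation.

```rust
#[derive(Debug, Clone)]
pub enum Status {
    Ok,
    UnregisteredStatus(u16, String),
}

impl Status {
    pub fn code(&self) -> u16 {
        match self {
            Status::Ok => 200,
            Status::UnregisteredStatus(code, _) => *code,
        }
    }

    pub fn reason(&self) -> &str {
        match self {
            Status::Ok => "OK",
            Status::UnregisteredStatus(_, reason) => reason.as_str(),
        }
    }

    /// Strict equality: both the code and the (case-sensitive) reason match.
    pub fn strict_eq(&self, other: &Status) -> bool {
        self.code() == other.code() && self.reason() == other.reason()
    }
}

// `==` looks at the Status-Code only; the Reason-Phrase is ignored.
impl PartialEq for Status {
    fn eq(&self, other: &Status) -> bool {
        self.code() == other.code()
    }
}
```

Here `Status::Ok == Status::UnregisteredStatus(200, "All Good".to_string())` is true, while `strict_eq` on the same pair is false. That first result is exactly the kind of thing that may surprise someone expecting `==` to check everything.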

Another solution, not incompatible with the first, is to normalise a known status unless the user needs the Reason-Phrase, so that 200 All Good comes through as Ok rather than as UnregisteredStatus(200, ~"All Good"). This could be arranged by giving the response object two fields: status: Status, containing the normalised status, and raw_status: Option<Status>, containing the unnormalised status if it differed. At the lowest level, of course, the status will not be normalised.
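A minimal sketch of the two-field arrangement, in present-day Rust with a cut-down Status, might look like this. `Response::from_wire` and `Status::from_code` are hypothetical names, and the decision to compare the reason case-sensitively against a canonical phrase is my assumption about what “differed” should mean.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Ok,
    NotFound,
    UnregisteredStatus(u16, String),
}

impl Status {
    /// The canonical reason phrase for this status.
    pub fn reason(&self) -> &str {
        match self {
            Status::Ok => "OK",
            Status::NotFound => "Not Found",
            Status::UnregisteredStatus(_, reason) => reason.as_str(),
        }
    }

    /// The registered status for a code, if there is one (hypothetical helper).
    fn from_code(code: u16) -> Option<Status> {
        match code {
            200 => Some(Status::Ok),
            404 => Some(Status::NotFound),
            _ => None,
        }
    }
}

pub struct Response {
    /// The normalised status.
    pub status: Status,
    /// The status exactly as received, present only if it differed.
    pub raw_status: Option<Status>,
}

impl Response {
    pub fn from_wire(code: u16, reason: &str) -> Response {
        let raw = Status::UnregisteredStatus(code, reason.to_string());
        match Status::from_code(code) {
            // Known code: normalise, keeping the raw form only if the
            // reason phrase was not the canonical one.
            Some(known) => {
                let raw_status = if reason == known.reason() { None } else { Some(raw) };
                Response { status: known, raw_status }
            }
            // Unknown code: nothing to normalise to.
            None => Response { status: raw, raw_status: None },
        }
    }
}
```

With this, `200 All Good` gives `status == Ok` with the original kept in `raw_status`, while a plain `200 OK` leaves `raw_status` as `None`.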

Either of these solutions will take care of the majority of cases, and each leaves a gap of potentially surprising (hence undesirable) behaviour. I am mildly inclined at present to go with both, but I would like opinions on the matter.

The status code class

This part is still relevant in take two.

One other thing I am going to add is better support for the class of a status. I don’t want people to be writing status.code() >= 400 && status.code() < 500; they should instead be able to write status.class() == ClientError.
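As a rough sketch of what that could look like (present-day Rust; the `StatusClass` name, the tuple-struct stand-in for Status, and the choice to return an `Option` for codes outside 100 to 599 are all assumptions of mine rather than a settled design):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusClass {
    Informational,
    Success,
    Redirection,
    ClientError,
    ServerError,
}

/// Stand-in for the real Status type; only the numeric code matters here.
pub struct Status(pub u16);

impl Status {
    pub fn code(&self) -> u16 {
        self.0
    }

    /// The class is determined by the first digit of the code alone.
    pub fn class(&self) -> Option<StatusClass> {
        match self.code() / 100 {
            1 => Some(StatusClass::Informational),
            2 => Some(StatusClass::Success),
            3 => Some(StatusClass::Redirection),
            4 => Some(StatusClass::ClientError),
            5 => Some(StatusClass::ServerError),
            // Anything outside 100..=599 has no defined class.
            _ => None,
        }
    }
}
```

Because the class depends only on the first digit, `Status(431).class()` is `Some(StatusClass::ClientError)` whether or not 431 is a status the library knows about.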

The representation technique: enum or struct?

At present Status is an enum. It could also be represented as a simple struct with plenty of constants:

pub struct Status {
    code: u16,
    reason: std::str::SendStr,
}

// 1xx Informational
pub static CONTINUE: Status = Status { code: 100, reason: Slice("Continue"), };
pub static SWITCHING_PROTOCOLS: Status = Status { code: 101, reason: Slice("Switching Protocols"), };
pub static PROCESSING: Status = Status { code: 102, reason: Slice("Processing"), };

// 2xx Success
pub static OK: Status = Status { code: 200, reason: Slice("OK"), };
...

Is this feasible? Sure. Is it better? I don’t know.

- Using a struct and constants would remove the clash on the name Ok which the Result type uses (it would become OK).
- An enum uses less memory and is faster to compare on.
- Using a struct and constants would allow slightly better ergonomics for others using unregistered statuses (they could create their own statics, rather than needing to have a function that generated it, owing to the ~str contained in the current model, though that could also be changed to use SendStr), leveling the field a little on what is an extremely rare case.
- Pattern matching doesn’t work on struct statics (or for that matter, non‐C‐style enums). (Actually, it complains “unsupported constant expr” at the static site, rather than at the match! If you’re careful, you can probably find an ICE nearby; pnkfelix did. Bad.)

At present I’m mildly in favour of the status quo. Either way, if using SendStr, the built‐in statics would still be special if I applied the normalisation technique, unless I were also to retain a mapping of status codes to statuses (an [Option<&'static Status>, ..500], I suppose; nasty global state).
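For illustration, here is a sketch of the struct-and-constants model together with such a mapping, in present-day Rust. I am assuming `Cow<'static, str>` as a stand-in for SendStr, and a `match` in a hypothetical `registered` function stands in for the 500-entry array; `normalise` is likewise a made-up name.

```rust
use std::borrow::Cow;

#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub code: u16,
    // Either a borrowed static phrase or an owned one read off the wire.
    pub reason: Cow<'static, str>,
}

pub static CONTINUE: Status = Status { code: 100, reason: Cow::Borrowed("Continue") };
pub static OK: Status = Status { code: 200, reason: Cow::Borrowed("OK") };

/// Maps a code to the built-in static for it, if there is one.
/// This is the global mapping that makes the built-in statics special.
pub fn registered(code: u16) -> Option<&'static Status> {
    match code {
        100 => Some(&CONTINUE),
        200 => Some(&OK),
        _ => None,
    }
}

/// Replaces a status with the built-in one for its code, where one exists.
pub fn normalise(status: Status) -> Status {
    match registered(status.code) {
        Some(known) => known.clone(),
        None => status,
    }
}
```

A status that someone else defines as their own static gets no entry in `registered`, so it would never be normalised to; that asymmetry is what I mean by the built-in ones still being special.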

Bear in mind also that the HTTP method is currently represented with the same technique (an enum), and the two should, I think, use the same technique in general.

But really, I’m open to being swayed. Do you have an opinion? Same goes for the earlier questions.

Feel free to chip in to the discussion at /r/rust.