Hunchentoot: requests and replies [b]

09d August 30, 2019 -- (tech tmsr)

This post is part of a series on Common Lisp WWWism, more specifically a continuation of "Hunchentoot: requests and replies". In this second part of "requests and replies" we will dissect "replies" and what remains of "requests", in precisely the reverse order.

We ended last time with a look at code that parses GET and POST requests. This time we will look, in the following order, at: a set of "interface methods" aimed to be employed by the user when implementing request handlers; the process-request method; and, last but not least, the methods exposed by the "reply" component.

First, we notice that most request accessors come with wrapper methods whose names end in a "star", which, by some arbitrary convention, means that their "request" parameter is optional and defaults to the value of the special variable *request* [1]. Let's take a look at a few of these:

[sns] script-name*

[qss] query-string*

[gps] get-parameters*

[his] headers-in*

[cis] cookies-in*

[ras] remote-addr*

[rps] remote-port*

[las] local-addr*

[lps] local-port*

[rus] request-uri*

[rms] request-method*

[sps] server-protocol*

The implementation of these functions is, as can be readily observed, trivial, and the meaning should be easily deducible from the name; thus I won't bother the reader with redundant details. However, there's also:
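For the record, the pattern can be sketched in a few lines of plain Common Lisp. This is not Hunchentoot's code: the toy-request class, its toy-script-name reader and the *toy-request* special are stand-ins I made up for illustration. The starred wrapper takes an optional argument that falls back to whatever the special variable currently holds.

```lisp
;; Hypothetical stand-ins; none of these names come from Hunchentoot.
(defvar *toy-request* nil
  "Plays the role of *request*: the request currently being handled.")

(defclass toy-request ()
  ((script-name :initarg :script-name :reader toy-script-name)))

(defun toy-script-name* (&optional (request *toy-request*))
  "Starred wrapper: REQUEST defaults to the current value of *TOY-REQUEST*."
  (toy-script-name request))

(defun demo-star-wrapper ()
  "Call the wrapper once without and once with an explicit request."
  (let ((a (make-instance 'toy-request :script-name "/a"))
        (b (make-instance 'toy-request :script-name "/b")))
    (let ((*toy-request* a))          ; dynamic binding, as a handler would see it
      (list (toy-script-name*)        ; no argument: falls back to *toy-request*
            (toy-script-name* b)))))  ; an explicit argument wins
```

Calling demo-star-wrapper returns the script names of the two toy requests, the first one found through the special variable and the second through the explicit argument.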

[pps] post-parameters*: This one calls post-parameters, which:

[pp] post-parameters: A ":before" method, i.e. it runs before the primary post-parameters method is called. Given a request, it calls maybe-read-post-parameters; if the post-parameters slot is nil, the force parameter is set to true -- in other words, it "forces" maybe-read-post-parameters to populate that slot with the parsed POST parameters if they don't already exist.

Similarly:

[rrp] recompute-request-parameters: Calls maybe-read-post-parameters with the force argument always set to true. Also, set the get-parameters to the re-parsed value of query-string.
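The ":before" trick used by post-parameters is easier to see in isolation. Below is a minimal sketch, assuming a made-up lazy-request class whose "body" is already a list of pairs; maybe-parse-params and parse-count are likewise hypothetical and exist only so the effect can be observed.

```lisp
;; Hypothetical names throughout; this only mirrors the shape of the idea.
(defclass lazy-request ()
  ((raw-body :initarg :raw-body :reader raw-body)
   (parsed-params :initform nil :accessor parsed-params)
   (parse-count :initform 0 :accessor parse-count)))

(defun maybe-parse-params (request &key force)
  "Fill the PARSED-PARAMS slot from RAW-BODY, but only when FORCE is true."
  (when force
    (incf (parse-count request))
    ;; Toy parser: the body is already a list of (name . value) pairs.
    (setf (slot-value request 'parsed-params)
          (copy-alist (raw-body request)))))

(defmethod parsed-params :before ((request lazy-request))
  ;; Runs before the reader proper; force parsing only while the slot is empty.
  ;; SLOT-VALUE is used here so that the check doesn't re-enter this method.
  (maybe-parse-params request
                      :force (null (slot-value request 'parsed-params))))
```

Reading parsed-params twice on the same object parses the body only once, since the second read finds the slot already populated.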

However, we also have:

[hi] header-in: Gets a specific header from the headers-in slot.

[his2] header-in*: The same as header-in, with the request parameter bound to *request*.

And there are also these methods, which default their request parameter to *request* in the same way, although their names don't end in "star" [2]:

[a] authorization: Reads the "authorization" header and tries to parse the user/password combination included in it, if present. Notice the magic number 5 used in this shitty piece of coad, because writing a proper parser for yet-another-set-of-ad-hoc-encoded-parameters is too much work.

[rra] real-remote-addr: Reads the "X-Forwarded-For" header and, if it exists, parses the client and proxy address fields and returns them. There's apparently no "official spec" to be found for this; the whole thing seems to be superseded by some other syntax specified in some RFC. Hunchentoot doesn't parse this other one, however, so... well!
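To make the "client and proxy address fields" concrete, here is a rough sketch of splitting such a header value, assuming the usual comma-separated "client, proxy1, proxy2" shape. The function name is made up and the hand-rolled splitter is only a stand-in so that the example needs no libraries; it is not how Hunchentoot does it.

```lisp
(defun split-forwarded-for (header)
  "Split an X-Forwarded-For value such as \"client, proxy1, proxy2\" on commas.
Returns the first address and the full list as two values."
  (let ((addresses
          (loop with start = 0
                for comma = (position #\, header :start start)
                ;; Take everything up to the next comma (or the end of string).
                collect (string-trim " " (subseq header start comma))
                while comma
                do (setf start (1+ comma)))))
    (values (first addresses) addresses)))
```

The first value is what one would treat as the client address; whether to trust it at all is a separate question, since the header is set by whoever sits in front of the server.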

[h] host: Reads the "Host" header.

[ua] user-agent: Reads the "User-Agent" header.

[ci] cookie-in: Looks up the cookie with the given name in the cookies slot.

[r] referer: Reads the "Referer" header.

[gp] get-parameter: Looks up a specific GET parameter in get-parameters.

[pp2] post-parameter: Looks up a specific POST parameter in the post-parameters slot.

[p] parameter: Looks up a GET parameter, and, if not found, a POST parameter with a specific name.

[hims] handle-if-modified-since: Reads the "If-Modified-Since" header and compares it with a given time argument. If the two dates match, then a. set the reply as follows: a1. the content-length to nil; a2. remove the content-length header; a3. the return-code to http-not-modified; then b. abort-request-handler.

[rpd] raw-post-data: a. First, try to set an external-format based on the values of arguments force-binary, external-format and force-text; if no external-format is set and force-binary is not set, then try getting one from external-format-from-content-type, and if that fails, fall back to *hunchentoot-default-external-format*.

b. Get the raw-post-data slot; if not set, then get post data by calling get-post-data. Given a local binding to raw-post-data, the return value is determined based, in order, on the following conditions: b1. if raw-post-data is a stream, then return it; else b2. if raw-post-data is t or nil, return nil; else b3. if external-format was set, call octets-to-string on raw-post-data and return the result; otherwise b4. return raw-post-data as-is.

[arv] aux-request-value: If the given request is non-nil, then look up a value in the alist given by the aux-data slot. Return, as multiple values, both the value and the symbol-value pair itself.

There's also a defsetf defined for this function, which looks up the value and a. if it exists, sets the new value, and b. if it doesn't, pushes a new symbol-value pair to aux-data.

[darv] delete-aux-request-value: Similarly to aux-request-value, operates on aux-data; this one deletes a symbol-value binding from the alist.
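The alist bookkeeping behind these three can be sketched as follows. All names are hypothetical, the holder class stands in for the request object, and I'm using a plain setf function instead of defsetf to keep the example short; the observable behaviour (overwrite if present, push otherwise) is the one described above.

```lisp
;; Hypothetical miniature of the aux-data bookkeeping; names are invented.
(defclass aux-holder ()
  ((aux-data :initform '() :accessor aux-data)))

(defun aux-value (symbol holder)
  "Return the value stored under SYMBOL and, as a second value, the pair itself."
  (let ((pair (assoc symbol (aux-data holder) :test #'eq)))
    (values (cdr pair) pair)))

(defun (setf aux-value) (new-value symbol holder)
  "Overwrite an existing pair, or push a fresh one when SYMBOL is absent."
  (let ((pair (assoc symbol (aux-data holder) :test #'eq)))
    (if pair
        (setf (cdr pair) new-value)
        (push (cons symbol new-value) (aux-data holder)))
    new-value))

(defun delete-aux-value (symbol holder)
  "Drop the pair stored under SYMBOL, if any."
  (setf (aux-data holder)
        (delete symbol (aux-data holder) :key #'car :test #'eq))
  (values))
```

Setting the same key twice leaves a single pair in the alist, which is the whole point of looking the pair up before pushing.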

[rp] request-pathname: Given a request, a. get its script-name; and b. parse-path. If drop-prefix is set to a string representing a path prefix, then the pathname is returned sans said prefix.

[pp3] parse-path: Sanitization function used by request-pathname to bring a path-string to the proper abstraction level, i.e. a Common Lisp pathname. After a. parsing the namestring using either parse-namestring or some implementation-specific function, e.g. SBCL's parse-native-namestring, run the following checks:

b. the host field of the pathname is nil, or it equals that of *default-pathname-defaults*; c. the device field of the pathname is nil, or it equals that of *default-pathname-defaults*; d. the directory component is nil, or it's a relative pathname without :up and :wild; e. the pathname name and type fields are either nil or strings; f. the namestring isn't "..".

g. When (b)-(f) are satisfied, return the parsed pathname.
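Checks (b)-(f) translate almost word for word into standard pathname accessors. The sketch below skips step (a) on purpose, since namestring parsing is implementation-specific, and takes an already constructed pathname instead; the function name is made up, and the directory check is my reading of "relative, without :up and :wild", namely that every directory component must be a plain string.

```lisp
(defun acceptable-relative-pathname-p (pathname)
  "Toy version of checks (b)-(f), applied to an already parsed PATHNAME."
  (let ((directory (pathname-directory pathname))
        (defaults *default-pathname-defaults*))
    (and
     ;; (b) no host, or the default one
     (or (null (pathname-host pathname))
         (equal (pathname-host pathname) (pathname-host defaults)))
     ;; (c) no device, or the default one
     (or (null (pathname-device pathname))
         (equal (pathname-device pathname) (pathname-device defaults)))
     ;; (d) relative directory made of plain strings: no :up, no :wild
     (or (null directory)
         (and (eq (first directory) :relative)
              (every #'stringp (rest directory))))
     ;; (e) name and type are nil or strings
     (or (null (pathname-name pathname))
         (stringp (pathname-name pathname)))
     (or (null (pathname-type pathname))
         (stringp (pathname-type pathname)))
     ;; (f) the file part isn't ".."
     (not (equal (file-namestring pathname) ".."))
     t)))
```

A relative pathname such as the one built by (make-pathname :directory '(:relative "img") :name "logo" :type "png") passes, while anything absolute, anything containing :up and anything wild is turned down.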

Since all these functions bind at least one of their arguments to *request*, we also have:

[wrp] within-request-p: When *request* is bound, return it.

And finally:

[pr] process-request: This function is on the main request processing/handling path, and is called by process-connection whenever a new HTTP request is available. It does a bunch of more or less related things; let's take them one by one.

a. the first part of the function a1. defines and binds some special variables, e.g. the current *request* being processed, and a2. wraps everything into some condition handling code, namely: a2i. it "maps" all conditions [3]; a2ii. wraps this in an unwind-protect [4]; a2iii. all this wrapped in a catch [5].

b. the code at (a) wraps b1. a local procedure definition, report-error-to-client, which logs an error and returns an http-internal-error, i.e. start-output on the result of an acceptor-status-message; and b2. a call to handle-request wrapped in a catch for "handler-done" [6], with the return values for handle-request bound to "contents", "error" and "backtrace"; b3. if "error", then report-error-to-client; b4. if headers are not yet sent, start-output with the return code of *reply* and whatever contents we have; b5. if no contents are set, then get a default page from acceptor-status-message; b6. if an error occurs during (b4)-(b5), then call report-error-to-client.

c. this occurs on the "cleanup-form" part of the unwind-protect at (a2ii): if there are any temporary files that were set up during the function, delete them.
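Stripped of everything HTTP, the nesting at (a)-(c) looks roughly like the toy below. This is a sketch under heavy assumptions: the condition mapping and the error reporting are left out, "sending" is replaced by recording events in a list, and every name except the two catch tags is invented.

```lisp
;; Toy skeleton; only the catch tags REQUEST-PROCESSED and HANDLER-DONE
;; are taken from the text, everything else is made up for illustration.
(defun toy-process-request (handler &key head-request-p)
  "Run HANDLER inside the catch / unwind-protect nesting described above.
Returns the list of events in the order they happened."
  (let ((events '()))
    (flet ((note (event) (push event events)))
      (catch 'request-processed                    ; (a2iii) outermost catch
        (unwind-protect                            ; (a2ii)
             (let ((contents (catch 'handler-done  ; (b2)
                               (funcall handler))))
               (note (list :headers-sent contents))
               (when head-request-p
                 ;; HEAD: bail out once the headers are out.
                 (throw 'request-processed nil))
               (note :body-sent))
          (note :cleanup))))                       ; (c) runs on every exit path
    (nreverse events)))
```

A handler that throws to handler-done still gets its thrown value treated as the contents, and a HEAD request skips the body but not the cleanup, which is exactly the reason the unwind-protect sits inside the outer catch.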

This function is all over the fucking place, owing mainly to the pretense of "modularity" and separation between requests and replies -- I did say they're part of the same logical unit, didn't I? Just look at it: "request processing" calls "start output" -- which by the way, is part of "headers", right? but this same "start output" is the one which actually delivers a response to the client, which response delivery oughta be part of "reply"!! So to conclude: this whole shit is in dire need of refactoring and ultimately a complete rewrite. No, no, this time I'm pretty damn sure it's not the protocol's fault for this abomination.

Now that we've exhausted request.lisp, we're left with:

[[]] reply: The "response" counterpart to request objects; holds the response content-type, length, headers, return code, encoding and cookies. Implemented methods:

[ii3] initialize-instance: ":after" the reply object is instantiated, set content-type header to the *default-content-type*.

The following are, similarly to the request accessor functions, wrappers which implicitly bind their reply argument to the *reply* special. Most of them also come with a setf definition, which simply sets the associated slot of the reply argument (also bound to *reply* unless specified) to a given new-value. I shan't bother to give details where there are none; here they are:

[hos] headers-out*

[cos] cookies-out*: setf-er.

[cts] content-type*: setf-er.

[cls] content-length*: setf-er.

[rcs] return-code*: setf-er.

[refs] reply-external-format*: setf-er.

[hosp] header-out-set-p: Looks up an output header and returns true if found.

[ho] header-out: Looks up a(n output) header and, if found, returns its value. Unlike the previous function, this one also comes with a setf-er which e.g. performs conversions if the name provided is a string etc. and checks the types of content-type and content-length headers.

[co] cookie-out: Looks up a cookie in the respective alist and returns its value if found.

This is it, then: a complete review of all the core architectural components of Hunchentoot. Remember, this program actually works, and not by mere happenstance, but quite deliberately, as the result of what I expect was hard work to bake this tangled shawarma. Note that I don't plan to add any functionality to the current mess, at least not until refactoring/rewriting/bulldozing it, preferably on top of a not-completely-fucked WWW stack. If you want to make a Lisp logger/blog/web front-end using current-day WWW, then it seems unfortunately that this is the best you got.

Having said that, the next episode in the Hunchentoot saga will reveal two items: a genesis of the Hunchentoot that was reviewed here, and one of Hunchentoot plus all its dependencies, for which I can only vouch by noting that they sorta work, not that they do precisely what's written on the label. Then I can start building around this: a comment mechanism for this blog, an IRC logger, a pastebin and all those other fundamental webthings that keep things running.


  1. Special variables have a special meaning in Common Lisp -- see what I did there? Namely, a special variable may be declared globally and "bound dynamically", i.e. its value may depend on its current execution context, such as, say, the function or the thread where execution takes place, if it's been bound this way using e.g. a let -- the naive me, who started his adventure in Lisp programming in Scheme, thought let can only be used for lexical bindings, and yet... look! Now, if there is no such "dynamic binding" for that variable, then the variable's global binding is used, which binding is shared between execution contexts.

    Yeah, what can I say... I didn't write this language, okay?
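For the reader who, like me, arrived here from Scheme, a minimal illustration of the above; *depth* and the two functions are made-up names. Note that current-depth never receives the variable as an argument, yet it sees whatever binding is in effect at the time of the call.

```lisp
(defvar *depth* 0
  "A special variable with a global value of 0.")

(defun current-depth ()
  "Reads *DEPTH* without receiving it as an argument."
  *depth*)

(defun demo-dynamic-binding ()
  (list (current-depth)       ; no dynamic binding yet: the global value
        (let ((*depth* 1))    ; LET on a special variable binds it dynamically
          (current-depth))    ; the callee sees the new binding
        (current-depth)))     ; the binding is undone once LET exits
```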

  2. Not like this so-called convention is specified anywhere, so we're stuck guessing why these are named this way while the others are named the other. I could come up with some ideas, but why bother.

  3. with-mapped-conditions is actually a usocket macro which, annoyingly enough, is not documented. Long story short, it can be used to define a context in which all usocket conditions are safely handled. I guess I'll get to this when usocket is to be either sanitized or ripped altogether out of Hunchentoot.

  4. Which ensures that the code at the end of the function gets executed no matter what conditions occur during execution.

  5. More precisely, "request-processed" throws are caught here, which, as the comment describes, are thrown by start-output after responding to a HEAD request. The idea is, when the request method is HEAD, exit process-request once the headers are sent, bypassing all the other request handling code.

    Since we're here, let's examine start-output: this is the function that pushes the actual response headers and content to the socket.

    a. determine whether the keep-alive should be set: if either: a1. we used chunked transmission; or a2. the request method is HEAD; or a3. the return code is http-not-modified; or a4. the content-length or the content are set; then set keep-alive-p to true.

    b. when the acceptor requests output chunking, set the Transfer-Encoding header to chunked; c. if c1. keep-alive-p is set to true, then set *finish-processing-socket* to nil; if a keep-alive was requested, then set the Connection header to Keep-Alive, and set the keep-alive timeout; otherwise, if c2. the Connection header is not set, set it to Close.

    d. set the Server header, if the acceptor contains one and it wasn't already set; e. set the Date header; f. do some weird-ass URL rewriting for sessions*; g. convert the content to the encoding given by reply-external-format*; h. set the content-length.

    i. if *headers-sent* was set, then return; otherwise j. set *headers-sent* to true and call send-response; then k. if the request method is HEAD, throw a request-processed to end processing; otherwise, l. when chunking is in use, make our stream into a "chunked stream".

    From the sausage above, send-response is what does the actual pushing of the first response line, headers, content and so on.

    The only thing this whole load of crap gives is the ability for the user to do his own processing, by writing a custom start-output which throws its own request-processed and bypasses the entire process-request control flow. This is, as far as I can tell, evidence of serious brain rot on the part of whoever wrote this, since the user can write his own thing by simply modifying the existing code, otherwise there being really no need for all this imagined flexibility.

    ---
    *: I don't know what the fuck this is, make sure to keep note of it if you ever plan to use the "sessions" thing.

  6. We get there via abort-request-handler, basically.