Last modified: 2021-08-30 20:52

Adding HTTP to Gopher Clients

Why add HTTP to gopher?

The final trigger to add HTTP to a gopher client was a very simple question: How can I add content to gopher space? I mean, where is the server? Here are some possibilities:

  1. I have seen that there are people offering gopher space for free (e.g. tilde.something). This could be an option.

  2. Running a server at home. This can quickly become a security nightmare so I'm wondering how this is properly done.

  3. Have you ever tried to get a gopher server from a commercial vendor? They sell LAMP-servers (Linux, Apache, MySQL/MariaDB and PHP) of all sizes but no gopher.

That is where I started, and putting stuff on a tilde server looks like the best answer - at least for now. Later I might go with a server hosted at home. On the other hand, at least some tilde servers have content restrictions (no DOS execs, no Java archives), which is ok (their server, their rules), but I might still want to put that kind of content online.

But I was also already thinking about adding HTTP to gopher for the following reasons:

  1. The server side is what I just explained: It is difficult to get a gopher server but HTTP servers are a commodity. (And yes, I have one.)

  2. Then there is also the client side. Most Internet clients don't speak gopher. What they do speak is HTTP. Formatting the content in a suitable way for HTTP makes the files available to a larger audience. This could also bring "normal people" into touch with gopherspace.

  3. A gopher client that is able to render some simple HTML could also fetch regular HTTP/HTML content and show it to the user without making the user switch from the client to a web browser.

The last one is not so important yet but the first two address the availability and accessibility of original Gopher content, which I think are important.

Of course, adding HTTP to a gopher client doesn't turn the server into a full gopher(+) server. It only enables clients to fetch files by HTTP. So what needs to be done?

HTTP and Gopher: Protocol Basics

Some time ago I read that HTTP is difficult compared to gopher. I agree that modern and complete HTTP may be difficult, but to get files from an HTTP server it is sufficient to send this:

GET /gopher/gopherplus.txt HTTP/1.0
Host: www.example.com

(This seems to work most of the time, but e.g. accessing tools.ietf.org returns an error message.)

The whole request consists of three lines:

  1. The GET method along with the path (or "selector" in gopher terms) and the protocol version.
  2. The HTTP host to which the request goes. This enables virtual servers (multiple servers running on the same IP; gemini has a different mechanism for that) and is mandatory most of the time.
  3. An empty line to end the request. (A client can basically send "any" amount of request headers to the server.)

That is more than a gopher request, which would be only the selector:

/gopher/gopherplus.txt

Yes, that's definitely less than HTTP but that doesn't make HTTP "difficult".
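To put the two requests side by side, here is a minimal Python sketch that builds both as bytes. The function names http_request and gopher_request are made up for this illustration and are not taken from any client.

```python
def http_request(host, path):
    """Build the three-line HTTP/1.0 request: GET line, Host line, empty line.

    Hypothetical helper for illustration only.
    """
    lines = [
        "GET {} HTTP/1.0".format(path),
        "Host: {}".format(host),
        "",  # the empty line ends the request header
    ]
    # HTTP uses CRLF as line terminator.
    return ("\r\n".join(lines) + "\r\n").encode("ascii")


def gopher_request(selector):
    """Build a plain gopher request: just the selector plus CRLF."""
    return (selector + "\r\n").encode("ascii")


print(http_request("www.example.com", "/gopher/gopherplus.txt"))
print(gopher_request("/gopher/gopherplus.txt"))
```

The HTTP variant is two lines longer, but both are a single string written to the socket.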

The response is slightly different. For Gopher0 the server just sends the data (gopher+ would put a status flag plus length information into the first line). An HTTP server may send all kinds of stuff (including e.g. cookie requests) in the response header but the really important things are:

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 34848

  1. The first line shows the request's status code: 200 is ok and what you usually should get back. 3xx is a redirection if the URL does not (or no longer) match the item, and 4xx and 5xx indicate a request or server error. The thing with errors is that you can simply display the error message as text to the user. What else could be done?

  2. The Content-Type header tells the client the type of the data it is going to receive. If the request is based on a selector from a gopher directory it could be ignored because the directory entry comes with a type indicator. That is, the client knows the content-type before sending the request, but it is also not too difficult to integrate that header into the client code (which is what I would recommend). Notice that for text/plain it may be omitted because that's the default content-type.

  3. The Content-Length header can be safely ignored and the client can read until the server closes the connection (just as with gopher). (The content's length is important for "keep-alive" transactions.)

  4. Finally there's an empty line telling the client that the header is finished and that the next data is the content.
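The four points above can be sketched in Python roughly as follows. This is not the code from any existing client; parse_http_response and fetch are hypothetical names, and the sketch assumes the server terminates header lines with CRLF and closes the connection after an HTTP/1.0 request.

```python
import socket


def parse_http_response(raw):
    """Split a complete HTTP response (bytes) into status, reason, headers, body."""
    # The first empty line separates the header from the content.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line, e.g. "HTTP/1.1 200 OK"
    parts = lines[0].split(" ", 2)
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, reason, headers, body


def fetch(host, path, port=80):
    """Send the minimal request and read until the server closes the connection."""
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # connection closed: Content-Length is not needed
                break
            chunks.append(data)
    return parse_http_response(b"".join(chunks))


sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
print(parse_http_response(sample))
```

A client would typically show the body for status 200 and display the status line and body as text for 4xx and 5xx.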

I agree that this is more to do than for gopher but really, I wouldn't call that difficult.

Patch Example

I have written a patch to add HTTP to vf-1 (get it from GitHub or from here). It adds 81 lines of code to vf-1, increasing its length from 944 lines to 1025 (not counting empty lines or comments), adding approx. 8.6% to the code base. The patch includes

and for this amount of added "complexity" vf-1 is now able to talk to today's dominant server type. I think that that is a good deal.

Here's the patched version: vf-1.

Differences

Well, there are differences between Gopher and HTTP but as shown above they are not in the transfer protocol. Instead, they are in the content and the addressing scheme. HTTP is short for "Hyper Text Transfer Protocol". As outlined above, the problems are not in the TP (transfer protocol) but in the "Hyper Text" part.

"Hyper Text" means HTML and inside HTML you will find hyperlinks like

<A HREF="gopherplus.html">Gopherplus Specification</A>

Any HTML renderer knows how to display that and how to feed it to the HTTP client when the user chooses to follow the link. A gopher client would be lost (but the functionality can be easily added).

Why would a gopher client have problems with the link above? If the link above were part of a gopher directory, it might be something like

hGopherplus<tab>/gopher/gopherplus.html<tab>gopher.example.com<tab>70

There are two significant differences to the HREF from above:

  1. The gopher directory indicates the type of the referenced file, h in the example. This is missing in the HREF; there is no GOPHER-TYPE attribute.

  2. Gopher doesn't know anything about relative URLs. Gopher directories always list the full path to the item. The path element is/was not even called path, but selector.
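To make the two differences concrete, here is a small Python sketch that splits the directory line from above into its fields. The function name parse_directory_line and the dictionary keys are assumptions for this illustration.

```python
def parse_directory_line(line):
    """Split one gopher directory line into type, display text, selector, host, port.

    Hypothetical helper; the field names are chosen for this example.
    """
    line = line.rstrip("\r\n")
    # The first character is the item type, the rest is tab-separated.
    item_type, rest = line[0], line[1:]
    display, selector, host, port = rest.split("\t")[:4]
    return {
        "type": item_type,
        "display": display,
        "selector": selector,
        "host": host,
        "port": int(port),
    }


entry = parse_directory_line(
    "hGopherplus\t/gopher/gopherplus.html\tgopher.example.com\t70"
)
print(entry)
```

Everything the client needs (type, full selector, host and port) is in that one line, whereas the HREF only carries a relative file name.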

The issue with the missing GOPHER-TYPE attribute can be solved by looking at HTTP's content-type header. From that the client knows what to do with the data. (The user still does not know what content they will get by clicking on the link.) But this works only if the file lives on an HTTP server. Consider the case where the referenced HTML file is on a gopher server. There are no content-types in the server response because the type is in the gopher directory (which is not present here).

So, what now? The answer is pretty easy: the client inspects the file extension. In the world of 2021 it's safe to assume that e.g. .html is text/html and .png is image/png. In fact, that's how HTTP servers determine a file's content-type. They take the extension, look it up in /etc/mime.types (or whatever is configured) and use that as the type for the response. This does not solve situations where the MIME-type definitions differ between client and server. But again, in 2021 I expect that this is a very rare case.

This is effectively how gc addresses the whole content-type story: take the gopher type from the directory as the first piece of information but take the server's content-type header if present. Furthermore, if the content-type is application/octet-stream (either because of a gopher type of 4, 5, 6 or 9 or because of the content-type header) gc reads /etc/mime.types to determine the file's type (e.g. a PDF) based on the extension.
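The order of precedence described above can be sketched in Python like this. It is not gc's actual code: resolve_content_type and the GOPHER_TYPES table are assumptions for this example, and instead of reading /etc/mime.types directly it uses Python's standard mimetypes module.

```python
import mimetypes

# Assumed mapping from gopher item types to MIME types (illustration only).
GOPHER_TYPES = {
    "0": "text/plain",
    "h": "text/html",
    "g": "image/gif",
    "4": "application/octet-stream",
    "5": "application/octet-stream",
    "6": "application/octet-stream",
    "9": "application/octet-stream",
}


def resolve_content_type(selector, gopher_type=None, header=None):
    """Pick a content type: directory type first, header overrides, extension last."""
    # 1. Start with the gopher type from the directory, if there is one.
    content_type = GOPHER_TYPES.get(gopher_type)
    # 2. A Content-Type header from the server takes precedence.
    if header:
        content_type = header.split(";", 1)[0].strip().lower()
    # 3. For unknown or generic binary data, fall back to the file extension.
    if content_type in (None, "application/octet-stream"):
        guessed, _encoding = mimetypes.guess_type(selector)
        if guessed:
            content_type = guessed
    return content_type or "application/octet-stream"


print(resolve_content_type("/doc/spec.pdf", gopher_type="9"))
print(resolve_content_type("/gopher/gopherplus.html"))
```

The second call covers the case of a link found inside an HTML file on a gopher server, where neither a directory type nor a header is available.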

That was for issue #1, the missing GOPHER-TYPE in A tags. Now what about #2, relative URL addressing? Here are significant differences between HTTP and gopher addresses:

  1. HTTP paths always begin with a slash (/).
  2. Directory paths always end with a slash.
  3. Links (HREFs) that do not begin with a slash (or server name of course) are relative to the directory of the referencing document.

None of these three statements is true for gopher. You can design your server to use gopher selectors that have the properties above but the client can not know, assume or rely on this.

In the context of gopher directories and plain-text files this problem does not exist. The directories are created by the server and the server can insert absolute paths to the items. And plain-text was not expected to have links inside. However, times have changed. Today we might want to put HTML or gemtext on gopher servers. The original Gopher0 did already know about HTML texts and Gopher+ supports them as well. The thing is simply that relative addressing from HTML files is not working well.

I think it is suitable that gopher clients (this problem is not on the server side) implement relative URL addressing as HTTP browsers do. This will not work with all servers because the selectors must have the three properties above. If that's not the case, it doesn't break gopher; the server simply can't serve HTML files (or other formats with relative links inside) and expect the client to follow the links. So, by adding relative addressing to the client, it's up to the server to support this optional feature or not by implementing item selectors according to HTTP's URL scheme.
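A minimal Python sketch of such relative addressing, assuming the selectors follow the three HTTP-style properties above: it reuses urljoin from the standard library on plain selectors. The function name resolve_href is made up for this example.

```python
from urllib.parse import urljoin


def resolve_href(base_selector, href):
    """Resolve an HREF against the selector of the referencing document.

    Assumes HTTP-style selectors: leading slash, directories end with a slash.
    """
    # urljoin handles absolute paths, "../" segments and plain file names.
    return urljoin(base_selector, href)


print(resolve_href("/gopher/index.html", "gopherplus.html"))
print(resolve_href("/gopher/docs/", "../images/logo.png"))
print(resolve_href("/gopher/index.html", "/other/x.html"))
```

If a server uses selectors that do not look like paths, the result is simply a selector the server does not know, which the client can report like any other failed request.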

One additional note on this: If the gopher server expects the gopher type character in the path (UMN gopher is doing this, right?) then relative addressing will not work when the link goes to e.g. a picture, because the path would need g or I as indicator but not h.

An Unsolved Issue

So it is basically possible to make Gopher clients fetch files from HTTP-servers and to implement rewriting relative URLs the way web browsers do it. However, there is something that can not be solved that easily. Consider

Now what is the correct Gopher+ information for b? Is this in the +-block in D's $ listing? Or does the client have to pick up that information from B with a ! request to b? But HTTP servers have never heard of $ or ! requests. $ can be delivered by putting an appropriate file on the server, but for ! an information file must be created for each file on the server. Furthermore, the client must have some knowledge about the !-file's filename to read for the information block. That could be appending a .plus to the filename or similar. But there is no such arrangement. (Gopher+ solved that by defining the ! request.)

A CGI script could be an option for the individual !-files. The script would query the $-file and extract the relevant data. But first, I would prefer something that does not require scripting and second, this would still not answer the question of which filename or URL to request when the client wants the +-information.

The whole problem of possibly multiple information blocks is not related to HTTP but comes with Gopher+. Clients can retrieve the extra information with a ! request, which answers the access question. Still there is no answer as to which +-block is correct or fits better. Since I also don't see a compelling reason or an obvious answer to address that question, I tend to leave things as they are by not using !-requests, neither for Gopher+ nor for HTTP.