Feedbot [i]: the feed db; its manipulation; and the feed checker

^08a March 23, 2019 -- (tech tmsr)

This post introduces the first major component of Feedbot: the feed checker, accompanied by its building blocks. Although this part contains no actual IRC code¹, it rests on top of the Botworks V tree, more precisely Trilemabot:

the V patch; and
my seal.

The rest of this post discusses design and implementation details, although most of the content here is already contained in the patch².

I. The feed db

Feedbot operation is centered around a data structure that I've so pompously called a "feed database" (or feed db). In more detail:

* A feed db is a list of feeds.

* A feed is a list of the form:

(url :title title :entries entries :rcpts rcpts)

where url and title are strings, entries is a list of entries, and rcpts is a list of recipients³.

* An entry is a list of the form:

(entry :id id :title title :link url)

where entry is the CL symbol ENTRY; id, title and url are strings. Note that in practice, id and url often (but not always) match.

* A recipient is a string denoting a nick who will receive new entries when they are added to the database.
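
As a concrete illustration of the layout described above, here is a minimal sketch of a feed db holding a single feed with one entry and one recipient. The title, the entry fields and the nick are invented for the example; only the list shapes come from the description.

```lisp
;; A hypothetical feed db literal. Every string except the feed url is
;; made up for illustration.
(defparameter *example-feed-db*
  (list
   (list "http://thetarpit.org/rss.xml"
         :title "Example feed"
         :entries (list (list 'entry
                              :id "http://example.org/posts/1"
                              :title "An example post"
                              :link "http://example.org/posts/1"))
         :rcpts (list "somenick"))))

;; A feed is its url followed by a property list, so GETF on the REST
;; reads any field; the same holds for an entry after the ENTRY symbol.
(let* ((feed (first *example-feed-db*))
       (entry (first (getf (rest feed) :entries))))
  (format t "feed url: ~a~%" (first feed))
  (format t "entry id: ~a~%" (getf (rest entry) :id))
  (format t "rcpts:    ~a~%" (getf (rest feed) :rcpts)))
```

Since the url sits in the CAR of each feed, the feed db as a whole can also be treated as an association list keyed by url.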

II. Feed db manipulation

Functionality pertaining to the feed db is split into the following categories:

a. "Low-level" functions operating on the feed db, feeds, entries and recipients; examples include setting the title of a feed, adding entries or searching for a recipient.

For example, the functions below:

(defun lookup-feed! (feed-id feed-db)
  (assoc feed-id feed-db :test #'string=))

(defun find-entry-in-feed! (feed entry-id)
  (find entry-id (get-feed-entries! feed)
        :key #'get-entry-id
        :test #'string=))

look up a feed in a given feed db and, respectively, an entry within a feed.
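
To try these two functions at the REPL, the accessors they call are needed as well. The post uses the names get-feed-entries! and get-entry-id but does not show their bodies, so the definitions below are guesses derived from the list layout in section I, and the demo data is invented.

```lisp
;; Assumed accessors: the names come from the post, the bodies are
;; guesses based on the (url :title ... :entries ...) layout.
(defun get-feed-entries! (feed)
  (getf (rest feed) :entries))

(defun get-entry-id (entry)
  (getf (rest entry) :id))

;; Reproduced from the post so that the example is self-contained.
(defun lookup-feed! (feed-id feed-db)
  (assoc feed-id feed-db :test #'string=))

(defun find-entry-in-feed! (feed entry-id)
  (find entry-id (get-feed-entries! feed)
        :key #'get-entry-id
        :test #'string=))

;; Made-up data for the demonstration.
(defparameter *demo-db*
  (list (list "http://example.org/rss.xml"
              :title "Example"
              :entries (list (list 'entry :id "id-1" :title "First"
                                   :link "http://example.org/1"))
              :rcpts nil)))

(let ((feed (lookup-feed! "http://example.org/rss.xml" *demo-db*)))
  (print (find-entry-in-feed! feed "id-1"))   ; the matching entry list
  (print (find-entry-in-feed! feed "id-2")))  ; NIL, no such entry
```

Both lookups are linear scans, which seems adequate as long as the feed db stays a plain list of modest size.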

b. A macro, with-feed-db, providing a thread-safe context for feed db processing; see notes below for further details. The implementation is reproduced below:

(defmacro with-feed-db ((feed-db) bot &body body)
  "Execute code within the thread-safe `feed-db' scope of `bot'."
  (with-gensyms (db-mutex)
    `(with-slots ((,feed-db feed-db) (,db-mutex db-mutex))
         ,bot
       (with-mutex (,db-mutex)
         ,@body))))
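
A minimal sketch of how the macro might be exercised on its own, assuming SBCL (for SB-THREAD). The demo-bot class is a hypothetical stand-in for the real feedbot class, carrying only the two slots the macro touches, and with-gensyms is defined locally here since the post does not say where the patch gets it from.

```lisp
;; Local WITH-GENSYMS; the patch presumably takes it from elsewhere.
(defmacro with-gensyms ((&rest names) &body body)
  `(let ,(loop for n in names collect `(,n (gensym ,(string n))))
     ,@body))

;; Hypothetical stand-in for the feedbot class.
(defclass demo-bot ()
  ((feed-db :initform nil)
   (db-mutex :initform (sb-thread:make-mutex :name "feed-db"))))

;; Reproduced from the post, with the mutex macro package-qualified.
(defmacro with-feed-db ((feed-db) bot &body body)
  "Execute code within the thread-safe `feed-db' scope of `bot'."
  (with-gensyms (db-mutex)
    `(with-slots ((,feed-db feed-db) (,db-mutex db-mutex))
         ,bot
       (sb-thread:with-mutex (,db-mutex)
         ,@body))))

(defparameter *bot* (make-instance 'demo-bot))

;; DB names the slot itself (via WITH-SLOTS), so PUSH modifies the
;; feed db stored in the bot rather than a local copy.
(with-feed-db (db) *bot*
  (push (list "http://example.org/rss.xml" :title "" :entries nil :rcpts nil)
        db))

(print (with-feed-db (db) *bot* (length db)))  ; 1
```

The gensym keeps the mutex binding out of reach of the body, so user code inside with-feed-db can only see the name it chose for the feed db.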

c. Interface, or "high-level" methods to be called by e.g. the bot operator or by the IRC-facing code. These typically bear the following form:

(defmethod feedbot-... ((bot feedbot) arg1 arg2 ...)
  ...)

For example:

(defmethod feedbot-get-or-create-feed ((bot feedbot) feed-id)
  "Get feed with id `feed-id' from the feed db of `bot'.

If `feed-id' doesn't point to a feed, a new feed with that id is created
and inserted into the feed db."
  (with-feed-db (feed-db) bot
    (let ((feed (lookup-feed! feed-id feed-db)))
      (when (not feed)
        (setq feed (list feed-id :title "" :entries nil :rcpts nil))
        (push feed feed-db))
      feed)))

is self-explanatory.

Note: feedbot operates in a concurrent environment, where multiple threads may access the feed db at a given time; for example, the feed checker and SBCL's shell. Thus, all (c) functions are implemented in terms of (a), and furthermore, they use (b) in order to ensure thread-safety. We distinguish between thread-safe and unsafe functions by employing the following convention:

Feed db functions whose names end in ! (also called !-functions below) are thread-unsafe and should be used only in conjunction with the db-mutex or with-feed-db.

Note: the feed db should also reside on a persistent medium. This functionality will be implemented later.
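
Since the feed db is made only of lists, strings and symbols, one possible approach to persistence is to print it readably and read it back. The sketch below is purely an assumption about how this could look, not what the patch does or will do; both function names are hypothetical.

```lisp
;; Hypothetical helpers, not part of the patch.
(defun feed-db-to-string (feed-db)
  "Print FEED-DB readably into a string."
  (with-standard-io-syntax
    (prin1-to-string feed-db)))

(defun feed-db-from-string (string)
  "Read a feed db back from STRING, with #. evaluation disabled."
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (values (read-from-string string)))))

;; Round trip on made-up data.
(let ((db (list (list "http://example.org/rss.xml"
                      :title "Example"
                      :entries (list (list 'entry :id "id-1" :title "First"
                                           :link "http://example.org/1"))
                      :rcpts (list "somenick")))))
  (print (equal db (feed-db-from-string (feed-db-to-string db)))))
```

Writing the string to a file, and doing so under with-feed-db, would be the remaining steps.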

III. The feed checker

The feed checker runs on a so-called "checker thread", which periodically (see *check-freq*) runs the feed db update code. Additionally, the feed checker delegates new (previously unseen) entries to a so-called "announcer"⁴.
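
The description above can be sketched roughly as follows. Apart from *check-freq*, every name here is hypothetical, and the value and unit of *check-freq* are guesses as well; fetching and parsing the feed is left out, so the fetched entries are simply passed in.

```lisp
(defvar *check-freq* 60
  "Assumed meaning: seconds to sleep between two checks.")

(defun new-entries (known fetched)
  "Return the entries in FETCHED whose :id does not occur in KNOWN."
  (remove-if (lambda (entry)
               (find (getf (rest entry) :id) known
                     :key (lambda (e) (getf (rest e) :id))
                     :test #'string=))
             fetched))

(defun announce-stub (feed-url entry)
  "Stand-in announcer: print the new entry to standard output."
  (format t "~a: ~a~%" feed-url (getf (rest entry) :title)))

(defun check-feed (feed fetched &optional (announce #'announce-stub))
  "Add previously unseen entries from FETCHED to FEED and announce them.
Returns the list of new entries. Not thread-safe on its own."
  (let ((fresh (new-entries (getf (rest feed) :entries) fetched)))
    (dolist (entry fresh)
      (push entry (getf (rest feed) :entries))
      (funcall announce (first feed) entry))
    fresh))

(defun checker-loop (check-once)
  "Call CHECK-ONCE forever, sleeping *CHECK-FREQ* seconds in between."
  (loop
    (funcall check-once)
    (sleep *check-freq*)))
```

On SBCL, such a loop could be started with (sb-thread:make-thread #'checker-loop :arguments (list check-fn)), where check-fn is whatever function walks the feed db; following the convention from section II, that function would have to do its work inside with-feed-db.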

To test feedbot feed checker functionality, simply run:

> (defvar *feedbot*
     (make-instance 'feedbot:feedbot))
> (feedbot:feedbot-start-checker-thread *feedbot*)
> (feedbot:feedbot-get-or-create-feed
   *feedbot* "http://thetarpit.org/rss.xml")

then sit back and enjoy the feeds as they come.

Achtung! Spoilers below:

At the moment of writing, Feedbot is supposed to comprise three parts, in this order: one. a feed checker; two. a feed announcer; and three. an IRC-based interface.

The disadvantage of doing it this way is that for two of the three parts, I'm pushing patches downstream of ircbot that don't use any ircbot*. But let's imagine for a moment that I did it the other way around -- now the reader can stand up a nice IRC bot implementing some commands that do what, more precisely? They call empty functions? They mock the IRC interface?

So that's how things are: at the end of part one, you have a working bit that checks for new feeds and looks at them and so on and so forth; at the end of part two, you also have a (small) bit that looks at new content and consumes it; and finally, after part three you have the whole thing.

---
* As an aside: notice how Feedbot imports the Feedparse code ad litteram. The separate V tree still exists if you want to use it in your own thing, but otherwise that item's been completely glued to Feedbot, and thus to Botworks. This is nothing new; the same happened before with Eucrypt and MPI.↩

For the record:

$ wc -l *.lisp
  307 feedbot.lisp
   28 feedbot-utils.lisp
   26 package.lisp
  361 total
$ grep '^ *;' *.lisp | wc -l
104

That is, comment lines make up about 29% of the total (104 of 361), a bit under one third. This is roughly similar to other republican code.↩

These so-called "recipients" are for now completely meaningless and thus useless. So why add them? Let's explore the possibilities; we could have the feed db organized as: a. a list of feeds, each containing a list of entries and recipients; b. a list of recipients, each with its own list of feeds, each with a list of entries; and c. two separate lists, one with recipients, each recipient with a list of subscriptions (feed IDs), and one with feeds, each feed with a list of entries.

First, we observe that in all cases, entries are subordinated to feeds -- this sounds like the correct relation, I hope that we're on the same page here. Second, we observe that each of a, b and c has its own trade-offs in terms of space and time usage. For example, "recipients" are useless if for some reason there's only one feedbot user; also, subordinating feeds to recipients leads to duplicated feed checking, at least unless we do some tricks with references to lists; also, separating feeds and recipients requires an efficient lookup algorithm for feeds and recipients, in turn requiring a data structure that's "better" than S-expressions (at least in this respect), which then later will require a (more complex) serialization/deserialization piece when we're saving things to the disk; also, feel free to add any pros and cons to this list.

So, given these trade-offs, I decided to use this particular structure, which in turn leads to this particular feed checker implementation. Maybe a "better" one will arise at some point, although experience shows "better" tends to arrive at the problem of "worse" after a while.↩

This "so-called" announcer is supposed to notify recipients of new content whenever it's posted. For now, the announcer is a stub that prints new entries to standard output.↩