My Personal Information Filtering System Design

2023-07-07

I think it is important to have a unified interface to pre-process incoming information. My "serious" information sources are emails and feeds.

I follow (Voit 2021) here to do the analysis.

Requirements Analysis

Must Haves

be able to integrate with My Personal Knowledge Management System Design smoothly
The filtered content is the input for my PKM, so this hand-off needs to be smooth.
be able to handle Email
mailing lists and newsletters are a main source for my input
be able to handle RSS/ATOM feed
feeds are another source for my input
be able to (be extended to) handle other kinds of web updates
I want to be fully relieved of the need to visit websites one by one.
be able to handle all these inputs without feeling overwhelming
The internet is big, and so are my feeds.
be offline cached
I want to view content offline, so that network conditions do not affect the integrity of the information I have subscribed to at the moment I open my system.
be a unified interface, or at least several front-ends that share somewhat similar interfaces
I want to reduce the work of re-expressing the same thing multiple times, so it is definitely great to

Nice to Haves

be non-blocking when refreshing
It is nice to have the refreshing in the background.
be able to display the contents in the system
It's okay if the system delegates display to a web browser or other external tools, but displaying things natively is a nice-to-have.
be remotely accessible (via ssh etc)
I can run the whole system on my local computer, but it is nice if the system can also be run on my NAS and I can access it from my other terminals.
be accessible from mobiles
I don't really do serious reading on my phone, but again it is a nice-to-have feature, so that I can filter out unimportant things while I am waiting.
be able to save the original content as-is permanently
I don't really need the system to save content as-is at such an early phase in my information absorbing chain.
"standardized" instead of something from scratch
If the system follows some common standard, I can easily switch or reimplement some parts of it.

Non-requirements

be able to notify new content availability
I don't need the system to tell me how many unread emails I have.
be able to suggest related content online (by big data analysis)
I don't trust that kind of thing.
be able to share the content to others
I can do this with inner parts of my PIM.
be "up-to-date"
It does not need to have the newest information when I open it; a snapshot of things recent enough is fine.

Method Chosen

I use a combination of (Mann 2007), (Tietze 2014) and (Fraser 2013).

That is:

  • email / feeds as a fraction of the whole web
  • prioritize them as they arrive, with tags that hint at what makes each of them interesting to read
  • go over the inbox and pick out the items worth reading into a reading list. At the same time, reply to or delete items as needed (non-deleted items are archived).
  • consume the reading list, and further drop the items that turn out to be irrelevant. For the rest of them, take notes with my PKM.
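
The steps above can be sketched as a two-pass pipeline. This is only an illustration of the workflow, not code from any of the cited methods: the names `Item`, `Decision`, `triage`, `consume` and `example_rule` are all hypothetical, and the tag names are made up.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Decision(Enum):
    READ_LATER = auto()   # worth reading: goes to the reading list
    ARCHIVE = auto()      # dealt with (e.g. replied to) and kept
    DELETE = auto()       # dropped entirely


@dataclass
class Item:
    source: str                                  # "email" or "feed"
    title: str
    tags: set[str] = field(default_factory=set)  # hints on why it is interesting


def triage(inbox, decide):
    """First pass: empty the inbox into (reading_list, archive)."""
    reading_list, archive = [], []
    for item in inbox:
        decision = decide(item)
        if decision is Decision.DELETE:
            continue
        # Every non-deleted item is archived, whether or not it is read later.
        archive.append(item)
        if decision is Decision.READ_LATER:
            reading_list.append(item)
    return reading_list, archive


def consume(reading_list, is_relevant):
    """Second pass: keep only the items that deserve a note in the PKM."""
    return [item for item in reading_list if is_relevant(item)]


def example_rule(item):
    """A made-up decision rule driven purely by tags."""
    if "noise" in item.tags:
        return Decision.DELETE
    if "interesting" in item.tags:
        return Decision.READ_LATER
    return Decision.ARCHIVE
```

In practice the decision is made by me while reading the inbox, so `decide` stands in for a human judgment rather than an automatic filter.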

Analysis on the Tools Available

I'll pick Emacs apps as my front-end, since they fulfill my first must-have requirement easily.

Clients (front-ends)

There are many general discussions on the choices: (cidra_ et al. 2023; RevTomJohnson 2022; Vjalmr 2023).

Rmail

Rmail is a built-in MUA for Emacs. It does not seem to be used by most of the community, so I did not consider it from the beginning.

Gnus

Gnus is an all-in-one client, famous for its slowness (not a big issue with a local mail server set up as in (Chamberlin 2023)), its difficulty to set up, and its crazily rich feature set, as (Bin 2014; voltecrus 2018) suggest.

It is more a front-end to newsgroups than a front-end to email, and is better at handling mailing lists than individual back-and-forth conversations.

  • It is already integrated into the Emacs ecosystem, because it is built in.
  • It is able to handle email (mainly over network protocols, I think, although Maildir is also supported).
  • It does not handle Atom feeds due to its age, but there is an extension addressing this.
  • It is easy to handle many kinds of web updates with Shimbun, and it also supports some good old solutions like NNTP.
  • It has scoring, and many other extensions to prioritize the inputs.
  • It does not do offline caching out of the box, I think, but with an external Maildir set-up at least emails are pre-fetched.
  • The interface is definitely unified.
  • It blocks heavily when refreshing, so people in the community usually run it in a separate Emacs instance. This is not a big problem with a local email server. See (Purcell 2010).
  • It uses Eww to display contents.
  • It is not remotely accessible.
  • It is not accessible from mobiles.
  • It does not have built-in support for as-is permanent archiving, but should be easily extensible.
  • It is built into Emacs, so I consider it standardized.
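
To make the scoring idea concrete: each matching rule adds to (or subtracts from) an item's score, and items are then read in score order. The sketch below shows only that additive idea in Python. It is not Gnus' score-file format or its Emacs Lisp API, and `Rule`, `score` and `prioritize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    header: str   # field to inspect, e.g. "from" or "subject"
    needle: str   # case-insensitive substring to look for
    score: int    # added to the item's total when the rule matches


def score(item: dict[str, str], rules: list[Rule]) -> int:
    """Sum the scores of all rules that match the item."""
    return sum(
        rule.score
        for rule in rules
        if rule.needle.lower() in item.get(rule.header, "").lower()
    )


def prioritize(items, rules):
    """Highest score first; ties keep their arrival order (stable sort)."""
    return sorted(items, key=lambda item: score(item, rules), reverse=True)
```

With a rule such as `Rule("subject", "guix", 100)` and another such as `Rule("from", "noreply", -50)`, release announcements float to the top while automated mail sinks to the bottom.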

Wanderlust

Wanderlust is another all-in-one client, maintained by the community. It is built on top of W3m instead of Eww. Its documentation is really out of date, but it is still maintained. (LdBeth 2017a; Fu 2018) give views on a subset of its functions. The author of Mu4e also has some (old) write-ups about it: (Binnema 2011, 2009b, 2010, 2009a).

  • It is not integrated as deeply as Gnus, but still it is Emacs!
  • It has decent email support, including network protocols and Maildir.
  • It can handle feeds.
  • It also has Shimbun support (and I think this integration is the best among all the supported clients).
  • It supports mu and Notmuch for filtering the inbox.
  • It has support for Maildir, and it can even cache things by itself.
  • The interface is mostly unified.
  • It blocks when refreshing, but with Maildir support that is okay.
  • It uses W3m to display things.
  • It is not remotely accessible.
  • It is not accessible from mobiles.
  • It does not have built-in support for as-is permanent archiving, but should be easily extensible.
  • It is not Emacs built-in, and it has its own MIME/CL-like/web libraries, so it is not that standardized.

Mu4e

mu4e is a dedicated MUA on top of mu, a Xapian-based text indexing tool that focuses on searching. Its UI is said to be more Emacs-ish according to (Jie 2019; Wellons 2013). It is a popular new MUA implementation in the ecosystem, as (Snader 2023, 2015; Lafon 2023; Bertrand 2021; Maughan 2015) suggest.

  • It is much more lightweight than Gnus or Wanderlust, which makes it easier to integrate with other code.
  • It depends on Maildir.
  • It cannot handle feeds. But we can have external tools to work around it.
  • It cannot handle other web updates. But we can have external tools to work around it.
  • It uses mu to filter by searching.
  • It relies on Maildir, so yes, it is cached offline.
  • Since it does not support feeds and other web updates natively, we need some hard work to make the interface unified.
  • Maildir is updated by external tools, so there is no refreshing overhead in mu4e.
  • The email list has no built-in collapse feature, so mails of the same thread take several lines. It reuses Gnus' gnus-article-mode (with a derived mode) to display messages, so everything that works for Gnus should work for it too. This also means that only one post in the thread is displayed, instead of the whole thread as in Notmuch.
  • It is not remotely accessible, according to (Wellons 2013).
  • It is not accessible from mobiles.
  • It does not have built-in support for as-is permanent archiving, but should be easily extensible.
  • It is not Emacs built-in. It mainly depends on the Maildir standard only (with mu as a searching tool), as (Nelson 2015) hints, so it integrates well with other tools.

Notmuch

Notmuch is a dedicated MUA on top of notmuch, a Xapian-based text indexing tool with the same name (which makes search engines unhappy) that focuses on tagging. I actually used notmuch (the search back-end) with mutt as the front-end several years ago when I was still using Neovim, and the experience was great. It is also quite popular in the ecosystem, as (Zakkak 2021; Sapka 2023; Siewierski 2019; Korytov 2021) suggest.

  • It is much more lightweight than Gnus or Wanderlust, which makes it easier to integrate with other code.
  • It depends on Maildir.
  • It cannot handle feeds. But we can have external tools to work around it.
  • It cannot handle other web updates. But we can have external tools to work around it.
  • It uses notmuch to filter by tagging.
  • It relies on Maildir, so yes, it is cached offline.
  • Since it does not support feeds and other web updates natively, we need some hard work to make the interface unified.
  • Maildir is updated by external tools, so there is no refreshing overhead in Notmuch. (tohiko 2021) says it is faster than mu4e.
  • It has a thread-based message view, quite like Gmail. It uses Eww to display HTML contents.
  • It can be accessed via SSH, according to (Wellons 2013).
  • It is not accessible from mobiles.
  • It does not have built-in support for as-is permanent archiving, but should be easily extensible.
  • It is not Emacs built-in. It saves tags in a separate database (while keeping the Maildir mostly immutable), as (Nelson 2015) hints, so it is harder to integrate with other tools than mu4e.
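
The difference between filtering by searching and filtering by tagging, and why a separate tag database is harder for other tools to see, can be sketched with plain Python data structures. This is a conceptual illustration only: it does not use mu or notmuch, `search` and `TagStore` are hypothetical names, and the "+tag" / "-tag" notation is borrowed loosely for readability.

```python
def search(messages, term):
    """Search-oriented: the view is recomputed from the message text itself."""
    term = term.lower()
    return sorted(
        msg_id
        for msg_id, fields in messages.items()
        if any(term in value.lower() for value in fields.values())
    )


class TagStore:
    """Tag-oriented: tags live in a side table, the messages stay untouched."""

    def __init__(self):
        self._tags = {}  # message id -> set of tag names

    def tag(self, msg_id, *changes):
        """Apply changes written as "+name" (add) or "-name" (remove)."""
        tags = self._tags.setdefault(msg_id, set())
        for change in changes:
            sign, name = change[0], change[1:]
            if sign == "+":
                tags.add(name)
            elif sign == "-":
                tags.discard(name)
            else:
                raise ValueError(f"expected +tag or -tag, got {change!r}")

    def with_tag(self, name):
        """Return the ids of all messages carrying the given tag."""
        return sorted(m for m, tags in self._tags.items() if name in tags)
```

Any tool that reads the messages can reproduce the search result, but only a tool that also understands the side table can reproduce the tagged view.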

Elfeed

Elfeed, unlike the clients above, is an RSS client, and it is said to be inspired by Notmuch. It is also very popular in the community, as (Cundy 2022; Liujacai 2021) show.

  • It is much more lightweight than Gnus or Wanderlust, which makes it easier to integrate with other code.
  • It cannot handle email; it is a feed reader only, so an MUA is needed alongside it.
  • It handles RSS and Atom feeds natively.
  • It cannot handle other web updates. But we can have external tools to work around it.
  • It filters by tagging, in a Notmuch-like style, with its own search filter.
  • It keeps fetched entries in its own local database, so yes, it is cached offline.
  • Since it does not support email natively, we need some hard work to make the interface unified.
  • It fetches feeds asynchronously, so refreshing should mostly not block Emacs.
  • It uses Eww to display the contents.
  • It can be accessed via SSH, claimed by (Wellons 2013).
  • It is not accessible from mobiles.
  • It does not have built-in support for as-is permanent archiving, but should be easily extensible.
  • It is not Emacs built-in. It saves entries and tags in its own database rather than in a standard format like Maildir, so it is harder to integrate with other tools than mu4e.

Local Email Solutions

Since local caching is an important requirement in my set-up, here are the tools:

offlineimap

Offlineimap fetches emails from servers into Maildir folders. It is implemented in Python, so its performance might not be that great. Its configuration can invoke more or less arbitrary Python code.

mbsync

mbsync fetches emails from servers into Maildir folders. It is implemented in C.
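
Once offlineimap or mbsync has filled a Maildir, everything needed for reading is on local disk, which is what the offline-caching requirement asks for. As a rough illustration, the sketch below lists unread subjects using only Python's standard `mailbox` module. The function name `unread_subjects` is hypothetical, and it assumes the usual Maildir convention that the `S` flag marks a message as seen.

```python
import mailbox


def unread_subjects(maildir_path):
    """Return the subjects of messages that lack the Maildir 'S' (seen) flag."""
    # create=False: fail loudly instead of silently creating an empty Maildir.
    box = mailbox.Maildir(maildir_path, create=False)
    subjects = []
    for key in box.iterkeys():
        message = box.get_message(key)  # a mailbox.MaildirMessage
        if "S" not in message.get_flags():
            subjects.append(message["Subject"])
    return subjects
```

No network access is involved at any point, so the result is the same whether or not the mail server is reachable.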

dovecot

Dovecot is a local email server.

RSS to Email Solutions

Given that not all front-ends support feeds, bridging feeds into email might be a reasonable work-around.

feed2maildir

feed2maildir is implemented in Python. There is also a fork (Warburton 2017) that makes it fit better into the Unix philosophy: do one thing, and do it well.

The upstream implementation uses a database to save information about last checked time for each feed, and converts each item directly into a Maildir file.
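
The general technique (remember a per-feed timestamp, then turn each newer item into one Maildir file) can be sketched with the Python standard library alone. This is not feed2maildir's actual code or storage format: the JSON state file, the function name `deliver_new_items`, the `X-Feed-Link` header and the placeholder sender address are all assumptions made for illustration, and the feed is passed in as text so that the sketch needs no network access. It also assumes a plain RSS 2.0 document in which every item has a `pubDate`.

```python
import json
import mailbox
import xml.etree.ElementTree as ET
from email.message import EmailMessage
from email.utils import formataddr, parsedate_to_datetime
from pathlib import Path


def deliver_new_items(feed_url, rss_xml, maildir_path, state_path):
    """Write RSS items newer than the stored timestamp into a Maildir."""
    state_file = Path(state_path)
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    last_seen = state.get(feed_url, 0.0)  # timestamp of the newest delivered item

    box = mailbox.Maildir(maildir_path, create=True)
    channel = ET.fromstring(rss_xml).find("channel")
    feed_title = channel.findtext("title", default=feed_url)

    newest, delivered = last_seen, 0
    for item in channel.findall("item"):
        pub_date = item.findtext("pubDate")
        published = parsedate_to_datetime(pub_date).timestamp()
        if published <= last_seen:
            continue  # already delivered on an earlier run
        message = EmailMessage()
        # Placeholder address: feeds have no sender, but mail clients expect one.
        message["From"] = formataddr((feed_title, "feed@localhost"))
        message["Subject"] = item.findtext("title", default="(untitled)")
        message["Date"] = pub_date
        message["X-Feed-Link"] = item.findtext("link", default="")
        message.set_content(item.findtext("description", default=""))
        box.add(message)
        newest = max(newest, published)
        delivered += 1

    state[feed_url] = newest
    state_file.write_text(json.dumps(state))
    return delivered
```

Running it twice on the same feed text delivers nothing the second time, because the stored timestamp already covers every item.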

imm

imm is implemented in Haskell. It saves information about feeds in a database, and makes callbacks (with a JSON intermediate format) to external tools. It comes with a built-in tool that sends emails, and there is no direct Maildir support.

feembox

feembox is implemented in Rust. It saves items to Maildir files directly, and it seems to not have any database saving the progress for each feed.

universal aggregator

ua is implemented in Go. It has a bunch of executables, mostly following the Unix Philosophy:

  • a cron-like timer
  • a tool to fetch new items
  • a tool to convert new items into Maildir files
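
The appeal of this split is that each stage only needs to agree on a small intermediate format. The sketch below illustrates that idea with two tiny stages connected by one JSON object per line. It is not ua's actual code or wire format: the field names `id`, `title` and `body`, the function names, and the `feeds.invalid` domain are all hypothetical.

```python
import json
from email.message import EmailMessage


def read_items(stream):
    """Consume the fetcher's output: one JSON object per non-empty line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


def to_message(item):
    """Convert one fetched item into an email message ready for a Maildir."""
    message = EmailMessage()
    message["Subject"] = item["title"]
    # A stable Message-ID lets a mail client recognise repeated deliveries.
    message["Message-ID"] = f"<{item['id']}@feeds.invalid>"
    message.set_content(item.get("body", ""))
    return message
```

Because the converter reads from any text stream, it can be fed from a pipe, a file, or a test string, and the fetcher can be replaced without touching it.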

Other Web Updates Solutions

Shimbun

Shimbun is basically a web scraper interface implemented in Emacs. See (LdBeth 2017b). It is part of W3m, and there is already Wanderlust and Gnus integration.

RSSHub

RSSHub is an "extensible RSS feed aggregator" which can be deployed privately.

rss-bridge

rss-bridge is a "PHP project capable of generating RSS and Atom feeds for websites that don't have one."

huginn

huginn is a truly all-in-one web automation tool written in Ruby, like a private version of IFTTT. See (笠三叶 2015). I'd say it looks scarily feature-rich.

Implementation

I'm going to try out different combinations of the above tools soon (thanks to Guix!)

I'm currently using mu4e.

References

Bertrand, Aimé. 2021. “Email Setup in Emacs with Mu4e on macOS.” June 16, 2021. https://macowners.club/posts/email-emacs-mu4e-macos/.
Bin, Chen. 2014. “A Practical Guide to Gnus.” August 9, 2014. https://github.com/redguardtoo/mastering-emacs-in-one-year-guide/blob/master/gnus-guide-en.org.
Binnema, Dirk-Jan C. 2009a. “E-Mail with Wanderlust.” June 8, 2009. https://emacs-fu.blogspot.com/2009/06/e-mail-with-wanderlust.html.
———. 2009b. “Wanderlust Tips and Tricks.” September 19, 2009. https://emacs-fu.blogspot.com/2009/09/wanderlust-tips-and-tricks.html.
———. 2010. “Wanderlust III.” February 23, 2010. https://emacs-fu.blogspot.com/2010/02/i-have-been-using-wanderlust-e-mail.html.
———. 2011. “Searching E-Mails with Wanderlust and Mu.” March 31, 2011. https://emacs-fu.blogspot.com/2011/03/searching-e-mails-with-wanderlust-and.html.
Chamberlin, Giles. 2023. “Using Gnus with a Local Email Server.” June 13, 2023. https://gileschamberlin.wordpress.com/2023/06/13/using-gnus-with-a-local-email-server/.
cidra_, Giles Chamberlin, lichtbogen, pathemata, NapoleonWils0n, Fox Kiester, and Karthik Chikmagalur. 2023. “Alternative to Gnus for Reading Both Emails and RSS Feeds?” June 6, 2023. https://old.reddit.com/r/emacs/comments/142cdox/alternative_to_gnus_for_reading_both_emails_and/.
Cundy, Chris. 2022. “Managing arXiv RSS Feeds in Emacs.” March 26, 2022. https://cundy.me/post/elfeed/.
Fraser, James. 2013. “The Fraser Lab Method of Following the Scientific Literature.” September 28, 2013. https://fraserlab.com/2013/09/28/The-Fraser-Lab-method-of-following-the-scientific-literature/.
Fu, Yuan. 2018. “Wanderlust.” September 28, 2018. https://casouri.github.io/note/2018/wanderlust/index.html.
Jie, Pan. 2019. “Post #12 in Emacs 下比较好的邮件方案是什么.” April 28, 2019. https://emacs-china.org/t/emacs/9145/12.
Korytov, Pavel. 2021. “Mail.” June 18, 2021. https://sqrtminusone.xyz/configs/mail/.
Lafon, Alain M. 2023. “Inbox Zero Hack: Achieving Productivity Bliss in the New Year with Mu4e.” January 5, 2023. https://200ok.ch/posts/2023-01-05_inbox_zero_hack:_achieving_productivity_bliss_in_the_new_year_with_mue.html.
LdBeth. 2017a. “Wanderlust.” March 28, 2017. https://github.com/LdBeth/Emacs-for-Noobs/blob/master/WanderLust.org.
———. 2017b. “W3m Shimbun - a Tool for Reading a Newspaper.” April 19, 2017. https://github.com/LdBeth/Emacs-for-Noobs/blob/master/Shimbun.org.
Liujacai. 2021. “使用 Emacs 阅读邮件与 RSS.” March 5, 2021. https://liujiacai.net/blog/2021/03/05/emacs-love-mail-feed/.
Mann, Merlin. 2007. “Inbox Zero.” Google TechTalks. July 23, 2007. https://www.youtube.com/watch?v=z9UjeTMb3Yk.
Maughan, Ben. 2015. “Master Your Inbox with Mu4e and Org-Mode.” December 17, 2015. https://pragmaticemacs.wordpress.com/2015/12/17/master-your-inbox-with-mu4e-and-org-mode/.
Nelson, Mark J. 2015. “Search-Oriented Tools for Unix-Style Mail: A Brief Comparison of Mu and Notmuch.” December 20, 2015. https://www.kmjn.org/notes/unix_style_mail_tools.html.
Purcell, Steve. 2010. “Save Hours by Reading Mailing Lists in Emacs over IMAP.” September 5, 2010. https://www.sanityinc.com/articles/read-mailing-lists-in-emacs-over-imap/.
RevTomJohnson. 2022. “Email in Emacs - Noob.” April 1, 2022. https://old.reddit.com/r/emacs/comments/ttegie/email_in_emacs_noob/.
Sapka, Michał. 2023. “Managing Email with Notmuch and Emacs.” July 3, 2023. https://michal.sapka.me/2023/notmuch/.
Siewierski, Wojciech. 2019. “One Year with Notmuch.” June 14, 2019. https://blog.einval.eu/2019/06/one-year-with-notmuch/.
Snader, Jon. 2015. “Mu4e and Org Mode.” December 18, 2015. http://irreal.org/blog/?p=4807.
———. 2023. “Inbox Zero with Mu4e Bookmarks.” January 20, 2023. https://irreal.org/blog/?p=11092.
Tietze, Christian. 2014. “Note-Taking When Reading the Web and RSS.” February 14, 2014. https://zettelkasten.de/posts/reading-web-rss-note-taking/.
tohiko. 2021. “Notmuch as an Alternative to Mu4e.” November 6, 2021. https://old.reddit.com/r/emacs/comments/qo3eza/notmuch_as_an_alternative_to_mu4e/.
Vjalmr. 2023. “Switching from Neomutt to Emacs.” May 8, 2023. https://old.reddit.com/r/emacs/comments/13by8vx/switching_from_neomutt_to_emacs/.
Voit, Karl. 2021. “How to Choose a Tool.” January 18, 2021. https://karl-voit.at/2021/01/18/tool-choices/.
voltecrus. 2018. “Gnus Is Such a Neat and Good Idea. Why Doesn’t It Get More Love from Emacs Community?” July 10, 2018. https://old.reddit.com/r/emacs/comments/8xlvpo/gnus_is_such_a_neat_and_good_idea_why_doesnt_it/.
Warburton, Chris. 2017. “Converting Feeds (RSS/Atom/Etc.) To Maildir.” January 14, 2017. http://www.chriswarbo.net/blog/2017-01-14-rss_to_maildir.html.
Wellons, Chris. 2013. “Leaving Gmail behind.” September 3, 2013. https://nullprogram.com/blog/2013/09/03/.
Zakkak, Foivos. 2021. “Using Emacs and Notmuch as a Mail Client.” November 24, 2021. https://foivos.zakkak.net/tutorials/using_emacs_and_notmuch_mail_client/.
笠三叶. 2015. “Huginn: 烧录RSS的神器.” November 11, 2015. https://web.archive.org/web/20170315013204/http://www.jianshu.com/p/4a47e452abc9.