Using LLMs to Build Smart RSS Readers

By Garett MacGowan on September 24, 2023

RSS feeds ("Really Simple Syndication" feeds) are web pages which are designed to be read by computers. They are often used to disseminate chronological information such as news articles, blog posts, academic papers, and much more.

Often times, only a small subset of the entries in an RSS feed are relevant to you. What if you want to do some not-so-simple things based on the natural language contents of an RSS feed? What if you could build a smart RSS reader that makes intelligent decisions based on the content of the feed? With the power of LLMs, you now can.

Today we'll show you how to build a smart RSS reader using PhaseLLM. This tutorial is based on the arXiv assistant demo product available in the PhaseLLM GitHub repository here. arXiv assistant emails you when a new paper that is relevant to your interests is published on arXiv.

PhaseLLM's RSS Agent

PhaseLLM provides an RSSAgent class which can be used to read RSS feeds. The RSSAgent class is located in the phasellm.agents module. You can read the docs for the latest information about this class, including more usage examples here.

RSSAgent does its best to format valid requests for RSS feed endpoints. However, in some cases you may need to provide custom headers for your request. For example, some RSS feeds require a User-Agent header to be set. You can do this by passing the agent kwarg to the RSSAgent.

RSSAgent offers a few ways to read RSS feeds. In most cases, you'll want to use the RSSAgent's polling context manager. This context manager will poll the RSS feed at a specified interval (in seconds) and return the latest items in the feed.


from phasellm.agents import RSSAgent

agent = RSSAgent(url='https://arxiv.org/rss/cs')
with agent.poll(interval=60) as poller:
    for papers in poller():
        print(f'Found {len(papers)} new paper(s).')
        for paper in papers:
            ...

Interest Analysis

Within the for paper in papers: loop above, we can parse out the details of each RSS feed entry.


for paper in papers:
    title = paper['title']
    abstract = paper['summary']
    link = paper['link']

Now that we have the paper's title, abstract, and link, we can use PhaseLLM to analyze whether the paper is interesting to us. For this, we will need to do some prompt engineering to create a prompt that gives us answers in a consistent parsable format.

Here's an example prompt which works reasonably well with Anthropic's Claude LLM:

I want to determine if an academic paper is relevant to my interests. I am interested in: {interests}. The paper is titled: {title}. It has the following abstract: {abstract}. Is this paper relevant to my interests? Respond with either 'yes' or 'no'. Do not explain your reasoning.

Example responses are given between the ### ### symbols. Respond exactly as shown in the examples.

###yes###
or
###no###

If you are using a different model, you may need to modify the prompt above to get consistent parsable results.

Note that some LLM vendors support features which make structured responses easier. For one such example, see OpenAI's function calling API here. These methods are outside the scope of this tutorial.

Now that we have our prompt, we'll create an LLM instance and put it to work.


from phasellm.llms import ClaudeWrapper

llm = ClaudeWrapper('MY_API_KEY')
interested = llm.text_completion(prompt=interest_analysis_prompt)

If all goes well, our interested variable will contain one of the ###yes### or ###no### strings. We can parse this string to get our final answer.


import re

answer = re.search(r'###(yes|no)###', interested)
if not answer:
    # Optionally retry
else:
    # Get the substring that matched the regex
    interested = answer.group(0)

if interested == '###yes###':
    # Do something
elif interested == '###no###':
    # Do something else
else:
    # LLM didn't respond in expected format.

Notice that we use a regular expression to parse the response. This is because LLMs do not always respond in the way we expect. The response may include a preamble, punctuation, whitespace, or some other unexpected tokens. We can reduce the error rate by parsing our answer from the full response. As models improve, this step will naturally become unnecessary.

Take Action

Now that we have an answer, we can do something with it. In the demo app, we use the answer to determine whether to send an email with a link to the paper. Alternatively, you could send a text message with the link, log the paper to a file on your machine, or anything else you can think of.

The full demo app not only sends an email, but also summarizes why the paper is interesting to us based on the abstract and our interests. This is accomplished by engineering one more prompt and calling the LLM with it.

Summarize why the the following paper is relevant to my interests. My interests are: {interests}. The paper is titled: {title}. It has the following abstract: {abstract}.


summary = llm.text_completion(prompt=summary_prompt)

To wrap up, here's the code for sending ourselves an email with the paper's title, summary, abstract, and link. This snippet uses PhaseLLM's EmailSenderAgent class available in the phasellm.agents module.


from phasellm.agents import EmailSenderAgent

content = f"Title: {title}\n\nSummary:\n{summary}\n\nAbstract:\n{abstract}\n\nLink: {link}"

email_agent = EmailSenderAgent(
    sender_name='arXiv Assistant',
    smtp='smtp.gmail.com',
    sender_address='my_gmail_email',
    password='my_gmail_password',
    port=587
)
email_agent.send_plain_email(recipient_email=gmail_email, subject=title, content=content)

For the full demo app code, see the PhaseLLM GitHub repository here.

That's it! You should now be able to make intelligent RSS readers using PhaseLLM!

As always, reach out at w (at) phaseai (dot) com if you want to collaborate. You can also follow PhaseLLM on Twitter.