RSS Ingestion
If you don’t know what an RSS feed is (where have you been?), it’s a link you can subscribe to in order to get data in a structured format. Typically, we use RSS readers to subscribe to RSS feeds.
Getting data from RSS feeds can also be done through Python, and it’s a breeze (compared with HTML), as RSS feeds are typically very structured.
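To make "very structured" a little more concrete, here is a minimal sketch of what an RSS 2.0 document looks like and how predictable its fields are. It uses Python's built-in xml.etree.ElementTree instead of feedparser, purely for illustration, and the feed itself (SAMPLE_RSS and the example.com URLs) is made up.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document; real feeds follow the same channel/item layout.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE_RSS)
channel = root.find("channel")

# Every post is an <item> with the same child tags, so extraction is uniform.
items = [
    {"title": item.findtext("title"), "link": item.findtext("link")}
    for item in channel.findall("item")
]

print(channel.findtext("title"))
for item in items:
    print(item["title"], item["link"])
```

Because every post sits in the same place with the same tags, there is no need to hunt through page layout the way you would with scraped HTML.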
We first import the following libraries -
Beautiful Soup (bs4), to help us tease the good stuff out from HTML and XML
feedparser, to help us fetch and parse RSS feeds
import bs4
import feedparser
Next, specify the RSS feed you would like to get data from, say the Kaggle RSS feed.
feeds = ['http://blog.kaggle.com/feed/']
Now, we get feeds from this source.
parsed = feedparser.parse(feeds[0])
Next, we break it up, to get the posts.
posts = parsed.entries
We can examine the first post.
posts[0]
We can also get the title of the first post.
posts[0].title
As usual, we use the BeautifulSoup library again to get easy access to the items on the page by their tags.
html = posts[0].content[0].get('value')
soup = bs4.BeautifulSoup(html, 'html5lib')
Now, we can easily access the items in the page using their tags with the helper functions in the BeautifulSoup library.
soup.find_all('h1')
The Jupyter notebook with the code is here
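One caveat to the walkthrough above: not every feed carries the full post under content, and many only provide a short summary. The sketch below is one possible way to handle that when looping over all the posts rather than just the first. The helper names (entry_html, summarize) and the sample entries are hypothetical. The entries are plain dicts shaped like feedparser's so the example runs without a network call, and with a real feed you would pass parsed.entries instead.

```python
def entry_html(entry):
    """Return the HTML body of a feed entry, or '' if none is present."""
    # Full-content feeds expose a list under 'content'; each item has a 'value'.
    content = entry.get("content") or []
    if content:
        return content[0].get("value", "")
    # Other feeds only carry a short 'summary' instead.
    return entry.get("summary", "")


def summarize(entries):
    """Collect (title, html) pairs for every entry."""
    return [(entry.get("title", ""), entry_html(entry)) for entry in entries]


# Stand-in entries (assumed shape); with a real feed, use parsed.entries.
sample_entries = [
    {"title": "Full post", "content": [{"value": "<h1>Hello</h1>"}]},
    {"title": "Summary only", "summary": "<p>Short version</p>"},
    {"title": "Empty"},
]

for title, html in summarize(sample_entries):
    print(title, "->", len(html), "characters of HTML")
```

Each HTML string returned this way can then be handed to bs4.BeautifulSoup exactly as in the single-post example.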

