Parser API Client

The Parser API lets you programmatically extract content and metadata from HTML documents. Unlike the Reader API, the Parser API does not require OAuth authentication; instead, every request must be signed with a single token query parameter. You can find your token on your Readability account settings page.

Pass this token to the ParserClient constructor, or set it in the READABILITY_PARSER_TOKEN environment variable.

Set the environment variable in your shell:

    export READABILITY_PARSER_TOKEN='your parser token here'

Or pass the token to the constructor in Python:

    from readability import ParserClient
    client = ParserClient(token='your parser token')
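
The two ways of supplying the token can be pictured with the sketch below. Note that resolve_token is a hypothetical helper and not part of the library, and the assumption that an explicit argument wins over the environment variable is ours, not something this page states.

```python
import os


def resolve_token(token=None):
    """Hypothetical helper: prefer an explicit token, else the environment."""
    if token is not None:
        return token
    # READABILITY_PARSER_TOKEN is the variable name used on this page.
    env_token = os.environ.get("READABILITY_PARSER_TOKEN")
    if env_token is None:
        raise ValueError("no parser token supplied")
    return env_token
```

The resolved value could then be handed to the constructor, for example ParserClient(token=resolve_token()).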

Under the hood, the ParserClient uses the popular requests library. The objects returned by client calls are instances of requests.Response.
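
Because every call returns a requests.Response, the usual requests idioms apply. The sketch below is one possible way to turn a response into Python data. The helper name parse_json_response is hypothetical, and it assumes the endpoint in question returns a JSON body.

```python
def parse_json_response(response):
    """Return the decoded JSON body of a requests.Response-like object.

    Hypothetical helper; assumes the response body is JSON.
    """
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.json()
```

With a real client this might be used as parse_json_response(client.get_article(url='http://example.com/article')).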

Client Documentation

class readability.ParserClient(base_url_template='https://www.readability.com/api/content/v1/{}', **xargs)

Client for interacting with the Readability Parser API.

Docs can be found at http://www.readability.com/developers/api/parser.
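
The {} placeholder in base_url_template suggests that a resource name is substituted in to build each endpoint URL. A minimal sketch of that idea follows. The function resource_url is hypothetical, and the resource names "parser" and "confidence" are assumptions based on the endpoint names mentioned on this page.

```python
BASE_URL_TEMPLATE = "https://www.readability.com/api/content/v1/{}"


def resource_url(resource, template=BASE_URL_TEMPLATE):
    """Hypothetical helper: fill the resource name into the URL template."""
    # str.format replaces the {} placeholder with the resource name.
    return template.format(resource)


parser_url = resource_url("parser")          # assumed resource name
confidence_url = resource_url("confidence")  # assumed resource name
```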

get(url)

Make an HTTP GET request to the Parser API.

Parameters: url – the URL to which to make the request

get_article(url=None, article_id=None, max_pages=25)

Send a GET request to the parser endpoint of the Parser API to get back the representation of an article.

The article can be identified by either a URL or an id that exists in Readability.

Note that either the url or article_id param should be passed.

Parameters:
  • url (optional) – The URL of an article whose content is wanted.
  • article_id (optional) – The id of an article in the Readability system whose content is wanted.
  • max_pages – The maximum number of pages to parse and combine. The default is 25.
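
Since one of url or article_id is expected, a caller may want to check that before making the request. The wrapper below is a sketch of that check. fetch_article is a hypothetical name, it treats "either" as "at least one", and whether the library performs this validation itself is not stated here.

```python
def fetch_article(client, url=None, article_id=None, max_pages=25):
    """Hypothetical wrapper: require an identifier, then call get_article."""
    if url is None and article_id is None:
        raise ValueError("pass either url or article_id")
    # Forward the arguments using the keyword names documented above.
    return client.get_article(url=url, article_id=article_id, max_pages=max_pages)
```

Here client is expected to be a ParserClient instance, and the return value is the requests.Response from get_article.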

get_article_status(url=None, article_id=None)

Send a HEAD request to the parser endpoint of the Parser API to get the article's status.

Returns a requests.Response object. The id and status of the article can be read from the X-Article-Id and X-Article-Status headers.

Note that either the url or article_id param should be passed.

Parameters:
  • url (optional) – The URL of an article whose status is wanted.
  • article_id (optional) – The id of an article in the Readability system whose status is wanted.
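
Reading the two headers off the response could look like the sketch below. article_status is a hypothetical helper, and the header names are the ones given in the description above.

```python
def article_status(response):
    """Return (article_id, status) from a requests.Response-like object."""
    headers = response.headers
    # .get() returns None when a header is absent instead of raising KeyError.
    return headers.get("X-Article-Id"), headers.get("X-Article-Status")
```

With a real client this might be called as article_status(client.get_article_status(url='http://example.com/article')).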

get_confidence(url=None, article_id=None)

Send a GET request to the confidence endpoint of the Parser API.

Note that either the url or article_id param should be passed.

Parameters:
  • url (optional) – The URL of an article whose confidence score is wanted.
  • article_id (optional) – The id of an article in the Readability system whose confidence score is wanted.
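
One plausible use of the confidence endpoint is to decide whether a page is worth parsing at all. The sketch below rests on two assumptions that this page does not confirm: that the JSON body carries a numeric "confidence" field, and that 0.5 is a sensible cutoff. looks_parseable is a hypothetical helper.

```python
def looks_parseable(client, url, threshold=0.5):
    """Hypothetical helper: compare the reported confidence to a threshold."""
    response = client.get_confidence(url=url)
    response.raise_for_status()
    # Assumption: the body is JSON with a numeric "confidence" field.
    confidence = response.json()["confidence"]
    return confidence >= threshold
```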

get_root()

Send a GET request to the root resource of the Parser API.

head(url)

Make an HTTP HEAD request to the Parser API.

Parameters: url – the URL to which to make the request

post(url, post_params=None)

Make an HTTP POST request to the Parser API.

Parameters:
  • url – the URL to which to make the request
  • post_params – POST data to send along. Expected to be a dict.

post_article_content(content, url, max_pages=25)

POST content to be parsed to the Parser API.

Note: Even when POSTing content, a url must still be provided.

Parameters:
  • content – the content to be parsed
  • url – the URL that represents the content
  • max_pages (optional) – the maximum number of pages to parse and combine. The default is 25.
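
As noted, a URL is required even when the content is supplied directly. The sketch below wraps the call with that check made explicit. parse_html is a hypothetical helper, and client is expected to be a ParserClient instance.

```python
def parse_html(client, html, source_url, max_pages=25):
    """Hypothetical wrapper around post_article_content."""
    # The URL identifies the content and must be given even for POSTed HTML.
    if not source_url:
        raise ValueError("a url is required even when POSTing content")
    return client.post_article_content(html, source_url, max_pages=max_pages)
```

The return value is whatever post_article_content returns, which per this page is a requests.Response.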