Beautiful Soup (HTML parser)

Beautiful Soup
Original author(s): Leonard Richardson
Stable release: 4.5.1 / August 2, 2016
Repository: code.launchpad.net/beautifulsoup/
Written in: Python
Platform: Python
Type: HTML parser library, Web scraping
License: Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4+)[1]
Website: www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the name refers to tag soup). It creates a parse tree for each parsed page that can be used to extract data from HTML, which is useful for web scraping.[2]
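The tolerance for malformed markup can be illustrated with a minimal sketch. The helper name parse_fragment and the sample markup are assumptions made for illustration, and the sketch assumes Python's built-in html.parser backend; other backends may repair broken markup differently.

```python
from bs4 import BeautifulSoup


def parse_fragment(markup):
    """Parse markup with the standard-library backend (hypothetical helper)."""
    return BeautifulSoup(markup, "html.parser")


# Illustrative input: neither the <b> tag nor the <p> tag is ever closed.
soup = parse_fragment('<p class="greeting">Hello, <b>world')

# The unclosed tags still become nodes in the parse tree.
print(soup.b.get_text())   # text inside the <b> element
print(soup.p.get_text())   # text inside the <p> element, including its children
print(soup.b.parent.name)  # name of the element that contains <b>
```

Attribute access such as soup.p returns the first matching tag, so this style suits quick lookups; find_all is the usual choice when every match is needed.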

It is available for Python 2.6+ and Python 3.

Code example

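# Extract the href of every anchor in an HTML document
from bs4 import BeautifulSoup
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

webpage = urlopen('http://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
for anchor in soup.find_all('a'):
    # Print '/' for anchors that have no href attribute
    print(anchor.get('href', '/'))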
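Many href values on a real page are relative paths. As a hedged variant of the example above, the following sketch works on an HTML string instead of a live request and resolves each link against a base URL. The function name extract_links and the sample markup are hypothetical, and the code assumes Python 3.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that carry no href attribute at all.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


# Illustrative markup: one relative link, one anchor without href, one absolute link.
sample = (
    '<a href="/wiki/Python">Python</a> '
    '<a name="top">no link</a> '
    '<a href="http://example.org/">Example</a>'
)
print(extract_links(sample, "http://en.wikipedia.org/wiki/Main_Page"))
```

Filtering with href=True is an alternative to supplying a default value through anchor.get: anchors without a link are skipped instead of being reported as '/'.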

References

  1. "Beautiful Soup website". Retrieved 18 April 2012. "Beautiful Soup is licensed under the same terms as Python itself."
  2. "Beautiful Soup website". Retrieved 18 April 2012.


This article is issued from Wikipedia - version of the 12/2/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.