
BeautifulSoup Cheat Sheet (DRAFT) by

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Installing

$ pip install beautifulsoup4
Install Beautiful Soup
$ pip install lxml
Install the lxml parser (optional)
$ pip install html5lib
Install the html5lib parser (optional)

Kinds of objects

tag = soup.b; type(tag)
Tag
# <class 'bs4.element.Tag'>
tag.name
Name
# 'b'
tag.name = "blockquote"
Change the tag's name
tag['class']
Attributes (class is multi-valued, so a list is returned)
# ['boldest']
tag.attrs
All attributes as a dictionary
# {'class': ['boldest']}
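
The rows above can be tried end to end with a short script. This is a minimal sketch, assuming Python 3 and the built-in "html.parser" backend; the variable names other than tag and soup are illustrative and not part of the cheat sheet.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# "html.parser" ships with Python, so no extra parser install is needed.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")

tag = soup.b
is_tag = isinstance(tag, Tag)       # the object is a bs4.element.Tag
original_name = tag.name            # name of the tag before renaming
class_value = tag["class"]          # class is multi-valued, so this is a list
attrs_snapshot = dict(tag.attrs)    # copy of the attribute dictionary

# Renaming the tag changes the markup that Beautiful Soup generates.
tag.name = "blockquote"
renamed_markup = str(tag)
print(renamed_markup)
```

Single-valued attributes such as id come back as plain strings; only attributes that HTML defines as multi-valued (class, rel and a few others) are returned as lists.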
 

Basic Operation

from bs4 import BeautifulSoup
Import the module
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
Making a soup
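
A short sketch of making a soup from either a string or an open file-like object. It assumes the built-in "html.parser" backend; io.StringIO stands in for a real file here, and the names markup, soup_from_string and soup_from_file are illustrative.

```python
import io

from bs4 import BeautifulSoup

markup = '<b class="boldest">Extremely bold</b>'

# Name the parser explicitly. Without it, Beautiful Soup picks the best
# parser installed and warns, so results can differ between machines.
soup_from_string = BeautifulSoup(markup, "html.parser")

# An open file handle (anything with .read()) is accepted as well.
soup_from_file = BeautifulSoup(io.StringIO(markup), "html.parser")

bold_text = soup_from_string.b.get_text()
same_markup = str(soup_from_string) == str(soup_from_file)
print(bold_text, same_markup)
```

If lxml or html5lib is installed, "lxml" or "html5lib" can be passed in place of "html.parser".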

Navigating the tree

soup.body.b
Navigating using tag names
# <b>text</b>
soup.a
Get the first <a> tag
soup.find_all('a')
Get all the <a> tags
len(soup.contents)
The BeautifulSoup object has one child, the <html> tag
soup.contents[0].name
# 'html'
WRONG: test_text.contents[0]
A string (NavigableString) has no .contents; accessing it raises AttributeError
title_tag.string
If a tag has only one child and that child is a NavigableString, .string returns it
# "The Dormouse's story"
head_tag.contents
A tag's children as a list
# [<title>The Dormouse's story</title>]
soup.html.string
If a tag contains more than one thing, .string is None
# None
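
The navigation rows above can be reproduced with a small document. This is a sketch under stated assumptions: html_doc is a made-up sample (the example.com links are placeholders), and "html.parser" is used so that nothing beyond Beautiful Soup itself is required.

```python
from bs4 import BeautifulSoup

html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

first_bold = soup.body.b            # first <b> inside <body>
first_link = soup.a                 # first <a> in the document
all_links = soup.find_all("a")      # every <a> in the document

top_level = soup.contents           # children of the BeautifulSoup object
head_tag = soup.head
title_tag = head_tag.contents[0]    # the only child of <head> is <title>

title_text = title_tag.string       # the single NavigableString child
html_string = soup.html.string      # None, because <html> has several children

# A NavigableString cannot contain anything, so it has no .contents.
string_has_contents = hasattr(title_text, "contents")

print(len(top_level), top_level[0].name, title_text, html_string)
```

When .string is None because a tag holds several children, get_text() is the usual way to collect all the text beneath it.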