Cheatography
https://cheatography.com
Useful stuff for web scraping/data mining in Python
This is a draft cheat sheet. It is a work in progress and is not finished yet.
Web scraping with urllib
Import basic functions |
from urllib.request import urlopen, urlretrieve, Request
|
Request webpage 'example.com' |
raw_request = Request('https://example.com')
|
Add headers (I) |
raw_request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0')
|
Add headers (II) |
raw_request.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
|
Get HTML code as a string |
html = urlopen(raw_request).read().decode("utf-8")
|
Download file |
urlretrieve(fileURL, 'file_name_in_destination')
|
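Putting the calls above together, a minimal fetch sketch (the URL and header values are the ones used in this sheet; the empty-string fallback is just so the example degrades gracefully without a network connection):

```python
from urllib.request import urlopen, Request
from urllib.error import URLError

# Build the request and attach browser-like headers
req = Request('https://example.com')
req.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')

try:
    # Fetch the page and decode the response bytes into a string
    html = urlopen(req).read().decode('utf-8')
except URLError:
    html = ''  # no network available
```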
|
Python libraries
urllib: URL handling |
|
BeautifulSoup: HTML/XML parser |
from bs4 import BeautifulSoup
|
Regular expressions: pattern matching |
|
NetworkX: network analysis |
|
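A minimal NetworkX sketch (assumes the networkx package is installed; graph data is made up for illustration):

```python
import networkx as nx

# Build a small undirected graph from an edge list
G = nx.Graph()
G.add_edges_from([('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')])

print(G.number_of_nodes())   # 4
print(G.number_of_edges())   # 4
print(nx.degree_centrality(G)['c'])  # 1.0 ('c' touches every other node)
```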
Websites of interest
Stanford Network Analysis Project |
|
Koblenz Network Collection |
|
Network Repository |
|
Graph dataset formats
GML (.gml) |
Custom structure |
G = nx.read_gml(path)
|
Pajek (.net) |
Custom structure |
G = nx.read_pajek(path)
|
JSON (.json) |
Custom structure |
import json
from networkx.readwrite import json_graph
G = json_graph.node_link_graph(json.load(open(path)))
|
Plain text |
Manual
|
Multilayer v1 (.) |
Manual
|
Multilayer v2 (.) |
layer1 layer2 node1 node2 weight
Manual
|
Those formats without a specific function are usually easy to parse manually.
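For instance, a multilayer edge list in the `layer1 layer2 node1 node2 weight` layout can be parsed with plain string splitting (the sample data below is hypothetical, and storing layers as edge attributes on a MultiGraph is just one possible choice):

```python
import networkx as nx

# Hypothetical multilayer edge list: layer1 layer2 node1 node2 weight
raw = """1 1 a b 0.5
1 2 a c 1.0
2 2 c d 2.5"""

# Keep every edge, recording its layer pair as an edge attribute
G = nx.MultiGraph()
for line in raw.splitlines():
    l1, l2, u, v, w = line.split()
    G.add_edge(u, v, layer=(int(l1), int(l2)), weight=float(w))

print(G.number_of_edges())  # 3
```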
Beautiful Soup HTML parsing
|
|
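This section is still empty in the draft; a minimal parsing sketch, assuming bs4 is installed and using the stdlib `html.parser` backend (the HTML snippet is made up):

```python
from bs4 import BeautifulSoup

html = """<html><body>
  <h1 id="title">Datasets</h1>
  <ul>
    <li><a href="/snap">SNAP</a></li>
    <li><a href="/konect">KONECT</a></li>
  </ul>
</body></html>"""

soup = BeautifulSoup(html, 'html.parser')

# Find the first matching tag, or all of them
print(soup.find('h1').get_text())               # Datasets
print([a['href'] for a in soup.find_all('a')])  # ['/snap', '/konect']
```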
String slicing tricks
Split by character |
'a_string'.split('_')
|
Join with a character |
'_'.join(['a', 'string'])
|
Capitalize, lower, upper |
foo.capitalize(), foo.lower(), foo.upper()
|
Find, count |
foo.find('e'), foo.count('e')
|
Replace |
'a_string'.replace('string', 'banana')
|
|
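The methods above chain naturally; a quick demo (the example strings are made up):

```python
# split on '_', capitalize each part, join with spaces
parts = 'data_mining_cheat_sheet'.split('_')
title = ' '.join(p.capitalize() for p in parts)
print(title)  # Data Mining Cheat Sheet

s = 'a_string'
print(s.find('s'), s.count('s'))      # 2 1
print(s.replace('string', 'banana'))  # a_banana
```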