
spaCy Cheat Sheet (DRAFT) by

For who knows what I am going to do with this

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Base Initialization

import spacy
nlp = spacy.blank("en")
nlp = spacy.load("en_core_web_sm")
nlp = spacy.load("en_core_web_md")
nlp = spacy.load("en_core_web_lg")
nlp = spacy.load("en_core_web_trf")
doc = nlp("Insert your text here. This string will be used to create a doc object. For natural language processing.")
print("tokens: ", [token.text for token in doc])
output:
tokens: ['Insert', 'your', 'text', 'here', '.', 'This', 'string', 'will', 'be', 'used', 'to', 'create', 'a', 'doc', 'object', '.', 'For', 'natural', 'language', 'processing', '.']
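Worth noting about the lines above: `spacy.blank("en")` builds a tokenizer-only pipeline, while the `en_core_web_*` models add trained components. A minimal sketch of the difference, assuming only that spaCy is installed (no model download needed):

```python
import spacy

# spacy.blank("en") builds a tokenizer-only pipeline: no trained
# components, so no tagger, parser, or entity recognizer.
nlp = spacy.blank("en")
print(nlp.pipe_names)  # -> []

# Tokenization still works without any downloaded model.
doc = nlp("Tokenization works without a trained model.")
print([token.text for token in doc])
```

The `en_core_web_*` models must be downloaded first (e.g. `python -m spacy download en_core_web_sm`); `md`/`lg` add larger word vectors and `trf` uses a transformer.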

Token Attributes Applied

Remove stop words and punctuation tokens
words = [token.text for token in doc if not token.is_stop and not token.is_punct]
print(words)
output:
['Insert', 'text', 'string', 'create', 'doc', 'object', 'natural', 'language', 'processing']
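`is_stop` and `is_punct` are lexical attributes set by the language data, not by a trained component, so this filtering also works in a blank pipeline; a self-contained sketch:

```python
import spacy

# Lexical attributes like is_stop and is_punct come from the
# language data, so no trained model is required to filter on them.
nlp = spacy.blank("en")
doc = nlp("this is a test, with punctuation.")
words = [t.text for t in doc if not t.is_stop and not t.is_punct]
print(words)  # -> ['test', 'punctuation']
```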

Lemma

for token in doc:
    print("token: ", token, " lemma: ", token.lemma_)
token: the Token object itself
lemma_: the 'base' form of the token
(ex. going --> go; was --> be)

displaCy Visualizer

import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
displacy.render(doc, style='dep', jupyter=True)
 

Sentence

list(doc.sents)
print(list(doc.sents))
output:
[Insert your text here., This string will be used to create a doc object., For natural language processing.]
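`doc.sents` relies on sentence boundaries, which the loaded models get from the parser. Without a trained model, the rule-based `sentencizer` component can set them instead; a minimal sketch:

```python
import spacy

# The "sentencizer" is a rule-based (punctuation-driven) sentence
# splitter that works in a blank pipeline, no trained parser needed.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
doc = nlp("First sentence. Second sentence.")
print([sent.text for sent in doc.sents])  # -> ['First sentence.', 'Second sentence.']
```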

Word Frequency

from collections import Counter
words = [token.text for token in doc if not token.is_stop and not token.is_punct]
word_freq = Counter(words)
common_words = word_freq.most_common(5)
print(common_words)
output:
[('Insert', 1), ('text', 1), ('string', 1), ('create', 1), ('doc', 1)]
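`Counter.most_common` itself is plain stdlib and behaves the same on any list of tokens; a spaCy-free sketch for reference:

```python
from collections import Counter

# most_common(n) returns the n (item, count) pairs with the highest
# counts; ties keep first-seen order.
tokens = ["doc", "doc", "token", "doc", "token", "span"]
print(Counter(tokens).most_common(2))  # -> [('doc', 3), ('token', 2)]
```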
Using the textrank component (registered by the pytextrank package)
import pytextrank
nlp.add_pipe("textrank")
doc = nlp("Insert your text here. This string will be used to create a doc object. For natural language processing.")
for index, phrase in enumerate(doc._.phrases):
    print("index: ", index, "\ntext: ", phrase.text)
    print("rank:", phrase.rank, " count:", phrase.count)
    print("chunks: ", phrase.chunks)
output:
index: 0
text: natural language processing
rank: 0.22805281953000428 count: 1
chunks: [natural language processing]
index: 1
text: a doc object
rank: 0.1481208925234081 count: 1
chunks: [a doc object]
index: 2
text: your text
rank: 0.073534568107334 count: 1
chunks: [your text]
index: 3
text: This string
rank: 0.054063724281900864 count: 1
chunks: [This string]
 

Textacy

pip install textacy
import textacy
metadata = {
    "title": "Natural-language processing",
    "url": "https://en.wikipedia.org/wiki/Natural-language_processing",
    "source": "wikipedia",
}
doc = textacy.make_spacy_doc((text, metadata), lang="en_core_web_sm")
print(doc._.meta["title"])
output: Natural-language processing