spaCy Cheat Sheet (DRAFT)

Cheat sheet for spaCy

This is a draft cheat sheet. It is a work in progress and is not finished yet.

Init

from spacy.lang.en import English
nlp = English()

Basic

doc = nlp("SOME TEXTS")
span = doc[i:j]
token = doc[i]
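
A minimal sketch of the three objects (example text is illustrative):

doc = nlp("spaCy is written in Python")
token = doc[0]        # Token: "spaCy"
span = doc[3:5]       # Span: "in Python"
print(token.text, span.text)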

Pre-trained Model

import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(MY_TEXT)

Named entities

doc.ents
.text
.label_
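
Example with a pre-trained model (the exact entities depend on the model):

doc = nlp("Apple was founded in California")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Apple ORG, California GPE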

spacy.tokens

Doc
Doc(nlp.vocab, words=words, spaces=spaces)
Span
Span(doc, i, j, label="PERSON")
i, j: start and end token indices (end is exclusive)
words: a list of word strings
spaces: a list of booleans (does each word have a trailing space?)
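
Sketch of constructing a Doc and Span manually (assumes an existing nlp object for its vocab):

from spacy.tokens import Doc, Span

words = ["Hello", "Mary", "!"]
spaces = [True, False, False]            # is each word followed by a space?
doc = Doc(nlp.vocab, words=words, spaces=spaces)
span = Span(doc, 1, 2, label="PERSON")   # covers token 1 up to (not including) 2
doc.ents = [span]                        # optionally register it as an entity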

Matcher

from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
matches = matcher(doc)
[(match_id, start, end), ...]

Add pattern to matcher

pattern = [ {KEY: VALUE}, ... ]   # one dict per token
matcher.add("PATTERN_NAME", None, pattern)
Keys are token attributes, e.g.:
1. text-based: "TEXT", "LOWER"
2. linguistic labels: "POS", "TAG", "ENT_TYPE"
The special key "OP" adds regex-like operators ("!", "?", "+", "*").
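
Sketch using the v2-style add() shown above; the pattern and name are illustrative:

pattern = [{"LOWER": "golden"}, {"LOWER": "retriever"}]
matcher.add("DOG_BREED", None, pattern)

doc = nlp("I saw a Golden Retriever")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)   # "Golden Retriever"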

Phrase matching

from spacy.matcher import PhraseMatcher
matcher = PhraseMatcher(nlp.vocab)
pattern = nlp("Golden Retriever")
matcher.add("DOG", None, pattern)

for match_id, start, end in matcher(doc):
    span = doc[start:end]
 

Similarity

Word vector
token.vector
Doc
doc1.similarity(doc2)
Span
span1.similarity(span2)
Token
token1.similarity(token2)
Doc by Token
doc.similarity(token)
Returns a similarity score between 0 and 1.
Not reliable with small (sm) models, which ship without word vectors.
Uses cosine similarity by default.
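
Sketch, assuming a medium model with word vectors (en_core_web_md) is installed:

import spacy

nlp = spacy.load("en_core_web_md")   # sm models have no word vectors
doc1 = nlp("I like pizza")
doc2 = nlp("I love pasta")
print(doc1.similarity(doc2))         # score between 0 and 1
print(doc1[2].similarity(doc2[2]))   # token vs. token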

Pipeline

nlp.pipe_names

nlp.pipeline

Add pipeline component

def fn(doc):
    # function body
    return doc

nlp.add_pipe(fn, first=True)   # keyword args: first, last, before, after
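
A concrete sketch (component name and behaviour are illustrative; v2 API, where the function itself is passed in):

def length_component(doc):
    # runs on every doc that passes through the pipeline
    print("Doc length:", len(doc))
    return doc

nlp.add_pipe(length_component, last=True)
doc = nlp("This is a sentence.")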

Set custom attributes

register globally (once)
Doc.set_extension("ATTR", default=None)
add metadata
doc._.ATTR = "VALUE"
extensions can be registered on Doc, Token and Span
access via the
._
property
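
Sketch (the attribute name "title" is illustrative):

from spacy.tokens import Doc

Doc.set_extension("title", default=None)   # register once, globally
doc = nlp("Some text")
doc._.title = "My document"                # attach metadata
print(doc._.title)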

Extension attribute types

attribute
Token.set_extension("ATTR", default=False)
property
Span.set_extension("PROP", getter=fn)
method
Doc.set_extension("METHOD", method=fn)
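
Sketch of all three kinds (names and logic are illustrative):

from spacy.tokens import Doc, Span, Token

Token.set_extension("is_color", default=False)   # attribute: plain value
Span.set_extension("has_color",                  # property: computed on access
                   getter=lambda span: any(t._.is_color for t in span))
Doc.set_extension("count_token",                 # method: takes extra arguments
                  method=lambda doc, text: sum(t.text == text for t in doc))

doc = nlp("the sky is blue")
doc[3]._.is_color = True
print(doc[0:4]._.has_color)        # True
print(doc._.count_token("blue"))   # 1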
 

Processing texts efficiently

nlp.pipe(DATA)
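
nlp.pipe streams texts in batches, which is much faster than calling nlp on each text in a loop; sketch:

texts = ["First text", "Second text", "Third text"]

# slow: one call per text
docs = [nlp(text) for text in texts]

# fast: batched stream
docs = list(nlp.pipe(texts))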

Passing in context

data = [ ("SOME TEXTS", {"KEY": "VAL"}),  (...), ]

# Method 1: keep context separate from the doc
for doc, ctx in nlp.pipe(data, as_tuples=True):
    print(doc.text, ctx["KEY"])

# Method 2: attach context as a custom attribute
Doc.set_extension("KEY", default=None)
for doc, ctx in nlp.pipe(data, as_tuples=True):
    doc._.KEY = ctx["KEY"]

Using tokenizer only

# Method 1: tokenize only with make_doc
doc = nlp.make_doc("SOME TEXTS")

# Method 2: temporarily disable specific components
with nlp.disable_pipes("tagger", "parser"):
    doc = nlp(text)