POS tags
N (noun) | dog, cat, chair
V (verb) | read, write, get
ADJ (adjective) | pretty, smart, blue
ADV (adverb) | gently, carefully, extremely
P (preposition) | in, on, by, with, about
PRO (pronoun) | I, me, mine, it, they, ...
CON (conjunction) | and, or, but, while, because
INT (interjection) | ooh, wow, yeah
DET (determiner) | the, a, all, his, this
AUX (auxiliary verb) | have (done), might (do)
PAR (particle) | up (look up), on (get on)
NUM (numeral) | one, two, three
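The following Python sketch shows how a tag set like this is applied. It is a toy lookup tagger, not how real taggers work. The lexicon (`LEXICON`, mostly the example words above) and the fallback rules (words ending in `-ly` become ADV, digits become NUM, anything else becomes N) are assumptions made for illustration. Real taggers, such as HMM or neural ones, use the surrounding context to resolve ambiguous words. For example, "on" is P in "on the table" but PAR in "get on".

```python
# Hypothetical mini-lexicon, mostly the example words from the table above
LEXICON = {
    "dog": "N", "cat": "N", "chair": "N",
    "read": "V", "write": "V", "get": "V",
    "pretty": "ADJ", "smart": "ADJ", "blue": "ADJ",
    "gently": "ADV", "extremely": "ADV",
    "in": "P", "on": "P", "with": "P",
    "i": "PRO", "it": "PRO", "they": "PRO",
    "and": "CON", "but": "CON",
    "wow": "INT",
    "the": "DET", "a": "DET", "his": "DET",
    "have": "AUX", "might": "AUX",
    "one": "NUM", "two": "NUM",
}


def tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each whitespace-separated token by lexicon lookup plus crude fallbacks."""
    tagged = []
    for word in sentence.split():
        w = word.lower()
        if w in LEXICON:
            pos = LEXICON[w]
        elif w.endswith("ly"):  # suffix heuristic for unknown words
            pos = "ADV"
        elif w.isdigit():
            pos = "NUM"
        else:                   # unknown open-class words default to noun
            pos = "N"
        tagged.append((word, pos))
    return tagged


print(tag("They read the blue chair quickly"))
# → [('They', 'PRO'), ('read', 'V'), ('the', 'DET'), ('blue', 'ADJ'), ('chair', 'N'), ('quickly', 'ADV')]
```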
Context-free grammar
Grammar = {
  objects: [
    words/tokens: terminals,
    directly above them: POS tags (preterminals),
    above those: syntactic tags (phrase labels such as NP, VP),
    top: sentence (start symbol S)
  ];
  rules: [
    X -> Y
    X: node name, # e.g. "VP" (verb phrase)
    Y: sequence of objects that make up X # e.g. V + NP
  ]
}
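The following Python sketch writes the `X -> Y` rules as a small, hypothetical grammar. Each non-terminal maps to its alternative right-hand sides. A naive top-down backtracking recognizer then checks whether a sentence can be derived from `S`. This is only an illustration. It is far less efficient than chart parsers such as CKY or Earley, and it would recurse forever on a left-recursive rule like `NP -> NP PP`.

```python
from collections.abc import Iterator

# Hypothetical toy grammar: each non-terminal X maps to its alternative right-hand sides Y
GRAMMAR: dict[str, list[list[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "DET": [["the"], ["a"]],  # POS tags sit directly above the terminals
    "N": [["dog"], ["cat"]],
    "V": [["chased"], ["sleeps"]],
}


def ends(symbol: str, tokens: list[str], i: int) -> Iterator[int]:
    """Yield every j such that `symbol` can derive tokens[i:j]."""
    if symbol not in GRAMMAR:  # terminal: must equal the next token
        if i < len(tokens) and tokens[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from ends_seq(rhs, tokens, i)


def ends_seq(seq: list[str], tokens: list[str], i: int) -> Iterator[int]:
    """Yield every j such that the symbol sequence `seq` can derive tokens[i:j]."""
    if not seq:
        yield i
        return
    for j in ends(seq[0], tokens, i):
        yield from ends_seq(seq[1:], tokens, j)


def recognizes(sentence: str) -> bool:
    """True if S can derive the whole sentence."""
    tokens = sentence.split()
    return any(j == len(tokens) for j in ends("S", tokens, 0))


print(recognizes("the dog chased a cat"))  # → True
print(recognizes("dog the sleeps"))        # → False
```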
Morphemes
Stems and affixes (prefixes/suffixes). Useful for POS tagging and text normalization.
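The following Python sketch is a toy affix splitter that shows stems and affixes in practice. The affix lists and the minimum stem length (`MIN_STEM`) are hypothetical choices. Real normalizers use ordered rewrite rules (stemmers such as Porter's) or a dictionary (lemmatizers).

```python
# Hypothetical, deliberately tiny affix lists (longer suffixes are tried first)
PREFIXES = ("un", "re")
SUFFIXES = ("ness", "ing", "ed", "ly", "s")
MIN_STEM = 3  # don't strip if fewer than 3 characters would remain


def split_affixes(word: str) -> tuple[str, str, str]:
    """Return (prefix, stem, suffix); empty strings mean no affix was stripped."""
    prefix = next((p for p in PREFIXES
                   if word.startswith(p) and len(word) - len(p) >= MIN_STEM), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES
                   if rest.endswith(s) and len(rest) - len(s) >= MIN_STEM), "")
    stem = rest[: len(rest) - len(suffix)]
    return prefix, stem, suffix


print(split_affixes("replaying"))    # → ('re', 'play', 'ing')
print(split_affixes("unhappiness"))  # → ('un', 'happi', 'ness')
```

Blind stripping also fails in predictable ways. It yields the non-word stem "happi", and `split_affixes("reading")` wrongly peels off "re" to give `('re', 'ading', '')`. That is why practical systems check candidates against a lexicon.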
Semantics
synonyms | different words, same meaning (big / large)
polysemes | same word, different but related meanings
hypernym/hyponym | category >>> specific (animal >>> dog)
meronym/holonym | part >>> whole (wheel >>> car)
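The following Python sketch models the hypernym and meronym relations as plain dictionaries. The word lists and function names are hypothetical. A real system would query a lexical resource such as WordNet instead.

```python
# Hypothetical toy lexical relations; a real system would query a resource such as WordNet
HYPERNYM = {"poodle": "dog", "dog": "animal", "cat": "animal", "animal": "organism"}
PARTS = {"car": ["wheel", "engine", "door"], "tree": ["trunk", "branch", "leaf"]}


def hypernym_chain(word: str) -> list[str]:
    """Follow hypernym links upward, from specific to most general category."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def hyponyms(category: str) -> list[str]:
    """Direct hyponyms: words whose immediate hypernym is `category`."""
    return [w for w, parent in HYPERNYM.items() if parent == category]


def holonyms(part: str) -> list[str]:
    """Wholes that list `part` as one of their meronyms."""
    return [whole for whole, parts in PARTS.items() if part in parts]


print(hypernym_chain("poodle"))  # → ['dog', 'animal', 'organism']
print(holonyms("wheel"))         # → ['car']
```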
LDA
Gibbs sampling | 1. Randomly assign each word (token) to a topic.
 | 2. Re-assign each word to a topic, one at a time, assuming all other current assignments are correct; repeat for many passes.
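The following Python sketch implements the two steps above as a standard collapsed Gibbs sampler for LDA. Step 2 samples a token's new topic `t` with probability proportional to (n_dt + α)(n_tw + η) / (n_t + Vη). All counts exclude the token being resampled, and V is the vocabulary size. The assumptions are:

- documents arrive as lists of integer word ids;
- the name `lda_gibbs` is hypothetical;
- the default `alpha`, `eta`, and `iters` values are arbitrary.

```python
import random


def lda_gibbs(docs: list[list[int]], num_topics: int, alpha: float = 0.1,
              eta: float = 0.01, iters: int = 200, seed: int = 0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, V)."""
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    doc_topic = [[0] * num_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_kw
    topic_total = [0] * num_topics                              # n_k

    # Step 1: random word-to-topic assignment
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    # Step 2: re-assign each word in turn, conditioning on all other assignments
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z_i = t | rest) ∝ (n_dt + alpha) * (n_tw + eta) / (n_t + V * eta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + eta)
                    / (topic_total[t] + vocab_size * eta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the new assignment
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return z, doc_topic, topic_word


docs = [[0, 1, 2, 0, 1, 2], [1, 2, 0, 0], [3, 4, 5, 3], [4, 5, 3, 5]]
z, doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iters=50)
print(doc_topic)
```

After sampling, the usual point estimates come from the final counts. Each document's topic mixture is θ_d ∝ n_dk + α, and each topic's word distribution is φ_k ∝ n_kw + η.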
hyperparameters | high $\alpha$ --> each document is a mixture of many topics
 | high $\eta$ --> each topic is a mixture of many words
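In LDA, $\alpha$ and $\eta$ are the concentration parameters of symmetric Dirichlet priors. $\alpha$ governs the per-document topic mixtures and $\eta$ governs the per-topic word distributions. The following Python sketch draws topic mixtures from Dirichlet(α) by normalizing independent Gamma(α, 1) draws, a standard construction. It then reports the average share of the single largest topic. The number of topics `k`, the sample count `n`, and the function names are arbitrary assumptions. With a small α, one topic tends to dominate each draw. With a large α, the mass spreads across many topics. The same reasoning applies to $\eta$ over words.

```python
import random


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> list[float]:
    """One draw from a symmetric Dirichlet(alpha) over k components."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [x / total for x in draws]


def mean_max_share(alpha: float, k: int = 10, n: int = 2000, seed: int = 0) -> float:
    """Average proportion taken by the largest topic across n sampled mixtures."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n)) / n


for alpha in (0.1, 1.0, 10.0):
    print(alpha, round(mean_max_share(alpha), 3))
```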
evaluation | topic coherence (e.g. PMI between a topic's top words), human evaluation
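The following Python sketch computes a PMI-based coherence score: the mean PMI over all pairs of a topic's top words. Probabilities are estimated as the fraction of reference documents that contain the word or word pair. The smoothing constant `eps`, which avoids log(0) for pairs that never co-occur, and the function name are assumptions. The sketch also assumes every top word appears in at least one document. Higher scores mean the top words co-occur more often than chance.

```python
import math
from itertools import combinations


def pmi_coherence(top_words: list[str], documents: list[list[str]], eps: float = 1e-12) -> float:
    """Mean PMI over all pairs of `top_words`, using document-level co-occurrence.

    Assumes every word in `top_words` appears in at least one document.
    """
    docs = [set(doc) for doc in documents]
    n = len(docs)

    def prob(*words: str) -> float:
        # Fraction of documents containing all the given words
        return sum(all(w in d for w in words) for d in docs) / n

    pair_scores = [
        math.log((prob(a, b) + eps) / (prob(a) * prob(b)))
        for a, b in combinations(top_words, 2)
    ]
    return sum(pair_scores) / len(pair_scores)


corpus = [["police", "court"], ["police", "court", "lawyer"], ["dog", "cat"], ["dog", "cat"]]
print(pmi_coherence(["police", "court"], corpus))  # ≈ log 2
print(pmi_coherence(["police", "dog"], corpus))    # strongly negative
```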
Sentiment-Topic Model (Plate Notation)
[Figure: plate-notation diagram of the sentiment-topic model; image not included]
Pointwise Mutual Information
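The standard definition is shown below. PMI is positive when x and y co-occur more often than they would if independent, zero when they are independent, and negative when they co-occur less often. For topic coherence, the probabilities are typically estimated from document counts, where D is the number of documents, D(x) the number containing x, and D(x, y) the number containing both.

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)},
\qquad
p(x) = \frac{D(x)}{D}, \quad p(x, y) = \frac{D(x, y)}{D}
```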
Discourse Markers
causal | because
consequence | as a result
conditional | if
temporal | when
additive | and
elaboration | for example (exemplification), in other words (re-wording)
contrastive/concessive | but, although
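The following Python sketch labels the discourse markers in a text by surface matching against a small, hypothetical lexicon built from the categories above. Multi-word markers are tried first, so "as a result" is matched as one unit. This is only a baseline. Words like "and" or "when" are often not discourse markers at all, and a real discourse parser has to make that decision from context.

```python
import re

# Hypothetical marker lexicon based on the categories above
MARKERS = {
    "because": "causal",
    "as a result": "consequence",
    "if": "conditional",
    "when": "temporal",
    "and": "additive",
    "for example": "elaboration",
    "in other words": "elaboration",
    "but": "contrastive/concessive",
    "although": "contrastive/concessive",
}
# Try longer (multi-word) markers first so "as a result" is matched as a unit
ORDERED = sorted(MARKERS, key=lambda m: len(m.split()), reverse=True)


def find_markers(text: str) -> list[tuple[str, str]]:
    """Return (marker, category) pairs in order of appearance."""
    tokens = re.findall(r"[a-z']+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        for marker in ORDERED:
            parts = marker.split()
            if tokens[i:i + len(parts)] == parts:
                found.append((marker, MARKERS[marker]))
                i += len(parts)
                break
        else:  # no marker starts at this token
            i += 1
    return found


print(find_markers("It rained, and as a result the match was cancelled, but we stayed."))
# → [('and', 'additive'), ('as a result', 'consequence'), ('but', 'contrastive/concessive')]
```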
Preparation for NLTK classifier
# doc_tuple = (doc_representation, label)
# e.g. ({'police': 1, 'lawyer': 1, 'court': 1}, 'Crime')
# train_set = [doc_tuple1, doc_tuple2, ...]
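The following Python sketch builds the `(doc_representation, label)` tuples from raw text. It uses binary bag-of-words features, where each lower-cased whitespace token maps to 1. The helper names are hypothetical, and the sketch makes no attempt at proper tokenization or stop-word removal.

```python
def doc_features(text: str) -> dict[str, int]:
    """Binary bag-of-words: every lower-cased token in the document maps to 1."""
    return {token: 1 for token in text.lower().split()}


def make_train_set(labeled_docs: list[tuple[str, str]]) -> list[tuple[dict[str, int], str]]:
    """Turn (raw_text, label) pairs into (doc_representation, label) tuples."""
    return [(doc_features(text), label) for text, label in labeled_docs]


train_set = make_train_set([
    ("Police called the lawyer to court", "Crime"),
    ("The striker scored in the final", "Sport"),
])
print(train_set[0])
```

With NLTK installed, `nltk.NaiveBayesClassifier.train(train_set)` accepts exactly this list of (featureset, label) pairs. The trained classifier's `classify()` method then takes a single feature dict built the same way.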