Cheatography
https://cheatography.com
Formal Systems
Cognition and cognitive theories are formal systems; we use mathematics to study them 
Abstract concepts can be reasoned about precisely when situated in formal systems 
Neural networks are continuous systems 
Constructing the continuum
Axiomatization Describe the basic properties and declare them to be true by definition

Construction Use simpler objects and operations to explicitly define more complex models

Equivalence classes Partition a set into groups of elements related by some rule
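The partition idea above can be sketched in a few lines. This is a minimal illustration, assuming the (hypothetical) rule "same remainder mod 3", which is reflexive, symmetric, and transitive and so induces a partition:

```python
# Partition a set into equivalence classes under a key function.
# The rule here (remainder mod 3) is an illustrative assumption.
from collections import defaultdict

def equivalence_classes(items, key):
    classes = defaultdict(list)
    for x in items:
        classes[key(x)].append(x)   # elements with equal keys are equivalent
    return list(classes.values())

print(equivalence_classes(range(7), key=lambda n: n % 3))
# → [[0, 3, 6], [1, 4], [2, 5]]
```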

Dynamic Systems
Recurrent, learning, and biological neural networks 
Motor control is the effector of a dynamic system 
The mind is an abstract dynamical system with continuous state variables that are not activation values of units or representations 
NNs as Probabilistic Models
Used for 
stochastically searching for global optima 

representing and rationally coping with uncertainty 

measuring information 
Deployed in 
neural network models 

symbolic models with nondeterminism and uncertainty (e.g. inferring knowledge from experience, using knowledge to infer outputs given inputs) 


Optimization
Processing Activation dynamics. Maximizes the wellformedness (harmony) of the activation pattern (which depends on the connection weights). Spreading activation dynamics is an optimization algorithm for the representation.

Learning Weight dynamics. Minimizes error. Weight-adjustment dynamics is an optimization algorithm for the knowledge in the weights: the learning algorithm.

Probabilistic modelling Parameters of the statistical model change as more data is received. Optimized based on the likelihood of the data or the Bayesian posterior probability given the data
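Learning as weight dynamics can be sketched as gradient descent on error for a single linear unit. The data, target weight, and learning rate below are illustrative assumptions, not part of the cheat sheet:

```python
# Sketch: weight-adjustment dynamics minimizing mean squared error.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated by an (assumed) target weight w* = 2

w, lr = 0.0, 0.05
for _ in range(200):
    # gradient of mean squared error (w*x - y)^2 with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad     # follow the error downhill

print(round(w, 3))  # → 2.0
```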

Fourier analysis
f(x) = \sum_k c_k e^{ikx} employs a basis of complex exponentials, {e^{ikx}}_{k \in \Z} 
Equivalently, a basis of cos(kx) and sin(kx) 
The Fourier coefficient c_k states how strongly an oscillation of frequency k is present in f 
{f(t)}_t describes f in the time / spatial domain 
{c_k}_k describes f in the frequency / spatial-frequency domain 
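The two descriptions of f can be compared numerically. A sketch assuming NumPy: a signal built from frequency-3 and frequency-5 oscillations has FFT coefficients c_k that are large only at k = 3 and k = 5, so its frequency-domain description is sparse:

```python
# Time-domain samples of f vs. its Fourier coefficients c_k.
import numpy as np

n = 64
t = np.arange(n) * 2 * np.pi / n
f = np.sin(3 * t) + 0.5 * np.cos(5 * t)   # illustrative signal

c = np.fft.fft(f) / n                     # Fourier coefficients c_k
strong = [k for k in range(n // 2) if abs(c[k]) > 0.1]
print(strong)  # → [3, 5]
```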
Support Vector Machines
Use supervised learning to learn a region of activation space for each concept 
Classification is driven only by training samples near the region boundary 
Wide margin: the error function favors a large margin between the training samples and the boundary it posits for separating the categories 
Slack variables: minimize a variable for each training example that "picks up the slack" between the point and the category region it should be in 
Kernel trick: implicitly maps the data into a high-dimensional space in which classification conceptually takes place (implemented through a kernel function) 
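The kernel trick can be verified directly in a toy case: a degree-2 polynomial kernel computes the inner product in a higher-dimensional feature space without ever constructing that space. The vectors and the explicit feature map below are illustrative assumptions:

```python
# Kernel trick sketch: k(x, y) = (x . y)^2 equals the dot product of
# explicit degree-2 feature maps phi(x), phi(y).
def phi(x):
    # explicit feature map for the degree-2 polynomial kernel in 2D
    x1, x2 = x
    return [x1 * x1, x2 * x2, (2 ** 0.5) * x1 * x2]

def kernel(x, y):
    return sum(a * b for a, b in zip(x, y)) ** 2

x, y = [1.0, 2.0], [3.0, 0.5]
lhs = kernel(x, y)                                 # computed in input space
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))   # computed in feature space
print(lhs, round(rhs, 6))  # → 16.0 16.0
```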


Discrete structures of distribution patterns
Vectors v in a vector space V: distributed representations 
With respect to an appropriate conceptual basis for V, components of a representation v indicate the strength of a set of basis concepts in v: gradient conceptual representation 
Eigenbasis rescales components 
Analyse the entire distribution within V of the representations {v^k} of a set of represented items {x^k} 
Clusters of {v^k} constitute a conceptual group and may be hierarchically structured 
Can construct it such that greater distance between v^{k} and v^{t} means greater mental distinguishability of x^{k} and x^{t} 
Harmony
Weight matrices, error functions, learning as optimization 
Activation vectors, wellformedness = harmony function, processing as optimization 
 Parallel, violable-constraint satisfaction 
 Schemas/prototypes in Harmony landscapes 
Finding local optima is a deterministic problem; finding global optima requires randomized/stochastic algorithms 
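The harmony of an activation pattern can be written as H(a) = Σ_ij a_i W_ij a_j; processing as optimization means moving toward patterns of higher harmony. A minimal sketch with an illustrative weight matrix in which the two units "want" to agree:

```python
# Harmony H(a) = sum_ij a_i * W_ij * a_j for an activation pattern a.
W = [[0.0, 1.0],
     [1.0, 0.0]]   # illustrative weights: positive link between the units

def harmony(a):
    return sum(a[i] * W[i][j] * a[j]
               for i in range(len(a)) for j in range(len(a)))

# agreeing pattern has higher harmony than a disagreeing one
print(harmony([1.0, 1.0]), harmony([1.0, -1.0]))  # → 2.0 -2.0
```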
Inductive learning
Finding the best hypothesis, within a hypothesis space, about the world that the learner is trying to understand 
Goodness of a hypothesis H is determined jointly by how well H fits the data the learner has received about the world and how simple H is 
A hypothesis is a probabilistic data-generator 
Maximum Likelihood Principle 
Maximum A Posteriori Principle: Bayesian principle that says pick the hypothesis with the highest a posteriori probability, balancing the likelihood of the data against the a priori probability of H. Better still, do not pick a single H: maintain a degree of belief for every H in the hypothesis space 
Maximum Entropy Principle (MaxEnt): pick the H with the maximum missing information (entropy), among those H consistent with the known data 
Minimum Description Length Principle: shorter is better 
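The contrast between maximum likelihood and MAP can be sketched over a tiny hypothesis space of coin biases. The data (7 heads in 10 flips) and the prior favoring a fair coin are illustrative assumptions:

```python
# ML picks the H fitting the data best; MAP weighs that fit against
# the prior probability of H.
from math import comb

hyps = [0.3, 0.5, 0.7]                # hypothesis space: possible P(heads)
prior = {0.3: 0.2, 0.5: 0.6, 0.7: 0.2}  # assumed prior favoring a fair coin
heads, n = 7, 10

def likelihood(p):
    return comb(n, heads) * p ** heads * (1 - p) ** (n - heads)

ml = max(hyps, key=likelihood)
map_h = max(hyps, key=lambda p: likelihood(p) * prior[p])
print(ml, map_h)  # → 0.7 0.5 (the prior pulls MAP toward the fair coin)
```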
