Formal Systems
Cognition and cognitive theories are formal systems; we use mathematics to study them
Abstract concepts can be reasoned about precisely when situated in formal systems
Neural networks are continuous systems
Constructing the continuum
Axiomatization: describe the basic properties and declare them to be true by definition
Construction: use simpler objects and operations to explicitly define more complex models
Equivalence classes: partition a set into classes according to a rule
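As a concrete illustration (an assumed Python sketch, not part of the original sheet): constructing the rational numbers from pairs of integers shows both the construction and equivalence-class strategies at work.

    # Hypothetical illustration: rationals as equivalence classes of integer pairs,
    # where (a, b) ~ (c, d) iff a*d == b*c (denominators nonzero).
    from math import gcd

    def rational(a, b):
        """Return the canonical representative of the class containing (a, b)."""
        if b == 0:
            raise ValueError("denominator must be nonzero")
        sign = -1 if a * b < 0 else 1
        a, b = abs(a), abs(b)
        g = gcd(a, b)
        return (sign * a // g, b // g)

    def equivalent(p, q):
        """(a, b) ~ (c, d): both pairs name the same rational number."""
        return p[0] * q[1] == p[1] * q[0]

    assert equivalent((2, 4), (1, 2))          # both stand for 1/2
    assert rational(2, 4) == rational(-1, -2)  # same class, same representative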
Dynamic Systems
Recurrent, learning, and biological neural networks
Motor control is the effector of a dynamical system
The mind is an abstract dynamical system with continuous state variables that are not activation values of units or representations
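A minimal sketch (assumed example) of a recurrent network treated as a continuous dynamical system, with the state integrated by Euler steps:

    # dx/dt = -x + tanh(W x); the continuous state x flows toward an attractor.
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.5, size=(3, 3))   # assumed random connection weights
    x = rng.normal(size=3)                   # continuous state variables
    dt = 0.01

    for _ in range(1000):
        x = x + dt * (-x + np.tanh(W @ x))   # Euler integration of the flow

    print("settled state:", x)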
NNs as Probabilistic Models
Used for:
- stochastically searching for global optima
- representing and rationally coping with uncertainty
- measuring information
Deployed in:
- neural network models
- symbolic models with non-determinism and uncertainty (e.g. inferring knowledge from experience, using knowledge to infer outputs given inputs)
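A minimal sketch (assumed example) of the first use above: stochastically searching for a global optimum by accepting state flips with a Boltzmann probability and slowly cooling the temperature.

    import numpy as np

    rng = np.random.default_rng(1)
    W = np.array([[0.0, 1.0, -1.0],
                  [1.0, 0.0,  1.0],
                  [-1.0, 1.0, 0.0]])        # assumed symmetric weights

    def harmony(a):
        return 0.5 * a @ W @ a              # well-formedness of pattern a

    a = rng.choice([-1.0, 1.0], size=3)     # random binary activation pattern
    T = 2.0
    for _ in range(500):
        i = rng.integers(3)
        b = a.copy(); b[i] = -b[i]          # propose a single-unit flip
        dH = harmony(b) - harmony(a)
        if dH > 0 or rng.random() < np.exp(dH / T):
            a = b                           # always go uphill, sometimes downhill
        T = max(0.05, T * 0.99)             # cool toward deterministic behaviour

    print("pattern:", a, "harmony:", harmony(a))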
Optimization
Processing: activation dynamics. Maximizes the well-formedness (harmony) of the activation pattern, which depends on the connection weights. Spreading-activation dynamics is an optimization algorithm for the representation.
Learning: weight dynamics. Minimizes error. Weight-adjustment dynamics is an optimization algorithm for the knowledge in the weights: the learning algorithm.
Probabilistic modelling: parameters of the statistical model change as more data is received. Optimized based on the likelihood of the data or the Bayesian posterior probability given the data.
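A minimal sketch (assumed example) of learning as weight-space optimization: gradient descent on squared error for a single linear unit.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 2))                                # assumed inputs
    y = X @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=100)   # assumed targets

    w = np.zeros(2)
    lr = 0.1
    for _ in range(200):
        err = X @ w - y                     # prediction error
        grad = X.T @ err / len(y)           # gradient of the mean squared error
        w -= lr * grad                      # weight-adjustment dynamics

    print("learned weights:", w)            # approaches [1.5, -0.5]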
Fourier analysis
f(x) = \sum_k c_k e^{ikx} employs a basis of imaginary powers of x, {e^{ikx}}_{k \in \Z}
There is also a basis of cos(kx) and sin(kx)
The Fourier coefficient c_k states how strongly an oscillation of frequency k is present in f
{f(t)}_t describes f in the time / spatial domain
{c_k}_k describes f in the frequency / spatial-frequency domain
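A minimal sketch (assumed example) of moving between the two domains with the discrete Fourier transform:

    import numpy as np

    N = 256
    t = np.linspace(0, 1, N, endpoint=False)
    f = 2.0 * np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)  # frequencies 3 and 7

    c = np.fft.fft(f) / N                   # Fourier coefficients c_k
    k = np.fft.fftfreq(N, d=1.0 / N)        # the frequency each c_k belongs to

    strong = np.abs(c) > 0.1                # keep only coefficients that matter
    print(sorted(zip(k[strong].astype(int), np.round(np.abs(c[strong]), 2))))
    # -> frequencies +-3 and +-7 carry the signal; every other c_k is ~0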
Support Vector Machines
Use supervised learning to learn a region of activation space for each concept
Classification is driven only by training samples near the region boundary
Wide margin: the error function favors a large margin between the training samples and the boundary it posits for separating the categories
Slack variables: one per training example, minimized so that each "picks up the slack" between the point and the category region it should be in
Kernel trick: implicitly maps the data into a high-dimensional space in which classification conceptually takes place (implemented through a kernel function)
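A minimal sketch (assumed example, using scikit-learn) of a wide-margin classifier with slack and the kernel trick:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # not linearly separable

    clf = SVC(kernel="rbf", C=1.0)   # C trades margin width against slack penalties
    clf.fit(X, y)

    # Only the support vectors (points near the boundary) drive classification.
    print("support vectors:", len(clf.support_), "of", len(X), "training points")
    print("prediction at the origin:", clf.predict([[0.0, 0.0]]))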
Discrete structures of distribution patterns
Vectors v in a vector space V: distributed representations
With respect to an appropriate conceptual basis for V, the components of a representation v indicate the strength of a set of basis concepts in v: gradient conceptual representation
An eigen-basis re-scales the components
Analyse the entire distribution within V of the representations {v^k} of a set of represented items {x^k}
Clusters of {v^k} constitute conceptual groups and may be hierarchically structured
Can construct it such that greater distance between v^k and v^t means greater mental distinguishability of x^k and x^t
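A minimal sketch (assumed example) of analysing the distribution of representations: hierarchical clustering of {v^k} exposes conceptual groups and their nesting.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(4)
    # assumed toy representations: two conceptual groups in a 5-D space
    v = np.vstack([rng.normal(0.0, 0.3, size=(5, 5)),
                   rng.normal(3.0, 0.3, size=(5, 5))])

    Z = linkage(v, method="average")                 # build the hierarchy
    groups = fcluster(Z, t=2, criterion="maxclust")
    print("conceptual groups:", groups)              # items fall into two clusters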
Harmony
Weight matrices, error functions, learning as optimization
Activation vectors, well-formedness = harmony function, processing as optimization
- Parallel, violable-constraint satisfaction
- Schemas/prototypes in Harmony landscapes
Local optima can be found deterministically; finding global optima requires randomized/stochastic algorithms
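A minimal sketch (assumed example) of the deterministic case: gradient ascent on the harmony H(a) = 1/2 a^T W a settles the activation pattern into a local optimum (the stochastic search sketched under "NNs as Probabilistic Models" is needed for global optima).

    import numpy as np

    W = np.array([[0.0, 0.8, -0.4],
                  [0.8, 0.0,  0.6],
                  [-0.4, 0.6, 0.0]])            # assumed symmetric weights (soft constraints)

    def harmony(a):
        return 0.5 * a @ W @ a

    a = np.array([0.1, -0.2, 0.05])             # initial activation pattern
    for _ in range(100):
        a = np.clip(a + 0.1 * (W @ a), -1, 1)   # ascend the harmony gradient, bounded units

    print("harmony:", round(float(harmony(a)), 3), "pattern:", a)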
Inductive learning
Finding the best hypothesis, within a hypothesis space, about the world that the learner is trying to understand
The goodness of a hypothesis H is determined jointly by how well H fits the data the learner has received about the world and by how simple H is
A hypothesis is a probabilistic data-generator
Maximum Likelihood Principle: pick the H under which the observed data are most probable
Maximum A Posteriori Principle: Bayesian principle that says pick the hypothesis with the highest a posteriori probability, balancing the likelihood of the data against the a priori probability of H. Better still, do not pick a single H: maintain a degree of belief for every H in the hypothesis space
Maximum Entropy Principle (Maxent): pick the H with the maximum missing information, among those H that are consistent with the known data
Minimum Description Length Principle: shorter is better
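A minimal sketch (assumed example) contrasting the principles on a tiny hypothesis space, where each hypothesis is a coin bias (a probabilistic data-generator):

    import numpy as np

    hypotheses = np.array([0.1, 0.3, 0.5, 0.7, 0.9])      # assumed hypothesis space
    prior = np.array([0.05, 0.15, 0.6, 0.15, 0.05])       # simpler/central H favoured a priori
    data = [1, 1, 1]                                       # three observed heads

    likelihood = np.array([np.prod([h if d else 1 - h for d in data])
                           for h in hypotheses])
    posterior = likelihood * prior / np.sum(likelihood * prior)

    print("ML hypothesis: ", hypotheses[np.argmax(likelihood)])   # 0.9: fits the data best
    print("MAP hypothesis:", hypotheses[np.argmax(posterior)])    # 0.5: the prior still dominates
    print("full posterior:", np.round(posterior, 3))              # degrees of belief over all H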