Day 2 - Friedemann Zenke, Matthew Cook on Neural Computation
After the coffee break, Matthew Cook kicked off the second part of the morning by asking, "What is computation?" It is NOT the computer science view of an algorithm with a halting condition. Is a bucket of water computing? Yes and no. In any case, this argument about definitions is not fruitful, but considering different ways to compute is useful.
This generated an immense amount of discussion about the meaning of "computation" in the brain and in computer science. Once this discussion had run its course, Matt pointed out the continuing need (a point he last made in his great 2019 talk at CCNW) to expand the set of models of computation. There is always room to at least consider more models; most will not work for AI or will not explain anything new, but there will occasionally be new insights from new ways of looking at computing.
Among the models considered were Turing machines, cellular automata, neural networks, synchronous RTL logic circuits, various computer architectures, and cortex.
One definition of a computation is "a useful reduction of information": e.g. in 4+2=6, the 4 and the 2 are lost but the sum is retained. Another example is a spiking neuron: it integrates its input and occasionally deems it worthy of a spike. Another definition, according to Barbara Webb, is a transformation of representation (correlation by itself is not useful, but prediction based on it is?).
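To make the spiking neuron example concrete, here is a minimal sketch of a leaky integrate-and-fire unit. The function name, the threshold, and the leak factor are illustrative assumptions, not something presented in the session; the point is only that a stream of inputs is reduced to a sparse train of spikes.

```python
def integrate_and_fire(inputs, threshold=1.0, leak=0.9):
    """Return a list of 0/1 spikes for a sequence of input values.

    threshold and leak are illustrative values (assumptions), not taken
    from the discussion.
    """
    v = 0.0  # membrane potential
    spikes = []
    for x in inputs:
        v = leak * v + x  # leaky integration of the input
        if v >= threshold:
            spikes.append(1)  # the neuron deems the input worthy of a spike
            v = 0.0  # reset after spiking
        else:
            spikes.append(0)
    return spikes


inputs = [0.3, 0.3, 0.3, 0.3, 0.0, 0.9, 0.2]
spikes = integrate_and_fire(inputs)
print(spikes)
```

The exact input values cannot be recovered from the spike train, which is the sense in which this is a reduction of information.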
There were many points of view presented; too many to note here in real time.
Then Friedemann took over and continued the discussion about the nature of neural computation. He raised the question of what the AI and DNN community has brought in to enable its impressive results. Starting from his list, the audience contributed the following things that AI has effectively exploited:
- A big variety of clever architectures like CNNs, ResNets, soft WTA layers, RNNs, transformers, attention and multi-headed attention, some of which have proven very effective but seem to have only a vague connection with known neuroscience.
- Clever hand-crafting of loss functions and objectives like cross entropy
- Lots of data, labeled and unlabeled
- Reinforcement learning that can discover optimal loss functions and control policies from trial and error with reward
- Self supervision to predict the future or context in unlabeled data
- Training/optimization algorithms like backprop and Adam that can solve the credit assignment problem for changing weights and help escape local minima and flat regions, much as gated units like LSTMs and GRUs help RNNs that are trained with BPTT.
- Activation functions like ReLU and leaky ReLU, and custom neural units, e.g. as in RNNs or max pooling units.
- Transfer learning, in particular pretraining DNNs as a way to make up for limited real data
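Two of the ingredients on this list, activation functions and the cross-entropy objective, are small enough to write out. The following is a rough sketch in plain Python, not any particular framework's API; the function names and the 0.01 slope for the leaky ReLU are assumptions chosen for illustration.

```python
import math


def relu(x):
    # Pass positive inputs through, clamp negative inputs to zero.
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    # Like ReLU, but negative inputs keep a small slope (0.01 is an assumed default).
    return x if x > 0 else slope * x


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    # Negative log-probability that the softmax assigns to the correct class.
    probs = softmax(logits)
    return -math.log(probs[target_index])


# A confident correct prediction gives a lower loss than a confident wrong one.
print(cross_entropy([5.0, 0.0], 0))
print(cross_entropy([0.0, 5.0], 0))
```

In practice these come from a deep learning library rather than being hand-written; the sketch only shows what is being computed.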