Girl Meets Whiskey

A person of the non-male persuasion.

Tag: artificial intelligence

Neural Networks + Artificial Neural Networks

Your visual system does not work like a camera. If you don’t believe me, think about what happens to an image if, while taking it, you shake the camera from side to side. The image is fuzzy, right? Now shake your head from side to side. The world doesn’t get blurry, does it? This is because the brain does more than just take snapshots of what hits your retina. The instant light hits your retina, the image is deconstructed into, essentially, pixels. In visual processing, these “pixels” are interpreted and eventually reintegrated into recognizable percepts.

Photoreceptors in the retina translate the stimulus into neural signals. The initial deconstruction of the stimulus breaks the image up based on the brightness and wavelengths of light that fall on the retina. The extensive processing of visual information performed even just within the retina constitutes an elaborate convergence of information. We have about 260 million photoreceptors, but there are only 2 million ganglion cells, which are the cells that send information out from the retina, via the optic nerve, and on to the central nervous system. At this very early stage of visual processing, the image is being parsed, and the compression of this information suggests that higher-level visual centers must be sophisticated and efficient processors in order to recover visual details.

Signals from the retina then travel along the optic nerve to the optic chiasm, where they are split up yet again. Signals carrying information about the right visual field are routed to the left hemisphere, and signals carrying information about the left visual field are routed to the right hemisphere.1 The signals are now “in the brain”. Each optic nerve divides into pathways that terminate at different places in the subcortex, but 90% of the fibers go to the lateral geniculate nucleus (LGN).2 From the LGN, nearly all of the optic fibers terminate in the primary visual cortex.

Even by the time visual information reaches the primary visual cortex, it has been processed by at least four distinct neurons. As you recall, this means that a whole lot of information has been compressed, because each neuron can only send an all-or-none signal: essentially a 0 or a 1. How, then, does the visual cortex reintegrate all of this parsed, compressed information into recognizable percepts?

From the primary visual cortex, information is sent to distinct regions throughout the larger visual cortex that carry out specialized processing functions. For example, area V4 processes color information, while area V5 processes motion information. Generally, as information moves from the primary visual cortex to these more specialized visual processing areas—moving from the very back of the brain toward the front—the processing becomes more and more sophisticated.

To get the big picture, information that hits the retina is parsed and compressed as it is projected to the primary visual cortex at the very back of the brain. From there it is projected forward through the visual cortex and reintegrated into recognizable percepts. How the information is reintegrated isn’t fully understood. One possibility is that different processing tasks are delegated to different visual areas, and each visual area provides its own limited analysis based on the attributes it’s in charge of processing. As information progresses through the visual system, different areas elaborate on the initial information projected onto the primary visual cortex and begin to integrate that information across dimensions.

This (somewhat) detailed description of visual processing will hopefully help you better understand how neural networks in machine learning work. Machine learning programs, just like the brain, have to process a lot of information. To make them more efficient, computer scientists have started modeling machine learning programs after the brain. It’s not a crazy idea; the brain is the most efficient information processing system we know of!

Say you want to write a program to recognize hand-written digits. The computer receives as one input, for example, an image of the digits 0 – 9 written by me on a piece of paper, and, as another input, an image of the digits 0 – 9 written by you on a piece of paper. Now all the computer “sees” when it “looks at” these images are huge matrices of pixel values.
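To make that concrete, here is a toy sketch (the pixel values are made up for illustration, not taken from an actual scan of anyone's handwriting) of how a tiny grayscale crop of a hand-written "1" might look to the computer: just a matrix of brightness values.

```python
import numpy as np

# A hypothetical 5x5 grayscale crop of a hand-written "1".
# 0 = dark background; values near 255 = bright ink (assumed encoding).
image = np.array([
    [0,   0, 200,   0, 0],
    [0, 180, 210,   0, 0],
    [0,   0, 215,   0, 0],
    [0,   0, 220,   0, 0],
    [0,   0, 205,   0, 0],
])

# The computer "sees" only this grid of numbers, nothing more.
print(image.shape)
```

A real scanned page would of course be far larger than 5×5, but the principle is the same: every image is ultimately a matrix of numbers.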

Each value in such a matrix corresponds to the brightness of a single pixel of the input image. How does the computer interpret all of those numbers? How does it find a pattern in them such that it “knows” that it’s looking at a “0” or a “1” or a “2” etc.? To handle this and other machine learning problems, computer scientists build programs called artificial neural networks (ANNs) that parse, compress, and interpret data analogously to how the brain parses, compresses, and interprets information. ANNs are inspired by biological neural networks, and they consist of an interconnected group of artificial neurons that find and represent or model complex relationships and patterns in data.

Artificial neurons are, no surprise here, based on neurons. The “cell bodies” in these artificial neuron models are called units, and each unit has a function to be computed. Activation of these units (“the neurons firing”) just means that the unit’s computation is performed. These units are organized into layers based on where they are receiving information. The first layer is called the input layer because these units are receiving the raw data as input. The last layer usually comprises just a single unit, and it is called the output layer.3 The output unit is the unit that computes the final value, whatever that might be. (In our example above, it would be a “0” or a “1” or a “2” etc.)

There are also hidden layers, which is misleading terminology. Hidden layers are just any and all layers that are neither the input nor the output layer. All hidden layers receive input either from the input layer, or from other hidden layers.
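Here is a minimal sketch of what a single unit might compute. The weighted sum plus a step activation is one common, simple choice; the particular weights and bias below are illustrative assumptions, not anything canonical.

```python
import numpy as np

def unit(inputs, weights, bias):
    """One artificial neuron: take a weighted sum of the inputs,
    then apply a step activation that "fires" (outputs 1) only if
    the weighted sum plus the bias is positive."""
    return 1 if np.dot(inputs, weights) + bias > 0 else 0

# Hypothetical weights: this unit fires when the middle input is bright.
out = unit(np.array([0.1, 0.9, 0.2]), np.array([-1.0, 2.0, -1.0]), -0.5)
print(out)  # 1
```

Many inputs come in, a single value goes out, just like a neuron summing its synaptic inputs and either firing or not.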

Like in the visual system, information is propagated forward in artificial neural networks. Each unit computes input from the units in the previous layer and outputs a single value. The raw data values are computed by the first layer units, each of which sends a discrete value on to the units of the second layer (or the output layer if the network is simple). Then the second layer units compute input → output, then the third, and fourth, and fifth—however many layers the network has—until the output layer computes the final value.
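The layer-by-layer propagation described above can be sketched as a minimal forward pass. The network size, weights, and step activation here are made-up assumptions chosen so the example is small enough to trace by hand.

```python
import numpy as np

def step(x):
    # All-or-none activation: each unit outputs 0 or 1.
    return (x > 0).astype(int)

def forward(x, layers):
    """Propagate an input vector through a list of (weights, biases)
    layers. Each layer computes the outputs of the previous layer
    and passes a new vector of 0s and 1s forward."""
    for W, b in layers:
        x = step(W @ x + b)
    return x

# A toy 3-2-1 network with hand-picked (hypothetical) weights.
layers = [
    (np.array([[1.0, -1.0, 0.5], [0.5, 0.5, -1.0]]), np.array([-0.2, -0.1])),  # hidden layer
    (np.array([[1.0, 1.0]]), np.array([-0.5])),                                # output layer
]
print(forward(np.array([1.0, 0.0, 1.0]), layers))
```

Each pass through the loop is one layer doing its job and handing its result forward, until the output layer computes the final value.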

Hopefully the similarity between neurons and units is becoming clearer to you. Neurons receive and process a lot of signals from other cells and output a single signal; units receive and compute a lot of values from other units and output a single value. This is not a coincidence; units are modeled after neurons!

What about bona fide neural networks and artificial neural networks as whole networks? Without getting into the specific computations performed by each unit, it’s difficult to understand what exactly the units are doing, and thus difficult to grasp the intuition of ANNs on a larger scale. Essentially, the units in each layer compute the input and, at least in all of the (admittedly few) ANNs that I’ve programmed, output either 0 or 1, meaning that the information they were programmed to detect was either present in the data they received, or it wasn’t. Analogously, certain cells in the visual system will fire if the stimulus is moving but not if it isn’t moving, for example. To put it concretely, such a cell would output a 1 if the stimulus is moving, and a 0 if the stimulus isn’t moving.

Computation tasks are delegated to different layers of an artificial neural network analogously to the delegation of different visual processing tasks to different visual areas, and each layer/area provides its own limited analysis based on the information it is designated to compute/process. As data moves through an ANN, each unit in each layer computes the outputs from the units in the antecedent layer, until the output layer computes the final value.

I understand that this is vague, and I hope that you will extend me a bit of faith. However, the correspondence between the visual system and an artificial neural network should be clear enough that we can switch back and forth between the two in understanding information processing in both. It is sometimes useful to use ANN terminology to refer to visual processes, and it is sometimes useful to use visual system terminology to refer to ANN functions. If nothing else, that we can switch back and forth in this way should alert you to the fact that these two information processors are analogous in very real ways. Again, this is not a coincidence!

———

1 This does not mean that signals carrying information from the right eye are routed to the left hemisphere or that information from the left eye is routed to the right hemisphere, which is a common misconception. Information that falls on the outside of each eye stays on that same side, while information that falls on the inside of each eye gets crossed over to the other side.

2 The other 10% get sent to either the pulvinar nucleus or the superior colliculus. These subcortical structures play a major role in visual attention. Though most visual information gets routed to the LGN, keep in mind that 10% of the human optic nerve contains more fibers than are found in the entire human auditory pathway.

3 In multi-class classification problems, the output layer will have more than one output unit.

why a.m. turing has my brain in a knot

minds + machines began, as all great discussions of artificial intelligence do, with alan turing. turing is the author of the turing test, a sort of game to test for thought in a machine. the test consists of a machine participant, a human participant, and a human judge. the machine is considered intelligent if and only if the judge cannot distinguish between the machine and the human. but after debating the effectiveness and even plausibility of the turing test for a few days, we moved on to questions about the hyper turing test. in the hyper turing test, the machine must act as the judge and distinguish between a human and a machine. (ignore, for the time being, that if a machine passed the hyper turing test, it would thus fail the turing test.)

the problem with replacing the turing test with a behavioral task (such as distinguishing people from machines) is that the result of any behavioral test would neither prove nor disprove the function of thought. furthermore, the conditions of the original turing test must be fundamentally altered for the hyper turing test to account for the machine judge.

say a machine is placed as the judge in this hyper turing test. how should the machine judge be programmed to discriminate between human responses and machine responses? it could be programmed to recognize key human-like factors —  e.g. typing “teh” instead of “the” is a common human typing error — which shall be referred to as flags. it could be designed to recognize and record flags for each participant’s responses. (that being said, the machine participant could be programmed to generate humanistic misspellings of this kind, so the flag approach is by no means foolproof.)
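a minimal sketch of the flag approach, assuming a hypothetical list of human-typo patterns (the patterns and function name are my own illustration, not a real detector):

```python
import re

# hypothetical "flag" patterns a machine judge might scan for:
# common human typing errors like "teh" for "the".
FLAGS = [r"\bteh\b", r"\brecieve\b", r"\bdefinately\b"]

def count_flags(response):
    """count how many human-like error patterns appear in a response."""
    return sum(len(re.findall(p, response.lower())) for p in FLAGS)

print(count_flags("i definately recieve teh message"))
```

as noted above, this is by no means foolproof: a machine participant could deliberately generate exactly these misspellings.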

the machine judge could also be programmed to simply guess at random, with a 50/50 chance of choosing correctly. if a machine judge were to generate a random choice, and if the machine judge happened to be right, it would be considered intelligent. obviously this is not a sufficient condition of thought, for one would never claim that closing one’s eyes and pointing at an answer is a sufficient condition of thought in a human being. but how would we know if the machine had generated a random response without the confession of the programmer? this difficulty will be discussed in further detail in the following paragraph.

consider that in the original turing test, a machine was considered intelligent if and only if a human judge could not tell the difference between the machine and a human. you can see right away that the condition for the hyper turing test must be altered, otherwise the machine judge in the hyper turing test could be programmed to generate question after question until simply saying: “i give up; i cannot tell which is the machine and which is the human.” (this could be generated as a randomly timed response, as discussed in the previous paragraph.) how, then, to frame the hyper turing test to prevent this from happening?

one could say that the machine judge must decide eventually. to this argument i pose three counterarguments: 1) how to define “eventually”; 2) this does not prevent the machine judge from generating a random decision; and 3) if we are testing to see if the machine can think like a human, we cannot on principle place parameters around the machine judge that we would not place around a human judge in the original turing test. it becomes clear, very quickly, that altering the test is easier said than done.

i do not know how to alter the hyper turing test to respect the machine judge’s machine-hood while also holding it to human judge requirements (of which one is intelligence, but let’s not go there). one option is to require the machine judge to generate a response citing its reasons for choosing or not choosing. while this is not required of a human judge in the turing test, i believe any human judge would be willing and able to cite his or her reasons. the machine judge could build up a store of information that could, quite easily, be cited in the reason report – similarly to how a spell-checker works. by recording flags, the machine could make a decision based on the number of flags per participant. so, if the machine judge has recorded 21 flags for A, and only 9 flags for B, it could state: “A is the human, because they did x, y, and z.” the machine would generate a decision once the difference between the number of flags per participant reaches a value set by the programmer. for example, the machine judge could be designed to make a decision once the difference between the number of flags for A and the number of flags for B had reached 10.
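the flag-difference decision rule described above could be sketched like this (the function name and return strings are my own; generating the full reason report is omitted):

```python
def judge(flags_a, flags_b, threshold=10):
    """decide once the gap between flag counts reaches the
    programmer-set threshold (10 in the example above);
    otherwise keep questioning."""
    if flags_a - flags_b >= threshold:
        return "A is the human"
    if flags_b - flags_a >= threshold:
        return "B is the human"
    return "undecided; keep questioning"

print(judge(21, 9))  # gap of 12 clears the threshold of 10
```

with 21 flags for A and only 9 for B, as in the example above, the gap of 12 clears the threshold and the judge names A as the human.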

if, after a given amount of time — say half an hour — the flags for each participant have remained within an acceptable distance of one another, the machine judge will generate a non-decision report citing corresponding flags from both participants. “i cannot decide which is the human and which is the machine. both have answered similarly throughout. to question 3, A responded with …, while B responded with … to question 18.” the attractiveness of the response approach lies in the fact that even if the machine judge makes an incorrect decision, it still generates a response report that proves that it has thought about its decision.

however, any machine that could satisfactorily generate such a response would possess partial working consciousness, which automatically means that the machine judge is intelligent and very much a thinking machine. this exceeds the bounds of the hyper turing test — and the turing test for that matter — and is thus disqualified as a fair test of thought.

and so alan turing has my brain in a knot. i’m not even sure i’ve ironed out all my issues with the original turing test, and here i am trying to wrap my head around the hyper turing test. i still maintain that steven levy’s socialbot should be the new standard for testing for artificial intelligence, but that’s just me trying to get out of doing homework.