October 24, 2011
Representation has always been a kind of magic.
Something can be present, together with all its associated power, without being physically there. Magic indeed, and involving much more than that.
Literally, if we take the early Latin roots as a measure, it means to present something again, to place something once more, or in an emphasized manner, before something else or somebody, usually by means of a placeholder, the so-called representative. It is not surprising, then, that it is closely related to simulacrum, which stands for “likeness, image, form, representation, portrait.”
Bringing the notion of the simulacrum to the table is dangerous, since it refers not only to one of the oldest philosophical debates but also to a central one: What do we see when we look at the world? How can we trust the images produced by our senses, imaginations and apprehensions? Consider only Plato’s famous answer, which we will not even cite here because it would distract us, and you can feel the philosophical vortices, if not twisters, stirred up by philosophical image theory.
It is impossible to deal here with the issues raised by the concepts of representation and simulacrum in any more general sense; we have to focus on our main subject: the possibility of machine-based epistemology and its conditions.
The idea behind machine-based epistemology is to provide a framework for talking about the power of (abstract and concrete) machines to know, and to know about the conditions of that knowing (see the respective chapter for more details). By “machine” we do not mean a living being here, at least not a priori; a machine is something produced. Let us call its producer, in a simplified manner, the “programmer.” In stark contrast, the morphological principles of living organisms are the result of a very long and contingent history spanning an almost unimaginable 3.6 billion years. Many of their properties, as well as their generalizations, are historical necessities, and all properties of all living beings together constitute a miraculous co-evolutionary fabric of dynamic relations. In the case of the machine there are only few historical necessities, for good and for bad. The programmer has to define the necessities: the modality of the senses, the chain of classifications, the kind of materiality, and so on. Among all these decisions, one class is predominantly important:
How to represent external entities?
Quite naturally, as “engineers” of cognitive machines we cannot really evade the old debate about what is in our brains and minds, and what is going on there while we are thinking, or even just recognizing a triangle as a triangle. Our programmer could take a practical stance on this question and reformulate it: How can she or he ensure that the program will recognize any triangle?
The program needs to be able to distinguish a triangle from any other figure, even if it has never been confronted with an “ideal” template or prototype. It also needs to identify quite inaccurate triangles, e.g. hand-drawn ones, as triangles. It should even be able to identify virtual figures that exist only in their negativity, like the Kanizsa triangle. For years, computer scientists proposed logical propositions and shape grammars as a solution, and failed completely. Today, of course, machine learning in all its facets is popular. This choice alone, however, is not yet the solution.
The new questions then were (and still are): What should be presented to the learning procedure? How should the learning procedure be organized?
Here we have to guard against a threatening misunderstanding, or rather two misunderstandings, approaching the concept of “data” from opposite directions. Data are, of course, not “just there.” One needs a measurement device, which in turn is based on a theory, and then on a particular way of deriving models and devices from that theory. In other words, data depend on culture. So far we agree with Putnam. Nevertheless, given the body of a cognitive entity, that entity, whether human, animal or machine, finds itself placed (“gestellt”) into a particular actuality of measurement in every single situation. The theory about the data is a priori, yet within the particular situation the entity finds “raw data.” Both theory and data impose severe constraints on what can be perceived by, or even known to, the cognitive entity. Given the data, the cognitive entity will try to construct diagnostic or predictive models, including schemes of interpretation, theories, etc. The important question, then, concerns the relationship between the a priori conditions of the cognitive entity and the knowledge it may derive.
On the other hand, we can defend ourselves against the second misunderstanding. Data may be conceived as (situational) “givens,” as the Latin root of the word suggests. Yet this givenness is not absolute. Somewhat more appropriately, we may conceive of data as intermediate results of transformations. This turns any given method into a kind of abstract measurement device. We usually reserve the label “data” for those bits whose conditions of generation we cannot influence.
Consider, for instance, a text. For the computer, a text is just a non-random series of graphemes. We as humans can identify a grammar in human languages. For many years, if not decades, people thought that computers would understand language as soon as grammar had been implemented. The research by Chomsky, Jackendoff and Pinker, among others, is widely recognized today and has resulted in concepts such as phrase structure grammar, X-bar syntax and head-driven syntax. Yet large research projects with hundreds of researchers (e.g. “Verbmobil”) not only missed their self-chosen goals; they failed completely on the path towards implementing an understanding of language. Even today there is no useful parser available for most languages; the best parser for German achieves around 85-89% accuracy, which is disastrous for real applications.
Another approach is to bring in probabilistic theories. Particularly n-grams and Markov models have been favored. While the former is an incredibly stupid idea for the representation of a text, Markov models are more successful. It can be shown that they are closely related to Bayesian belief networks and thus also to artificial neural networks, though the latter employ a completely different mechanism. Yet from the very mechanism, and from the representation that is created as or by the Markov model, it is more than obvious that there is no such thing as language understanding in it.
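To make concrete what kind of “representation” a Markov model builds, here is a minimal sketch of a first-order (bigram) Markov model, assuming whitespace tokenization and plain relative-frequency estimation without smoothing. The function name `bigram_model` and the toy sentence are illustrative only. Everything such a model “knows” about a text is a table of conditional frequencies P(next word | current word).

```python
from collections import Counter, defaultdict

def bigram_model(tokens):
    """Estimate first-order Markov transition probabilities P(next | current)
    from a token sequence by relative-frequency counting (no smoothing)."""
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {word: n / total for word, n in followers.items()}
    return model

text = "the cat sat on the mat the cat ran".split()
model = bigram_model(text)
print(model["the"])
```

For the toy sentence, `model["the"]` assigns two thirds of the probability mass to “cat” and one third to “mat”: a faithful record of co-occurrence, and nothing beyond it.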
Quite obviously, language as text cannot be represented as a grammar plus a dictionary of words. Doing so, one would fall prey to the “representational fallacy,” which has not only been criticized recently by Dreyfus; it is also a matter of fact that representationalist approaches in machine learning failed completely. Representational cognitivism claims that we have distinct, image-like engrams in our brain when we experience what we call thinking. Its proponents should have read Wittgenstein (e.g. On Certainty) before starting expensive research programs: our experience of our own basic mental affairs is no more directly accessible than anything else we think or talk about. A major summary of many objections against the representationalist stance in theories of the mind, as well as a substantial contribution in its own right, is Rosenfield’s “The Invention of Memory.” Rosenfield argues strongly against the concept of “memory as storage,” in the same vein as Edelman, and we fully agree.
Nor does it help much to resort to “simple” mathematical or statistical models, i.e. models effectively based on an analytical function, as opposed to models based on a complex system. Conceiving of language as a mere “random process” of whatever kind simply does not work, be it those silly n-grams or sophisticated Hidden Markov Models. There are open-source packages on the web you can use to try it yourself.
But what, then, “is” a text, and how does a text unfold its effects? Which aspects should be presented to the learning procedure, the “pattern detection engine,” so that the regularities can be appropriately extracted and a re-presentation can be built? Taking semiotics into account, we may add links between words. Yet this involves semantics. Peter Janich has argued convincingly that the separation of syntax and semantics should be conceived of as just another positivist/cyberneticist myth. And on which “level” should links be regarded as significant signals? If there are such links, any text immediately turns into a high-dimensional, non-trivial and, above all, dynamic network…
An interesting idea has been proposed by the research group around Teuvo Kohonen. They invented a procedure they call WebSom. You can find material about it on the web; otherwise, we will discuss it in great detail in our sections devoted to the SOM. There are two key elements of this approach:
- (1) It is a procedure which inherently abstracts from the text.
- (2) The text is not conceived, and (re-)presented, as “words,” i.e. distinct lexicographical primitives; instead, words are mapped into the learning procedure as a weighted probabilistic function of their neighborhood.
Particularly seminal is the second key property, the probabilization into overlapping neighborhoods. While we usually think that words are crisp entities arranged into a structured series, where the structure follows a grammar or is identical with it, this view is not necessarily appropriate, not even for our own brain. The “atom” of human language is most likely not the word. To this day, most (if not all) people engaged in computational linguistics think that the word, or some very close abstraction of it plus some accidentia, forms the basic entity, the indivisible unit of language.
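The idea of presenting a word through its neighborhood rather than as a lexical primitive can be sketched as follows. This is a loose, simplified rendering of the averaged-context encoding associated with Kohonen's word category maps, not the WebSom implementation itself. Each word receives a fixed random code vector that carries no meaning; the vector actually presented to the learning procedure is built from the average code of its left neighbours, its own down-weighted code, and the average code of its right neighbours. The window of one word on each side, the dimension `dim`, the weight `own_weight` and all function names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

def random_codes(vocab, dim, seed=0):
    """Give each word a fixed random code vector; the codes themselves mean nothing."""
    rng = random.Random(seed)
    return {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in sorted(vocab)}

def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def context_vectors(tokens, dim=8, own_weight=0.2, seed=0):
    """Represent each word by [mean left-neighbour code, weighted own code,
    mean right-neighbour code], averaged over all occurrences in `tokens`."""
    codes = random_codes(set(tokens), dim, seed)
    left, right = defaultdict(list), defaultdict(list)
    for i, w in enumerate(tokens):
        if i > 0:
            left[w].append(codes[tokens[i - 1]])
        if i < len(tokens) - 1:
            right[w].append(codes[tokens[i + 1]])
    return {
        w: mean_vector(left[w], dim)
        + [own_weight * x for x in codes[w]]
        + mean_vector(right[w], dim)
        for w in codes
    }

tokens = "the cat sat . the dog sat .".split()
vecs = context_vectors(tokens)
print(vecs["cat"][:3], vecs["dog"][:3])
```

Words that occur in the same neighbourhoods, such as “cat” and “dog” in the toy text, obtain identical context parts although their own codes differ. Their similarity arises from usage, not from a built-in dictionary.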
We propose that this attitude is thoroughly infected by some sort of pre-Socratic and romantic cosmology, geometry and cybernetics. We cannot even know which representation is the “best,” or even an appropriate one. Even worse, the appropriateness of presenting raw data (series of words) to the learning procedure through various pre-processors and preparation steps is not independent of the learning procedure itself. The problems of presentation and representation thus reach far into the field of modeling.
Although we cannot know in principle how to perform measurements in the most appropriate manner, we will, as a matter of fact, perform some form of measurement. Yet these initial “raw data” do not “represent” anything, not even the entity that is the subject of the measurement. Only a predictive model derived from those observations can represent an entity, and it does so only in a given context, largely determined by some purpose.
Whatever such an initial and multiple presentation of an entity will look like, it is crucial, in my opinion, to use a probabilized preparation of the basic input data. The components of such a preparation comprise not only the raw input data but also the experience of the whole engine acquired by learning, i.e. a kind of semantic influence. Further (potential) components for a particular small section of a text, say a few words, are any kind of property of the embedding text, of any extent: not only words as lexemes, but also words as learned entities and as structural elements, then sentences and their structural (syntactical) properties, semantic or speech-pragmatic markers, and so on, and of course also a list of properties as Putnam proposed already in 1979 in “The Meaning of ‘Meaning’.”
Taken together, we can state that the inputs to the association engine are probability distributions over arbitrarily chosen “basic” properties. As we will see in the chapter on modeling, these properties are not to be confused with objective facts to be found in the external world; there we will also see how these insights can be operationalized in an implementation. In order to enable a machine to learn how to use words as items of a language, we should not present words to it in their propositional form. Any entity has to be measured as an entity drawn from a random distribution and represented as a multi-dimensional probability distribution. In other words, we deny the possibility of transmitting any particular representation into the machine (or into another mind, for that matter). A particular manifold of representations has to be built up by the cognitive entity itself, in direct response to the requirements of the environment, which is to be conceived simply as the embedding for “situations.” In the modeling chapter we will argue that this linkage to requirements does not result in behavioristic associativism, the simple linkage between stimulus and response according to the framework proposed by Watson and Pavlov. Target-oriented modeling in the multi-dimensional case necessarily leads to a manifold of representations. Not only the input but also the output of learning is appropriately described by probability distributions.
And where is the representation of the learned subject? What does it look like? This question is almost meaningless, since it would require separating input, output, processing, etc.; it would deny the inherent manifoldness of modeling. In short, it is a deeply reductionist question. The learning entity is able to behave, react, anticipate and measure; hence only the whole entity is the representation.
The second important anatomical property of an entity able to acquire the capability of understanding texts is inherent abstraction. Above all, we should definitely not follow the flat-world approach of positivist ideology. Note that the programmer should not only refrain from building a dictionary into the machine; he should also not predetermine the kind of abstraction the engine develops. This necessarily involves internal differentiation, which is another word for growth.
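The claim that the inputs to the association engine are probability distributions over “basic” properties can be illustrated with a minimal self-organizing map (SOM), the kind of engine Kohonen's group builds on. The sketch is an assumption-laden toy, not the engine discussed in this text: a one-dimensional map of `n_units` units, a Gaussian neighbourhood, linearly decaying learning rate and radius, and made-up input vectors that are distributions over four hypothetical properties. No dictionary or category is given to the map; it only arranges its units in response to the presented distributions.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda j: sum((wk - xk) ** 2 for wk, xk in zip(weights[j], x)))

def train_som(samples, n_units=10, epochs=200, lr0=0.5, radius0=3.0, seed=0):
    """Train a 1-D self-organizing map. Each presented sample pulls its
    best-matching unit, and more weakly that unit's grid neighbours, towards it."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps                  # training progress in [0, 1)
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks, floor 0.5
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid of unit indices
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights

# Hypothetical inputs: distributions over four "basic" properties (each sums to 1).
samples = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.0, 0.3, 0.6],
]
som = train_som(samples)
print([best_matching_unit(som, x) for x in samples])
```

Inputs from the same cluster typically land on the same or adjacent units, and the two clusters on different regions of the map. Which unit responds to which input is learned rather than specified by the programmer; map size and decay schedules are arbitrary choices here.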
- Noam Chomsky (to be completed…)
- Jackendoff
- Steven Pinker 1994?
- Hubert L. Dreyfus, How Representational Cognitivism Failed and Is Being Replaced by Body/World Coupling. pp. 39-74, in: Karl Leidlmair (ed.), After Cognitivism: A Reassessment of Cognitive Science and Philosophy. Springer, 2009.
- Peter Janich. 2005.
- Israel Rosenfield, The Invention of Memory: A New View of the Brain. New York, 1988.
- WebSom
- Hilary Putnam, The Meaning of “Meaning”. 1979.