March 20, 2012
In this second part of the essay about a fresh perspective on analogical thinking (more precisely: on models of it) we will try to bring together two concepts that at first sight represent quite different approaches: Copycat and SOM.
Why engage in such an endeavor? Firstly, we are quite convinced that FARG’s Copycat demonstrates an important and outstanding architecture. It provides a well-founded proposal about the way we humans apply ideas and abstract concepts to real situations. Secondly, however, it is also clear that Copycat suffers from a few serious flaws in its architecture, particularly the built-in idealism. This renders any adaptation to more realistic domains, or even to completely domain-independent conditions, very difficult, if not impossible, since this drawback also prohibits structural learning. So far, Copycat is only able to adapt some predefined internal parameters. In other words, the Copycat mechanism just adapts a predefined structure, though a quite abstract one, to a given empiric situation.
Well, basically there seem to be two different, “opposite” strategies to merge these approaches. Either we integrate the SOM into Copycat, or we try to transfer the relevant, yet to be identified parts from Copycat to a SOM-based environment. Yet, at the end of the day we will see that, and how, the two alternatives converge.
In order to accomplish our goal of establishing a fruitful combination between SOM and Copycat we have to take mainly three steps. First, we briefly recapitulate the basic elements of Copycat and the proper instance of a SOM-based system. Second, we will describe the extended SOM system in some detail, although there will be a dedicated chapter on it. Finally, we have to transfer and presumably adapt those elements of the Copycat approach that are missing in the SOM paradigm.
The particular power of (natural) evolutionary processes derives from the fact that they are based on symbols. “Adaptation” or “optimization” are not processes that change just the numerical values of parameters of formulas. Quite the opposite: in adaptational processes that span generations, parts of the DNA-based story are rewritten, with potential consequences for the whole of the story. This effect of recombination in the symbolic space is particularly present in the so-called “crossing over” during the production of gamete cells in the context of sexual reproduction in eukaryotes. Crossing over is a “technique” to dramatically speed up the exploration of the space of potential changes. (In some way, this space is also greatly enlarged by symbolic recombination.)
What we will try here in our attempt to merge the two concepts of Copycat and SOM is exactly this: a symbolic recombination. The difference from its natural template is that in our case we do not transfer DNA-snippets between homologous locations in chromosomes; we transfer whole “genes,” which are represented by elements.
Elementarizations I: C.o.p.y.c.a.t.
In part 1 we identified two top-level (non-atomic) elements of Copycat:
- (1) restricted generalized evolution, and
- (2) concrete instances of domain specific idealizations.
Since the first element, covering evolutionary aspects such as randomness, population and a particular memory dynamics, is pretty clear and a whole range of possible ways to implement it is available, any attempt to improve the Copycat approach has to target the static, strongly idealistic characteristics of the structure that FARG calls the “Slipnet”. The Slipnet has to be enabled for structural changes and autonomous adaptation of its parameters. This could be accomplished in many ways, e.g. by representing the items in the Slipnet as primitive artificial genes. Yet, we will take a different road here, since the SOM paradigm already provides the means to achieve idealizations.
At that point we have to elementarize Copycat’s Slipnet in a way that renders it compatible with the SOM principles. Hofstadter emphasizes the following properties of the Slipnet and the items contained therein (pp.212).
- (1) Conceptual depth allows for a dynamic and continuous scaling of “abstractness” and resistance against “slipping” to another concept;
- (2) Nodes and links between nodes both represent active abstract properties;
- (3) Nodes acquire, spread and lose activation, which has a switch-on threshold < 1;
- (4) The length of links represents conceptual proximity or degree of association between the nodes.
As a whole, and viewed from the network perspective, the Slipnet behaves much like a spring system, or a network built from rubber bands, where the springs or the rubber bands are regulated in their strength. Note that our concept of SomFluid also exhibits the feature of local regulation of the bonds between nodes, a property that is not present in the idealized standard SOM paradigm.
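The four properties listed above can be made concrete in a few lines. The following is only a toy sketch under stated assumptions, not FARG's implementation: the class `TinySlipnet`, the threshold value 0.5, the decay rule (a node retains the fraction `depth` of its activation per step) and the spreading rule (a node passes on `activation * (1 - length)`) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    depth: float             # conceptual depth in [0, 1]; deeper nodes decay more slowly
    activation: float = 0.0  # current activation in [0, 1]


class TinySlipnet:
    """Hypothetical toy network; names and update rules are assumptions."""

    THRESHOLD = 0.5  # assumed switch-on threshold (< 1)

    def __init__(self):
        self.nodes = {}
        self.links = {}  # (source, target) -> link length in [0, 1]; shorter = closer

    def add_node(self, name, depth):
        self.nodes[name] = Node(name, depth)

    def add_link(self, a, b, length):
        # Links are stored in both directions with the same length.
        self.links[(a, b)] = length
        self.links[(b, a)] = length

    def step(self):
        # Only nodes at or above the threshold spread activation;
        # short links transmit more than long ones.
        incoming = {name: 0.0 for name in self.nodes}
        for (src, dst), length in self.links.items():
            act = self.nodes[src].activation
            if act >= self.THRESHOLD:
                incoming[dst] += act * (1.0 - length)
        # Each node keeps the fraction `depth` of its own activation
        # and adds what it received, capped at 1.
        for node in self.nodes.values():
            retained = node.activation * node.depth
            node.activation = min(1.0, retained + incoming[node.name])


net = TinySlipnet()
net.add_node("left", depth=0.4)
net.add_node("right", depth=0.4)
net.add_node("opposite", depth=0.9)
net.add_link("left", "right", length=0.2)
net.add_link("left", "opposite", length=0.6)
net.nodes["left"].activation = 1.0
net.step()
```

In this sketch the link lengths are fixed numbers, which is exactly the static character discussed in this section; making them adaptive would require a learning mechanism that the toy does not contain.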
Yet, the most interesting properties in the list above are (1) and (2), while (3) and (4) are known in the classic SOM paradigm as well. The first item is great because it represents an elegant instance of creating the possibility for measurability that goes far beyond the nominal scale. As a consequence, “abstractness” ceases to be a nominal all-or-none property, as it is present in hierarchies of abstraction. Such hierarchies now can be recognized as mere projections or selections, both introducing a severe limitation of expressibility. The conceptual depth opens a new space.
The second item is also very interesting since it blurs the distinction between items and their relations to some extent. That distinction is also a consequence of relying too readily on the nominal scale of description. It introduces a certain moment of self-reference, though this is not fully developed in the Slipnet. Nevertheless, a result of this move is that concepts can’t be thought without their embedding into a neighborhood of other concepts. Hofstadter clearly introduces a non-positivistic and non-idealistic notion here, as it establishes a non-totalizing meta-concept of wholeness.
Yet, the blurring between “concepts” and “relations” could be and must be driven far beyond the level Hofstadter achieved, if the Slipnet is to become extensible. Namely, all the parts and processes of the Slipnet need to follow the paradigm of probabilization, since this offers the only way to evade the demons of cybernetic idealism and control apriori. Hofstadter himself relies much on probabilization concerning the other two architectural parts of Copycat. It’s beyond me why he didn’t apply it to the Slipnet too.
Taken together, we may derive (or: impose) the following important elements for an abstract description of the Slipnet.
- (1) Smooth scaling of abstractness (“conceptual depth”);
- (2) Items and links of a network of sub-conceptual abstract properties are instances of the same category of “abstract property”;
- (3) Activation of abstract properties represents a non-linear flow of energy;
- (4) The distance between abstract properties represents their conceptual proximity.
A note should be added regarding the last (fourth) point. In Copycat, this proximity is a static number. In Hofstadter’s framework, it does not express something like similarity, since the abstract properties are not conceived as compounds. That is, the abstract properties are themselves on the nominal level. And indeed, it might appear rather difficult to conceive of concepts such as “right of”, “left of”, or “group” as compounds. Yet, I think that it is quite possible by referring to mathematical group theory, the theory of algebra and the framework of mathematical categories. All of those may be subsumed into the same operationalization: symmetry operations. Of course, there are different ways to conceive of symmetries and to implement the respective operationalizations. We will discuss this issue in a forthcoming essay that is part of the series “The Formal and the Creative”.
The next step is now to distill the elements of the SOM paradigm in a way that enables a common differential for the SOM and for Copycat.
Elementarizations II: S.O.M.
The self-organizing map is a structure that associates comparable items—usually records of values that represent observations—according to their similarity. Hence, it makes two strong and important assumptions.
- (1) The basic assumption of the SOM paradigm is that items can be rendered comparable;
- (2) The items are conceived as tokens that are created by repeated measurement.
The first assumption means that the structure of the items can be described (i) prior to their comparison and (ii) independently of the final result of the SOM process. Of course, this assumption is not unique to SOMs; any algorithmic approach to the treatment of data is committed to it. The particular status of the SOM is given by the fact that, in stark contrast to almost any other method for the treatment of data, this is the only strong assumption. All other parameters can be handled in a dynamic manner. In other words, there is no particular zone of the internal parametrization of a SOM that would be inaccessible apriori. Compare this with ANNs or statistical methods, and you feel the difference… Usually, methods are rather opaque with respect to their internal parameters. For instance, the similarity functional is usually not accessible, which turns all these nice-looking, so-called analytic methods into some kind of subjective gambling. In PCA and its relatives, for instance, the similarity is buried in the covariance matrix, which in turn is only defined within the assumption of normality of correlations. Unless a rank correlation is used, this assumption is extended even to the data itself. In both cases it is impossible to introduce a different notion of similarity. Moreover, and as a consequence of that, it is impossible to investigate how the results proposed by the method depend on its structural properties and (opaque) assumptions. In contrast to such unfavorable epistemo-mythical practices, the particular transparency of the SOM paradigm allows for critical structural learning of the SOM instances. “Critical” here means that the influence of the method’s internal parameters on the results or conclusions can be investigated, changed, and accordingly adapted.
The second assumption is implied by the SOM’s purpose as a learning mechanism. It simply needs some observations as results of the same type of measurement. The number of observations (the number of repeats) has to exceed a certain lower threshold, which, depending on the data and the purpose, is at least 8; typically, however, (much) more than 100 observations of the same kind are needed. Any result will be within the space delimited by the assignates (properties), and thus any result is a possibility (if we take just the SOM itself).
The particular accomplishment of a SOM process is the transition from the extensional to the intensional description, i.e. the SOM may be used as a tool to perform the step from tokens to types.
From this we may derive the following elements of the SOM:1
- (1) a multitude of items that can be described within a common structure, though not necessarily an identical one;
- (2) a dense network where the links between nodes are probabilistic relations;
- (3) a bottom-up mechanism which results in the transition from an extensional to an intensional level of description;
As a consequence of this structure the SOM process avoids the necessity to compare all items (N) to all other items (N-1). This property, together with the probabilistic neighborhoods, establishes the main difference from other clustering procedures.
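To illustrate why no item-to-item comparison is needed, here is a minimal sketch of a classic SOM training loop in plain Python. It is a generic textbook variant and not the extended system discussed in this essay; the function names, the linear decay of learning rate and radius, and the Gaussian neighborhood are assumptions made for the example. Each observation is compared only against the nodes of the lattice, and the winning node drags its map neighbors along.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols lattice; returns nested lists of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # random presentation order
            frac = t / total
            lr = lr0 * (1.0 - frac)                     # assumed linear decay
            radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighborhood
            br, bc = best_matching_unit(weights, x)     # item vs. nodes only
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * radius * radius))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            t += 1
    return weights


# Two well-separated groups of synthetic observations.
rng = random.Random(1)
cluster_a = [[rng.uniform(0.0, 0.1), rng.uniform(0.0, 0.1)] for _ in range(20)]
cluster_b = [[rng.uniform(0.9, 1.0), rng.uniform(0.9, 1.0)] for _ in range(20)]
som = train_som(cluster_a + cluster_b, rows=3, cols=3)
bmu_a = best_matching_unit(som, [0.05, 0.05])
bmu_b = best_matching_unit(som, [0.95, 0.95])
```

With N observations and M nodes, each pass costs on the order of N·M comparisons rather than N·(N-1), and the neighborhood function `h` is the place where the soft neighborhoods of the lattice enter.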
It is quite important to understand that the SOM mechanism as such is not a modeling procedure. Several extensions have to be added and properly integrated, such as
- operationalization of the target into a target variable;
- validation by separate samples;
- feature selection, preferably by an instance of a generalized evolutionary process (though not by a genetic algorithm);
- detecting strong functional and/or non-linear coupling between variables;
- description of the dependency of the results from internal parameters by means of data experiments.
Yet, as we explained in part 1 of this essay, analogy making is conceptually incompatible with any kind of modeling, as long as the target of the model points to some external entity. Thus, we have to choose a non-modeling instance of a SOM as the starting point. However, clustering is also an instance of those processes that provide the transition from extensions to intensions, whether this clustering is embedded into full modeling or not. In other words, neither the classic SOM nor the modeling SOM is suitable as a candidate for a merger with Copycat.
Fortunately, there is already a proposal, and even a well-known one, that indeed may be taken as such a candidate: the two-layer SOM (TL-SOM) as it has been demonstrated as essential part of the so-called WebSom [1,2].
Actually, the description as being “two layered” is a very minimalistic, if not inappropriate, description of what is going on in the WebSom. We already discussed many aspects of its architecture here and here.
Concerning our interests here, the multi-layered arrangement itself is not a significant feature. Any system doing complicated things needs a functional compartmentalization; we have met a multi-part, multi-compartment and multi-layered structure in the case of Copycat too. Moreover, the SOM mechanism itself remains perfectly identical across the layers.
The really interesting features of the approach realized in the TL-SOM are
- the preparation of the observations into probabilistic contexts;
- the utilization of the primary SOM as a measurement device (the actual trick).
The domain of application of the TL-SOM is the comparison and classification of texts. Texts belong to unstructured data, and the comparison of texts is exposed to the same problematics as the making of analogies: there is no apriori structure that could serve as a basis for modeling. Also, like the analogies investigated by FARG, the text is a locational phenomenon, i.e. it takes place in a space.
Let us briefly recapitulate the dynamics in a TL-SOM. In order to create a TL-SOM the text is first dissolved into overlapping, probabilistic contexts. Note that the locational arrangement is captured by these random contexts. No explicit apriori rules are necessary to separate patterns. The resulting collection of contexts then gets “somified”. Each node then contains similar random contexts that have been derived from various positions in different texts. Now the decisive step will be taken, which consists in turning the perspective by “90 degrees”: We can use the SOM as the basis for creating a histogram for each of the texts. The nodes are interpreted as properties of the texts, i.e. each node represents a bin of the histogram. The value of each bin measures how frequently the random contexts of the respective text fall into that node. The secondary SOM then creates a clustering across these histograms, which represent the texts in an abstract manner.
This way the primary lattice of the TL-SOM is used to impose a structure on the unstructured entity “text.”
Figure 1: A schematic representation of a two-layered SOM with built-in self-referential abstraction. The input for the secondary SOM (foreground) is derived as a collection of histograms that are defined as a density across the nodes of the primary SOM (background). The input for the primary SOM are random contexts.
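The “90 degrees” turn can be sketched as follows. This is a simplified illustration and not the WebSom code: words receive random code vectors (an assumption made here to obtain numeric contexts), a context is a window of three consecutive words, and the list `prototypes` merely stands in for the weight vectors of a trained primary SOM (here it is just a random sample of contexts). All function names are hypothetical.

```python
import random
from collections import Counter


def word_codes(vocab, dim=8, seed=0):
    """Assign each word a fixed random code vector (illustrative assumption)."""
    rng = random.Random(seed)
    return {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in sorted(vocab)}


def contexts(tokens, codes, width=3):
    """Overlapping windows of `width` words, encoded by concatenating word codes."""
    out = []
    for i in range(len(tokens) - width + 1):
        vec = []
        for w in tokens[i:i + width]:
            vec.extend(codes[w])
        out.append(vec)
    return out


def bmu_index(prototypes, x):
    """Index of the primary-lattice node closest to context vector x."""
    def sq_dist(j):
        return sum((p - v) ** 2 for p, v in zip(prototypes[j], x))
    return min(range(len(prototypes)), key=sq_dist)


def text_histogram(tokens, codes, prototypes, width=3):
    """One bin per primary node: relative frequency of the text's contexts there."""
    hits = Counter(bmu_index(prototypes, c) for c in contexts(tokens, codes, width))
    total = sum(hits.values()) or 1
    return [hits[j] / total for j in range(len(prototypes))]


texts = {
    "a": "the cat sat on the mat and the cat sat on the rug".split(),
    "b": "the dog ran to the park and the dog ran to the lake".split(),
}
codes = word_codes({w for t in texts.values() for w in t})
pooled = [c for t in texts.values() for c in contexts(t, codes)]
# Stand-in for the nodes of a trained primary SOM (assumption, not a real training run).
prototypes = random.Random(1).sample(pooled, 6)
histograms = {name: text_histogram(t, codes, prototypes) for name, t in texts.items()}
```

The vectors in `histograms` are what a secondary SOM would receive as input: every text, whatever its length, is now described by the same fixed set of properties, namely the nodes of the primary lattice.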
To put it clearly: the secondary SOM builds an intensional description of entities that results from the interaction of a SOM with a probabilistic description of the empirical observations. Quite obviously, intensions built this way about intensions are not only quite abstract; the mechanism could even be stacked. It could be described as “high-level perception” with as much justification as Hofstadter has in using the term for Copycat. The TL-SOM turns representational intensions into abstract, structural ones.
The two aspects from above thus interact; they are elements of the TL-SOM. Despite the fact that there are still transitions from extensions to intensions, we can also see that the targeted units of the analysis, the texts, get probabilistically distributed across an area, the lattice of the primary SOM. Since the SOM maps the high-dimensional input data onto its map in a way that preserves their topological properties, it is easy to recognize that the TL-SOM creates conceptual halos as an intermediate.
So let us summarize the possibilities provided by the SOM.
- (1) SOMs are able to create non-empiric, or better: de-empirified idealizations of intensions that are based on “quasi-empiric” input data;
- (2) TL-SOMs can be used to create conceptual halos.
In the next section we will focus on this spatial, better: primarily spatial effect.
The Extended SOM
Kohonen and co-workers [1,2] proposed to build histograms that reflect the probability density of a text across the SOM. Those histograms represent the original units (e.g. texts) in a quite static manner, using a kind of summary statistics.
Yet, texts are definitely not a static phenomenon. At first sight there is at least a series, while more appropriately texts are even described as dynamic networks with an associative power of their own [3]. Returning to the SOM we see that, in addition to the densities scattered across the nodes of the SOM, we can also observe a sequence of invoked nodes, according to the sequence of random contexts in the text (or the serial observations).
The not so difficult question then is: How to deal with that sequence? Obviously, it is again and best conceived as a random process (though with a strong structure), and random processes are best described using Markov models, either as hidden (HMM) or as transitional models. Note that the Markov model is not a model about the raw observational data, it describes the sequence of activation events of SOM nodes.
The Markov model can be used as a further means to produce conceptual halos in the sequence domain. The differential properties of a particular sequence as compared to the Markov model then could be used as further properties to describe the observational sequence.
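As a minimal sketch of this idea, the following code estimates a first-order transition model over sequences of activated node indices and then scores how typical a given sequence is under that model. The function names, the additive smoothing constant `alpha` and the use of the mean log-likelihood as the “differential property” are assumptions made for illustration; a hidden Markov model would require considerably more machinery.

```python
import math


def transition_matrix(sequences, n_nodes, alpha=1.0):
    """Estimate P[a][b], the probability that node b is activated after node a.

    `alpha` is an additive smoothing constant so that unseen transitions
    keep a small non-zero probability.
    """
    counts = [[alpha] * n_nodes for _ in range(n_nodes)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]


def mean_log_likelihood(seq, P):
    """Average log-probability per transition of `seq` under the model P."""
    steps = list(zip(seq, seq[1:]))
    return sum(math.log(P[a][b]) for a, b in steps) / len(steps)


# Node activation sequences as they might be read off a primary SOM
# (synthetic indices, not real data).
training = [[0, 1, 2, 0, 1, 2, 0, 1, 2]]
P = transition_matrix(training, n_nodes=3)

typical = mean_log_likelihood([0, 1, 2, 0, 1], P)
atypical = mean_log_likelihood([2, 1, 0, 2, 1], P)
```

The difference between such a score and the score expected for sequences drawn from the model itself is one possible numeric property that could be added to the description of a text.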
(The full version of the extended SOM comprises targeted modeling as a further level. Yet, this targeted modeling does not refer to raw data. Instead, its input is provided completely by the primary SOM, which is based on probabilistic contexts, while the target of such modeling is just internal consistency of a context-dependent degree.)
Just to avoid misunderstanding: it does not make sense to try representing Copycat completely by a SOM-based system. The particular dynamics and phenomenological behavior depend a lot on Copycat’s tripartite morphology as represented by the Coderack (agents), the Workspace and the Slipnet. We are “just” in search of a possibility to remove the deep idealism from the Slipnet in order to enable it for structural learning.
Basically, there are two possible routes. Either we re-interpret the extended SOM in a way that allows us to represent the elements of the Slipnet as properties of the SOM, or we try to replace all the items in the Slipnet by SOM lattices.
So, let us take a look at which structures we have (Copycat) or could have (SOM) on both sides.
Table 1: Comparing elements from Copycat’s Slipnet to the (possible) mechanisms in a SOM-based system.
| No. | Element | Copycat’s Slipnet | SOM-based system |
|---|---|---|---|
| 1. | smoothly scaled abstraction | Conceptual depth (dynamic parameter) | distance of abstract intensions in an integrated lattice of a n-layered SOM |
| 2. | Links as concepts | structure by implementation | reflecting conceptual proximity as an assignate property for a higher-level |
| 3. | Activation featuring non-linear switching behavior | structure by implementation | x |
| 4. | Conceptual proximity | link length (dynamic parameter) | distance in map (dynamic parameter) |
| 5. | Kind of concepts | locational, positional symmetries | any |
From this comparison it is clear that the single most challenging part of this route is the possibility for the emergence of abstract intensions in the SOM based on empirical data. From the perspective of the SOM, relations between observational items such as “left-most,” “group” or “right of”, and even such as “sameness group” or “predecessor group”, are just probabilities of a pattern. Such patterns are identified by functions or dynamic combinations thereof. Combinations of topological primitives remain mappable by analytic functions. Such concepts we could call “primitive concepts” and we can map these to the process of data transformation and the set of assignates as potential properties.2 It is then the job of the SOM to assign a relevancy to the assignates.
Yet, Copycat’s Slipnet also comprises rather abstract concepts such as “opposite”. Furthermore, the most abstract concepts often act as links between more primitive concepts, or, in Hofstadter’s terms, conceptual items of lower “conceptual depth”.
My feeling here is that it is a fundamental mistake to implement concepts like “opposite” directly. What is opposite of something else is a deeply semantic concept in itself, thus strongly dependent on the domain. I think that most of the interesting concepts, i.e. the most abstract ones are domain-specific. Concepts like “opposite” could be considered as something “simple” only in case of geometric or spatial domains.
Yet, that’s not a weakness. We should use this as a design feature. Take the rather simple case shown in the next figure as an example. Here we simply mapped triplets of uniformly distributed random values onto a SOM. The three values can be readily interpreted as parts of an RGB value, which renders the interpretation more intuitive. The special thing here is that the map was a really large one: We defined approximately 700’000 nodes and fed approx. 6 million observations into it.
Figure 2: A SOM-based color map showing emergence of abstract features. Note that the topology of the map is a borderless toroid: Left and right borders touch each other (distance=0), and the same applies to the upper and lower borders.
We can observe several interesting things. The SOM didn’t come up with just any arbitrary sorting of the colors. Instead, a very particular one emerged.
First, the map is not perfectly homogeneous anymore. Very large maps tend to develop “anisotropies”, symmetry breaks if you like, simply due to the fact that the signal horizon becomes an important issue. This should not be regarded as a deficiency though. Symmetry breaks are essential for the possibility of the emergence of symbols. Second, we can see that two “color models” emerged, the RGB model around the dark spot in the lower left, and the YMC model around the bright spot in the upper right. Third, the distance between the bright, almost white spot and the dark, almost black one is maximized.
In other words, and not quite surprisingly, the conceptual distance is reflected as a geometrical distance in the SOM. As is the case in the TL-SOM, we now could use the SOM as a measurement device that transforms an unknown structure into an internal property, simply by using the locational property in the SOM as an assignate for a secondary SOM. In this way we not only can represent “opposite”, but we even have a model procedure for “generalized oppositeness” at our disposal.
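A minimal sketch of such a locational measurement on a borderless, toroidal lattice might look as follows. The function names and the normalization are assumptions for illustration; node positions are taken as (row, column) indices, so that the first and the last column count as direct neighbors.

```python
import math


def toroidal_distance(a, b, rows, cols):
    """Euclidean distance between nodes a and b on a lattice that wraps around."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, rows - dr)  # take the shorter way around the torus
    dc = min(dc, cols - dc)
    return math.hypot(dr, dc)


def oppositeness(a, b, rows, cols):
    """Map distance scaled to [0, 1]; 1 means the nodes are maximally far apart."""
    max_d = math.hypot(rows // 2, cols // 2)
    return toroidal_distance(a, b, rows, cols) / max_d


# On a 10 x 10 torus, columns 0 and 9 are adjacent ...
d_wrap = toroidal_distance((0, 0), (0, 9), 10, 10)
# ... while (0, 0) and (5, 5) are as far apart as two nodes can be.
o_max = oppositeness((0, 0), (5, 5), 10, 10)
```

A value like `oppositeness(bmu_x, bmu_y, rows, cols)`, computed from the best-matching nodes of two observations, is the kind of locational property that could be handed to a secondary SOM as an assignate.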
It is crucial to understand this step of “observing the SOM”, thereby conceiving the SOM as a filter, or more precisely as a measurement device. Of course, at this point it becomes clear that a large variety of such transposing and internal-virtual measurement devices may be thought of. Methodologically, this opens an orthogonal dimension to the representation of data, strongly resembling the concept of orthoregulation.
The map shown above even allows us to create completely different color models, for instance one around yellow and another one around magenta. Our color psychology is strongly determined by the sun’s radiated spectrum and hence it reflects a particular Lebenswelt; yet, there is no necessity about it. Some insects like bees are able to perceive ultraviolet radiation, i.e. their colors may have 4 components, yielding a completely different color psychology, while the capability to distinguish colors remains perfectly intact.3
“Oppositeness” is just a “simple” example for an abstract concept and its operationalization using a SOM. We already mentioned the “serial” coherence of texts (and thus of general arguments) that can be operationalized as sort of virtual movement across a SOM of a particular level of integration.
It is crucial to understand that there is no other model besides the SOM that combines the ability to learn from empirical data and the possibility for emergent abstraction.
There is yet another lesson that we can take home from the simple example above. Well, the example does not remain that simple. High-level abstraction, items of considerable conceptual depth, so to speak, requires rather short assignate vectors. In the process of learning qua abstraction it appears to be essential that the masses of possible assignates derived from or imposed by measurement of raw data will be reduced. On the one hand, empiric contexts from very different domains should be abstracted, i.e. quite literally “reduced”, into the same perspective. On the other hand, any given empiric context should be abstracted into (much) more than just one abstract perspective. The consequence of that is that we need a lot of SOMs, all separated “sufficiently” from each other. In other words, we need a dynamic population of Self-organizing maps in order to represent the capability of abstraction in real life. “Dynamic population” here means that there are developmental mechanisms that result in a proliferation, almost a breeding of new SOM instances in a seamless manner. Of course, the SOM instances themselves have to be able to grow and to differentiate, as we have described it here and here.
In a population of SOMs the conceptual depth of a concept may be represented by the efforts to arrive at a particular abstract “intension.” This not only comprises the ordinary SOM lattices, but also processes like Markov models, simulations, idealizations qua SOMs, targeted modeling, transition into symbolic space, synchronous or potential activations of other SOM compartments etc. This effort may be represented finally as a “number.”
The structure of a multi-layered system of Self-organizing Maps as it has been proposed by Kohonen and co-workers is a powerful model to represent emerging abstraction in response to empiric impressions. The Copycat model demonstrates how abstraction could be brought back to the level of application in order to become able to make analogies and to deal with “first-time-exposures”.
Here we tried to outline a potential path to bring these models together. We regard this combination in the way we proposed it (or a quite similar one) as crucial for any advance in the field of machine-based episteme at large, but also for the rather confined area of machine learning. Attempts like that of Blank [4] appear to suffer seriously from categorical misattributions. Analogical thinking does not take place on the level of single neurons.
We didn’t discuss alternative models here (so far, a small extension is planned). The main reasons are that first it would be an almost endless job, and second that Hofstadter already did it and as a result of his investigation he dismissed all the alternative approaches (from authors like Gentner, Holyoak, Thagard). For an overview of recent models on creativity, analogical thinking, or problem solving, Runco [5] provides a good starting point. Of course, many authors point in roughly the same direction as we did here, but mostly the proposals are circular, not helpful because the problematic is just replaced by another one (e.g. the infamous and completely unusable “divergent thinking”), or can’t be implemented for other reasons. Thagard [6], for instance, claims that a “parallel satisfaction of the constraints of similarity, structure and purpose” is key in analogical thinking. Given our analysis, such statements are nothing but a great mess, mixing modeling, theory, vagueness and fluidity.
For instance, in cognitive psychology and in the field of artificial intelligence as well, the hypothesis of Structural Mapping (STM) finds a lot of supporters [7]. Hofstadter discusses similar approaches in his book. The STM hypothesis is highly implausible and obviously a left-over of the symbolic approach to Artificial Intelligence, just transposed into more structural regions. The STM hypothesis not only has to be implemented as a whole, it also has to be implemented for each domain specifically. There is no emergence of that capability.
The combination of the extended SOM—interpreted as a dynamic population of growing SOM instances—with the Copycat mechanism indeed appears as a self-sustaining approach into proliferating abstraction and—quite significant—back from it into application. It will be able to make analogies on any field already in its first encounter with it, even regarding itself, since both the extended SOM as well as the Copycat comprise several mechanisms that may count as precursors of high-level reflexivity.
After this proposal little remains to be said on the technical level. One of those issues which remain to be discussed is the conditions for the possibility of binding internal processes to external references. Here our favorite candidate principle is multi-modality, that is the joint and inextricable “processing” (in the sense of “getting affected”) of words, images and physical signals alike. In other words, I feel that we have come close to the fulfillment of the ariadnic question of this blog: “Where is the Limit?” …even in its multi-faceted aspects.
A lot of implementation work now has to be performed, eventually accompanied by some philosophical musings about “cognition”, or more appropriately the “epistemic condition.” I just would like to invite you to stay tuned for the software publications to come (hopefully in the near future).
2. It is somehow interesting that in the brain of many animals we can find very small groups of neurons, if not even single neurons, that respond to primitive features such as verticality of lines, or the direction of the movement of objects in the visual field.
3. Ludwig Wittgenstein insisted all the time that we can’t know anything about the “inner” representation of “concepts.” It is thus free of any sense and meaning to claim knowledge about the inner state of oneself as well as of that of others. Wilhelm Vossenkuhl introduces and explains the Wittgensteinian “grammatical” solipsism carefully and in a very nice way. The only thing we can know about inner states is that we use certain labels for them, and the only meaning of emotions is that we do report them in certain ways. In other terms, the only thing that is important is the ability to distinguish one’s feelings. This, however, is easy to accomplish for SOM-based systems, as we have been demonstrating here and elsewhere in this collection of essays.
4. Don’t miss Timo Honkela’s webpage, where one can find a lot of gems related to SOMs! The only puzzling issue about all the work done in Helsinki is that the people there constantly and pervasively misunderstand the SOM per se as a modeling tool. Despite their ingenuity they completely neglect the issues of data transformation, feature selection, validation and data experimentation, which all have to be integrated to achieve a model (see our discussion here). For a recent example see here, or the cited papers about the Websom project.
-  Timo Honkela, Samuel Kaski, Krista Lagus, Teuvo Kohonen (1997). WEBSOM – Self-Organizing Maps of Document Collections. Neurocomputing, 21: 101-117.
-  Krista Lagus, Samuel Kaski, Teuvo Kohonen (2004). Mining massive document collections by the WEBSOM method. Information Sciences, 163(1-3): 135-156. DOI: 10.1016/j.ins.2003.03.017
-  Klaus Wassermann (2010). Nodes, Streams and Symbionts: Working with the Associativity of Virtual Textures. The 6th European Meeting of the Society for Literature, Science, and the Arts, Riga, 15-19 June, 2010. available online.
- [4] Douglas S. Blank, Implicit Analogy-Making: A Connectionist Exploration. Indiana University Computer Science Department. available online.
-  Mark A. Runco, Creativity-Research, Development, and Practice. Elsevier 2007.
-  Keith J. Holyoak and Paul Thagard, Mental Leaps: Analogy in Creative Thought. MIT Press, Cambridge 1995.
-  John F. Sowa, Arun K. Majumdar (2003), Analogical Reasoning. In: A. Aldo, W. Lex, & B. Ganter (eds.), “Conceptual Structures for Knowledge Creation and Communication,” Proc. Intl. Conf. Conceptual Structures, Dresden, Germany, July 2003. LNAI 2746, Springer New York 2003. pp. 16-36. available online.
-  Wilhelm Vossenkuhl. Solipsismus und Sprachkritik. Beiträge zu Wittgenstein. Parerga, Berlin 2009.
March 19, 2012
What is the New York of California?
Or even, what is the New York of New York? Almost everybody will come up with the same answer, despite the fact that the question is not merely ill-defined. Both the question and its answer can be described only after the final appearance of the answer. In other words, before the answer is complete it is not possible to say a priori which properties are relevant, although a posteriori they are easily tagged as relevant for the description of both the question and the answer. Neither the question nor the solution “exists” in the way that is pretended by their form before we have finished making sense of them. There is a wealth of philosophical issues around this phenomenon, all of which we have to bypass here. Here we will focus just on possible mechanisms that could be invoked in order to build a model that is capable of behaving phenomenologically “as if”.
The credit for rendering such questions and the associated problematics salient in the area of computer models of thinking belongs to Douglas Hofstadter and his “Fluid Analogies Research Group” (FARG). In his book “Fluid Concepts and Creative Analogies”, which we already mentioned here, he proposes a particular model that he claims is a proper model of analogical thinking. In constructing this model, which took more than 10 years of research, he did not try to stick (to get stuck?) to the neuronal level. After all, one can’t describe the performance of a tennis player at the molecular level, he says. Remarkably, he also keeps the so-called cognitive sciences and their laboratory wisdom at a distance. Instead, his starting point is everyday language, and presumably a good deal of introspection as well. He sees his model located at an intermediate level between the neurons and consciousness (quite a large field, though).
His overarching claim is as simple as it is distant from the mainstream of AI and cognitive science. (Note that Hofstadter does not formulate “analogical reasoning.”)
Thinking is largely equivalent with making analogies.
Hofstadter is not interested in producing just another model of analogy making. There are indeed quite a lot of such models, which he discusses in great detail. And he refutes them all; he shows that they are all ill-posed, since none of them starts with perception. Without exception they all assume that the “knowledge” is already in the computer, and based on this assumption some computer program is established. Of course, such approaches are nonsense; their failure is euphemistically called the “knowledge acquisition bottleneck” by people working in the field of AI / machine learning. Yet, knowledge is nothing that could be externalized and then acquired subsequently by some other party, it can’t be found “in” the world, and of course it can’t be separated as something that “exists” beside the processing mechanisms of the brain, making the whole thing “smart”. As already mentioned, such ideas are utter nonsense.
Hofstadter’s basic strategy is different. He proposes to create a software system that is capable of “concept slipping” as an emergent phenomenon, deeply based on perceptional mechanisms. He even coined the term “high-level perception.”
That is, the […] project is not about simulating analogy-making per se, but about simulating the very crux of human cognition: fluid concepts. (p.208)
This essay will investigate his model. We will find that despite its appeal it is nevertheless seriously unrealistic, even according to Hofstadter’s own standards. Yet, despite its particular weaknesses it also demonstrates very interesting mechanisms. After extracting the cornerstones of his model we will try to map his insights to the world of self-organizing maps. We will also discuss how to transfer the interesting parts of Hofstadter’s model. Hofstadter himself clearly stated the deficiencies of “connectionist models” of “learning,” yet my impression is that he was not aware of self-organizing maps at that time. By “connectionism” he obviously referred to artificial neural networks (ANN), and for those we completely agree with his critique.
Before we start I would like to provide some original sources, that is, copies of those parts that are most relevant for this essay. These parts are from chapter 5, chapter 7 and chapter 8 of the aforementioned book. There you will find many more details and lucid examples in Hofstadter’s own words.
Is there an Alternative to Analogies?
In order to find an alternative we have to take a bird’s-eye view. Very roughly speaking, thinking transforms some input into some output while being affected and transforming itself. In some sense, any transformation of input to output transforms the transforming instance, though to vastly different degrees. A trivial machine just wears out; a trivial computer (that is, any digital machine that fits into the scheme of Turing computing, see footnote 1) can be reset to exactly a previous state. As soon as historical contingency is involved, reproducibility vanishes and strictly non-technical entities appear: memory, value, and semantics (among others).
This transformation game applies to analogy making, and it also applies to traditional modeling. Is it possible to apply any kind of modeling to the problematics that is represented by the “transfer game”, for which those little questions posed in the beginning are just an example?
In this context, Hofstadter calls the modeling approach the brute-force approach (p.327, chp.8). The outline of the modeling approach could look like this (p.337).
- Step 1: Run down the a priori list of city-characterization criteria and characterize the “source town” A according to each of them.
- Step 2: Retrieve an a priori list of “target towns” inside target region Y from the database.
- Step 3: For each retrieved target town X, run down the a priori list of city-characterization criteria again, calculating X’s numerical degree of match with A for every criterion in the list.
- Step 4: For each target town X, sum up the points generated in Step 3, possibly using a priori weights, thus allowing some criteria to be counted more heavily than others.
- Step 5: Locate the target town with the highest overall rating as calculated in Step 4, and propose it as “the A of Y”.
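To make the five steps tangible, here is a minimal Python sketch of the brute-force approach. It is not Hofstadter's code: the criteria, their weights, the matching function and all scores are invented for illustration. The point of the sketch is exactly that every one of these has to be fixed a priori.

```python
from typing import Dict, Tuple

# Hypothetical a priori criteria with their weights (Step 4); purely illustrative.
WEIGHTS: Dict[str, float] = {"population": 1.0, "finance": 2.0, "theater": 1.5}

def match(a: float, x: float) -> float:
    """Degree of match for one criterion, assuming scores in [0, 1] (Step 3)."""
    return 1.0 - abs(a - x)

def brute_force_analogy(
    source: Dict[str, float],
    targets: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> Tuple[str, Dict[str, float]]:
    """Return the best-rated target town and all weighted ratings (Steps 3 to 5)."""
    ratings = {
        name: sum(w * match(source[c], town[c]) for c, w in weights.items())
        for name, town in targets.items()
    }
    return max(ratings, key=ratings.get), ratings

# Step 1: characterize the source town A (scores are made up).
new_york = {"population": 1.0, "finance": 1.0, "theater": 1.0}
# Step 2: retrieve the target towns of region Y from the "database" (made up as well).
california = {
    "Los Angeles": {"population": 0.9, "finance": 0.6, "theater": 0.7},
    "San Francisco": {"population": 0.4, "finance": 0.8, "theater": 0.6},
    "Fresno": {"population": 0.3, "finance": 0.1, "theater": 0.1},
}
best, ratings = brute_force_analogy(new_york, california)
```

Whatever town comes out as `best` here is fully determined by the hand-picked criteria and weights, which is where the trouble discussed next begins.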
Any plausible a priori list of city-characterization criteria would be long, very long indeed. Effectively, it can’t be limited in advance, since any imposed limit would represent a model that would claim to be better suited to decide about the criteria than the model being built. We are crushed by an infinite regress, not just in theory. What we experience here is Wittgenstein’s famous verdict that justifications have to come to an end. Rules are embedded in the form of life (“Lebensform”), and without knowing all about a particular Lebensform and taking into consideration everything comprised by such (impossible) knowledge we can’t start to model at all.
Hofstadter identifies four characteristic difficulties for the modeling approach with regard to his little “transfer game” that plays around with cities.
- Difficulty 1: It is psychologically unrealistic to explicitly consider all the towns one knows in a given region in order to come up with a reasonable answer.
- Difficulty 2: Comparison of a target town and a source town according to a specific city-characterization criterion is not a hard-edged mechanical task, but rather, can itself constitute an analogy problem as complex as the original top-level puzzle.
- Difficulty 3: There will always be source towns A whose “essence” (that is, set of most salient characteristics) is not captured by a given fixed list of city-characterization criteria.
- Difficulty 4: What constitutes a “town in region Y” is not a priori evident.
Hofstadter underpins his point with the following question (p.347).
What possible set of apriori criteria would allow a computer to reply, perfectly self-confidently, that the country of Monaco is “the Atlantic City of France”?
Of course, the “computer” should come up with the answer in a way that is not pre-programmed explicitly.
Obviously, the problematics of making analogies can’t be solved algorithmically. Not only is there no such thing as a single “solution”, even the criteria to describe the problem are missing. Thus we can conclude that modeling, even in its non-algorithmic form, is not a viable alternative to analogy making.
The FARG Model
In the following, we investigate the model as proposed by Hofstadter and his group, mainly Melanie Mitchell. The discussion is divided into the following parts:
- precis of the model,
- its elements,
- its extension as proposed by Hofstadter,
- the main problems of the model, and finally,
- the main superior aspects of the model as compared to connectionist models (from Hofstadter’s perspective, of course).
Precis of the Model
Hofstadter’s conclusion from the problems with the model-based approach, and thus also the starting point for his endeavor, is that the making of an analogy must appear as an emergent phenomenon. Analogy itself can’t be “defined” in terms of criteria, beyond some rather opaque statements about “similarity.” The point is that this similarity could be measured only a posteriori, so this concept does not help. The capability for making analogies can’t be programmed explicitly. It would not be “making” of analogies anymore, it would just be a look-up of dead graphemes (not even symbols!) in a database.
He demonstrates his ideas by means of a small piece of software called “Copycat”. This name derives from the internal processes of the software, as making “almost identical copies” is an important ingredient of it. Yet, it also refers to the problem that appears if you say: “I am doing this, now do the same thing…”
Copycat has three major parts, which he labels as (i) the Slipnet, (ii) the Workspace, (iii) the Coderack.
The Coderack is a rack that serves as a launching site for a population of agents of various kinds. Agents die and are created in various ways. They may be spawned by other agents, by the Coderack, or by any of the items in the Slipnet, in the latter case as a top-down specialist bred just to engage in situations represented by that Slipnet item. Any freshly created agent is first put into the Coderack, regardless of its originator or kind.
Any particular agent behaves as a specialist for recognizing a particular situation or for establishing a particular relation between parts of the input “data,” the initial observation. This recognition requires a model a priori, of course. Since these models are rather abstract as compared to the observational data, Hofstadter calls them “concepts.” After their setup, agents are put into the Coderack, from where they start in random order, but also dependent on their “inner state,” which Hofstadter calls “pressure.”
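The launching behavior just described can be sketched as a weighted random draw. The following Python fragment is only an illustration under our own assumptions, not the Copycat source: the class names `Agent` and `Coderack`, the numeric `pressure` field and the rule "launch probability proportional to pressure" are hypothetical simplifications.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Agent:
    name: str
    pressure: float  # hypothetical numeric stand-in for the agent's "inner state"
    action: Callable[[], None] = lambda: None  # the agent's tiny piece of work

@dataclass
class Coderack:
    agents: List[Agent] = field(default_factory=list)
    rng: random.Random = field(default_factory=random.Random)

    def post(self, agent: Agent) -> None:
        """Every freshly created agent is first put on the rack."""
        self.agents.append(agent)

    def launch(self) -> Agent:
        """Remove and run one agent, drawn at random with probability
        proportional to its pressure (an assumed selection rule)."""
        weights = [a.pressure for a in self.agents]
        index = self.rng.choices(range(len(self.agents)), weights=weights, k=1)[0]
        agent = self.agents.pop(index)
        agent.action()
        return agent
```

The order of launches is random, yet agents with high pressure tend to run earlier; no agent is ever guaranteed a particular position in the sequence.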
The Slipnet is a loose “network” of deep and/or abstract concepts. In case of Copycat these concepts comprise
a, b, c, … , z, letter, successor, predecessor, alphabetic-first, alphabetic-last, alphabetic position, left, right, direction, leftmost, rightmost, middle, string position, group, sameness group, successor group, predecessor group, group length, 1, 2, 3, sameness, and opposite.
In total there are more than 60 such concepts. These items are linked together, and the length of a link reflects the “distance” between the concepts it connects. This distance changes while Copycat is working on a particular task. The change is induced by the agents in response to their “success.” The Slipnet is not really a “network,” since it is neither a logistic network (it doesn’t transport anything) nor an associative network like a SOM. Nor is it suitable to conceive of it as a kind of filter in the sense of a spider’s web or a fisherman’s net. It is thus more appropriate to consider it simply as a non-directed, dynamic graph in which discrete items are linked.
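Read as a data structure, such a non-directed, dynamic graph is quite small. The sketch below is our own illustration in Python; the class, the `shrink` update rule and the numeric link lengths are assumptions and are not taken from Copycat.

```python
class Slipnet:
    """Non-directed graph whose link lengths ("distances") change at run time."""

    def __init__(self):
        # frozenset({a, b}) -> current link length; the order of a and b is irrelevant
        self.length = {}

    def link(self, a, b, length):
        self.length[frozenset((a, b))] = length

    def distance(self, a, b):
        return self.length[frozenset((a, b))]

    def shrink(self, a, b, factor=0.5):
        """Hypothetical update rule: a successful agent pulls two concepts closer."""
        self.length[frozenset((a, b))] *= factor

    def neighbors(self, a, radius):
        """Concepts connected to `a` by a link no longer than `radius`."""
        return sorted(
            next(iter(pair - {a}))
            for pair, d in self.length.items()
            if a in pair and d <= radius
        )
```

A short usage example: after `net.link("left", "right", 1.0)` and `net.link("left", "leftmost", 0.4)`, only "leftmost" lies within radius 0.5 of "left"; one call to `net.shrink("left", "right")` brings "right" into reach as well. Note that nothing in this structure can add a new concept by itself, a point we will return to.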
Finally, the third part is the Workspace. Hofstadter describes it as a “busy construction site” and likens it to the cytoplasm (p.216). In the Workspace, the agents establish bonds between the atomic items of the observation. As said, no agent knows anything about the posed problem; each is just capable of working on a mini-aspect of the task. The whole population of agents, however, builds something larger. It looks much like the activity of ants or termites, building some morphological structure in the hive, or producing a macroscopic dynamic effect as a hive population. The Workspace is the location of such intermediate structures of various degrees of stability, meaning that some agents also work to remove a particular structure.
So far we have described the morphology. The particular dynamics unfolding on this morphology is settled between competition and cooperation, with the result of a collective calming down of the activities. The decrease in activity is itself an emergent consequence of the many parallel processes inside Copycat.
A single run of Copycat yields one instance of the result. Yet, a single answer is not the result itself. Rather, as different runs of Copycat yield different singular answers, the result consists of a probability density for different singular answers. For the letter domain in which Copycat is working the result looks like this:
Figure 1: Probability densities as result of a Copycat run.
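How such a result is obtained can be sketched in a few lines of Python. The solver below is only a stand-in for a single Copycat run; its answers "A", "B", "C" and their odds are invented, and for discrete answers the "density" is simply a table of relative frequencies.

```python
import random
from collections import Counter

def answer_distribution(solve, runs, seed=0):
    """Run a stochastic solver `runs` times and return relative answer frequencies."""
    rng = random.Random(seed)
    counts = Counter(solve(rng) for _ in range(runs))
    return {answer: n / runs for answer, n in counts.items()}

def toy_solver(rng):
    """Stand-in for one Copycat run; answers and odds are purely hypothetical."""
    return rng.choices(["A", "B", "C"], weights=[8, 1, 1], k=1)[0]

distribution = answer_distribution(toy_solver, runs=1000)
```

The dictionary `distribution` then plays the role of the figure: no single run is "the" result, only the frequencies over many runs are.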
The Elements of the FARG Model
Before we proceed, I should emphasize that here “element” is used as we have introduced the term here.
Returning to the FARG model, it is important to understand that a particular, constrained randomness plays a crucial role in its setup. The population of agents does not search through all possibilities all the time. Instead, any existing intermediate result, say a structural hypothesis, serves as a constraint for the future search.
We also find different kinds of memories with different durations, and we find dynamic historic constraints, which we could also call contingencies. We have a population of different kinds of agents that cooperate and compete. In some almost obvious way, Copycat’s mechanisms may be conceived as an instance of the generalized evolution that we proposed earlier. Hofstadter himself is not aware that he has just proposed a mechanism for generalized evolutionary changes. He calls the process “parallel terraced scan”, thereby unnecessarily sticking to a functional perspective. Yet, we consider generalized evolution as one of the elements of Copycat. It could be really promising to develop Copycat as an alternative to so-called genetic algorithms (see footnote 2).
Despite a certain resemblance to natural evolution, the mechanisms built into Copycat do not comprise an equivalent to what is known from biology as “gene doubling”. Gene doubling and its counterpart, gene deletion, are probably the most important mechanisms in natural evolution. Copycat produces different kinds of agents, but the informational setup of these agents does not change, as it is given by the Slipnet. The equivalent to gene doubling would have to be implemented in the Slipnet. On the other hand, however, it is clear that the items in the Slipnet are too concrete, almost representational. In contrast, genes usually do not represent a particular function on the macro-level (which is one of the main structural faults of so-called genetic algorithms). So, we conclude that Copycat contains a restricted version of generalized evolution. Besides, we see a structural resemblance to the theories of Edelman and his neuronal Darwinism, which actually is a nice insight.
Conceiving large parts of the mechanism of Copycat as (restricted) generalized evolution covers both the Coderack as well as the Workspace, but not the Slipnet.
The Slipnet acts as sort of a “Platonic Heaven” (Hofstadter’s term). It contains various kinds of abstract terms, where “abstract” simply means “not directly observable.” It is hence not comparable to those abstractions that can be used to build tree-like hierarchies. Think of the series “fluffy”-dog-mammal-animal-living entity. Significantly, the abstract terms in Copycat’s Slipnet also comprise concepts about relations, such as “right,” “direction,” “group,” or “leftmost.” Relations, however, are nothing else than even more abstract symmetries, that is, transformational models, which may even form a mathematical group. Quite naturally, we could consider the items in the Slipnet as a mathematical category (of categories). Again, Hofstadter and Mitchell do not refer in any way to such structures, quite unfortunately so.
The Slipnet’s items may well be conceived as instances of symmetry relations. Hofstadter treats them as idealizations of positional relations. Each of these items acts as a structural property. This is a huge advance as compared to other models of analogy.
To summarize, we find two main elements in Copycat.
- (1) restricted generalized evolution, and
- (2) concrete instances of positional idealization.
Actually, these elements are top-level elements that must be conceived as compounds. In part 2 we will check out the elements of the Slipnet in detail, while the evolutionary aspects we already discussed in a previous chapter. Yet, this level of abstraction is necessary to render Copycat’s principles conceptually more mobile. In some way, we have to apply the principles of Copycat to the attempt to understand it.
The Copycat, released to the wild
Any generalization of Copycat has to withdraw the implicit constraints of its elements. In more detail, this would include the following changes:
- (1) The representation of the items in the Slipnet could be changed into compounds, and these compounds should be expressed as “gene-like” entities.
- (2) Introducing a mechanism to extend the Slipnet. This could be achieved through gene doubling in response to external pressures; yet, these pressures are not to be conceived as “external” to the whole system, just external to the Copycat. The pressures could be issued by a SOM. Alternatively, a SOM environment might also deliver the idealizations themselves. In either case, the resulting behavior of the Copycat has to be shaped by selection, either through internal mechanisms, or through environmentally induced forces (changes in the fitness landscape).
- (3) The focus on positional idealization would have to be removed by introducing the more abstract notion of “symmetries”, i.e. mathematical groups or categories. This would turn positional idealization into just one possible instance of idealization.
The improvement resulting from these changes would be dramatic. It would not only be much easier to establish a Slipnet for any kind of domain, it would also allow the system (a CopyTiger?) to evolve new traits and capabilities, and to parametrize them autonomously. But these changes also require a change in the architectural (and mental) setup.
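Changes (1) and (2) can be illustrated with a deliberately naive Python sketch. Everything in it is our assumption: Slipnet items are represented as tuples of primitive components, and "gene doubling" is reduced to copying one item and replacing a single component of the copy. Selection, which would have to decide whether the copy survives, is left out entirely.

```python
import random

def double_and_diverge(slipnet, name, primitives, rng):
    """Duplicate the compound item `name` and replace one component of the copy.

    `slipnet` maps item names to tuples of primitive components (hypothetical
    representation); the original mapping is left untouched.
    """
    copy = list(slipnet[name])
    position = rng.randrange(len(copy))
    alternatives = [p for p in primitives if p != copy[position]]
    copy[position] = rng.choice(alternatives)
    new_name = f"{name}'"
    extended = dict(slipnet)
    extended[new_name] = tuple(copy)
    return extended, new_name

# Invented example: "leftmost" as a compound of three primitive components.
slipnet = {"leftmost": ("position", "extreme", "left")}
primitives = ["position", "extreme", "left", "right", "order"]
extended, new_item = double_and_diverge(slipnet, "leftmost", primitives, random.Random(1))
```

The extended Slipnet now contains the original item plus a diverged copy, so the set of concepts has grown without a designer adding anything by hand.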
From Copycat to Metacat
Hofstadter himself tried to describe possible improvements of Copycat. A significant part of these suggestions for improvement is represented by the capability for self-monitoring and proliferating abstraction, hence he calls it “Metacat”.
The list of improvements comprises mainly the following five points (pp.315, chp.7).
- (1) Self-monitoring of pressures, actions, and crucial changes as an explicit registering into parts of the Workspace.
- (2) Disassembling of a given solution into the path of required actions.
- (3) Hofstadter writes that “Metacat should store a trace of its solution of a problem in an episodic memory.”
- (4) A clear “meta-analogical” sense as an ability to see analogies between analogies, that is a multi-leveled type of self-reflectiveness.
- (5) The ability to create and to enjoy the creation of new puzzles. In this context he writes “Indeed, I feel that responsiveness to beauty and its close cousin, simplicity, plays a central role in high-level cognition.”
I am not really convinced by these suggestions, at least not if they were implemented in the way that is suggested by Hofstadter “between the lines”. They look much more like a dream than a reasonable list of improvements, perhaps except the first one. The topic of self-monitoring has been explored by James Marshall in his dissertation, but still his version of “Metacat” was not able to learn. This self-monitoring should not be conceived as a kind of Cartesian theater, perhaps even populated with homunculi on both sides of the stage.
The second point is completely incompatible with the architecture of Copycat, and notably Hofstadter does not provide even the tiniest comment on it. The third point violates the concept of “memory” as a re-constructive device. Hofstadter himself says elsewhere, while discussing alternative models of analogy, that the brain is not a database, which is quite correct. “Memory” is not a storage device. Yet, the consequence is that analogy making can’t be separated from memory itself (and vice versa).
The fourth suggestion, then, would require further platonic heavens, in case of Copycat/Metacat created by a programmer. This is highly implausible, and since it is a consequence of the architecture, the architecture of Copycat as such is not suitable to address real-world entities.
Finally, the fifth suggestion displays a certain naivety regarding evolutionary contexts, philosophical aspects of reasoning that have been known since Immanuel Kant, or the particular setup of human cognition, where emotions and propositional reasoning appear as deeply entangled issues.
The main Problem(s) of the FARG model
We already mentioned Copycat’s main problems, which are (i) the “Platonic heaven”, and (ii) the lack of the capability to learn as a kind of structural self-transformation.
Both problems are closely related. Actually, there is really only one single problem, and that’s the issue that Hofstadter got trapped by idealism. A Platonic heaven that is filled by the designer with an x-cat (or a Copy-x) is hard to comprehend. Even for the really small letter domain there are more than 60 such idealistic, top-down and externally imposed concepts. These concepts have to be linked and balanced in just the right way, otherwise the Copycat will not behave in any interesting way. Furthermore, the Slipnet is a structurally static entity. There are some parameters that change during its activity, but Copycat does not add new items to its Slipnet.
For these reasons it remains completely opaque how Mitchell and Hofstadter arrived at that particular instance of the Slipnet for the letter domain, and thus it also remains completely unclear how the “computer” itself could build or achieve something like a Slipnet. Although Linhares was able to implement an analogous FARG model for the domain of chess (see footnote 3), his model too suffers from the static Slipnet in the same way: it is extremely tedious to set up a Slipnet. Furthermore, the validation is even more laborious, if not impossible, due to the very nature of making analogies and the idealistic Slipnet.
The result is, well, a model that cannot serve as a template for any kind of application that is designed to be able to adapt and to learn, at least if we take it without abstracting from it.
From an architectural point of view the Slipnet is simply not compatible with the rest of Copycat, which is strongly based on randomness and probabilistic processes in populations. The architecture of the Slipnet and the way it is used do not offer anything like a probabilistic pathway into it. But why should the “Slipnet” not be a probabilistic process as well?
Superior Aspects of the FARG model
Hofstadter clearly and correctly separates his project from connectionism (p.308):
Connectionist (neural-net) models are doing very interesting things these days, but they are not addressing questions at nearly as high a level of cognition as Copycat is, and it is my belief that ultimately, people will recognize that the neural level of description is a bit too low to capture the mechanisms of creative, fluid thinking. Trying to use connectionist language to describe creative thought strikes me as a bit like trying to describe the skill of a great tennis player in terms of molecular biology, which would be absurd.
A cornerstone of Hofstadter’s arguments and concepts around Copycat is conceptual slippage. This occurs in the Slipnet and is represented as a sudden change in the weights of the items such that the most active (or influential) “neighborhood” also changes. To describe these neighborhoods, he invokes the concept of a halo. The “halo” is a more or less circular region around one of the abstract items in the Slipnet, yet without a clear boundary. Items in the Slipnet change their relative position all the time, thus their co-excitation also changes dynamically.
Hofstadter lists (p.215) the following missing issues in connectionist network (CN) models with regard to cognition, particularly with regard to concept slippage and fluid analogies.
- CN don’t develop a halo around the representatives of concepts in case of localist networks, i.e. node oriented networks and thus no slippability emerges;
- CN don’t develop a core region for a halo in case of networks where a “concept” is distributed throughout the network, and thus no slippability emerges;
- CN have no notion of normality due to learning that is instantiated in any encounter with data.
This critique appears to be both a bit overdone and misdirected. As we have seen above, Copycat can be interpreted as comprising a slightly restricted case of generalized evolution. Standard neuronal techniques do not know of evolutionary techniques, there are no “coopetitioning” agents, and there is no separation into different memories of different durations. The abstraction achieved by artificial neuronal networks (ANN) or even by standard SOMs is always exhausted by the transition from extensional (observed items) to intensional description (classes, types). The abstract items in the Slipnet are not just intensional descriptions and could not be found or constructed by an ANN or a SOM that would work just on the observation, especially if there is just a single observation at all!
Copycat is definitely working in a different space as compared to network-based models.1 While the latter can provide the mechanisms to proceed from extensions to intensions in a “bottom-up” movement, the former is applying those intensions in a “top-down” manner. Saying this, we may invoke the reference to the higher forms of comparison and the Deleuzean differential. As many other things mentioned here, this would deserve a closer look from a philosophical perspective, which however we can’t provide here and now.
Nevertheless, Hofstadter’s critique of connectionist models seems to be closely related to the abandonment of modeling as a model for analogy making. Any of the three points above can be mitigated if we take a particular collection of SOMs as a counterpart for Copycat. In the next section (which will be found in part II of this essay) we will see how the two approaches can inform each other.
1. We would like to point you to our discussion of non-Turing computation and also make you aware of this conference: 11th International Conference on Unconventional Computation & Natural Computation 2012, University of Orléans, conference website.
2. Interestingly, Hofstadter’s PhD-student, co-worker and co-author Melanie Mitchell started to publish in the field of genetic algorithms (GA), yet, she never realized the kinship between GA and Copycat, at least she never said anything like this publicly.
3. He calls his model implementation “Capyblanca”; it is available through Google Code.
-  James B. Marshall, Metacat: A Self-Watching Cognitive Architecture for Analogy-Making and High-Level Perception. PhD Thesis, Indiana University 1999. available online (last access 18/3/2012)
-  Daniel Dennett, Consciousness Explained. 1992. p.107.
-  Alexandre Linhares (2008). The emergence of choice: Decision-making and strategic thinking through analogies. available online.
-  Douglas S. Blank, Implicit Analogy-Making: A Connectionist Exploration. Indiana University Computer Science Department. available online.