Analogical Thinking, revisited. (II)

March 20, 2012 § Leave a comment

(II/II)

In this second part of the essay about a fresh perspective on analogical thinking—more precisely: on models about it—we will try to bring two concepts together that at first sight represent quite different approaches: Copycat and SOM.

Why engage in such an endeavor? Firstly, we are quite convinced that FARG’s Copycat demonstrates an important and outstanding architecture. It provides a well-founded proposal about the way we humans apply ideas and abstract concepts to real situations. Secondly, however, it is also clear that Copycat suffers from a few serious flaws in its architecture, particularly the built-in idealism. This renders any adaptation to more realistic domains, or even to completely domain-independent conditions, very, very difficult, if not impossible, since this drawback also prohibits structural learning. So far, Copycat is just able to adapt some predefined internal parameters. In other words, the Copycat mechanism just adapts a predefined structure, though a quite abstract one, to a given empiric situation.

Well, basically there seem to be two different, “opposite” strategies to merge these approaches. Either we integrate the SOM into Copycat, or we try to transfer the relevant, yet to be identified, parts from Copycat to a SOM-based environment. Yet, at the end of the day we will see that, and how, the two alternatives converge.

In order to accomplish our goal of establishing a fruitful combination of SOM and Copycat we have to take mainly three steps. First, we briefly recapitulate the basic elements of Copycat and the proper instance of a SOM-based system. Second, we will describe the extended SOM system in some detail, although there will be a dedicated chapter on it. Finally, we have to transfer and presumably adapt those elements of the Copycat approach that are missing in the SOM paradigm.

Crossing over

The particular power of (natural) evolutionary processes derives from the fact that they are based on symbols. “Adaptation” or “optimization” are not processes that change just the numerical values of parameters in formulas. Quite the opposite: in adaptational processes that span across generations, parts of the DNA-based story are being rewritten, with potential consequences for the whole of the story. This effect of recombination in the symbolic space is particularly present in the so-called “crossing over” during the production of gamete cells in the context of sexual reproduction in eukaryotes. Crossing over is a “technique” to dramatically speed up the exploration of the space of potential changes. (In some way, this space is also greatly enlarged by symbolic recombination.)

What we will try here in our attempt to merge the two concepts of Copycat and SOM is exactly this: a symbolic recombination. The difference from its natural template is that in our case we do not transfer DNA snippets between homologous locations in chromosomes; we transfer whole “genes,” which are represented by elements.

Elementarizations I: C.o.p.y.c.a.t.

In part 1 we identified two top-level (non-atomic) elements of Copycat: a restricted form of generalized evolution, and concrete instances of positional idealization.

Since the first element, covering evolutionary aspects such as randomness, population and a particular memory dynamics, is pretty clear, and a whole range of possible ways to implement it are available, any attempt at improving the Copycat approach has to target the static, strongly idealistic characteristics of the structure that is called “Slipnet” by the FARG. The Slipnet has to be enabled for structural changes and autonomous adaptation of its parameters. This could be accomplished in many ways, e.g. by representing the items in the Slipnet as primitive artificial genes. Yet, we will take a different road here, since the SOM paradigm already provides the means to achieve idealizations.

At that point we have to elementarize Copycat’s Slipnet in a way that renders it compatible with the SOM principles. Hofstadter emphasizes the following properties of the Slipnet and the items contained therein (pp.212).

  • (1) Conceptual depth allows for a dynamic and continuous scaling of “abstractness” and resistance against “slipping” to another concept;
  • (2) Nodes and links between nodes both represent active abstract properties;
  • (3) Nodes acquire, spread and lose activation, which has a switch-on threshold < 1;
  • (4) The length of links represents conceptual proximity or degree of association between the nodes.

As a whole, and viewed from the network perspective, the Slipnet behaves much like a spring system, or a network built from rubber bands, where the springs or the rubber bands are regulated in their strength. Note that our concept of SomFluid also exhibits the feature of local regulation of the bonds between nodes, a property that is not present in the idealized standard SOM paradigm.

Yet, the most interesting properties in the list above are (1) and (2), while (3) and (4) are known in the classic SOM paradigm as well. The first item is great because it represents an elegant way of creating the possibility for measurability that goes far beyond the nominal scale. As a consequence, “abstractness” ceases to be a nominal all-or-none property, as it is in hierarchies of abstraction. Such hierarchies can now be recognized as mere projections or selections, both introducing a severe limitation of expressibility. The conceptual depth opens a new space.

The second item is also very interesting since it blurs the distinction between items and their relations to some extent. That distinction is also a consequence of relying too readily on the nominal scale of description. It introduces a certain moment of self-reference, though this is not fully developed in the Slipnet. Nevertheless, a result of this move is that concepts can’t be thought without their embedding into a neighborhood of other concepts. Hofstadter clearly introduces a non-positivistic and non-idealistic notion here, as it establishes a non-totalizing meta-concept of wholeness.

Yet, the blurring between “concepts” and “relations” could be and must be driven far beyond the level Hofstadter achieved, if the Slipnet is to become extensible. Namely, all the parts and processes of the Slipnet need to follow the paradigm of probabilization, since this offers the only way to evade the demons of cybernetic idealism and apriori control. Hofstadter himself relies much on probabilization concerning the other two architectural parts of Copycat. It’s beyond me why he didn’t apply it to the Slipnet too.

Taken together, we may derive (or: impose) the following important elements for an abstract description of the Slipnet.

  • (1) Smooth scaling of abstractness (“conceptual depth”);
  • (2) Items and links of a network of sub-conceptual abstract properties are instances of the same category of “abstract property”;
  • (3) Activation of abstract properties represents a non-linear flow of energy;
  • (4) The distance between abstract properties represents their conceptual proximity.

A note should be added regarding the last (fourth) point. In Copycat, this proximity is a static number. In Hofstadter’s framework, it does not express something like similarity, since the abstract properties are not conceived as compounds. That is, the abstract properties are themselves on the nominal level. And indeed, it might appear rather difficult to conceive of concepts such as “right of”, “left of”, or “group” as compounds. Yet, I think that it is well possible by referring to mathematical group theory, the theory of algebra and the framework of mathematical categories. All of those may be subsumed under the same operationalization: symmetry operations. Of course, there are different ways to conceive of symmetries and to implement the respective operationalizations. We will discuss this issue in a forthcoming essay that is part of the series “The Formal and the Creative”.

The next step is now to distill the elements of the SOM paradigm in a way that enables a common differential for the SOM and for Copycat.

Elementarizations II: S.O.M.

The self-organizing map is a structure that associates comparable items—usually records of values that represent observations—according to their similarity. Hence, it makes two strong and important assumptions.

  • (1) The basic assumption of the SOM paradigm is that items can be rendered comparable;
  • (2) The items are conceived as tokens that are created by repeated measurement;

The first assumption means that the structure of the items can be described (i) apriori to their comparison and (ii) independently from the final result of the SOM process. Of course, this assumption is not unique to SOMs; any algorithmic approach to the treatment of data is committed to it. The particular status of the SOM is given by the fact—and in stark contrast to almost any other method for the treatment of data—that this is the only strong assumption. All other parameters can be handled in a dynamic manner. In other words, there is no particular zone of the internal parametrization of a SOM that would be inaccessible apriori. Compare this with ANN or statistical methods, and you feel the difference… Usually, methods are rather opaque with respect to their internal parameters. For instance, the similarity functional is usually not accessible, which renders all these nice-looking, so-called analytic methods into some kind of subjective gambling. In PCA and its relatives, for instance, the similarity is buried in the covariance matrix, which in turn is only defined within the assumption of normality of correlations. Unless a rank correlation is used, this assumption is extended even to the data itself. In both cases it is impossible to introduce a different notion of similarity. Moreover, and also as a consequence of that, it is impossible to investigate the particular dependency of the results proposed by the method on the structural properties and (opaque) assumptions. In contrast to such unfavorable epistemo-mythical practices, the particular transparency of the SOM paradigm allows for critical structural learning of the SOM instances. “Critical” here means that the influence of internal parameters of the method onto the results or conclusions can be investigated, changed, and accordingly adapted.

The second assumption is implied by the SOM’s purpose to be a learning mechanism. It simply needs some observations as results of the same type of measurement. The number of observations (the number of repeats) has to exceed a certain lower threshold, which, depending on the data and the purpose, is at least 8; typically, however, (much) more than 100 observations of the same kind are needed. Any result will be within the space delimited by the assignates (properties), and thus any result is a possibility (if we take just the SOM itself).

The particular accomplishment of a SOM process is the transition from the extensional to the intensional description, i.e. the SOM may be used as a tool to perform the step from tokens to types.

From this we may derive the following elements of the SOM:1

  • (1) a multitude of items that can be described within a common structure, though not necessarily an identical one;
  • (2) a dense network where the links between nodes are probabilistic relations;
  • (3) a bottom-up mechanism which results in the transition from an extensional to an intensional level of description;

As a consequence of this structure the SOM process avoids the necessity to compare all items (N) to all other items (N-1). This property, together with the probabilistic neighborhoods, establishes the main difference from other clustering procedures.
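
For readers who prefer code to prose, here is a minimal sketch of the classic SOM update that produces exactly this transition from extensions to intensions. The lattice size, learning rate, linear decay and Gaussian neighborhood are illustrative choices of ours, not canonical values from any particular paper.

```python
import numpy as np

def train_som(data, rows=20, cols=20, epochs=10, lr0=0.5, sigma0=None, seed=0):
    """Minimal classic SOM: associates observations by similarity on a 2D lattice."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    sigma0 = sigma0 or max(rows, cols) / 2.0
    weights = rng.random((rows, cols, dim))                    # node profiles
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)       # node coordinates
    t_max = epochs * n
    order = rng.permutation(t_max) % n                         # repeated, shuffled presentation
    for t, i in enumerate(order):
        x = data[i]
        frac = t / t_max
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 1e-3
        # best-matching unit: the node whose profile is most similar to x
        bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), (rows, cols))
        # Gaussian neighborhood: nodes near the BMU are dragged along, which is
        # what produces the topology-preserving ordering on the map
        d2 = ((grid - np.array(bmu)) ** 2).sum(-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))[..., None]
        weights += lr * h * (x - weights)
    return weights
```

For instance, `train_som(np.random.default_rng(1).random((5000, 3)))` yields a small (non-toroidal) version of the color map that is discussed later in this essay.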

It is quite important to understand that the SOM mechanism as such is not a modeling procedure. Several extensions have to be added and properly integrated, such as

  • – operationalization of the target into a target variable;
  • – validation by separate samples;
  • – feature selection, preferably by an instance of  a generalized evolutionary process (though not by a genetic algorithm);
  • – detecting strong functional and/or non-linear coupling between variables;
  • – description of the dependency of the results from internal parameters by means of data experiments.

We already described the generalized architecture of modeling as well as the elements of the generalized model in previous chapters.

Yet, as we explained in part 1 of this essay, analogy making is conceptually incompatible with any kind of modeling, as long as the target of the model points to some external entity. Thus, we have to choose a non-modeling instance of a SOM as the starting point. However, clustering is also an instance of those processes that provide the transition from extensions to intensions, whether this clustering is embedded into full modeling or not. In other words, neither the classic SOM nor the modeling SOM is suitable as a candidate for a merger with Copycat.

SOM-based Abstraction

Fortunately, there is already a proposal, and even a well-known one, that indeed may be taken as such a candidate: the two-layer SOM (TL-SOM), as it has been demonstrated as an essential part of the so-called WebSom [1,2].

Actually, the description as being “two-layered” is a very minimalistic, if not inappropriate, description of what is going on in the WebSom. We already discussed many aspects of its architecture here and here.

Concerning our interests here, the multi-layered arrangement itself is not a significant feature. Any system doing complicated things needs a functional compartmentalization; we have met a multi-part, multi-compartment and multi-layered structure in the case of Copycat too. Apart from that, the SOM mechanism itself remains perfectly identical across the layers.

The really interesting features of the approach realized in the TL-SOM are

  • – the preparation of the observations into probabilistic contexts;
  • – the utilization of the primary SOM as a measurement device (the actual trick).

The domain of application of the TL-SOM is the comparison and classification of texts. Texts belong to unstructured data, and the comparison of texts is exposed to the same problematics as the making of analogies: there is no apriori structure that could serve as a basis for modeling. Also, like the analogies investigated by the FARG, text is a locational phenomenon, i.e. it takes place in a space.

Let us briefly recapitulate the dynamics in a TL-SOM. In order to create a TL-SOM, the text is first dissolved into overlapping, probabilistic contexts. Note that the locational arrangement is captured by these random contexts. No explicit apriori rules are necessary to separate patterns. The resulting collection of contexts then gets “somified”. Each node then contains similar random contexts that have been derived from various positions in different texts. Now the decisive step is taken, which consists in turning the perspective by “90 degrees”: we can use the SOM as the basis for creating a histogram for each of the texts. The nodes are interpreted as properties of the texts, i.e. each node represents a bin of the histogram. The value of an individual bin measures how often the respective text is represented by random contexts belonging to that node. The secondary SOM then creates a clustering across these histograms, which represent the texts in an abstract manner.

This way the primary lattice of the TL-SOM is used to impose a structure on the unstructured entity “text.”
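
The following sketch condenses this two-layered arrangement. It assumes, purely for simplicity, that the “random contexts” are plain word-count vectors over small, randomly placed windows; the WebSom papers use a more elaborate encoding, so the function names and details here are illustrative only.

```python
import numpy as np

def random_contexts(tokens, vocab, window=3, n_samples=200, seed=0):
    """Dissolve one text into overlapping, probabilistic contexts (word-count windows)."""
    rng = np.random.default_rng(seed)
    index = {w: i for i, w in enumerate(vocab)}
    contexts = np.zeros((n_samples, len(vocab)))
    for k in range(n_samples):
        start = int(rng.integers(0, max(1, len(tokens) - window)))
        for w in tokens[start:start + window]:
            if w in index:
                contexts[k, index[w]] += 1
    return contexts

def histogram_over_map(contexts, weights):
    """Turn the perspective by 90 degrees: the primary SOM becomes a measurement
    device, one histogram bin per node, counting how often the text's random
    contexts select that node as best match."""
    flat = weights.reshape(-1, weights.shape[-1])
    hist = np.zeros(len(flat))
    for c in contexts:
        hist[int(np.argmin(((flat - c) ** 2).sum(-1)))] += 1
    return hist / len(contexts)

# Pipeline: (1) pool the random contexts of all texts and train the primary SOM on
# them (e.g. with train_som from the sketch above); (2) re-describe every text as a
# histogram across the primary lattice; (3) train the secondary SOM on these histograms.
```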

Figure 1: A schematic representation of a two-layered SOM with built-in self-referential abstraction. The input for the secondary SOM (foreground) is derived as a collection of histograms that are defined as a density across the nodes of the primary SOM (background). The input for the primary SOM are random contexts.

To put it clearly: the secondary SOM builds an intensional description of entities that results from the interaction of a SOM with a probabilistic description of the empirical observations. Quite obviously, intensions built in this way upon other intensions are not only quite abstract; the mechanism could even be stacked. It could be described as “high-level perception” with as much justification as Hofstadter uses the term for Copycat. The TL-SOM turns representational intensions into abstract, structural ones.

The two aspects from above thus interact; they are elements of the TL-SOM. Despite the fact that there are still transitions from extensions to intensions, we also can see that the targeted units of the analysis, the texts, get probabilistically distributed across an area, the lattice of the primary SOM. Since the SOM maps the high-dimensional input data onto its lattice in a way that preserves their topological properties, it is easy to recognize that the TL-SOM creates conceptual halos as an intermediate.

So let us summarize the possibilities provided by the SOM.

  • (1) SOMs are able to create non-empiric, or better: de-empirified idealizations of intensions that are based on “quasi-empiric” input data;
  • (2) TL-SOMs can be used to create conceptual halos.

In the next section we will focus on this primarily spatial effect.

The Extended SOM

Kohonen and co-workers [1,2] proposed to build histograms that reflect the probability density of a text across the SOM. Those histograms represent the original units (e.g. texts) in a quite static manner, using a kind of summary statistics.

Yet, texts are definitely not a static phenomenon. At first sight there is at least a series, while more appropriately texts are even described as dynamic networks with an associative power of their own [3]. Returning to the SOM, we see that in addition to the densities scattered across the nodes of the SOM, we can also observe a sequence of invoked nodes, according to the sequence of random contexts in the text (or the serial observations).

The not so difficult question then is: how to deal with that sequence? Obviously, it is again best conceived as a random process (though one with a strong structure), and random processes are best described using Markov models, either as hidden Markov models (HMM) or as transition models. Note that the Markov model is not a model about the raw observational data; it describes the sequence of activation events of SOM nodes.

The Markov model can be used as a further means to produce conceptual halos in the sequence domain. The differential properties of a particular sequence as compared to the Markov model then could be used as further properties to describe the observational sequence.
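
A minimal sketch of this idea: the sequence of best-matching nodes is summarized by a first-order transition model, and the “differential property” of a particular sequence can then be expressed as its surprisal under that model. This is the simple transitional variant, not the hidden-Markov one, and the function names are our own.

```python
import numpy as np

def bmu_sequence(contexts, weights):
    """Sequence of invoked nodes (flattened indices), in the order given by the text."""
    flat = weights.reshape(-1, weights.shape[-1])
    return [int(np.argmin(((flat - c) ** 2).sum(-1))) for c in contexts]

def transition_model(sequences, n_nodes, alpha=1.0):
    """First-order Markov model over node activations, with additive smoothing."""
    counts = np.full((n_nodes, n_nodes), alpha)
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def surprisal(seq, P):
    """How strongly one particular sequence deviates from the reference model;
    usable as a further (sequence-domain) property of the observed text."""
    return float(-sum(np.log(P[a, b]) for a, b in zip(seq[:-1], seq[1:])))
```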

(The full version of the extended SOM comprises targeted modeling as a further level. Yet, this targeted modeling does not refer to raw data. Instead, its input is provided completely by the primary SOM, which is based on probabilistic contexts, while the target of such modeling is just internal consistency of a context-dependent degree.)

The Transfer

Just to avoid misunderstanding: it does not make sense to try to represent Copycat completely by a SOM-based system. The particular dynamics and phenomenological behavior depend a lot on Copycat’s tripartite morphology, as represented by the Coderack (agents), the Workspace and the Slipnet. We are “just” in search of a possibility to remove the deep idealism from the Slipnet in order to enable it for structural learning.

Basically, there are two possible routes. Either we re-interpret the extended SOM in a way that allows us to represent the elements of the Slipnet as properties of the SOM, or we try to replace all the items in the Slipnet by SOM lattices.

So, let us take a look at which structures we have (Copycat) and what we could have (SOM) on the two sides.

Table 1: Comparing elements from Copycat’s Slipnet to the (possible) mechanisms in a SOM-based system.

| | Copycat | extended SOM |
|---|---|---|
| 1. Smoothly scaled abstraction | conceptual depth (dynamic parameter) | distance of abstract intensions in an integrated lattice of an n-layered SOM |
| 2. Links as concepts | structure by implementation | reflecting conceptual proximity as an assignate property for a higher level |
| 3. Activation featuring non-linear switching behavior | structure by implementation | x |
| 4. Conceptual proximity | link length (dynamic parameter) | distance in map (dynamic parameter) |
| 5. Kind of concepts | locational, positional | symmetries, any |

From this comparison it is clear that the single most challenging part of this route is the possibility for the emergence of abstract intensions in the SOM based on empirical data. From the perspective of the SOM, relations between observational items such as “left-most,” “group” or “right of”, and even such as “sameness group” or “predecessor group”, are just probabilities of a pattern. Such patterns are identified by functions or dynamic combinations thereof. Combinations of topological primitives remain mappable by analytic functions. Such concepts we could call “primitive concepts”, and we can map these to the process of data transformation and the set of assignates as potential properties.2 It is then the job of the SOM to assign a relevancy to the assignates.
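
As an illustration of what mapping such “primitive concepts” onto assignates could mean in Copycat’s letter domain, consider the following toy predicates. The naming and selection are our own, not Copycat’s internals; a SOM would subsequently assign a relevancy to each of these properties.

```python
def assignates(s):
    """Computable stand-ins for positional 'primitive concepts' of a letter string."""
    return {
        "leftmost": s[0],
        "rightmost": s[-1],
        "length": len(s),
        "is_sameness_group": len(set(s)) == 1,
        "is_successor_group": all(ord(b) - ord(a) == 1 for a, b in zip(s, s[1:])),
    }

# assignates("abc") -> {'leftmost': 'a', 'rightmost': 'c', 'length': 3,
#                       'is_sameness_group': False, 'is_successor_group': True}
```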

Yet, Copycat’s Slipnet also comprises rather abstract concepts such as “opposite”. Furthermore, the most abstract concepts often act as links between more primitive concepts, or, in Hofstadter’s terms, conceptual items of lower “conceptual depth”.

My feeling here is that it is a fundamental mistake to implement concepts like “opposite” directly. What is opposite of something else is a deeply semantic concept in itself, and thus strongly dependent on the domain. I think that most of the interesting concepts, i.e. the most abstract ones, are domain-specific. Concepts like “opposite” could be considered as something “simple” only in the case of geometric or spatial domains.

Yet, that’s not a weakness. We should use this as a design feature. Take the rather simple case shown in the next figure as an example. Here we simply mapped triplets of uniformly distributed random values onto a SOM. The three values can be readily interpreted as the parts of an RGB value, which renders the interpretation more intuitive. The special thing here is that the map has been a really large one: we defined approximately 700’000 nodes and fed approx. 6 million observations into it.

Figure 2: A SOM-based color map showing emergence of abstract features. Note that the topology of the map is a borderless toroid: Left and right borders touch each other (distance=0), and the same applies to the upper and lower borders.

We can observe several interesting things. The SOM didn’t come up with just any arbitrary sorting of the colors. Instead, a very particular one emerged.

First, the map is not perfectly homogeneous anymore. Very large maps tend to develop “anisotropies”, symmetry breaks if you like, simply due to the fact that the signal horizon becomes an important issue. This should not be regarded as a deficiency though. Symmetry breaks are essential for the possibility of the emergence of symbols. Second, we can see that two “color models” emerged, the RGB model around the dark spot in the lower left, and the YMC model around the bright spot in the upper right. Third, the distance between the bright, almost white spot and the dark, almost black one is maximized.

In other words, and not quite surprisingly, the conceptual distance is reflected as a geometrical distance in the SOM. As in the case of the TL-SOM, we now could use the SOM as a measurement device that transforms an unknown structure into an internal property, simply by using the locational property in the SOM as an assignate for a secondary SOM. In this way we not only can represent “opposite”, but we even have a model procedure for “generalized oppositeness” at our disposal.
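
A minimal sketch of this “observing the SOM”: the position of the best-matching node serves as an assignate, and distance on the borderless (toroidal) lattice operationalizes a generalized “oppositeness”. Again, the helper names are illustrative, not part of any existing library.

```python
import numpy as np

def bmu_coords(x, weights):
    """Position of the best-matching node for an observation."""
    rows, cols, _ = weights.shape
    return np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), (rows, cols))

def toroidal_distance(p, q, rows, cols):
    """Distance on a borderless (toroidal) lattice, as used for the color map above."""
    dr = min(abs(p[0] - q[0]), rows - abs(p[0] - q[0]))
    dc = min(abs(p[1] - q[1]), cols - abs(p[1] - q[1]))
    return (dr ** 2 + dc ** 2) ** 0.5

# "Generalized oppositeness" of two observations a and b, measured through the map
# rather than through their raw values, ready to serve as input for a secondary SOM:
# opp = toroidal_distance(bmu_coords(a, W), bmu_coords(b, W), rows, cols)
```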

It is crucial to understand this step of “observing the SOM”, thereby conceiving the SOM as a filter, or more precisely as a measurement device. Of course, at this point it becomes clear that a large variety of such transposing and internal-virtual measurement devices may be thought of. Methodologically, this opens an orthogonal dimension to the representation of data, strongly resembling the concept of orthoregulation.

The map shown above even allows for the creation of completely different color models, for instance one around yellow and another one around magenta. Our color psychology is strongly determined by the sun’s radiated spectrum and hence it reflects a particular Lebenswelt; yet, there is no necessity about it. Some insects like bees are able to perceive ultraviolet radiation, i.e. their colors may have 4 components, yielding a completely different color psychology, while the capability to distinguish colors remains perfectly intact.3

“Oppositeness” is just a “simple” example for an abstract concept and its operationalization using a SOM. We already mentioned the “serial” coherence of texts (and thus of general arguments) that can be operationalized as a sort of virtual movement across a SOM of a particular level of integration.

It is crucial to understand that there is no other model besides the SOM that combines the ability to learn from empirical data and the possibility for emergent abstraction.

There is yet another lesson that we can take home from the simple example above. Well, the example does not remain that simple. High-level abstraction, items of considerable conceptual depth, so to speak, requires rather short assignate vectors. In the process of learning qua abstraction it appears to be essential that the masses of possible assignates derived from or imposed by measurement of raw data are reduced. On the one hand, empiric contexts from very different domains should be abstracted, i.e. quite literally “reduced”, into the same perspective. On the other hand, any given empiric context should be abstracted into (much) more than just one abstract perspective. The consequence of that is that we need a lot of SOMs, all separated “sufficiently” from each other. In other words, we need a dynamic population of Self-organizing Maps in order to represent the capability of abstraction in real life. “Dynamic population” here means that there are developmental mechanisms that result in a proliferation, almost a breeding, of new SOM instances in a seamless manner. Of course, the SOM instances themselves have to be able to grow and to differentiate, as we have described it here and here.

In a population of SOMs, the conceptual depth of a concept may be represented by the effort needed to arrive at a particular abstract “intension.” This not only comprises the ordinary SOM lattices, but also processes like Markov models, simulations, idealizations qua SOMs, targeted modeling, transition into symbolic space, synchronous or potential activations of other SOM compartments, etc. This effort may finally be represented as a “number.”

Conclusions

The structure of a multi-layered system of Self-organizing Maps, as it has been proposed by Kohonen and co-workers, is a powerful model to represent emerging abstraction in response to empiric impressions. The Copycat model demonstrates how abstraction could be brought back to the level of application in order to become able to make analogies and to deal with “first-time exposures”.

Here we tried to outline a potential path to bring these models together. We regard this combination in the way we proposed it (or a quite similar one) as crucial for any advance in the field of machine-based episteme at large, but also for the rather confined area of machine learning. Attempts like that of Blank [4] appear to suffer seriously from categorical mis-attributions. Analogical thinking does not take place on the level of single neurons.

We didn’t discuss alternative models here (so far; a small extension is planned). The main reasons are, first, that it would be an almost endless job, and second, that Hofstadter already did it, and as a result of his investigation he dismissed all the alternative approaches (from authors like Gentner, Holyoak, Thagard). For an overview of recent models of creativity, analogical thinking, or problem solving, Runco [5] provides a good starting point. Of course, many authors point in roughly the same direction as we did here, but mostly the proposals are circular, not helpful because the problematic is just replaced by another one (e.g. the infamous and completely unusable “divergent thinking”), or can’t be implemented for other reasons. Holyoak and Thagard [6], for instance, claim that a “parallel satisfaction of the constraints of similarity, structure and purpose” is key in analogical thinking. Given our analysis, such statements are nothing but a great mess, mixing modeling, theory, vagueness and fluidity.

For instance, in cognitive psychology and in the field of artificial intelligence as well, the hypothesis of Structural Mapping (STM) finds a lot of supporters [7]. Hofstadter discusses similar approaches in his book. The STM hypothesis is highly implausible and obviously a left-over of the symbolic approach to Artificial Intelligence, just transposed into more structural regions. The STM hypothesis has not only to be implemented as a whole, it also has to be implemented for each domain specifically. There is no emergence of that capability.

The combination of the extended SOM—interpreted as a dynamic population of growing SOM instances—with the Copycat mechanism indeed appears as a self-sustaining approach into proliferating abstraction and—quite significantly—back from it into application. It will be able to make analogies in any field already at its first encounter with it, even regarding itself, since both the extended SOM and the Copycat comprise several mechanisms that may count as precursors of high-level reflexivity.

After this proposal little remains to be said on the technical level. One of those issues which remain to be discussed concerns the conditions for the possibility of binding internal processes to external references. Here our favorite candidate principle is multi-modality, that is, the joint and inextricable “processing” (in the sense of “getting affected”) of words, images and physical signals alike. In other words, I feel that we have come close to the fulfillment of the ariadnic question of this blog: “Where is the Limit?” …even in its multi-faceted aspects.

A lot of implementation work has now to be performed, eventually commented on by some philosophical musings about “cognition”, or, more appropriately, the “epistemic condition.” I just would like to invite you to stay tuned for the software publications to come (hopefully in the near future).

Notes

1. see also the other chapters about the SOM, SOM-based modeling, and generalized modeling.

2. It is somehow interesting that in the brain of many animals we can find very small groups of neurons, if not even single neurons, that respond to primitive features such as verticality of lines, or the direction of the movement of objects in the visual field.

3. Ludwig Wittgenstein insisted all the time that we can’t know anything about the “inner” representation of “concepts.” It is thus free of any sense and meaning to claim knowledge about the inner state of oneself as well as of that of others. Wilhelm Vossenkuhl introduces and explains the Wittgensteinian “grammatical” solipsism carefully and in a very nice way [8]. The only thing we can know about inner states is that we use certain labels for them, and the only meaning of emotions is that we do report them in certain ways. In other terms, the only thing that is important is the ability to distinguish one’s feelings. This, however, is easy to accomplish for SOM-based systems, as we have been demonstrating here and elsewhere in this collection of essays.

4. Don’t miss Timo Honkela’s webpage where one can find a lot of gems related to SOMs! The only puzzling issue about all the work done in Helsinki is that the people there constantly and pervasively misunderstand the SOM per se as a modeling tool. Despite their ingenuity they completely neglect the issues of data transformation, feature selection, validation and data experimentation, which all have to be integrated to achieve a model (see our discussion here), for a recent example see here, or the cited papers about the Websom project.

  • [1] Timo Honkela, Samuel Kaski, Krista Lagus, Teuvo Kohonen (1997). WEBSOM – Self-Organizing Maps of Document Collections. Neurocomputing, 21: 101-117.4
  • [2] Krista Lagus, Samuel Kaski, Teuvo Kohonen (2004). Mining massive document collections by the WEBSOM method. Information Sciences, 163(1-3): 135-156. DOI: 10.1016/j.ins.2003.03.017
  • [3] Klaus Wassermann (2010). Nodes, Streams and Symbionts: Working with the Associativity of Virtual Textures. The 6th European Meeting of the Society for Literature, Science, and the Arts, Riga, 15-19 June, 2010. available online.
  • [4] Douglas S. Blank. Implicit Analogy-Making: A Connectionist Exploration. Indiana University Computer Science Department. available online.
  • [5] Mark A. Runco. Creativity: Research, Development, and Practice. Elsevier 2007.
  • [6] Keith J. Holyoak and Paul Thagard. Mental Leaps: Analogy in Creative Thought. MIT Press, Cambridge 1995.
  • [7] John F. Sowa, Arun K. Majumdar (2003). Analogical Reasoning. In: A. Aldo, W. Lex, & B. Ganter (eds.), “Conceptual Structures for Knowledge Creation and Communication,” Proc. Intl. Conf. on Conceptual Structures, Dresden, Germany, July 2003. LNAI 2746, Springer, New York 2003, pp. 16-36. available online.
  • [8] Wilhelm Vossenkuhl. Solipsismus und Sprachkritik. Beiträge zu Wittgenstein. Parerga, Berlin 2009.


Analogical Thinking, revisited.

March 19, 2012 § Leave a comment

What is the New York of California?

(I/II)

Or even, what is the New York of New York? Almost everybody will come up with the same answer, despite the fact that the question is not only ill-defined: both the question and its answer can be described only after the final appearance of the answer. In other words, it is not possible to provide, apriori to its completion, any proposal about the relevance of those properties that aposteriori are easily tagged as relevant for the description of both the question and the answer. Both the question and the solution do not “exist” in the way that is pretended by their form before we have finished making sense of it. There is a wealth of philosophical issues around this phenomenon, which we all have to bypass here. Here we will focus just on the possibility for mechanisms that could be invoked in order to build a model that is capable of behaving phenomenologically “as if“.

The credit for rendering such questions and the associated problematics salient in the area of computer models of thinking belongs to Douglas Hofstadter and his Fluid Analogies Research Group (FARG). In his book “Fluid Concepts and Creative Analogies”, which we already mentioned here, he proposes a particular model of which he claims that it is a proper model for analogical thinking. In constructing this model, which took more than 10 years of research, he did not try to stick (to get stuck?) to the neuronal level. Accordingly, one can’t describe the performance of a tennis player at the molecular level, he says. Remarkably, he also keeps the so-called cognitive sciences and their laboratory wisdom at a distance. Instead, his starting point is everyday language, and presumably a good deal of introspection as well. He sees his model located at an intermediate level between the neurons and consciousness (quite a large field, though).

His overarching claim is as simple as it is distant from the mainstream of AI and cognitive science. (Note that Hofstadter does not formulate it as “analogical reasoning.”)

Thinking is largely equivalent with making analogies.

Hofstadter is not interested in producing just another model for analogy making. There are indeed quite a lot of such models, which he discusses in great detail. And he refutes them all; he shows that they are all ill-posed, since none of them starts with perception. Without exception they all assume that the “knowledge” is already in the computer, and based on this assumption some computer program is established. Of course, such approaches are nonsense; the resulting problem is euphemistically called the “knowledge acquisition bottleneck” by people working in the field of AI / machine learning. Yet, knowledge is nothing that could be externalized and then acquired subsequently by some other party; it can’t be found “in” the world, and of course it can’t be separated as something that “exists” beside the processing mechanisms of the brain, making the whole thing “smart”. As already mentioned, such ideas are utter nonsense.

Hofstadter’s basic strategy is different. He proposes to create a software system that is capable of “concept slipping” as an emergent phenomenon, deeply based on perceptual mechanisms. He even coined the term “high-level perception.”

That is, the […] project is not about simulating analogy-making per se, but about simulating the very crux of human cognition: fluid concepts. (p.208)

This essay will investigate his model. We will find that despite its appeal it is nevertheless seriously unrealistic, even according to Hofstadter’s own standards. Yet, despite its particular weaknesses it also demonstrates very interesting mechanisms. After extracting the cornerstones of his model we will try to map his insights to the world of self-organizing maps. We also will discuss how to transfer the interesting parts of Hofstadter’s model. Hofstadter himself clearly stated the deficiencies of “connectionist models” of “learning”; yet, my impression is that he was not aware of self-organizing maps at that time. By “connectionism” he obviously referred to artificial neural networks (ANN), and for those we completely agree with his critique.

Before we start I would like to provide some original sources, that is, copies of those parts that are most relevant for this essay. These parts are from chapter 5, chapter 7 and chapter 8 of the aforementioned book. There you will find many more details and lucid examples in Hofstadter’s own words.

Is there an Alternative to Analogies?

In order to find an alternative we have to take a bird’s-eye view. Very coarsely spoken, thinking transforms some input into some output while being affected and transforming itself. In some sense, any transformation of input to output transforms the transforming instance, though to vastly different degrees. A trivial machine just wears out; a trivial computer—that is, any digital machine that fits into the scheme of Turing computing1—can be reset to meet exactly a previous state. As soon as historical contingency is involved, reproducibility vanishes and strictly non-technical entities appear: memory, value, and semantics (among others).

This transformation game applies to analogy making, and it also applies to traditional modeling. Is it possible to apply any kind of modeling to the problematics that is represented by the “transfer game”, for which those little questions posed in the beginning are just an example?

In this context, Hofstadter calls the modeling approach the brute-force approach (p.327, chp.8). The outline of the modeling approach could look like this (p.337); a compressed sketch of the resulting scoring loop follows the list.

  • Step 1: Run down the apriori list of city-characterization criteria and characterize the “source town” A according to each of them.
  • Step 2: Retrieve an apriori list of “target towns” inside target region Y from the data base.
  • Step 3: For each retrieved target town X, run down the a priori list of city-characterization criteria again, calculating X’s numerical degree of match with A for every criterion in the list.
  • Step 4: For each target town X, sum up the points generated in Step 3, possibly using apriori weights, thus allowing some criteria to be counted more heavily than others.
  • Step 5: Locate the target town with the highest overall rating as calculated in Step 4, and propose it as “the A of Y”.
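
To make the shape of this brute-force procedure explicit, here is a compressed sketch of Steps 1 through 5 as a scoring loop. Everything in it—the criteria dictionary, the weights, the toy match function—is a placeholder of ours, which is exactly what the subsequent difficulties are about.

```python
def brute_force_transfer(source_town, target_towns, criteria, weights):
    """Steps 1-5: score every target town X against source town A over a fixed
    apriori list of city-characterization criteria."""
    source_profile = {name: f(source_town) for name, f in criteria.items()}   # Step 1
    best_town, best_score = None, float("-inf")
    for town in target_towns:                                                 # Step 2
        score = sum(weights[name] * degree_of_match(source_profile[name], f(town))
                    for name, f in criteria.items())                          # Steps 3-4
        if score > best_score:
            best_town, best_score = town, score
    return best_town                                                          # Step 5

def degree_of_match(a, b):
    """Toy placeholder; Difficulty 2 below notes that this comparison can itself be
    as hard as the original analogy problem."""
    return 1.0 if a == b else 0.0
```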

Any plausible apriori list of city-characterization criteria would be long, very long indeed. Effectively, it can’t be limited in advance, since any imposed limit would represent a model that would claim to be better suited to decide about the criteria than the model being built. We are crushed by an infinite regress, not just in theory. What we experience here is Wittgenstein’s famous verdict that justifications have to come to an end. Rules are embedded in the form of life (“Lebensform”), and without knowing all about a particular Lebensform, and without taking into consideration everything comprised by such (impossible) knowledge, we can’t start to model at all.

He identifies four characteristic difficulties for the modeling approach with regard to his little “transfer game” that plays around with cities.

  • – Difficulty 1: It is psychologically unrealistic to explicitly consider all the towns one knows in a given region in order to come up with a reasonable answer.
  • – Difficulty 2: Comparison of a target town and a source town according to a specific city-characterization criterion is not a hard-edged mechanical task, but rather, can itself constitute an analogy problem as complex as the original top-level puzzle.
  • – Difficulty 3: There will always be source towns A whose “essence”—that is, set of most salient characteristics—is not captured by a given fixed list of city-characterization criteria.
  • – Difficulty 4: What constitutes a “town in region Y” is not apriori evident.

Hofstadter underpins his point with the following question (p.347).

What possible set of apriori criteria would allow a computer to reply, perfectly self-confidently, that the country of Monaco is “the Atlantic City of France”?

Of course, the “computer” should come up with the answer in a way that is not pre-programmed explicitly.

Obviously, the problematics of making analogies can’t be solved algorithmically. There is not only no such thing as a single “solution”; even the criteria to describe the problem are missing. Thus we can conclude that modeling, even in its non-algorithmic form, is not a viable alternative to analogy making.

The FARG Model

In the following, we investigate the model as proposed by Hofstadter and his group, mainly Melanie Mitchell. The investigation is separated into the following parts:

  • – precis of the model,
  • – its elements,
  • – its extension as proposed by Hofstadter,
  • – the main problems of the model, and finally,
  • – the main superior aspects of the model as compared to connectionist models (from Hofstadter’s perspective, of course).

Precis of the Model

Hofstadter’s conclusion from the problems with the model-based approach, and thus also the starting point for his endeavor, is that the making of an analogy must appear as an emergent phenomenon. Analogy itself can’t be “defined” in terms of criteria, beyond rather opaque statements about “similarity.” The point is that this similarity could be measured only aposteriori, so this concept does not help. The capability for making analogies can’t be programmed explicitly. It would not be the “making” of analogies anymore; it would just be a look-up of dead graphemes (not even symbols!) in a database.

He demonstrates his ideas by means of a small piece of software called “Copycat”. This name derives from the internal processes of the software, as making “almost identical copies” is an important ingredient of it. Yet, it also refers to the problem that appears if you say: “I am doing this, now do the same thing…”

Copycat has three major parts, which he labels as (i) the Slipnet, (ii) the Workspace, (iii) the Coderack.

The Coderack is a rack that serves as a launching site for a population of agents of various kinds. Agents die off and are created in various ways. They may be spawned by other agents, by the Coderack, or by any of the items in the Slipnet—as a top-down specialist bred just to engage in situations represented by the Slipnet item. Any freshly created agent will be first put into the Coderack, regardless of its originator or kind.

Any particular agent behaves as a specialist for recognizing a particular situation or for establishing a particular relation between parts of the input “data,” the initial observation. This recognition requires a model apriori, of course. Since these models are rather abstract as compared to the observational data, Hofstadter calls them “concepts.” After their setup, agents are put into the Coderack, from where they start in random order, but also dependent on their “inner state,” which Hofstadter calls “pressure.”

The Slipnet is a loose “network” of deep and/or abstract concepts. In the case of Copycat these concepts comprise

a, b, c, … , z, letter, successor, predecessor, alphabetic-first, alphabetic-last, alphabetic position, left, right, direction, leftmost, rightmost, middle, string position, group, sameness group, successor group, predecessor group, group length, 1, 2, 3, sameness, and opposite,

In total there are more than 60 of such concepts. These items are linked together, while the length of the link reflects the “distance” between concepts. This distance changes while Copycat is working on a particular task. The change is induced by the agents in response to their “success.” The Slipnet is not really a “network,” since it is neither a logistic network (it doesn’t transport anything) nor an associative network like a SOM. It is also not suitable to conceive of it as a kind of filter in the sense of a spider’s web, or a fisherman’s net. It is thus more appropriate to consider it simply as a non-directed, dynamic graph, where discrete items are linked.
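
To fix ideas, here is a small data-structure sketch of such a non-directed, dynamic graph: nodes carry a conceptual depth and an activation, links carry a length that may change while the system runs. Both the class layout and the spreading rule are our own illustration, not Copycat’s actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    conceptual_depth: float        # resistance against "slipping"
    activation: float = 0.0

@dataclass
class Slipnet:
    nodes: dict = field(default_factory=dict)    # name -> Node
    links: dict = field(default_factory=dict)    # frozenset({a, b}) -> length

    def add_node(self, name, depth):
        self.nodes[name] = Node(name, depth)

    def add_link(self, a, b, length):
        self.links[frozenset((a, b))] = length   # shorter length = conceptually closer

    def spread_activation(self, decay=0.1):
        """One relaxation step: activation leaks along links, attenuated by their
        (dynamic) length; the concrete rule here is illustrative only."""
        incoming = {name: 0.0 for name in self.nodes}
        for pair, length in self.links.items():
            a, b = tuple(pair)
            incoming[a] += self.nodes[b].activation / (1.0 + length)
            incoming[b] += self.nodes[a].activation / (1.0 + length)
        for name, node in self.nodes.items():
            node.activation = (1.0 - decay) * node.activation + decay * incoming[name]
```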

Finally, the third aspect is the Workspace. Hofstadter describes it as a “busy construction site” and likens it to the cytoplasm (p.216). In the Workspace, the agents establish bonds between the atomic items of the observation. As said, each agent knows nothing about the posed problem; it is just capable of performing a mini-aspect of the task. The whole population of agents, however, builds something larger. It looks much like the activity of ants or termites, building some morphological structure in the hive, or producing a macroscopic dynamic effect as a hive population. The Workspace is the location of such intermediate structures of various degrees of stability, meaning that some agents also work to remove a particular structure.

So far we have described the morphology. The particular dynamics unfolding on this morphology is settled between competition and cooperation, with the result of a collective calming down of the activities. The decrease in activity is itself an emergent consequence of the many parallel processes inside Copycat.

A single run of Copycat yields one instance of the result. Yet, a single answer is not the result itself. Rather, as different runs of Copycat yield different singular answers, the result consists of a probability density over the different singular answers. For the letter domain in which Copycat is working the result looks like this:

Figure 1: Probability densities as result of a Copycat run.

The Elements of the FARG Model

Before we proceed, I should emphasize that “element” is used here as we have introduced the term here.

Returning to the FARG model, it is important to understand that a particularly constrained randomness plays a crucial role in its setup. The population of agents does not search through all possibilities all the time. Yet, any existing intermediate result, say a structural hypothesis, serves as a constraint for the future search.

We also find different kinds of memories with different durations, we find dynamic historic constraints, which we also could call contingencies. We have a population of different kinds of agents that cooperate and compete. In some almost obvious way, Copycat’s mechanisms may be conceived as an instance of the generalized evolution that we proposed earlier. Hofstadter himself is not aware that he just proposed a mechanism for generalized evolutionary changes. He calls the process “parallel terraced scan”, thereby unnecessarily sticking to a functional perspective. Yet, we consider generalized evolution as one of the elements of Copycat. It could really be promising to develop Copycat as an alternative to so-called genetic algorithms.2

Despite a certain resemblance to natural evolution, the mechanisms built into Copycat do not comprise an equivalent to what is known from biology as “gene doubling”. Gene doubling and the related mechanism of gene deletion are probably the most important mechanisms in natural evolution. Copycat produces different kinds of agents, but the informational setup of these agents does not change, as it is given by the Slipnet. The equivalent to gene doubling would have to be implemented in the Slipnet. On the other hand, however, it is clear that the items in the Slipnet are too concrete, almost representational. In contrast, genes usually do not represent a particular function on the macro-level (which is one of the main structural faults of so-called genetic algorithms). So, we conclude that Copycat contains a restricted version of generalized evolution. Additionally, we see a structural resemblance to the theories of Edelman and his neuronal Darwinism, which actually is a nice insight.

Conceiving large parts of the mechanism of Copycat as (restricted) generalized evolution covers both the Coderack as well as the Workspace, but not the Slipnet.

The Slipnet acts as sort of a “Platonic Heaven” (Hofstadter’s term). It contains various kinds of abstract terms, where “abstract” simply means “not directly observable.” It is hence not comparable to those abstractions that can be used to build tree-like hierarchies. Think of the series “fluffy”-dog-mammal-animal-living entity. Significantly, the abstract terms in Copycat’s Slipnet also comprise concepts about relations, such as “right,” “direction,” “group,” or “leftmost.” Relations, however, are nothing else than even more abstract symmetries, that is, transformational models, which may even form a mathematical group. Quite naturally, we could consider the items in the Slipnet as a mathematical category (of categories). Again, Hofstadter and Mitchell do not refer in any way to such structures, quite unfortunately so.

The Slipnet’s items may well be conceived as instances of symmetry relations. Hofstadter treats them as idealizations of positional relations. Any of these items acts as a structural property. This is a huge advance as compared to other models of analogy.

To summarize, we find two main elements in Copycat.

  • (1) restricted generalized evolution, and
  • (2) concrete instances of positional idealization.

Actually, these elements are top-level elements that must be conceived as compounds. In part 2 we will check out the elements of the Slipnet in detail, while the evolutionary aspects we already discussed in a previous chapter. Yet, this level of abstraction is necessary to render Copycat’s principles conceptually more mobile. In some way, we have to apply the principles of Copycat to the attempt to understand it.

The Copycat, released to the wild

Any generalization of Copycat has to remove the implicit constraints of its elements. In more detail, this would include the following changes:

  • (1) The representation of the items in the Slipnet could be changed into compounds, and these compounds should be expressed as “gene-like” entities.
  • (2) Introducing a mechanism to extend the Slipnet. This could be achieved through gene doubling in response to external pressures; yet, these pressures are not to be conceived as “external” to the whole system, just external to the Copycat. The pressures could be issued by a SOM. Alternatively, a SOM environment might also deliver the idealizations themselves. In either case, the resulting behavior of the Copycat has to be shaped by selection, either through internal mechanisms, or through environmentally induced forces (changes in the fitness landscape).
  • (3) The focus to positional idealization would have to be removed by introducing the more abstract notion of “symmetries”, i.e. mathematical groups or categories. This would render positional idealization just into a possible instance of potential idealization.

The resulting improvement of these changes would be dramatic. Not only would it be much easier to establish a Slipnet for any kind of domain, it would also allow the system (a CopyTiger?) to evolve new traits and capabilities, and to parametrize them autonomously. But these changes also require a change in the architectural (and mental) setup.

From Copycat to Metacat

Hofstadter himself tried to describe possible improvements of Copycat. A significant part of these suggestions for improvement is represented by the capability for self-monitoring and proliferating abstraction, hence he calls it “Metacat”.

The list of improvements comprises mainly the following five points (pp.315, chp.7).

  • (1) Self-monitoring of pressures, actions, and crucial changes as an explicit registering into parts of the Workspace.
  • (2) Disassembling of a given solution into the path of required actions.
  • (3) Hofstadter writes that “Metacat should store a trace of its solution of a problem in an episodic memory.”
  • (4) A clear “meta-analogical” sense as an ability to see analogies between analogies, that is a multi-leveled type of self-reflectiveness.
  • (5) The ability to create and to enjoy the creation of new puzzles. In this context he writes “Indeed, I feel that responsiveness to beauty and its close cousin, simplicity, plays a central role in high-level cognition.”

I am not really convinced of these suggestions, at least not if they were implemented in the way suggested by Hofstadter “between the lines”. They look much more like a dream than a reasonable list of improvements, perhaps except for the first one. The topic of self-monitoring has been explored by James Marshall in his dissertation [1], but still his version of “Metacat” was not able to learn. This self-monitoring should not be conceived as a kind of Cartesian theater [2], perhaps even populated with homunculi on both sides of the stage.

The second point is completely incompatible with the architecture of Copycat, and notably Hofstadter does not provide even the tiniest comment on it. The third point violates the concept of “memory” as a re-constructive device. Hofstadter himself says elsewhere, while discussing alternative models of analogy, that the brain is not a database, which is quite correct. “Memory” is not a storage device. Yet, the consequence is that analogy making can’t be separated from memory itself (and vice versa).

The fourth suggestion, then, would require further platonic heavens, in case of Copycat/Metacat created by a programmer. This is highly implausible, and since it is a consequence of the architecture, the architecture of Copycat as such is not suitable to address real-world entities.

Finally, the fifth suggestion displays a certain naivety regarding evolutionary contexts, the philosophical aspects of reasoning that have been known since Immanuel Kant, and the particular setup of human cognition, where emotions and propositional reasoning appear as deeply entangled issues.

The main Problem(s) of the FARG model

We already mentioned Copycat’s main problems, which are (i) the “Platonic heaven”, and (ii) the lack of the capability to learn as a kind of structural self-transformation.

Both problems are closely related. Actually, there is somehow only one single problem, and that’s the issue that Hofstadter got trapped by idealism. A Platonic heaven that is filled by the designer with an x-cat (or a Copy-x) is hard to comprehend. Even for the really small letter domain there are more than 60 of such idealistic, top-down and externally imposed concepts. These concepts have to be linked and balanced in just the right way, otherwise the Copycat will not behave in any interesting way. Furthermore, the Slipnet is a structurally static entity. There are some parameters that change during its activity, but Copycat does not add new items to its Slipnet.

For these reasons it remains completely opaque how Mitchell and Hofstadter arrived at that particular instance of the Slipnet for the letter domain, and thus it also remains completely unclear how the “computer” itself could build or achieve something like a Slipnet. Although Linhares [3] was able to implement an analogous FARG model for the domain of chess3, his model suffers from the static Slipnet in the same way: it is extremely tedious to set up a Slipnet. Furthermore, the validation is even more laborious, if not impossible, due to the very nature of making analogies and the idealistic Slipnet.

The result is, well, a model that can not serve as a template for any kind of application that is designed to be able to adapt and to learn, at least if we take it without abstracting from it.

From an architectural point of view the Slipnet is simply not compatible with the rest of Copycat, which is strongly based on randomness and probabilistic processes in populations. The architecture of the Slipnet and the way it is used do not offer something like a probabilistic pathway into it. But why should the “Slipnet” not be a probabilistic process as well?

Superior Aspects of the FARG model

Hofstadter clearly and correctly separates his project from connectionism (p.308):

Connectionist (neural-net) models are doing very interesting things these days, but they are not addressing questions at nearly as high a level of cognition as Copycat is, and it is my belief that ultimately, people will recognize that the neural level of description is a bit too low to capture the mechanisms of creative, fluid thinking. Trying to use connectionist language to describe creative thought strikes me as a bit like trying to describe the skill of a great tennis player in terms of molecular biology, which would be absurd.

A cornerstone in Hofstadter’s arguments and concepts around Copycat is conceptual slippage. This occurs in the Slipnet and is represented as a sudden change in the weights of the items such that the most active (or influential) “neighborhood” also changes. To describe these neighborhoods, he invokes the concept of a halo. The “halo” is a more or less circular region around one of the abstract items in the Slipnet, yet without a clear boundary. Items in the Slipnet change their relative position all the time, thus their co-excitation also changes dynamically.

Hofstadter lists (p.215) the following issues that are missing in connectionist network (CN) models with regard to cognition, particularly with regard to concept slippage and fluid analogies.

  • – CNs do not develop a halo around the representatives of concepts in the case of localist, i.e. node-oriented networks, and thus no slippability emerges;
  • – CNs do not develop a core region for a halo in the case of networks where a “concept” is distributed throughout the network, and thus no slippability emerges either;
  • – CNs have no notion of normality, since learning is instantiated in any encounter with data.

This critique appears to be both a bit overdone and misdirected. As we have seen above, Copycat can be interpreted as comprising a slightly restricted case of generalized evolution. Standard neuronal techniques do not know of evolutionary techniques, there are no “coopetitioning” agents, and there is no separation into different memories of different durations. The abstraction achieved by artificial neuronal networks (ANN) or even by standard SOMs is always exhausted by the transition from extensional (observed items) to intensional descriptions (classes, types). The abstract items in the Slipnet are not just intensional descriptions and could not be found or constructed by an ANN or a SOM that works just on the observations, especially if there is just a single observation at all!

Copycat is definitely working in a different space as compared to network-based models.1 While the latter can provide the mechanisms to proceed from extensions to intensions in a “bottom-up” movement, the former applies those intensions in a “top-down” manner. Saying this, we may invoke the reference to the higher forms of comparison and the Deleuzean differential. As with many other things mentioned here, this would deserve a closer look from a philosophical perspective, which however we can’t provide here and now.

Nevertheless, Hofstadter’s critique of connectionist models seems to be closely related to the abandonment of modeling as a model for analogy making. Any of the three points above can be mitigated if we take a particular collection of SOM as a counterpart for Copycat. In the next section (which will be found in part II of this essay) we will see how the two approaches can inform each other.

Notes

1. We would like to point you to our discussion of non-Turing computation, and also make you aware of this conference: 11th International Conference on Unconventional Computation & Natural Computation 2012, University of Orléans, conference website.

2. Interestingly, Hofstadter’s PhD student, co-worker and co-author Melanie Mitchell started to publish in the field of genetic algorithms (GA), yet she never realized the kinship between GA and Copycat, or at least she never said anything like this publicly.

3. He calls his model implementation “Capyblanca”; it is available through Google Code.

4. The example provided by Blank [4], who tried to implement analogy making in a simple ANN, is seriously deficient in many respects.

  • [1] James B. Marshall, Metacat: A Self-Watching Cognitive Architecture for Analogy-Making and High-Level Perception. PhD Thesis, Indiana University 1999. available online (last access 18/3/2012)
  • [2] Daniel Dennett, Consciousness Explained. 1992. p.107.
  • [3] Alexandre Linhares (2008). The emergence of choice: Decision-making and strategic thinking through analogies. available online.
  • [4] Douglas S. Blank, Implicit Analogy-Making: A Connectionist Exploration. Indiana University Computer Science Department. available online.

۞

Ideas and Machinic Platonism

March 1, 2012 § Leave a comment

Once the cat had the idea to go on a journey…
You don’t believe me? Did not your cat have the same idea? Or is your doubt about my belief that cats can have ideas?

So, look at this individual here, who is climbing along the facade, outside the window…

(sorry for the spoken comment being available only in German language in the clip, but I am quite sure you got the point anyway…)

Cats definitely know about the height of their own position, and this one is climbing from flat to flat … outside, on the facade of the building, and on the 6th floor. Crazy, or cool, respectively, in its full meaning, this cat here, since it looks like she has been having a plan… (of course, anyone who has ever lived together with a cat knows very well that they can have plans… pride like this one, and also remorse…)

Yet, what would your doubts look like if I said “Once the machine got the idea…”? Probably you would stop talking or listening to me, turning away from this strange guy. Anyway, just that is the claim here, and hence I hope you keep reading.

We already discussed elsewhere1 that it is quite easy to derive a bunch of hypotheses about empirical data. Yet, deriving regularities or rules from empirical data does not make up an idea, or a concept. At most they could serve as a kind of qualified precursor for the latter. Once the subject of interest has been identified, deriving hypotheses about it is almost something mechanical. Ideas and concepts alike are much more related to the invention of a problematics, as Deleuze has been working out again and again, without being that invention or problematics themselves. To overlook (or to negate?) the difference between the problematic and the question is one of the main failures of logical empiricism, and probably even of today’s science.

The Topic

But what is it then that would make up an idea, or a concept? Douglas Hofstadter once wrote [1] that we are lacking a concept of concept. Since then, a discipline has emerged that calls itself “formal concept analysis”. So, actually some people indeed do think that concepts could be analyzed formally. We will see that the issues about the relation between concepts and form are quite important. We already met some aspects of that relationship in the chapters about formalization and creativity. And we definitely think that formalization expels anything interesting from what probably had been a concept before that formalization. Of course, formalization is an important part of thinking, yet its importance is restricted to the stages before there are concepts or after we have reduced them into a fixed set of finite rules.

Ideas

Ideas are almost annoying, I mean, as a philosophical concept, and they have been so since the first clear expressions of philosophy. From the very beginning there was a quarrel not only about “where they come from,” but also about their role with respect to knowledge. Very early on in philosophy two seemingly juxtaposed positions emerged, represented by the philosophical approaches of Platon and Aristotle. The former claimed that ideas are before perception, while for the latter ideas clearly have been assigned the status of something derived, secondary. Yet, recent research emphasized the possibility that the contrast between them is not as strong as it has been proposed for more than 2000 years. There is an eminent empiric pillar in Platon’s philosophical building [2].

We certainly will not delve into this discussion here; it simply would take too much space and effort, and not least there are enough sources on the web displaying the traditional positions in great detail. Throughout history since Aristotle, many and rather divergent flavors of idealism emerged. Whatever the exact distinctive claim of any of those positions is, they all share the belief in the dominance of some top-down principle as an essential part of the conditions for the possibility of knowledge, or more generally the episteme. Some philosophers like Hegel or Frege, just as others nowadays perceived as members of German Idealism, took rather radical positions. Frege’s hyper-platonism, probably the most extreme idealistic position (though not exceeding Hegel’s “great spirit” that far), indeed claimed that something like a triangle exists, and quite literally so, albeit in a non-substantial manner, completely independent from any, e.g. human, thought.

Let us fix this main property of the claim of a top-down principle as characteristic for any flavor of idealism. The decisive question then is how we could think the becoming of ideas. It is clearly one of the weaknesses of idealistic positions that they induce a salient vulnerability regarding the issue of justification. As a philosophical structure, idealism mixes content with value in the structural domain, consequently and quite directly leading to a certain kind of blind spot: political power is justified by the right idea. The factual consequences have been disastrous throughout history.

So, there are several alternatives for thinking about this becoming. But even before we consider any alternative, it should be clear that something like “becoming” and “idealism” are barely compatible. Maybe a very soft idealism, one that already turned into pragmatism, much in the vein of Charles S. Peirce, could allow thinking process and ideas together. Hegel’s position, or likewise Schelling’s, Fichte’s, Marx’s or Frege’s, definitely excludes any such rapprochement or convergence.

The becoming of ideas cannot be thought of as something that flows down from even greater transcendental heights. Of course, anybody may choose to invoke some kind of divinity here, but obviously that does not help much. A solution according to Hegel’s great spirit, history itself, is not helpful either, even if this concept implied that there is something in and about the community that is indispensable when it comes to thinking. Much later, Wittgenstein took a related route and thereby initiated the momentum towards the linguistic turn. Yet, Hegel’s history is not useful to get clear about the becoming of ideas regarding the involved mechanisms. And without such mechanisms anything like machine-based episteme, or cats having ideas, is accepted as being impossible apriori.

One such mechanism is interpretation. For us the principle of the primacy of interpretation is definitely indisputable. This does not mean that we disregard the concept of the idea, yet we clearly take an Aristotelian position. More à jour, we could say that we are quite fond of Deleuze’s position on relating empiric impressions, affects, and thought. There are, of course, many supporters in the period of time that spans between Aristotle and Deleuze who are quite influential for our position.2
Yet, somehow it all culminated in the approach that has been labelled French philosophy, which for us comprises mainly Michel Serres, Gilles Deleuze and Michel Foucault, with some predecessors like Gilbert Simondon. They converged towards a position that allows to think the embedding of ideas in the world as a process, or as an ongoing event [3,4], and this embedding is based on empiric affects.

So far, so good. Yet, we have only declared the kind of raft we will build to sail with. We didn’t mention anything about how to build this raft or how to sail it. Before we can start to constructively discuss the relation between machines and ideas, we first have to visit the concept of “concept,” both as an issue and as a concept.

Concepts

“Concept” is a very special concept. First, it is not externalizable, which is why we call it a strongly singular term. Whenever one thinks “concept,” there is already something like a concept. For most of the other terms in our languages, such as “idea,” that does not hold. In this respect, and regarding the structural dynamics of its usage, “concept” behaves similarly to “language” or “formalization.”

Additionally, however, “concept” is not a self-containing term like “language.” One needs not only symbols, one even needs a combination of categories and structured expressions; there are also Peircean signs involved, and last but not least concepts relate to models, even as models are also quite apart from them. Ideas do not relate to models in the same way as concepts do.

Let us, for instance, take the concept of time. There is this abundantly cited quote by Augustine [5], a passage where he tries to explain the status of God as the creator of time, hence the fundamental incomprehensibility of God, and even of his creations (such as time) [my emphasis]:

For what is time? Who can easily and briefly explain it? Who even in thought can comprehend it, even to the pronouncing of a word concerning it? But what in speaking do we refer to more familiarly and knowingly than time? And certainly we understand when we speak of it; we understand also when we hear it spoken of by another. What, then, is time? If no one ask of me, I know; if I wish to explain to him who asks, I know not. Yet I say with confidence, that I know that if nothing passed away, there would not be past time; and if nothing were coming, there would not be future time; and if nothing were, there would not be present time.

I certainly don’t want to speculate about “time” (or God) here; instead I would like to focus on this peculiarity Augustine is talking about. Many, and probably even Augustine himself, confine this peculiarity to time (and space). I think, however, this peculiarity applies to any concept.

By means of this example we can quite clearly experience the difference between ideas and concepts. Ideas are some kind of models—we will return to that in the next section—while concepts are both the condition for models and conditioned by models. The concept of time provides the condition for calendars, which in turn can be conceived as a possible condition for the operationalization of expectability.

“Concepts” as well as “models” do not exist as “pure” forms. We elicit a strange and eminently counter-intuitive force when trying to “think” pure concepts or models. The stronger we try, the more we imply their “opposite,” which in the case of concepts presumably is the embedding potentiality of mechanisms, and in the case of models we could say it is simply belief. We will discuss these relations in much more detail in the chapter about the choreosteme (forthcoming). Actually, we think that it is appropriate to conceive of terms like “concept” and “model” as choreostemic singular terms, or, in short, choreostemic singularities.

Even from an ontological perspective we could not claim that there “is” such a thing like a “concept.” Well, you may already know that we refute any ontological approach anyway. Yet, in the case of choreostemic singular terms like “concept” we can’t simply resort to our beloved language game. With respect to language, the choreosteme takes the role of an apriori, something like the sum of all conditions.

Since we would need a full discussion of the concept of the choreosteme, we can’t fully discuss the concept of “concept” here. Yet, as a kind of summary we may propose that the important point about concepts is that they are nothing that could exist. A concept does not exist as matter, as information, as substance, or as form.

The language game of “concept” simply points into the direction of that non-existence. Concepts are not a “thing” that we could analyze, and also nothing that we could relate to by means of an identifiable relation (as e.g. in a graph). Concepts are best taken as a gradient field in a choreostemic space, yet one exhibiting a quite unusual structure and topology. So far, we have identified two (of a total of four) singularities that together spawn the choreostemic space. We also could say that the language game of “concept” is used to indicate a certain form of drift in the choreostemic space. (Later we will also discuss the topology of that space, among many other issues.)

For our concerns here in this chapter, the machine-based episteme, we can conclude that it would be a misguided approach to try to implement concepts (or their formal analysis). The issue of the conditions for the ability to move around in the choreostemic space we have to postpone. In other words, we have confined our task, or at least we have found a suitable entry point for our task, the investigation of the relation between machines and ideas.

Machines and Ideas

When talking about machines and ideas we are, here and for the time being, not interested in the usage of machines to support “having” ideas. We are not interested in such tooling for now. The question is about the mechanism inside the machine that would lead to the emergence of ideas.

Think about the idea of a triangle. Certainly, triangles as we imagine them do not belong to the material world. Any possible factual representation is imperfect, as compared with the idea. Yet, without the idea (of the triangle) we wouldn’t be able to proceed, for instance, towards land survey. As already said, ideas serve as models; they do not require formalization, yet they often live as a formalization (though not always a mathematical one) in the sense of an idealized model; in other words, they serve as ladder spokes for actions. Concepts, if we contrast them with ideas, that is, if we try to distinguish them, never could be formalized; they remain inaccessible as condition. Nothing else could be expected from a transcendental singularity.

Back to our triangle. Although we can’t represent them perfectly, seeing a lot of imperfect triangles gives rise to the idea of the triangle. Rephrased in this way, we may recognize that the first half of the task is to look for a process that would provide an idealization (of a model), starting from empirical impressions. The second half of the task is to get the idea working as a kind of template, yet not literally as a template. Such an abstract pattern is detached from any direct empirical relation, despite the fact that we once started with empiric data.

Table 1: The two tasks in realizing “machinic idealism”

Task 1: process of idealization that starts with an intensional description
Task 2: applying the idealization for first-of-a-kind-encounters

Here we should note that culture is almost defined by the fact that it provides such ideas before any individual person has the chance to collect enough experience for deriving them on her own.

In order to approach these tasks, we first need model systems that exhibit the desired behavior, but which are also simple enough to comprehend. Let us first deal with the first half of the task.

Task 1: The Process of Idealization

We already mentioned that we need to start from empirical impressions. These can be provided by the Self-organizing Map (SOM), as it is able to abstract from the list of observations (the extensions), thereby building an intensional representation of the data. In other words, the SOM is able to create “representative” classes. Of course, these representations are dependent on some parameters, but that’s not the important point here.

Once we have those intensions available, we may ask how to proceed in order to arrive at something that we could call an idea. Our proposal for an appropriate model system consists of the following parts:

  • (1) a small set (n=4) of profiles, which consist of 3 properties; the form of the profiles is set apriori such that they overlap partially;
  • (2) a small SOM, here with 12×12=144 nodes; the SOM needs to be trainable and also should provide a classification service, i.e. act as a model;
  • (3) a simple Monte-Carlo simulation device that is able to create randomly varied profiles that deviate from the original ones without departing too much;
  • (4) a measurement process that records the (simulated) data flow.

The profiles are defined as shown in the following table (V denotes variables, C denotes categories, or classes):

     V1    V2    V3
C1   0.1   0.4   0.6
C2   0.8   0.4   0.6
C3   0.3   0.1   0.4
C4   0.2   0.2   0.8

From these parts we then build a cyclic process, which comprises the following steps.

  • (0) Organize some empirical measurement for training the SOM; in our model system, however, we use the original profiles and create an artificial body of “original” data, in order to be able to detect the relevant phenomenon (we have perfect knowledge about the measurement);
  • (1) Train the SOM;
  • (2) Check the intensional descriptions for their implied risk (which should be minimal, i.e. below some threshold) and extract them as profiles;
  • (3) Use these profiles to create a bunch of simulated (artificial) data;
  • (4) Take the profile definitions and simulate enough records to train the SOM again, then return to step (1).

Thus, we have two counteracting forces: (1) a dispersion due to the randomizing simulation, and (2) the focusing of the SOM due to the filtering along the separability, in our case operationalized as risk (1/ppv, where ppv = positive predictive value) per node. Note that the SOM process is not a directly re-entrant process as, for instance, Elman networks are [6,7,8].4
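The following is a minimal sketch of this cyclic process in Python. It assumes numpy and the third-party minisom package, a Gaussian noise model for the Monte-Carlo step, and a node-purity threshold as a stand-in for the 1/ppv risk criterion; all of these are illustrative choices, not prescriptions taken from the text above.

```python
# Sketch of the re-entrant idealization cycle: train a SOM, extract low-risk
# intensions as profiles, perturb them by a Monte-Carlo step, and feed the
# simulated records back into the next round. (numpy + minisom are assumptions.)
import numpy as np
from minisom import MiniSom

# the four apriori profiles over three variables (see the table above)
profiles = np.array([[0.1, 0.4, 0.6],
                     [0.8, 0.4, 0.6],
                     [0.3, 0.1, 0.4],
                     [0.2, 0.2, 0.8]])

def simulate(profiles, n_per_profile=100, noise=0.08, rng=None):
    """Monte-Carlo device: randomly varied records around the given profiles."""
    rng = rng or np.random.default_rng()
    data, labels = [], []
    for k, p in enumerate(profiles):
        data.append(np.clip(p + rng.normal(0.0, noise, (n_per_profile, p.size)), 0, 1))
        labels.append(np.full(n_per_profile, k))
    return np.vstack(data), np.concatenate(labels)

def extract_profiles(som, data, labels, min_ppv=0.8):
    """Collect the intensional description (mean vector) of every node whose
    risk 1/ppv is small enough, i.e. whose purity exceeds the threshold."""
    hits = {}
    for x, y in zip(data, labels):
        hits.setdefault(som.winner(x), []).append((x, y))
    selected = []
    for node, items in hits.items():
        ys = np.array([y for _, y in items])
        ppv = np.max(np.bincount(ys)) / len(ys)        # purity of the node
        if ppv >= min_ppv:
            selected.append(np.mean([x for x, _ in items], axis=0))
    return np.array(selected)

# step (0): an artificial body of "original" data generated from the profiles
data, labels = simulate(profiles)
for cycle in range(9):                                  # nine snapshots, cf. fig. 1a-1i
    som = MiniSom(12, 12, data.shape[1], sigma=2.0, learning_rate=0.5)
    som.train_random(data, 5000)                        # step (1): train the SOM
    new_profiles = extract_profiles(som, data, labels)  # step (2): low-risk intensions
    if len(new_profiles) == 0:
        break
    data, labels = simulate(new_profiles)               # steps (3)+(4): simulate again
print(np.round(new_profiles, 2))
```

Running such a sketch for a handful of cycles reproduces the qualitative effect described below: the extracted intensions drift away from the original profiles and become increasingly “purified”.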

This process leads not only to a focusing contrast-enhancement but also to (a limited version of) inventing new intensional descriptions that have never been present in the empiric measurement, at least not saliently enough to show up as an intension.

The following figures 1a–1i show 9 snapshots from the evolution of such a system; it starts top-left of the portfolio, then proceeds row-wise from left to right down to the bottom-right item. Each of the 9 items displays a SOM, where the RGB color corresponds to the three variables V1, V2, V3. A particular color thus represents a particular profile on the level of the intension. Remember that the intensions are built from the field-wise average across all the extensions collected by a particular node.

Well, let us now contemplate a bit about the sequence of these panels, which represents the evolution of the system. The first point is that there is no particular locational stability. Of course not, I am tempted to say, since a SOM is not an image that represents something as an image. A SOM contains intensions and abstractions; the only issue that counts is its predictive power.

Now, comparing the colors between the first and the second panel, we see that the green (top-right in 1a, middle-left in 1b) and the brownish region (top-left in 1a, middle-right in 1b) appear much clearer in 1b as compared to 1a. In 1a, the green obviously was “contaminated” by blue, and actually by all other values as well, leading to its brightness. This tendency prevails. In 1c and 1d yellowish colors are separated, etc.

Figure 1a thru 1i: A simple SOM in a re-entrant Markov process develops idealization. Time index proceeds from top-left to bottom-right.

The point now is that the intensions contained in the last SOM (1i, bottom-right of the portfolio) have not been recognizable in the beginning; in some important respect they have not been present at all. Our SOM steadily drifted away from its empirical roots. That’s not a big surprise, indeed, for we used a randomization process. The nice thing is something different: the intensions get “purified,” changing thereby their status from “intensions” to “ideas”.

Now imagine that the variables V1..Vn represent properties of geometric primitives. Our sensory apparatus is able to perceive and to encode them: horizontal lines, vertical lines, crossings, etc. In empiric data our visual apparatus may find any combination of those properties, especially in the case of a (platonic) school (say: academia) where the pupils and the teachers draw triangles over triangles into the wax tablets, or into the sand of the pathways in the garden…

By now, the message should be quite clear: there is nothing special about ideas. In abstract terms, what is needed is

  • (1) a SOM-like structure;
  • (2) a self-directed simulation process;
  • (3) re-entrant modeling

Notice that we need not specify a target variable. The associative process itself is just sufficient.

Given this model it should not be surprising anymore why the first philosophers came up with idealism. It is almost built into the nature of the brain. We may summarize our achievements in the following characterization:

Ideas can be conceived as idealizations of intensional descriptions.

It is of course important to be aware of the status of such a “definition.” First, we tried to separate concepts and ideas. Most of the literature about ideas conflates them. Yet, as long as they are conflated, everything and any reasoning about mental affairs, cognition, thinking and knowledge necessarily remains inappropriate. For instance, the infamous discourse about universals and qualia seriously suffered from that conflation, or more precisely, those debates only arose due to that mess.

Second, our lemma is just an operationalization, despite the fact that we are quite convinced about its reasonability. Yet, there might be different ones.

Our proposal has important benefits though, as it matches a lot of the aspects commonly associated with the term “idea.” In my opinion, what is especially striking about the proposed model is the observation that idealization implicitly also led to the “invention” of “intensions” that were not present in the empiric data. Who would have expected that idealization is implicitly inventive?

Finally, two small notes should be added, concerning the type of data and the status of the “idea” as a continuously intermediate result of the re-entrant SOM process. One should be aware that the “normal” input to natural associative systems are time series. Our brain is dealing with a manifold of series of events, which is mapped onto the internal processes, that is, onto another time-based structure. Prima facie, our brain is not dealing with tables. Yet, (virtual) tabular structures are implied by the process of propertization, which is an inevitable component of any kind of modeling. It is well known that it is time-series data and their modeling that give rise to the impression of causality. In the light of ideas qua re-entrant associativity, we now can easily understand the transition from networks of potential causal influences to the claim of “causality” as some kind of a pure concept. Even though the idea of causality (in the Newtonian sense) played an important role in the history of science, it is just that: a naive idealization.

The other note concerns the source of the data. If we consider re-entrant informational structures that are arranged across large “distances,” possibly with several intermediate transformative complexes (for which there are hints from neurobiology), we may understand that for a particular SOM (or SOM-like structure) the type of the source is completely opaque. To put it short, it does not matter for our proposed mechanism whether the data are sourced as empiric data from the external world, or as some kind of simulated, surrogate re-entrant data from within the system itself. In such wide-area, informationally re-entrant probabilistic networks we may expect a kind of runaway idealization. The question then is about the minimal size necessary for eliciting that effect. A nice corollary of this result is the insight that logistic networks, such as the internet or the telephone cabling, will NEVER start to think by themselves, as some still expect. Yet, since there are a lot of brains embedded as intermediate transforming entities in this deterministic cablework, we indeed may expect that the whole assembly is much more than could be achieved by a small group of humans living, say, around 1983. But that is not really a surprise.

Task 2: Ideas, applied

Ideas are an extremely important structural phenomenon, because they allow us to recognize things and to deal with tasks that we have never seen before. We may act adaptively before having encountered a situation that would directly resemble—as an equivalence class—any intensional description available so far.

Actually, it is not just one idea, it is a “system” of ideas that is needed for that. Some years ago, Douglas Hofstadter and his group3 devised a model system suitable for demonstrating exactly this: the application of ideas. They called the project (and the model system) Copycat.

We won’t discuss Copycat and its analogy making by top-down ideas here (we already introduced it elsewhere). We just want to note that the central “platonic” element in Copycat is a dynamic relational system of symmetry relations. Such symmetry relations are for instance “before,” “after,” “builds a group,” “is a triple,” etc. These kinds of relations represent different levels of abstraction, but that’s not important here. Much more important is the fact that the relations between these symmetry relations are dynamic and will adapt according to the situation at hand.

I think that these symmetry relations as conceived by the Fargonauts are on the same level as our ideas. The transition from ideas to symmetries is just a grammatological move.

The case of Biological Neural Systems

Re-entrance seems to be an important property of natural neural networks. Very early in the liaison of neurobiology and computer science, beginning with Hebb in the 1940s and later Hopfield, recurrent networks have been attractive for researchers. Just take a look at drawings like the following, created (!) by Ramon y Cajal [10] at the beginning of the 20th century.

Figure 2a-2c: Drawings by Ramon y Cajal, the Spanish neurobiologist. See also: History of Neuroscience. a: from a sparrow’s brain, b: motor area of the human brain, c: hypothalamus of the human brain

Yet, Hebb, Hopfield and Elman got trapped by the (necessary) idealization of Cajal’s drawings. Cajal’s interest was to establish and to prove the “neuron hypothesis,” i.e. that brains work on the basis of neurons. From Cajal’s drawings to the claim that biological neuronal structures could be represented by cybernetic systems or finite state machines is, honestly, a breakneck leap, or, likewise, ideology.

Figure 3: Structure of an Elman Network; obviously, Elman was seriously affected by idealization (click for higher resolution).

Thus, we propose to distinguish between re-entrant and recurrent networks. While the latter are directly wired onto themselves in a deterministic manner, that is, the self-reference is modeled on the morphological level, the former are modeled on the informational level. Since it is simply impossible for a cybernetic structure to reflect neuromorphological plasticity and change, the informational approach is much more appropriate for modeling large assemblies of individual “neuronal” items (cf. [11]).

Nevertheless, the principle of re-entrance remains a very important one. It is a structure that is known to lead to contrast enhancement and to second-order memory effects. It is also a cornerstone in the theory (theories) proposed by Gerald Edelman, who probably is much less affected by cybernetics (e.g. [12]) than the authors cited above. Edelman always conceived the brain-mind as something like an abstract informational population; he even was the first to adopt evolutionary selection processes (Darwinian and others) to describe the dynamics in the brain-mind.

Conclusion: Machines and Choreostemic Drift

Our point of departure was to distinguish between ideas and concepts. Their difference becomes visible if we compare them, for instance, with regard to their relation to (abstract) models. It turns out that ideas can be conceived as a more or less stable immaterial entity (though not a “state”) of self-referential processes involving self-organizing maps and the simulated surrogates of intensional descriptions. Concepts, on the other hand, are described as a transcendental vector in choreostemic processes. Consequently, we may propose only for ideas that we can implement their conditions and mechanisms, while concepts can’t be implemented. It is beyond the expressibility of any technique to talk about the conditions for their actualization. Hence, the issue of “concept” has been postponed to a forthcoming chapter.

Ideas can be conceived as the effect of putting a SOM into a re-entrant context, through which the SOM develops a system of categories beyond simple intensions. These categories are not justified by empirical references any more, at least not in the strong sense. Hence, ideas can also be characterized as being clearly distinct from models or schemata. Both models and schemata involve classification, which—due to the dissolved bonds to empiric data—can not be regarded as a sufficient component for ideas. We would like to suggest the described mechanism as the candidate principle for the development of ideas. We think that the simulated data in the re-entrant SOM process should be distinguished from data in contexts that are characterized by the measurement of “external” objects, albeit their digestion by the SOM mechanism itself remains the same.

From what has been said it is also clear that the capability of deriving ideas alone is still quite close to the material arrangement of a body, whether thought of as biological wetware or as software. Therefore, we still haven’t reached a state where we can talk about epistemic affairs. What we need is the possibility of expressing the abstract conditions of the episteme.

Of course, what we have compiled here exceeds by far any other approach, and additionally we think that it could serve as a natural complement to the work of Douglas Hofstadter. In his work, Hofstadter had to implement the platonic heavens of his machine manually, and even for the small domain he’d chosen it has been a tedious work. Here we proposed the possibility of a seamless transition from the world of associative mechanisms like the SOM to the world of platonic Copycats, and “seamless” here refers to “implementable”.

Yet, what is really interesting is the form of choreostemic movement or drift, resulting from a particular configuration of the dynamics in systems of ideas. But this is another story, perhaps related to Felix Guattari’s principle of the “machinic”, and it definitely can’t be implemented any more.

Notes

1. We did so in the recent chapter about data and their transformation; see also the section “Overall Organization” in Technical Aspects of Modeling.

2. You really should be aware that this trace we try to put forward here does not come close to even a coarse outline of all of the relevant issues.

3. They called themselves the “Fargonauts,” FARG being the acronym for “Fluid Analogy Research Group.”

4. Elman networks are an attempt to simulate neuronal networks on the level of neurons. Such approaches we rate as fundamentally misguided, deeply inspired by cybernetics [9], because they consider noise as disturbance. Actually, they are equivalent to finite state machines. It is somewhat ridiculous to consider a finite state machine as a model for learning “networks.” SOMs, in contrast, especially if used in architectures like ours, are fundamentally probabilistic structures that could be regarded as “feeding on noise.” Elman networks, and their predecessor, the Hopfield network, are not quite useful, due to problems in scalability and, more importantly, also in stability.

  • [1] Douglas R. Hofstadter, Fluid Concepts And Creative Analogies: Computer Models Of The Fundamental Mechanisms Of Thought. Basic Books, New York 1996. p.365
  • [2] Gernot Böhme, “Platon der Empiriker.” in: Gernot Böhme, Dieter Mersch, Gregor Schiemann (eds.), Platon im nachmetaphysischen Zeitalter. Wissenschaftliche Buchgesellschaft, Darmstadt 2006.
  • [3] Marc Rölli (ed.), Ereignis auf Französisch: Von Bergson bis Deleuze. Fin, Frankfurt 2004.
  • [4] Gilles Deleuze, Difference and Repetition. 1967
  • [5] Augustine, Confessions, Book 11 CHAP. XIV.
  • [6] Mandic, D. & Chambers, J. (2001). Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. Wiley.
  • [7] J.L. Elman, (1990). Finding Structure in Time. Cognitive Science 14 (2): 179–211.
  • [8] Raul Rojas, Neural Networks: A Systematic Introduction. Springer, Berlin 1996. (@google books)
  • [9] Holk Cruse, Neural Networks As Cybernetic Systems: Science Briefings, 3rd edition. Thieme, Stuttgart 2007.
  • [10] Santiago R.y Cajal, Texture of the Nervous System of Man and the Vertebrates: Volume I: 1, Springer, Wien 1999, edited and translated by Pedro Pasik & Tauba Pasik. see google books
  • [11] Florence Levy, Peter R. Krebs (2006), Cortical-Subcortical Re-Entrant Circuits and Recurrent Behaviour. Aust N Z J Psychiatry September 2006 vol. 40 no. 9 752-758.
    doi: 10.1080/j.1440-1614.2006.01879
  • [12] Gerald Edelman: “From Brain Dynamics to Consciousness: A Prelude to the Future of Brain-Based Devices“, Video, IBM Lecture on Cognitive Computing, June 2006.

۞

Beyond Containing: Associative Storage and Memory

February 14, 2012 § Leave a comment

Memory, our memory, is a wonderful thing. Most of the time.

Yet, it also can trap you, sometimes terribly, if you use it in inappropriate ways.

Think about the problematics of being a witness. As long as you don’t try to remember exactly, you know precisely. As soon as you start to try to achieve perfect recall, everything starts to become fluid, first, then fuzzy and increasingly blurry. As if there were some kind of uncertainty principle, similar to Heisenberg’s [1]. There are other tricks, such as asking a person the same question over and over again. Any degree of security, hence knowledge, will vanish. In the other direction, everybody knows the experience that a tiny little smell or sound triggers a whole story in memory, and often one that has not been cared about for a long time.

The main strengths of memory—extensibility, adaptivity, contextuality and flexibility—could also be considered its main weaknesses, if we expect perfect reproducibility for the results of “queries.” Yet, memory is not a database. There are neither symbols, nor indexes, and at the deeper levels of its mechanisms, also no signs. There is no particular neuron that would “contain” information in the way a file on a computer can be regarded to provide it.

Databases are, of course, extremely useful, precisely because they can’t do otherwise than to reproduce answers perfectly. That’s how they are designed and constructed. And precisely for the same reason we may state that databases are dead entities, like crystals.

The reproducibility provided by databases expels time. We can write something into a database, stop everything, and continue precisely at the same point. Databases do not own their own time. Hence, they are purely physical entities. As a consequence, databases do not and can not think. They can’t bring or put things together, they do not associate, superpose, or mix. Everything is under the control of an external entity. A database does not learn when the amount of bits stored inside it increases. We also have to be very clear about the fact that a database does not interpret anything. All this should not be understood as criticism, of course; these properties are intended by design.

The first important consequence of this is that any system relying just on the principles of a database will also inherit these properties. This raises the question about the necessary and sufficient conditions for the foundations of “storage” devices that allow for learning and informational adaptivity.

As a first step one could argue that artificial systems capable of learning, for instance self-organizing maps, or any other “learning algorithm,” may consist of a database and a processor. This would represent the bare bones of the classic von Neumann architecture.

The essence of this architecture is, again, reproducibility as a design intention. The processor is basically empty. As long as the database is not part of a self-referential arrangement, there won’t be something like a morphological change.

Learning without change of structure is not learning but only changing the value of structural parameters that have been defined apriori (at implementation time). The crucial step, however, would be to introduce those parameters at all. We will return to this point at a later stage of our discussion, when it comes to describing the processing capabilities of self-organizing maps.1

Of course, the boundaries are not well defined here. We may implement a system in a very abstract manner such that a change in the value of such highly abstract parameters indeed involves deep structural changes. In the end, almost everything can be expressed by some parameters and their values. That’s nothing else than the principle of the Deleuzean differential.

What we want to emphasize here is just the issue that (1) morphological changes are necessary in order to establish learning, and (2) these changes should be established in response to the environment (and the information flowing from there into the system). These two conditions together establish a third one, namely that (3) a historical contingency is established that acts as a constraint on the further potential changes and responses of the system. The system acquires individuality. Individuality and learning are co-extensive. Quite obviously, such a system is not a von Neumann device any longer, even if it still runs on such a linear machine.

Our claim here is that “learning” requires a particular perspective on the concept of “data” and its “storage.” And, correspondingly, without this changed concept of the relation between data and storage, the emergence of machine-based episteme will not be achievable.

Let us just contrast the two ends of our space.

  • (1) At the logical end we have the von Neumann architecture, characterized by empty processors, perfect reproducibility on an atomic level, the “bit”; there is no morphological change; only estimation of predefined parameters can be achieved.
  • (2) The opposite end is made from historically contingent structures for perception, transformation and association, where the morphology changes due to the interaction with the perceived information2; we will observe emergence of individuality; morphological structures are always just relative to the experienced influences; learning occurs and is structural learning.

With regard to a system that is able to learn, one possible conclusion would be to drop the distinction between the storage of encoded information and the treatment of those encodings. Perhaps it is the only viable conclusion to this end.

In the rest of this chapter we will demonstrate how the separation between data and their transformation can be overcome on the basis of self-organizing maps. Such a device we call an “associative storage.” We will also find a particular relation between such an associative storage and modeling3. Notably, both tasks can be accomplished by self-organizing maps.

Prerequisites

When taking the perspective of usage, there is still another large contrast between databases and associative storage (“memories”). In the case of a database, the purpose of a storage event is known at the time of performing the storing operation. In the case of memories and associative storage this purpose is not known, and often can’t reasonably be expected to be knowable in principle.

From that we can derive a quite important consequence. In order to build a memory, we have to avoid storing the items “as such,” as is the case for databases. We may call this the (naive) representational approach. Philosophically, the stored items do not have any structure inside the storage device, neither an inner structure, nor an outer one. Any item appears as a primitive quale.

The contrast to the process in an associative storage is indeed a strong one. Here, it is simply forbidden to store items in an isolated manner, without relation to other items, as an engram, i.e. an encoded and reversibly decodable series of bits. Since a database works perfectly reversibly and reproducibly, we can encode the grapheme of a word into a series of bits and later decode that series back into a grapheme again, which in turn we as humans (with memory inside the skull) can interpret as words. Strictly taken, we do NOT use the database to store words.

More concretely, what we have to do with the items comprises two independent steps:

  • (1) Items have to be stored as context.
  • (2) Items have to be stored as probabilized items.

The second part of our re-organized approach to storage is a consequence of the impossibility to know about future uses of a stored item. Taken inversely, using a database for storage always and strictly implies that the storage agent claims to know perfectly about future uses. It is precisely this implication that renders long-lasting storage projects so problematic, if not impossible.

In other words, and even more concise, we may say that in order to build a dynamic and extensible memory we have to store items in a particular form.

Memory is built on the basis of a population of probabilistic contexts in and by an associative structure.

The Two-Layer SOM

In a highly interesting prototypical model project (codename “WEBSOM”), Kaski (a collaborator of Kohonen) introduced a particular SOM architecture that serves the requirements described above [2]. Yet, Kohonen (and all of his colleagues alike) has so far not recognized the actual status of that architecture. We already mentioned this point in the chapter about some improvements of the SOM design; Kohonen fails to discern modeling from sorting when he uses the associative storage as a modeling device. Yet, modeling requires a purpose, operationalized into one or more target criteria. Hence, an associative storage device like the two-layer SOM can be conceived as a pre-specific model only.

Nevertheless, this SOM architecture is not only highly remarkable, but we also can easily extend it appropriately; thus it is indeed so important, at least as a starting point, that we describe it briefly here.

Context and Basic Idea

The context for which the two-layer SOM (TL-SOM) has been created is document retrieval by classification of texts. From the perspective of classification, texts are highly complex entities. This complexity of texts derives from the following properties:

  • – there are different levels of context;
  • – there are rich organizational constraints, e.g. grammars
  • – there is a large corpus of words;
  • – there is a large number of relations that not only form a network, but which also change dynamically in the course of interpretation.

Taken together, these properties turn texts into ill-defined or even undefinable entities, for which it is not possible to provide a structural description, e.g. as a set of features, and particularly not in advance of the analysis. Briefly, texts are unstructured data. It is clear that especially non-contextual methods like the infamous n-grams are deeply inappropriate for the description, and hence also for the modeling, of texts. The peculiarity of texts has been recognized long before the age of computers. Around 1830 Friedrich Schleiermacher founded the discipline of hermeneutics as a response to the complexity of texts. In the last decades of the 20th century, it was Jacques Derrida who brought in a new perspective on it. In Deleuzean terms, texts are always and inevitably deterritorialized to a significant portion. Kaski & coworkers addressed only a modest part of these vast problematics, the classification of texts.

The starting point they took was to preserve context. The large variety of contexts makes it impossible to take any kind of raw data directly as input for the SOM. That means that the contexts had to be encoded in a proper manner. The trick is to use a SOM for this encoding (details in the next section below). This SOM represents the first layer. The subject of this SOM are the contexts of words (definition below). The “state” of this first SOM is then used to create the input for the SOM on the second layer, which then addresses the texts. In this way, the size of the input vectors is standardized and reduced.

Elements of a Two-Layer SOM

The elements, or building blocks, of a TL-SOM devised for the classification of texts are

  • (1) random contexts,
  • (2) the map of categories (word classes)
  • (3) the map of texts

The Random Context

A random context encodes the context of any of the words in a text. Let us assume for the sake of simplicity that the context is bilaterally symmetric according to 2n+1, i.e. for example with n=3 the length of the context is 7, where the focused word (“structure”) is at position 3 (when counting starts with 0).

Let us resort to the following example, which takes just two snippets from this text. The numbers represent some arbitrary enumeration of the relative positions of the words.

sequence A of words:      “… without change of structure is not learning …”
rel. positions in text:        53      54     55     56       57  58   59

sequence B of words:      “… not have any structure inside the storage …”
rel. positions in text:        19   20   21     22        23     24    25

The position numbers we just need for calculating the positional distance between words. The interesting word here is “structure”.

For the next step you have to think of the words as listed in a catalog of indexes, that is, as a set whose order is arbitrary but fixed. In this way, any of the words gets its unique numerical fingerprint.

Index   Word        Random Vector
…       …
1264    structure   0.270  0.938  0.417  0.299  0.991 …
1265    learning    0.330  0.990  0.827  0.828  0.445 …
1266    Alabama     0.375  0.725  0.435  0.025  0.915 …
1267    without     0.422  0.072  0.282  0.157  0.155 …
1268    storage     0.237  0.345  0.023  0.777  0.569 …
1269    not         0.706  0.881  0.603  0.673  0.473 …
1270    change      0.170  0.247  0.734  0.383  0.905 …
1271    have        0.735  0.472  0.661  0.539  0.275 …
1272    inside      0.230  0.772  0.973  0.242  0.224 …
1273    any         0.509  0.445  0.531  0.216  0.105 …
1274    of          0.834  0.502  0.481  0.971  0.711 …
1275    is          0.935  0.967  0.549  0.572  0.001 …
…

Any of the words of a text can now be replaced by an apriori determined vector of random values from [0..1]; the dimensionality of those random vectors should be around 80 in order to approximate orthogonality among all those vectors. Just to be clear: these random vectors are taken from a fixed codebook, a catalog as sketched above, where each word is assigned to exactly one such vector.

Once we have performed this replacement, we can calculate the averaged vectors per relative position of the context. In case of the example above, we would calculate the reference vector for position n=0 as the average from the vectors encoding the words “without” and “not”.

Let us be more explicit. Example sentence A we first translate into the positional numbers, interpret each positional number as a column header, and fill the column with the values of the respective fingerprint. For the 7 positions (-3, …, +3) we get 7 columns:

sequence A of words           “… without  change  of     structure  is     not    learning …”
rel. positions in text             53       54      55     56         57     58     59
grouped around “structure”         -3       -2      -1      0          1      2      3
random fingerprints
per position                       0.422    0.170   0.834  0.270      0.935  0.706  0.330
                                   0.072    0.247   0.502  0.938      0.967  0.881  0.990
                                   0.282    0.734   0.481  0.417      0.549  0.603  0.827

…further entries of the fingerprints…

The same we have to do for the second sequence B. Now we have two tables of fingerprints, both comprising 7 columns and N rows, where N is the length of the fingerprint. From these two tables we calculate the average values and put them into a new table (which is of course also of dimensions 7×N). Such, the example above yields 7 averaged reference vectors. If we have a dimensionality of 80 for the random vectors, we end up with a matrix of [r,c] = [80,7].

In a final step we concatenate the columns into a single vector, yielding a vector of 7×80=560 variables. This might appear as a large vector. Yet, it is much smaller than the whole corpus of words in a text. Additionally, such vectors can be compressed by the technique of random projection (mathematical foundations by [3], first proposed for data analysis by [4], utilized for SOMs later by [5] and [6]), which today is quite popular in data analysis. Random projection works by matrix multiplication. Our vector (1R × 560C) gets multiplied with a matrix M(r) of 560R × 100C, yielding a vector of 1R × 100C. The matrix M(r) also consists of flat random values. This technique is very interesting, because no relevant information is lost, but the vector gets shortened considerably. Of course, in an absolute sense there is a loss of information. Yet, the SOM only needs the information which is important to distinguish the observations.
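A minimal sketch of this encoding step in Python (using only numpy; the toy corpus, the dimensionalities and the function names are illustrative assumptions, not taken from the WEBSOM work):

```python
# Random-context encoding of a focus word, followed by random projection.
import numpy as np

rng = np.random.default_rng(42)
DIM  = 80     # dimensionality of the random word fingerprints
N    = 3      # context of 2n+1 = 7 words
PROJ = 100    # target dimensionality after random projection

corpus = [
    "without change of structure is not learning".split(),
    "not have any structure inside the storage".split(),
]
vocabulary = sorted({w for sentence in corpus for w in sentence})
codebook = {w: rng.random(DIM) for w in vocabulary}   # fixed random fingerprints

def context_vector(sentences, focus):
    """Average the fingerprints per relative position around `focus`, then concatenate."""
    slots = [[] for _ in range(2 * N + 1)]
    for words in sentences:
        for i, w in enumerate(words):
            if w != focus:
                continue
            for offset in range(-N, N + 1):
                j = i + offset
                if 0 <= j < len(words):
                    slots[offset + N].append(codebook[words[j]])
    return np.concatenate([np.mean(s, axis=0) if s else np.zeros(DIM) for s in slots])

ctx = context_vector(corpus, "structure")   # shape (7 * 80,) = (560,)
R = rng.random((ctx.size, PROJ))            # flat random projection matrix M(r)
compressed = ctx @ R                        # shape (100,)
print(ctx.shape, compressed.shape)
```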

This technique of transferring a sequence made from items encoded on a symbolic level into a vector that is based on random contexts can of course be applied to any symbolic sequence.

For instance, it would be a drastic case of reductionism to conceive of the path taken by humans in an urban environment just as a sequence of locations. Humans are symbolic beings and the urban environment is full of symbols to which we respond. Yet, for the population-oriented perspective any individual path is just a possible path. Naturally, we interpret it as a random path. The path taken through a city needs to be described both by location and symbol.

The advantage of the SOM is that the random vectors that encode the symbolic aspect can be combined seamlessly with any other kind of information, e.g. the locational coordinates. That’s the property of multi-modality. Which particular combination of “properties” is then suitable to classify the paths for a given question is subject to “standard” extended modeling as described in the chapter Technical Aspects of Modeling.

The Map of Categories (Word Classes)

From these random context vectors we can now build a SOM. Similar contexts will arrange in adjacent regions.

A particular text can now be described by its differential abundance across that SOM. Remember that we have sent the random contexts of many texts (or text snippets) to the SOM. To achieve such a description a (relative) frequency histogram is calculated, which has as many classes as the SOM has nodes. The values of the histogram are the relative frequencies (“probabilities”) of the presence of a particular text in comparison to all other texts.

Any particular text is now described by a fingerprint, that contains highly relevant information about

  • – the context of all words as a probability measure;
  • – the relative topological density of similar contextual embeddings;
  • – the particularity of texts across all contextual descriptions, again as a probability measure;

Those fingerprints represent texts and they are ready-mades for the final step, “learning” the classes by the SOM on the second layer in order to identify groups of “similar” texts.
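The following is a minimal sketch of this second half of the TL-SOM in Python (again assuming numpy and minisom; the map sizes, the helper names, and the assumption that every text arrives as an array of compressed context vectors are all illustrative, building on the previous sketch):

```python
# Describe each text by the histogram of its contexts over the first-layer map,
# then train the second-layer SOM on those fingerprints.
import numpy as np
from minisom import MiniSom

def text_fingerprint(word_som, ctx_vectors, nodes=(12, 12)):
    """Relative frequency of a text's contexts over the nodes of the first layer."""
    hist = np.zeros(nodes)
    for v in ctx_vectors:
        hist[word_som.winner(v)] += 1.0
    return (hist / max(hist.sum(), 1.0)).ravel()        # one histogram class per node

def build_tl_som(ctx_per_text, dim=100):
    """ctx_per_text: one array of compressed context vectors per text (assumed input)."""
    word_som = MiniSom(12, 12, dim, sigma=2.0, learning_rate=0.5)   # layer 1: word contexts
    word_som.train_random(np.vstack(ctx_per_text), 10000)
    fingerprints = np.array([text_fingerprint(word_som, c) for c in ctx_per_text])
    text_som = MiniSom(8, 8, fingerprints.shape[1], sigma=1.5, learning_rate=0.5)  # layer 2: texts
    text_som.train_random(fingerprints, 10000)
    return word_som, text_som, fingerprints
```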

It is clear, that this basic variant of a Two-Layer SOM procedure can be improved in multiple ways. Yet, the idea should be clear. Some of those improvements are

  • – to use a fully developed concept of context, e.g. this one, instead of a constant length context and a context without inner structure;
  • – evaluating not just the histogram as a foundation of the fingerprint of a text, but also the sequence of nodes according to the sequence of contexts; that sequence can be processed using a Markov-process method, such as HMM, Conditional Random Fields, or, in a self-similar approach, by applying the method of random contexts to the sequence of nodes;
  • – reflecting at least parts of the “syntactical” structure of the text, such as sentences, paragraphs, and sections, as well as the grammatical role of words;
  • – enriching the information about “words” by representing them not only in their observed form, but also by their close synonyms, or by adding pointers to semantically related words, as can be taken from labeled corpora.
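
Referring to the second item of the list above: the simplest Markov-flavoured stand-in for such a sequence evaluation is a matrix of transition frequencies between SOM nodes. This is a sketch under that simplifying assumption, not the HMM or CRF variants mentioned, and all names are illustrative.

```python
import numpy as np

def transition_fingerprint(node_sequence, n_nodes: int) -> np.ndarray:
    """First-order transition frequencies between SOM nodes: a very simple,
    Markov-flavoured supplement to the plain histogram."""
    T = np.zeros((n_nodes, n_nodes))
    for a, b in zip(node_sequence[:-1], node_sequence[1:]):
        T[a, b] += 1.0
    return (T / T.sum()).ravel()   # flattened, ready to be appended to the fingerprint

# A toy sequence of best-matching nodes, as obtained while scanning a text's contexts.
seq = [3, 3, 7, 2, 7, 3, 1, 1, 2]
print(transition_fingerprint(seq, n_nodes=8).shape)   # (64,)
```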

We want to briefly return to the first layer. Just imagine not measuring the histogram, but instead following the indices of the contexts across the developed map with your fingertips. A particular path, or virtual movement, appears. I think that it is crucial to reflect this virtual movement in the input data for the second layer.

The reward could be significant, indeed. It offers nothing less than a model for conceptual slippage, a term which has been emphasized by Douglas Hofstadter throughout his research on analogical and creative thinking. Note that in our modified TL-SOM this capacity is not an “extra function” that had to be programmed. It is deeply built “into” the system, or in other words, it makes up its character. Besides Hofstadter’s proposal, which is based on a completely different approach and devised for a different task, we do not know of any other system that would be capable of that. We may even expect that the efficient production of metaphors can be achieved by it, which is not an insignificant goal, since all practiced language is always metaphorical.

Associative Storage

We already mentioned that the method of the TL-SOM extracts important pieces of information about a text and represents them as a probabilistic measure. The SOM does not contain the whole piece of text as a single entity, or as a series of otherwise unconnected entities, the words. The SOM breaks the text up into overlapping pieces, or better, into overlapping probabilistic descriptions of such pieces.

It would be a serious misunderstanding to perceive this splitting into pieces as a drawback or failure. It is the mandatory prerequisite for building an associative storage.

Any further target-oriented modeling would refer to the two layers of a TL-SOM, but never to the raw input text. Thus it can work reasonably fast for a whole range of different tasks. One of those tasks that can be solved by a combination of associative storage and true (targeted) modeling is to find an optimized model for a given text, or any text snippet, including the identification of the discriminating features. We also can turn the perspective around, addressing the query to the SOM about an alternative formulation in a given context…

From Associative Storage towards Memory

Despite its power and its potential as associative storage, the Two-Layer SOM still can’t be conceived as a memory device. The associative storage just takes the probabilistically described contexts and sorts them topologically into the map. In order to establish “memory,” further components are required that provide the goal orientation.

Within the world of self-organizing maps, simple (!) memories are easy to establish. We just have to combine a SOM that acts as associative storage with a SOM for targeted modeling. The distinctive feature of that second SOM for modeling is that it does not work on external data, but on “data” as it is available in and as the SOM that acts as associative storage.

We may establish a vivid memory in its full meaning if we establish the following components: (1) targeted modeling via the SOM principle, (2) a repository of the targeted models that have been built from (or using) the associative storage, and (3) at least a partial operationalization of a self-reflective mechanism, i.e. a modeling process that models the working of the TL-SOM itself. Since in our framework the basic SOM module is able to grow and to differentiate, there is in principle no limitation any more for such a system concerning its capability to build concepts, models, and (logical) habits for navigating between them. Later, we will call the “space” where this navigation takes place the “choreosteme”: Drawing figures into the open space of epistemic conditionability.

From such a memory we may expect dramatic progress concerning the “intelligence” of machines. The only questionable thing is whether we should call such an entity still a machine. I guess, there is neither a word nor a concept for it.


Notes

1. Self-organizing maps have some amazing properties on the level of their interpretation, which they share especially with Markov models. As such, the SOM and Markov models are outstanding. Both the SOM and the Markov model can be conceived as devices that turn programming statements, i.e. all the IF-THEN-ELSE statements occurring in a program, into data. Even logic itself, or more precisely, any quasi-logic, is thereby transformed into data. SOM and Markov models are double-articulated (a Deleuzean notion) into logic on the one side and the empiric on the other.

In order to achieve this, full write access is necessary to the extensional as well as the intensional layer of a model. Hence, neither artificial neural networks nor, of course, statistical methods like PCA can be used to achieve the same effect.

2. It is quite important not to forget that (in our framework) information is nothing that “is out there.” If we follow the primacy of interpretation, for which there are good reasons, we also have to acknowledge that information is not a substantial entity that could be stored or processed. Information is nothing else than the actual characteristics of the process of interpretation. These characteristics can’t be detached from the underlying process, because this process is represented by the whole system.

3. Keep in mind that we can only talk about modeling in a reasonable manner if there is an operationalization of the purpose, i.e. if we perform target-oriented modeling.

  • [1] Werner Heisenberg. Uncertainty Principle.
  • [2] Samuel Kaski, Timo Honkela, Krista Lagus, Teuvo Kohonen (1998). WEBSOM – Self-organizing maps of document collections. Neurocomputing 21 (1998) 101-117.
  • [3] W.B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability, volume 26 of Contemporary Mathematics, pages 189–206. Amer. Math. Soc., 1984.
  • [4] R. Hecht-Nielsen. Context vectors: general purpose approximate meaning representations self-organized from raw data. In J.M. Zurada, R.J. Marks II, and C.J. Robinson, editors, Computational Intelligence: Imitating Life, pages 43–56. IEEE Press, 1994.
  • [5] Papadimitriou, C. H., Raghavan, P., Tamaki, H., & Vempala, S. (1998). Latent semantic indexing: A probabilistic analysis. Proceedings of the Seventeenth ACM Symposium on the Principles of Database Systems (pp. 159-168). ACM press.
  • [6] Bingham, E., & Mannila, H. (2001). Random projection in dimensionality reduction: Applications to image and text data. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 245-250). ACM Press.

۞

A Model for Analogical Thinking

February 13, 2012 § Leave a comment

Analogy is an ill-defined term, and quite naturally so.

If something is said to be analogous to something else, one supposes more than just a distant resemblance. Such a similarity could indeed be described by a similarity mapping, based on a well-identified function. Yet, analogy is more than that, much more than just similarity.

Analogical thinking is significantly different from determining similarity, or from “selecting” a similar thought. Actually, it is not a selection at all, and it is also not based on modeling. Although it is based on experience, it is not based on identifiable models. We may even conclude that analogical thinking is (1) non-empiric, and (2) based on a constructive use of theories. In other words, and derived from our theory of theory, an analogy is itself a freshly derived model! Yet, that model is of a particular kind, since it does not contain any data. In other words, it looks like the basic definition to be used as the primer for surrogating simulated data, which in turn can be used to create SOM-based expectancy. We may also say it in simple terms: analogical thinking is about producing an ongoing stream of ideas. Folding, inventing.

In 2006, on the occasion of the yearly Presidential Lecture at Stanford, Douglas Hofstadter gave some remarkable statements in an interview, which I’d like to quote here, since they express some issues we have argued for throughout our writings.

I knew some people wouldn’t like what I was going to say, since it was diametrically opposed to the standard cognitive-science party line that analogy is simply a part of “reasoning” in the service of some kind of “problem solving” (makes me think of doing problem sets in a physics course). For some bizarre reason, don’t ask me why, people in cognitive science only think of analogies in connection with what they call “analogical reasoning” — they have to elevate it, to connect it with fancy things like logic and reasoning and truth. They don’t seem to see that analogy is dirt-cheap, that it has nothing to do with logical thinking, that it simply pervades every tiny nook and cranny of cognition, it shapes our every thinking moment. Not seeing that is like fish not perceiving water, I guess.

[…]

The point is that thinking amounts to putting one’s finger on the essence of situations one is faced with, which amounts to categorization in a very deep and general sense, and the point is that in order to categorize, one has to compare situations “out there” with things already in one’s head, and such comparisons are analogies. Thus at the root of all acts of thought, every last one of them, is the making of analogies. Cognitive scientists talk a lot about categories, but unfortunately the categories that they study are far too limited. For them, categories are essentially always nouns, and not only that, they are objects that we can see, like “tree” or “car.” You get the impression that they think that categorization is very simple, no more complicated than looking at an object and identifying the standard “features”

The most salient issues here are, in preliminary terms for the time being

  • (1) Whenever thinking happens, this thinking is performed as “making analogy”; there are no “precise = perfectly defined” items in the brain or the mind.
  • (2) Thinking cannot be equated with problem-solving and logic;
  • (3) Comparison and categorization are the most basic operations, while both of these take place in a fluid, open manner.

It is a fallacy to think that there is analogical reasoning and some other “kinds” (yet the journals and libraries are filled with this kind of shortcoming, e.g. [1,2,3]), or vice versa, that there is logical reasoning and something other, such as analogical reasoning. Thinking so would mean putting logic into the world (of neurons, in this case). Yet, we know that logic which is free from interpretive parts can’t be part of the real world. We always and inevitably deal just and only with quasi-logic, a semantically contaminated instance of transcendental logic. The issue here is not just one about wording. It is the self-referentiality that is always present in multiple respects when dealing with cognitive capacities that forces us not to be too lazy regarding the wording. Dropping the claim posited by the term “analogical reasoning” we quickly arrive at the insight that all thinking is analogical, or even, that “making analogies” is the label for the visible parts of the phenomenon that we call thinking.

The next issue mentioned by Hofstadter is about categories. Categories in the mind are NOT about the objects; they are not even about any external reference. According to Hofstadter, this is a major and widespread misunderstanding among cognitive scientists. It is clear that such referentialism, call it materialism, or naive realism, is conceptually primitive and inappropriate. We also could refer to Charles Peirce, the great American philosopher, who repeatedly stated (and was the first to do so) that signs always refer only to signs. Yet, signs are not objects, of course. Similarly, in §198 of the Philosophical Investigations Wittgenstein notes

[…] any interpretation still hangs in the air along with what it interprets, and cannot give it any support. The interpretations alone do not determine the meaning.

The materiality of a road sign should not be taken as its meaning; the semiotic sign associated with it is only partially to be found in the matter standing there near the road. The only thing that ties “us” (i.e. our thinking) to the world is modeling, whether this world is taken as the external or the internal one; it is the creation of tools (the models) for anticipation, given the expectation of weak repeatability. That means we have to believe and to trust in the first instance, which yet cannot really be taken as “ties.”

The kind of modeling we have in mind is neither covered by model theory, nor by the constructivist framework, nor by the empiricist account of it. Actually, even though we can at least go on believing in an ontology of models (“a model is…”) as long as we play blind man’s buff, modeling can nevertheless not be separated from entities like concepts, code, mediality or virtuality. We firmly believe in the impossibility of reductionism when it comes to human affairs. And, I guess, we could agree upon the proposal that thinking is such a human affair, even if we are going to “implement” it into a “machine.” In the end, we anyway just strive for understanding ourselves.

It is thus extremely important to investigate the issue of analogy in an appropriate model system. The only feasible model that is known up to these days is that which has been published by Douglas Hofstadter and Melanie Mitchell.

They describe their results on a non-technical level in the wonderful book “Fluid Concepts and Creative Analogies”, while Mitchell focuses on more technical aspects and the computer experiments in “Analogy-Making as Perception: A Computer Model”.

Though we do not completely agree with their theoretical foundation, particularly the role they ascribe to the concept of perception, their approach is nevertheless brilliant. Hofstadter discusses in a very detailed manner why other approaches are either fakes or failures (see also [4]), while Mitchell provides a detailed account of the model system, which they call “CopyCat”.

CopyCat

CopyCat is the last and most advanced variant of a series of similar programs. It deals with a funny example from the letter domain. The task that the program should solve is the following:

Given the transformation of a short sequence of letters, say “abc”, into a similar one, say “abd”, what is the result of applying the very “same” transformation to a different sequence, say “ijk”?

Most people will answer “ijl”. The rule seems to be “obvious”. Yet, there are several solutions, though there is a strong propensity towards “ijl”. One of the other solutions would be “ijd”. However, this solution is “intellectually” not appealing, notably for humans…
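
This is emphatically not how Copycat works; as explained below, its solutions are produced, not retrieved from stored rules. Still, a naive sketch of two hard-coded candidate rules makes it tangible why the same example admits more than one consistent answer.

```python
def successor(ch: str) -> str:
    """The next letter in the alphabet (no wrap-around, for simplicity)."""
    return chr(ord(ch) + 1)

# Two of several rules that are consistent with the example "abc" -> "abd".
rules = {
    "replace the last letter by its successor": lambda s: s[:-1] + successor(s[-1]),
    "replace the last letter by the letter d":  lambda s: s[:-1] + "d",
}

for description, rule in rules.items():
    assert rule("abc") == "abd"            # both rules explain the given example
    print(f"{description}: ijk -> {rule('ijk')}")
# prints "ijl" and "ijd": the same evidence, two different analogies
```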

Astonishingly, CopyCat reproduces the probabilistic distribution of solutions provided by a population of humans.

The following extracts show further examples.

Now the important question: How does CopyCat work?

First, the solutions are indeed produced; there is no stored collection of solutions. Thus, CopyCat derives its proposals not from experience that could be expressed by empiric models.

Second, the solutions are created by a multi-layered, multi-component random process. In some respects it is reminiscent of the behavior of a developed ant colony, where different functional roles are performed by different sub-populations. In other words, it is more than just swarming behavior; there is division of labor among different populations of agents.

The third and most important component is a structure that represents a network of symmetry relations, i.e. a set of symmetry relations arranged as a graph, where the weights of the relations are dynamically adapted to the given task.

Based on these architectural principles, Copycat produces its answers, and it is indeed a creative production.

Conclusion

Of course, Copycat is a model system. The greatest challenge is to establish the “Platonic” sphere (Hofstadter’s term) that comprises the dynamical relational system of symmetry relations. In a real system, this sphere of relations has to be fed by other parts of the system, most likely the modeling sub-system. This sub-system has to be able to extract abstract relations from data, which then could be assembled into the “Platonic” device. All of those functional parts can be covered or served by self-organizing maps, and all of them are planned. The theory of these parts you may find scattered throughout this blog.

Copycat has been created as a piece of software. Years ago, it was publicly available from the ftp site of the University of Illinois at Urbana-Champaign. Then it vanished. Fortunately, I was able to grab it before it disappeared.

Since I rate this piece as one of the most important contributions to machine-based episteme, I created a mirror for downloading it from Google Code. But be aware: so far, you need to be a programmer to run it, since it requires a development environment for the Java programming language. You can check it out from the source repository behind the link given above. In the near future I will provide a version that runs more easily as a standalone program.

  • [1] David E. Rumelhart, Adele A. Abrahamson (1972). A model for analogical reasoning. Cognitive Psychology 5(1), 1-28.
  • [2] John F. Sowa and Arun K. Majumdar (2003). “Analogical Reasoning,” in: A. Aldo, W. Lex, & B. Ganter, eds. (2003) Conceptual Structures for Knowledge Creation and Communication, LNAI 2746, Springer-Verlag, pp. 16-36. Proc Intl Conf Conceptual Structures, Dresden, July 2003.
  • [3] Morrison RG, Krawczyk DC, Holyoak KJ, Hummel JE, Chow TW, Miller BL, Knowlton BJ. (2004). A neurocomputational model of analogical reasoning and its breakdown in frontotemporal lobar degeneration. J Cogn Neurosci. 16(2), 260-71.
  • [4] Chalmers, D. J., R. M. French, & D. R. Hofstadter (1992). High-level perception, representation, and analogy: A critique of artificial intelligence methodology. J Exp & Theor Artificial Intelligence 4, 185-211.

۞
