February 4, 2012 § 1 Comment
It is the duality of persistent, quasi-material yet simulated structures
and the highly dynamic, volatile and-most salient-informational aspects that are so characteristic for learning entities like Self-Organizing Maps (SOM) or Artificial Neural Networks (ANN). It should not be regarded as a surprise that the design of manifold aspects of the persistent, quasi-material part of SOM or ANN is quite influential and hence also important.
Here we explore some of the aspects of that design. Sure, there is something like a “classic” version of the SOM, named after its inventor, the so-called “Kohonen-SOM.” Kohonen developed several slightly different SOM mechanisms over many years, starting with statistical covariance matrices. All of them comprise great ideas, for sure. Yet, in a wider perspective it is clear that there are many properties of the SOM that are presumably quite sub-optimal for realizing a generally applicable learning mechanism.
The Elements of SOMs
We shall recapitulate very briefly the principle of SOM below, more detailed descriptions can be found in many places in the Web (one of the best for the newbie, with some formulas and a demo software: ai-junkie), see also our document here that relates some issues to references, as well as our intro in plain language.
Yet, the question beyond all the mathematical formula stuff is: “What are the elements of a SOM?”
We propose to distinguish the following four basic elements:
- (1) a Collection of Items
that have memory for observations, or reflecting them, where all the items start with the same structure for these observations (items are often called “nodes”, or in a more romantic attitude “neurons”);
- (2) the Spatial Layout Principles
and the relational arrangement of this items;
- (3) an Influence Mechanism
that link the items together, and which together with the spatial layout defines the topology of the piece;
- (4) a Perceptional Mechanism
that introduces observations into the SOM in a particular manner.
In the case of the SOM these elements are configured in a way that creates a particular class of “learning” that we can describe as competitive-collaborative abstraction.
Those basic elements of a SOM can be parameterized—and thus also implemented—in very different ways. If we would take only the headlines of that list we could also subsume artificial neural networks (ANN) with these elements. Yet, even the items of a SOM and those of a ANN are drastically different. Else, the meaning of concepts like “layout” or “influence mechanism” are very different. This results in a completely different architecture regarding the relation of the “data”, or if you like potential observations, and the structure (SOM or ANN). Basically, ANNs are analytic,which means that the abstraction is (has to be done) done before the interaction of the structure with the data. In strong contrast to this approach, SOM build up an abstraction while interacting with the data. This abstraction is mostly consisting of the transition from extensional data to intensional representation. Thus SOM are able to find a structure, while ANN only can move within the apriori defined structure. In contrast to ANN, SOM are associative mechanisms (which is the reason why we are so fond of them)
Yet, it is also true for SOMs that the parametrization of the instances of the four elements as listed above have a significant influence on the capabilities and the potential of the resulting actual associative structure. Note that the design of the internals of the SOM does not refer to the issues of the usage or the embedding of the SOM into a wider context of modeling, or the structure of modeling itself.
In the following we will discuss the usual actualizations of those four elements, the respective drawbacks and better alternatives.
The SOM itself
Often one can find schematic representations like the one shown in the following figure 1:
Then this is usually described in this way: “The network is created from a 2D lattice of ‘nodes’, each of which is fully connected to the input layer.”
Albeit this is a possible description, it is a highly misleading one, with some quite unfavorable consequences: as we will see, it hides some important opportunities offered by the SOM mechanism.
Instead of speaking in an opaque manner about the “input layer” we simply can use the concept of “structured observations”. The structure is just given by the features used to establish or describe the observations. The important step that simplifies everything is to give all the nodes the same structure as the observations, at least in the beginning and as the trivial case; we will see that both assumptions may “develop away” as an effect of self-organization.
Anyway, the complicated connectivity in figure 1 changes into the following structure for the simple case:
Figure 2: An interpretation of the SOM grid, where the nodes are stuffed with the same structure (ordered set of variables) as the observations. This interpretation allows for a localizing of structures that is not achievable by the standard interpretation as shown in Fig.1.
To see what we gain by this change we have to visit briefly and partially the SOM mechanism.
The SOM mechanism compares a particular “incoming” observation to “all” nodes and determines a best matching node. The intensional part of this node then gets changed as a function of the given weight vector and the new observation. Some kind of intermediate between the observational vector and the intensional vector of the node is established. As a consequence, the nodes develop different intensional descriptions. This change upon matching with an observation then will be spread in the vicinity of the selected node, decaying with the distance, while this distance additionally is shrinking with increasing duration of the learning process. This is called the lateral control mechanism (LCM) by Kohonen (see Kohonen’s book 2001 p.179). This LCM is one of the most striking differences to so-called artificial neural networks (ANN).
It is now rather straightforward to think that the node keeps the index of the matching observation in its local memory. Over the course of learning, a node collects many records, which are all similar. This gathering of observations into an explicit collection is one of the MOST salient differences of our interpretation of the SOM to most of the standard interpretations!
Figure 3: As Fig.2, showing the extensional container of one of the nodes.
The consequences are highly significant: The SOM is not a tool for visualization any more, it is a mechanism with inherent and nevertheless transparent abstraction! To be explicit: While we retain the full power of the SOM mechanism we also not only get an explicit clustering, but even the opportunity for a fully validated modeling, inclusive a full description of the structure of the risk of mis-classification, hence there is no “black box” any more (as in contrast say to ANN, or even statistical methods).
Now we can see what we gained from changing the description, dropping the unholy concept of “input layer.” It now becomes clearly visible that nodes can be conceived of as containers, comprised of an extensional and an intensional part (as Carnap used the terms). The intensional part is what usually is called the weight vector of a node.The extensional part is the list of observations matching this intension.
The intensional part of a node thus represents a type. The extensional part of our revised SOM node represents the matching tokens.
But wait! As it is usual done, we called the intensional part of the node the “weight vector”. Yet, this is a drastic misnomer. It is not “weights” of the variables. It is simply a value that can be calculated in different ways, and which is influenced from different sides. It is a function of
- – the underlying extensional part = the list of records;
- – the similarity functional that is used for this node
- – the general network dynamics;
- – any kind of dynamic rule relating the new observation.
It is thus much more adequate to talk about an “intensionality profile” than about weights. Of course, we can additionally introduce real “weights” for each of the positions in a structure profile vector.
A second important advance of dropping this bad concept of “input layer” is that we can localize this function that results in the actualization of the intensional part of the node. For instance, we can localize the similarity function. As part of the similarity function we could even consider to implement a dynamic rule (dependent on the extensional content of the node) that excludes certain positions = variables as arguments from the determination of the similarity!
The third important consequence is that we created a completely new compartment, the “extensional container” of a node. Using the concept of “input layer” this compartment is simply not visible. Thus, the concept of the input layer violates central insights from the theory of epistemic action.
This “extensional container” is not just a list of records. We can conceive it as a “functional” compartment, that allows for a great deal of new flexibility and dynamics. This inner dynamics could be used to create new elements of the intensional part of the node, e.g. about the variance of the tokens contained in the “extensionality container”. Or about their relation as measured by the correlation. In fact, we could use any mechanism to create new positions in the intensional profile of node, even the properties of an embedded SOM, a small population of artificial neurons, the result parameters of statistical functions taking the list of observations as input and so on.
It is quite important to understand that the particular dynamics in the extensionality container is purely local. Notably the possibility for this dynamics also makes it possible to implement local differentiation of the SOM network, just as it is induced by the observations itself.
There is even a fourth implication of dropping the concept of input layer, which lead us to the separation between intensional and extensional aspects. This implication concerns the numerical production of the intensionality profile. Obviously we can regard the transition from the extensional description to the intensional representation. This abstraction, as any, is accompanied by a loss of information. Referring to the collection of intensional representations means to use them as a model. It is now very important to recognize that there is no explicit down-stream connection to the observations any more. All we have at our disposal are intensional representations that emerged as a consequence of the interaction of three components: (1) the observations, (2) the quasi-material aspects of the modeling procedure(particularly the associative part of it, of course), and (3) the imposed target/risk settings.
As a consequence we have to care explicitly about the variance structure within the extensional containers. More precisely, the internal variance of the extensional containers have to be “comparable.” If we would not care about that, we could not consider the intensional representations as comparable. We simply would compare apples with oranges, since some of the intensional representations simply would represent “a large mess”. On the level of intensionality profile one can’t see the variance anymore, hence we have to avoid the establishment of extensional groups (“micro-clusters”) that do not collect observations that are “similar” with regard to their descriptional values vector (inside the apriori given space of assignates). Astonishingly, this requirement of a homogenized extensional variance measure is overlooked even by Kohonen and his group, not to mention the implementations by countless epigonal fellows. It is clear that only the explicit distinction between intensional and extensional part of a model allows for the visibility of this important structural element.
Finally, and as a fifth consequence, we would like to emphasize that the explicit distinction between intensional and extensional parts opens the road towards a highly interesting region. We already mentioned that the transition from extensional description to intensional representation is a kind of abstraction. Yet, it is a simple kind of abstraction, closely tied to quasi-material aspects of the associative mechanism.
We may, however, easily derive the production of idealistic representations from that, if not even to say “ideas” in the philosophical sense. To achieve that we just have to extend the SOM with a production facility, the capability to simulate. This is of course not a difficult task. We will describe the details elsewhere (essay is scheduled), thus just a brief outline here. The “trick” is to use the intensional representations as seeds for generating surrogate observations by means of a Monte-Carlo simulation, such that the variance of the observations is a bit smaller than that of the empiric observations. Both, the empiric and surrogated “data” (nothing is “given” in the latter case) share the same space of assignates. The variance threshold can be derived dynamically from the SOM itself, it need not be predetermined at implementation time. As the next step one drops the extensional containers of the SOM and feeds the simulated data into it. After several loops of such self-referential modeling the intensional descriptions have “lost” their close ties to empirical data, yet, they are not completely unrelated. We still may use it as a kind of “template” in modeling, or for instance as a kind of null-model. In other words, the SOM contains the first traces of Platonic ideas.
Modeling. What else?
Above we emphasized that the SOM provides the opportunity for a fully validated modeling if we distinguish explicitly intensional and extensional parts in the make-up of the nodes. The SOM is, however, a strange thing, that can act in completely different ways.
In the chapter about modeling we concluded that a model without a purpose is not a model, or it is at most a strongly deficient model. Nevertheless, many people claim to create models without implying a purpose to the learning SOM. They call it “unsupervised clustering”. This is, of course, nonsense. It should be called more appropriately, “clustering with a deliberately hidden purpose,” since all the parameters of the SOM mechanisms and even the implementation act as constraints for the clustering, too. Any clustering mechanism applies a lot of criteria that influence the results. These constraints are supervised by the software, and the software has been produced by a human being (often called programmer), so this human being is supervising the clustering with a long arm. For the same reason one can not say the SOM is learning something and also not that we would train the SOM, without giving it a purpose.
Though the digesting of information by a SOM without a purpose being present is neither modeling nor learning, what can we conceive such a process as then?
The answer is pretty simple, and remember it becomes visible only after having dropped illegitimate ascriptions of mistaken concepts. This clustering has a particular epistemological role:
Self-organizing Maps that are running without purpose (i.e. target variables) are best described as associative storage devices. Nothing more, but above all, also nothing less.
Actually, this has to be rated as one of the greatest currently unrecognized opportunities in the field of machine learning. The reason is again inadequate wording. Of course, the input for such a map should be probabilized (randomized), and it has been already demonstrated how to accomplish this… guess by whom… by Teuvo Kohonen himself, while he was inventing the so-called WebSom. Kohonen proposed random neighborhoods for presenting snippets of texts to the SOM, which are a simple version of random contexts.
Importantly, once one recognizes the categorical differences between the target oriented modeling and the associative storage, it becomes immediately clear that there are strictly different methodological, hence quasi-morphological requirements. Astonishingly, even Kohonen himself, and any of his fellows as well, did not recognize the conceptual difference between the two flavors. He used SOMs created without target variable, i.e. without implying a purpose, as models for performing selections. Note that the principal mechanism of the SOM is the same for both approaches. There are just differences in the cost function(s) regarding the selection of variables.
There should be no doubt that any system intended to advance towards an autonomous machine-based episteme has to combine the two mechanism. There are sill other mechanisms, such like virtual movements, or virtual sequences in the abstract SOM space (we will describe that elsewhere), or the self-referential SOM for developing “crisp ideas”, but such a combination of associative storage and target oriented modeling is definitely inevitable (in our perspective… but we have strong arguments!).
SOM and Self-Organization
A small remark should be made here: Self-organizing maps are not in the same strong sense self-organizing as for instance Turing systems, or other Reaction-Diffusion Systems (RDS). A SOM gets organized by the interaction of its mechanisms and structures and the data. A SOM does not create patterns by it-SELF. Without feeding data into it, nothing happens, in stark contrast to self-organizing systems in the strong sense (see the example we already cited here), or take a look here from where we reproduced this parameter map for Gray-Scott Models.
Figure 4: The parameter map for Gray-Scott models, a particular Reaction-Diffusion System. Only for certain combinations of the two parameters of the system interesting patterns appear, and only for part of them the system remains dynamical, i.e. changing the layout of the patterns continuously.
As we discuss it in the chapter on complexity, it is pretty clear which kind of conditions must be at work to create the phenomenon of self-organization. None of them is present in Self-Organizing Maps; above all, SOMs are neither dissipative, nor are there antagonist influences.
Yet, it is not too difficult to create a self-organizing map that is really self-organizing. What is needed is either a second underlying process or inhibitory elements organized as population. In natural brains, we find both kinds of processes. The key for choosing the right starting point for implementing a system that is showing the transition from SOM to RDS is the complete probabilization of the idea of the network.
Our feeling is that at least one of them is mandatory in order to allow the system to develop logic as a category in an autonomous manner, i.e. not pre-programmed. As any other understanding, the ability to think in logical terms, or using logic as a category should not be programmed into a computer. That ability should emerge from the implemented conditions. Our claim that some concept is quite the opposite to something other is quite likely based on such processes. It is highly indicate in this context that the brain is indeed showing Turing patterns on the level of activity patterns, i.e. the patterns are not made of material entities, but are completely immaterial. Else, like in chemical clocks like the Belousov-Zhabotinsky system, another RDS, the natural brain shows a strong rhythmicity, both in its “local” activity patterns, as well as in the overall activity, affecting billions of cells at a time.
So far, the strong self-organization is not implemented in our FluidSOM.
Spatial Layout Principles
The spatial layout principle is a very important design aspect. It concerns not only the actual geometrical arrangement of nodes, but also their mobility as representations of physical entities. In the case of SOM this has to be taken quite abstract. The “physical entities” represented by the nodes are not neurons. The nodes represent functional roles of populations of neurons.
Usually, the SOM is defined as a collection of nodes that are arranged in a particular topology. This topology may be
- – grid like, 2-(3) dimensional;
- – as kind of a swarm in 2 dimensions;
- – as a gas, freely moving nodes.
The obvious difference between them is the degree of physical freedom for the nodes to move around. In grids, nodes are fixed and cannot move, while in the SOM gas the “nodes” are much more mobile.
There is also a quite important, yet not so obvious commonality between them. Firstly, in all of these layout principles the logical SOM nodes are identical with the “physical” items, i.e. representations of crossings in a grid, swarming entities, or gaseous containers. Thus, the data aspect of the nodes is not cleanly separated from its spatial behavior. If we separate it, the behavior of the nodes and the spatial aspects can be handled more transparently, i.e. the relevant parameters are better accessible.
Secondly, the space where those nodes are embedded is conceived as being completely neutral, as if those nodes would be arranged in deep space. Yet, everything we know of learning entities points to their mediality. In other words, the space that embeds the nodes should not be “empty”.
Using a Grid
In most of the cases the SOM is defined as a collection of nodes that are arrangement as a regular grid (4(8)n, 6n). Think of it as a fixed network like a regular wire fence, or the atomic bonds in a model of a crystal.
This layout is by far the most abundant one, yet it is the most restricted one. It is almost impossible, at least very difficult to make such a SOM dynamic, e.g. to provide it the potential to grow or to differentiate.
The advantage of grids is that it is quite easy to calculate the geometrical distance between the nodes, which is a necessary step to determine the influence between any two nodes. If the nodes are mobile, this measurement requires much much more efforts in terms of implementation.
Using Metaphors for Mobility: Swarms, or Gases
Here, the nodes may range freely. Their movement is strongly influenced (or even) restricted by the moves of its neighbors. Here, experience tells us the flocks of birds, or fishes, or bacteria, do not learn efficiently on the level of the swarm. Structures are destroyed to easy. The same is true for the gas metaphor.
Flexible Phase in a Mediating Space
Our proposal is to render the “phase” flexible according to the requirements that are important in a particular stage of learning. The nodes may be strictly arranged like in a crystal, or quite mobile, they may move around according to physical forces or according to their informational properties like the gathered data.
Ideally, the crystalline phases and the fluid phases are dependent on just a two or three parameters. One example for this is the “repulsive field”, a collection of items in a 2D space which repel each other. If the kinetic energy of those items is not too large, and the range of repellent force is not too low, this automatically leads to a hexagonal pattern. Yet, the pattern is not programmed as an apriori pattern. It is a result of properties of the items (and the embedding space). Such, the emergent arrangement is never affected by something like a “layout defect.”
Inserting a new item or removing one is very easy in such a structure. More important, the overall characteristics of the system does not change despite the fact that the actual pattern changes.
The Collection of Items : “Nodes”
In the classic SOM, nodes serve a double purpose:
- P1 – They serve as container for references that point to records of data (=observations);
- P2 – They present this extensional list in an integrated, “intensional” form ;
The intensional form of the list is simply the weight vector of that node. In the course of learning, the list of the records contained in a particular node will be selected such that they are increasingly similar.
Note that keeping the references to the data records is extremely important. It is NOT part of most SOM implementations. If we would not do it, we could not use the SOM as a modeling tool at all. This might be the reason why most people use the SOM just as visualization tool for data (which is a dramatic misunderstanding)
The nodes are not “directly” linked. Whether they influence each other or not is dependent on the distance between them and the neighborhood function. The neighborhood function determines the neighborhood, and it is a salient property of the SOM mechanism that this function changes over time. Important for our understanding of machine-based epistemology is that the relations between nodes in a SOM are potentially of a probabilistic character.
However, if we use a fixed grid, a fixed distance function, and a deterministically behaving neighborhood function, the resulting relations are not probabilistic any more.
Else, in case of default SOM, the nodes are passive. They even do not perform the calculation of the weight vector, which is performed by a central “update” loop in most implementations. In other words, in a standard SOM a node is a data structure.Here we arrive at a main point in our critique of the SOM
The common concept of a SOM is equivalent to a structural constant.
What we need, however, is something completely different. Even on the level of the nodes we need entities, that can change their structure and their relationality.
The concept of FluidSOM must be based on active nodes.
These active nodes are semi-autonomous. They calculate the weight vector themselves, based either on new input data, or some other “chemical” influences. They may develop a few long-range outgoing fibers or masses of more or less stable (but not “fixed”!) input relations to other nodes. The active meta-nodes in a fluid self-organizing map may develop a nested mini-SOM, or may incorporate any other mechanism for evaluating the data to which it is pointing to, e.g. a small neural network of a fixed structure (see mn-SOM). Meta-nodes also may branch out a further SOM instances locally into relative “3D”, e.g. dependent on its work load, or again, on some “chemical influences”
We see, that meta-nodes are dynamic structures, sth like a category of categories. This flexibility is indispensable for growing and differentiation.
This introduces the seed of autonomy on the lowest possible level. Here, within the almost material processes, it is barely autonomy, it is really a mechanic activity. Yet, this activity is NOT triggered by some reason any more. It is just there, as a property of the matter itself.
We are convinced that the top-level behavioral autonomy is (at least for large parts) an emergent property that grows out of the a-reasonable activity on the micro-material level.
The profile vector of a SOM node usually contains for all mutable variables (non-ID/TV) the average of the values in the extensional list. That is, the profile vector itself does not know anything about TV or index variable… which is solely the business of the Node.
In our case, however, and based on the principle of “strict locality,” the weight vector also may contain a further section, which is referring to dynamic properties of the node, or the data. We introduced this in a different way above when discussing the extensionality container of SOM nodes. For instance, the deviation of the data in the node against a model function (such as a correlation) such internal measurements can not be predefined, and they are also not stable input data since they are constantly changing (due to the list of data in the node, the state of other nodes etc.).
This introduces the possibility of self-referentiality on the lowest possible level. Similar to the case of autonomy, we find the seed for self-referentiality on the topmost-level (call it consciousness…) in midst the material layer.
If there is one lesson we can draw from the studies of naturally occurring brains, then it is the fact that there is no master code between neurons, no “Mentalese.” The brain does not work on the base of its own language. Equivalently, there are no logical circuits implementing logic calculus. As a correlate we can say that the brain is not a thing that consists of a definite wiring. A brain is not a finite state automaton, it does not make any sense to ascribe states to brains. Instead, everything going on in a brain is probabilistic, even on the sub-cellular level. It is not determined in a definite manner, how many vesicles have to burst in a synaptic gap to cause a transmission of the signal, it is not determined how many neurons exactly make up a working group for a particular “function” etc.etc. The only thing we can say is that certain fibers collect from certain “regions”, typically millions of neurons, to other such regions.
Note that any software program IS representable by just such a definite wiring. Hence, what we need is a mechanism that can transcend its own being as mechanism. We already discussed this issue in another chapter, where we identified abstract growth as a possible route to that achievement.
The processing of information in the brain is probabilistic, despite the fact that on the top level it “feels” different for us. Now, when starting to program artificial associative structures that are able to do similar things as a brain can accomplish, we have to respect this principle of probabilization.
We not only have to avoid hard-coded wiring between procedures. We have to avoid any explicit wiring at all. In terms of software architecture this translates into the proposal that we should not rely just on object-oriented programming (OOP). For instance, we would represent nodes in a SOM as objects, and the properties of these objects again would be other objects. OOP is an important, but certainly not a sufficient design element for a machine that shall develop its own episteme.
What we have to actualize in our implementation is not just OOP, but a messaging based architecture, where all elements are only loosely coupled. The Lateral Control Mechanism (LCM) of the Kohonen SOM is a nice example for this, the explicit wiring in ANN is perfect counter-example, a DON’T DO IT. Yet, as we will see in the next section, the LCM should not be considered as a symmetric and structurally constant functional entity!
Concerning programming style, on an even lower level this translates into the heavy use of so-called interfaces, as they are so prevalent in Java. Not objects are wired or passed around, but only interfaces. Interfaces are forward contracts about the standards for the interaction of different parts, that actually can change while the “program” is running.
Of course, these considerations regard only to the lowest, indeed material levels of an associative system, yet, they are necessary. If we start with wires of any kind, we won’t achieve our goals. From the philosophical perspective it does not come as a surprise that the immanence of autonomous abstraction is to be found only in open processes, which include the dimension of mediality. Even in the interaction of its tiniest parts the system should not rely on definite encodings.
During their development, natural systems differentiate in their parts. Bodies are comprised of organs, organs are made of different cell types, within all members of a cell a further differentiation of their actual and context-specific role may occur. The same can be observed in social insects, or any other group of social beings. They are morphologically almost identical, yet, their experience let them do their tasks differentially, or even let them do different tasks. Why then should we assume that all neurons in a large compound should act absolutely equally?
To illustrate the point we should visit a particular African termite species (Schedorhinotermes lamanianus) on which I worked as a young biologist. They are feeding on rodden/rodding wood. Well, since these pieces of wood are much larger than the termites, a problem occurs. The animals have to organize their collective foraging, i.e. where to stay and gnaw onto the wood, and where to travel to return the harvested pieces back to home nest, where they then put it to a processing chamber stuffed with a special kind of fungus. The termites then actually feed that fungus, and mostly not the wood. (though they have also bacteria in their gut to do the job of digesting the cellulose and the lignine.
Important for us is the foraging process. To organize gnawing sites and traveling routes they use pheromones, and no wonder, they use just 2 for that, which build a Turing system, as I proofed with a small bio-test together with a colleague.
In the nervous system of animals we find a similar problematics. The brain is not just a large network, over and over symmetric like a crystal. Of course not. There are compartments (see our chapter about complexity), there are fibers. The various parts of the brain even differ strongly with respect to their topological structure, their “wiring”. Why the heck should an artificial system look like a perfect crystal? In a crystal their will be no stable emergence, hence no structural learning. By the way, we should not expect structural learning in swarms either, for a very similar reason, albeit that reason instantiates in the opposite manner: complete perturbation prevents the emergence of compartments, too, hence no structural learning will be observed (That’s the reason why we do not have swarms in the skull…)
Back to our neurons. We reject the approach of a direct representational simulation of neurons, or parts of the brain. Instead we propose to focus the principles as elements of construction. Any system that is intended to show structural learning, is in urgent need of the basic differentiation into “local” and “tele” (among others). Here we meet even a structural parallelism to large urban compounds.
We can implement the emergence of such fibers in a straightforward manner, if we make it dependent on the occurrence of reproducing / repeating co-excitation of regions. This implies that we have to soften the SOM principle of the “winner-takes-it-all” approach. At least in large networks, any given observation should possibly leave its trace in different regions. Yet, our experience with very large maps indicate that this may happen almost inevitably. We just used very simple observations consisting of only 3 features (r,g, and b, such forming the RGB color triplet) and a large SOM, consisting of around 1’000’000 nodes. The topology was 4n, and the map was placed on a torus (no borders). After approx 200’000 observations, the uniqueness for color concepts started to become eroded. For some colors, two conceptual regions appeared.
In the further development of such SOMs, it is then quite naturally to let fibers grow between such regions, changing the topology of the SOM from that of a crystal to that of a brain. While the first is almost perfectly isotropic in exactly 3 dimensions, the topology of the brain is (due to the functional differentiation into tele-fibres) highly anisotropic in a high and variable dimensionality.
Here we discussed some basic design issues about self-organizing maps and introduced some improvements. We have seen that wording matters when it comes to represent even a mechanism. The issues we touched have been
- – explicit distinction of intensionality and extensionality in the conceptualization of the SOM mechanism, leading to a whole “new” domain of SOM architectures;
- – producing idealistic representations from a collection of extensional descriptions;
- – dynamics in the extensionality domain, including embedding of other structures, thus proceeding to the principle of compartmentalization, functional differentiation and morphological growth;
- – the distinction between modeling and associative storage, which require different morphological structures once they are distinguished;
- – stuffing the SOM with self-organization in the strong sense;
- – spatial layout, fixed rid versus the emergent patterns in a repulsion field of freely moving particles; distinguishing material particles from functional abstract nodes;
- – nodes as active components of the grid;
- – self-referentiality on the microscopic level that gives rise to emergent self-referentiality on the macroscopic level;
- – programming style, which should not only be as abstract (and thus as general) as possible, but also has to proceed from strictly defined, strongly coupled object-oriented style to loosely coupled system based on messaging, even on the lowest levels of implementation, e.g. the interaction of nodes;
- – functional differentiation of nodes, leading to dynamic, fractional dimensionality and topological anisotropy;
Yet, there are still much more aspects that have to be considered if one would try to approach processes on machinic substrate that could be give rise to what we call “thinking.” In discussing the design issues listed above, we remain quite on the material level. But of course, morphology is important. Nevertheless we should not conceive of morphology as a perfect instance of a blueprint, it is more about the potential, if not to say the “virtuality”, that is implied as immanence by the morphology. Beyond that morphology, we have to design the processes of dynamic change of that morphology, which we usually call growth, or tissue differentiation. Even on top of that, we have to think about the informational, i.e. immaterial processes, that only eventually lead to morphological correlates.
Anyway, when thinking about machine-based episteme, we obviously have to forget about crystals and swarms, about perfectness and symmetry in morphological structures. Instead, the design of all of the issues, whether material or immaterial, should be designed with the perspective towards an immanence of virtuality in mind, based on probabilized mechanisms.
In a further chapter (scheduled) we will try to approach two other design issues about the implementation of an advanced Self-organizing Map in more detail that we already mentioned briefly here, again oriented at basic abstract elements and the principles found in natural brains: inhibitory processes and probabilistic negation on the one hand and the chemical milieu on the other. Above we already indicated that we expect a continuum between Self-organizing Maps and Reaction-Diffusion Systems, which in our perspective is highly significant for the working of brains, whether natural or artificial ones.
December 19, 2011 § Leave a comment
Initially, the meaning of ‘associativity’ seems to be pretty clear.
According to common sense, it denotes the capacity or the power to associate entities, to establish a relation or a link between them. Yet, there is a different meaning from mathematics that almost appears as kind of a mocking of the common sense. Due to these very divergent meanings we first have to clarify our usage before discussing the concept.
A Strange Case…
In mathematics, associativity is defined as a neutrality of the results of a compound operation with respect to the “bundling,” or association, of the individual parts of the operation. The formal statement is:
A binary operation ∘ (relating two arguments) on a set S is called associative if it satisfies the associative law:
x∘(y∘z) = (x∘y)∘z for all x, y, z ∊ S
This, however, is just the opposite of “associative,” as it demands the independence from any particular association. If there would be any capacity to establish an association between any two elements of S, then there should not be any difference.
Maybe, some mathematician in the 19th century hated the associative power of so many natural structures. Subsequently, modernism contributed its own part to establish the corruption of the obvious etymological roots.
In mathematics the notion of associativity—let us call it I-associativity in order to indicate the inverted meaning—is an important part of fundamental structures like “classic” (Abelian) groups or categories.
Groups are important since they describe the basic symmetries within the “group” of operations that together form an algebra. Groups cover anything what could be done with sets. Note that the central property of sets is their enumerability. (Hence, a notion of “infinite” sets is nonsense; it simply contradicts itself.) Yet, there are examples of quite successful, say: abundantly used, structures that are not based on I-associativity, the most famous of them being the Lie-group. Lie-groups allow to conceive of continuous symmetry, hence it is much more general than the Abelian group that essentially emerged from the generalization of geometry. Even in the case of Lie-groups or other “non-associative” structures, however, the term refers to the meaning such as to inverting it.
With respect to categories we can say that so far, and quite unfortunately, there is not yet something like a category theory that would not rely on I-associativity, a fact that is quite telling in itself. Of course, category theory is also quite successful, yet…
Well, anyway, we would like to indicate that we are not dealing with I-associativity here in this chapter. In contrast, we are interested in the phenomenon of associativity as it is indicated by the etymological roots of the word: The power to establish relations.
A Blind Spot…
In some way the particular horror creationes so abundant in mathematics is comprehensible. If a system would start to establish relations it also would establish novelty by means of that relation (sth. that simply did not exist before). So far, it was not possible for mathematics to deal symbolically with the phenomenon of novelty.
Nevertheless it is astonishing that a Google raid on the term “associativity” reveals only slightly more than 500 links (Dec. 2011), from which the vast majority consists simply from the spoofed entry in Wikipedia that considers the mathematical notion of I-associativity. Some other links are related to computer sciences, which basically refer to the same issue, just sailing under a different flag. Remarkably, only one (1) single link from an open source robotics project  mentions associativity as we will do here.
Not very surprising one can find an intense linkage between “associative” and “memory,” though not in the absolute number of sources (also around ~600), but in the number of citations. According to Google scholar, Kohonen and his Self-Organizing Map  is being cited 9000+ times, followed by Anderson’s account on human memory , accumulating 2700 citations.
Of course, there are many entries in the web referring to the word “associative,” which, however, is an adjective. Our impression is that the capability to associate has not made its way into a more formal consideration, or even to regard it as a capability that deserves a dedicated investigation. This deficit may well be considered as a continuation of a much older story of a closely related neglect, namely that of the relation, as Mertz pointed out [4, ch.6], since associativity is just the dynamic counterpart of the relation.
Formal and (Quasi-)Material Aspects
In a first attempt, we could conceive of associativity as the capability to impose new relations between some entities. For Hume (in his “Treatise”, see Deleuze’s book about him), association was close to what Kant later dubbed “faculty”: The power to do sth, and in this case to relate ideas. However, such wording is inappropriate as we have seen (or: will see) in the chapters about modeling and categories and models. Speaking about relations and entities implies set theory, yet, models and modeling can’t be covered by set theory, or only very exceptionally so. Since category theory seems to match the requirements and the structure of models much better, we also adapt its structure and its wording.
Associativity then may be taken as the capability to impose arrows between objects A, B, C such that at least A ⊆ B ⊆ C, but usually A ⋐ B ⋐ C, and furthermore A ≃ C, where “≃” means “taken to be identical despite non-identity”. In set theoretic terms we would have used the notion of the equivalence class. Such arrows may be identified with the generalized model, as we are arguing in the chapter about the category of models. The symbolized notion of the generalized abstract model looks like this (for details jump over to the page about modeling):
Those arrows representing the (instances of a generalized) model are functors that are mediating between categories. We also may say that the model imposes potentially a manifold of partially ordered sets (posets) onto the initial collection of objects.
Now we can start to address our target, the structural aspects of associativity, more directly. We are interested in the necessary and sufficient conditions for establishing an instance of an object that is able (or develops the capability) to associate objects in the aforementioned sense. In other words, we need an abstract model for it. Yet, here we are not interested in the basic, that is transcendental conditions for the capability to build up associative power.
Let us start more practically, but still immaterial. The best candidates we can think of are Self-Organizing Maps (SOM) and particularly parameterized Reaction-Diffusion Systems (RDS); both of them can be subsumed into the class of associative probabilistic networks, which we describe in another chapter in more technical detail. Of course, not all networks exhibit the emergent property of associativity. We may roughly distinguish between associative networks and logistic networks . Both, SOM as well as RDS, are also able to create manifolds of partial orderings. Another example from this family is the Boltzmann engine, which, however, has some important theoretical and practical drawbacks, even in its generalized form.
Next, we depict the elementary processes of SOM and RDS, respectively. SOM and RDS can be seen as instances located at the distant endpoints of a particular scale, which expresses the topology of the network. The topology expresses the arrangement of quasi-material entities that serve as persistent structure, i.e. as a kind of memory. In the SOM, these entities are called nodes and they are positioned in a more or less fixed grid (albeit there is a variant of the SOM, the SOM gas, where the grid is more fluid). The nodes do not range around. In contrast to the SOM, the entities of an RDS are freely floating around. Yet, RDS are simulated much like the SOM, assuming cells in a grid and stuffing them with a certain memory.
Inspecting those elementary processes, we of course again find transformations. More important, however, is another structural property to both of them. Both networks are characterized by a dynamically changing field of (attractive) forces. Just the locality of those forces is different between SOM and RDS, leading to a greater degree of parallelity in RDS and to multiple areas of the same quality. In SOMs, each node is unique.
The forces in both types of networks are, however, exhibiting the property of locality, i.e. there is one or more center, where the force is strong, and a neighborhood that is established through a stochastic decay of the strength of this force. Usually, in SOM as well as in RDS, the decay is assumed to be radially symmetric, but this is not a necessary condition.
After all, are we now allowed to ask ‘Where does this associativity come from?’ The answer is clearly ‘no.’ Associativity is a holistic property of the arrangement as a total. It is the result of the copresence of some properties like
- – stochastic neighborhoods that are hosting an anisotropic and monotone field of forces;
- – a certain, small memory capacity of the nodes; note that the nodes are not “points”: in order to have a memory they need some corporeality. In turn this opens the way to think of a separation of of the function of that memory and a variable host that provides a container for that memory.
- – strong flows, i.e. a large number of elementary operations acting on that memory, producing excitatory waves (long-range correlations) of finite velocity;
The result of the interaction of those properties can not be described on the level of the elements of the network itself, or any of its parts. What we will observe is a complex dynamics of patterns due to the superposition of antagonist forces, that are modeled either explicitly in the case of RDS, or more implicitly in the case of SOM. Thus both networks are also presenting the property of self-organization, though this aspect is much more dominantly expressed in RDS as compared to the SOM. The important issue is that the whole network, and even more important, the network and its local persistence (“memory”) “causes” the higher-level phenomenon.
We also could say that it is the quasi-material body that is responsible for the associativity of the arrangement.
The Power of a Capability
So, what is this associativity thing about? As we have said above, associativity imposes a potential manifold of partial orderings upon an arbitrary open set.
Take a mixed herd of Gnus and Zebras as the open set without any particular ordering, put some predators like hyenas or lions into this herd, and you will get multiple partially ordered sub-populations. In this case, the associativity emerges through particular rules of defense, attack and differential movement. The result of the process is a particular probabilistic order, clearly an immaterial aspect of the herd, despite the fact that we are dealing with fleshy animals.
The interesting thing in both the SOM and the RDS is that a quasi-body provides a capability that transforms an immaterial arrangement. The resulting immaterial arrangement is nothing else than information. In other words, something specific, namely a persistent contrast, has been established from some larger unspecific, i.e. noise. Taking the perspective of the results, i.e. with respect to the resulting information, we always can see that the association creates new information. The body, i.e. the materially encoded filters and rules, has a greater weight in RDS, while in case of the SOM the stabilization aspect is more dominant. In any case, the associative quasi-body introduces breaks of symmetry, establishes them and stabilizes them. If this symmetry breaking is aligned to some influences, feedback or reinforcement acting from the surrounds onto the quasi-body, we may well call the whole process (a simple form of) “learning.”
Yet, this change in the informational setup of the whole “system” is mirrored by a material change in the underlying quasi-body. Associative quasi-bodies are therefore representatives for the transition from the material to the immaterial, or in more popular terms, for the body-mind-dualism. As we have seen, there is no conflict between those categories, as the quasi-body showing associativity provides a double-articulating substrate for differences. Else, we can see that these differences are transformed from a horizontal difference (such as 7-5=2) into vertical, categorical differences (such like the differential). If we would like to compare those vertical differences we need … category theory! …or a philosophy of the differential!
Early in the 20th century, the concept of association has been adopted by behaviorism. Simply recall the dog of Pavlov and the experiments of Skinner and Watson. The key term in behaviorism as a belated echo of 17th century hyper-mechanistics (support of a strictly mechanic world view) is conditioning, which appears in various forms. Yet, conditioning always remains a 2-valued relation, practically achieved as an imprinting, a collision between two inanimate entities, despite the wording of behaviorists who equate their conditioning with “learning by association.” What should learning be otherwise? Nevertheless, behaviorist theory commits the mistake to think that this “learning” should be a passive act. As you can see here, psychologists still strongly believe in this weird concept. They write: “Note that it does not depend on us doing anything.” Utter nonsense, nothing else.
In contrast to imprinting, imposing a functor onto an open set of indeterminate objects is not only an exhausting activity, it is also a multi-valued “relation,” or simply, a category. If we would analyze the process of imprinting, we would find that even “imprinting” can’t be covered by a 2-valued relation.
Nevertheless, other people took the media as the message. For instance, Steven Pinker criticized the view that association is sufficient to explain the capability of language. Doing so, he commits the same mistake as the behaviorists, just from the opposite direction. How else should we acquire language, if not by some kind of learning, even if it is a particular type of learning? The blind spot of Pinker seems to be randomization, i.e. he is not able leave the actual representation of a “signal” behind.
Another field of application for the concept of associativity is urban planning or urbanism, albeit associativity is rarely recognized as a conceptual or even as a design tool. [cf. 6] It is obvious that urban environments can be conceived as a multitude of high-dimensional probabilistic networks .
Machines, Machines, Machines, ….Machines?
Associativity is a property of a persistent (quasi-material) arrangement to act onto a volatile stream (e.g. information, entropy) in such a way as to establish a particular immaterial arrangement (the pattern, or association), which in turn is reflected by material properties of the persistent layer. Equivalently we may say that the process leading to an association is encoded into the material arrangement itself. The establishment of the first pattern is the work of the (quasi-)body. Only for this reason it is possible to build associative formal structures like the SOM or the RDS.
Yet, the notion of “machine” would be misplaced. We observe strict determinism only on the level of the elementary micro-processes. Any of the vast number of individual micro-events are indeed uniquely parameterized, sharing only the same principle or structure. In such cases we can not speak of a single machine any more, since a mechanic machine has a singular and identifiable state at any point in time. The concept of “state” does neither hold for RDS nor for SOM. What we see here is much more like a vast population of similar machines, where any of those is not even stable across time. Instead, we need to adopt the concept of mechanism, as it is in use in chemistry, physiology, or biology at large. Since both principles, SOM and RDS, show the phenomenon of self-organization, we even can not say that they represent a probabilistic machine. The notion of the “machine” can’t be applied to SOM or RDS, despite the fact that we can write down the principles for the micro-level in simple and analytic formulas. Yet, we can’t assume any kind of a mechanics for the interaction of those micro-machines.
It is now exciting to see that a probabilistic, self-organizing process used to create a model by means of associating principles looses the property of being a machine, even as it is running on a completely deterministic machine, the simulation of a Universal Turing Machine.
Associativity is a principle that transcends the machine, and even the machinic (Guattari). Assortative arrangements establish persistent differences, hence we can say that they create proto-symbols. Without associativity there is no information. Of course, the inverse is also true: Wherever we find information or an assortment, we also must expect associativity.
-  iCub
-  Kohonen, Teuvo, Self-Organization and Associative Memory. Springer Series in Information Sciences, vol.8, Springer, New York 1988.
-  Anderson J.R., Bower G.H., Human Associative Memory. Erlbaum, Hillsdale (NJ) 1980.
-  Mertz, D. W., Moderate Realism and its Logic, New Haven: Yale 1996.
-  Wassermann, K. (2010), Associativity and Other Wurban Things – The Web and the Urban as merging Cultural Qualities. 1st international workshop on the urban internet of things, in conjunction with: internet of things conference 2010 in Tokyo, Japan, Nov 29 – Dec 1, 2010. (pdf)
-  Dean, P., Rethinking representation. the Berlage Institute report No.11, episode Publ. 2007.
-  Wassermann, K. (2010). SOMcity: Networks, Probability, the City, and its Context. eCAADe 2010, Zürich. September 15-18, 2010. (pdf)