The Self-Organizing Map: SOMe Design Issues
February 4, 2012 § 1 Comment
It is the duality of persistent, quasi-material yet simulated structures
and the highly dynamic, volatile and-most salient-informational aspects that are so characteristic for learning entities like Self-Organizing Maps (SOM) or Artificial Neural Networks (ANN). It should not be regarded as a surprise that the design of manifold aspects of the persistent, quasi-material part of SOM or ANN is quite influential and hence also important.
Here we explore some of the aspects of that design. Sure, there is something like a “classic” version of the SOM, named after its inventor, the so-called “Kohonen-SOM.” Kohonen developed several slightly different SOM mechanisms over many years, starting with statistical covariance matrices. All of them comprise great ideas, for sure. Yet, in a wider perspective it is clear that there are many properties of the SOM that are presumably quite sub-optimal for realizing a generally applicable learning mechanism.
The Elements of SOMs
We shall recapitulate very briefly the principle of SOM below, more detailed descriptions can be found in many places in the Web (one of the best for the newbie, with some formulas and a demo software: ai-junkie), see also our document here that relates some issues to references, as well as our intro in plain language.
Yet, the question beyond all the mathematical formula stuff is: “What are the elements of a SOM?”
We propose to distinguish the following four basic elements:
- (1) a Collection of Items
that have memory for observations, or reflecting them, where all the items start with the same structure for these observations (items are often called “nodes”, or in a more romantic attitude “neurons”);
- (2) the Spatial Layout Principles
and the relational arrangement of this items;
- (3) an Influence Mechanism
that link the items together, and which together with the spatial layout defines the topology of the piece;
- (4) a Perceptional Mechanism
that introduces observations into the SOM in a particular manner.
In the case of the SOM these elements are configured in a way that creates a particular class of “learning” that we can describe as competitive-collaborative abstraction.
Those basic elements of a SOM can be parameterized—and thus also implemented—in very different ways. If we would take only the headlines of that list we could also subsume artificial neural networks (ANN) with these elements. Yet, even the items of a SOM and those of a ANN are drastically different. Else, the meaning of concepts like “layout” or “influence mechanism” are very different. This results in a completely different architecture regarding the relation of the “data”, or if you like potential observations, and the structure (SOM or ANN). Basically, ANNs are analytic,which means that the abstraction is (has to be done) done before the interaction of the structure with the data. In strong contrast to this approach, SOM build up an abstraction while interacting with the data. This abstraction is mostly consisting of the transition from extensional data to intensional representation. Thus SOM are able to find a structure, while ANN only can move within the apriori defined structure. In contrast to ANN, SOM are associative mechanisms (which is the reason why we are so fond of them)
Yet, it is also true for SOMs that the parametrization of the instances of the four elements as listed above have a significant influence on the capabilities and the potential of the resulting actual associative structure. Note that the design of the internals of the SOM does not refer to the issues of the usage or the embedding of the SOM into a wider context of modeling, or the structure of modeling itself.
In the following we will discuss the usual actualizations of those four elements, the respective drawbacks and better alternatives.
The SOM itself
Often one can find schematic representations like the one shown in the following figure 1:
Then this is usually described in this way: “The network is created from a 2D lattice of ‘nodes’, each of which is fully connected to the input layer.”
Albeit this is a possible description, it is a highly misleading one, with some quite unfavorable consequences: as we will see, it hides some important opportunities offered by the SOM mechanism.
Instead of speaking in an opaque manner about the “input layer” we simply can use the concept of “structured observations”. The structure is just given by the features used to establish or describe the observations. The important step that simplifies everything is to give all the nodes the same structure as the observations, at least in the beginning and as the trivial case; we will see that both assumptions may “develop away” as an effect of self-organization.
Anyway, the complicated connectivity in figure 1 changes into the following structure for the simple case:
Figure 2: An interpretation of the SOM grid, where the nodes are stuffed with the same structure (ordered set of variables) as the observations. This interpretation allows for a localizing of structures that is not achievable by the standard interpretation as shown in Fig.1.
To see what we gain by this change we have to visit briefly and partially the SOM mechanism.
The SOM mechanism compares a particular “incoming” observation to “all” nodes and determines a best matching node. The intensional part of this node then gets changed as a function of the given weight vector and the new observation. Some kind of intermediate between the observational vector and the intensional vector of the node is established. As a consequence, the nodes develop different intensional descriptions. This change upon matching with an observation then will be spread in the vicinity of the selected node, decaying with the distance, while this distance additionally is shrinking with increasing duration of the learning process. This is called the lateral control mechanism (LCM) by Kohonen (see Kohonen’s book 2001 p.179). This LCM is one of the most striking differences to so-called artificial neural networks (ANN).
It is now rather straightforward to think that the node keeps the index of the matching observation in its local memory. Over the course of learning, a node collects many records, which are all similar. This gathering of observations into an explicit collection is one of the MOST salient differences of our interpretation of the SOM to most of the standard interpretations!
Figure 3: As Fig.2, showing the extensional container of one of the nodes.
The consequences are highly significant: The SOM is not a tool for visualization any more, it is a mechanism with inherent and nevertheless transparent abstraction! To be explicit: While we retain the full power of the SOM mechanism we also not only get an explicit clustering, but even the opportunity for a fully validated modeling, inclusive a full description of the structure of the risk of mis-classification, hence there is no “black box” any more (as in contrast say to ANN, or even statistical methods).
Now we can see what we gained from changing the description, dropping the unholy concept of “input layer.” It now becomes clearly visible that nodes can be conceived of as containers, comprised of an extensional and an intensional part (as Carnap used the terms). The intensional part is what usually is called the weight vector of a node.The extensional part is the list of observations matching this intension.
The intensional part of a node thus represents a type. The extensional part of our revised SOM node represents the matching tokens.
But wait! As it is usual done, we called the intensional part of the node the “weight vector”. Yet, this is a drastic misnomer. It is not “weights” of the variables. It is simply a value that can be calculated in different ways, and which is influenced from different sides. It is a function of
- – the underlying extensional part = the list of records;
- – the similarity functional that is used for this node
- – the general network dynamics;
- – any kind of dynamic rule relating the new observation.
It is thus much more adequate to talk about an “intensionality profile” than about weights. Of course, we can additionally introduce real “weights” for each of the positions in a structure profile vector.
A second important advance of dropping this bad concept of “input layer” is that we can localize this function that results in the actualization of the intensional part of the node. For instance, we can localize the similarity function. As part of the similarity function we could even consider to implement a dynamic rule (dependent on the extensional content of the node) that excludes certain positions = variables as arguments from the determination of the similarity!
The third important consequence is that we created a completely new compartment, the “extensional container” of a node. Using the concept of “input layer” this compartment is simply not visible. Thus, the concept of the input layer violates central insights from the theory of epistemic action.
This “extensional container” is not just a list of records. We can conceive it as a “functional” compartment, that allows for a great deal of new flexibility and dynamics. This inner dynamics could be used to create new elements of the intensional part of the node, e.g. about the variance of the tokens contained in the “extensionality container”. Or about their relation as measured by the correlation. In fact, we could use any mechanism to create new positions in the intensional profile of node, even the properties of an embedded SOM, a small population of artificial neurons, the result parameters of statistical functions taking the list of observations as input and so on.
It is quite important to understand that the particular dynamics in the extensionality container is purely local. Notably the possibility for this dynamics also makes it possible to implement local differentiation of the SOM network, just as it is induced by the observations itself.
There is even a fourth implication of dropping the concept of input layer, which lead us to the separation between intensional and extensional aspects. This implication concerns the numerical production of the intensionality profile. Obviously we can regard the transition from the extensional description to the intensional representation. This abstraction, as any, is accompanied by a loss of information. Referring to the collection of intensional representations means to use them as a model. It is now very important to recognize that there is no explicit down-stream connection to the observations any more. All we have at our disposal are intensional representations that emerged as a consequence of the interaction of three components: (1) the observations, (2) the quasi-material aspects of the modeling procedure(particularly the associative part of it, of course), and (3) the imposed target/risk settings.
As a consequence we have to care explicitly about the variance structure within the extensional containers. More precisely, the internal variance of the extensional containers have to be “comparable.” If we would not care about that, we could not consider the intensional representations as comparable. We simply would compare apples with oranges, since some of the intensional representations simply would represent “a large mess”. On the level of intensionality profile one can’t see the variance anymore, hence we have to avoid the establishment of extensional groups (“micro-clusters”) that do not collect observations that are “similar” with regard to their descriptional values vector (inside the apriori given space of assignates). Astonishingly, this requirement of a homogenized extensional variance measure is overlooked even by Kohonen and his group, not to mention the implementations by countless epigonal fellows. It is clear that only the explicit distinction between intensional and extensional part of a model allows for the visibility of this important structural element.
Finally, and as a fifth consequence, we would like to emphasize that the explicit distinction between intensional and extensional parts opens the road towards a highly interesting region. We already mentioned that the transition from extensional description to intensional representation is a kind of abstraction. Yet, it is a simple kind of abstraction, closely tied to quasi-material aspects of the associative mechanism.
We may, however, easily derive the production of idealistic representations from that, if not even to say “ideas” in the philosophical sense. To achieve that we just have to extend the SOM with a production facility, the capability to simulate. This is of course not a difficult task. We will describe the details elsewhere (essay is scheduled), thus just a brief outline here. The “trick” is to use the intensional representations as seeds for generating surrogate observations by means of a Monte-Carlo simulation, such that the variance of the observations is a bit smaller than that of the empiric observations. Both, the empiric and surrogated “data” (nothing is “given” in the latter case) share the same space of assignates. The variance threshold can be derived dynamically from the SOM itself, it need not be predetermined at implementation time. As the next step one drops the extensional containers of the SOM and feeds the simulated data into it. After several loops of such self-referential modeling the intensional descriptions have “lost” their close ties to empirical data, yet, they are not completely unrelated. We still may use it as a kind of “template” in modeling, or for instance as a kind of null-model. In other words, the SOM contains the first traces of Platonic ideas.
Modeling. What else?
Above we emphasized that the SOM provides the opportunity for a fully validated modeling if we distinguish explicitly intensional and extensional parts in the make-up of the nodes. The SOM is, however, a strange thing, that can act in completely different ways.
In the chapter about modeling we concluded that a model without a purpose is not a model, or it is at most a strongly deficient model. Nevertheless, many people claim to create models without implying a purpose to the learning SOM. They call it “unsupervised clustering”. This is, of course, nonsense. It should be called more appropriately, “clustering with a deliberately hidden purpose,” since all the parameters of the SOM mechanisms and even the implementation act as constraints for the clustering, too. Any clustering mechanism applies a lot of criteria that influence the results. These constraints are supervised by the software, and the software has been produced by a human being (often called programmer), so this human being is supervising the clustering with a long arm. For the same reason one can not say the SOM is learning something and also not that we would train the SOM, without giving it a purpose.
Though the digesting of information by a SOM without a purpose being present is neither modeling nor learning, what can we conceive such a process as then?
The answer is pretty simple, and remember it becomes visible only after having dropped illegitimate ascriptions of mistaken concepts. This clustering has a particular epistemological role:
Self-organizing Maps that are running without purpose (i.e. target variables) are best described as associative storage devices. Nothing more, but above all, also nothing less.
Actually, this has to be rated as one of the greatest currently unrecognized opportunities in the field of machine learning. The reason is again inadequate wording. Of course, the input for such a map should be probabilized (randomized), and it has been already demonstrated how to accomplish this… guess by whom… by Teuvo Kohonen himself, while he was inventing the so-called WebSom. Kohonen proposed random neighborhoods for presenting snippets of texts to the SOM, which are a simple version of random contexts.
Importantly, once one recognizes the categorical differences between the target oriented modeling and the associative storage, it becomes immediately clear that there are strictly different methodological, hence quasi-morphological requirements. Astonishingly, even Kohonen himself, and any of his fellows as well, did not recognize the conceptual difference between the two flavors. He used SOMs created without target variable, i.e. without implying a purpose, as models for performing selections. Note that the principal mechanism of the SOM is the same for both approaches. There are just differences in the cost function(s) regarding the selection of variables.
There should be no doubt that any system intended to advance towards an autonomous machine-based episteme has to combine the two mechanism. There are sill other mechanisms, such like virtual movements, or virtual sequences in the abstract SOM space (we will describe that elsewhere), or the self-referential SOM for developing “crisp ideas”, but such a combination of associative storage and target oriented modeling is definitely inevitable (in our perspective… but we have strong arguments!).
SOM and Self-Organization
A small remark should be made here: Self-organizing maps are not in the same strong sense self-organizing as for instance Turing systems, or other Reaction-Diffusion Systems (RDS). A SOM gets organized by the interaction of its mechanisms and structures and the data. A SOM does not create patterns by it-SELF. Without feeding data into it, nothing happens, in stark contrast to self-organizing systems in the strong sense (see the example we already cited here), or take a look here from where we reproduced this parameter map for Gray-Scott Models.
Figure 4: The parameter map for Gray-Scott models, a particular Reaction-Diffusion System. Only for certain combinations of the two parameters of the system interesting patterns appear, and only for part of them the system remains dynamical, i.e. changing the layout of the patterns continuously.
As we discuss it in the chapter on complexity, it is pretty clear which kind of conditions must be at work to create the phenomenon of self-organization. None of them is present in Self-Organizing Maps; above all, SOMs are neither dissipative, nor are there antagonist influences.
Yet, it is not too difficult to create a self-organizing map that is really self-organizing. What is needed is either a second underlying process or inhibitory elements organized as population. In natural brains, we find both kinds of processes. The key for choosing the right starting point for implementing a system that is showing the transition from SOM to RDS is the complete probabilization of the idea of the network.
Our feeling is that at least one of them is mandatory in order to allow the system to develop logic as a category in an autonomous manner, i.e. not pre-programmed. As any other understanding, the ability to think in logical terms, or using logic as a category should not be programmed into a computer. That ability should emerge from the implemented conditions. Our claim that some concept is quite the opposite to something other is quite likely based on such processes. It is highly indicate in this context that the brain is indeed showing Turing patterns on the level of activity patterns, i.e. the patterns are not made of material entities, but are completely immaterial. Else, like in chemical clocks like the Belousov-Zhabotinsky system, another RDS, the natural brain shows a strong rhythmicity, both in its “local” activity patterns, as well as in the overall activity, affecting billions of cells at a time.
So far, the strong self-organization is not implemented in our FluidSOM.
Spatial Layout Principles
The spatial layout principle is a very important design aspect. It concerns not only the actual geometrical arrangement of nodes, but also their mobility as representations of physical entities. In the case of SOM this has to be taken quite abstract. The “physical entities” represented by the nodes are not neurons. The nodes represent functional roles of populations of neurons.
Usually, the SOM is defined as a collection of nodes that are arranged in a particular topology. This topology may be
- – grid like, 2-(3) dimensional;
- – as kind of a swarm in 2 dimensions;
- – as a gas, freely moving nodes.
The obvious difference between them is the degree of physical freedom for the nodes to move around. In grids, nodes are fixed and cannot move, while in the SOM gas the “nodes” are much more mobile.
There is also a quite important, yet not so obvious commonality between them. Firstly, in all of these layout principles the logical SOM nodes are identical with the “physical” items, i.e. representations of crossings in a grid, swarming entities, or gaseous containers. Thus, the data aspect of the nodes is not cleanly separated from its spatial behavior. If we separate it, the behavior of the nodes and the spatial aspects can be handled more transparently, i.e. the relevant parameters are better accessible.
Secondly, the space where those nodes are embedded is conceived as being completely neutral, as if those nodes would be arranged in deep space. Yet, everything we know of learning entities points to their mediality. In other words, the space that embeds the nodes should not be “empty”.
Using a Grid
In most of the cases the SOM is defined as a collection of nodes that are arrangement as a regular grid (4(8)n, 6n). Think of it as a fixed network like a regular wire fence, or the atomic bonds in a model of a crystal.
This layout is by far the most abundant one, yet it is the most restricted one. It is almost impossible, at least very difficult to make such a SOM dynamic, e.g. to provide it the potential to grow or to differentiate.
The advantage of grids is that it is quite easy to calculate the geometrical distance between the nodes, which is a necessary step to determine the influence between any two nodes. If the nodes are mobile, this measurement requires much much more efforts in terms of implementation.
Using Metaphors for Mobility: Swarms, or Gases
Here, the nodes may range freely. Their movement is strongly influenced (or even) restricted by the moves of its neighbors. Here, experience tells us the flocks of birds, or fishes, or bacteria, do not learn efficiently on the level of the swarm. Structures are destroyed to easy. The same is true for the gas metaphor.
Flexible Phase in a Mediating Space
Our proposal is to render the “phase” flexible according to the requirements that are important in a particular stage of learning. The nodes may be strictly arranged like in a crystal, or quite mobile, they may move around according to physical forces or according to their informational properties like the gathered data.
Ideally, the crystalline phases and the fluid phases are dependent on just a two or three parameters. One example for this is the “repulsive field”, a collection of items in a 2D space which repel each other. If the kinetic energy of those items is not too large, and the range of repellent force is not too low, this automatically leads to a hexagonal pattern. Yet, the pattern is not programmed as an apriori pattern. It is a result of properties of the items (and the embedding space). Such, the emergent arrangement is never affected by something like a “layout defect.”
Inserting a new item or removing one is very easy in such a structure. More important, the overall characteristics of the system does not change despite the fact that the actual pattern changes.
The Collection of Items : “Nodes”
In the classic SOM, nodes serve a double purpose:
- P1 – They serve as container for references that point to records of data (=observations);
- P2 – They present this extensional list in an integrated, “intensional” form ;
The intensional form of the list is simply the weight vector of that node. In the course of learning, the list of the records contained in a particular node will be selected such that they are increasingly similar.
Note that keeping the references to the data records is extremely important. It is NOT part of most SOM implementations. If we would not do it, we could not use the SOM as a modeling tool at all. This might be the reason why most people use the SOM just as visualization tool for data (which is a dramatic misunderstanding)
The nodes are not “directly” linked. Whether they influence each other or not is dependent on the distance between them and the neighborhood function. The neighborhood function determines the neighborhood, and it is a salient property of the SOM mechanism that this function changes over time. Important for our understanding of machine-based epistemology is that the relations between nodes in a SOM are potentially of a probabilistic character.
However, if we use a fixed grid, a fixed distance function, and a deterministically behaving neighborhood function, the resulting relations are not probabilistic any more.
Else, in case of default SOM, the nodes are passive. They even do not perform the calculation of the weight vector, which is performed by a central “update” loop in most implementations. In other words, in a standard SOM a node is a data structure.Here we arrive at a main point in our critique of the SOM
The common concept of a SOM is equivalent to a structural constant.
What we need, however, is something completely different. Even on the level of the nodes we need entities, that can change their structure and their relationality.
The concept of FluidSOM must be based on active nodes.
These active nodes are semi-autonomous. They calculate the weight vector themselves, based either on new input data, or some other “chemical” influences. They may develop a few long-range outgoing fibers or masses of more or less stable (but not “fixed”!) input relations to other nodes. The active meta-nodes in a fluid self-organizing map may develop a nested mini-SOM, or may incorporate any other mechanism for evaluating the data to which it is pointing to, e.g. a small neural network of a fixed structure (see mn-SOM). Meta-nodes also may branch out a further SOM instances locally into relative “3D”, e.g. dependent on its work load, or again, on some “chemical influences”
We see, that meta-nodes are dynamic structures, sth like a category of categories. This flexibility is indispensable for growing and differentiation.
This introduces the seed of autonomy on the lowest possible level. Here, within the almost material processes, it is barely autonomy, it is really a mechanic activity. Yet, this activity is NOT triggered by some reason any more. It is just there, as a property of the matter itself.
We are convinced that the top-level behavioral autonomy is (at least for large parts) an emergent property that grows out of the a-reasonable activity on the micro-material level.
The profile vector of a SOM node usually contains for all mutable variables (non-ID/TV) the average of the values in the extensional list. That is, the profile vector itself does not know anything about TV or index variable… which is solely the business of the Node.
In our case, however, and based on the principle of “strict locality,” the weight vector also may contain a further section, which is referring to dynamic properties of the node, or the data. We introduced this in a different way above when discussing the extensionality container of SOM nodes. For instance, the deviation of the data in the node against a model function (such as a correlation) such internal measurements can not be predefined, and they are also not stable input data since they are constantly changing (due to the list of data in the node, the state of other nodes etc.).
This introduces the possibility of self-referentiality on the lowest possible level. Similar to the case of autonomy, we find the seed for self-referentiality on the topmost-level (call it consciousness…) in midst the material layer.
If there is one lesson we can draw from the studies of naturally occurring brains, then it is the fact that there is no master code between neurons, no “Mentalese.” The brain does not work on the base of its own language. Equivalently, there are no logical circuits implementing logic calculus. As a correlate we can say that the brain is not a thing that consists of a definite wiring. A brain is not a finite state automaton, it does not make any sense to ascribe states to brains. Instead, everything going on in a brain is probabilistic, even on the sub-cellular level. It is not determined in a definite manner, how many vesicles have to burst in a synaptic gap to cause a transmission of the signal, it is not determined how many neurons exactly make up a working group for a particular “function” etc.etc. The only thing we can say is that certain fibers collect from certain “regions”, typically millions of neurons, to other such regions.
Note that any software program IS representable by just such a definite wiring. Hence, what we need is a mechanism that can transcend its own being as mechanism. We already discussed this issue in another chapter, where we identified abstract growth as a possible route to that achievement.
The processing of information in the brain is probabilistic, despite the fact that on the top level it “feels” different for us. Now, when starting to program artificial associative structures that are able to do similar things as a brain can accomplish, we have to respect this principle of probabilization.
We not only have to avoid hard-coded wiring between procedures. We have to avoid any explicit wiring at all. In terms of software architecture this translates into the proposal that we should not rely just on object-oriented programming (OOP). For instance, we would represent nodes in a SOM as objects, and the properties of these objects again would be other objects. OOP is an important, but certainly not a sufficient design element for a machine that shall develop its own episteme.
What we have to actualize in our implementation is not just OOP, but a messaging based architecture, where all elements are only loosely coupled. The Lateral Control Mechanism (LCM) of the Kohonen SOM is a nice example for this, the explicit wiring in ANN is perfect counter-example, a DON’T DO IT. Yet, as we will see in the next section, the LCM should not be considered as a symmetric and structurally constant functional entity!
Concerning programming style, on an even lower level this translates into the heavy use of so-called interfaces, as they are so prevalent in Java. Not objects are wired or passed around, but only interfaces. Interfaces are forward contracts about the standards for the interaction of different parts, that actually can change while the “program” is running.
Of course, these considerations regard only to the lowest, indeed material levels of an associative system, yet, they are necessary. If we start with wires of any kind, we won’t achieve our goals. From the philosophical perspective it does not come as a surprise that the immanence of autonomous abstraction is to be found only in open processes, which include the dimension of mediality. Even in the interaction of its tiniest parts the system should not rely on definite encodings.
During their development, natural systems differentiate in their parts. Bodies are comprised of organs, organs are made of different cell types, within all members of a cell a further differentiation of their actual and context-specific role may occur. The same can be observed in social insects, or any other group of social beings. They are morphologically almost identical, yet, their experience let them do their tasks differentially, or even let them do different tasks. Why then should we assume that all neurons in a large compound should act absolutely equally?
To illustrate the point we should visit a particular African termite species (Schedorhinotermes lamanianus) on which I worked as a young biologist. They are feeding on rodden/rodding wood. Well, since these pieces of wood are much larger than the termites, a problem occurs. The animals have to organize their collective foraging, i.e. where to stay and gnaw onto the wood, and where to travel to return the harvested pieces back to home nest, where they then put it to a processing chamber stuffed with a special kind of fungus. The termites then actually feed that fungus, and mostly not the wood. (though they have also bacteria in their gut to do the job of digesting the cellulose and the lignine.
Important for us is the foraging process. To organize gnawing sites and traveling routes they use pheromones, and no wonder, they use just 2 for that, which build a Turing system, as I proofed with a small bio-test together with a colleague.
In the nervous system of animals we find a similar problematics. The brain is not just a large network, over and over symmetric like a crystal. Of course not. There are compartments (see our chapter about complexity), there are fibers. The various parts of the brain even differ strongly with respect to their topological structure, their “wiring”. Why the heck should an artificial system look like a perfect crystal? In a crystal their will be no stable emergence, hence no structural learning. By the way, we should not expect structural learning in swarms either, for a very similar reason, albeit that reason instantiates in the opposite manner: complete perturbation prevents the emergence of compartments, too, hence no structural learning will be observed (That’s the reason why we do not have swarms in the skull…)
Back to our neurons. We reject the approach of a direct representational simulation of neurons, or parts of the brain. Instead we propose to focus the principles as elements of construction. Any system that is intended to show structural learning, is in urgent need of the basic differentiation into “local” and “tele” (among others). Here we meet even a structural parallelism to large urban compounds.
We can implement the emergence of such fibers in a straightforward manner, if we make it dependent on the occurrence of reproducing / repeating co-excitation of regions. This implies that we have to soften the SOM principle of the “winner-takes-it-all” approach. At least in large networks, any given observation should possibly leave its trace in different regions. Yet, our experience with very large maps indicate that this may happen almost inevitably. We just used very simple observations consisting of only 3 features (r,g, and b, such forming the RGB color triplet) and a large SOM, consisting of around 1’000’000 nodes. The topology was 4n, and the map was placed on a torus (no borders). After approx 200’000 observations, the uniqueness for color concepts started to become eroded. For some colors, two conceptual regions appeared.
In the further development of such SOMs, it is then quite naturally to let fibers grow between such regions, changing the topology of the SOM from that of a crystal to that of a brain. While the first is almost perfectly isotropic in exactly 3 dimensions, the topology of the brain is (due to the functional differentiation into tele-fibres) highly anisotropic in a high and variable dimensionality.
Here we discussed some basic design issues about self-organizing maps and introduced some improvements. We have seen that wording matters when it comes to represent even a mechanism. The issues we touched have been
- – explicit distinction of intensionality and extensionality in the conceptualization of the SOM mechanism, leading to a whole “new” domain of SOM architectures;
- – producing idealistic representations from a collection of extensional descriptions;
- – dynamics in the extensionality domain, including embedding of other structures, thus proceeding to the principle of compartmentalization, functional differentiation and morphological growth;
- – the distinction between modeling and associative storage, which require different morphological structures once they are distinguished;
- – stuffing the SOM with self-organization in the strong sense;
- – spatial layout, fixed rid versus the emergent patterns in a repulsion field of freely moving particles; distinguishing material particles from functional abstract nodes;
- – nodes as active components of the grid;
- – self-referentiality on the microscopic level that gives rise to emergent self-referentiality on the macroscopic level;
- – programming style, which should not only be as abstract (and thus as general) as possible, but also has to proceed from strictly defined, strongly coupled object-oriented style to loosely coupled system based on messaging, even on the lowest levels of implementation, e.g. the interaction of nodes;
- – functional differentiation of nodes, leading to dynamic, fractional dimensionality and topological anisotropy;
Yet, there are still much more aspects that have to be considered if one would try to approach processes on machinic substrate that could be give rise to what we call “thinking.” In discussing the design issues listed above, we remain quite on the material level. But of course, morphology is important. Nevertheless we should not conceive of morphology as a perfect instance of a blueprint, it is more about the potential, if not to say the “virtuality”, that is implied as immanence by the morphology. Beyond that morphology, we have to design the processes of dynamic change of that morphology, which we usually call growth, or tissue differentiation. Even on top of that, we have to think about the informational, i.e. immaterial processes, that only eventually lead to morphological correlates.
Anyway, when thinking about machine-based episteme, we obviously have to forget about crystals and swarms, about perfectness and symmetry in morphological structures. Instead, the design of all of the issues, whether material or immaterial, should be designed with the perspective towards an immanence of virtuality in mind, based on probabilized mechanisms.
In a further chapter (scheduled) we will try to approach two other design issues about the implementation of an advanced Self-organizing Map in more detail that we already mentioned briefly here, again oriented at basic abstract elements and the principles found in natural brains: inhibitory processes and probabilistic negation on the one hand and the chemical milieu on the other. Above we already indicated that we expect a continuum between Self-organizing Maps and Reaction-Diffusion Systems, which in our perspective is highly significant for the working of brains, whether natural or artificial ones.