The Self-Organizing Map: SOMe Design Issues

February 4, 2012

It is the duality of persistent, quasi-material yet simulated structures on the one hand, and the highly dynamic, volatile and, most salient, informational aspects on the other, that is so characteristic of learning entities like Self-Organizing Maps (SOM) or Artificial Neural Networks (ANN). It should not come as a surprise, then, that the design of the manifold aspects of the persistent, quasi-material part of a SOM or an ANN is quite influential, and hence also important.

Here we explore some aspects of that design. Sure, there is something like a “classic” version of the SOM, named after its inventor: the “Kohonen-SOM.” Kohonen developed several slightly different SOM mechanisms over many years, starting with statistical covariance matrices. All of them comprise great ideas, for sure. Yet, from a wider perspective it is clear that many properties of the SOM are presumably quite sub-optimal for realizing a generally applicable learning mechanism.

The Elements of SOMs

We shall recapitulate the principle of the SOM very briefly below; more detailed descriptions can be found in many places on the Web (one of the best for the newbie, with some formulas and a demo software: ai-junkie). See also our document here, which relates some issues to references, as well as our intro in plain language.

Yet, the question beyond all the mathematical formula stuff is: “What are the elements of a SOM?”

We propose to distinguish the following four basic elements:

  • (1) a Collection of Items
    that have memory for observations, or for reflections of them, where all items start with the same structure for these observations (items are often called “nodes,” or, in a more romantic attitude, “neurons”);
  • (2) the Spatial Layout Principles
    and the relational arrangement of these items;
  • (3) an Influence Mechanism
    that links the items together and, together with the spatial layout, defines the topology of the piece;
  • (4) a Perceptional Mechanism
    that introduces observations into the SOM in a particular manner.

In the case of the SOM these elements are configured in a way that creates a particular class of “learning” that we can describe as competitive-collaborative abstraction.

Those basic elements of a SOM can be parameterized—and thus also implemented—in very different ways. If we took only the headlines of that list, we could subsume artificial neural networks (ANN) under these elements as well. Yet, even the items of a SOM and those of an ANN are drastically different. Likewise, the meaning of concepts like “layout” or “influence mechanism” is very different. This results in a completely different architecture regarding the relation between the “data,” or, if you like, potential observations, and the structure (SOM or ANN). Basically, ANNs are analytic, which means that the abstraction has to be done before the interaction of the structure with the data. In strong contrast to this approach, SOMs build up an abstraction while interacting with the data. This abstraction consists mostly of the transition from extensional data to an intensional representation. Thus SOMs are able to find a structure, while ANNs can only move within an a priori defined structure. In contrast to ANNs, SOMs are associative mechanisms (which is the reason why we are so fond of them).

Yet, it is also true for SOMs that the parametrization of the instances of the four elements listed above has a significant influence on the capabilities and the potential of the resulting associative structure. Note that the design of the internals of the SOM does not refer to the issues of the usage or the embedding of the SOM into a wider context of modeling, or to the structure of modeling itself.

In the following we will discuss the usual actualizations of those four elements, their respective drawbacks, and better alternatives.

The SOM itself

Often one can find schematic representations like the one shown in the following figure 1:

Then this is usually described in this way: “The network is created from a 2D lattice of ‘nodes’, each of which is fully connected to the input layer.”

Although this is a possible description, it is a highly misleading one, with some quite unfavorable consequences: as we will see, it hides some important opportunities offered by the SOM mechanism.

Instead of speaking in an opaque manner about the “input layer,” we can simply use the concept of “structured observations.” The structure is just given by the features used to establish or describe the observations. The important step that simplifies everything is to give all the nodes the same structure as the observations, at least in the beginning and as the trivial case; we will see that both assumptions may “develop away” as an effect of self-organization.

Anyway, the complicated connectivity in figure 1 changes into the following structure for the simple case:

Figure 2: An interpretation of the SOM grid, where the nodes are stuffed with the same structure (ordered set of variables) as the observations. This interpretation allows for a localization of structures that is not achievable by the standard interpretation as shown in Fig. 1.

To see what we gain by this change, we have to briefly visit parts of the SOM mechanism.

The SOM mechanism compares a particular “incoming” observation to “all” nodes and determines a best matching node. The intensional part of this node then gets changed as a function of the given weight vector and the new observation; some kind of intermediate between the observational vector and the intensional vector of the node is established. As a consequence, the nodes develop different intensional descriptions. This change upon matching with an observation is then spread across the vicinity of the selected node, decaying with distance, while this distance additionally shrinks with increasing duration of the learning process. This is called the lateral control mechanism (LCM) by Kohonen (see Kohonen’s book 2001, p. 179). This LCM is one of the most striking differences to so-called artificial neural networks (ANN).
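To make this step concrete, here is a minimal sketch of the competitive-collaborative update in Python. It is our own illustration, not a reference implementation; the names (som_step, learning_rate, radius) are hypothetical, and the Gaussian kernel is just one common choice for the LCM.

```python
import numpy as np

def som_step(nodes, positions, observation, learning_rate, radius):
    """One step of the classic SOM mechanism: find the best matching node,
    then pull it and its spatial neighbors towards the observation.
    `nodes` is an (n, d) array of intensional profiles ("weight vectors"),
    `positions` an (n, 2) array of grid coordinates."""
    # competitive part: best matching unit = smallest Euclidean distance
    dists = np.linalg.norm(nodes - observation, axis=1)
    bmu = int(np.argmin(dists))

    # collaborative part: Kohonen's lateral control mechanism, here as a
    # Gaussian decaying with the grid distance to the best matching unit
    grid_dist = np.linalg.norm(positions - positions[bmu], axis=1)
    influence = np.exp(-grid_dist**2 / (2 * radius**2))

    # each node becomes an intermediate between its profile and the observation
    nodes += learning_rate * influence[:, None] * (observation - nodes)
    return bmu
```

Both learning_rate and radius are typically decreased over the course of training, which corresponds to the shrinking vicinity mentioned above.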

It is now rather straightforward to let the node keep the index of the matching observation in its local memory. Over the course of learning, a node collects many records, which are all similar. This gathering of observations into an explicit collection is one of the MOST salient differences of our interpretation of the SOM from most of the standard interpretations!

Figure 3: As Fig.2, showing the extensional container of one of the nodes.

The consequences are highly significant: the SOM is no longer a tool for visualization; it is a mechanism with inherent and nevertheless transparent abstraction! To be explicit: while we retain the full power of the SOM mechanism, we not only get an explicit clustering, but even the opportunity for a fully validated modeling, including a full description of the structure of the risk of mis-classification. Hence there is no “black box” any more (in contrast, say, to ANN, or even statistical methods).

Now we can see what we gained from changing the description and dropping the unholy concept of “input layer.” It now becomes clearly visible that nodes can be conceived of as containers, comprised of an extensional and an intensional part (as Carnap used the terms). The intensional part is what usually is called the weight vector of a node. The extensional part is the list of observations matching this intension.

The intensional part of a node thus represents a type. The extensional part of our revised SOM node represents the matching tokens.

But wait! As is usually done, we called the intensional part of the node the “weight vector.” Yet, this is a drastic misnomer. It does not consist of “weights” of the variables. It is simply a value that can be calculated in different ways, and which is influenced from different sides. It is a function of

  • – the underlying extensional part, i.e. the list of records;
  • – the similarity functional that is used for this node;
  • – the general network dynamics;
  • – any kind of dynamic rule relating to the new observation.

It is thus much more adequate to talk about an “intensionality profile” than about weights. Of course, we can additionally introduce real “weights” for each of the positions in a structure profile vector.

A second important advantage of dropping this bad concept of “input layer” is that we can localize the function that results in the actualization of the intensional part of the node. For instance, we can localize the similarity function. As part of the similarity function we could even consider implementing a dynamic rule (dependent on the extensional content of the node) that excludes certain positions, i.e. variables, as arguments from the determination of the similarity!

The third important consequence is that we created a completely new compartment, the “extensional container” of a node. Using the concept of “input layer” this compartment is simply not visible. Thus, the concept of the input layer violates central insights from the theory of epistemic action.

This “extensional container” is not just a list of records. We can conceive of it as a “functional” compartment that allows for a great deal of new flexibility and dynamics. This inner dynamics could be used to create new elements of the intensional part of the node, e.g. about the variance of the tokens contained in the “extensionality container,” or about their relation as measured by a correlation. In fact, we could use any mechanism to create new positions in the intensional profile of the node: the properties of an embedded SOM, a small population of artificial neurons, the result parameters of statistical functions taking the list of observations as input, and so on.
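A minimal sketch of such a node-as-container in Python may illustrate the point; the class layout and the choice of per-variable variance as a derived intensional position are our own assumptions, standing in for any of the mechanisms just listed.

```python
import numpy as np

class Node:
    """A SOM node as a container: an extensional part (the collected
    observations) plus an intensional profile derived from it, extended
    here by locally computed positions (per-variable variance)."""

    def __init__(self, n_features):
        self.extension = []                       # extensional container: records
        self.profile = np.zeros(2 * n_features)   # intensionality profile

    def add(self, observation):
        self.extension.append(np.asarray(observation, dtype=float))
        self.update_profile()

    def update_profile(self):
        data = np.vstack(self.extension)
        base = data.mean(axis=0)      # plain intensional representation
        derived = data.var(axis=0)    # locally derived positions, purely local
        self.profile = np.concatenate([base, derived])
```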

It is quite important to understand that the particular dynamics in the extensionality container is purely local. Notably, the possibility for this dynamics also makes it possible to implement a local differentiation of the SOM network, just as it is induced by the observations themselves.

There is even a fourth implication of dropping the concept of input layer, which leads us to the separation between intensional and extensional aspects. This implication concerns the numerical production of the intensionality profile. Obviously, we can regard the transition from the extensional description to the intensional representation as an abstraction. This abstraction, as any, is accompanied by a loss of information. Referring to the collection of intensional representations means to use them as a model. It is now very important to recognize that there is no explicit down-stream connection to the observations any more. All we have at our disposal are intensional representations that emerged as a consequence of the interaction of three components: (1) the observations, (2) the quasi-material aspects of the modeling procedure (particularly its associative part, of course), and (3) the imposed target/risk settings.

As a consequence we have to care explicitly about the variance structure within the extensional containers. More precisely, the internal variance of the extensional containers has to be “comparable.” If we did not care about that, we could not consider the intensional representations as comparable. We simply would compare apples with oranges, since some of the intensional representations would simply represent “a large mess.” On the level of the intensionality profile one can’t see the variance anymore; hence we have to avoid the establishment of extensional groups (“micro-clusters”) that collect observations that are not “similar” with regard to their descriptional value vectors (inside the a priori given space of assignates). Astonishingly, this requirement of a homogenized extensional variance measure is overlooked even by Kohonen and his group, not to mention the implementations by countless epigonal fellows. It is clear that only the explicit distinction between the intensional and the extensional part of a model makes this important structural element visible.

Finally, and as a fifth consequence, we would like to emphasize that the explicit distinction between intensional and extensional parts opens the road towards a highly interesting region. We already mentioned that the transition from extensional description to intensional representation is a kind of abstraction. Yet, it is a simple kind of abstraction, closely tied to quasi-material aspects of the associative mechanism.

We may, however, easily derive the production of idealistic representations from that, if not to say “ideas” in the philosophical sense. To achieve that we just have to extend the SOM with a production facility, the capability to simulate. This is of course not a difficult task. We will describe the details elsewhere (an essay is scheduled), so just a brief outline here. The “trick” is to use the intensional representations as seeds for generating surrogate observations by means of a Monte-Carlo simulation, such that the variance of the surrogates is a bit smaller than that of the empiric observations. Both the empiric and the surrogate “data” (nothing is “given” in the latter case) share the same space of assignates. The variance threshold can be derived dynamically from the SOM itself; it need not be predetermined at implementation time. As the next step one drops the extensional containers of the SOM and feeds the simulated data into it. After several loops of such self-referential modeling the intensional descriptions have “lost” their close ties to the empirical data, yet they are not completely unrelated. We still may use them as a kind of “template” in modeling, or for instance as a kind of null-model. In other words, the SOM contains the first traces of Platonic ideas.
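The following Python fragment sketches one reading of this self-referential loop. The helper train_som and the attribute profiles are assumptions standing in for any plain SOM training routine; the shrink factor implements the slightly reduced variance.

```python
import numpy as np

def idealize(observations, train_som, n_loops=5, shrink=0.8, n_surrogates=1000):
    """Self-referential modeling: repeatedly replace the empiric data by
    Monte-Carlo surrogates seeded from the intensional profiles, each round
    with a slightly smaller variance (shrink < 1)."""
    data = np.asarray(observations, dtype=float)
    rng = np.random.default_rng()
    for _ in range(n_loops):
        som = train_som(data)            # assumed: returns a trained SOM
        profiles = som.profiles          # assumed: (k, d) intensional profiles
        sigma = data.std(axis=0) * shrink
        seeds = profiles[rng.integers(len(profiles), size=n_surrogates)]
        # surrogate observations: jittered seeds, variance below the empiric one
        data = seeds + rng.normal(0.0, sigma, size=(n_surrogates, data.shape[1]))
    return som    # its profiles are now "idealized" representations
```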

Modeling. What else?

Above we emphasized that the SOM provides the opportunity for a fully validated modeling if we distinguish explicitly between intensional and extensional parts in the make-up of the nodes. The SOM is, however, a strange thing that can act in completely different ways.

In the chapter about modeling we concluded that a model without a purpose is not a model, or at most a strongly deficient one. Nevertheless, many people claim to create models without imposing a purpose on the learning SOM. They call it “unsupervised clustering.” This is, of course, nonsense. It should more appropriately be called “clustering with a deliberately hidden purpose,” since all the parameters of the SOM mechanisms, and even the implementation, act as constraints for the clustering, too. Any clustering mechanism applies a lot of criteria that influence the results. These constraints are supervised by the software, and the software has been produced by a human being (often called a programmer), so this human being is supervising the clustering with a long arm. For the same reason one can neither say that the SOM is learning something, nor that we are training the SOM, without giving it a purpose.

Though the digesting of information by a SOM without a purpose being present is neither modeling nor learning, how then can we conceive of such a process?

The answer is pretty simple, and, remember, it becomes visible only after having dropped the illegitimate ascription of mistaken concepts. This clustering has a particular epistemological role:

Self-organizing Maps that are running without purpose (i.e. target variables) are best described as associative storage devices. Nothing more, but above all, also nothing less.

Actually, this has to be rated as one of the greatest currently unrecognized opportunities in the field of machine learning. The reason is, again, inadequate wording. Of course, the input for such a map should be probabilized (randomized), and it has already been demonstrated how to accomplish this… guess by whom… by Teuvo Kohonen himself, while he was inventing the so-called WebSom. Kohonen proposed random neighborhoods for presenting snippets of texts to the SOM, which are a simple version of random contexts.

Importantly, once one recognizes the categorical difference between target-oriented modeling and associative storage, it becomes immediately clear that there are strictly different methodological, hence quasi-morphological, requirements. Astonishingly, neither Kohonen himself nor any of his fellows recognized the conceptual difference between the two flavors. He used SOMs created without a target variable, i.e. without implying a purpose, as models for performing selections. Note that the principal mechanism of the SOM is the same for both approaches; there are just differences in the cost function(s) regarding the selection of variables.

There should be no doubt that any system intended to advance towards an autonomous machine-based episteme has to combine the two mechanisms. There are still other mechanisms, such as virtual movements, or virtual sequences in the abstract SOM space (we will describe that elsewhere), or the self-referential SOM for developing “crisp ideas,” but such a combination of associative storage and target-oriented modeling is definitely inevitable (in our perspective… but we have strong arguments!).

SOM and Self-Organization

A small remark should be made here: self-organizing maps are not self-organizing in the same strong sense as, for instance, Turing systems, or other Reaction-Diffusion Systems (RDS). A SOM gets organized by the interaction of its mechanisms and structures with the data. A SOM does not create patterns by it-SELF. Without feeding data into it, nothing happens, in stark contrast to self-organizing systems in the strong sense (see the example we already cited here), or take a look here, from where we reproduced this parameter map for Gray-Scott models.

Figure 4: The parameter map for Gray-Scott models, a particular Reaction-Diffusion System. Only for certain combinations of the two parameters of the system do interesting patterns appear, and only for a part of them does the system remain dynamic, i.e. changing the layout of the patterns continuously.

As we discuss in the chapter on complexity, it is pretty clear which kinds of conditions must be at work to create the phenomenon of self-organization. None of them is present in Self-Organizing Maps; above all, SOMs are neither dissipative, nor are there antagonistic influences.

Yet, it is not too difficult to create a self-organizing map that really is self-organizing. What is needed is either a second underlying process or inhibitory elements organized as a population. In natural brains, we find both kinds of processes. The key to choosing the right starting point for implementing a system that shows the transition from SOM to RDS is the complete probabilization of the idea of the network.

Our feeling is that at least one of them is mandatory in order to allow the system to develop logic as a category in an autonomous manner, i.e. not pre-programmed. Like any other understanding, the ability to think in logical terms, or to use logic as a category, should not be programmed into a computer; that ability should emerge from the implemented conditions. Our claim that some concept is quite the opposite of some other is quite likely based on such processes. It is highly indicative in this context that the brain indeed shows Turing patterns on the level of activity patterns, i.e. the patterns are not made of material entities but are completely immaterial. Likewise, as in chemical clocks such as the Belousov-Zhabotinsky system, another RDS, the natural brain shows a strong rhythmicity, both in its “local” activity patterns and in the overall activity, affecting billions of cells at a time.

So far, the strong self-organization is not implemented in our FluidSOM.

Spatial Layout Principles

The spatial layout principle is a very important design aspect. It concerns not only the actual geometrical arrangement of nodes, but also their mobility as representations of physical entities. In the case of the SOM this has to be taken quite abstractly: the “physical entities” represented by the nodes are not neurons; the nodes represent functional roles of populations of neurons.

Usually, the SOM is defined as a collection of nodes that are arranged in a particular topology. This topology may be

  • – grid-like, 2-(3) dimensional;
  • – as a kind of swarm in 2 dimensions;
  • – as a gas, with freely moving nodes.

The obvious difference between them is the degree of physical freedom for the nodes to move around. In grids, nodes are fixed and cannot move, while in the SOM gas the “nodes” are much more mobile.

There is also a quite important, yet not so obvious, commonality between them. Firstly, in all of these layout principles the logical SOM nodes are identical with the “physical” items, i.e. representations of crossings in a grid, swarming entities, or gaseous containers. Thus, the data aspect of the nodes is not cleanly separated from their spatial behavior. If we separate it, the behavior of the nodes and the spatial aspects can be handled more transparently, i.e. the relevant parameters are better accessible.

Secondly, the space in which those nodes are embedded is conceived as being completely neutral, as if those nodes were arranged in deep space. Yet, everything we know of learning entities points to their mediality. In other words, the space that embeds the nodes should not be “empty.”

Using a Grid

In most of the cases the SOM is defined as a collection of nodes that are arranged as a regular grid (4(8)n, 6n). Think of it as a fixed network like a regular wire fence, or the atomic bonds in a model of a crystal.

This layout is by far the most abundant one, yet it is also the most restricted one. It is almost impossible, or at least very difficult, to make such a SOM dynamic, e.g. to provide it with the potential to grow or to differentiate.

The advantage of grids is that it is quite easy to calculate the geometrical distance between the nodes, which is a necessary step in determining the influence between any two nodes. If the nodes are mobile, this measurement requires much more effort in terms of implementation.

Using Metaphors for Mobility: Swarms, or Gases

Here, the nodes may range freely. Their movement is strongly influenced, or even restricted, by the moves of their neighbors. Yet experience tells us that flocks of birds, fishes, or bacteria do not learn efficiently on the level of the swarm: structures are destroyed too easily. The same is true for the gas metaphor.

Flexible Phase in a Mediating Space

Our proposal is to render the “phase” flexible according to the requirements that are important in a particular stage of learning. The nodes may be strictly arranged as in a crystal, or be quite mobile, moving around according to physical forces or according to their informational properties, such as the gathered data.

Ideally, the crystalline phases and the fluid phases depend on just two or three parameters. One example for this is the “repulsive field,” a collection of items in a 2D space which repel each other. If the kinetic energy of those items is not too large, and the range of the repellent force is not too low, this automatically leads to a hexagonal pattern. Yet, the pattern is not programmed as an a priori pattern; it is a result of the properties of the items (and the embedding space). Thus, the emergent arrangement is never affected by something like a “layout defect.”

Inserting a new item or removing one is very easy in such a structure. More importantly, the overall characteristics of the system do not change, despite the fact that the actual pattern changes.
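A minimal sketch of such a repulsive field in Python; the force law, cutoff and step size are illustrative choices, not calibrated values.

```python
import numpy as np

def relax_repulsive_field(points, steps=2000, step_size=0.001, cutoff=0.2):
    """Items in the unit square repel each other within a short range;
    with moderate kinetic energy this relaxes towards a roughly hexagonal
    packing, without any a priori layout. Adding or removing an item is
    just adding or removing a row."""
    pts = np.array(points, dtype=float)
    for _ in range(steps):
        diff = pts[:, None, :] - pts[None, :, :]      # pairwise vectors
        dist = np.linalg.norm(diff, axis=-1) + 1e-9
        force = np.where(dist < cutoff, 1.0 / dist**2, 0.0)
        np.fill_diagonal(force, 0.0)                  # no self-repulsion
        push = (diff / dist[..., None] * force[..., None]).sum(axis=1)
        pts = np.clip(pts + step_size * push, 0.0, 1.0)
    return pts
```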

The Collection of Items: “Nodes”

In the classic SOM, nodes serve a double purpose:

  • P1 – They serve as containers for references that point to records of data (= observations);
  • P2 – They present this extensional list in an integrated, “intensional” form;

The intensional form of the list is simply the weight vector of that node. In the course of learning, the list of the records contained in a particular node will be selected such that they are increasingly similar.

Note that keeping the references to the data records is extremely important. It is NOT part of most SOM implementations. If we did not keep them, we could not use the SOM as a modeling tool at all. This might be the reason why most people use the SOM just as a visualization tool for data (which is a dramatic misunderstanding).

The nodes are not “directly” linked. Whether they influence each other or not is dependent on the distance between them and the neighborhood function. The neighborhood function determines the neighborhood, and it is a salient property of the SOM mechanism that this function changes over time. Important for our understanding of machine-based epistemology is that the relations between nodes in a SOM are potentially of a probabilistic character.

However, if we use a fixed grid, a fixed distance function, and a deterministically behaving neighborhood function, the resulting relations are not probabilistic any more.
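One way to retain the probabilistic character even then is to sample the lateral influence instead of applying it deterministically. A small sketch, again our own illustration rather than part of the classic SOM:

```python
import numpy as np

def probabilistic_influence(grid_dist, radius, rng=None):
    """Treat the neighborhood kernel as a probability of being affected at
    all, so the relations between nodes stay probabilistic even on a grid."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.exp(-grid_dist**2 / (2 * radius**2))
    return rng.random(p.shape) < p     # boolean mask of affected nodes
```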

Moreover, in the default SOM the nodes are passive. They do not even perform the calculation of the weight vector themselves; this is done by a central “update” loop in most implementations. In other words, in a standard SOM a node is a data structure. Here we arrive at a main point in our critique of the SOM:

The common concept of a SOM is equivalent to a structural constant.

What we need, however, is something completely different. Even on the level of the nodes we need entities that can change their structure and their relationality.

The concept of FluidSOM must be based on active nodes.

These active nodes are semi-autonomous. They calculate the weight vector themselves, based either on new input data or on some other “chemical” influences. They may develop a few long-range outgoing fibers, or masses of more or less stable (but not “fixed”!) input relations to other nodes. The active meta-nodes in a fluid self-organizing map may develop a nested mini-SOM, or may incorporate any other mechanism for evaluating the data to which they point, e.g. a small neural network of a fixed structure (see mn-SOM). Meta-nodes also may locally branch out further SOM instances into relative “3D,” e.g. dependent on their work load, or, again, on some “chemical” influences.
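A minimal sketch of such a semi-autonomous node in Python; the message kinds and names are hypothetical and stand in for whatever messaging scheme a FluidSOM would actually use.

```python
import numpy as np

class ActiveNode:
    """An active node: it digests its own messages and maintains its own
    intensionality profile, instead of being written to by a central loop."""

    def __init__(self, n_features, rate=0.1):
        self.profile = np.random.rand(n_features)
        self.rate = rate             # local plasticity
        self.inbox = []

    def receive(self, message):
        self.inbox.append(message)   # loose coupling: only messages arrive

    def step(self):
        # the node itself decides how to digest what it received
        for kind, payload in self.inbox:
            if kind == "observation":
                self.profile += self.rate * (payload - self.profile)
            elif kind == "chemical":
                self.rate *= payload  # e.g. a milieu factor modulating plasticity
        self.inbox.clear()
```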

We see that meta-nodes are dynamic structures, something like a category of categories. This flexibility is indispensable for growing and differentiation.

This introduces the seed of autonomy on the lowest possible level. Here, within the almost material processes, it is barely autonomy; it is really a mechanic activity. Yet, this activity is NOT triggered by some reason any more. It is just there, as a property of the matter itself.

We are convinced that the top-level behavioral autonomy is (at least in large parts) an emergent property that grows out of the a-reasonable activity on the micro-material level.

Data, Reflection

The profile vector of a SOM node usually contains, for all mutable variables (non-ID/TV), the average of the values in the extensional list. That is, the profile vector itself does not know anything about the target variable or the index variable… which is solely the business of the node.
In our case, however, and based on the principle of “strict locality,” the profile vector may also contain a further section referring to dynamic properties of the node, or of the data. We introduced this in a different way above when discussing the extensionality container of SOM nodes. Think, for instance, of the deviation of the data in the node from a model function (such as a correlation). Such internal measurements cannot be predefined, and they are also not stable input data, since they are constantly changing (due to the list of data in the node, the state of other nodes, etc.).

This introduces the possibility of self-referentiality on the lowest possible level. Similar to the case of autonomy, we find the seed for self-referentiality on the topmost level (call it consciousness…) in the midst of the material layer.

Programming Style

If there is one lesson we can draw from the studies of naturally occurring brains, it is the fact that there is no master code between neurons, no “Mentalese.” The brain does not work on the basis of its own language. Equivalently, there are no logical circuits implementing a logical calculus. As a correlate we can say that the brain is not a thing that consists of a definite wiring. A brain is not a finite state automaton; it does not make any sense to ascribe states to brains. Instead, everything going on in a brain is probabilistic, even on the sub-cellular level. It is not determined in a definite manner how many vesicles have to burst in a synaptic gap to cause a transmission of the signal; it is not determined how many neurons exactly make up a working group for a particular “function,” etc. The only thing we can say is that certain fibers run from certain “regions,” typically comprising millions of neurons, to other such regions.

Note that any software program IS representable by just such a definite wiring. Hence, what we need is a mechanism that can transcend its own being as mechanism. We already discussed this issue in another chapter, where we identified abstract growth as a possible route to that achievement.

The processing of information in the brain is probabilistic, despite the fact that on the top level it “feels” different to us. Now, when starting to program artificial associative structures that are able to do things similar to what a brain can accomplish, we have to respect this principle of probabilization.

We not only have to avoid hard-coded wiring between procedures, we have to avoid any explicit wiring at all. In terms of software architecture this translates into the proposal that we should not rely just on object-oriented programming (OOP), where, for instance, we would represent nodes in a SOM as objects, and the properties of these objects again would be other objects. OOP is an important, but certainly not a sufficient, design element for a machine that shall develop its own episteme.

What we have to actualize in our implementation is not just OOP, but a messaging-based architecture, where all elements are only loosely coupled. The Lateral Control Mechanism (LCM) of the Kohonen SOM is a nice example for this; the explicit wiring in ANN is a perfect counter-example, a DON’T DO IT. Yet, as we will see in the next section, the LCM should not be conceived as a symmetric and structurally constant functional entity!

Concerning programming style, on an even lower level this translates into the heavy use of so-called interfaces, as they are so prevalent in Java. It is not objects that are wired or passed around, but only interfaces. Interfaces are forward contracts about the standards for the interaction of different parts, which actually can change while the “program” is running.
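In Python, the same discipline can be expressed with typing.Protocol; the following sketch (names are our own) shows how only the contract, never a concrete class, is passed around.

```python
from typing import Protocol, Sequence

class Influence(Protocol):
    """A forward contract: anything that can exert lateral influence on a
    profile satisfies this interface, whatever its concrete class is."""
    def affect(self, profile: Sequence[float]) -> Sequence[float]: ...

def propagate(influences: Sequence[Influence],
              profile: Sequence[float]) -> Sequence[float]:
    # callers depend only on the contract; implementations may be swapped
    # while the "program" is running
    for influence in influences:
        profile = influence.affect(profile)
    return profile
```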

Of course, these considerations concern only the lowest, indeed material, levels of an associative system; yet, they are necessary. If we start with wires of any kind, we won’t achieve our goals. From the philosophical perspective it does not come as a surprise that the immanence of autonomous abstraction is to be found only in open processes, which include the dimension of mediality. Even in the interaction of its tiniest parts the system should not rely on definite encodings.

Functional Differentiation

During their development, natural systems differentiate in their parts. Bodies are comprised of organs, organs are made of different cell types, and within all members of a cell type a further differentiation of their actual and context-specific roles may occur. The same can be observed in social insects, or any other group of social beings. They are morphologically almost identical, yet their experience lets them do their tasks differentially, or even lets them do different tasks. Why then should we assume that all neurons in a large compound should act absolutely equally?

To illustrate the point we should visit a particular African termite species (Schedorhinotermes lamanianus) on which I worked as a young biologist. These termites feed on rotting wood. Well, since these pieces of wood are much larger than the termites, a problem occurs: the animals have to organize their collective foraging, i.e. where to stay and gnaw at the wood, and where to travel in order to return the harvested pieces back to the home nest, where they then put them into a processing chamber stuffed with a special kind of fungus. The termites then actually feed on that fungus, and mostly not on the wood (though they also have bacteria in their gut to do the job of digesting the cellulose and the lignin).

Important for us is the foraging process. To organize gnawing sites and traveling routes they use pheromones, and, no wonder, they use just two for that, which build a Turing system, as I proved with a small bio-assay together with a colleague.

In the nervous system of animals we find a similar problematics. The brain is not just a large network, symmetric over and over like a crystal. Of course not. There are compartments (see our chapter about complexity), there are fibers. The various parts of the brain even differ strongly with respect to their topological structure, their “wiring.” Why the heck should an artificial system look like a perfect crystal? In a crystal there will be no stable emergence, hence no structural learning. By the way, we should not expect structural learning in swarms either, for a very similar reason, albeit that reason instantiates in the opposite manner: complete perturbation prevents the emergence of compartments, too, hence no structural learning will be observed. (That’s the reason why we do not have swarms in the skull…)

Back to our neurons. We reject the approach of a direct representational simulation of neurons, or parts of the brain. Instead we propose to focus on the principles as elements of construction. Any system that is intended to show structural learning is in urgent need of the basic differentiation into “local” and “tele” (among others). Here we even meet a structural parallelism to large urban compounds.

We can implement the emergence of such fibers in a straightforward manner if we make it dependent on the occurrence of recurring co-excitation of regions. This implies that we have to soften the SOM principle of the “winner-takes-it-all” approach. At least in large networks, any given observation should possibly leave its trace in different regions. Yet, our experience with very large maps indicates that this may happen almost inevitably. We used very simple observations consisting of only 3 features (r, g, and b, thus forming the RGB color triplet) and a large SOM, consisting of around 1’000’000 nodes. The topology was 4n, and the map was placed on a torus (no borders). After approx. 200’000 observations, the uniqueness of the color concepts started to erode: for some colors, two conceptual regions appeared.

In the further development of such SOMs, it is then quite natural to let fibers grow between such regions, changing the topology of the SOM from that of a crystal to that of a brain. While the first is almost perfectly isotropic in exactly 3 dimensions, the topology of the brain is (due to the functional differentiation into tele-fibers) highly anisotropic, in a high and variable dimensionality.

Conclusion

Here we discussed some basic design issues about self-organizing maps and introduced some improvements. We have seen that wording matters when it comes to representing even a mechanism. The issues we touched are

  • – explicit distinction of intensionality and extensionality in the conceptualization of the SOM mechanism, leading to a whole “new” domain of SOM architectures;
  • – producing idealistic representations from a collection of extensional descriptions;
  • – dynamics in the extensionality domain, including embedding of other structures, thus proceeding to the principle of compartmentalization, functional differentiation and morphological growth;
  • – the distinction between modeling and associative storage, which require different morphological structures once they are distinguished;
  • – stuffing the SOM with self-organization in the strong sense;
  • – spatial layout, fixed grid versus the emergent patterns in a repulsion field of freely moving particles; distinguishing material particles from functional abstract nodes;
  • – nodes as active components of the grid;
  • – self-referentiality on the microscopic level that gives rise to emergent self-referentiality on the macroscopic level;
  • – programming style, which should not only be as abstract (and thus as general) as possible, but also has to proceed from strictly defined, strongly coupled object-oriented style to loosely coupled system based on messaging, even on the lowest levels of implementation, e.g. the interaction of nodes;
  • – functional differentiation of nodes, leading to dynamic, fractional dimensionality and topological anisotropy;

Yet, there are still many more aspects that have to be considered if one tries to approach processes on a machinic substrate that could give rise to what we call “thinking.” In discussing the design issues listed above, we remained quite on the material level. But of course, morphology is important. Nevertheless, we should not conceive of morphology as a perfect instance of a blueprint; it is more about the potential, if not to say the “virtuality,” that is implied as immanence by the morphology. Beyond that morphology, we have to design the processes of dynamic change of that morphology, which we usually call growth, or tissue differentiation. Even on top of that, we have to think about the informational, i.e. immaterial, processes that only eventually lead to morphological correlates.

Anyway, when thinking about machine-based episteme, we obviously have to forget about crystals and swarms, about perfectness and symmetry in morphological structures. Instead, all of these issues, whether material or immaterial, should be designed with the perspective towards an immanence of virtuality in mind, based on probabilized mechanisms.

In a further chapter (scheduled) we will try to approach in more detail two other design issues that we already mentioned briefly here, concerning the implementation of an advanced Self-organizing Map, again oriented at basic abstract elements and the principles found in natural brains: inhibitory processes and probabilistic negation on the one hand, and the chemical milieu on the other. Above we already indicated that we expect a continuum between Self-organizing Maps and Reaction-Diffusion Systems, which in our perspective is highly significant for the working of brains, whether natural or artificial ones.

۞

Context

November 19, 2011

Without context, there is nothing.

Without context, everything would be a singularized item without relations. There wouldn’t be any facts or events, there would be no information or understanding. The context provides the very basic embedding for events, the background for figures, and also hidden influences on the visible. Context could be the content-side of the inevitable mediality of being. Thus, context appears as an ontological term.

Yet, context is as little an ontological concept as any other concept. It is a matter of beliefs, cognitive capacity and convention where one draws the border between figure and ground, or even a manifold of borders. There is no necessity in setting a particular border, even if we admit that natural objects may form material compartments without any “cognitive” activity. Additionally, a context not only does not have borders at all, much like the borderless sets in topology; context is also a deeply probabilistic concept. In an important sense, contexts can be defined as positively definite entities only to some extent. The constraint, as a way to express the context ex negativo, is an important part of the concept of context. Yet, even the constraints have to be conceived as probabilistic actualizations, as their particular actualization could depend on the “local” history or situation.

After all, the concept of context shares a lot with texts and writing, or, even more appropriately, with stories and narrating. As a part of a text, the context becomes subject to the same issues as the text itself. We may find grammaticality, the implied issue of acting as in speech-act theory, style and rhetoric, and a runaway interpretive vortex, as in Borges, or in any poem. We have to consider this when we are going to choose the tools for modeling and comparing texts.

The neighborhood of texts and contexts points to the important issue of the series, and hence of time and memory. Practically speaking, in order to possibly serve as part of a context, a synchronicity of signs (not: signals!) has to be established. The degree of the mutual influence as well as the salience of signs is neither defined nor even definable a priori. It is the interpretation itself (understood as a streaming process) that eventually forms groups of signs, figures and background by similarity considerations. Before the actual interpretation, but still from the perspective of the interpreting entity, a context is defined only in probabilistic terms. Within the process of an interpretation, now taking the position inside that process itself, the separation of signals into different signs, as well as the separation of signs into different groups, figures or background, necessarily needs other “signs” as operable and labeled compounds of rules and criteria. Such “compound” entities are simply (abstract) models, brought in as types.

This result is quite important. In the definition of the concept of context it allows us to refer to signs without committing the symbolic fallacy, if the signs are grounded as operable models outside of the code of the software itself. Fortunately, self-organizing maps (SOM) are able to provide exactly this required quality.

The result also provides hints to issues in a quite different area: the understanding of images. It seems that images cannot be “understood” without the use of signs, where those signs have been acquired outside of the actual process of interpreting the pixel information of an image (of course, that interpretation is not limited to descriptions on the level of pixels, despite the fact that any image understanding has to start there).

In the context of our interests here, focusing on machine-based epistemology, the concept of context is important with regard to several aspects. Most generally spoken, any interpretation of data requires a context. Of course, we should neither try to exactly determine the way of dealing with context, nor even to define the criteria that define a particular context. In doing so, we would commit the symbolic fallacy. Any so-called ontology in computer science is a direct consequence of getting victimized by this fallacy.

Formalizing the concept of context does not (and cannot) make any proposals about how a context has been formed or established. The formalization of context is a derived, symbolic, hence compressed view of the results of context formation. Since such a description of a context can itself be exported, the context exerts normative power. This normative power can be used, for example, to introduce a signal horizon in a population of self-organizing maps (SOMs): not any SOM instance can get any message from another such instance, if contexts are used for organizing the messaging between SOM instances. From a slightly shifted perspective we also could say that contexts provide the possibility to define rules that organize affectability.

In order to use that possibility without committing the symbolic fallacy we need a formalization on an abstract level. Whatever framework we use to organize single items (we may choose from set theory, topology or category theory), we also have to refer to probability theory.

A small Example

Before we start to introduce the formalization of context, we would like to provide a small example.

Sometimes, and actually more often than not, a context is considered to embed something. Let us call this item z. The embedding of z, together with z, then constitutes a context 𝒵, of which z is a part. Let us call the embedding E; then we could write:

𝒵 = {z, E}

Intuitively, however, we won’t allow just any embedding. There might be another item p, or more generally p out of a set P, that prohibits considering {z, E} as 𝒵.

So we get

𝒵 ≠ {z, E, P}

or, equivalently,

𝒵 = {z, E, ¬P}

Again intuitively, we could think about items that would not prohibit the establishment of a context as a certain embedding, but if there are too many of them, we would stop considering the embedding as a particular context. Similarly, we can operationalize the figure-ground phenomenon by restricting the length of the embedding that still would be considered as 𝒵. Other constraints could come as mandatory or probabilistic rules addressing the order of the items. Finally, we could consider a certain arrangement of items as a context even without a certain mandatory element z.

These intuitions can now be generalized and written down in a more formal way, e.g. to guide an implementation, or as we will see below, to compare it to other important formal ideas.

Components by Intuition

A context consists of four different kinds of sets, the threshold values associated with them, and order relations between pairs of items of those sets. Not all of the components need to be present at the same time, of course. As we have seen, we even may drop the requirement of a particular mandatory item.

The sets are

  • – mandatory items
  • – normal items
  • – facultative items
  • – stopping items

Context, formalized

In the formal definition we do not follow the distinction of different sets as guided by intuition. A proper generalization moves the variability into mappings, i.e. functions. We then need two different kinds of mappings. The first one controls the actualization, reflecting the relation between the presence of an item and the assignment of a context; in some respect, we could also call it a “completeness function.” The second mapping describes order relations.

Thus, we propose to start with three elements for a definition of the generalized context. On the uppermost level we may say that a context is a collection of items, accompanied by two functions that establish the context by a combination of implying a certain order and demanding a particular completeness.

So, starting with the top level, we introduce the context 𝒞 as the 3-tuple

  • 𝒞 = { Ci, A, R }

where Ci is the collection of items, A denotes the actualization function, and finally R is a function that establishes certain relations between the items c of Ci. The items need not be items in the sense of set theory; if a more general scope needs to be addressed, items could also be conceived as generic items, e.g. representing categories.
𝒞 itself may be used as a simple acceptance mapping

  • 𝒞: F → {0,1}

or as a scalar

  • 𝒞: F → { x | 0 ≤ x ≤ 1 }

In the second form we may use our context as the basis for a similarity measure!

The items c of the collection Ci have a weight property. The weight of an item is simply a degree of expectability. We call it w.

The actualization (completeness) function A describes the effect of three operations that could be applied to the collection Ci. All of those operations can be represented by thresholds.

Items c could be either

  • (i) removed,
  • (ii) non-C-items could be inserted into (or appear in) a particular observation,
  • (iii) C-items could be repeated, affecting the actual size of an observation.
  • A(1): The first case is a deterioration of the “content” of the context. This operation is modulated by the weight w of the items c. We may express this aspect as a degree of internal completeness over the collection Ci. We call it pi.
  • A(2): The second case represents a “thinning,” or dilution. It affects the density of the occurrence of the items c within a given observation. We call it px.
  • A(3): The third operation, repeating items c of Ci, affects the size of the observation. A context is a context only if there is something other than the context. Rather trivially, if the background (by definition the context) becomes figure (by definition not the context), it is not a context any more. We denote this simply by the symbol l; l could be given as a maximum length, or as a ratio invoking the size of Ci.
  • A(4): The contrast function K describes the differential aspect of the item sets (of the same type) between two patterns, defined as
    𝒦(x,y) = F(X ∩ Y, α(X−Y), β(Y−X)), α, β ≥ 0,
    with the possible instantiation as a ratio model
    K(a,b) = f(A ∩ B) / [ f(A ∩ B) + αf(A−B) + βf(B−A) ]

The last aspect of a context we have to consider is the relation R between the items c. These relations are described by two functions: the neighborhood function S and the dependency function D.

  • R(1): The set of all neighborhood functions S upon the items c results in a partial and probabilistic serial order. One might think, for instance, about a context with items (v,w,x,y), where S determines a partial order such that the context gets established only if v follows x.
  • R(2): The dependency function D(ck) imposes a constraint on pi, since it demands the actual co-occurrence of the items ck given as its arguments.

Any formalism to express the serial order of symbolic items is allowed here, whether it is an explicit formalism like a grammar or a finite state automaton, or an implicit one like a probabilistic associative structure (ANN or SOM) accepting only particular patterns. Imposing a serial order also means introducing asymmetries regarding the elements.

So we summarize our definition of the concept of context:

  • 𝒞 = { Ci, A, R } eq.1

where the individual terms unfold to:

  • Ci = { c (w) } eq.2, “sets, items & weights”
  • A = f( pi, px, l, K) eq.3, “actualization”
  • R = S ∩ D  eq.4, “relations”

This formal definition of the concept of context is situated on a very general level. Most importantly, we can use it to represent contextual structures without defining the content or the actualization of a particular instance of the concept at implementation time. Decisions about passing or accepting messages have been lifted to the operable, hence symbolic, level. In terms of software architecture we can say, much as in the case of the SOM, that conditions are turned into data. In category theory we meet a similar shift of perspective, as the representability of a transformation (depictable by the “categorial triangle”) is turned into a symbol.
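To indicate how such a definition could be turned into operable data, here is a small sketch of the scalar acceptance mapping in Python. It covers only the weights w, the internal completeness pi and the dilution px; the order relations S and D are omitted, and the multiplicative aggregation is our own simplification.

```python
def context_score(context_items, weights, observation):
    """Scalar acceptance mapping for a context: internal completeness pi
    (weighted share of context items present in the observation) combined
    with dilution px (share of the observation covered by context items)."""
    obs = set(observation)
    total = sum(weights[c] for c in context_items)
    pi = sum(weights[c] for c in context_items if c in obs) / total
    px = sum(1 for x in observation if x in context_items) / len(observation)
    return pi * px   # in [0,1], usable as a similarity measure

# example: item "a" is expected twice as strongly as "b" or "c"
score = context_score({"a", "b", "c"}, {"a": 2.0, "b": 1.0, "c": 1.0},
                      ["a", "x", "b", "a"])
```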

The items forming a context need not be measurable on the elementary symbolic level, i.e. the items need not form an alphabet 𝒜. We could think of pixels in image processing, for instance, or, more generally, any object that could be compared along a simple qualitative dimension (which could be the result of a binary mapping, of course). Yet, in the end a fixation of the measurement of the respective entity has to result in at least one alphabet, even if the items are abstract entities like categories in the mathematical sense. In turn, whenever one invokes the concept of context, this also implies some arbitrary mode of discretization of the measured “numerical” signal. Without letters, i.e. quasi-material symbols, there is no context. Without context, we would not need “letters.”

In the scientific literature, especially about thesauri, you may find similar attempts to formalize the notion of context. We have been inspired by those, of course. Yet, here we introduced it for a different purpose… and in a different context. Given the simple formalization above, we now can implement it.

Random Contexts, Random Graphs

A particular class of concepts we would like to mention here briefly, because it is essential for a certain class of self-organizing maps that has been employed in the so-called WebSom project. This class of SOMs could be described as two-layered abstracting SOMs. For brevity, let us call them 2A-SOM here.

2A-SOMs are used for the classification of texts, with considerable success. The basic idea is to conceive of texts as a semi-ordered set of probabilistic contexts. The 2A-SOM employs random contexts, which are closely related to random graphs.

A particular random context is centered around a selected word that occurs several times in a text (or a corpus of texts). The idea is quite simple. Each of the words in a text gets a fingerprint vector assigned, consisting of random values from [0..1], and typically of a minimal length of 80..100 positions. To build a random context one measures all occurrences of the targeted word. The length of the random context, say L(rc), is set as an odd number, i.e. L(rc) = 2*n+1, where the targeted word is always put at the center position; “n” then describes the number of preceding/succeeding positions for this word. The random context then is simply the superposition of all fingerprint vectors in the neighborhood of the targeted word. So it should be clear that a random context describes all neighborhoods of a text (or a part of it) in a single set of values.
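A small sketch of this construction in Python; the function name and the defaults (dim=100, n=2) are illustrative choices following the description above.

```python
import numpy as np

def random_context(tokens, target, dim=100, n=2, rng=None):
    """Superpose the fingerprint vectors of all 2*n+1-sized neighborhoods
    around each occurrence of `target` in the token sequence."""
    rng = np.random.default_rng(0) if rng is None else rng
    fingerprints = {}   # each word gets one fixed random vector from [0..1]

    def fp(word):
        if word not in fingerprints:
            fingerprints[word] = rng.random(dim)
        return fingerprints[word]

    context = np.zeros(dim)
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - n), min(len(tokens), i + n + 1)
            for neighbor in tokens[lo:hi]:
                context += fp(neighbor)
    return context
```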

With respect to our general notion of context there are some obvious differences to the random context as used in 2A-SOM:

  • – constant length
  • – assumption of zero knowledge: no excluding items can be represented, no order relations can be represented;

An intermediate position between the two concepts would introduce a separate weighting function W: (0,1) ↦ {0,1}, which could be used to change the contribution of a particular context to the random context.

The concept of context as defined here is a powerful structure that provides even the possibility of a translation into probabilistic phrase structure grammar, or equivalently, into a Hidden-Markov-Model (HMM).

Similarity and Feature Vectors

Generalized feature vectors are an important concept in predictive modeling, especially for the task of calculating a scalar that represents a particular similarity measure. Generalized feature vectors comprise both (1) the standard vector, which basically is a row extracted from a table containing observational data about cases (observations), and (2) the feature set, which may differ between observations. Here, we are interested in this second aspect.

Usually, the difference between the sets of features taken from two different observations is evaluated under the assumption that all features are equally important. It is obvious that this is not appropriate for many cases. One possibility to replace the naive approach that treats all items in the same way is the concept of context as developed here. Instead of simple sets without structure, it is possible to use weights and order relations, both as dynamic parameters that may be adjusted during modeling. In effect, the operationalization of similarity can be changed while searching for the set of appropriate models.

Concerning the notion of similarity, our concept of context shares important ideas with the concept proposed by Tversky [1], for instance the notion of asymmetry. Tversky’s approach is, however, much more limited compared to ours.
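To show the connection, here is a sketch of Tversky's ratio model extended with the weights introduced above; the weighted measure f and the default values for α and β are our own illustrative choices.

```python
def weighted_ratio_similarity(A, B, w, alpha=0.5, beta=0.5):
    """Tversky's ratio model with a weighted measure f: instead of treating
    all features as equally important, f sums per-feature weights w.
    With alpha != beta the measure becomes asymmetric."""
    f = lambda s: sum(w.get(x, 1.0) for x in s)
    A, B = set(A), set(B)
    common = f(A & B)
    return common / (common + alpha * f(A - B) + beta * f(B - A))
```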

Modeling and Implementation

Random contexts, as well as structured probabilistic contexts as defined above, provide a quite suitable tool for the probabilization of the input for a learning SOM. We already argued in the chapter about representation that such probabilization is not only mandatory, it is inevitable: words can’t be presented (to the brain, the mind or a SOM) as singularized “words.” They need context, the more the better, as philosophical theories about meaning or those about media suggest. The notion of context (in the way defined above) is also a practicable means to overcome the positivistic separation of syntax, semantics and pragmatics, as it has been introduced by Morris [2]. Robert Brandom, in his inferentialist philosophy labeled “expressive reason,” denies such a distinction, which actually is not surprising: his work starts with the primacy of interpretation, just as we do [3].

It is clear that any representation of a text (or an image) should always be started as a context according to our definition. Only in this case can a natural differentiation take place, from the symmetric treatment of items to their differentiated treatment.

A coherent object that consists of many parts, such as a text or an image, can be described as a probabilistic “network” of overlapping (random) contexts. Random contexts need to be used if no further information is available. Yet, even in the case of a first mapping of a complicated structure there is more information available than “no information.” Any further development of a representation beyond the zero-knowledge approach will lead to the context as we have defined it above.

Generalized contexts may well serve as a feasible candidate for unifying different approaches of probabilistic representation (random graphs/contexts) as well as operationalizations of similarity measures. Tversky’s feature-set-based similarity function(al) as well as feature-vector-based measures are just particular instances of our context. In other words, probabilistic representation, similarity and context can be handled using the same formal representation, the difference being just one of perspective (and algorithmic embedding). This is a significant result not only for the practice of machine-based epistemology, but also for philosophical questions around vagueness, similarity and modeling.

This article was first published 19/11/2011, last revision is from 30/12/2011

  • [1] Amos Tversky (1977). Features of Similarity. Psychological Review, Vol. 84, No. 4. available online
  • [2] Charles Morris (1938). Foundations of the Theory of Signs. University of Chicago Press.
  • [3] Robert Brandom (1994). Making It Explicit. Harvard University Press, chp. 8.6.2.

۞

Probabilistic Networks

November 1, 2011 § Leave a comment

Everything is linked together and related.

There have always been smart people who not only knew this, but also considered it as primary, as against the point, the dot, the spot. Thinking in relations is deeply incompatible with one of the most central elements of modernity, the metaphysical belief in independence. Today, in the age of the ubiquitous "network," everything indeed seems to be improved, doesn't it, given the fact that for the last 5 or 6 years the concept with the steepest career has been the network.

Certainly, one of the main reasons the network became a major concept, from everyday life to science, is that connecting things, establishing links between devices and creating a potential population of links, became a concrete experience, even for private persons. Before the era of WiFi and its almost perfectly automated process of establishing a link, the network was something very palpable. There were modems for dial-up, confirming their working by a twittering sound, a lot of cables in the office, and the frequent experience of a failure of such technical infrastructure. In other words, networking became an activity with its own specific corporeality.

So, what does it mean to say that things are connected? What are the consequences, both on the empirical side, concerning the construction or observation of systems or machines, and on the conceptual level? For instance, so far there is no particular "network logic". Whenever networks meet logic, logic wins, meaning that the network will be reduced to individual steps, nodes, transfers, etc., in other words to unrelated atoms.

Intuitively, the concept of networks is closely related to the notion of information. Today, this linkage has been integrated deeply into our Form of Life. Through the internet, the world wide web, and of course through the so-called social media we experience and practice this linkage in a rather intensive manner. And the social media just invoke a further important topic that is related to networks: mediality.

Here we meet a first hint of potential friction. Networks are usually well-defined; people speak about nodes and relations. Think just about the telephone network or a network of streets. Even social networks are explicable. Yet, in social media the strict determination starts to get lost. While social media are still based on a network of cables, something is going on there that is drastically different from the cable layer.

The notion of partial indeterminateness brings us to mediality and its inherent elements of contingency and probabilism. Yet, what exactly does "probabilistic element" refer to, particularly with respect to networks? Is it, after all, not just some formalistic exercise to say that there is a random element, largely superfluous when it comes to real systems and problems, particularly as cultural artifacts are planned? Actually, I don't think so. Quite the opposite: one could even say that in some sense non-probabilistic networks are not networks at all.

In the remainder of this essay we will have to clarify the issues around these concepts, regarding physical systems and the conceptual aspects as well as the aspect of application. We will have to take a closer look at the elements of the network, nodes and links, as well as at the network as an entirety. There is the question of the telos of the network. What is it that networks as a whole introduce? Is it possible to ask about their particular quality, beyond the trivial fact that things are connected?

Thus, we will first deal with networks, their elements and the properties of both in a basic manner.

1. Basics of Networks

When dealing with networks, there is immediately a strong reference to topology, that is, the way in which items belonging to the network are linked together. More precisely, what actually matters concerning the topology of networks are the symmetry properties of the connectedness. It does not really come as a surprise that the issue of symmetry relates networks to crystals, (mathematical) groups and knots. Yet topology and its symmetry are not the only important dimensions.

1.1. Topology

So, before getting precise, let us start with a simple example of a network. What we see below are 3 nodes linked by 3 edges. The nodes represent items, while the edges represent certain relations between them.

o —— o
 \   /
  \ /
   o

Actually, this example is almost too simple. Despite the fact that it contains all the basic elements, of which there are notably only 2, the node and the relation, many would not regard it as a network. What seems to be missing is a certain multiplicity of possible paths. Such a multiplicity would be introduced by at least one "crossing", that is, we need at least one node that maintains three relations. In turn this means that we need at least 4 nodes to build an arrangement that could be called a network.

o —— o
 \   / \
  \ /   \
   o ——— o

On the other hand, we would consider arrangements like the following also as a network, though there is no multiplicity. It is a perfectly hierarchical structure, albeit one with several possible roots.

o —— o        o
 \           /
  o —— o —— o
  |
  o —— o —— o —— o
   \
    o

Obviously, we may distinguish networks by means of their redundancy. In physical systems, if we are going to connect points from a large set within a given "area" among each other, we usually try to avoid redundancy, since redundancy means increased costs for building and maintaining the network. Just think about a street network, the power grid, the water supply grid or the telephone network: in each case the degree of redundancy is quite low.
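One simple way to operationalize this degree of redundancy is the cyclomatic number E − V + C, i.e. the number of independent cycles: a pure tree, like many supply grids, scores 0, and each additional unit means one more alternative route. A minimal sketch:

def redundancy(nodes, edges):
    """Cyclomatic number |E| - |V| + C of an undirected graph.

    C is the number of connected components, found here with a tiny
    union-find; the result counts the independent cycles, i.e. the
    redundant alternative routes of the network.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    components = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + components

# The triangle above: 3 nodes, 3 edges -> one redundant cycle.
print(redundancy([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))  # 1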

Yet, things are not that simple, of course. Some degree of redundancy can be quite beneficial if edges or nodes may fail. In the case of the internet, for example, redundancy was much higher in its beginnings. Actually, redundancy has been a design goal for the ArpaNet (next figure), since it was supposed to survive a nuclear attack on the U.S.

Figure 1: The logical layout of the ArpaNet in 1977.

scale-free networks (Barabási)

1.2. Symmetry

 

1.3. Differentiation

Besides redundancy, there are further important parameters. Taking the case of a street network as an example, with the streets between crossings interpreted as edges or relations, we immediately see that the transfer capacity of the edges also matters.

 

Associative networks should be clearly distinguished from logistic networks, whose purpose is to organize any kind of physical transfer. Associative networks re-arrange, sort, classify and learn.

logistics and growth

Yet, we are only at the beginning of understanding what networks "are." Since there are a lot of prejudices around, we will first give some examples. The second major section discusses the main concepts and adds a few fresh ones. The third section discusses the consequences of changing a network into a probabilistic one.

mapping of items (objects) to nodes and relations to edges

(under construction)

How to Grow Associative Maps?

October 25, 2011 § Leave a comment

It is probably only partially correct to claim that the borders of the world are constituted by the borders of language. Somehow it seems more appropriate to think that the borders of the world are given by the borders of the gardens in which the associative maps are growing. Well, I admit, we would then have to discuss what came first, those gardens or language. Leaving this aside for a moment, the most essential problem can be put forward in a simple manner:

How to run the garden of associativity?

Before we start I want to briefly recapitulate the chapter about growth. There we investigated what we called "abstract growth." We related it to a general notion of differentiation, described and operationalized by the concept of "signal strength length." Based on that, we explored the possibility of software systems that grow by themselves, i.e. without external programmers. Here, we will now proceed to the next step by explicating the candidate structure for such a system. Besides the abstract aspect of growth and differentiation, we also have to keep in mind the medial, or if you like, the communicological aspects of the regulation of the actual growth, which need to be instantiated into a technical representation without losing too much of the architectural structure.

Unfortunately we cannot follow the poetic implications of our introduction. Instead, we will try to identify all possible ways in which our candidate structure can grow. The issues raised by the concept of associativity will be described in a dedicated chapter.

The structure that we choose as the basic "segmental" unit needs to be an associative structure by itself. In principle, any machine learning algorithm would do. Yet, it is not reasonable to take a closed algorithmic procedure for that job, since this would constrain the future functional role of the unit, which is necessarily unknown at implementation time, too much. Additionally, it should provide robustness and flexibility. For many reasons, the Self-Organizing Map (SOM) is the best choice for the basic unit. A particularly salient property of the SOM is the fact that it is a network which can change its (abstract) symmetry parameters, something that Artificial Neural Networks cannot achieve as easily. This means that different types of symmetry breaks can be triggered in a SOM, and in turn, that the topology of the connectivity may change even locally. In this way, a single MAP may separate into two (or several). The attractive property of such separation processes in SOM is that any degree of connectivity between the parts can establish itself in a self-organized manner. In other words, in the SOM the units may develop any kind of division of "labor" (storage, transmission, control), and any unit can split off and develop into a fully-fledged further MAP.
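To make the role of the changeable topology concrete, here is a minimal sketch of a single SOM step, with the lattice given as an explicit adjacency list rather than a fixed grid. Everything here (the names, the Gaussian neighborhood, the BFS grid distance) is a common textbook choice, not the specific mechanism discussed in the text; the point is merely that rewiring the adjacency list, e.g. cutting it into two parts, changes the topology without touching the learning rule.

import numpy as np

def som_step(weights, neighbors, x, lr=0.1, sigma=1.0):
    """One competitive-collaborative update on a graph-shaped SOM.

    `weights` is an (n_nodes, dim) array of node memories and
    `neighbors` an adjacency list defining the lattice. The
    best-matching unit competes for the observation `x`; its lattice
    neighborhood collaborates by moving along with it.
    """
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Grid distances via BFS on the lattice (not distances in data space).
    dist, frontier = {bmu: 0}, [bmu]
    while frontier:
        nxt = []
        for n in frontier:
            for m in neighbors[n]:
                if m not in dist:
                    dist[m] = dist[n] + 1
                    nxt.append(m)
        frontier = nxt
    for n, d in dist.items():
        h = np.exp(-(d ** 2) / (2 * sigma ** 2))  # Gaussian neighborhood
        weights[n] += lr * h * (x - weights[n])
    return bmu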

A probabilistic manifold of networks can grow and differentiate in several different ways; any of the growth patterns we described elsewhere (see the chapter about growth) may apply:

  • – growth by accretion on the level of atoms, completely independent of context;
  • – ordered growth due to needs, i.e. controlled by inner mechanisms, mainly at the "tips" of an arrangement, or, similar to that, metameric growth as in the case of worms (see the sketch after this list);
  • – differentiation by melting, folding and pullulation inside a particular map.
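As a sketch of the second pattern, ordered growth "due to needs" (inspired by growing-SOM variants from the literature; the names and the error bookkeeping are assumptions): a new node is inserted midway between the node with the highest accumulated quantization error and its worst neighbor, relieving the overloaded region and rewiring the local topology.

import numpy as np

def grow_where_needed(weights, neighbors, errors):
    """Insert a node where quantization error has piled up.

    `weights` is a list of prototype vectors, `neighbors` an adjacency
    list, and `errors` the per-node accumulated quantization error.
    The new node takes over the edge between the worst node and its
    worst neighbor and starts with the midpoint prototype.
    """
    q = int(np.argmax(errors))                      # most overloaded node
    f = max(neighbors[q], key=lambda n: errors[n])  # its worst neighbor
    new = len(weights)
    weights.append((weights[q] + weights[f]) / 2.0)
    neighbors[q].remove(f)
    neighbors[f].remove(q)
    neighbors[q].append(new)
    neighbors[f].append(new)
    neighbors.append([q, f])
    errors.append(0.0)
    return new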

Maps that have been split off from existing ones may lose all direct links to their "mother map" after a certain period of time. They would then receive messages through the common and anonymous messaging mechanism. In this way a population of SOMaps will be created. Yet, the entities of this population may develop second-order ties, or, to use a graph-theoretic term, cliques, depending only on the stream of data flowing in and being processed by the population.

The additional SOMs, whether created through separation or "de novo" by duplication, need not all work on the same level of integration. If the data from external sources are enriched by default with variables about the SOM itself, for any SOM in the population, a high-level division of labor will emerge spontaneously once the whole system is put under time or energy constraints.

It is pretty clear that this garden of associativity has to run in a fully autonomous manner. There is no place for a gardener in this architecture. Hence, growth must be regulated. This can be effectively achieved by two mechanisms: reinforcement based on usage, and simple evolutionary selection processes on the basis of scarcity of time, absolute space, or the analogs to energy or the supply of matter.
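A toy sketch of such gardener-less regulation (the attributes `usage` and `vitality` and the numbers are pure assumptions): usage reinforces a map's vitality, time decays it, and maps that fall below a survival budget are selected out.

def regulate(population, decay=0.95, budget=1.0):
    """Regulate growth by usage-based reinforcement and selection.

    Each map in `population` carries a `usage` counter and a `vitality`
    score. Usage is converted into vitality, vitality decays each cycle
    (time as a scarce resource), and maps below the survival budget are
    removed.
    """
    survivors = []
    for som in population:
        som.vitality = (som.vitality + som.usage) * decay
        som.usage = 0
        if som.vitality >= budget:
            survivors.append(som)
    population[:] = survivors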

Although such a system might appear distantly similar to ant hive or swarm simulations, with the task of the individual entity being that of a single yet complete SOM, we would deny such a relationship.

Of course, the idea of growing SOMs has been around for some time. Examples are [1] or [2]. Yet, these papers or systems conducted neither an analysis of growth processes beforehand nor an analysis of the broader epistemological context, probably because they have been created by software engineers; hence these approaches remain rather limited, albeit they point in the right direction.

  • [1]
  • [2]

۞
