Beyond Containing: Associative Storage and Memory

February 14, 2012

Memory, our memory, is a wonderful thing. Most of the time.

Yet, it also can trap you, sometimes terribly, if you use it in inappropriate ways.

Think about the problematic situation of being a witness. As long as you don’t try to remember exactly, you know precisely. As soon as you start to strive for perfect recall, everything becomes fluid at first, then fuzzy and increasingly blurry, as if there were some kind of uncertainty principle, similar to Heisenberg’s [1]. There are other tricks, such as asking a person the same question over and over again: any sense of certainty, hence knowledge, will vanish. In the other direction, everybody knows the experience of a tiny smell or sound triggering a whole story in memory, often one that has not been thought of for a long time.

The main strengths of memory, its extensibility, adaptivity, contextuality and flexibility, could also be considered its main weaknesses, if we expect perfect reproducibility for the results of “queries”. Yet, memory is not a database. There are neither symbols nor indexes, and at the deeper levels of its mechanisms also no signs. There is no particular neuron that would “contain” information in the way a file on a computer can be regarded as providing it.

Databases are, of course, extremely useful, precisely because they cannot do other than reproduce answers perfectly. That’s how they are designed and constructed. And precisely for the same reason we may state that databases are dead entities, like crystals.

The reproducibility provided by databases expels time. We can write something into a database, stop everything, and continue precisely at the same point. Databases do not own their own time. Hence, they are purely physical entities. As a consequence, databases do not and cannot think. They can’t bring or put things together; they do not associate, superpose, or mix. Everything is under the control of an external entity. A database does not learn when the amount of bits stored inside it increases. We also have to be very clear about the fact that a database does not interpret anything. All this should not be understood as criticism, of course; these properties are intended by design.

The first important consequence of this is that any system relying just on the principles of a database will also inherit these properties. This raises the question about the necessary and sufficient conditions for the foundations of “storage” devices that allow for learning and informational adaptivity.

As a first step one could argue that artificial systems capable of learning, for instance self-organizing maps, or any other “learning algorithm”, may consist of a database and a processor. This would represent the bare bones of the classic von Neumann architecture.

The essence of this architecture is, again, reproducibility as a design intention. The processor is basically empty. As long as the database is not part of a self-referential arrangement, there won’t be something like a morphological change.

Learning without change of structure is not learning, but only changing the value of structural parameters that have been defined a priori (at implementation time). The crucial step, however, would be to introduce those parameters in the first place. We will return to this point at a later stage of our discussion, when it comes to describing the processing capabilities of self-organizing maps.1

Of course, the boundaries are not well defined here. We may implement a system in a very abstract manner such that a change in the value of such highly abstract parameters indeed involves deep structural changes. In the end, almost everything can be expressed by some parameters and their values. That’s nothing else than the principle of the Deleuzean differential.

What we want to emphasize here is just the issue that (1) morphological changes are necessary in order to establish learning, and (2) these changes should be established in response to the environment (and the information flowing from there into the system). These two conditions together establish a third one, namely that (3) a historical contingency is established that acts as a constraint on the further potential changes and responses of the system. The system acquires individuality. Individuality and learning are co-extensive. Quite obviously, such a system is no longer a von Neumann device, even if it still runs on such a linear machine.

Our claim here is that learning requires a particular perspective on the concept of “data” and its “storage.” Correspondingly, without this changed concept of the relation between data and storage, the emergence of machine-based episteme will not be achievable.

Let us just contrast the two ends of our space.

  • (1) At the logical end we have the von Neumann architecture, characterized by empty processors, perfect reproducibility on an atomic level, the “bit”; there is no morphological change; only estimation of predefined parameters can be achieved.
  • (2) The opposite end is made from historically contingent structures for perception, transformation and association, where the morphology changes due to the interaction with the perceived information2; we will observe emergence of individuality; morphological structures are always just relative to the experienced influences; learning occurs and is structural learning.

With regard to a system that is able to learn, one possible conclusion from that would be to drop the distinction between the storage of encoded information and the treatment of those encodings. Perhaps it is the only viable conclusion to this end.

In the rest of this chapter we will demonstrate how the separation between data and their transformation can be overcome on the basis of self-organizing maps. Such a device we call “associative storage”. We also will find a particular relation between such an associative storage and modeling3. Notably, both tasks can be accomplished by self-organizing maps.

Prerequisites

When taking the perspective of usage, there is yet another large contrast between databases and associative storage (“memories”). In the case of a database, the purpose of a storage event is known at the time of performing the storing operation. In the case of memories and associative storage this purpose is not known, and often cannot reasonably be expected to be knowable in principle.

From that we can derive a quite important consequence. In order to build a memory, we have to avoid storing the items “as such,” as is the case for databases. We may call this the (naive) representational approach. Philosophically, the stored items do not have any structure inside the storage device, neither an inner structure nor an outer one. Any item appears as a primitive quale.

The contrast to the process in an associative storage is indeed a strong one. Here, it is simply forbidden to store items in an isolated manner, without relation to other items, as an engram, an encoded and reversibly decodable series of bits. Since a database works perfectly reversibly and reproducibly, we can encode the grapheme of a word into a series of bits and later decode that series back into a grapheme again, which in turn we as humans (with memory inside the skull) can interpret as words. Strictly speaking, we do NOT use the database to store words.

More concretely, what we have to do with the items comprises two independent steps:

  • (1) Items have to be stored as context.
  • (2) Items have to be stored as probabilized items.

The second part of our re-organized approach to storage is a consequence of the impossibility of knowing about future uses of a stored item. Conversely, using a database for storage always and strictly implies that the storage agent claims to know perfectly about future uses. It is precisely this implication that renders long-lasting storage projects so problematic, if not impossible.

In other words, and even more concisely, we may say that in order to build a dynamic and extensible memory we have to store items in a particular form:

Memory is built on the basis of a population of probabilistic contexts in and by an associative structure.

The Two-Layer SOM

In a highly interesting prototypical model project (codename “WEBSOM”) Kaski (a collaborator of Kohonen) introduced a particular SOM architecture that serves the requirements described above [2]. Yet, Kohonen (and all of his colleagues alike) has so far not recognized the actual status of that architecture. We already mentioned this point in the chapter about some improvements of the SOM design: Kohonen fails to discern modeling from sorting when he uses the associative storage as a modeling device. Yet, modeling requires a purpose, operationalized into one or more target criteria. Hence, an associative storage device like the two-layer SOM can be conceived as a pre-specific model only.

Nevertheless, this SOM architecture is not only highly remarkable, but we can also easily extend it appropriately; thus it is indeed so important, at least as a starting point, that we describe it briefly here.

Context and Basic Idea

The context for which the two-layer SOM (TL-SOM) has been created is document retrieval by classification of texts. From the perspective of classification, texts are highly complex entities. This complexity of texts derives from the following properties:

  • – there are different levels of context;
  • – there are rich organizational constraints, e.g. grammars
  • – there is a large corpus of words;
  • – there is a large number of relations that not only form a network, but which also change dynamically in the course of interpretation.

Taken together, these properties turn texts into ill-defined or even undefinable entities, for which it is not possible to provide a structural description, e.g. as a set of features, and particularly not in advance of the analysis. Briefly, texts are unstructured data. It is clear that especially non-contextual methods like the infamous n-grams are deeply inappropriate for the description, and hence also for the modeling, of texts. The peculiarity of texts was recognized long before the age of computers. Around 1830 Friedrich Schleiermacher founded the discipline of hermeneutics as a response to the complexity of texts. In the last decades of the 20th century it was Jacques Derrida who brought a new perspective to it. In Deleuzean terms, texts are always and inevitably deterritorialized to a significant degree. Kaski & coworkers addressed only a modest part of this vast problematics: the classification of texts.

Their starting point was to preserve context. The large variety of contexts makes it impossible to take any kind of raw data directly as input for the SOM. That means that the contexts had to be encoded in a proper manner. The trick is to use a SOM for this encoding (details in the next section below). This SOM represents the first layer. The subject of this SOM are the contexts of words (definition below). The “state” of this first SOM is then used to create the input for the SOM on the second layer, which then addresses the texts. In this way, the size of the input vectors is standardized and reduced.

Elements of a Two-Layer SOM

The elements, or building blocks, of a TL-SOM devised for the classification of texts are

  • (1) random contexts,
  • (2) the map of categories (word classes)
  • (3) the map of texts

The Random Context

A random context encodes the context of any of the words in a text. Let us assume for the sake of simplicity that the context is bilaterally symmetric according to 2n+1, i.e. for example with n=3 the length of the context is 7, where the focused word (“structure”) is at position 3 (when counting starts with 0).

Let us resort to the following example, which takes just two snippets from this text. The numbers represent some arbitrary enumeration of the relative positions of the words.

sequence A of words:      “… without change of structure is not learning …”
rel. positions in text:       53      54     55    56        57  58    59

sequence B of words:      “… not have any structure inside the storage …”
rel. positions in text:       19  20   21    22        23     24    25

We need the position numbers only for calculating the positional distance between words. The interesting word here is “structure”.

For the next step you have to think of the words as listed in a catalog of indices, that is, as a set whose order is arbitrary but fixed. In this way, any of the words gets its unique numerical fingerprint.

Index   Word        Random Vector
…       …
1264    structure   0.270    0.938    0.417    0.299    0.991 …
1265    learning    0.330    0.990    0.827    0.828    0.445 …
1266    Alabama     0.375    0.725    0.435    0.025    0.915 …
1267    without     0.422    0.072    0.282    0.157    0.155 …
1268    storage     0.237    0.345    0.023    0.777    0.569 …
1269    not         0.706    0.881    0.603    0.673    0.473 …
1270    change      0.170    0.247    0.734    0.383    0.905 …
1271    have        0.735    0.472    0.661    0.539    0.275 …
1272    inside      0.230    0.772    0.973    0.242    0.224 …
1273    any         0.509    0.445    0.531    0.216    0.105 …
1274    of          0.834    0.502    0.481    0.971    0.711 …
1275    is          0.935    0.967    0.549    0.572    0.001 …
…

Any of the words of a text can now be replaced by an a priori determined vector of random values from [0..1]; the dimensionality of those random vectors should be around 80 in order to approximate orthogonality among all those vectors. Just to be clear: these random vectors are taken from a fixed codebook, a catalog as sketched above, where each word is assigned exactly one such vector.
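
As a minimal sketch (in Java), such a codebook could look as follows; the class name, the fixed seed and the dimensionality of 80 are illustrative assumptions, not part of the WEBSOM specification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Minimal sketch of the word codebook: each word is assigned exactly one
// fixed random vector ("fingerprint"), drawn once and then never changed.
public class RandomCodebook {
    private final int dim;                       // ~80 suffices for quasi-orthogonality
    private final Random rng = new Random(42L);  // fixed seed => reproducible catalog
    private final Map<String, double[]> codebook = new HashMap<>();

    public RandomCodebook(int dim) { this.dim = dim; }

    // returns the fingerprint of a word, creating it on first encounter
    public double[] fingerprint(String word) {
        return codebook.computeIfAbsent(word, w -> {
            double[] v = new double[dim];
            for (int i = 0; i < dim; i++) v[i] = rng.nextDouble();  // flat values in [0..1]
            return v;
        });
    }
}
```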

Once we have performed this replacement, we can calculate the averaged vectors per relative position of the context. In the case of the example above, we would calculate the reference vector for the first position (position 0 in the counting above, i.e. offset −3 relative to “structure”) as the average of the vectors encoding the words “without” and “not”; its first component would be (0.422 + 0.706)/2 = 0.564.

Let us be more explicit. Example sequence A is first translated into positional numbers; each positional number is interpreted as a column header, and the column is filled with the values of the respective word’s fingerprint. For the 7 positions (−3 … +3) we get 7 columns:

sequence A of words:            … without  change  of     structure  is     not    learning …
rel. positions in text:            53       54      55     56         57     58     59
grouped around “structure”:        -3       -2      -1      0          1      2      3

random fingerprints per position:
                                   0.422    0.170   0.834  0.270      0.935  0.706  0.330
                                   0.072    0.247   0.502  0.938      0.967  0.881  0.990
                                   0.282    0.734   0.481  0.417      0.549  0.603  0.827

…further entries of the fingerprints…

The same has to be done for the second sequence B. Now we have two tables of fingerprints, both comprising 7 columns and N rows, where N is the length of the fingerprint. From these two tables we calculate the average values and put them into a new table (which is of course also of dimensions 7×N). Thus, the example above yields 7 such averaged reference vectors. If we have a dimensionality of 80 for the random vectors, we end up with a matrix of [r,c] = [80,7].
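
The averaging step can be sketched like this (again Java, illustrative names); the codebook lookup is passed in as a function, and each context is given as an array of 2n+1 words centered on the focus word. For the example above, the list would contain the two 7-word windows around “structure”.

```java
import java.util.List;
import java.util.function.Function;

// Minimal sketch of the averaging step: for every occurrence of the focus word,
// the fingerprints of the words in its context window are summed per relative
// position and finally divided by the number of occurrences.
class ContextAverager {
    // contexts: one String[2n+1] per occurrence; fingerprint: word -> random vector
    static double[][] average(List<String[]> contexts,
                              Function<String, double[]> fingerprint, int dim) {
        int width = contexts.get(0).length;               // 2n+1, e.g. 7
        double[][] mean = new double[dim][width];
        for (String[] context : contexts)
            for (int pos = 0; pos < width; pos++) {
                double[] fp = fingerprint.apply(context[pos]);
                for (int i = 0; i < dim; i++) mean[i][pos] += fp[i];
            }
        for (int i = 0; i < dim; i++)
            for (int pos = 0; pos < width; pos++) mean[i][pos] /= contexts.size();
        return mean;                                      // e.g. [80][7]
    }
}
```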

In a final step we concatenate the columns into a single vector, yielding a vector of 7×80 = 560 variables. This might appear to be a large vector. Yet, it is much smaller than the whole corpus of words in a text. Additionally, such vectors can be compressed by the technique of random projection (mathematical foundations by [3], first proposed for data analysis by [4], utilized for SOMs later by [5] and [6]), which today is quite popular in data analysis. Random projection works by matrix multiplication. Our vector (1R × 560C) gets multiplied with a matrix M(r) of 560R × 100C, yielding a vector of 1R × 100C. The matrix M(r) also consists of flat random values. This technique is very interesting, because no relevant information is lost, but the vector gets shortened considerably. Of course, in an absolute sense there is a loss of information. Yet, the SOM only needs the information which is important to distinguish the observations.
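
A sketch of the concatenation and the random projection follows; the projection matrix is generated once with flat random values (560 rows, 100 columns in the example) and then kept fixed. All names are illustrative.

```java
import java.util.Random;

// Minimal sketch of concatenation and random projection as described above.
class RandomProjection {
    static double[] concatenateColumns(double[][] m) {        // m is [dim][width]
        double[] v = new double[m.length * m[0].length];
        int k = 0;
        for (int pos = 0; pos < m[0].length; pos++)            // column after column
            for (int i = 0; i < m.length; i++) v[k++] = m[i][pos];
        return v;                                              // length 7 * 80 = 560
    }

    static double[][] projectionMatrix(int rows, int cols, long seed) {
        Random rng = new Random(seed);
        double[][] p = new double[rows][cols];                 // e.g. 560 x 100, fixed
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++) p[r][c] = rng.nextDouble();
        return p;
    }

    static double[] project(double[] v, double[][] proj) {     // (1 x 560) * (560 x 100)
        double[] out = new double[proj[0].length];
        for (int c = 0; c < out.length; c++)
            for (int r = 0; r < v.length; r++) out[c] += v[r] * proj[r][c];
        return out;                                            // length 100
    }
}
```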

This technique of transferring a sequence made from items encoded on a symbolic level into a vector that is based on random contexts can of course be applied to any symbolic sequence.

For instance, it would be a drastic case of reductionism to conceive of the path taken by humans in an urban environment just as a sequence of locations. Humans are symbolic beings and the urban environment is full of symbols to which we respond. Yet, for the population-oriented perspective any individual path is just a possible path. Naturally, we interpret it as a random path. The path taken through a city needs to be described both by location and symbol.

The advantage of the SOM is that the random vectors that encode the symbolic aspect can be combined seamlessly with any other kind of information, e.g. the locational coordinates. That’s the property of multi-modality. Which particular combination of “properties” is then suitable to classify the paths for a given question is a subject for “standard” extended modeling as described in the chapter Technical Aspects of Modeling.

The Map of Categories (Word Classes)

From these random context vectors we can now build a SOM. Similar contexts will arrange in adjacent regions.

A particular text can now be described by its differential abundance across that SOM. Remember that we have sent the random contexts of many texts (or text snippets) to the SOM. To achieve such a description, a (relative) frequency histogram is calculated, which has as many classes as the SOM has nodes. The values of the histogram are the relative frequencies (“probabilities”) for the presence of a particular text in comparison to all other texts.
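
A minimal sketch of this histogram calculation; FirstLayerSom, nodeCount() and bestMatchingNode() are placeholder names assumed here for the trained first-layer map and its best-matching-node lookup.

```java
// Placeholder for the trained first-layer SOM; the names are assumptions.
interface FirstLayerSom {
    int nodeCount();
    int bestMatchingNode(double[] contextVector);   // index of the best-matching node
}

// The fingerprint of a text: a relative-frequency histogram over the nodes of
// the first-layer SOM, counting how often the text's contexts hit each node.
class TextFingerprint {
    static double[] of(java.util.List<double[]> contextVectors, FirstLayerSom som) {
        double[] histogram = new double[som.nodeCount()];
        for (double[] context : contextVectors)
            histogram[som.bestMatchingNode(context)] += 1.0;
        for (int i = 0; i < histogram.length; i++)
            histogram[i] /= contextVectors.size();   // normalize to relative frequencies
        return histogram;
    }
}
```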

Any particular text is now described by a fingerprint that contains highly relevant information about

  • – the context of all words as a probability measure;
  • – the relative topological density of similar contextual embeddings;
  • – the particularity of texts across all contextual descriptions, again as a probability measure;

Those fingerprints represent texts and they are ready-mades for the final step, “learning” the classes by the SOM on the second layer in order to identify groups of “similar” texts.

It is clear that this basic variant of the Two-Layer SOM procedure can be improved in multiple ways. Yet, the idea should be clear. Some of those improvements are

  • – to use a fully developed concept of context, e.g. this one, instead of a constant length context and a context without inner structure;
  • – evaluating not just the histogram as a foundation of the fingerprint of a text, but also the sequence of nodes according to the sequence of contexts; that sequence can be processed using a Markov-process method, such as HMM, Conditional Random Fields, or, in a self-similar approach, by applying the method of random contexts to the sequence of nodes;
  • – reflecting at least parts of the “syntactical” structure of the text, such as sentences, paragraphs, and sections, as well as the grammatical role of words;
  • – enriching the information about “words” by representing them not only in their observed form, but also by their close synonyms, or augmented with pointers to semantically related words, as can be taken from labeled corpora.

We want to briefly return to the first layer. Just imagine not measuring the histogram, but instead following the indices of the contexts across the developed map with your fingertips. A particular path, or virtual movement, appears. I think that it is crucial to reflect this virtual movement in the input data for the second layer.

The reward could be significant indeed. It offers nothing less than a model for conceptual slippage, a term which has been emphasized by Douglas Hofstadter throughout his research on analogical and creative thinking. Note that in our modified TL-SOM this capacity is not an “extra function” that had to be programmed. It is deeply built “into” the system, or in other words, it makes up its character. Besides Hofstadter’s proposal, which is based on a completely different approach and aimed at a different task, we do not know of any other system that would be capable of that. We even may expect that the efficient production of metaphors can be achieved by it, which is not an insignificant goal, since all practiced language is always metaphoric.

Associative Storage

We already mentioned that the method of the TL-SOM extracts important pieces of information about a text and represents them as a probabilistic measure. The SOM does not contain the whole piece of text as a single entity, or as a series of otherwise unconnected entities, the words. The SOM breaks the text up into overlapping pieces, or better, into overlapping probabilistic descriptions of such pieces.

It would be a serious misunderstanding to perceive this splitting into pieces as a drawback or failure. It is the mandatory prerequisite for building an associative storage.

Any further target-oriented modeling would refer to the two layers of a TL-SOM, but never to the raw input text. Thus it can work reasonably fast for a whole range of different tasks. One of those tasks that can be solved by a combination of associative storage and true (targeted) modeling is to find an optimized model for a given text, or any text snippet, including the identification of the discriminating features. We also can turn the perspective around, addressing a query to the SOM about an alternative formulation in a given context…

From Associative Storage towards Memory

Despite its power and its potential as associative storage, the Two-Layer SOM still can’t be conceived as a memory device. The associative storage just takes the probabilistically described contexts and sorts them topologically into the map. In order to establish “memory”, further components are required that provide the goal orientation.

Within the world of self-organizing maps, simple (!) memories are easy to establish. We just have to combine a SOM that acts as associative storage with a SOM for targeted modeling. The peculiar distinctive feature of that second SOM for modeling is that it does not work on external data, but on “data” as it is available in and as the SOM that acts as associative storage.

We may establish a vivid memory in its full meaning if we establish the following further components: (1) targeted modeling via the SOM principle, (2) a repository of the targeted models that have been built from (or using) the associative storage, and (3) at least a partial operationalization of a self-reflective mechanism, i.e. a modeling process that models the working of the TL-SOM itself. Since in our framework the basic SOM module is able to grow and to differentiate, there is no limitation in principle for such a system anymore, concerning its capability to build concepts, models, and (logical) habits for navigating between them. Later, we will call the “space” where this navigation takes place the “choreosteme”: drawing figures into the open space of epistemic conditionability.

From such a memory we may expect dramatic progress concerning the “intelligence” of machines. The only questionable thing is whether we should call such an entity still a machine. I guess, there is neither a word nor a concept for it.


Notes

1. Self-organizing maps have some amazing properties on the level of their interpretation, which they share especially with Markov models. As such, the SOM and Markov models are outstanding. Both the SOM and the Markov model can be conceived as devices that can be used to turn programming statements, i.e. all the IF-THEN-ELSE statements occurring in a program, into DATA. Even logic itself, or more precisely, any quasi-logic, gets transformed into data. SOM and Markov models are double-articulated (a Deleuzean notion) into logic on the one side and the empiric on the other.

In order to achieve this, full write access is necessary to the extensional as well as the intensional layer of a model. Hence, neither artificial neural networks (nor, of course, statistical methods like PCA) can be used to achieve the same effect.

2. It is quite important not to forget that (in our framework) information is nothing that “is out there.” If we follow the primacy of interpretation, for which there are good reasons, we also have to acknowledge that information is not a substantial entity that could be stored or processed. Information is nothing else than the actual characteristics of the process of interpretation. These characteristics can’t be detached from the underlying process, because this process is represented by the whole system.

3. Keep in mind that we only can talk about modeling in a reasonable manner if there is an operationalization of the purpose, i.e. if we perform target oriented modeling.

  • [1] Werner Heisenberg. Uncertainty Principle.
  • [2] Samuel Kaski, Timo Honkela, Krista Lagus, Teuvo Kohonen (1998). WEBSOM – Self-organizing maps of document collections. Neurocomputing 21 (1998) 101-117.
  • [3] W.B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability, volume 26 of Contemporary Mathematics, pages 189–206. Amer. Math. Soc., 1984.
  • [4] R. Hecht-Nielsen. Context vectors: general purpose approximate meaning representations self-organized from raw data. In J.M. Zurada, R.J. Marks II, and C.J. Robinson, editors, Computational Intelligence: Imitating Life, pages 43–56. IEEE Press, 1994.
  • [5] Papadimitriou, C. H., Raghavan, P., Tamaki, H., & Vempala, S. (1998). Latent semantic indexing: A probabilistic analysis. Proceedings of the Seventeenth ACM Symposium on the Principles of Database Systems (pp. 159-168). ACM press.
  • [6] Bingham, E., & Mannila, H. (2001). Random projection in dimensionality reduction: Applications to image and text data. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 245-250). ACM Press.

۞


FluidSOM (Software)

January 25, 2012

The FluidSOM is a modular component of a SOM population that is suitable to follow the “Growth & Differentiate” paradigm.

Self-Organizing Maps (SOM) are usually established on fixed grids, using a 4n or 6n topology. Implementations as swarms or gases are quite rare and are also burdened with particular problems. After all, we don’t have “swarms” or “gases” in our heads (at least most of us, for most of the time…). This remains true even if we consider only the informational part of the brain.

The fixed grid prohibits a “natural” growth or differentiation of the SOM layer. Actually, this impossibility to differentiate also renders structural learning impossible. If we consider “learning” as something that is different from the mere adjustment of already available internal parameters, then we could say that the inability to differentiate morphologically also means that there is no true learning at all.

These limitations, among others, are overcome by our FluidSOM. Instead of a fixed grid, we use a quasi-crystalline fluid of particles. This makes it very easy to add or remove, to merge or split “nodes”. The quasi-grid will always take a state of minimized tensions (at least after shaking it a bit …).


As said, the particles of the collection may move around “freely”; there is no grid to which they are bound a priori. Yet, the population will arrange into an almost hexagonal pattern… if certain conditions hold:

  • – The number of particles fits the dimensions of the available surface area.
  • – The particles are fully symmetric across the population regarding their properties.
  • – The parameters for mobility and repellent forces are suitably chosen.

Deviations from a perfect hexagonal arrangement are thus quite frequent. Sometimes hexagons enclose an empty position, or pentagons form instead of hexagons, frequently near the border or immediately after a change of the collection (adding/removing a particle). This, however, is not a drawback at all, especially not in the case of SOM layers that are relatively large (starting with N>~500). In really large layers comprising >100’000 nodes, the effect is negligible. The advantage of such symmetry breaks on the geometrical level, i.e. on the quasi-material level, is that they provide a starting point for a natural pathway of differentiation.
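
The underlying mechanism can be sketched as a simple iterated relaxation (Java); the class below is illustrative only and does not reproduce the actual RepulsionField implementation. The per-particle repellent factor defaults to 1.0; varying it individually yields the broken symmetry discussed further below.

```java
// Minimal sketch of one relaxation step in a repulsion field: each particle is
// pushed away from its close neighbours until the layout settles into a state
// of minimized tension.
class RepulsionSketch {
    static class Particle { double x, y; double repellent = 1.0; }

    static void relaxStep(Particle[] particles, double range, double stepSize) {
        for (Particle p : particles) {
            double fx = 0, fy = 0;
            for (Particle q : particles) {
                if (p == q) continue;
                double dx = p.x - q.x, dy = p.y - q.y;
                double dist = Math.sqrt(dx * dx + dy * dy);
                if (dist < range && dist > 1e-9) {
                    double f = (p.repellent * q.repellent) / (dist * dist); // inverse-square repulsion
                    fx += f * dx / dist;
                    fy += f * dy / dist;
                }
            }
            p.x += stepSize * fx;   // small step towards lower local tension
            p.y += stepSize * fy;
        }
    }
}
```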

There is yet another advantage: the fluid layer contains particles that are not necessarily identical to the nodes of the SOM, and the relations between nodes are likewise not bound to the hosting grid.

The RepulsionField class allows for a confined space or for a borderless topology (a torus), the second of which is often more suitable to run a SOM.

Given all the advantages, there is the question why fixed grids are so strongly preferred over fluid layouts. The answer is simple: it is not simple at all to implement the latter in a way that allows for a fast and constant query time for neighborhoods. If it took 100 ms to determine the neighborhood for a particular location in a large SOM layer, it would not be possible to run such a construct as a SOM at all: the waiting time would be prohibitive. Our Repulsion Field addresses this problem with buffering, such that it is almost as fast as the neighborhood query in fixed grids.
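
The buffering idea can be sketched as follows, reusing the Particle sketch from above; the actual RepulsionField presumably uses a more refined invalidation and lookup scheme.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of buffering neighborhood queries: the k nearest neighbours of
// a particle are computed once and cached; the cache is invalidated only when
// the collection changes (particles added, removed, or strongly displaced).
class NeighborhoodBuffer {
    private final Map<Integer, int[]> cache = new HashMap<>();

    int[] neighbours(int particleId, int k, RepulsionSketch.Particle[] ps) {
        return cache.computeIfAbsent(particleId, id -> kNearest(id, k, ps));
    }

    void invalidate() { cache.clear(); }   // call after growth, removal, or big moves

    private int[] kNearest(int id, int k, RepulsionSketch.Particle[] ps) {
        RepulsionSketch.Particle c = ps[id];
        Integer[] idx = new Integer[ps.length];
        for (int i = 0; i < ps.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(
                i -> (ps[i].x - c.x) * (ps[i].x - c.x) + (ps[i].y - c.y) * (ps[i].y - c.y)));
        int[] out = new int[Math.min(k, ps.length - 1)];
        for (int i = 0; i < out.length; i++) out[i] = idx[i + 1];  // skip the particle itself
        return out;
    }
}
```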

So far, only the RepulsionField class is available, but the completed FluidSOM should follow soon.

The Repulsion Field of the FluidSOM is available through the Google project hosting in noolabfluidsom.

The following four screenshot images show four different selection regimes for the dynamic hexagonal grid:

  • – single node selection, here as an arbitrary group
  • – minimal spanning tree on this disjoint set of nodes
  • – convex hull on the same set
  • – conventional patch selection as it occurs in the learning phase of a SOM

As I already said, those particles may move around such that the total energy of the field gets minimized. Splitting a node, as a metaphor for natural growth, leads to a different layout, yet in a very smooth manner.

Fig 1a-d: The Repulsion Field used in FluidSOM.
Four different modes of selection are demonstrated.

To summarize, the change to the fluidic architecture comprises

  • – possibility for a separation of physical particles and logical node components
  • – possibility for dynamic seamless growth or differentiation of the SOM lattice, including the mobility of the “particles” that act as node containers;

Besides that, FluidSOM offers a second major advance as compared to the common SOM concept. It concerns the concept of the nodes. In FluidSOM, nodes are active entities, equipped with partial autonomy. Nodes are not just passive data structures; they won’t get “updated” by a central mechanism. In salient contrast, they maintain certain states comprising activity and connectivity as well as their particular selection of a similarity function. Only in the beginning are all nodes equal with respect to those structural parameters. As a consequence of these properties, nodes in FluidSOM are able to outgrow (pullulate) new additional instances of FluidSOM as a kind of offspring.

These two advances remove many limitations of the common concept of the SOM (for more details see here).

There is one last small improvement to introduce. In the snapshots shown above you may detect some “defects,” often as either holes within a perfect hexagon, or sometimes also as a pentagon. But generally it looks quite regular. Yet, this regularity is again more similar to crystals than to living tissue. We should not take the irregularity of living tissue as a deficiency. In nature there are indeed highly regular morphological structures, e.g. in the retina of vertebrate eyes, or the faceted eyes of insects. In some parts (motor areas) of some brains (especially in birds) we can find quite regular structures. There is no reason to assume that evolutionary processes could not lead to regular cellular structures. Yet, we will never find “crystals” in any kind of brain, not even in insects.

Taking this as advice, we should introduce a random factor into the basic settings of the particles, such that the emerging pattern will no longer be regular. The repulsion principle will still lead to a locally stable configuration, though. Yet, strong re-arrangement flows are not excluded either. The following figure shows the resulting layout for a random variation (within certain limits) of the repellent force.

Figure 2: The Repulsion Field of FluidSOM, in which the particles are individually parameterized with regard to the repellent force. This leads to significant deviations from the hexagonal symmetry.

This broken symmetry is based on a local individuality with regard to the repellent force attached to each particle. Albeit this individuality is only local and of a rather weak character, together with the symmetry break it induces it is nevertheless important as a seed for differentiation. It is easy to imagine that the repellent forces are some (random) function of the content-related role of the nodes that are transported by the particles. For instance, large particles could decrease or increase this repellent force, leading to particular morphological correlates of the semantic activity of the nodes in a FluidSOM.

A further important property for determining the neighborhood of a particle is directionality. The RepulsionField supports this selection mode, too. It is, however, completely controlled on the level of the nodes. Hence we will discuss it there.

Here you may directly download a zip archive containing a runnable file demonstrating the repulsion field (sorry for the size (6 MB), it is not optimized for the web). Please note that you have to install Java first (on Windows). In any case, I recommend reading the file “readme.txt”, which explains the available commands.

GlueStarter (Software)

January 19, 2012

The GlueStarter is a small infrastructure component that allows for remote start and shutdown of Java jar files. A remote instance can send a command via a TCP socket to the GlueStarter. In this way, the remote instance can start a dedicated Java VM that is running the desired program. The GlueStarter itself may be started as auto-start during startup of the OS. Hence, one could conceive it also as a so-called daemon, yet one that is written in pure Java, which is a very convenient property.
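
To illustrate the idea (not the actual GlueStarter code), a minimal daemon of this kind could look like this; the port number and the command syntax ("start <jarfile>" / "stop <jarfile>") are invented for the sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a starter daemon: listen on a TCP port and start or stop
// dedicated Java VMs on request.
public class StarterDaemonSketch {
    public static void main(String[] args) throws IOException {
        Map<String, Process> running = new HashMap<>();
        try (ServerSocket server = new ServerSocket(9090)) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()))) {
                    String line = in.readLine();
                    if (line == null) continue;
                    String[] cmd = line.trim().split("\\s+", 2);
                    if (cmd.length == 2 && cmd[0].equals("start")) {
                        running.put(cmd[1], new ProcessBuilder("java", "-jar", cmd[1]).start());
                    } else if (cmd.length == 2 && cmd[0].equals("stop")) {
                        Process p = running.remove(cmd[1]);
                        if (p != null) p.destroy();      // shut down the dedicated VM
                    }
                }
            }
        }
    }
}
```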

The GlueStarter is a helper component for the NooLabGlue system. Both operate completely independently, of course. In the context of the probabilistic approach to (a population of) growing networks, the GlueStarter can be conceived as something like a “growth mechanism.”

The GlueStarter just executes the commands to create or shut down instances of the modules that are linked together via the NooLabGlue system. Thus, it does not contain any administration functionality… it really knows (almost) nothing about the modules or their state. The only information that it could make available to other modules is the number of modules that have been started by the GlueStarter.

GlueStarter is available through the Google project hosting for noolabglue, a more detailed description is here (coming soon).

NooLabGlue (Software)

November 11, 2011

NooLabGlue is a framework to link applications or parts of applications. There are of course already a lot of them, currently O’Reilly lists 185 different ones in their P2P directory (although they include also very low-level items such as XML-RPC).

NooLabGlue is different. The paradigm is aligned to natural neural systems. Thus, its primary emphasis is not on throughput; instead, NooLabGlue tries to support the associative, probabilistic, adaptive linking of a growing population of Self-Organizing Maps, where the individual SOMs tend to diverge regarding their “content.”

Naturally, NooLabGlue is a framework for massively modular neural systems, where parallelism may occur at any level. Such artificial neural systems may be realized as Self-Organizing Maps (SOM, Kohonen maps), or as more traditional ANNs. Yet, NooLabGlue also allows linking traditional components or applications in the EAI context, or its usage as a simple service bus. NooLabGlue transcends the client/server paradigm and replaces it by a source/receptor paradigm against a communicological background.

In this way, not only people behind their computing machines can be connected, but also more or less autonomous entities of software. Yet, NooLabGlue is not a middleware mainly designed to distribute a single, particular, well-identified learning task, or even to serve as an infrastructure for distributing the training of an “individual” SOM. These kinds of goals are much too narrow-minded. They just virtualize a Turing computation task to run on many machines instead of one. In contrast to that we are looking for a middleware that supports a growing population of probabilistically connected SOMs in the context of Non-Turing-Computation, which requires strikingly different architectural means for the middleware.

The level of integration follows a strikingly different paradigm as compared to other packages or approaches, from SOAP, web services and RMI up to systems like Akka. The contracting is not accomplished on the level of the field (which is the reason for the complexity in SOAP and WS), but on the level of document types, behaviors, and names. Related to that is the plan to incorporate the ability for learning into the MessageBoard.

NooLabGlue follows an explicit transactional approach, which is (with the exception of the transaction id) completely hidden on the level of the frontend API. Available transports are UDP, TCP, FTP, and restful HTTP (implemented using the Restlet framework), while everything is wrapped into XML. The complexity of those protocols is completely hidden on the level of the API. Supporting different transport protocols allows for speedy connections in a LAN, while at the same time the whole framework can also run over the WWW/WAN; actually, MessageBoards (the message servers) allow for cascading a message (even transactional messages) from local to remote segments of a network.

The API for participants (“clients” of the MessageBoard) is very simple; it just provides a factory method for creating an instance, as well as “dis/connect()”, “send()”, and a callback for receiving messages. Sending a (transactional) message is realized as a service, i.e. as a persistent contract to the “future” (or in short, as a future). Participants and even the MessageBoard can be shut down and restarted without losing a (fully transferred) message. Lost connections are re-established without cookies on the basis of (optionally) unique names.
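
The following usage sketch only illustrates this factory / connect / send / callback pattern; the interface and all names are invented here and do not claim to reproduce the actual NooLabGlue API.

```java
import java.util.function.Consumer;

// Hypothetical participant interface, made up for illustration only.
interface GlueParticipant {
    void connect();
    void disconnect();
    void send(String documentType, String xmlPayload);
    void onMessage(Consumer<String> callback);   // callback receives the xml of a message
}

public class ParticipantUsageSketch {
    public static void main(String[] args) {
        // factory method: obtain a participant bound to a MessageBoard address and a unique name
        GlueParticipant p = createParticipant("tcp://localhost:8822", "som-node-17");
        p.onMessage(xml -> System.out.println("received: " + xml));  // receptor role
        p.connect();
        p.send("som.intermediate.result", "<result>...</result>");   // source role, xml payload
        p.disconnect();
    }

    // stub factory so the sketch compiles; the real library would return a live client
    static GlueParticipant createParticipant(String boardAddress, String uniqueName) {
        return new GlueParticipant() {
            public void connect() {}
            public void disconnect() {}
            public void send(String t, String xml) {}
            public void onMessage(Consumer<String> cb) {}
        };
    }
}
```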

Both the participants and the MessageBoard run in a multi-threaded manner, of course, for all kinds of transport layers. The language of realization is Java, though any language could be used to connect to the MessageBoard. In principle, the MessageBoard could also be realized using PHP, for instance, since the server does not store or access any binary object.

NooLabGlue has been created in the context of advanced machine learning that proceeds far beyond the issue of the algorithm. The goal (within the next 6 weeks) is a “natural account” of understanding (e.g. language) based on the paradigm of machine-based epistemology.

The attached document glue messaging features v1.0 contains further details about the intended specification. It can be downloaded here.

How to Glue

November 9, 2011

Even software engineers know about their holy grail.

As descendants of the peak of modernism (I mean the historical phase between the “Wiener Kreis”, whose founders established positivism, and the formalization of cybernetics by Norbert Wiener), this holy grail is given, of course, by the trinity of immediacy, transparency and independence.

It sounds somewhat strange that programmers are chasing this holy trinity. All day long they are linking and fixing and defining, using finite state automata (paradoxically called “languages”). Yet, programming is also about abstraction, which means nothing less than to decouple “things” from “ideas,” or, more profanely, the material from the immaterial. Data are decoupled from programs, data are also decoupled from their representation (format), machines are decoupled, nowadays forming clouds (what a romantic label for distributed control); even inside a program, structures are decoupled from their actualization (into a programming language), objects are decoupled as much as possible, and so on.

These are probably some of the reasons upon which Donald Knuth coined the title “The Art of Computer Programming.” Again, it has of course nothing to do with the proclaimed subject, art in this case, not even metaphorically. Software engineers have suffered from the illness of romanticism since the inception of software. Software always has a well-identified purpose, simply put, by definition. By definition, software is not art. Yet, there is of course something about software engineering which can’t be defined formally (even though it is not art); maybe that’s why Knuth felt inclined towards this misleading comparison (it not only hides the “essentials” of art, but also those of formalization).

Due to the sheer size of the issue of abstraction, or even abstraction in programming, we have to refrain from discussing this trinitarian grail filled with modernist phantasms about evanescence. We will perhaps do so elsewhere. Yet, what we will introduce here is a new approach to linking different parts of a program, or different instances of some programs, where “program” means something like “executable” on a standard physical computer in 2011.

This approach will follow the constraint that the whole arrangement should be able to grow and to differentiate. In that we have to generalize and to transcend the principles as listed above. In a first, still coarse step we could say that we relate programs such that

  • – the linkage is an active medium; it ought to be meta-transparent and almost immediate;
  • – independence is indeed complete, thus causing the necessity of extrinsic higher-order symbolism (non-behavioristic behavior)
  • – the linking instance is able to actualize very different, but first of all randomized communicological models

We created a small piece of software that is able (not yet completely, of course) to represent these principles. Soon you will find it in the download area.

There are, of course, a lot of highly sophisticated software packages, both commercial and open source, dealing with the problem of linking programs, or users to programs. We discuss a sample of the more important ones here, along with the reasons why we did not take them as-is for our purpose of a (physically and abstractly) growing system. Of course, it was inspiring to check their specifications.

This brings us to the specification of our piece, which we call NooLabGlue:

  • – its main purpose is to link instances or parts of programs, irrespective of physical boundaries, such as networks or transport protocols (see OSI-stack);
  • – above all, the usage of its client-library API has to be as simple as possible for the programmer; actually, it has to be so simple that its instantiation can be formalized.
  • – its functionality of “glueing” should not be dependent on a particular programming language;
  • – in its implementation (on the level of the computation procedure [1]) it has to follow the paradigm of natural neural systems: randomization and associativity; this means that it also should be able to learn;
  • – any kind of “data” should be transferable, even programs, or modules thereof;

In order to fulfill these requirements, some basic practical decisions have to be made about the “architecture” and the tools to be employed:

  • – the communicative roles of a “glued” system are participants that are linking to a MessageBoard; participants have properties, e.g. regarding their type of activity (being source, receptor, or both), the type of data they are releasing or accepting, or as specified by more elaborate filters (concerning content), issued by the participants and hosted by the MessageBoard; there may even be “personal” relationships between participants, or between groups (1+:1+) of them;
  • – the exchange of messages is realized as a fully transactional system: participants as well as the MessageBoard (the message “server”) may shut down / restart at any time, and still “nothing” will be lost (nothing: none of the completely transferred messages);
  • – xml and only xml is transferred, no direct object linking like in ORB, RMI, or Akka; objects and binary stuff (like pdf, etc.) are taken as data elements inside the xml and they are transferred in encoded form (basically as a string of base64);
  • – contracting is NOT on the level of fields and data types, but instead on the level of document types and the respective behavioral level;
  • – MessageBoards are able to cascade if necessary, using different transport protocols or OSI layers at the same time, e.g. for relaying messages between local and remote resources;
  • – the “style” of running remote MessageBoards follows the restful approach (we use the Java Restlet framework); that means that there are no difficulties in connecting applications across firewalls; note that the restful resource approach is taken as a style on top of HTTP, as a replacement for a home-grown HTTP client; yet, the “semantics” will NOT be encoded into the URL as this would need cookies and similar stuff: the “semantics” remains fully within the xml (which could even be encrypted); in local networks, one may use transport through UDP (small messages) or TCP (any message size); neither the programmer nor the “vegetative” system needs to be aware of the actual format/type of transport protocol.

Now, imagine that there are several hundred instances of growing and pullulating Self-Organizing Maps around, linked by this kind of “infrastructure”… the SOM may even be part of the MessageBoard (at least concerning the adaptive routing of “messages”).

Such an infrastructure seems to be perfect for a system that is able to develop the capability for Non-Turing-Computation.

(download of a basic version will be available soon)

  • [1]

Where is the Limit?

October 20, 2011 § Leave a comment

What is the potential of “machines”?

As a favorable first way to approach this topic (it is the central topic of this blog), we consider the question about the limits of language, of minds, of brains, and of machines.

Nothing brand new, of course, to think about limits. Famous answers have been given throughout the last 2,500 years. Yet, those four words do not denote just things, so neither the words nor the related concepts are simple. Even worse, language becomes deceptive exactly in the moment when we start to try to be exact. Is there something like “the” mind? Or “the” language? Supposedly not. The reasoning for that we will meet later. How, then, to ask about the limits of language, minds and brains, if there is no other possibility than to use just that: language, mind(s), and brain(s)? And what if we consider limits towards “any” side, even those not yet determined? What are the lower limits for a language {brain, mind} such that we still can call it a language {mind, brain}? After a few seconds we are deep in philosophical territory. And, yes, we guess that reasonable answers to the introductory question could be found only together with philosophy. Any interest in the borders of the “machinic,” a generalization coined by Félix Guattari, is only partially a technical concern.

Another common understanding of limits takes them as the demarcation line between things we know and things we believe. In philosophical terms, it is the problem of foundations and truth: What can we know for sure? How can we know that something is true? We know quite well that this question (it is somehow the positive version of the perspective towards the limits) cannot be answered. A large area of disputes between various and even cross-bred sorts and flavors of idealism, realism, scientism, gnosticism and pragmatics blossomed upon this question, and all ultimately failed. In some way, science is as ridiculous (or not) as believing in ghosts (or god(s)), but what do they share? Perhaps that they neglect each other, or that they both neglect a critical attitude towards language? Just consider that scientists not only speak of “natural laws,” they do not merely believe in them, they take them (and the idea of them) as a fundamental and universally unquestionable truth. Or clergymen strive to proof (sic!) the existence (!) of God. Both styles are self-contradictory, simply and completely.

The question about limits can be put forward in a more fruitful manner. Instead of asking about the limits, we probably do better to ask for the conditions of our subject. This includes the condition of possibility as well as pointing to the potential for conditions. What are the conditions for speaking about logic, and for using it? Is it justified to speak about “justified beliefs”? What are the conditions for using concepts (grm.: Begriff), notions, apprehensions? And, to bring in ‘the’ central question of empiricism: How do words acquire meaning? (van Fraassen)

The fourth of our items, the machines, seems to stand a bit apart from the others, at first sight. Yet, the relationship between man and machines has been a philosophical and a cultural issue since the beginnings of written language. The age of enlightenment was full of machine phantasms; Descartes and Leibniz built philosophies around them. Today, we call computers “information processing machines”, which is a quite insightful and not quite innocent misnomer, as we will see: information cannot be processed by principle. Computers re-arrange symbols of a fixed code, which is used to encode information. This mapping is not complete, of course, a fact deliberately overlooked by many computer scientists.

Yet, what are the limits of those machines? Are those limits given by what we call a programming language (another misnomer)? How should we program a computer such that it becomes able to think, to understand language, to understand at all? Many approaches have been explored by computer engineers, and sometimes these approaches have been extremely silly in a twofold way: as such, and given the fact that they repeat rather primitive thoughts from the medieval ages.

The goal of making the machine intelligent cannot be accomplished by programming in principle. One of the more obvious reasons is that there is nothing like a meta-intelligence, no meta-knowledge, just as there is no private language (it was Wittgenstein who proved that). If an entity reacts in a predetermined way, it is not intelligent, nor does it understand anything: it performs look-ups in a database, nothing else. The conclusion seems to be that we have to program the indeterminate, which is obviously a contradictio in adjecto. Where then does the intelligence, the potential for understanding, come from? Nevertheless, we start with some kind of programming. But what is it that makes the machine intelligent? How to program the unprogrammable? Or, more fruitfully, which kind of conditions should we meet in the program such that we do not exclude the emergence of intelligence? The only thing which seems to be sure here is that neither the brain is a computer, nor the computer is a kind of brain. So, what we strive to do won’t be simulation either. It has more to do with a careful blend of rather abstract principles that are hosted in a flowing stream of information, actualized by (software) structures that inherently tend towards indeterminableness. It is (not quite) another question whether such software could still be called software.
