November 20, 2011

It is not just the relation between brain matter and thinking for which language seems to play a salient role. It is more appropriate to conceive of language as the most important ingredient for any aspect of the “relationability” of humans, our ability to build and maintain relations to any aspect of the world. Yet, it is not the only important one.

Quite obviously, it would be seriously presumptuous to try to develop a theory (just another one…) about language. So, we take an easy first step in declaring that we largely follow the Wittgensteinian view of language. Still, there are many aspects that we have to derive in other chapters, such as associativity, modeling, the status of symbols in the brain and in thinking (more generally: in epistemic affairs), the famous transition from the probabilistic to the propositional, the enigma of the relation between structure and content, and so on.

Here, our interest is driven by the difference between “(Natural Language) Processing” and “Natural (Language Processing),” where we would like to move from the former to the latter. From that we can derive two directions.

  • (1) With regard to the question of the possibility of machine-based epistemology, we have to understand the role of the symbolic in thinking from the perspective of language.
  • (2) We have to understand language and its role in thinking if we would like to “implement” the basic conditions for developing a capability to deal with (human) language.

One of the most puzzling issues about language—and I take these puzzles as something quite positive, if not even wonderful—is the rich set of observable phenotypes. It stretches from arrangements consisting of only a few dozen words up to several hundred thousand, from designs almost free of any (linguistic) structure to pieces that are almost completely regulated. There is a large variety in the relation between rules for arrangements and how things are said. And above all, the ways of handling, and also of introducing, indeterminacy (e.g. perlocutionary [1] or even “translocutionary” aspects) into the stream of utterances, signs and language tokens exhibit an extreme variety.

Yet, for all languages it holds that private languages are not possible. This is even true for programming languages, which are an extremely reduced version of a language. Wittgenstein’s private language argument reaches far beyond what could be called the necessity for commonality.

There are essentially two, indeed almost explosive, consequences. The first one concerns meaning: Meaning is not a mental entity [2]. Meaning does not appear just by some mental processes. Meaning can’t be the result of some computation. Hence, it is also impossible to “transfer” meaning from one entity to another.

It is therefore not quite rewarding to assume that the purpose of language is to transfer meaning unambiguously, as is common in linguistics or computer science. But one of the most serious instances of deep nonsense is the attempt to create a database, call it an ontology, and then claim that the meaning of words and concepts has been captured in it. This nonsense is not just a marginal issue, it is a dangerous phenomenon, because people start to believe in it (see the so-called semantic search engines like Wolfram Alpha, which is said to be based on formulas… I really could not imagine an uglier thought…).

Yet, it still makes sense to say that we know. The private language argument, however, enforces a radical change of our conception of knowledge. Given (or, for the moment: accepting) the fact that our thinking takes place in structures like our brain (or body as a whole), the question arises what is going on there. On the level of neurons there are no signs. Mentalese (Fodor’s hypothesis [3]), societies of mind, or minds inside the skull (Minsky [4]), and any other concept referring to some kind of homunculus are wrong; such things cannot and actually do not exist. Equivalently, it is impossible to “program” language understanding as we program, say, a spreadsheet software.

On the other hand we know that the brain is indeed a complex system, full of emergent phenomena. Further, if we say “I know,” we clearly refer to internal, private, and stateless processes taking place in our brain. We also have the experience that words appear in our thought, yet not in the same way as in an external text. They are always somehow fluffy, and in most speech acts it is more an “it is speaking” than an “I am speaking”. The latter is a diagnostic statement that is always a posteriori to the utterance.

From all of this circumstantial evidence we have to conclude (and it is the only possible conclusion) that (1) we have to translate our internal models (on the level of “mind matter”) into language, and (2) language itself is the tool for that translation.

So our first result can be summarized in the following two statements.

  • (1) Meaning is not a mental entity.
  • (2) Language is the tool and the result of a mental translation process.

Everyone knows the well-founded feeling (!) of not having found the “appropriate” word. Yet, we can’t talk about that in any more detail; it is a true language feeling. The translation of a set of internal models residing somewhere in the associative matter of the brain into language is deficient, always and necessarily so. The translation itself is some kind of modeling, of course.

This translation not only removes information, it also adds a lot. Translating something into a series of words means to create a text. Our internal relationships thus are quite similar to the hermeneutic setting. Foucault was completely right to investigate the “hermeneutics of the subject” [5]. Nevertheless it would be quite inappropriate then to return to the homunculus. Human thinking thus is not composed of some kind of author and an audience. Anyway, we can understand now that we relate to ourselves only ever in a modeling relation. If it were not wrong to say so, we could say that we have no direct access to ourselves. This “me” appears only after rather complicated modeling. There is no “me” instance somewhere in the brain or the mind or the body. It is impossible. Claiming it instead, one would give up all the insights up to and including the denial of the possibility of a private language. Equivalently, one would then claim that we are trivial, pre-programmed machines.

Returning to our two introductory questions from above we recognize that we have answered them both.

Our results are not only relevant for the realm of natural languages, they are also quite important for what is called “formal concept analysis.” In the same way we have to translate the almost fleshy part of our thoughts into language, we have to translate them into logics, or any other formalism. That is, logics and formalisms as we communicate them are epiphenomena. Analogous to Wittgenstein’s private language argument we thus can conclude that there is no private logics. It cannot be programmed, and there is absolutely no objectivity in it, just as little (or as much) as is the case for language. For Wittgenstein, logics was not a starting point (which would need arbitrary axioms), it was a consequence of “using and partaking in language.” [6,7] This dependence of logics on practical living introduces a distinction which is often overlooked. Logics is a purely transcendental entity, something that we have to “assume” like space and time. Of course, it does not make much sense to doubt these. But logics cannot be “found” in the world. It is without sense to babble about truth, or truth values. We cannot talk about truth in this world, just as we cannot talk about god being in this world. If people do so nevertheless, the usual consequence is murder, a million-fold (think of religious wars, ideological wars, or civil wars). What we instead do find in the world is a practical, negotiated, and negotiable instance of transcendental logics. It almost looks like (transcendental) logics, but is not. Any real-world logics, even in mathematics, is “contaminated” by usage, that is, by some kind of reference to rather arbitrary material contexts.

These results exert some rather non-trivial consequences for epistemology, of course. First, we have to understand that we cannot “transfer” knowledge on the symbolic level, neither as a sequence of symbols or signs, nor as a series of commands. The whole bag of rule-following in social contexts is implied. This can be related to the famous Grey Parrot “Alex,” who was “educated” by Irene Pepperberg [8].

Second, the role of the “me” in sentences like “I know” or “I am thinking” has to be re-evaluated. There might be some rare situations where this indeed happens, but in most situations it is not appropriate to assign too much weight to such statements. We actually are entitled to ask “Where does thinking take place?” since the answer is no longer “in the brain” or “in the mind”. The brain and its minds are hosting processes that are private, yes, but this privacy is that of a liver working in the belly of the body. It is not relevant for the issue of epistemology. If three (or 2, 4, 12, 25, …) humans are talking to each other, we can say, much as a programmer does, that the thoughts are running as a kind of distributed process. It does not make any sense to ask where in a self-organizing map the processing of a particular item took place, even if we can find that item at a definite (virtual) location in the SOM (or in brain matter) after processing. Yet, that definiteness appears only after the processing, if at all.

This triggers a further, quite different perspective. Quite obviously, thinking is a strongly deterritorialized phenomenon. It is almost free of sense to try to locate thinking. It is not only useless, it is even wrong to claim that thinking takes place in the brain. The whole area of the so-called analytic philosophy of mind (e.g. Ansgar Beckermann [9]) is devoid of sense.

This deterritorialization of thought links us directly to Deleuze [10], his concept of “images of thought” [11], the differential, and his notions of immanence and virtuality. All these insights would not be applicable if we dropped, even implicitly or as a matter of fact, the impossibility of private languages. Saying so, it is also clear that we do not agree with the claim that language is a transcendental entity (e.g. H.J. Schneider [12]); language is a very worldly phenomenon, although it triggers the issues of immateriality, immanence, and virtuality. Deleuze and Guattari provided us with a smart and beautiful entry point into this field with their book “What is Philosophy?” [13] A nice account of a “Deleuzian” interpretation of the “brain” has recently been given by Lambert and Flaxman [14]. After all, I find it also almost beautiful that there is a deep link between the philosophical stances of Deleuze and Wittgenstein, despite the fact that Deleuze once called Wittgenstein’s philosophy a “catastrophe.”

Conceiving of thinking and language as deterritorialized phenomena is equivalent to the insight that there cannot be a single model about them. In much the same way it is not possible to “isolate” brains. There is no such thing as a single brain [15], and not only in the case of humans. This applies to animals and machines as well. It is also not possible to isolate a single sentence in order to “analyze” it. Trying to do so is simply silly. If at all, a sentence in language is a phenomenon where the virtual is captured by the immanent. “As such,” a sentence is pure void.

Here is the perfect place to recall Stanisław Lem and his famous novels and short stories. In one of the pieces, “Personetics” [16], he shows that simulated beings need to be simulated in a group in order to develop and evolve a language of their own.

We cannot program language, language understanding, or symbolic representations of texts, and then call it knowledge, by following some utopian specification. Yet, it is a fact that we as humans also think by means of our body, i.e. the body is a necessary condition for it. The basic models spring out from the associativity of our material arrangement: you may call it brain, nervous system, central processing unit, self-organizing map, etc. Of course, we should not commit the mistake of representationalism here, asking about the “wiring” or data structures. The processing of information in the brain is first a probabilistic affair, the capability to deal with propositional structures being only a consequence, among other factors, also of language.

The aspect of corporeality is not limited to the biological matter that we usually call “body”. Our biological body is just one particular means to provide self-sustaining complexity and persistence. Even in the most abstract regions we find habits that could be said to form a “body of thought,” blurring the categorial difference between the material and the immaterial.

Thus, before starting to program the processing of “natural languages” we have to answer two rather important questions, which will help us avoid the main obstacles.

What then are the abstract conditions for being a language being?
How to arrange the proposal of such conditions?

Before we head over to the chapter about the penultimate conditions of language, thinking and knowing (LTK), we may collect our achievements (also from some other chapters) here in a short list:

  • – LTK is a deterritorialized phenomenon;
  • – LTK is based on the dialectics induced and established by the contrast between a community and an individual;
  • – epistemologically, the only way to establish contact with the world as well as with the brain (both are complex entities) is through modeling;
  • – as an activity, LTK implies virtuality;
  • – as a phenomenon, LTK implies planes of immanence;
  • – as a performance, LTK implies impredicativity, vagueness and processuality that result in a metaphorological structure;
  • – models are used as a source of equivalence classes (but see our objections against the set-theoretic approach), yet this association is fluid and cannot be determined in a unique, singular, or stable manner;
  • – models can neither contain the conditions of their applicability or application, nor of their symbolic or referential setup.

It seems that the conditions for the triad of language, thinking and knowledge cannot be formulated in a positive definite manner. It is not possible in principle to formulate “the conditions for language are such and such.” This would again introduce a territorializing stance, which in turn would introduce a self-contradiction into our thought. Pinker got trapped by this misunderstanding throughout his book about the proclaimed language instinct [17], actually even invoking some sort of “language cybernetics” (e.g. p.303, 319). In the chart reproduced from his book [17], the circles are indeed meant to represent neurons, or identifiable groups of neurons. One can see the excitatory and inhibitory relations that refer to a cybernetic mindset. (Pinker just forgot to draw the homunculus itself…)

This in turn lets us conclude that it is not even possible to give a positive definition of language, thinking or knowledge itself. We strongly believe that even modest materialism is dead: neither political materialism nor scientific positivism has been able to keep its central promise of providing “stable grounds.” Quite likely, “stable grounds”, or even “foundations”, is an inappropriate goal in itself [18].

Actually, from a different perspective, the insight that there is a “clear barrier to analyzing knowledge” [19] has received more and more support since Gettier posed his problem [20], according to which knowledge can’t be conceived as justified true belief. The problem in the view that Gettier attacked is precisely equivalent to the claim that there is a private language. We may build a series of related claims: no private language, no positivism in epistemology, no criteria-based justification. Hence, no quarrels about values.

Programming the capability, and hence the conditions, for “language understanding” is probably easier than it seems at first sight. Reluctance regarding explicit control is likely a preferable strategy. Actually, there is almost nothing to “program” besides an appropriately growing system of growing self-organizing maps (for our notion of “growth” see this or this). As we have seen, the conditions for “languagability” lie outside of “processing” language.

  • [1] John Searle
  • [2] Wilhelm Vossenkuhl
  • [3] Jerry Fodor
  • [4] Marvin Minsky
  • [5] Michel Foucault
  • [6] Colin Johnston, Tractarian objects and logical categories. Synthese (2009) 167: 145-161. available here in an annotated version
  • [7] Williams
  • [8] Irene Pepperberg
  • [9] Ansgar Beckermann
  • [10] Gilles Deleuze, Félix Guattari, Mille Plateaux. 1980.
  • [11] Gilles Deleuze, Difference and Repetition.
  • [12] Hans-Jörg Schneider
  • [13] Gilles Deleuze, Félix Guattari, What is Philosophy?
  • [14] Gregg Lambert, Gregory Flaxman (2000), Five Propositions on the Brain. Journal of Neuro-Aesthetic Theory #2 (2000-02), available online.
  • [15] Peter Sloterdijk. Sphären II. p.4.
  • [16] Stanisław Lem, Personetics. reprinted in: Douglas Hofstadter, Daniel Dennett (eds.), The Mind’s I.
  • [17] Steven Pinker, The Language Instinct. 1995.
  • [18] Wilfrid Sellars, Does Empirical Knowledge have a Foundation? in: H. Feigl and M. Scriven (eds.), The Foundations of Science and the Concepts of Psychology and Psychoanalysis. Minnesota Studies in the Philosophy of Science, vol. I. University of Minnesota Press, Minneapolis 1956. pp. 293-300. available online.
  • [19] John L. Pollock, Joseph Cruz, Contemporary theories of knowledge. Rowman & Littlefield Publishers, Lanham 1999.  pp. 13–14.
  • [20] Edmund Gettier (1963), Is Justified True Belief Knowledge?, Analysis 23: 121-123.




November 19, 2011

Without context, there is nothing.

Without context, everything would be a singularized item without relations. There wouldn’t be any facts or events; there would be no information or understanding. The context provides the very basic embedding for events, the background for figures, and also hidden influences on the visible. Context could be the content side of the inevitable mediality of being. Thus, context appears as an ontological term.

Yet, context is as little an ontological concept as any other concept. It is a matter of beliefs, cognitive capacity and convention where one draws the border between figure and ground, or even a manifold of borders. There is no necessity in setting a particular border, even if we admit that natural objects may form material compartments without any “cognitive” activity. Additionally, a context not only lacks borders altogether, much like borderless sets in topology; context is also a deeply probabilistic concept. In an important sense, contexts can be defined as positively definite entities only to some extent. The constraint, as a way to express the context ex negativo, is an important part of the concept of context. Yet, even the constraints have to be conceived as probabilistic actualizations, as their particular actualization could depend on the “local” history or situation.

After all, the concept of context shares a lot with texts and writing, or, even more appropriately, with stories and narrating. As part of a text, the context becomes subject to the same issues as the text itself. We may find grammaticality, the implied issue of acting as in speech act theory, style and rhetoric, and a runaway interpretive vortex, as in Borges, or any poem. We have to consider this when we choose the tools for modeling and comparing texts.

The neighborhood of texts and contexts points to the important issue of the series, and hence of time and memory. Practically spoken, in order for something to possibly serve as part of a context, synchronicity of signs (not: signals!) has to be established. The degree of mutual influence as well as the salience of signs is neither defined nor even definable a priori. It is the interpretation itself (understood as a streaming process) that eventually forms groups of signs, figures and background by similarity considerations. Before the actual interpretation, but still from the perspective of the interpreting entity, a context is defined only in probabilistic terms. Within the process of interpretation, now taking the position inside that process itself, the separation of signals into different signs, as well as the separation of signs into different groups, figures or background, necessarily needs other “signs” as operable and labeled compounds of rules and criteria. Such “compound” entities are simply (abstract) models, brought in as types.

This result is quite important. In the definition of the concept of context it allows us to refer to signs without committing the symbolic fallacy, if the signs are grounded as operable models outside of the code of the software itself. Fortunately, self-organizing maps (SOM) are able to provide exactly this required quality.

The result also provides hints to issues in a quite different area: the understanding of images. It seems that images cannot be “understood” without the use of signs, where those signs have been acquired outside of the actual process of interpreting the pixel information of an image (of course, that interpretation is not limited to descriptions on the level of pixels, despite the fact that any image understanding has to start there).

In the context of our interests here, focusing on machine-based epistemology, the concept of context is important with regard to several aspects. Most generally spoken, any interpretation of data requires a context. Of course, we should neither try to exactly determine the way of dealing with context, nor even define the criteria for a particular context. In doing so, we would commit the symbolic fallacy. Any so-called ontology in computer science is a direct consequence of falling victim to this fallacy.

Formalizing the concept of context does not (and cannot) make any proposals about how a context has been formed or established. The formalization of context is a derived, symbolic, hence compressed view of the results of context formation. Since such a description of a context can itself be exported, the context exerts normative power. This normative power can be used, for example, to introduce a signal horizon in a population of self-organizing maps (SOMs): not every SOM instance can get every message from another such instance if contexts are used for organizing messaging between SOM instances. From a slightly shifted perspective we could also say that contexts provide the possibility to define rules that organize affectability.
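Such a signal horizon can be sketched in a few lines. The names (SignalHorizon, SomInstance) and the tag-based matching are purely illustrative assumptions of this sketch, not part of any SOM library:

```python
class SignalHorizon:
    """A context reduced, for illustration, to a set of accepted tags."""
    def __init__(self, accepted_tags):
        self.accepted_tags = set(accepted_tags)

    def admits(self, message_tags):
        # a message is visible only if it shares at least one tag
        # with the receiving context
        return bool(self.accepted_tags & set(message_tags))

class SomInstance:
    def __init__(self, name, horizon):
        self.name = name
        self.horizon = horizon
        self.inbox = []

    def receive(self, message_tags, payload):
        # the context acts as a filter: not every SOM instance
        # can get every message
        if self.horizon.admits(message_tags):
            self.inbox.append(payload)

# two SOM instances with different signal horizons
a = SomInstance("A", SignalHorizon({"color", "shape"}))
b = SomInstance("B", SignalHorizon({"sound"}))
for som in (a, b):
    som.receive({"color"}, "cluster update")
print(len(a.inbox), len(b.inbox))  # prints: 1 0
```

The point of the sketch is only the asymmetry of affectability: the same message is visible to one instance and invisible to the other, purely as a consequence of the receiving context.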

In order to use that possibility without committing the symbolic fallacy we need a formalization on an abstract level. Whatever framework we use to organize single items—we may choose from set theory, topology or category theory— we also have to refer to probability theory.

A small Example

Before we start to introduce the formalization of context, we would like to provide a small example.

Sometimes, and actually more often than not, a context is considered to embed something. Let us call this item z. The embedding of z together with z then constitutes a context 𝒵, of which z is a part. Let us call the embedding E; then we could write:

𝒵 = {z, E}

Intuitively, however, we won’t allow just any embedding. There might be another item p, or more generally p out of a set P, that prohibits considering {z, E} as 𝒵.

So we get

𝒵 ≠ {z, E, P}

or, equivalently,

𝒵 = {z, E, ¬P}

Again intuitively, we could think of items that would not prohibit the establishment of a context as a certain embedding, but if there are too many of them, we would stop considering the embedding as a particular context. Similarly, we can operationalize the figure-ground phenomenon by restricting the length of the embedding that would still be considered as 𝒵. Other constraints could come as mandatory or probabilistic rules addressing the order of the items. Finally, we could consider a certain arrangement of items as a context even without a certain mandatory element z.
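The intuition up to this point can be sketched as a tiny predicate; the function name, the prohibited set and the length threshold are illustrative assumptions:

```python
def is_context(z, embedding, prohibited=frozenset(), max_len=10):
    """Decide whether the embedding E of z still counts as the context Z.

    z          -- the mandatory item
    embedding  -- the surrounding items E
    prohibited -- the set P whose presence destroys the context
    max_len    -- figure/ground constraint on the size of the embedding
    """
    items = set(embedding) | {z}
    if items & set(prohibited):
        return False              # Z != {z, E, P}: a prohibiting item appears
    return len(items) <= max_len  # too large: background would become figure

print(is_context("z", ["a", "b"], prohibited={"p"}))    # True
print(is_context("z", ["a", "p"], prohibited={"p"}))    # False
print(is_context("z", list("abcdefghij"), max_len=5))   # False
```

Dropping the mandatory z, or turning the hard thresholds into probabilistic ones, yields the generalizations discussed in the next section.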

These intuitions can now be generalized and written down in a more formal way, e.g. to guide an implementation, or as we will see below, to compare it to other important formal ideas.

Components by Intuition

A context consists of four different kinds of sets, the threshold values associated with them, and order relations between pairs of items of those sets. Not all of the components need to be present at the same time, of course. As we have seen, we may even drop the requirement of a particular mandatory item.

The sets are

  • – mandatory items
  • – normal items
  • – facultative items
  • – stopping items

Context, formalized

In the formal definition we do not follow the distinction of different sets as guided by intuition. A proper generalization moves the variability into mappings, i.e. functions. We then need two different kinds of mappings. The first one is the actualization function, which reflects the relation between the presence of items and the assignment of a context. In some respects, we could also call it a “completeness function.” The second mapping describes order relations.

Thus, we propose to start with three elements for a definition of the generalized context. On the topmost level we may say that a context is a collection of items, accompanied by two functions that establish the context by a combination of implying a certain order and demanding a particular completeness.

So, starting with the top level, we introduce the context 𝒞 as the 3-tuple

  • 𝒞 = { Ci, A, R }

where Ci is the collection of items, A denotes the actualization function, and R is a function that establishes certain relations between the items c of Ci. The items c need not be items in the sense of set theory. If a more general scope needs to be addressed, items could also be conceived as generic items, e.g. representing categories.
𝒞 itself may be used as a simple acceptance mapping

  • 𝒞: F ↦ {0,1}

or as a scalar

  • 𝒞: F ↦ { x | 0 ≤ x ≤ 1 }

In the second form we may use our context as the basis for a similarity measure!

The items c of the collection Ci have a weight property. The weight of an item is simply a degree of expectability. We call it w.

The actualization (completeness) function A describes the effect of three operations that could be applied to the collection Ci. All of those operations can be represented by thresholds.

Items c could be either

  • (i) removed,
  • (ii) non-C-items could be inserted to (or appear in) a particular observation,
  • (iii) C-items could be repeated, affecting the actual size of an observation.
  • A(1): The first case is a deterioration of the “content” of the context. This operation is modulated by the weight w of the items c. We may express this aspect as a degree of internal completeness over the collection Ci. We call it pi.
  • A(2): The second case represents a “thinning” or dilution. This affects the density of the occurrence of the items c within a given observation. We call it px.
  • A(3): The third operation of repeating items c of Ci affects the size of the observation. A context is a context only if there is some other thing than the context. Rather trivially, if the background—by definition the context—becomes figure—by definition not the context—, it is not a context any more. We may denote it simply by the symbol l. l could be given as a maximum length, or as a ratio invoking the size of C.
  • A(4): The contrast function 𝒦, describing the differential aspect of the item sets (of the same type) between two patterns, defined as
    𝒦(X,Y) = F(X ∩ Y, α(X−Y), β(Y−X)), α, β ≥ 0,
    with the possible instantiation as a ratio model
    K(A,B) = f(A ∩ B) / ( f(A ∩ B) + αf(A−B) + βf(B−A) )

The last aspect of a context we have to consider is the relation R between items c. These relations are described by two functions: the neighborhood function S and the dependency function D.

  • R(1): The set of all neighborhood functions S upon items c results in a partial and probabilistic serial order. One might think, for instance, about a context with items (v,w,x,y), where S determines a partial order such that the context gets established only if v follows x.
  • R(2): The dependency function D(ck) imposes a constraint on pi, since it demands the actual co-occurrence of the items ck given as its arguments.

Any formalism to express the serial order of symbolic items is allowed here, whether an explicit formalism like a grammar or a finite-state automaton, or an implicit formalism like a probabilistic associative structure (ANN or SOM) accepting only particular patterns. Imposing a serial order also means introducing asymmetries regarding the elements.
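As a minimal sketch of how such an order relation could be checked operationally (the function and the pair notation are illustrative assumptions, not a fixed formalism):

```python
def satisfies_order(observation, must_follow):
    """Check partial-order constraints on an observed sequence.

    must_follow -- pairs (a, b): the context is established only if
                   a occurs somewhere after b in the observation.
    """
    pos = {item: i for i, item in enumerate(observation)}
    for a, b in must_follow:
        if a not in pos or b not in pos or pos[a] <= pos[b]:
            return False
    return True

# the example from the text: the context (v, w, x, y) is established
# only if v follows x
print(satisfies_order(["x", "w", "v", "y"], [("v", "x")]))  # True
print(satisfies_order(["v", "w", "x", "y"], [("v", "x")]))  # False
```

A probabilistic version would return a weight instead of a hard boolean, in line with the probabilistic reading of contexts above.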

So we summarize our definition of the concept of context:

  • 𝒞 = { Ci, A, R } eq.1

where the individual terms unfold to:

  • Ci = { c (w) } eq.2, “sets, items & weights”
  • A = f( pi, px, l, K) eq.3, “actualization”
  • R = S ∩ D  eq.4, “relations”

This formal definition of the concept of context is situated on a very general level. Most importantly, we can use it to represent contextual structures without defining the content or the actualization of a particular instance of the concept at implementation time. Decisions about passing or accepting messages have been lifted to the operable, hence symbolic, level. In terms of software architecture we can say, much as is the case for the SOM, that conditions are turned into data. In category theory we meet a similar shift of perspective, as the representability of a transformation (depictable by the “categorial triangle”) is turned into a symbol.

The items forming a context need not be measurable on the elementary symbolic level, i.e. the items need not form an alphabet 𝒜. We could think of pixels in image processing, for instance, or, more generally, any object that could be compared along a simple qualitative dimension (which could be the result of a binary mapping, of course). Yet, in the end a fixation of the measurement of the respective entity has to result in at least one alphabet, even if the items are abstract entities like categories in the mathematical sense. In turn, whenever one invokes the concept of context, this also implies some arbitrary mode of discretization of the measured “numerical” signal. Without letters, i.e. quasi-material symbols, there is no context. Without context, we would not need “letters.”

In the scientific literature, especially about thesauri, you may find similar attempts to formalize the notion of context. We have been inspired by those, of course. Yet, here we introduced it for a different purpose… and in a different context. Given the simple formalization above, we now can implement it.
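A minimal sketch of such an implementation could look as follows. The class models the 3-tuple 𝒞 = { Ci, A, R } from eq.1; how the partial scores pi and px are combined (here: a simple product) is an assumption of this sketch, not part of the definition:

```python
class Context:
    """Sketch of the context 3-tuple C = {Ci, A, R} defined above."""

    def __init__(self, items, max_len=None, must_follow=()):
        self.weights = dict(items)      # Ci: items with weights (expectability)
        self.max_len = max_len          # l: figure/ground size constraint
        self.must_follow = must_follow  # R: order relations, pairs (a, b)

    def score(self, observation):
        """Actualization A as a scalar in [0, 1]."""
        obs = list(observation)
        # R: order relations must hold, otherwise no context at all
        pos = {item: i for i, item in enumerate(obs)}
        for a, b in self.must_follow:
            if a not in pos or b not in pos or pos[a] <= pos[b]:
                return 0.0
        # l: if the background grows into a figure, it is no context any more
        if self.max_len is not None and len(obs) > self.max_len:
            return 0.0
        total = sum(self.weights.values())
        present = sum(w for c, w in self.weights.items() if c in obs)
        pi = present / total if total else 1.0  # internal completeness
        hits = sum(1 for c in obs if c in self.weights)
        px = hits / len(obs) if obs else 0.0    # density ("dilution")
        return pi * px

ctx = Context({"x": 1.0, "v": 1.0, "w": 0.5}, max_len=6,
              must_follow=[("v", "x")])
print(ctx.score(["x", "w", "v"]))  # complete, dense, correctly ordered: 1.0
print(ctx.score(["v", "w", "x"]))  # order violated: 0.0
```

Since score() returns a scalar, it can be used directly as the basis for a similarity measure, as noted above; a thresholded version yields the acceptance mapping into {0,1}.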

Random Contexts, Random Graphs

There is a particular class of contexts we would like to mention here briefly, because it is essential for a certain class of self-organizing maps that has been employed in the so-called WebSom project. This class of SOMs could be described as two-layered abstracting SOMs. For brevity, let us call them 2A-SOM here.

2A-SOMs are used for the classification of texts with considerable success. The basic idea is to conceive of texts as a semi-ordered set of probabilistic contexts. The 2A-SOM employs random contexts, which are closely related to random graphs.

A particular random context is centered around a selected word that occurs several times in a text (or a corpus of texts). The idea is quite simple. Each word in a text gets assigned a fingerprint vector, consisting of random values from [0..1], typically with a minimal length of 80..100 positions. To build a random context one takes all occurrences of the targeted word. The length of the random context, say L(rc), is set as an odd number, i.e. L(rc) = 2*n+1, where the targeted word is always put at the center position; “n” then describes the number of preceding/succeeding positions for this word. The random context is then simply the superposition of all fingerprint vectors in the neighborhood of the targeted word. So it should be clear that a random context describes all neighborhoods of a text (or a part of it) in a single set of values.
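The construction just described can be sketched in a few lines; the vector length, the window size n and the toy text are illustrative choices:

```python
import random

def fingerprints(vocabulary, dim=100, seed=0):
    """Assign each word a fixed random fingerprint vector from [0..1]."""
    rng = random.Random(seed)
    return {w: [rng.random() for _ in range(dim)] for w in vocabulary}

def random_context(text, target, fp, n=2):
    """Superpose the fingerprints of all words in the +/- n neighborhood
    of every occurrence of `target` (L(rc) = 2*n+1, target at the center)."""
    dim = len(next(iter(fp.values())))
    acc = [0.0] * dim
    for i, w in enumerate(text):
        if w != target:
            continue
        lo, hi = max(0, i - n), min(len(text), i + n + 1)
        for j in range(lo, hi):
            if j != i:  # the target itself sits at the center position
                acc = [a + b for a, b in zip(acc, fp[text[j]])]
    return acc

text = "the cat sat on the mat the cat ran".split()
fp = fingerprints(set(text))
rc = random_context(text, "cat", fp)
print(len(rc))  # one vector of fingerprint length, covering all neighborhoods
```

The resulting vector is what gets fed into the first layer of the 2A-SOM; all neighborhoods of the target word are compressed into this single set of values.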

With respect to our general notion of context, there are some obvious differences in the random context as used in the 2A-SOM:

  • constant length;
  • assumption of zero knowledge: neither excluding items nor order relations can be represented.

An intermediate position between the two concepts would introduce a separate weighting function W: (0,1) ↦ {0,1}, which could be used to change the contribution of a particular context to the random context.

The concept of context as defined here is a powerful structure that even provides the possibility of a translation into a probabilistic phrase structure grammar or, equivalently, into a Hidden Markov Model (HMM).

Similarity and Feature Vectors

Generalized feature vectors are an important concept in predictive modeling, especially for the task of calculating a scalar that represents a particular similarity measure. They comprise both (1) the standard vector, which basically is a row extracted from a table containing observational data about cases (observations), and (2) the feature set, which may differ between observations. Here, we are interested in this second aspect.

Usually, the difference between the feature sets taken from two different observations is evaluated under the assumption that all features are equally important. It is obvious that this is not appropriate in many cases. One possibility to replace the naive approach that treats all items in the same way is the concept of context as developed here. Instead of simple sets without structure, it is possible to use weights and order relations, both as dynamic parameters that may be adjusted during modeling. In effect, the operationalization of similarity can be changed while searching for the set of appropriate models.
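
As a minimal sketch of this idea (the function and the weights are hypothetical, and order relations are omitted for brevity), a weighted overlap measure already breaks the equal-importance assumption:

```python
def weighted_overlap(features_a, features_b, weights=None):
    """Similarity of two feature sets whose items need not be equally
    important; unweighted sets are the special case of weight 1.0."""
    weights = weights or {}
    w = lambda f: weights.get(f, 1.0)
    shared = features_a & features_b
    union = features_a | features_b
    return sum(w(f) for f in shared) / sum(w(f) for f in union)

a = {"red", "round", "small"}
b = {"red", "round", "large"}
print(weighted_overlap(a, b))                # 0.5  (all features equal)
print(weighted_overlap(a, b, {"red": 5.0}))  # 0.75 ("red" dominates)
```

Adjusting the weights during modeling changes the operationalization of similarity without touching the data.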

Concerning the notion of similarity, our concept of context shares important ideas with the concept proposed by Tversky [1], for instance the notion of asymmetry. Tversky’s approach is, however, much more limited than ours.
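
For reference, Tversky’s ratio model can be sketched in a few lines; the feature sets and the parameter values are arbitrary choices of ours, and the asymmetry appears as soon as the two parameters differ:

```python
def tversky(a, b, alpha=0.8, beta=0.2):
    """Tversky's feature-set similarity (ratio form); asymmetric when alpha != beta."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    return common / (common + alpha * only_a + beta * only_b)

x = {"wings", "beak", "feathers", "small"}
y = {"wings", "beak", "feathers", "large", "migratory"}
print(tversky(x, y), tversky(y, x))  # asymmetric: s(x, y) != s(y, x)
```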

Modeling and Implementation

Random contexts as well as structured probabilistic contexts as defined above provide a quite suitable tool for the probabilization of the input to a learning SOM. We have already reasoned in the chapter about representation that such probabilization is not only mandatory, it is inevitable: words can’t be presented (to the brain, the mind or a SOM) as singularized “words”: they need context, the more the better, as philosophical theories about meaning or about media suggest. The notion of context (in the way defined above) is also a practicable means to overcome the positivistic separation of syntax, semantics and pragmatics, as it has been introduced by Morris [2]. Robert Brandom, in his inferentialist philosophy labeled “expressive reason,” denies such a distinction, which actually is not surprising: his work starts with the primacy of interpretation, just as we do [3].

It is clear that any representation of a text (or an image) should always start as a context according to our definition. Only then can a natural differentiation take place, from the symmetric treatment of items to their differentiated treatment.

A coherent object that consists of many parts, such as a text or an image, can be described as a probabilistic “network” of overlapping (random) contexts. Random contexts need to be used if no further information is available. Yet, even in the case of a first mapping of a complicated structure, there is more information available than “no information.” Any further development of a representation beyond the zero-knowledge approach will lead to the context as we have defined it above.

Generalized contexts may well serve as a feasible candidate for unifying different approaches to probabilistic representation (random graphs/contexts) as well as operationalizations of similarity measures. Tversky’s feature-set-based similarity function(al) as well as feature-vector-based measures are just particular instances of our context. In other words, probabilistic representation, similarity and context can be handled using the same formal representation, the difference being just one of perspective (and algorithmic embedding). This is a significant result not only for the practice of machine-based epistemology, but also for philosophical questions around vagueness, similarity and modeling.

This article was first published 19/11/2011, last revision is from 30/12/2011

  • [1] Amos Tversky (1977), “Features of Similarity.” Psychological Review, Vol. 84, No. 4. Available online.
  • [2] Charles Morris, Foundations of the Theory of Signs (1938).
  • [3] Robert Brandom, Making It Explicit, chp. 8.6.2.



November 17, 2011 § Leave a comment

Given the ubiquity of models

in our contemporary world, it may seem astonishing that only approx. 100 years ago the concept of model was nearly completely absent from everyday language [1]. Etymology tells us that “model” derives from the Latin modulus, meaning “measure, standard,” from where it found its way into modern language through architecture [2]. Apparently, the scientism of the 19th century as well as mass production (and mass culture) also paved the way for its spread. The usage of models unfolded between positrons (Dirac) and cars (Model T), between strategy games and fashion. Surely there is an important relationship between models and the medial intensification throughout the 20th century.

Model is a term almost as iridescent as models themselves are. The concept of model appears as soon as we enter any context of production, hence of communication about future entities. It denotes a kind of more or less accurate template, often used in relation to accessible instances of it, whether this refers to physical or conceptual accessibility. Models can themselves be established, or taken, as dedicated entities in order to support exchange and communication about consequences. As such, models act as templates and almost as original images, as well as a basis for simulations and what-if analyses. Models must not be conceived just as the result of dealing with data; there are also models that are created without data, on the basis of a system of symmetry relations (call it “idea”), for instance, or as a result of making analogies, as we will see in the chapter(s) about analogical thinking. Modeling is a key aspect of epistemology, although it is recognized as such only in some peripheral areas of the philosophy of science so far. If we conceive of it in a sufficiently abstract way, we might find almost nothing else than modeling, as far as the activities of the individual are concerned. (We never shall forget about the communal part of cognition, of course.)

We are convinced that there is no “direct” access, and not only with respect to the empirical world: there is not even a thing which we could separate out as “accessing the world.” We think that any sort of “direct” realism is just a naive perspective, hence we deny it, and consequently we dismiss any reasonability of notions like “real” things as-such, things per-se, the “real” world as independent from us observers, etc. To invoke an icon for this, we could also say that we take a strictly non-idealistic, non-Hegelian position with respect to epistemology and the issue of our relations to what we call “the world.”

In contrast, our thesis is that we create any aspect of our world, whether empirical or not, only through interpretation and thus only through modeling. Of course, there is something out there, yet it does not make much sense to refer to this outside as a “world.” The world is real, but this “reality” (and the world) is completely inside everyone’s private mental life. Undeniably, we can try to speak about it, and we can try to share our realities through speaking, i.e. using a public language. Yet, we have to translate our internals into models first. In other words, we even have to model our own internal mental “reality” before being able to talk about it, notwithstanding the fact that the second step, the translation of the model into language, adds further complications. This perspective is strictly compatible with Wittgenstein’s solipsism, as it has recently been described by Wilhelm Vossenkuhl [3].

It should be clear by now that our position is not an anti-Aristotelian attitude, but rather a non-Aristotelian one. It is true that everything we find in the mind first passed our “senses,” extending the notion of sense to any afferent fiber reaching into the brain. However, this alone is not a sufficient contribution to being able to “think.” Similarly, we reject Descartes’ position of the primacy of the “Cogito.” To us it seems that Descartes is refuting the dependency on external data as well as neglecting the role of language and the accompanying necessity of a double translation. Yet, his “innate ideas” resemble Wittgenstein’s dictum about the alignment of logics to the structure of the world.

Compatible with such frameworks, we can define the concept of “model” from an epistemological perspective. We could say that models are tools for anticipation given the expectation of weak repeatability.

We are convinced that modeling is the only way to synchronize in a useful way with the outer world, or, in the case of social affairs, to connect to other realities and to share them. Modeling is the only gateway to connect. In his extension of the Wittgensteinian language game, Robert Brandom deepened and popularized that idea in his work, supposing that the mutual inference of intentions is at the core of the ability to understand language. Modeling is unnecessary exactly in those cases where external relations are determined. This may happen by means of bureaucracies or in any other case of “programming the world,” e.g. in a misunderstood human-machine-interface design. The social world is full of attempts to fix the structure of interactions in order to diminish the need for modeling. In this perspective, traditions, grammars and any sort of convention are just tools to facilitate this reduction of necessary efforts, or even to enable modeling at all.

The only things we receive are fluctuations of physical signals. We have to recognize patterns, assign symbols, infer structures and intentions, but we never can know completely. Induction from empirical impressions is a phantasm of gnostic scientists or philosophers. This fact of not knowing forces us to apply modeling, to derive models, in other words to predict—in every tenth of a second. Members of all human societies enjoy playing with this fact; we call it sports or humor. The input for such modeling is never symbols, but always—even on the level of symbols—probabilistic densities. The way “back” from probabilistic densities to symbols is one of the key issues for any epistemology, not just for the section engaged with the machine-based flavor. It should be clear that we agree neither with radical empiricism nor with radical constructivism.

Here, in the context of our general interests, where we focus on the possibility (as well as the structure and the potential) of machine-based epistemology, we thus have to find an appropriate formal representation of the concept of model. This formal representation should not be limited to models in any particular domain, e.g. mathematics, architecture, hermeneutics or science; the concept of “model” differs strongly across these domains. Instead, our representation should (1) allow for a maximum of generality, while at the same time it (2) should obey the ultimate conditions of any epistemology. Only by means of such a representation will we be able to investigate further the conditions for the ability of autonomous modeling, things like the concept of data or the role of logics. This general “model of model” will also allow us to find an appropriate concept of theory. Currently, there is no appropriate theory about theory and models. Most frameworks called “theory” are just models, as we will see in the chapter about the theory of theory. Despite the fact that models may be quite abstract and theoretic, models are not theory. In order to understand the specific difference between the two, we have to introduce an appropriate, and most general, notion of models. It is quite obvious that such issues concern the cornerstones of any attempt to get clear about epistemology, especially concerning the presumed machine-based form of it.

The Formal Representation of the Model

The formal representation of an object of the “real world” is helpful precisely for the reason that it is the only way to investigate what could be said in principle about that object. Of course, formalization itself needs concepts, and quite naturally, formalization introduces the constraints of those concepts. Often, yet not necessarily, formalization introduces a strong reduction of the observed object, frequently related to the assumption of enumerability or identifiability.

In order to create a model, one needs tools and methods. In order to create a formal model, one has to introduce basic elements (often called axioms), operators and possible relations between them. Given the complexity of the matter, we suggest starting in medias res and explaining the components subsequently.

Our model of model appears as a 6-tuple. You may conceive of its components also as six different, incommensurable domains, between which no possible path from one to another can be thought of. These six domains are, by their labels:

  • (1) usage U
  • (2) observations O
  • (3) featuring assignates F on O
  • (4) similarity mapping M
  • (5) quasi-logic Q
  • (6) procedural aspects P of the implementation

Taken together this renders into the following single expression:

m = { U, O, F, M, Q, P }  eq.1

These six domains, or “elements,” forming the (abstract) model are themselves high-level compound concepts. In the following we will give complete yet brief descriptions of those compound elements.

Usage

For many reasons we follow the philosophical attitude as it has been developed first by Wittgenstein, then deepened by (the late) Putnam, by Vossenkuhl or Brandom, among others (but, for instance, not by Quine, Lewis, Davidson, Kripke, Stegmüller or Moulines).

Wittgenstein conceived “language as ongoing regulated activities involving coordination and cooperation of the participants,” as Meredith Williams [4, p.228] put it so clearly. In this perspective, language is not just a set of symbols arranged in a well-ordered manner as a consequence of applying a formal language. The use of words in a language follows a purpose. Without that purpose it would be detached from the world. Usage, world, language and modeling are co-extensive. If we do not assign a purpose, i.e. an intended usage, to a symbol, or even to a percept, we can not achieve a model. Likewise, if we act in a structured way, even if only partially structured, we implicitly apply a model. In this way, modeling and rule-following are closely related to each other.

Here, in our attempt to get clear about models, we have to operationalize the notion of usage. In doing so we have to keep in mind that we are equipped neither with perfect knowledge nor with the possibility to detect “truth” in our perceptions. In many cases something like a “complete” measurement is not possible in principle, even independently of our factual methods of measurement; measurement always provides only a segment of the world, it is itself based on a theory, it represents a model in itself. This imperfectness in our empirical relation to the world is almost trivial. Yet, what is not trivial is our attitude to the various kinds of errors we will commit due to this imperfectness. Usually, this attitude is expressed by referring to the notion of risk.

We conceive of risk as a part of the usage. Irrespective of the formal tools used to express “risk,” in the end risk expresses the cost we assign to the misclassifications caused by a partially erroneous model. Most generally, risk can be expressed as the ratio of the costs assigned to different kinds of errors, or in short, the Error-Cost-Ratio (ECR). The ECR itself need not be handled as a constant; it could be defined as dependent on, or related to, a particular class created by the model as a whole.

The purpose itself can be reflected by a particular intensity of a target variable. Such a target variable is, of course, completely fictitious in the beginning; its usefulness has to be confirmed. Quite often, however, the target variable is determined by factors completely outside of the model (and the modeling process). Obviously, the danger of such a separation is a particular kind of blindness concerning the modeling process and the meaningfulness of its results.

The target variable itself is not a complete (and proper) operationalization of the purpose. We also need a scale and the selection of a range of values that represent the “targeted” (“desired”) outcome of the process which we are going to represent by the model. Note that this is true also for multi-objective modeling. It is always possible to map a set of several outcome variables onto a single target variable and a well-defined range of values for this target variable, though this mapping is usually not a continuous one. The reason for the possibility of this reduction from multi-variate “outcomes” to a single target variable is the obvious fact that we will use, i.e. apply, the model as a basis for a decision to act in a particular way.

So we can cover the notion of usage in the following 3-tuple:

U = { TV, TGTV, ECR }  eq.2

The symbols are

TV = target variable, the basis for a measure-theoretic operationalization of purpose or usage; the scaling of the variable may be numeric or nominal;
TGTV = target group, defined as a sub-set within the set created by the target variable TV, e.g. as a selection or an interval of values;
ECR = the ratio of the costs of the different types of errors a model can commit (error-cost-ratio); the ECR expresses the ratio of the costs for misclassifications of Type-I and Type-II; the ECR can also be represented as any kind of (cost-)function.

The ECR is an important docking point for any kind of risk as well as for value, both in the economical and even in the philosophical sense.
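
As a minimal sketch, the 3-tuple of eq.2 could be represented as follows; the class layout, the member names and the reduction of the ECR to a single number are our own assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Usage:
    """Operationalization of 'usage' as the 3-tuple U = { TV, TGTV, ECR }."""
    tv: str                                  # name of the target variable
    tgtv: set = field(default_factory=set)   # target group: subset of TV's values
    ecr: float = 1.0                         # cost(Type-I) / cost(Type-II)

    def is_hit(self, value) -> bool:
        """Does an observed value of TV fall into the target group?"""
        return value in self.tgtv

u = Usage(tv="outcome", tgtv={"responder"}, ecr=4.0)
print(u.is_hit("responder"), u.is_hit("non-responder"))  # True False
```

A richer implementation would allow `ecr` to be a function of the predicted class, as the text notes.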

You may have noticed that we operationalize usage without referring to an intended area of usage, as is the case for structuralist concepts of theory. We suppose that it is one of the major faults of scientific structuralism as proposed by Sneed [5] and Stegmüller [6] (based on the work of Carnap [7]) to include a description of the intended area of application in the formalism. We will discuss this in much more detail in the chapter about theory. Here we just remark that structuralist concepts of theory hopelessly mix models and theories.

We could even say that whenever a proposal includes statements about its area of application—besides the rule “Apply it there!”—it is not even a model anymore. A formalization, and hence a model, can never make any proposal about the conditions of its applicability. That would require another model, or it would invoke a rule external to the model at hand, besides the difficulty that a model can not predict its own applicability. The model is itself such an expectation (but not a formal prediction).

Observations

It is pretty clear, and presumably widely accepted, that observations do not simply reflect things of a world. Rather, observations are based on theories, because measurements are already based on theories and models. Both measurement and observation are based on interpretation, i.e., generally speaking, on some kind of transformation. While we could accept that measurements need not include classification, observations clearly do. Observations can be reflected only by a 3-valued relation.

The theories, models, or habits necessary to perform a measurement or to achieve an observation force us to conceive of observations only as potential observations.

Those theories and models inevitably precede any actual observation; in this regard, they are a priori necessary. Yet, this “necessity” is not a physical necessity; it is established as a historical convention. Since they precede any actual observation or measurement, those theories impose serious constraints on any possible result of the modeling.

Featuring Assignates

Given an observation, we have to describe the “whole thing.” Usually, we select some abstract structure like “color” and call it a feature. Yet, observations do not “possess” features or properties independent of our theories, of course. It is much more appropriate to take them as a kind of assignment that creates the conditions of possible describability. Instead of “features” or “properties” we should talk more clearly of “assignates.”

Features are then put into a list that is used as a scheme to describe the observations. This list is also the basis for any comparison of observations.

In a particular attempt to create a model, one of the most important activities is to create and to select features from the observations.

Similarity Mapping

Similarity is one of the biggest blind spots in epistemology, the philosophy of science, and even in science itself and the associated area of data analysis. Where it is not completely disregarded, it is mistaken or reduced in weird ways. Thus it pays to be explicit here.

Similarity expresses the expected success of a mapping that relates an unordered set of observations to a partially ordered set. Similarity is nothing that is attached to an object. Once two observations have been put into the same subset on the partially ordered side, we have lost the information to distinguish them. In such a case we usually call the two observations “identical.” Empirical identity is not equal to logical identity, although the two are often mixed up. We can also say that through this mapping we are going to establish “equivalence classes.”

Of course, there are infinitely many ways of relating observations. Even the lists of assignates need to match only partially. Astonishingly, almost the whole community of data analysts applies the Euclidean distance as a similarity measure for sorting observations. This is nothing else than utter self-contradictory nonsense, as we will discuss in more detail in another chapter about similarity. There we will also show the details of possible mappings and their drawbacks.

The important issue is that the choice of a particular mapping determines the quality of the result, just as much as the selection of assignates does. While most people are aware of the role of “feature selection,” almost nobody pays attention to the mapping that we call similarity. Due to this importance, our striving for generality in the abstract definition of the “model” leads us to conceive of similarity as a family of functions, or, in short, as a functional (mathematically we could also say functor). The functional expresses a potentiality. Which particular actualization, as a determined set of mappings, we will use in a given attempt to create a model depends solely on the usage U as defined above.
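
A sketch of similarity as a functional, i.e. a family of functions from which a particular mapping is actualized by parameters drawn from the usage U; the concrete parametrization below is purely illustrative:

```python
def similarity_functional(ecr=1.0, weights=None):
    """Return a concrete similarity mapping; which one is actualized
    depends on parameters drawn from the usage U (here: ECR, weights)."""
    weights = weights or {}
    def sim(a, b):
        w = lambda f: weights.get(f, 1.0)
        shared = sum(w(f) for f in a & b)
        # penalize unmatched features of `a` more heavily when errors on
        # a's side are costlier (a crude stand-in for the error-cost-ratio)
        penalty = ecr * sum(w(f) for f in a - b) + sum(w(f) for f in b - a)
        return shared / (shared + penalty) if shared or penalty else 1.0
    return sim

sim_strict = similarity_functional(ecr=3.0)
sim_lax = similarity_functional(ecr=0.5)
a, b = {"x", "y", "z"}, {"x", "y", "q"}
print(sim_strict(a, b) < sim_lax(a, b))  # True: the usage changed the mapping
```

The same pair of observations is thus more or less “similar” depending on the usage, not on the observations alone.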

Concerning our abstract model, it is quite important to conceive of “similarity” as an irreducible element of a model. Only then can we succeed in keeping a critical distance to what we call a “method.”

Quasi-Logics

The quasi-logics implicitly entails any formal language used to describe the relations between any of the elements or between any of their instances. In order to achieve independence from any particular such language, we have to abstract from it and include it as an element.

In this sense, formal languages are imposed onto the process of modeling. Within a particular attempt at modeling, the quasi-logics is out of reach, though not invisible. It also seriously constrains the expressiveness of the model. If we, for instance, rely on classic bivalent logics with mutually exclusive truth values, we can not make any proposal about partial memberships, multiple memberships, uncertainty etc., and, most important, we can not deal with (observations taken on) complex and creative systems.

QS = { Loc, Lay, Lin, SR, DR }  eq.3

The symbols are:

Loc = locality, dependence on the process of the formation of a distribution, context sensitivity;
Lay = capability for self-directed stratification into different epistemic layers;
Lin = linearity (incl. commutativity, associativity);
SR = self-referentiality;
DR = distinctiveness of relations, or, equivalently, the choice between the identity or the difference relation as the fundamental one;

We follow Wittgenstein in his conclusion that the structure of the world precipitates in the structure of logics. We must not even say that we “apply” logics to our observations in order to make sense of them, for any “application” introduces a trace of semantics into the logics. Yet, there is no way not to use some kind of logics. Logics, or more precisely, the chosen quasi-logics, and the world are co-genetic; they mutually imply one another. Thus, we have to be very careful with respect to the quasi-logics we choose.

In our attempt to describe the most general form of model we have to abstract from this choice. We do so by including elements that are determinants of that choice.

Procedural Aspects

In some respects, this element is the most trivial of the set. Yet, despite its relation to practical and actual implementations, it needs to be an element of the abstract model, since the selection of a particular (family of) method(s) and their implementation imposes specific constraints on an actual instance of the abstract model.




Group Properties

It is now quite interesting to explore the various sorts of deficiency that can be constructed following the formula given in Fig.1 above, as well as to interpret the given formal definition as a group. Both types of investigation would not be possible without a formal notation, of course. First we will deal with the group properties; the deficiencies will be handled in the next section.

Why consider mathematical groups here? Well, from today’s perspective, we can easily give an example. Could you say how it would feel to be without knowledge of the zero or of negative numbers? How would you do ordinary calculations? No, of course not. The difference between those two worlds, one stuffed with zeroes and negatives, the other not, is precisely covered by the invention of group theory. Without group theory, we can not give a satisfying account of the zero or of negatives.

One of the motivations for group theory in mathematics is rooted in the crystallography of the 19th century. The “Erlanger Programm” by Felix Klein then quickly revealed that there is more to it than just crystals. Today, group theory is the basis for mathematical structures like algebras. Yet, group theory is still related to the notion of (abstract) symmetry regarding sets of elements whose order can be permuted. Symmetry is invariance under a specified group of transformations.

Importing group theory into the theory of modeling directly leads us—and it does so for purely syntactic “reasons”—to questions like “Is a model combined with a model again a model?”

The following table lists the group axioms as applied to the model.

Closure: For all elements a, b in a group G, the result of the operation a ∘ b is also in G;
Associativity: For all a, b and c in G, (a ∘ b) ∘ c = a ∘ (b ∘ c);
Identity element: There exists an element e in G such that for every element a in G the equation e ∘ a = a ∘ e = a holds; dependent on the operator, e may be conceived as 1 or 0;
Inverse element: For each a in G, there exists an element b in G such that a ∘ b = b ∘ a = e.

Group Axiom 1: The first property, closure, is easy to understand. It surely applies to models too, even if we combine two very different models.

Group Axiom 2: The story is different for associativity. It is not generally valid, since modeling is a mapping that destroys information, as all non-bijective mappings do. Associativity is fulfilled only for special cases of models, or in special circumstances. Usually, it makes a difference whether we destroy an informational segment I(1) or a segment I(2) before proceeding with the next model of the set.

The only analytically visible case where it does not make a difference is a situation where all three models a, b, c are completely disjoint with regard to their spaces of mappings. Such models could be called geometric, or logical, models. Mostly, however, combining models is asymmetric and introduces a notion of irreversibility. Yet, in a sufficiently large population of models (i.e. mappings) there might well be a selection of models a, b, c for which associativity holds. This, obviously, is not an analytical issue, but an empirical one. It would give rise to something like a probabilistic group, which does not seem to make much sense for now. Anyway… let us presume that normally models are not associative. Using the results of our investigation about information and causality, we can also conclude that models are causally effective. Actions like rule-following are irreversible not only due to their materiality, but also due to their structure. Or, even shorter, we could say that modeling is an activity, and as an activity it introduces irreversibility.

Note that modeling includes both the creation and the application of models. If we considered only the application of models to a fixed set of data, even if the usage were not fixed, all resulting models would be associative, because they simply filter the records independently of each other. In general, however, if we consider the whole process, models as results of modeling are not associative.

In quasigroups and their respective algebras, associativity is not required. In the case of a Lie algebra, for example, associativity is replaced by the Jacobi identity, which introduces a basic asymmetry. There are many further non-associative operations in mathematics, e.g. the vector cross product. We do not delve further into this topic.

Group Axiom 3: Is there an instance of our abstract model which could be conceived as the neutral, or identity, element? Combining it with a normal model would not change anything, regardless of the order. Both cases, a ∘ e = a as well as e ∘ a = a, are possible through a particular error-cost-ratio (leading to 100% false positive classifications, i.e. all data are selected, and there is only one single equivalence class). Thus we conclude that our abstract model can be instantiated such as to form the neutral identity element.

Group Axiom 4: Finally, we have to check whether there is a model which could invert the changes of another model, such that the result is the same for any pairing of a model and its inverse. This would be possible only if the mapping preserved all of the initial information. By definition, however, models create equivalence classes, i.e. they destroy information. The inverse model would have to reconstruct this lost information, which is not possible given that the body of data is fixed for both instances, the model as well as its presumed inverse. Thus, no model is possible that could be conceived as something like an inverse element.
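
The argument about the missing inverse can be illustrated with a toy example of our own making: a model that sorts observations into equivalence classes is a many-to-one mapping, so the lost within-class information cannot be reconstructed by any second model.

```python
def model(x):
    """A minimal 'model': sort observations into two equivalence classes."""
    return "high" if x >= 0.5 else "low"

observations = [0.1, 0.4, 0.6, 0.9]
classes = [model(x) for x in observations]
print(classes)  # ['low', 'low', 'high', 'high']

# 0.6 and 0.9 now share one class; no mapping applied to "high" alone
# can tell them apart again, hence no inverse model exists.
```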

We conclude that models do not form a group. Hence, no calculus is possible which would take models as arguments. Groups are not general enough to cover the characteristics of models. A generalization into mathematical structures like the Lie group, despite its theoretical appeal, particularly its formalized account of asymmetry, is not possible.

Yet, it is too early to conclude that there is nothing more abstract than the general form of the model. We have to check the relationship between our abstract model and mathematical category theory first. Category theory is basically about abstract transformations, and the inverse element is not an essential piece of it. So, it is well worthwhile to check it out. We prefer to do so in a separate chapter.

A corollary of this (so far still assumed) generality would be that it is not possible to conceive of a theory as a generalization of a model or a set of those. This raises the question, of course, about the status of theories and how we could talk about the relation between theories and models. We will do this in our chapter about theory.

Experimentally Deficient Models

Given the definition of the model as shown in eq.1 above, we will now investigate various sorts of deficiencies, simply created by removing or restricting one or more of the six elements.

We repeat eq.1 for better readability:

Usage: Removing the usage U from a model renders the model into a formal method performing an arbitrary transformation. Without usage we cannot decide on usefulness, and quite obviously so. This implies that we also cannot select a “suitable” set of features or similarity mappings. We just arrive at some kind of sorting that is completely unjustified. Even worse, the selection of features and the particular transformation destroy information, and this irreversible act is not aligned to any purpose. If we used such a sorting for a decision, we would commit a serious mistake: setting U(0)=U. This renders the algorithmic structure of M and P into a kind of internal criterion that affects any subsequent decision.

As weird as this sounds, it is a widespread malpractice in data mining, but also in the social sciences, or in disciplines like urban planning. People frequently believe that they perform modeling if they do what they call “non-supervised clustering,” or, equivalently, if they represent some measured data by a formula.

Clustering is not modeling if there is no U-term; hence “clusters” and “classes” are different things. If both are equated, the algorithm or the method dictates the utility. And that’s really weird (but, as already mentioned, quite unfortunately also quite abundant). It is also a simple mistake, since as soon as we introduce a target variable (as an operationalization of purposes) we change the cost function for the optimization; hence the sorting of the observations will be different, and so will our conclusions. We conclude that “non-supervised clustering” is either useless, a mistake, or nonsense.
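A toy sketch (data invented for illustration) shows the point: the same observations sorted (a) by feature proximity alone, i.e. “non-supervised clustering,” and (b) by a target variable that operationalizes a usage U, yield different partitions.

```python
observations = {           # item: (feature value, target label)
    "a": (1.0, "useful"),
    "b": (1.2, "useless"),
    "c": (9.0, "useful"),
    "d": (9.1, "useless"),
}

# (a) proximity-based sorting, ignoring any purpose
clusters = {k: "low" if feat < 5.0 else "high"
            for k, (feat, _) in observations.items()}

# (b) sorting induced by the target variable (the usage U)
classes = {k: target for k, (_, target) in observations.items()}

print(clusters)  # {'a': 'low', 'b': 'low', 'c': 'high', 'd': 'high'}
print(classes)   # {'a': 'useful', 'b': 'useless', 'c': 'useful', 'd': 'useless'}

# a and b fall into the same cluster but into different classes:
# the proximity-based partition carries no utility by itself.
```

Introducing the target changes the cost function, hence the grouping, hence any conclusion drawn from it.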

There are two other modifications of U we can think of. First, we can replace a dedicated usage by a set of usages U as described above in eq.2. Accordingly, the level of our proposals will change (see, as a parallel, the different modes of comparison). Another modification is to replace the externally defined target variable by a criterion that is constructed solely from the error-cost-ratio and the internal consistency measures that are part of P. Doing so, however, we initiate a circular structure and lose contact with the world. Such kind of modeling could really be taken only as a very first starting point in a modeling project. One example would be to create models which cover a certain amount of data. The resulting models will be very different with respect to the weight of the features, and thus provide the possibility for a first inspection of what the data is about.

Observations: Without observations, the model is approaching the area of creatio-ex-nihilo phenomena, miracles or revelations. Yet, since we still have assignates available, which we choose maybe randomly, we can construct matching observations. This kind of activity is quite abundant in the human mind. We call it dreaming.

Similarity etc.: In contrast to that, the removal of similarity is not possible at all. The same holds for the quasi-logics and the procedural aspects.

Theoretical Accounts

It is a cornerstone of our proposals here that a formalization, and hence a model, can never make any proposal about the conditions of its applicability.

Difference is the starting point, not identity.

… … …


Trope Theory

According to [8], a “trope is an instance or bit (not an exemplification) of a property or a relation. […] The appeal of tropes for philosophers is as an ontological basis free of the postulation of supposedly obscure abstract entities such as propositions and universals.”

Despite some resemblance to our theory about the abstract model and the assignates, trope theory is radically different. Tropes are about ontology; their appeal for philosophers lies precisely in serving as an ontological basis free of supposedly obscure abstract entities [8]. Thus, the theory of tropes builds upon the separation of ontology and epistemology, which we reject, and vigorously so. Separating them is equivalent to denying the primacy of interpretation and modeling.

Yet, there is an interesting extension, or variety, of trope theory, introduced by Meinard Kuhlmann (which we found here). He relates Algebraic/Axiomatic Quantum Field Theory (AQFT) to the theory of tropes. Based on (higher) category theory, AQFT is a formalization of quantum field theory that axiomatizes the assignment of algebras of observables to patches of parameter space that one expects a quantum field theory to provide [9]. For Kuhlmann, the basic things are “tropes,” which he defines as “individual property instances,” as opposed to abstract properties that happen to have instances. “Things,” then, are just collections of tropes. Now, the interesting (intermediate) conclusion provided by Kuhlmann is this: to talk about the “identity” of a thing means to pick out certain of the tropes as the core ones that define that thing, and others as peripheral.

This again closely resembles our notion of assignates. In the perspective proposed here, “things” are established through an instance of an abstract model, which comprises a selection of assignates that have been “picked out” from the set of available ones. Yet, Kuhlmann obviously follows an ontology (which we reject) that is based on identity (which we also reject as a feasible starting point). Consequently, his distinction between core-tropes and non-core-tropes, which is quite common in trope theory, tries to separate what before had been mistakenly conflated: assignates, their instances as features (properties), and their particular values. For Kuhlmann, the core tropes are those properties that define an irreducible representation of a C*-algebra (things like mass, spin, charge, etc.), whereas the non-core tropes are those that identify a state vector within such a representation. Why call them both tropes? Why distinguish them a priori to any particular modeling? In some way, trope theory appears to me like “ontologized epistemology,” in this respect not so distant from Frege’s hyper-platonism.

Other Concepts

Basically, nowadays one can find two salient concepts in the discourses: Popper’s rationalist empiricism and Sneed/Stegmüller’s scientific structuralism. Some conceptions lie somewhere in between, yet without overcoming their weaknesses [9].

Both frameworks are quite interesting attempts, of course, yet they suffer from serious drawbacks. The main problem of both is that they do not provide sufficient means to distinguish theories from models. Moreover, they both claim that theories make empirical statements about potential observations. Here, we strictly disagree. (For a detailed account of the arguments see the chapter about the theory of theory.)

Both theories also disregard the problem that neither a model nor a theory can make any proposal about the conditions of its applicability. Such conditions are, as we will see, the availability of symbols, the mediality of any kind of relation, and the (implied) virtuality of any activity, which, when taken together, create a very particular “space.”


Parts of this article have been published in [x].

  • [1] Müller Research. available online
  • [2] Online Etymology Dictionary about model.
  • [3] Wilhelm Vossenkuhl, Solipsismus und Sprachkritik. Beiträge zu Wittgenstein. Parerga, 2009.
  • [4] Meredith Williams, Wittgenstein, Mind and Meaning: Towards a Social Conception of Mind. Routledge 1999.
  • [5] Sneed
  • [6] Wolfgang Stegmüller
  • [7] Rudolf Carnap
  • [8] Tropes, Stanford Encyclopedia of Philosophy.
  • [9] nLab,
  • [9] Weiss
  • [x] Klaus Wassermann, The Model of Model. Vera Bühlmann, Ludger Hovestadt (eds.) Printed Physics, 2011/2012. (in press, a draft version of it is available online)

Feeding a SOM (or a population of them)

November 11, 2011 § Leave a comment

Most likely, words and objects are not a suitable diet for SOMs.

We can’t feed them as whole pieces, nor chopped into letters. Quine proved the fact of a fundamental indeterminacy regarding words and objects [1]. Many computer scientists despair over this and try to define how the world should look, which they then sell as “ontology,” insolently enough.

More seriously, in the traditional perspective (AI, data mining) the question is which data to take and how to “prepare” those data. Yet, put in this way, these questions are “wrong,” at least upside-down, similar to the word/reference game, or the frame game in artificial intelligence, which, funnily enough, is also called the symbol grounding problem. Here it is again, the territory, even quite literally. We should not ground symbols; we should create them by swimming. Probably much like the bubble chamber in early particle physics.

(to be cont’d)




[1] W.v.O. Quine, Word and Object. Cambridge, Mass. 1960.

But it Does Move.

November 11, 2011 § Leave a comment

We may put it simply, and—we are quite sure—everybody will

agree upon that: Everything is moving, spinning, jumping, turning, winking, on any level, from the electrons to the galaxies, from molecules and plants to animals and humans. Yet, even the founders of philosophy, those demigods from classical Greece, got trapped by the idea, or, more appropriately, the ideology, of stasis, which in the beginning was the idea of the idea.

Throughout the history of culture there is a salient trace of that ideology. From said idea to Archimedes’ linchpin, from the silly idea of the earth as the center of the universe to the idea of the universal itself, or, quite recently, to the idea of the state, which has been claimed as a proper concept for dealing with language and mind in thousands of publications. You find it everywhere in Hegel, not in Darwin or Nietzsche, again in the territorialism of Heidegger (should I say terrorialism?), but in neither Wittgenstein’s nor Deleuze’s thought, whose whole oeuvres were directed strictly against any kind of stasis, territory, state or universal. Everything is flight, escape, series, event, and logics is transcendental, if there is something like logics at all. Above all and beyond any other, of course, so-called analytic philosophy, particularly that which originated in German culture (though near Vienna), is still a proponent of stasis, whether they think about the mind (and the brain) or not. Since it has been the program of the whole of modernism to expel time from the world view, it does not come as a surprise that neuroscience as well as computer science forgot about movement.

But it does move. What? Everything, not just the earth, as Galileo was so eager to popularize. Actually, we should not even first put an object there (or a species, a gene, an idea) and only then, as a second step, ask how it came about. We know quite well about the worries of Zeno and the limitations of Newton’s physics. It is indeed a radical move to posit movement and change as the primary entity. It was radical in physics and biology, and it will be even more radical in “soft” sciences like linguistics or cognitive psychology, even in philosophy.

The question at the core of understanding the “world” therefore is about the transition from the moving, the vortices, the clinamen, the indeterminate to their territorial counterpart, the object, the symbol, the word, the proposition. It would be too bold to call them (semi-)illusions, probably; yet, one could find quite some arguments to support that.

Of course, throughout history there have always been people emphasizing the primacy of the open transition: Lucretius, Ovid, Whitehead (but not Marx, of course), Serres. They do not, however, form any part of the mainstream of contemporary philosophy.

So, here still as a (well-founded) suggestion, we could say that van Fraassen’s question is upside down in the same way as Minsky’s or Clancey’s “frame of reference.” We should NOT ask how words acquire reference, but instead how the sheafed stream of references exudes and secretes words. In the beginning there is not the word; in the beginning there is just the (associativity of) bodies.

To put it still more exactly, we should not ask about the applicability of logics in the world, but instead about the transition from the probabilistic to the propositional. This holds even for categories like “category” or “relation.” If one took category theory with categories as the quasi-objects, nothing would be gained. What is nice about category theory is its abstractness, yet the arrows (“transitions”) have to be randomized, or represented as probabilistic functions, similar to Dirac’s delta: the probability density can’t be a well-defined one; there is a cascaded, higher-order indeterminacy.

The transition from the probabilistic to the propositional is basically a movement, since it involves bodilyness. It would be a mistake to conceive of that transition as a purely formal one. In an important sense it is also a (deep) synthesis, a construction. Note that this holds for any perception, on whatsoever level you’d like to choose. For very similar reasons, Putnam called \ˌa-nə-ˈli-tik\ (the pronunciation of “analytic”) an inexplicable noisy sound [1].

There is a wealth of corollaries here, which we can’t dig into. Yet, it is very clear to us that this transition is near a transcendental category, probably even before space and time. As such, it is also one of the primary architectural (though quite abstract) principles for our undertaking of a machine-based epistemology here. An instance of this transition can be found in the relationship of information and causality.

One of those corollaries is represented by a whole cluster of ill-posed questions around the mind-body-problem. We would not deny that there is an important difference (now opposing cognitivism, the computational theory of mind, modern neuroscience, etc.), and that it is important to think about that difference. Yet it is not a problem. The concept that makes this problem disappear is exactly the formulation of the question about the transition from the probabilistic to the propositional. At the time of Descartes, whose work paved the way for this pseudo-problem, everything was conceived as mechanical machinery. Information was unknown, computers were not available, and even a concept of probabilistic networks or computational structures endowed with associative power was far beyond any intellectual reach.

The big question here is… well, actually it is not sooo big: how to transfer this insight into a software system. Concerning our population of glued SOMs, the simple question is: how to feed them? In less “metaphorical” style (though it is not that metaphorical at all), we (as programmers) have the task of deciding how we present “information” to the SOM and how we introduce it to it.

Whatever the answer will be, it does not contain the “symbol.” We would be trapped by the “fallacy of the symbolic,” and concerning our reasoning we would commit a petitio principii: if we put symbols into the concept (or a body) at its very beginning, it is quite likely that we will not find anything other than symbols thereafter (or destructive “secondary” chaos). It would not solve the problem of where the symbols come from. Undeniably, however, we use digital computers and quite obviously also a symbol-based instruction coding system (“programming language”). How then to present information in a non-symbolic manner?

The answer is: by probabilization. We should not think that it is possible to present “facts” to the machine. You may remember the failure of logics-oriented AI, the Edinburgh school and their Prolog initiative. Instead, we have to present “probable contexts.” Of course, we have to define the concept of context such that it becomes operable, and again we use symbols for that. But this can be accomplished in a manner compatible with the probabilistic perspective. Any observational act could be conceived of as an interpretation of certain more or less anisotropic and regular changes of energy density. Such a description is almost purely physical. We are definitely on a proto-symbolic, even proto-semiotic stage. Fluctuations of physical energy densities are perceived as differential intensities. This scheme is not an absolute one, though. “Physicality” is best conceived as a relative property. For instance, words may form a physical layer for a novel. This view has also been developed and emphasized by Bühlmann [2].
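One way to make “probable contexts” concrete is a minimal sketch, with a toy corpus and all names invented for illustration: instead of presenting a token as a fact, we present it as a distribution over the contexts in which it was observed.

```python
# "Probabilization" of a symbolic token: replace the token by a
# normalized distribution over its observed neighboring tokens.

from collections import Counter

corpus = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["a", "dog", "sat"],
]

def probable_context(token, corpus, window=1):
    # Collect neighbors within the window and normalize the counts
    # into a probability distribution over contexts.
    counts = Counter()
    for sentence in corpus:
        for i, t in enumerate(sentence):
            if t == token:
                lo = max(0, i - window)
                hi = min(len(sentence), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[sentence[j]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

ctx = probable_context("cat", corpus)
print(ctx)  # {'the': 0.5, 'sat': 0.25, 'ran': 0.25}
```

The symbol “cat” never reaches the receiving system as such; what arrives is a density over contexts, which is exactly the pre-symbolic diet a SOM can digest.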

The key element here, though often overlooked, is “interpretation” and its structural quality. We need some habits, methods and theories to be able to interpret. As always, it is important to keep in mind that interpretation is not a formal act, since formal acts are simply rewritings of some graphemes into others, obeying a certain fully defined space of allowed relations and transformations. We will discuss this issue in much more detail in the chapter about models and modeling.

In other words, the probabilization of observable items, even of symbolic ones, means that we transform their digital symbolics “back” into a level which could be labeled “proto-interpretive.” This back-transformation should be conceived neither as a kind of “particularization” nor as a kind of “atomization.” The former would assume a subsuming class, which does not exist on the proto-interpretive level, while the latter would propose a kind of independence between the almost “physical” aspects. Let us call it the level of “elements,” despite the fact that we do not mean that this level is “more elementary” in the sense of “more basic.” This again would induce the petitio principii of the class-fallacy.

The selection, the design and the arrangement of “elements” are based on habits and theories that are completely outside of the item or context at hand. Obviously, we meet a circular relationship here. Yet, that’s not a surprise; we even have a word for it: culture. Ultimately, even the structure of representing identifiable items by their elements may be assumed to be unstable. We simply can’t know, in principle, about the elements actually in use.

Yet, on the side of the receiving body (in our case the SOM, or the human brain, respectively) this means that there are certain observables (fully within the limits of the theory-boundedness of any observing) which need to be taken as densities, not as symbols. The proto-symbolic phase of observation is hence homeomorphic to the space given by the superposition of body and numbers, or more precisely, the space opened by the associative power of particularly arranged matter. As said before, in the beginning there is neither the word nor even the sign. We may call them “impressions,” coming from the external world. Nevertheless, it remains fully acceptable for us that those “impressions,” forming into signs or words downstream of the perceptive processes, are also dependent, and mandatorily so, on some kind of theory. Clearly, a proper concept of theory needs to be developed here.

So we find three important elements for dealing with the question about the appropriate presentational level: probabilistic contexts, relativity of physicality, and elements as a precipitation of culture. Nice food, isn’t it?

The transition from the probabilistic to the propositional includes the genesis of labels, and later also of symbols, if the former are going to be repeated and used as abbreviations, or abbreviating models. This transition thus is also the correct description of the problem of “symbol grounding,” about which there is so much babbling. It does not come as a big surprise that a combination of associative concepts and formal concepts is rated as very promising for the further development of machine-based cognition [3]. Yet, we have to start with the associative part.

Note that for the transition from labels to symbols we need a community, hence mediality, both of which are outside of any body. If members of a community aggregate to a form that we then again call a “body,” such a body is again on the lower, “boiling” levels of the overall system. We will meet this topic again in our discussion of complexity, and in the short piece about the strong limitations of swarms and their so-called “collective intelligence.”

Since we necessarily have to refer to certain kinds of bodies, we may be allowed to keep the notion of feeding. That feeding and herding (hoarding?) obviously depends on the inner mechanisms, on the anabolic metabolism, of any of the individual SOMs. How should we conceive of the digestion processes that turn “stuff” into “words”? Taking the animal body as a kind of template, we can see that the body removes most of the form of the input information; it establishes a deformation before any macroscopic structure is assembled. It is, so to speak, a SOM-on-steroids that is able to propel us from the body and its world to the word and its logical body.

This article was first published 11/11/2011, last revision is from 28/12/2011

  • [1] Hilary Putnam, The Meaning of “Meaning”. Minnesota Studies in the Philosophy of Science 7:131-193. 1975. available online.
  • [2] Vera Bühlmann, Inhabiting Media. Thesis, University of Basel, 2008.
  • [3] Uta Priss, Associative and Formal Concepts. ICCS’02. available online.


NooLabGlue (Software)

November 11, 2011 § Leave a comment

NooLabGlue is a framework to link applications or parts of applications. There are of course already a lot of them; currently O’Reilly lists 185 different ones in their P2P directory (although they also include very low-level items such as XML-RPC).

NooLabGlue is different. The paradigm is aligned to natural neural systems. Thus, its primary emphasis is not on throughput; instead, NooLabGlue tries to support the associative, probabilistic, adaptive linking of a growing population of Self-Organizing Maps, where the individual SOMs tend to diverge regarding their “content.”

Naturally, NooLabGlue is a framework for massively modular neural systems, where parallelity may occur at any level. Such artificial neural systems may be realized as Self-Organizing Maps (SOM, Kohonen map), or as more traditional ANNs. Yet, NooLabGlue also allows linking traditional components or applications in the EAI context, or its usage as a simple service bus. NooLabGlue transcends the client/server paradigm, replacing it with a source/receptor paradigm against a communicological background.

Thus, not only people behind their computing machines can be connected, but also more or less autonomous software entities. Yet, NooLabGlue is not a middleware designed mainly to distribute a single, particular, well-identified learning task, or even to serve as an infrastructure for distributing the training of an “individual” SOM. These kinds of goals are much too narrow-minded. They just virtualize a Turing computation task to run on many machines instead of one. In contrast, we are looking for a middleware that supports a growing population of probabilistically connected SOMs in the context of non-Turing computation, which requires strikingly different architectural means for the middleware.

The level of integration follows a strikingly different paradigm compared to other packages or approaches, from SOAP, web services and RMI up to systems like Akka. The contracting is not accomplished on the level of the field (which is the reason for the complexity of SOAP and WS), but on the level of document types, behaviors, and names. Related to that is the plan to incorporate the ability to learn into the MessageBoard.

NooLabGlue follows an explicit transactional approach, which is (with the exception of the transaction id) completely hidden on the level of the frontend API. Available transports are UDP, TCP, FTP and RESTful HTTP (implemented using the Restlet framework), while everything is wrapped into XML. The complexity of those protocols is completely hidden on the level of the API. Supporting different transport protocols allows for speedy connections in a LAN, while at the same time the whole framework can also run over the WWW/WAN; actually, MessageBoards (the message servers) allow for cascading a message (even transactional messages) from local to remote segments of a network.

The API for participants (“clients” of the MessageBoard) is very simple; it just provides a factory method for creating an instance, as well as “dis/connect(),” “send(),” and a callback for receiving messages. Sending a (transactional) message is realized as a service, i.e. as a persistent contract to the “future” (or, in short, as a future). Participants and even the MessageBoard can be shut down and restarted without losing a (fully transferred) message. Lost connections are re-established without cookies, on the basis of (optionally) unique names.
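The participant lifecycle just described can be mimicked in a few lines. What follows is a hypothetical Python analogue, not the actual NooLabGlue API (which is realized in Java); the class and method names are invented after the description above, to illustrate the factory, the connect/send/callback cycle, and the persistence of messages across disconnects.

```python
# Hypothetical sketch of a MessageBoard with persistent ("future") delivery.

class MessageBoard:
    def __init__(self):
        self.participants = {}
        self.pending = []  # messages persisted until the addressee connects

    def deliver(self, name, payload):
        p = self.participants.get(name)
        if p is None:
            self.pending.append((name, payload))  # survives the disconnect
        else:
            p.on_message(payload)

class Participant:
    @staticmethod
    def create(board, name, on_message):  # factory method, as in the API
        return Participant(board, name, on_message)

    def __init__(self, board, name, on_message):
        self.board, self.name, self.on_message = board, name, on_message

    def connect(self):
        self.board.participants[self.name] = self
        # replay messages that were persisted while disconnected
        for name, payload in [m for m in self.board.pending if m[0] == self.name]:
            self.on_message(payload)
        self.board.pending = [m for m in self.board.pending if m[0] != self.name]

    def send(self, to, payload):
        self.board.deliver(to, payload)

board = MessageBoard()
inbox = []
a = Participant.create(board, "A", inbox.append)
a.send("B", "hello")                 # B not connected yet: message persisted
b = Participant.create(board, "B", inbox.append)
b.connect()                          # connecting replays the stored message
print(inbox)  # ['hello']
```

The point of the sketch is only the contract: a fully transferred message outlives the shutdown of its addressee, which is what “sending as a future” amounts to.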

Both the participants and the MessageBoard run in a multi-threaded manner, of course, for all kinds of transport layers. The language of realization is Java, though any language could be used to connect to the MessageBoard. In principle, the MessageBoard could also be realized using PHP, for instance, since the server does not store or access any binary object.

NooLabGlue has been created in the context of advanced machine learning that proceeds way beyond the issue of the algorithm. The goal (within the next 6 weeks) is a “natural account” of understanding (e.g. language) based on the paradigm of machine-based epistemology.

The attached document glue messaging features v1.0 contains further details about the intended specification. It can be downloaded here.

The Miracle of Comparing

November 11, 2011 § Leave a comment

Miracles denote the incomparable.

Since comparing is so abundant in our thinking that we can’t even think of any activity devoid of comparing, miracles also signify zones that are truly non-cognitive. Curiously, not believing in the existence of such non-cognitive zones is called agnostic.

Actually, divine revelations seem to be the only cognitive acts that are not based on comparisons. In such kinds of events, we directly find some entity in the world or in our mind, without cause, notably. We simply and suddenly can look at it, humbly so. In some way, the only outside of comparison is the miracle, which is not really an outside, since it is outside of everything. Thinking and comparing do not have a proper neighbor, to use a Wittgensteinian concept. Thus, we could conclude that it is not really possible to talk about them. Without any reasonable comparable, thinking itself is outside of anything else. We just can look at it, silently. Obviously, there is a certain resemblance to the event of a miracle. Maybe that’s the reason for the fact that there are so many misunderstandings about thinking.

Of course, we may build models about thinking. Yet, this does not change very much, even if we apply modern means in our research. On the other hand, this also does not reduce the interestingness of the topic. Astonishingly, even renowned researchers in the cognitive sciences such as Robert Goldstone (Indiana) feel inclined to justify their engagement with the issue of comparison [1].

It might not be immediately clear why the topic of comparison warrants a whole chapter in a book on human thinking. […] In fact, comparison is one of the most integral components of human thought. Along with the related construct of similarity, comparison plays a crucial role in almost everything that we do.

We fully agree with these introductory words, but Goldstone, like so many researchers in the cognitive sciences, proceeds with remarkable trivia.

Furthermore, comparison itself is a powerful cognitive tool – in addition to its supporting role in other mental processes, research has demonstrated that the simple act of comparing two things can produce important changes in our knowledge.

In contrast to Goldstone, we will also separate the concept of comparison completely from that of similarity. In his article, Goldstone discusses comparison only through the concept of similarity.

Thinking is closely related to consciousness, hence to self-consciousness, as the German philosopher Manfred Frank has argued [2]. Probably for that reason humans avoid calling mental processes in animals “thinking.” Anyway, for us humans thinking is so natural that we usually do not bother with the operation of comparing, I mean with the structure of that operation. Of course, there is rhetoric, which teaches us the different ways of comparing things with one another. Yet, this engagement is outside of the operation; it just concerns its application. The same is true for mathematics, where a particular way of comparing is always presupposed. In contrast, we are interested in the structure of the operation of comparing.

Well, our thesis here does not, of course, follow the route of the miraculous, at least not without a better specification of the situation. Miracles are a rather unsuitable thing to rely on as a (software) programmer.

What we will try here is to clarify the anatomy of comparisons. Indeed, we distinguish at least three basic types.

The Anatomy of Comparison: Basic Ingredients

Comparisons are operations. They imply some kind of matter, providing a certain storage or memory capacity. Without such storage/memory-matter, no comparison is possible. It is thus not perfectly correct to speak about the anatomy of comparisons, since any type of comparison is a process in time.

Let us first consider the basic case of a pairwise comparison. Without loss of generality, we take an apple (A) and an orange (B) that we are going to compare. What are, in a first account, the basic elements of such a comparison?

As the proverb intends to convey, what can’t be compared can’t be compared. In order to compare two entities A and B, we have to assign properties that can be applied to both entities. Take “COLOR” as an example here. So, first we assign properties…


In a second step we select just those of the properties that shall (!) be applied to both. There is no necessity that a particular feature is common to both items. Actually, the determination of those sets is a free parameter in modeling. Yet, here we are just interested in the actuality of two such aligned sets (“vectors”).


Those selected properties represent subsets of their parent sets. Given those two subsets a(j.) and b(m.), we can now align the vectors of properties. In data analysis, such vectors are often called feature vectors.


Any comparison then refers to those aligned feature vectors. The more features in such a vector the more comparisons are possible.
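The steps above can be sketched in a few lines. All property names and values are invented for illustration; the point is only the alignment of two feature vectors from the shared subset of assigned properties.

```python
# Step 1: assign properties (assignates with values) to both entities.
apple  = {"COLOR": "red",    "WEIGHT": 120, "VARIETY": "gala"}
orange = {"COLOR": "orange", "WEIGHT": 150, "THICKNESS": 4}

# Step 2: select just those properties that shall apply to both.
shared = sorted(set(apple) & set(orange))
print(shared)  # ['COLOR', 'WEIGHT']

# Step 3: align the selected properties into feature vectors.
vec_a = [apple[f] for f in shared]
vec_b = [orange[f] for f in shared]
print(vec_a, vec_b)  # ['red', 120] ['orange', 150]

# Any comparison then refers to the aligned vectors,
# e.g. simply counting matching entries:
matches = sum(x == y for x, y in zip(vec_a, vec_b))
print(matches)  # 0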

In diagnostic/predictive modeling a particular class of operations is applied to such feature vectors, mainly in order to determine the similarity, or its inverse, the so-called distance. We will see in the chapter about similarity what “distance” actually means and why it is deeply problematic to apply this concept as an operationalization of similarity in comparisons.

Before we start with the typology of comparison, we should note that the “features” can be very abstract, depending on the actual items to be compared and their abstractness. Features could be taken from physical measurements, from information-theoretic considerations, or from any kind of transformation of “initial” features. Regardless of the actual setting, there will always be some criteria, even if we use an abstract similarity measure based on information theory, as e.g. proposed by Lin [3].

A second note concerns the concept of the feature vector and its generality. The feature vector does not by itself imply a particular similarity measure, e.g. a distance measure, or a cosine similarity. In other words, it does not imply a particular space of comparison. Only the interaction of the similarity functional with the feature vector creates such a space. Related to that, we also have to emphasize again that the sets a(j) and b(m) need not be completely identical. There might be features that are considered indispensable for the description of the item at hand. Such reasons are, however, external to a particular comparison, opening just a further level. In a larger context we have to expect feature vectors that are not completely matching, since similarity includes the notion of commonality. Measuring commonality requires differences, hence also differences in the feature sets. In turn, this requires a measure for the (dimensional) difference of potential solution spaces.
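The point that the feature vector alone does not fix a space of comparison can be made concrete: the same pair of aligned vectors yields quite different pictures under a distance measure and under cosine similarity (the numbers are illustrative only):

```python
import math

# The same pair of aligned feature vectors under two different
# similarity functionals; values are illustrative only.
a = [0.9, 7.5, 0.6]
b = [0.2, 8.0, 0.7]

def euclidean_distance(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

# The vectors fix neither measure: only the choice of the functional
# creates a particular space of comparison.
print(euclidean_distance(a, b))  # sensitive to absolute magnitudes
print(cosine_similarity(a, b))   # sensitive only to direction
```

Under the cosine measure the two items look almost identical, while the Euclidean distance registers a noticeable gap; neither verdict is "in" the data.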

We will discuss all these issues in more detail in the chapter about similarity. There we will argue that the distinctions of different kinds of similarity like the one proposed by cognitive psychologist Goldstone in [1] into geometric, feature-based, alignment-based, and transformational similarity are not appropriate.

Yet here, for now, we will focus on the internal structure of the comparison as an event, or an operation.

Three Types of Comparison

Type I: Comparison within a closed Framework

Type I is by far the simplest of all three types. It can be represented by finite algorithms. Hence, type I is the type of comparison that is used (almost exclusively) in computer science, e.g. in advanced optimization techniques or in data mining.

In this type the observables are data (“givens”), the entities as well as their features. The salient example is provided by the database. Hence, the space of potential solutions is well-defined, although its indeterminacy and its vast size could render it a pseudo-open space.

We start with basic propertization, exactly as in the general case shown above in Fig.1:


From these, feature vectors are set up into a table-like structure for column-wise comparison. This comparison is organized as a “similarity function,” which in turn is an operationalization of the concept of similarity. Note that this structure separates the concepts of “comparison” and “similarity,” the latter being set as a part of the former. It is quite important not to conflate comparison and (the operationalization of) similarity.


Based on the result of applying the similarity function to the table of feature vectors, a particular proposal can be made about the relations between A and B.
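A type-I comparison can thus be written down as a short, finite algorithm in which feature table, similarity function and decision criterion are all given in advance (everything here, names and thresholds included, is a hypothetical sketch):

```python
# Sketch of a type-I comparison: entities, features, similarity
# function and decision threshold are all given in advance, so the
# whole comparison is a finite algorithm. All values are hypothetical.

def overlap_similarity(u, v):
    # Fraction of aligned features with (nearly) equal values.
    matches = sum(1 for x, y in zip(u, v) if abs(x - y) < 0.2)
    return matches / len(u)

def type_i_comparison(feature_table, similarity, threshold):
    a, b = feature_table["A"], feature_table["B"]
    s = similarity(a, b)
    # The "proposal" about the relation between A and B is fully
    # determined by the chosen functional and threshold.
    return "A and B belong together" if s >= threshold else "A and B differ"

table = {"A": [0.9, 0.8, 0.6], "B": [0.8, 0.2, 0.7]}
print(type_i_comparison(table, overlap_similarity, threshold=0.5))
```

The proposal changes if we swap the similarity function or the threshold while leaving the data untouched, which is the whole point made below about the absence of objectivity.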


The diagram clearly shows that the proposal can’t be conceived as something “objective,” which could be extracted from an “outside” reality. Such a perspective is widely believed to be appropriate in the field of so-called “data mining.” Obviously, proposals are heavily dependent on the feature vectors and the similarity functional. Even in predictive modeling, where predictive accuracy can be taken as a corrective or as a defense against a proliferating relativism, the proposal is still dependent on the selected features. As many different selections allow for an almost equal predictive accuracy, there is no objectivity either.

From the simple case we can take the important lesson that there is no such thing as a “direct” comparison, hence also no “built-in” objectivity, not even on those low levels of cognition.


We also can see that the structure of comparison comprises three levels of abstraction. This structure further applies to simple translations and topics like transdisciplinary discourses, i.e. the task of relating domain-specific vocabularies, each of which is supposed to consist of well-defined singular terms.

The features that are ultimately used as input into the similarity function are often called the “model parameters”, or also “dimensions.” In philosophical terms we suggest them to relate closely to Spinoza’s Common Terms.


Type II: “Inverted” goal-directed Comparison

The second type is quite different from type I. Here we do not start with data that are completely defined, including on the level of the properties. Instead, the starting point for processes of type II is determined by the proposal and rather coarse entities, or vaguely described contexts. Hence, it is a kind of inverted comparison.

The diagram for the first step doesn’t look spectacular; yet, in some sense it is quite dramatic in its emptiness.


The second step is based on habits or experience; abstract properties and some similarity function are selected before the comparison.


These are then applied to the large or vaguely given entities. By means of this top-down projection a relation between A and B appears. Only subsequent to establishing that relation can we start with a forward comparison!


The final step then associates the initially vague observations, now constructively related, to the initial proposal.


As we already said, in this process the properties are taken from experience before they are projected onto amorphous observations. In Spinoza’s terms, those properties are “abstract terms.” Essentially, the projection can also be conceived as a construction. In the structure shown above, fiction and optimization are co-dependent and co-extensive. We could also simply call it “invention,” either of a solution or of a problematics, just as you prefer.

The same process of “optimized fiction,” or “fictional optimization,” is often mistaken for “induction from empiric data.” Using the schemes of figure 3 we can easily understand why this claim is a misunderstanding. Actually, such an induction is of course not possible, since there is no necessity in any of the steps.

About the role of experience: The selection of “abstract terms” viz. “suggested properties” is itself based on models, of course. Yet, these models are far outside of the context induced by the observables A, B.

We should note that type-I and type-II comparisons are usually used in a constant interplay, resulting in an open, complex dynamics. This interplay creates a new quality, as Goldstone remarks in [1] as his final conclusion:

When we compare entities, our understanding of the entities changes, and this may turn out to be a far more important consequence of comparison than simply deriving an assessment of similarity.

Goldstone fails, however, to separate similarity and comparison in an appropriate manner. Consequently, he also fails to put categorization into the right place:

Despite the growing body of evidence that similarity comparisons do not always track categorization decisions, there are still some reasons to be sanguine about the continued explanatory relevance of similarity. Categorization itself may not be completely flexible.

Such statements are almost awful in their production of conceptual mess. Our impression is that Goldstone (as many others) does not have the concept at his disposal that we call the transition from probabilistic description to propositional representation.

Type III: Comparison of Populations

The third type, finally, describes the process of comparison on the level of populations. The most striking difference from types I and II concerns the fact that there is no explicitly given proposal which could be assigned to some kind of input data. Instead, the only visible goal is (long-term) stability. We could say that the comparison is an open comparison.

Let us start with the basic structure we already know as results from type-I and type-II.


Now we introduce two significant elements, population and, as a consequence, time. Indeed, these two elements mark a very important step, not the least with regard to philosophical concepts like idealism. For instance, Hegel’s whole system suffers from the utter negligence of population and (individual) time.

By introducing populations we also introduce repetition and signal horizons. Yet, even more important, we dissolve the objecthood that allowed us to denote two entities as A and B, respectively, into a probabilistic representation. In other words, we replace (crisp) symbols by (open, borderless) distributions. In natural evolution, the logical sequence is just the other way round. There, we start with populations as a matter of fact.

Comparing A and B means to compare two populations A(n) and B(m). Instead of objects, singulars or concepts we talk about types, or species. Comparing populations also means to repeat (denoted by “::n”) the comparison between approximate instances a(j) of A(n) and approximate instances b(k) of B(m).

It is quite obvious that in real situations we never compare uniquely identifiable, a priori existing objects that are completely described. We rather always have to deal with populations of them, not least due to the inevitability of modeling, even with regard to simple perception.


Comparing two populations does not result in just one proposal; instead we are faced with a lot of different ones. Even more, the set of achieved proposals can not be expected to be constant in its actuality, since not all proposals arrive at the same time. We could then try to reduce this manifoldness by applying a model, that is, by comparing proposals. Yet, in doing so we are faced with both a certain kind of empirical underdetermination and a conceptual indeterminacy.

It is this indeterminacy that causes a qualitative shift in the whole game. It is no longer the proposals that are the subject of the comparison. The manifold of proposals lifts us to the level of the frequency distribution over all proposals. Since comparing proposals can not refer to other information than that which is present in the population, deciding about proposals turns into an investigation of the influence of the variation within A(n) or B(m), or of the influence of the similarity functional.
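This shift from single proposals to a frequency distribution over proposals can be simulated directly (a toy sketch; the populations and all parameters are invented):

```python
import random

# Sketch of a type-III comparison: instead of two objects we compare
# two populations A(n) and B(m). Repeating the pairwise comparison
# over sampled instances yields a distribution of proposals, not a
# single one. All distributions and numbers are illustrative.
random.seed(42)

def sample_population(mean, spread, n):
    return [random.gauss(mean, spread) for _ in range(n)]

A = sample_population(mean=5.0, spread=1.0, n=200)
B = sample_population(mean=5.5, spread=1.0, n=200)

# Repeat ("::n") the comparison between approximate instances.
proposals = [("A<B" if a < b else "A>=B")
             for a, b in zip(random.sample(A, 100), random.sample(B, 100))]

# The subject of further comparison is no longer any single proposal
# but the frequency distribution over all of them.
freq = {p: proposals.count(p) / len(proposals) for p in set(proposals)}
print(freq)
```

No single run settles the relation between A and B; what remains stable, if anything, is the shape of the distribution.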

The comparison of populations obviously introduces an element of self-referentiality. There are two consequences of this. First, it introduces an undecidability; secondly, comparing populations induces an anisotropy, a symmetry break within them. Compare two populations and you’ll get three. Since this changes the input for the comparison itself, the process either develops the perlocutionary aspect of pragmatic stability as a content-free intention, or the whole game disappears altogether.

This pragmatics of induced stability can be described by using the concept of fitness landscape.


The figure above could be conceived as a particular segment of evolutionary processes. For us, it is so natural that things are connected and related to each other that we can hardly imagine a different picture. Yet, we could invert the saying that evolution is based on competition or competitive selection: any population that does not engage in the evolutionary comparison game will not develop the pragmatics of stability and hence will disappear sooner rather than later.
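A toy simulation may illustrate how repeated comparison within a population induces such stability on a fitness landscape (the landscape and all parameters are invented for illustration):

```python
import random

# A toy fitness landscape: each individual's trait x is mapped to a
# fitness value; repeated pairwise comparison (selection) keeps the
# fitter traits and drives the population toward a peak.
random.seed(1)

def fitness(x):
    return -(x - 2.0) ** 2  # a single peak at x = 2.0

population = [random.uniform(-5, 5) for _ in range(50)]

for _ in range(500):
    i, j = random.sample(range(len(population)), 2)
    # The comparison game: the less fit trait is replaced by a
    # slightly mutated copy of the fitter one.
    winner = i if fitness(population[i]) >= fitness(population[j]) else j
    loser = j if winner == i else i
    population[loser] = population[winner] + random.gauss(0, 0.1)

mean_trait = sum(population) / len(population)
print(round(mean_trait, 2))  # drifts toward the peak at x = 2.0
```

A population that never enters the tournament simply keeps its initial spread; only the comparing population develops the pragmatic stability around the peak.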

Practical Use

The three types of comparison that we distinguished above are abstract structures. In a practical application, e.g. in modeling, some issues have to be considered.

The most important misunderstanding would be to apply those abstract forms as practical means. Doing so, one would not only commit the mistake of equating the local and the global, but also claim a necessity for this equality.

Above we introduced the principle of aligned feature lists that need to be common to both instances that we are going to compare. Note that we are comparing only two (2) of them! From such a proposal one can not conclude that all items out of a set of available observations are necessarily compared using exactly the same list of features in order to arrive at a particular classification. As Wittgenstein put it, “there is no property all games have in common which distinguishes them from all the activities which are not games.” (cited after Putnam [4, p.11]) This, of course, does not exclude a field of overlapping feature lists suitable for deciding whether something is a game or not. Of course, there is no unique result to be expected. The crispness of the result of such a comparison depends on the purpose and its operationalization in modeling, mainly through the choice of the similarity measure.

Yet, such a uniqueness is never to be expected unless we enforce it, e.g. by some sort of formal axiomatics as in mathematics or mathematical statistics, or, not so different from that, by organizational constraints. If our attempt to create a model for a particular classification task requires a lot of features, such uniqueness is excluded even if we were to use the same list of features for creating all observations. On the other hand, this does not mean that we could not find a suitable result.

The Differential and Ortho-Regulation

Finally, we arrive at the differential as the most sophisticated form of comparison. We consider the differential as the major element of abstract thinking. Here, we can discuss neither the roots of the concept nor the elaborated philosophy that Gilles Deleuze developed in his book Difference and Repetition [5]. We would just like to emphasize that much of this work has been influenced by it.

The differential is not a tool to compare given observables in order to derive a proposal about those observables. Instead, when playing the “Differential Game” we are interested in the potentially achievable proposals. Of course, we also start with a population of observations. Those observations are, however, not yet in any pre-configured, or observable, relationship. The diagrams that we will develop in the series below look very different from those we found for the comparisons.

The starting point. Given a set of observations and the respective propertization that we derived according to intellectual habits, our forms of intuition, we may ask which proposals, statements or solutions are possible?


The first step is to replace immutable properties by a more dynamic entity, a procedure. This procedure could be taken as the tool to create a particular partition in the observations {O}, or as a dynamic representation of possible equivalence classes on {O}. We also could call it a model. Note that models always imply also a usage, or purpose.

The interesting thing now is that procedures can be conceived as consisting of rules and their parameters, or, in the language of mathematical category theory, of objects and their transformations. The parameters are much like variables, but from the perspective of any particular partitioning, or, say, instance, the parameters are constants. This scheme has originally been invented, or at least written down, by Lagrange in the late 18th century. Most remarkably, he also observed that this scheme can be cascaded: the parameters on the abstract level can be taken as new, quasi-empirical “observations,” and so on.


The important part of this scheme is indeed the free parameters, which are, we have to remember, also constants. If we now play around with these free parameters, we can construct different partitions of {O}; but this also means that by varying the parameters we can create a proposal beyond such partitioning, or solutions regarding some request (again upon {O}).
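The scheme of “rule plus free parameter,” including Lagrange’s cascade, can be sketched concretely (the thresholding rule and all values are hypothetical):

```python
# Sketch of a procedure as "rule + parameter": a thresholding rule
# partitions the observations {O}; varying the free parameter (a
# constant within any single partitioning) creates different
# partitions and hence different possible proposals.
O = [0.2, 0.4, 0.5, 0.7, 0.9]

def partition(observations, t):
    # The rule is fixed; t is the free parameter of the procedure.
    low = [o for o in observations if o < t]
    high = [o for o in observations if o >= t]
    return low, high

# Playing with the free parameter constructs different partitions of {O}.
for t in (0.3, 0.6, 0.8):
    print(t, partition(O, t))

# The Lagrangian cascade: the parameters themselves can be taken as
# quasi-empirical "observations" for the same rule on the next level up.
parameters = [0.3, 0.6, 0.8]
print(partition(parameters, 0.5))
```

Within any single partitioning t behaves as a constant; only the move to the next level of the cascade turns it back into something observable and variable.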


Of course, what now becomes possible is the simulation game. Which statement we are actually going to construct is again a matter of habits. Let us call these the forms of construction. Astonishingly, this structure has been overlooked completely so far (with one exception); it also went unnoticed by Immanuel Kant and all of his fellows in philosophy.


Given this scheme we would like to emphasize that there is no direct path from observations to statements. Hence, habits and conventions become active at two different positions in the process that allows us to speak about an observation. This is already true for the most simple judgments about {O}, indicated by S(0). Again, this has not been recognized by epistemology to date.


Finally, we apply the Lagrangian insight to our scheme. Forms of intuition as well as forms of construction are, of course, not just constants. They are regulated, too. This results in the induction of a cascade. Since this regulation of mental forms (of intuition, or construction, respectively) does not refer to {O}, but instead to the abstraction itself (mainly the selection of the parameters), it appears as if these secondary mental forms are in a different dimension, not visible within the underlying activity of comparing. Thus we call it ortho-regulation.


It is actually quite surprising that there are whole philosophical schools that deny the (cascaded) vertical dimension of these processes. One of the examples is provided by Jacques Derrida’s work. At many places throughout his writings he comes up with rather weird ideas to preserve the mental flatness. One of them is the infamous “uninterpretable trace” (German: Spur).

A common misunderstanding is committed by many scientists influenced by positivism (and which of them is not?) concerning the alternatives S(k). Determinists claim that the step from abstraction to the solution is unique, or at least determined as a well-defined alternative of a finite set. Doing so, they implicitly deny the necessity for orthoregulation; hence they also deny any form of internal freedom as well as the importance of conventions (see below). This paves the way for the nonsensical conviction that the choice between the S(k) can be computed algorithmically (as a Turing machine computes). The schemes above clearly show that such a conception must be considered seriously deficient.

The following scheme may be taken as an abbreviated form for the phenomenon of (abstract) thinking.


Quite importantly, the differential is isomorphic to the metaphor. Actually, we are convinced that metaphors are not a linguistic phenomenon. Metaphors are the direct consequence of necessary structures in thinking and modeling.

Architectonics of Comparison

Comparisons belong, together with the classifications that are based on them, to the basic elements of cognition and higher-level mental processes. Thus they may be taken as a well-justified starting point for any consideration of the conditions for epistemic processes.

Astonishingly, such considerations are completely absent in science as well as in the humanities (and their mixed forms). The only vaguely related references that can be found—and there are really only very few of them—are from the field of (comparative) linguistics or literary studies. In linguistics it is not the abstract structure of the operation of comparison that is in focus [6]; here comparison is taken as a primitive and then applied to linguistic structures. One of the research paradigms is given by the case of adjectives like smaller, higher etc. What is studied there is the structure of the application of the operation of comparing, not the operation of comparing itself. In comparative literary studies, however, an interesting note can be found. In his inquiry into the writings of Jean Paul, a German Romanticist, Coker [7, p.397] distinguishes different types of comparisons, at least implicitly, and relates a particular one directly to imagination, a result that we can confirm through our formal analysis:

“The imagination is a structure of comparison through which desire can realize its infinite nature, always transcending finite givens.”

Besides this really rare occurrence, however, comparison is always taken in its most simple and reduced form, the comparison along a numerical scale. This type is even more primitive than our simplest type shown in Fig.2. It is true that all of our three types are ultimately based on the primitive type; yet, considering normal thinking, the reduction to the primitive case is inappropriate. Language is full of comparisons and comparative moves, where we call it metaphor. For more than 100 years now linguists have been reasoning about metaphors in largely inappropriate ways, precisely because they impose a reduced concept of comparison.

A reference to a more elaborated and rich concept of comparison is missing completely, in cognitive sciences as well as in computer science. Even the field of metaphorology did not contribute a clear structural view. So we conclude that the problematics of comparison seems to play a role only in improper proverbs about apples and oranges, or apples and pears.

Hence, the two cornerstones of any type of comparison remained undetected: propertization and ortho-regulation. We propose that these are the elements of an architectonics of comparison. The propertization will be discussed in the chapter about modeling, so we can turn to the phenomenon of ortho-regulation.


The concept of ortho-regulation becomes visible only if we take two approaches seriously: rule-following (Wittgenstein) and the differential (Deleuze). The first step is the discovery of the Forms of Construction. In a second step, symmetry considerations lead us to the cascaded view.

The notion of “forms of construction” may appear trivial and well-known. Yet, it is usually applied as a concept used while thinking, in the sense of a particular way to construct something, not as a basic concept constitutive for thinking (e.g. Quine in [8], or Sandbothe in [9]); for example, one speaks of “forms of construction of reality.” In contrast to that, we consider “forms of construction” here as a transcendental principle of thinking.

Orthoregulations are rules that organize rules. Wittgenstein dismissed such an attempt, since he feared an infinite regress. Since then this has remained the received view. Nevertheless, we think that this dismissal has been devised too hastily. There is no threat of an infinite regress, because the rules on the level of orthoregulations are neither based on nor directed towards observations {O} about facts. The subject of ortho-regulative rules is rules. In other words, their empirical basis is not only completely different, but also much smaller and much more difficult to learn. Ortho-regulative rules can not be demonstrated as readily as, say, how to follow an instruction.

The cascade is thus not an infinite one; rather, it presumably stops quite soon. To derive rules Rx about rules from a more basic body of rules Rb, you need a lot of instances or observations of Rb. There are fewer hawks than mice. Proceeding in the chain of rules about rules, there are soon not enough observations available anymore to derive further regularities and rules. We agree with Wittgenstein’s claim that rule-following must come to an end, yet for a different reason. Accordingly, the stopping point where rule-following becomes impossible is not the fear of the philosopher; it is a point deeply buried in our capability to think, precisely because thinking is a bodily, hence empirical activity. Note that “our” here means “human stuffed with a brain.” This point is a very interesting one, which quite unfortunately we can not investigate here. As a last remark on the subject we would like to hint at Leibniz’s idea of the monad and the associated concept of absolute interiority.

Another discussion we can not follow is of course Kant’s notion of the form of intuition. We simply are not fluent enough to develop serious arguments from a Kantian perspective. We find it, however, remarkable that he missed the counterpart of the rising branch of abstraction. In some way, we guess, this could be the reason for his prevailing difficulties with the (ethical) notion of freedom, which Kant considered to be in an antinomic relation to being determined [e.g. 10]. His categorical imperative is a weak argument, since Kant had to introduce it much like an axiom. He was quite desperate about that, as he expressed in his last writing [11]. His argument that the capability to choose one’s own determination reflects or implies freedom is at least incomplete and does not work in our times any more, where we know crazy things about the brain. Our analysis shows that this antinomic contrast is misplaced.

Instead, freedom arises inevitably with thinking itself, through the necessity of applying forms of construction. There is no necessity of any kind to choose a particular statement from all potential alternatives S(k) (see Fig.5e). Note that this choice is indeed actualizing a potential. Furthermore, it is not only a creative act, though not without being bound to rules; it is also an act that can not be completely determined by any kind of subsequent model. Hence, it is actions that introduce virtuality into the world, by virtue of creating statements in a non-predictable way. Saying non-predictable, one should not think that there could be some kind of measurement that would allow us to render this choice or creation predictable. It is non-predictable because it is not even in the space of predictable things.

Freedom is thus not an issue of quantum mechanics, as Kauffman tries so hard to argue [12]. It is also not an issue of human jurisdiction or any other concept of human society. Above all, freedom is nothing that could be created or prepared, as Peter Bieri [13] and other analytic philosophers believe. A reduction to the probabilistic nature of the world would be circular and the wrong level of description. Quite in contrast to those proposals, we think that freedom is a necessary consequence of abstraction in thought. Since any kind of modeling that is not realized as body (think of the adaptive behavior of amoebas or bacteria) implies abstraction, it makes perfect sense even in philosophical terms to say that even animals have a free will. Everyone who lives with a cat knows that. We can also see that freedom is directly related to the intensity in the cognitive domain: animals express a will as long as they are performing abstract modeling. No thoughts, no freedom; no expression of will, no cognitive capacity. Being a machine (whether of silicon or of flesh) means no will and no cognitive capacity.

While forms of intuition can be realized quite easily on a computer as a machine learning algorithm, this is not possible for forms of construction. It is the inherently limited cascade of ortho-regulations on the one side, and the import of conventions through it on the other, creating a double articulation for rule-following, that points towards a transcendental singularity. It is not possible to speak formally or clearly about this singularity, of course. Maybe we could say that this singularity is a zone where the being’s exteriority (conventions) directly interferes with its interiority (the associative power of the body). It feels a bit like a wormhole in space, since we find entities connected that are normally far apart from each other. We could also call it a miracle, no problem with that.

Fortunately enough, there is also a perspective that is closer to the application. More profanely, we could also simply say (in an expression near the surface of the story) that freedom exists because brains form “minds” in a community, where those “minds” need to be able to step onto the Lagrangian path of abstraction. We do not need the concept of will for creating freedom; it is just the other way round.

Consequences for Epistemology

Orthoregulation and the underlying forms of construction are probably among the most important concepts for a proper formulation of epistemology. Without the capability for ortho-regulation we will not find autonomy. A free-ranging machine is not an autonomous being, of course, even if it “develops” “own” “decisions” when it is put into a competitive swarm of similar entities.

The concept of ortho-regulation throws some light onto our path towards machine-based epistemology. Last but not least, it is a strong argument for a growing self-referential system of associative entities that are parts of a human community.

This article was first published 11/11/2011, last revision is from 28/12/2011

  • [1] Robert L. Goldstone, Sam Day, Ji Y. Son, Comparison. in: Britt Glatzeder, Vinod Goel, Albrecht von Müller (eds.), Towards a Theory of Thinking – Building Blocks for a Conceptual Framework. Springer New York. pp.103-122.
  • [2] Manfred Frank, Wege aus dem Deutschen Idealismus.
  • [3] Dekang Lin, An Information-Theoretic Definition of Similarity. In: Proceedings of the 15th International Conference on Machine Learning ICML, 1998, pp. 296-304.
  • [4] Hilary Putnam, Renewing Philosophy. 1992.
  • [5] Gilles Deleuze, Difference and Repetition. Continuum Books, London, New York 1994 [1968].
  • [6] Scott Fults, The Structure of Comparison: An Investigation of Gradable Adjectives. Diss. University of Maryland, 2006.
  • [7] Coker, William N. (2009) “Narratives of Emergence: Jean Paul on the Inner Life,” Eighteenth-Century Fiction: Vol. 21: Iss. 3, Article 5. Available at:
  • [8] W.V.O. Quine, Two Dogmas of Empiricism.
  • [9] Mike Sandbothe (1998), The Transversal Logic of the World Wide Web,11th Annual Computers and Philosophy Conference in Pittsburgh (PA), August 1998; available online
  • [10] Rudolf Eisler, Freiheit des Willens, Wörterbuch der philosophischen Begriffe. 1904. available online.
  • [11] Immanuel Kant, Zum Ewigen Frieden.
  • [12] Stuart A Kauffman (2009), Five Problems in the Philosophy of Mind.,  available online.
  • [13] Peter Bieri, Das Handwerk der Freiheit, Hanser 2001.

