February 28, 2012 § Leave a comment

There are good reasons to think that data appear

as the result of friendly encounters with the world.

Originally, “data” has been conceived as the “given”, or as things that are given, if we follow the etymological traces. That is not quite surprising, since it is closely related to the concept of the date as a point in time. And what, if not time, could be something that is given? The concept of the date is, on the other hand, related to computation, at least if we consider etymology again. Towards the end of the Middle Ages, the problems around the calculation of the next Easter date(s) triggered the first institutionalized recordings of rule-based approaches, which have been called “computation.” Even at that time it was already a subject for specialists…

Yet, the cloud of issues around data also involves things. But “things” are nothing that is invariably given, so to speak as a part of an independent nature. In Nordic languages there is a highly interesting link to constructivism. Things originally denoted some early kind of parliament. The Icelandic “alþingi”, or transposed “Althingi”, is the oldest parliamentary institution in the world still extant, founded in 930. If we take this thread further, it is clear that things refer to entities that have been recognized by the community as a subject for standardization. That’s the job of parliaments or councils. Said standardization comprises the name, rules for recognizing it, and rules for using or applying it, or simply, how to refer to it, e.g. as part of a semiosic process. That is, some kind of legislation, or norming, if not to say normalization. (That’s not a bad thing in itself, unless a society is too eager in doing so; standardization is a highly relevant condition for developing higher complexity, see here.) And, back to the date, we fortunately also know about a quite related usage of the “date”, as in “dating” or making a date, in other words, fixing the (mostly friendly) issues with another person…

The wisdom of language, as Michel Serres once coined it (somewhere in his Hermes series, I suppose), knew everything, it seems. Things are not, because they remain completely beyond any possibility to perceive them if there is no standard to treat the differential signals they provide. This “treatment” we usually call interpretation.

What we can observe here in the etymological career of “data” is nothing else than a certain relativization, a de-centering of the concept away from the absolute centers of nature, or likewise the divine. We observe nothing else than the evolution of a language game into its reflected use.

This now is just another way to abolish ontology and its existential attitude, at least as far as it claims an “independent” existence. In order to become clear about the concept of data, what we can do about it, or even how to use data, we have to arrive at a proper level of abstraction, which in itself is not a difficult thing to understand.

This, however, also means that “data processing” can’t be conceived in the way we conceive, for instance, the milling of grain. Data processing should be taken much more as a “data thinging” than as a data milling, or data mining. There is deep relativity in the concept of data, because it is always an interpretation that creates them. It is nonsense to naturalize them in the infamous equation “information=data+meaning”; we already discussed that in the chapter about information. Yet, this process probably did not reach its full completion, especially not in the discipline of so-called computer “sciences”. Well, every science started as some kind of Hermetism or craftsmanship…

Yet, one still might say that at a given point in time we come upon encoded information, we encounter some written, stored, or somehow else materially represented structured differences. Well, ok, that’s true. However, and that’s a big however: We still can NOT claim that the data is something given.

This raises a question: what are we actually doing when we say that we “process” data? At first sight, and many people think so, processing data produces information. But again, it is not a processing in the sense of milling. This information thing is not the result of some kind of milling. It needs constructive activities and calls for affected involvement.

Obviously, the result or the produce of processing data is more data. Data processing is thus a transformation. Probably it is appropriate to say that “data” is the language game for “transforming the possibility for interpretation into its manifold.” Nobody should wonder about the fact that there are more and more “computers” all the time and everywhere. Besides the fact that the “informationalization” of any context allows for an improved generality as well as for improved accuracy (they excluded each other in the mechanical age), the conceptual role of data itself produces a built-in acceleration.

Let us leave the trivial aspects of digital technology behind, that is, everything that concerns mere re-arrangement and recombination without losing or adding anything. Of course, creating a pivot table may lead to new insights, since we suddenly (and simply) can relate things that we couldn’t without pivoting. Nevertheless, it is mere re-arrangement, even though it is helpful, of course. It is clear that pivoting itself does not produce any insight.

Our interest is in machine-based episteme and its possibility. So, the natural question is: How to organize data and its treatment such that machine-based episteme is possible? Obviously this treatment has to be organized and developed in a completely autonomous manner.

Treating Data

In so-called data mining, which only can be considered as a somewhat childish misnomer, people often report that they spend most of the time in preparing data. Up to 80% of the total project time budget is spent on “preparing data”. Nothing else could render the inappropriate concepts behind data mining more visible than this fact. But one step at a time…

The input data to machine learning are often considered to be extremely diverse. In the first place, we have to distinguish between structured and unstructured data; secondly, we have to deal with unstructured qualities like text or images, or with the different scales of expression.

Table 1: Data in the Quality Domain

structured data: things like tables, or schemes, or data that could be brought into such a form in one way or another; often related to physical measurement devices or organizational issues (or habits)
unstructured data: entities that can’t, in principle, be brought into a structured form before processing them. It is impossible to extract the formal “properties” of a text before interpreting it; those properties we would have to know beforehand in order to set up any kind of table into which we could store our “measurement”. Hence, unstructured data can’t be “measured”. Everything is created and constructed “on-the-fly”, sailing while building the raft, as Deleuze (Foucault?) once put it. Any input needs to be conceived as, and presented to the learning entity in, a probabilized form.

Table 2: Data in the Scale Domain

real-valued scale: numeric, like 1.232; mathematically: real numbers, (ir)rational numbers, etc.; infinitely many different values
ordinal scale: enumerations, orderings, limited to a rather small set of values, typically n<20, such as 1,2,3,4; mathematically: natural numbers, integers
nominal scale: singular textual tokens, such as “a”, “abc”, “word”
binary scale: only two values are used for encoding, such as 1,0, or yes,no, etc.
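As a small illustration of the table above, the scale of a variable can often be guessed heuristically from its raw values. The function name and the cutoffs below (e.g. fewer than 20 distinct integer values counting as ordinal) are assumptions for this sketch, following the table loosely; they are not a standard recipe.

```python
def guess_scale(values):
    """Guess the scale type of a variable from its raw values (heuristic sketch)."""
    distinct = set(values)
    if len(distinct) == 2:
        # two values used for encoding, e.g. 1/0 or yes/no
        return "binary"
    if all(isinstance(v, str) for v in distinct):
        # singular textual tokens
        return "nominal"
    if all(float(v).is_integer() for v in distinct) and len(distinct) < 20:
        # small enumerations such as 1,2,3,4
        return "ordinal"
    return "real-valued"

print(guess_scale([0, 1, 1, 0]))           # binary
print(guess_scale(["a", "abc", "word"]))   # nominal
print(guess_scale([1, 2, 3, 4, 2, 3]))     # ordinal
print(guess_scale([1.232, 0.5, 3.14]))     # real-valued
```

Such a heuristic can of course only propose a scale; as argued throughout this chapter, whether the proposal is appropriate is itself a matter of interpretation.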

Often it is proposed to regard the real-valued scale as the most dense one, hence it is the scale that could be expected to transport the largest amount of information. Despite the fact that this is not always true, it surely allows for a superior way to describe the risk in modeling.

That’s not all, of course. Consider for instance domains like the financial industry. Here, all the data are marked by a highly relevant point of anisotropy regarding the scale: the zero. As soon as something becomes negative, it belongs to a different category, even though it may be quite close to another value if we consider just the numeric value. It is such domain-specific issues that contribute to the large efforts people spend on the preparation of data. It is clear that any domain is structured by, and knows about, a lot of such “singular” points. People then claim that one has to be a specialist in the respective domain in order to be able to prepare the data.

Yet, that’s definitely not true, as we will see.

In order to understand the important point we have to understand a further feature of data in the context of empirical analysis. Remember that in empirical analysis we are primarily looking for a mapping function, which transforms values from measurement into values of a prediction or diagnosis, in short, into the values that describe the outcome. In medicine we may measure physiological data in order to arrive at a diagnosis, and doing so is almost identical to the way measurements are performed in an organization.
Measured data can be described by means of a distribution. A distribution simply describes the relative frequency of certain values. Let us resort to the following two examples. These are simple frequency histograms, where each bin reflects the relative frequency of the values falling into the respective bin.

What is immediately striking is that both are far from analytical distributions like the normal distribution. They are both strongly rugged, far from being smooth. What we can also see: they have more than one peak, even if it is not clear how many peaks there are.

Actually, in data analysis one meets such conditions quite often.

Figure 1a. A frequency distribution showing (at least) two modes.

Figure 1b. A sparsely filled frequency distribution

So, what to do with that?

First, the obvious anisotropy renders any trivial transformation meaningless. Instead, we have to focus precisely on those inhomogeneities. In a process perspective we may reason that the data measured by a single variable actually stem from at least two different processes, or that the process is non-stationary and switches between (at least two) different regimes. In either case, we split the variable into two, applying a criterion that is intrinsic to the data. This transformation is called deciling, and it is probably the third-most important transformation that can be applied to data.
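The splitting step can be sketched in a few lines: find the deepest valley between the two modes of the histogram and use it as the intrinsic criterion for the split. The bimodal toy data, the bin count, and the simple half-way search for the two peaks are all assumptions made for this sketch, not a general peak-detection method.

```python
import numpy as np

# toy bimodal variable, standing in for V0 of Figure 1a
rng = np.random.default_rng(0)
v0 = np.concatenate([rng.normal(2.0, 0.5, 500), rng.normal(7.0, 1.0, 500)])

counts, edges = np.histogram(v0, bins=30)

# locate one peak in each half of the histogram (crude, assumes two regimes)
left_peak, right_peak = sorted([np.argmax(counts[:15]), 15 + np.argmax(counts[15:])])

# the emptiest bin between the two peaks serves as the intrinsic split criterion
valley = left_peak + np.argmin(counts[left_peak:right_peak + 1])
threshold = edges[valley]

v1 = v0[v0 < threshold]    # values attributed to the first regime
v2 = v0[v0 >= threshold]   # values attributed to the second regime
print(len(v1), len(v2), round(float(threshold), 2))
```

Together with the untouched V0 this yields the three variables discussed below; whether the split actually improves the model remains, as stated in the text, undecided until validation.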

Well, let us apply deciling to the data shown in Figure 1a.

Figure 2a,b: Distributions after deciling a variable V0 (as of Figure 1a) into V1 and V2. The improved resolution for the left part is not shown.

The result is three variables, and each of them “expresses” some features. Since we can treat them (and the values comprised) independently, we obviously constructed something. Yet, we did not construct a concept, we just introduced additional potential information. At that stage, we do not know whether this deciling will help to build a better model.

Variable V1 (Figure 2a, left part) can be transformed further, by shifting the values to the right through a log-transformation. A log-transformation increases the differences between small values and decreases the differences between large values, and it does so in a continuous fashion. As a result, the peak of the distribution moves to the right (and it also becomes less prominent). Imagine a large collection of bank accounts, most of them holding amounts between 1’000 and 20’000, while some host 10’000’000. If we map all those values onto the same width, the small amounts can’t be distinguished any more; and we have to do that mapping, called linear normalization, with all our variables in order to make variances comparable. It is mandatory to transform such left-skewed distributions into a new variable in order to access the potential information represented by them. Yet, as always in data analysis, before we have completed the whole modeling cycle down to validation we cannot know whether a particular transformation will have any, or even a positive, effect on the power of our model.

The log transformation has a further quite neat feature: it is defined only for positive values. Thus, if we apply a transformation that creates negative values for some of the observed values and subsequently apply a log-transform, we create missing values. In other words, we disregard some parts of the information that was originally available in the data. So, a log-transform can be used to

  • render items discernible in left-skewed distributions, and to
  • blend out parts of the information dedicatedly by a numeric transformation.

These two possible achievements make the log-transform one of the most frequently applied.
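Both achievements can be made visible in a few lines. The account amounts below are illustrative values in the spirit of the bank example above; note that np.log yields NaN for negative inputs, which here plays the role of a deliberately created missing value.

```python
import numpy as np

# illustrative bank accounts: many small amounts, one very large one
accounts = np.array([1_000.0, 5_000.0, 20_000.0, 10_000_000.0])

# (1) rendering items discernible: after linear normalization to [0..1]
# the small amounts collapse; after a log-transform they stay separated
linear = accounts / accounts.max()
logged = np.log(accounts) / np.log(accounts).max()
print(np.round(linear, 4))   # the first three values are nearly identical
print(np.round(logged, 4))   # the first three values are clearly separated

# (2) blending out information: values pushed below zero by a prior
# transformation become missing values under the log-transform
shifted = accounts - 2_000.0
with np.errstate(invalid="ignore"):
    masked = np.log(shifted)
print(np.isnan(masked))      # the first account is now a missing value
```

Whether one wants effect (2) is a modeling decision, of course; as the text stresses, it is a dedicated way of disregarding information, not an accident.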

The most important transformation in predictive modeling is the construction of new variables by combining a small number (typically 2) of hitherto available ones, either analytically by some arithmetic, or, more generally, by any suitable mapping, including the SOM, from n variables to 1 variable. Yet, this will be discussed at a later point (in another chapter; for an overview see here). The trick is to find the most promising of such combinations of variables, because obviously the number of possible combinations is almost infinitely large.
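The combinatorial search can be sketched as follows. The data are synthetic, with a coupling v0*v1 deliberately planted in the outcome, and the scoring by correlation is merely a placeholder assumption; in the setting of this chapter the score of a candidate transformation would come from a full modeling cycle, e.g. via the SOM.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
# four measured variables, already scaled into (0..1]
data = {f"v{i}": rng.uniform(0.01, 1.0, 200) for i in range(4)}
# outcome with a planted multiplicative coupling of v0 and v1
outcome = data["v0"] * data["v1"] + rng.normal(0.0, 0.05, 200)

# enumerate arithmetic couplings of all pairs of variables
candidates = {}
for a, b in itertools.combinations(data, 2):
    candidates[f"{a}*{b}"] = data[a] * data[b]
    candidates[f"{a}/{b}"] = data[a] / data[b]

# placeholder scoring: absolute correlation with the outcome
scores = {name: abs(np.corrcoef(vals, outcome)[0, 1])
          for name, vals in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Even with only four variables there are twelve candidates here; with realistic variable counts the enumeration explodes, which is precisely why finding the promising combinations is called the trick above.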

Anyway, the transformed data will be subject to an associative mechanism, such as the SOM. Such mechanisms are based on the calculation of similarities and the comparison of similarity values. That is, the associative mechanism does not consider any of the tricky transformations; it just reflects the differences in the profiles (see here for a discussion of that).

Up to this point the conclusion is quite clear. Any kind of data preparation just has to improve the distinguishability of individual bits. Since we anyway do not know anything about the structure of the relationship between the measurement, the prediction and the outcome we try to predict, there is nothing else we could do in advance. In the second place this means that there is no need to import any kind of semantics. Now remember that transforming data is an analytic activity, while the association of things is a constructive activity.

There is a funny effect of this principle of discernibility. Imagine an initial model that comprises two variables v-a and v-b, among some others, for which we have found that the combination a*b provides a better model. In other words, the associative mechanism found a better representation for the mapping of the measurement to the outcome variable. Now first remember that all values for any kind of associative mechanism have to be scaled to the interval [0..1]. Multiplying two sets of such values introduces a salient change if both values are small or if both values are large. So far, so good. The funny thing is that the same degree of discernibility can be achieved by the transformative coupling v-a/v-b, by division. The change is orthogonal to that introduced by the multiplication, but that is not relevant for the comparison of profiles. This simple effect nicely explains a “psychological” phenomenon… actually, it is not psychological but rather an empirical one: one can invert the proposal about a relationship between any two variables without affecting the quality of the prediction. Obviously, it is not so much the transformative function as such that we have to consider as important. Quite likely, it is the form aspect of the data-space warping qua transformation that we should focus on.

All of those transformation efforts exhibit two interesting phenomena. First, we apply them all as a hypothesis, which describes the relation between data, the (more or less) analytic transformation, the associative mechanism, and the power of the model. If we can improve the power of the model by selecting just the suitable transformations, we also know which transformations are responsible for that improvement. In other words, we carried out a data experiment that, and that’s the second point to make here, revealed a structural hypothesis about the system we have measured. Structural hypotheses, however, could qualify as precursors of concepts and ideas. This switching forth and back between the space of hypotheses H and the space of models (or the learning map L, as Poggio et al. [1] call it) is precisely what constitutes such a data experiment.

Thus we end up with the insight that any kind of data preparation can be fully automated, which is quite contrary to the mainstream. For the mere possibility of machine-based episteme it is nevertheless mandatory. Fortunately, it is also achievable.

One (or two) last words on transformations. A transformation is nothing else than a method, and, importantly, vice versa. This means that any method is just: a potential transformation. Secondly, transformations are by far, and I mean really by far, more important than the choice of the associative method. There is almost no (!) literature about transformations, and almost all publications are about the proclaimed features of a “new” method. Such method hell is dispensable. The chosen method just needs to be sufficiently robust, i.e. it should not—preferably: never—introduce a method-specific bias, or, alternatively, it should allow one to control as many of its internal parameters as possible. Thus we chose the SOM. It is the most transparent and general method to associate data into groups for establishing the transition from extensions to intensions.

Besides the choice of the final model, the construction of a suitable set of transformations is certainly one of the main jobs in modeling.

Automating the Preparation of Data

How to automate the preparation of data? Fortunately, this question is relatively easy to answer: by machine-learning.

What we need is just a suitable representation of the problematics. In other words, we have to construct some features that together potentially describe the properties of the data, especially the frequency distribution.

We have had good experience with applying curve fitting to the distribution in order to create a fingerprint that describes the properties of the values represented by a variable. For instance, a 5th-order polynomial, together with a negative exponential and a harmonic fit (trigonometric functions), is essential for such a fingerprint (don’t forget the first derivatives, and the deviation from the models). Further properties are the count and location of empty bins. The resulting vector typically comprises some 30 variables and thus contains enough information for learning the appropriate transformation.
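A stripped-down version of such a fingerprint may look as follows. Only the polynomial part of the recipe above is implemented; the exponential and harmonic fits, the derivatives, and the exact composition of the roughly 30-element vector are left out, so the function and its 9-element output are assumptions for this sketch only.

```python
import numpy as np

def fingerprint(values, bins=30):
    """Collect fit coefficients, residuals, and empty-bin statistics
    of a variable's frequency distribution into one feature vector."""
    counts, _ = np.histogram(values, bins=bins)
    freq = counts / counts.sum()
    x = np.linspace(0.0, 1.0, bins)

    # 5th-order polynomial fit of the frequency profile (6 coefficients)
    poly = np.polynomial.polynomial.polyfit(x, freq, 5)
    resid = freq - np.polynomial.polynomial.polyval(x, poly)

    empty = counts == 0
    return np.concatenate([
        poly,                                      # shape of the distribution
        [np.abs(resid).mean()],                    # deviation from the model
        [empty.sum()],                             # count of empty bins
        [empty.argmax() if empty.any() else -1],   # location of first empty bin
    ])

rng = np.random.default_rng(2)
fp = fingerprint(rng.lognormal(0.0, 1.0, 1000))
print(fp.shape)
```

The full fingerprint would simply extend this vector with the further fits mentioned above; it then serves as the input from which the appropriate transformation is learned.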


We have seen that the preparation of data can be automated. Only very few domain-specific rules need to be defined a priori, such as the anisotropy around zero for the financial domain. Yet, the important point is that they indeed can be defined a priori, outside the modeling process, and fortunately, they are usually quite well known.

The automation of the preparation of data is not an exotic issue. Our brain does it all the time. There is no necessity for an expert data-mining homunculus. Referring to the global scheme of targeted modeling (in the chapter about technical aspects) we now have completed the technical issues for this part. Since we already handled the part of associative storage, “only” two further issues on our track towards machine-based episteme remain: the issue of the emergence of ideas and concepts, and secondly, the glue between all of this.

From a wider perspective we definitely experienced the relativity of data. It is not appropriate to conceive of data as “givens”. Quite in contrast, they should be considered as a subject for experimental re-combination, as a kind of invitation to transform them.

Data should not be conceived as the result of experiments or measurements, as some kind of immutable entities. Such beliefs are directly related to naive realism, to positivism, or to the tradition of logical empiricism. In contrast, data are the subject, or the substrate, of experiments of their own kind.

Once the purpose of modeling is given, the automation of modeling thus is possible. Yet, this “purpose” can at first be quite abstract, and usually it is something that results from social processes. It is a salient and open issue, not only for machine-based episteme, how to create, select or achieve a “purpose.”

Even as it still remains within the primacy of interpretation, it is not clear so far whether targeted modeling can contribute here. We guess not so much, at least not on its own. What we obviously need is a concept for “ideas“.

  • [1] Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee & Partha Niyogi (2004). General conditions for predictivity in learning theory. Nature 428: 419-422 (25 March 2004).


A Pragmatic Start for a Beautiful Pair

February 17, 2012 § Leave a comment

The status of self-referential things is a very particular one.

They can be described only by referring to the concept of the “self.”

Of course, self-referential things are not without conditions, just as any other thing, too. It is, however, not possible to describe self-referential things completely just by means of those conditions, or dependencies. Logically, there is an explanatory gap regarding their inward-directed dependencies. The second peculiarity with self-referential things is that there are some families of configurations for which they become generative.

For strongly singular terms no possible justification exists. Nevertheless, they are there, we even use them, which means that the strong singularity does not imply isolation at all. The question then is about how we can/do achieve such an embedding, and which are the consequences of that.


Despite the fact that there is no entry point that could a priori be taken as a justified or even salient one, we still have to make a choice about which one to take. We suppose that there is indeed such a choice. It is a particular one, though. We do not assume that the first choice is actually directed at an already identified entity, as this would mean that a lot of other choices had already been made in advance. We would have to select methods and atoms to fix, i.e. select and choose the subject of a concrete choice, and so on.

The choice we propose to take is neither directed at an actual entity, nor is it itself an actual entity. We are talking about a virtual choice. Practically, we start with the assumption of choosability.

Actually, Zermelo performed the same move when trying to provide a sound basis for set theory [1] after the idealistic foundation developed by Frege and others had failed so dramatically, leading into the foundational crisis of formal sciences [2]. Zermelo’s move was to introduce choosability as an axiom, called the axiom of choice.

For Zermelo’s set theory the starting point, or, if you prefer, the anchor point, lies completely outside the realm of the concept that is headed for. The same holds for our conceptualization of formalization. This outside is the structure of the pragmatic act of choice itself. This choice is a choice qua factum; it is not important that we choose from a set of identified entities.

The choice itself proposes by its mere performance that it is possible to think of relations and transformations; it is the unitary element of any further formalization. In Wittgenstein’s terms, it is part of the abstract life form. In accordance with Wittgenstein’s critique of Moore’s problems1, we can also say that it is not reasonable, or, more precisely, that it is without any sense, to doubt the act of choosing something, even if we did not think about anything particular. The mere executive aspect of any type of activity is sufficient for any a posteriori reasoning that a choice has been performed.

Notably, the axiom of choice implies the underlying assumption of intensive relatedness between yet undetermined entities. In doing so, this position represents a fundamental opposite to the attitude of Frege, Russell and any modernist in general, who always start with the assumption of the isolated particle. For these reasons we regard the axiom of choice as one of the most interesting items in mathematics!

The choice thus is a Deleuzean double articulation [3], closely related to his concept of the transcendental status of difference; we could also say that the choice has a transcendental dualistic characteristic. On the one hand there is nothing to justify. It is mere movement, or, more abstractly, a pure mapping or transformation, just as a matter of fact. On the other hand, it provides us with the possibility of being enabled to conceive mere movement as such a mapping or transformation; it enables us to think the unit before any identification. Transformation comes first; Deleuze’s philosophy similarly puts difference into the salient transcendental position. To put it still differently, it is the choice, or the selection, that is inevitably linked to actualization. Actualization and choice/selection are co-extensive.

Just another Game

So, let us summarize briefly the achievements. First, we may hold that, similarly to language, there is no justification for formalization. Second, as soon as we use language, we also use symbols. Symbols, on the other hand, take, as we have seen, a double-articulated position between language and form. We characterized formalization as a way to give a complicated thing a symbolic form that lives within a system of other forms. We can’t conceive of forms without symbols. Language hence always implies, to some degree, formalization. It is only a matter of intensity, or likewise a matter of formalizing the formalization, to proceed from language to mathematics. Third, both language and formalization belong to a particular class of terms that we characterized as strongly singular terms. These terms may well be put together with an abstract version of Kant’s organon.

From those three points follows that concepts that are denoted by strongly singular terms, such as formalization, creativity, or “I”, have to be conceived, as we do with language, as particular types of games.

In short, all these games are being embedded in the life form of or as a particular (sub-)culture. As such, they are not themselves language games in the original sense as proposed by Wittgenstein.

These games are different from the language game, of course, mainly because the underlying mechanisms as well as the embedding landscape of purposes are different. These differences become clearly visible if we try to map those games into the choreostemic space. There, they will appear as different choreostemic styles. Despite the differences, we guess that the main properties of the language game also apply to the formalization game. This concerns the setup, the performance of such games, their role, their evaluation, etc., even though the effective mechanisms might be slightly different; for instance, Brandom’s principle of “making it explicit”, which serves well in the case of language, is almost for sure differently parameterized for the formalization or the creativity game. Of course, this guess has to be the subject of more detailed investigations.

As there are different natural languages that all share the same basement of enabling or hosting the possibility of language games, we could infer—based on the shared membership in the family of strongly singular terms—that there are different forms of formalization. And of course, everybody knows at least two such different forms of formalization: music and mathematics. Yet, once we have found the glasses that allow us to see the multitude of games, we easily find others. Take for instance the notations in contemporary choreography that have been developed throughout the 20th century. Or the various formalizations that human cultures impose onto themselves as traditions.

Taken together it is quite obvious that language games are not a singularity. There are other contexts like formalization, modeling or the “I-reflexivity” that exist for the same reason and are similarly structured, although their dynamics may be strikingly different. In order to characterize any possible such game we could abstract from the individual species by proceeding to the -ability. Cultures then could be described precisely as the languagability of their members.


Based on the concept of strongly singular terms we first showed that we have to conceive of formalization (and symbol-based creativity) in a similar way as we do of language. Both are embedded in a life form (in the Wittgensteinian sense). Thus it makes sense to propose to transfer the structure of the “game” from the domain of natural language to other areas that are arranged around strongly singular terms, such as formalization or creativity in the symbolic domain. As a nice side effect this brought us to a proper generalization of the Wittgensteinian language games.

Yet, there is still more about creativity that we have to clarify before we can relate it to other “games” like formalization and prove the “beauty” of this particular combination. For instance, we have to become clear about the differences between systemic creativity, which can be observed in quasi-material arrangements (m-creativity), e.g. as self-organization, and the creativity that is at home in the realm of the symbolic (s-creativity).

The next step is thus to investigate the issue of expressibility.

Part 2: Formalization and Creativity as Strongly Singular Terms

Part 4: forthcoming: Elementarization and Expressibility


1. In an objection to Wittgenstein, Moore raised the skeptic question about the status of certain doubts: Can I doubt that this hand belongs to me? Wittgenstein denied the reasonability of such kinds of questions.

  • [1] Zermelo, Set theory
  • [2] Hahn, Grundlagenkrise
  • [3] Deleuze & Guattari, Mille Plateaux


Formalization and Creativity as Strongly Singular Terms

February 16, 2012 § Leave a comment

Formalization is based on the use of symbols.

In the last chapter we characterized formalization as a way to give a complicated thing a symbolic form that lives within a system of other forms.

Here, we will first discuss a special property of the concepts of formalization and creativity, one that they share for instance with language. We call this property strong singularity. Then, we will sketch some consequences of this state.

What does “Strongly Singular” mean?

Before I discuss (briefly) the adjacent concept of “singular terms”, I would like to add a note on the newly introduced term “strong singularity”.

The ordinary Case

Let us take ordinary language, even if this may be a difficult thing to theorize about. At least everybody is able to use it. We can do a lot of things with language; the common thing about them is, however, that we use it in social situations, mostly in order to elicit two “effects”: first, we trigger some interpretation or even inference in our social companion; secondly, we indicate that we did just that. As a result, a common understanding emerges, formally taken, a homeomorphism, which in turn may then serve as the basis for the assignment of so-called “propositional content”. Only then can we “talk about” something, that is, only then are we able to assign a reference to something that is external to the exchanged speech.

As said, this is the usual working of language. For instance, by saying “Right now I am hearing my neighbor exercising piano.” I can refer to common experience, or at least to a construction you would call an imagination (it is anyway always a construction). This way I refer to an external subject and its relations, a fact. We can build sentences about it, about which we even could say whether they correspond to reality or not. But, of course, this already would be a further interpretation. There is no direct access to the “external world”.

In this way we can gain (fallaciously) the impression that we can refer to external objects by means of language. Yet, this is a fallacy, based on an illegitimate shortcut, as we have seen. Nevertheless, for most parts of our language(s) it is possible to refer to external or externalized objects by exchanging the mutual inferential / interpretational assignments as described above. I can say "music" and it is pretty clear what I mean by that, even if the status of the mere utterance of a single word is somewhat deficient: it is not determined whether I intended to refer to music in general, e.g. as the totality of all pieces or as a cultural phenomenon, or to a particular piece, to a possibility of its instantiation, or to the factual instance right now. Notwithstanding this divergent variety, it is possible to trigger interpretations and to start a talk between people about music, while we neither have to play nor to listen to music at that moment.

The same holds for structural terms that regulate interpretation predominantly by their "structural" value. It is not that important for us here whether the externalization is directed to objects or to the speech itself. There is an external, even a physical, justification for starting to engage in the language game about such entities.

Something different…

Now, this externalization is not possible for some terms. The most obvious is "language". We can neither talk about language without language, nor can we even think "language" or have the "feeling" of language without practicing it. We also can't investigate language without using or practicing it. Any "measurement" of language inevitably uses language itself as the means of measurement, and this includes any interpretation of speech in language as well. This self-referentiality further leads to interesting phenomena, such as "n-isms" like the dualism in quantum physics, where we also find a conflation of scales. If we failed to take this self-referentiality into consideration, we would inevitably create faults or pseudo-paradoxes.

The important issue about this is that there is no justification of language which could be expressed outside of language; hence there is no (foundational) justification for it at all. We find a quite unique setting, which corrodes any attempt at a "closed", i.e. formal, analysis of language.

The extension of the concept “language” is at the same time an instance of it.

It is absolutely not surprising that the attempt at a fully mechanic, i.e. a priori determined or algorithmic, analysis of language must fail. Wittgenstein thus arrived at the conclusion that language is ultimately embedded as a practice in the life form [1] (we would prefer the term "performance" instead). He maintained that justifications (of language games as rule-following) have to come to an end1; for him it was fallacious to think that a complete justification, or ultimate foundation, would be possible.

Just to emphasize it again: the particular uniqueness of terms like language is that they cannot be justified outside of themselves. Analytically, they start with a structural singularity. Hence the term "strong singularity", which differs significantly from the concept of the so-called "singular term" as it is widely known. We will discuss the latter below.

The term “strong singularity” indicates the absence of any possibility for an external justification.

In §329 of the Philosophical Investigations, Wittgenstein notes:

When I think in language, there aren't "meanings" going through my mind in addition to the verbal expressions: the language is itself the vehicle of thought.

It is quite interesting to see that symbols do not possess this particular property of strong singularity. Although they are a structural part of language, they do not share it. Hence we may conceive of this as a remarkable instance of a Deleuzean double articulation [2] in the midst of thinking itself. There would be a lot to say about it, but it would not fit here.

Further Instances

Language now shares the property of strong singularity with formalization. We can neither have the idea nor the feeling of formalization without formalization, and we cannot even perform formalization without prior higher-order formalization. There is no justification of formalization that could be expressed outside of formalization; hence there is no (foundational) justification for it at all. The parallel is obvious: Would it then be necessary, for instance, to conclude that formalization is embedded in the life form in much the same way as language? That mere performance precedes logics? Precisely this could be concluded from the whole of Wittgenstein's philosophical theory, as Colin Johnston suggested [3].

Performative activity precedes any possibility of applying logics in the social world; formulated the other way round, we can say that transcendental logics gets instantiated into an applicable quasi-logics. Against this background, the idea of truth functions determining a "pure" or ideal truth value is rendered an importunate misunderstanding. Yet, formalization and language are not only similar with regard to this self-referentiality, they are also strictly different. Nevertheless, so goes the hypothesis we try to strengthen here, formalization resembles language in that we cannot have the slightest thought or even any mental operation without formalization. It is even the other way round: any mental operation invokes a formalizing step.

Formalization and language are not the only entities that exhibit self-referentiality and that cannot be defined from any kind of outside stance. Theory, model and metaphor belong to the family, too, not to forget thinking, hence creativity, at large. A peculiar representative of these terms is the "I". Close relatives, though not as critical as the former ones, are concepts like causality or information. All these terms are not only self-referential, they are also cross-referential: discussing any of them automatically involves the others. Many instances of deep confusion derive from the attempt to treat them separately, across many domains from the neurosciences, sociology, computer science and mathematics up to philosophy. Since digital technologies are deeply based on formalization and have developed into a significant deep structure of our contemporary life form, any area where software technology is pervasively used is endangered by the same misunderstandings. One of these areas is architecture and city planning, or more generally, any discipline where language or the social is involved as a target of the investigation.

There is a last point to note about self-referentiality. It may well lead to a situation that we have described as "complexity". From this perspective, self-referentiality is a basic condition for the potential of novelty. It is thus interesting to see that this potential is directly and natively implanted into some concepts.

Singular Terms

Now we will briefly discuss the concept of the "singular term" as it is usually referred to. Yet, there is no full agreement about the issue of singular terms, in my opinion mainly due to methodological issues. Many proponents of analytical philosophy simply "forget that they are speaking", in the sense mentioned above.

The analytical perspective

Anyway, according to the received view, names are singular terms. It is said that the referents of singular terms are singular things or objects, even if they are immaterial, like the unicorn. Yet, the complete list of kinds of singular terms looks like this:

  • proper names ("Barack Obama");
  • labeling designations ("U.S. President");
  • indexical expressions ("here", "this dog").

Such singular terms are distinguished from so-called general terms. Following Tugendhat [4], who refers in turn to Strawson [5], the significance of a general term F consists of the conditions to be fulfilled such that F matches one or several objects. In other words, the significance of a singular term is given by a rule for identification, while the significance of a general term is given by a rule for classification. As a consequence, singular terms require knowledge about general terms.
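To make this distinction tangible, here is a minimal sketch, our own toy illustration and not anything found in Tugendhat or Strawson: a general term is modeled as a classification rule, i.e. a predicate over a domain of objects, while a singular term is modeled as an identification rule that succeeds only if it singles out exactly one object. The domain and the example predicates are entirely hypothetical.

```python
# Toy model: general term = rule for classification, singular term = rule
# for identification (which presupposes classification, mirroring the claim
# that singular terms require knowledge about general terms).

def classify(domain, predicate):
    """General term: return all objects matching the classification rule."""
    return [obj for obj in domain if predicate(obj)]

def identify(domain, predicate):
    """Singular term: return the unique object picked out by the
    identification rule, or None if no unique object is singled out."""
    matches = classify(domain, predicate)
    return matches[0] if len(matches) == 1 else None

# Hypothetical mini-domain of objects, described as plain dicts.
domain = [
    {"name": "Barack Obama", "role": "U.S. President", "kind": "person"},
    {"name": "Angela Merkel", "role": "Chancellor", "kind": "person"},
    {"name": "Rex", "role": None, "kind": "dog"},
]

# The general term "person" classifies several objects ...
persons = classify(domain, lambda o: o["kind"] == "person")

# ... while the labeling designation "U.S. President" identifies one.
president = identify(domain, lambda o: o["role"] == "U.S. President")
```

Note that `identify` fails (returns `None`) when its rule matches several objects, which is exactly the situation of a "singular" name like "John Smith" borne by many persons, discussed below.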

Such statements are typical for analytical philosophy.

There are serious problems with this view. Even the labeling itself is misleading: it is definitely NOT the term that is singular. Singular is at most a particular contextual event, which we decided to address by a name. Labelings and indexical expressions are not necessarily "singular," and quite frequently the same holds for names. Think about "John Smith" first as a name, then as a person… This mistake is quite frequent in analytic philosophy. We can trace it even to the philosophy of mathematics [6], when it comes to certain claims of set theory about infinity.

The relevance for the possibility of machine-based episteme

There can be little doubt, as we have already expressed elsewhere, that human cognition can't be separated from language. Even the use of the most primitive tools, let alone their production and distribution, requires the capability for at least a precursor of language, some first steps into languagability.

We know by experience that, in our mother tongue, we can understand sentences that we have never heard before. Hence, understanding of language (quite likely, like any understanding) is bottom-up, not top-down, at least at the beginning of the respective processes. Thus we have to ask about the sub-sentential components of a sentence.

Such components are singular terms. Imagine some perfectly working structure that comprises the capability for arbitrary classification as well as the capability for non-empirical analogical thinking based on dynamic symmetries. The machine would not only be able to perform the transition from extensions to intensions, it would even be able to abstract the intension into a system of meta-algebraic symmetry relations. Such a system, or better, its programmer, would then be faced with the problem of naming and labeling. Somehow the intensions have to be made addressable. A private index does not help, since such an index would be without any value for communication purposes.

The question is how to make the machine refer to proper names. We will see elsewhere (forthcoming: "Waves, Words, and Images"), that this question will lead us to the necessity of multi-modality in processing linguistic input, e.g. language and images together in the same structure (which is just another reason to rely on self-organizing maps and our methodology of modeling).

Refutation of the analytical view

The analytical position about singular terms does not provide any help or insight into the particular differential quality of terms as words that denote a concept.2 Analytical statements like those cited above are inconsistent, if not self-contradictory. The reason is simple: words as placeholders for concepts cannot have a particular meaning attached to them by principle. The meaning, even that of subsentential components, is a matter of interpretation, and the meaning of a sentence is given not only by its own totality; it also depends on the embedding of the sentence itself into the story or the social context where it is performed.

Since "analytic" singular terms require knowledge about general terms, and the general terms are only determined once the sentence is understood, it is impossible to identify or classify single terms, whether singular or general, before the propositional content of the sentence is clear to the participants. That propositional content, however, is, as Robert Brandom convincingly argues in chapter 6 of his [7], only accessible through its role in the inferential relations between the participants of the talk as well as the relations between sentences. Thus we can easily see that the analytical concept of singular terms is empty, if not self-nullifying.

The required understanding of the sentence is missing in the analytical perspective; the object is dominant over the sentence, which runs against any real-life experience. Hence, we'd also say that the primacy of interpretation is not fully respected. What we'd need instead is a kind of bootstrapping procedure that works within a social situation of exchanged speech.

Robert Brandom moves this bootstrapping into the social situation itself, which starts with a medial symmetry between language and socialness. There is, coarsely spoken, a rather fixed choreography to accomplish that. First, the participants have to be able to maintain what Brandom calls a deontic account. The sequence starts with a claim, which includes the assignment of a particular role. This role must be accepted and returned, which is established by signalling that the inference / interpretation will be done. Both the role and the acceptance are dependent on the claim, on the deontic status of the participants and on the intended meaning. (Here I have summarized about 500 pages of Brandom's book, but, as said, it is a very coarse summary!)

Brandom (chp. 6) investigates the issue of singular terms. For him, the analytical perspective is not acceptable, since for him, as is the case for us, there is the primacy of interpretation.

Brandom refutes the claim of analytical philosophy that singular names designate single objects. Instead he strives to determine the necessity and the characteristics of singular terms by a scheme that distinguishes particular structural ("syntactical") and semantic conditions. These conditions further diverge between the two classes of possible subsentential structures, the singular terms (ST) and predicates (P). While syntactically ST take the role of substitution-of/substitution-by and P take the structural role of providing a frame for such substitutions, in the semantic perspective ST are characterised exclusively by so-called symmetric substitution-inferential commitments (SIC), whereas P also take asymmetric SIC. Those inferential commitments link the deontic, i.e. ultimately the socialness of linguistic exchange, to the linguistic domain of the social exchange. We hence may also characterize the whole situation as described by Brandom as a cross-medial setting, where the socialness and the linguistic domain mutually provide each other a medial embedding.

Interestingly, this simultaneous cross-mediality also represents a "region", or a context, where the materiality (of the participants) and the immateriality (of information qua interpretation) overlap. We find, so to speak, an event-like situation just before the symmetry-break that we may identify as meaning. In some respect, Brandom's scheme provides us the pragmatic details of a Peircean sign situation.

The Peirce-Brandom Test

This has been a very coarse sketch of one aspect of Brandom's approach. Yet, we have seen that language understanding cannot be understood if we neglect the described cross-mediality. We therefore propose to replace the so-called Turing test by a procedure that we call the Peirce-Brandom Test. That test would prove the capability to take part in semiosis, and the choreography of the interaction scheme would guarantee that references and inferences are indeed performed. In contrast to the Turing test, the Peirce-Brandom Test can't be "faked", e.g. by a "Chinese Room" (Searle [8]). Besides, to find out whether the interaction partner is a "machine" or a human we should not ask them anything, since the question as a grammatical form of social interaction corroborates the complexity of the situation. We just should talk to it/her/him. The Searlean homunculus inside the Chinese room would not be able to look up anything anymore. He would have to be able to think in Chinese and as Chinese, q.e.d.

Strongly Singular Terms and the Issue of Virtuality

The result of Brandom's analysis is that the label of singular terms is somewhat dispensable. These terms may be taken as if they point to a singular object, but there is no necessity for that, since their meaning is not attached to the reference to the object, but to their role in performing the discourse.

Strongly singular terms are strikingly different from those ("weakly") singular terms. Since they found themselves while being practiced, through their self-referential structure, it is not possible to find any "incoming" dependencies. They are seemingly isolated on their passive side; there are only outgoing dependencies towards other terms, i.e. other terms are dependent on them. Hence we could also call them "(purely) active terms".
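The dependency structure just described can be sketched as a toy model, our own construction for illustration only: justification dependencies form a directed graph, and strongly singular terms appear as nodes whose only incoming justification is a self-loop, while they still justify other terms. The edge list below is hypothetical.

```python
# Toy model: an edge (A, B) means "B depends on A for its justification".
# A "(purely) active" term has no incoming justification from any *other*
# term -- at most a self-referential loop -- yet has outgoing dependencies.

def purely_active_terms(edges):
    """Return terms whose only incoming edge is a self-loop and which
    nevertheless justify at least one other term."""
    nodes = {n for edge in edges for n in edge}
    result = []
    for n in nodes:
        incoming = {src for (src, dst) in edges if dst == n}
        outgoing_to_others = any(src == n and dst != n for (src, dst) in edges)
        if incoming <= {n} and outgoing_to_others:
            result.append(n)
    return sorted(result)

# Hypothetical graph: "language" and "formalization" found themselves and
# ground other terms; "music" and "theory" are justified externally.
edges = [
    ("language", "language"),            # self-referential foundation
    ("language", "music"),               # talk about music uses language
    ("formalization", "formalization"),  # self-referential foundation
    ("formalization", "theory"),
    ("language", "theory"),
]
```

Running `purely_active_terms(edges)` on this toy graph yields exactly the two self-founding terms, while "theory" and "music", having incoming edges from other terms, are excluded.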

What we can experience here in a quite immediate manner is pure potentiality, or virtuality (in the Deleuzean sense). Language imports potentiality into material arrangements, which is something that programming languages or any other finite state automaton can't accomplish. That's the reason why we consistently deny that it is reasonable to talk about states when it comes to the brain or the mind.

Now, at this point it is perfectly clear why language can be conceived as ongoing creativity. Without ongoing creativity, the continuous actualization of the virtual, there wouldn’t be anything that would take place, there would not “be” language. For this reason, the term creativity belongs to the small group of strongly singular terms.


In this series of essays about the relation between formalization and creativity we have achieved an important methodological milestone. We have found a consistent structural basis for the terms language, formalization and creativity. The common denominator for all of those is self-referentiality. On the one hand this becomes manifest in the phenomenon of strong singularity, on the other hand this implies an immanent virtuality for certain terms. These terms (language, formalization, model, theory) may well be taken as the “hot spots” not only of the creative power of language, but also of thinking at large.

The aspect of immanent virtuality implicates a highly significant methodological move concerning the starting point for any reasoning about strongly singular terms. Yet, this we will check out in the next chapter.

Part 1: The Formal and the Creative, Introduction

Part 3: A Pragmatic Start for a Beautiful Pair


1. Wittgenstein repeatedly expressed this from different perspectives. In the Philosophical Investigations [1], PI §219, he states: "When I obey the rule, I do not choose. I obey the rule blindly." In other words, there is usually no reason to give, although one always can think of some reasons. Yet, it is also true that (PI §10) "Rules cannot be made for every possible contingency, but then that isn't their point anyway." This leads us to §217: "If I have exhausted the justifications I have reached bedrock, and my spade is turned. Then I am inclined to say: 'This is simply what I do'." Rules are never intended to remove all possible doubt, thus PI §485: "Justification by experience comes to an end. If it did not it would not be justification." Later, Quine accordingly proved from a different perspective what today is known as the indeterminacy of empirical reason ("Word and Object").

2. There are, of course, other interesting positions, e.g. the one elaborated by Wilfrid Sellars [9], who distinguished different kinds of singular terms: abstract singular terms ("triangularity") and distributive singular terms ("the red"), in addition to standard singular terms. Yet, the problem from which the analytical position suffers also hits the position of Sellars.

References
  • [1] Ludwig Wittgenstein, Philosophical Investigations.
  • [2] Gilles Deleuze, Felix Guattari, Mille Plateaux.
  • [3] Colin Johnston (2009). Tractarian objects and logical categories. Synthese 167: 145-161.
  • [4] Ernst Tugendhat, Traditional and Analytical Philosophy. 1976
  • [5] Strawson 1974
  • [6] Rodych, Victor, “Wittgenstein’s Philosophy of Mathematics”, The Stanford Encyclopedia of Philosophy (Summer 2011 Edition), Edward N. Zalta (ed.),
  • [7] Robert Brandom, Making it Explicit. 1994
  • [8] John Searle (1980). Minds, Brains and Programs. Behav Brain Sci 3 (3), 417–424.
  • [9] Wilfrid Sellars, Science and Metaphysics. Variations on Kantian Themes, Ridgview Publishing Company, Atascadero, California [1967] 1992.


The Formal and the Creative

February 15, 2012 § Leave a comment

If there is such a category as the antipodic at all,

it certainly applies to the pair of the formal and the creative, at least as long as we consult the field of propositions1 that is labeled as the “Western Culture.” As a consequence, in many cultures, and even among mathematicians, these qualities tend to be conceived as completely separated.

We think that this assessment is based on a serious myopia, one that is quite common throughout rationalism, especially when it comes as a flavor of idealism. In a small series of essays (it is too much material for a single one) we will investigate the relation between these qualities, or concepts, of the formal and the creative. Today, we will just briefly introduce some markers.

The Basic Context

The relevance of this endeavor is pretty obvious. On the one hand we have the part of creativity. If machine-based episteme implies the necessity to create new models, new hypotheses and new theories, we have to get clear about the necessary mechanisms and the sufficient conditions for their "appearance." In other chapters we already mentioned complexity and evolutionary processes as the primary, if not the only, candidates for such mechanisms. These domains are related to the transition from the material to the immaterial, and surely, as such they are indispensable for any complete theory about creativity. Yet, we also have to take into consideration the space of the symbolic, i.e. of the immaterial, of information and knowledge, which we can't find in the domains of complexity and evolution, at least not without distorting them too much. There is a significant aspect of creativity that is situated completely in the realm of the symbolic (in which we propose to include diagrams as well). In other words, there is an aspect of creativity that is related to language, to story-telling, understood as weaving (combining) a web of symbols and concepts, which is often associative in its own respect, whether in literature, mathematics, reading and writing, or regarding the DNA.

On the other hand, we have the quality of the formal, or, when labelled as a domain of activity, formalization. The domain of the formal is fully within the realm of the symbolic. And of course, the formal is frequently conceived as the cornerstone, if not the essence, of mathematics. Before the beginning of the 20th century, or around its onset, the formal was almost a synonym for mathematics. At that time, the general movement towards more and more abstract structures in mathematics, i.e. things like group theory or number theory, led to the enterprise of searching for the foundations of mathematics, often epitomized as the Hilbert Program. As a consequence, a kind of "war" broke out between two parties, the intuitionists and the formalists, and the famous foundational crisis started, which lasts until today. Gödel then proved that even in mathematics we cannot know perfectly. Nevertheless, for most people mathematics is seen as the domain where reason and rationalism are most developed. Yet, although mathematicians are indeed ingenious (as are many other people), mathematics itself is conceived as safe, that is, static and non-creative. Mathematics is about symbols under analytic closure. Ideally, there are no "white territories" in mathematics, at least for the members of the formalist party.

Finally, the (mostly digital) machines pose a particular problem. The question is whether a deterministic machine, i.e. a machine for which a complete analytic description can exist, is able to develop creativity.

This question has been raised many times in the history of philosophy and thinking, albeit in different forms. Leibniz imagined a mathesis universalis as well as a characteristica universalis. In the 20th century, Carnap tried to prove the possibility of a formal language that could serve as the ideal language for science [1]. Both failed, Carnap much more disastrously than Leibniz. Leibniz also thought about the transition from the realm of the mechanic to the realm of human thought, by means of his ars combinatoria, which he imagined could create any possible thought. We definitely will return to Leibniz and his ideas later.

A (summarizing) Glimpse

How will we proceed, and what will we find?

First we will introduce and discuss some of the methodological pillars for our reasoning about the (almost "dialectic") relation between creativity and formalization; among these, the most important ones are the following:

  • the status of "elements" for theorizing;
  • the concept of dimensions and space;
  • relations;
  • the domain of comparison;
  • symmetries as a tool;
  • virtuality.

Secondly, we will ask about the structure of the terms "formal" and "creative" while they are in use; especially, however, we are interested in their foundational status. We will find that both formalization and creativity belong to a very particular class of language games. Notably, these terms turn out to be singular terms that are at the same time not names. They are singular because their foundation as well as the possibility to experience them is self-referential. (Ceterum censeo: a result that would not be possible if we stuck to the ontological style of asking "What is creativity…")

The experience of the concepts associated with them can't be externalized. We cannot talk about language without language, nor can we think "language" without practicing it. Thus, they also can't be justified by external references, which is a quite remarkable property.

In the end we hopefully will have made clear that creativity in the symbolic space is not achievable without formalization. They are even co-generative.

Introductory Remarks

Let us start with creativity. Creativity has always been considered something fateful. Until the beginnings of psychology as a science with William James, smart people were smart by the working of fate, or of some gods. Famous, and for centuries unchallenged, is the passage in Plato's Theaitetos [2], where Sokrates explains his role in maieutics by mentioning that the creation of novel things is a task of the gods. The genius as well as the concept of intuition could be regarded as rather close relatives of that. Only since the 1950s, probably not by pure chance, have people started to recognize creativity as a subject in its own right [3]. Yet, somehow it is not really satisfying to explain creativity by calling it "divergent" or "lateral" thinking [4]. Nothing is explained by replacing one term with another. Nowadays, and mostly in the domain of design research, conditions for creativity are often understood in terms of collaboration. People even resort to the infamous swarm intelligence, which is close to a declaration of bankruptcy.

Any of these approaches just replaces some terms with other terms, trying to conjure some improvement in understanding. Most of the "explanations" indeed look rather like rain dancing than valuable analysis. Recently, a large philosophical congress in Berlin, with more than 1200 registered participants, and two books comprising around 2000 pages focused on the subject largely in the same vein and without much result [5]. We are definitely neither interested in any kind of metaphysical base-jumping, referring directly or indirectly to intuition and the accompanying angels in the background, nor in phenomenological, sociological or superficial psychological approaches trying to get support from some funny anecdotes.

The question really is what we are talking about, and how, when referring to the concept of creativity. Only because this question is neither posed nor answered do we find so much esoterics around this topic. Creativity surely exceeds problem solving, although sometimes it occurs right when solving a problem. It may be observed in calm story-telling, in cataclysmic performances of artists, or in language.

Actually, our impression is that creativity is not something that sometimes "is there" and sometimes not. In language it is present all the time, much as is the case for analogical thinking. The question is which of those phenomena we call "creative"; coarsely spoken, which degree of intensity regarding novelty, and the usefulness of that novelty, we allow to be assigned a particular saliency. Somehow, constraints seem to play an important role, as well as the capability to release or apply them at will. Then, however, creativity must be a technique, or at least based on tools which we could learn to use. It is, however, pretty clear that we have to distinguish between the assignment of the saliency ("this or that person has been creative") and the phenomenon and its underlying mechanisms. The assignment of the term introduces a discreteness that is not present on the level of the mechanism; hence we will never understand what we are talking about if we take just the parlance as the source and the measure.

The phenomenon of language provides a nice bridge to the realm of the formal. Today, probably mainly due to the influence of computer science, natural languages are distinguished from artificial languages, which often are also called formal languages. It is widely accepted that formalization either is based on formal languages or that the former creates instances of the latter. The concept of formal language is important in mathematics, computer science and science at large. Instantiated as programming languages, formal languages are of enormous relevance for human society; one could even say that these languages themselves establish some kind of medium.

Yet, the labeling of the discerned phenomena as "natural" and "formal" has always struck me. It is remarkable that human languages are so often called "natural" languages. Somehow, human language appears so outstanding to humans that they call their language, in a funnily paradoxical move, a "natural" thing, as if this language-thing had originated outside human culture. Today, as we know about many instances of cultural phenomena in animals, the strong dichotomy between culture and nature has blurred considerably. A particular driver of this is the spreading insight that we as humans are also animals: our bodies contain a brain. Thus, we and our culture also build upon this amazing morphological structure, continuously so. We as humans are just the embodiment of the dichotomy between nature and culture, and nothing could express the confusion about this issue better than the notion of "natural language." A bit shamefacedly we call the expressions of whales and dolphins "singing", although we know that they communicate rather complicated matters. We are just unable to understand any of it, the main reason presumably being that we do not share anything regarding their Lebensform, and references other than the Lebensform are not relevant for languages.

Language, whether natural or formal, is supposed to be used to express things. Already here we arrive in deep trouble, as the previous sentence is anything but innocent. First, speaking about things is not a trivial affair. A thing is a difficult thing. Taking etymology into consideration, we see that things are the results of negotiations. As a "result," in turn, "things" are reductions, albeit in the realm of the abstract. The next difficulty is invoked by the idea that we can "express" things in a formal language. There has been a large debate about the expressive capabilities of formal languages, mainly induced by Carnap [1], and carried further by Quine [6], Sneed [7], Stegmüller [8], Spohn [9], and Moulines [10], among others, up to today.

In our opinion, the claim of the expressibility of formal language, and hence the proposed usage of formal languages as a way to express scientific models and theories, is based on probably more than just one deep and drastic misunderstanding. We will elucidate this throughout this series; other arguments have been devised, for instance, by Putnam in his small treatise about the “meaning of meaning” [11], where he famously argued that “analyticity is an inexplicable noise” without any possibility for a meaningful usage. That is also a first hint that analyticity is not about the same thing(s) as formalization.

Robert Brandom puts the act of expressing within social contexts into the center of his philosophy [12], constructing a well-differentiated perspective upon the relation between the principles in the usage of language and its structure. Following Brandom, we could say that formal language cannot be expressive, almost by its own definition: the mutual social act of requesting an interpretation is missing there, as well as any propositional content. If there is no propositional content, nothing can be expressed. Yet, propositional content comes into existence only by a series of events where the interactees in a social situation ascribe it mutually to each other and are also willing to accept that assignment.

Formal languages consist of exactly defined labeled sets, where each set and its label represents a rewriting rule. In other words, formal languages are finite state machines; they are always expressible as a compiler for a programming language. Programming languages organize the arrangement of rewriting rules; they are, however, not entities capable of semantics. We could easily conclude that formal languages are not languages at all.
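The notion of a rewriting rule can be made concrete with a toy sketch (the rules, symbols, and function name below are our own invention for illustration, not taken from any particular formal system): each labeled rule replaces a symbol by a sequence of symbols, and “running” the language is nothing but the mechanical application of these rules.

```python
# A toy set of labeled rewriting rules; each label maps a symbol
# to the sequence of symbols it is rewritten into.
RULES = {
    "EXPR": ["TERM", "+", "TERM"],   # EXPR -> TERM + TERM
    "TERM": ["NUM"],                 # TERM -> NUM
    "NUM":  ["1"],                   # NUM  -> 1
}

def rewrite(symbols):
    """Mechanically apply the rules until only terminals remain."""
    out = []
    for s in symbols:
        if s in RULES:
            out.extend(rewrite(RULES[s]))  # expand a rule label
        else:
            out.append(s)                  # terminal symbol: keep as-is
    return out

print(rewrite(["EXPR"]))  # ['1', '+', '1']
```

Nothing in this procedure interprets anything; the machine only shuffles symbols, which is precisely the point made above.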

A last remark about formalization as a technique. Formalization is undeniably based on the use, or better, the assignment of symbols to particular, deliberately chosen contexts, actions, recipes or processes. Think of proofs of certain results in mathematics, where the symbolized idea later refers to the idea and its proof. Thus they may act as a kind of abbreviation, or they will denote abstractions. They also may support the visibility of the core of otherwise lengthy reasonings. Sometimes, as for instance in mathematics, formalization requires several components, e.g. the item or subject, the accompanying operators or transformations (take that as “usage”), and the reference to some axiomatics or an explicit description of the conditions and the affected items. The same style is applied in physics. Yet, this complete structure is not necessary for an action to count as a formalization. We propose to conceive of formalization as the selection of elements (to be introduced soon) that are consecutively symbolized. Actually, it is not necessary to write down a “formula” about something in order to formalize that something. It is also not necessary, so we are convinced, to apply a particular logic when establishing the formalization through abstraction. It is just the symbolic compression that allows us to achieve further results which would remain inaccessible otherwise. Or briefly put, to give a complicated thing a symbolic form that lives within a system of other forms.

Finally, there is just one thing we always should keep in mind. Using, introducing or referring to a formalization irrevocably implies an instantiation when we are going to apply it, to bring it back to more “concrete” contexts. Thus, formalization is deeply linked to the Deleuzean figure of thought of the “Differential.” [13]

part 2: Strong Singularity of certain Terms

Part 3: A Pragmatic Start for a Beautiful Pair

Part 4: Elementarization and Expressibility


1. Here we refer to Michel Foucault’s concept of the “field of propositions” / “field of proposals”, which he developed in the book “The Archaeology of Knowledge.”

  • [1] Rudolf Carnap, Logische Syntax der Sprache, Wien 1934 [2. Aufl. 1968].
  • [2] Platon, Theaitetos.
  • [3] Guilford, Creativity, 1950.
  • [4] de Bono, on lateral thinking.
  • [5] Günther Abel (ed.), Kreativität. Kolloquiumsband XX. Kongress der Deutschen Philosophie. Meiner Verlag, Hamburg 2007.
  • [6] Quine, Two Dogmas of Empiricism.
  • [7] Sneed
  • [8] Stegmüller
  • [9] Spohn
  • [10] Moulines
  • [11] Hilary Putnam, The Meaning of Meaning
  • [12] Robert Brandom, Making it Explicit.
  • [13] Gilles Deleuze, Difference and Repetition.

Beyond Containing: Associative Storage and Memory

February 14, 2012 § Leave a comment

Memory, our memory, is a wonderful thing. Most of the time.

Yet, it also can trap you, sometimes terribly, if you use it in inappropriate ways.

Think about the problematics of being a witness. As long as you don’t try to remember exactly, you know precisely. As soon as you start to try to achieve perfect recall, everything starts to become fluid, first, then fuzzy and increasingly blurry. As if there were some kind of uncertainty principle, similar to Heisenberg’s [1]. There are other tricks, such as asking a person the same question over and over again. Any degree of security, hence knowledge, will vanish. In the other direction, everybody knows the experience that a tiny little smell or sound triggers a whole story in memory, and often one that has not been thought of for a long time.

The main strengths of memory—extensibility, adaptivity, contextuality and flexibility—could also be considered its main weakness, if we expect perfect reproducibility for the results of “queries”. Yet, memory is not a database. There are neither symbols nor indexes, and at the deeper levels of its mechanisms, also no signs. There is no particular neuron that would “contain” information in the way a file on a computer does.

Databases are, of course, extremely useful, precisely because they cannot do otherwise than reproduce answers perfectly. That’s how they are designed and constructed. And precisely for the same reason we may state that databases are dead entities, like crystals.

The reproducibility provided by databases expels time. We can write something into a database, stop everything, and continue precisely at the same point. Databases do not own their own time. Hence, they are purely physical entities. As a consequence, databases do not and cannot think. They can’t bring or put things together, they do not associate, superpose, or mix. Everything is under the control of an external entity. A database does not learn when the amount of bits stored inside it increases. We also have to be very clear about the fact that a database does not interpret anything. All this should not be understood as a criticism, of course; these properties are intended by design.

The first important consequence of this is that any system relying just on the principles of a database will also inherit these properties. This raises the question about the necessary and sufficient conditions for the foundations of “storage” devices that allow for learning and informational adaptivity.

As a first step one could argue that artificial systems capable of learning, for instance self-organizing maps, or any other “learning algorithm,” may consist of a database and a processor. This would represent the bare bones of the classic von Neumann architecture.

The essence of this architecture is, again, reproducibility as a design intention. The processor is basically empty. As long as the database is not part of a self-referential arrangement, there won’t be something like a morphological change.

Learning without change of structure is not learning but only changing the value of structural parameters that have been defined a priori (at implementation time). The crucial step, however, would be to introduce those parameters in the first place. We will return to this point at a later stage of our discussion, when it comes to describing the processing capabilities of self-organizing maps.1

Of course, the boundaries are not well defined here. We may implement a system in a very abstract manner such that a change in the value of such highly abstract parameters indeed involves deep structural changes. In the end, almost everything can be expressed by some parameters and their values. That’s nothing else than the principle of the Deleuzean differential.

What we want to emphasize here is just the issue that (1) morphological changes are necessary in order to establish learning, and (2) these changes should be established in response to the environment (and the information flowing from there into the system). These two conditions together establish a third one, namely that (3) a historical contingency is established that acts as a constraint on the further potential changes and responses of the system. The system acquires individuality. Individuality and learning are co-extensive. Quite obviously, such a system is no longer a von Neumann device, even if it still runs on such a linear machine.

Our claim here is that “learning” requires a particular perspective on the concept of “data” and its “storage.” And, correspondingly, without this changed concept of the relation between data and storage, the emergence of machine-based episteme will not be achievable.

Let us just contrast the two ends of our space.

  • (1) At the logical end we have the von Neumann architecture, characterized by empty processors, perfect reproducibility on an atomic level, the “bit”; there is no morphological change; only estimation of predefined parameters can be achieved.
  • (2) The opposite end is made from historically contingent structures for perception, transformation and association, where the morphology changes due to the interaction with the perceived information2; we will observe emergence of individuality; morphological structures are always just relative to the experienced influences; learning occurs and is structural learning.

With regard to a system that is able to learn, one possible conclusion would be to drop the distinction between the storage of encoded information and the treatment of those encodings. Perhaps it is the only viable conclusion to this end.

In the rest of this chapter we will demonstrate how the separation between data and their transformation can be overcome on the basis of self-organizing maps. Such a device we call “associative storage”. We also will find a particular relation between such an associative storage and modeling3. Notably, both tasks can be accomplished by self-organizing maps.


When taking the perspective of usage there is yet another large contrast between databases and associative storage (“memories”). In the case of a database, the purpose of a storage event is known at the time of performing the storing operation. In the case of memories and associative storage this purpose is not known, and often cannot reasonably be expected to be knowable in principle.

From that we can derive a quite important consequence. In order to build a memory, we have to avoid storing the items “as such,” as is the case for databases. We may call this the (naive) representational approach. Philosophically, the stored items do not have any structure inside the storage device, neither an inner structure nor an outer one. Any item appears as a primitive quale.

The contrast to the process in an associative storage is indeed a strong one. Here, it is simply forbidden to store items in an isolated manner, without relation to other items, as an engram, an encoded and reversibly decodable series of bits. Since a database works perfectly reversibly and reproducibly, we can encode the grapheme of a word into a series of bits and later decode that series back into a grapheme again, which in turn we as humans (with memory inside the skull) can interpret as words. Strictly taken, we do NOT use the database to store words.

More concretely, what we have to do with the items comprises two independent steps:

  • (1) Items have to be stored as context.
  • (2) Items have to be stored as probabilized items.

The second part of our re-organized approach to storage is a consequence of the impossibility to know about future uses of a stored item. Taken inversely, using a database for storage always and strictly implies that the storage agent claims to know perfectly about future uses. It is precisely this implication that renders long-lasting storage projects so problematic, if not impossible.

In other words, and even more concise, we may say that in order to build a dynamic and extensible memory we have to store items in a particular form.

Memory is built on the basis of a population of probabilistic contexts in and by an associative structure.

The Two-Layer SOM

In a highly interesting prototypical model project (codename “WEBSOM”) Kaski (a collaborator of Kohonen) introduced a particular SOM architecture that serves the requirements described above [2]. Yet, neither Kohonen nor any of his colleagues have so far recognized the actual status of that architecture. We already mentioned this point in the chapter about some improvements of the SOM design; Kohonen fails to discern modeling from sorting when he uses the associative storage as a modeling device. Yet, modeling requires a purpose, operationalized into one or more target criteria. Hence, an associative storage device like the two-layer SOM can be conceived as a pre-specific model only.

Nevertheless, this SOM architecture is not only highly remarkable, but we can also easily extend it appropriately; it is indeed so important, at least as a starting point, that we describe it briefly here.

Context and Basic Idea

The context for which the two-layer SOM (TL-SOM) has been created is document retrieval by classification of texts. From the perspective of classification, texts are highly complex entities. This complexity of texts derives from the following properties:

  • – there are different levels of context;
  • – there are rich organizational constraints, e.g. grammars
  • – there is a large corpus of words;
  • – there is a large number of relations that not only form a network, but which also change dynamically in the course of interpretation.

Taken together, these properties turn texts into ill-defined or even undefinable entities, for which it is not possible to provide a structural description, e.g. as a set of features, and particularly not in advance of the analysis. Briefly, texts are unstructured data. It is clear that especially non-contextual methods like the infamous n-grams are deeply inappropriate for the description, and hence also for the modeling, of texts. The peculiarity of texts had been recognized long before the age of computers. Around 1830 Friedrich Schleiermacher founded the discipline of hermeneutics as a response to the complexity of texts. In the last decades of the 20th century, it was Jacques Derrida who brought a new perspective to it. In Deleuzean terms, texts are always and inevitably deterritorialized to a significant portion. Kaski & coworkers addressed only a modest part of these vast problematics, the classification of texts.

Their starting point was to preserve context. The large variety of contexts makes it impossible to take any kind of raw data directly as input for the SOM. That means that the contexts had to be encoded in a proper manner. The trick is to use a SOM for this encoding (details in the next section below). This SOM represents the first layer. The subject of this SOM are the contexts of words (definition below). The “state” of this first SOM is then used to create the input for the SOM on the second layer, which then addresses the texts. In this way, the size of the input vectors is standardized and reduced.

Elements of a Two-Layer SOM

The elements, or building blocks, of a TL-SOM devised for the classification of texts are

  • (1) random contexts,
  • (2) the map of categories (word classes)
  • (3) the map of texts

The Random Context

A random context encodes the context of any of the words in a text. Let us assume, for the sake of simplicity, that the context is bilaterally symmetric according to 2n+1; with n=3, for example, the length of the context is 7, where the focused word (“structure”) is at position 3 (when counting starts with 0).
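A minimal sketch of such a symmetric context window (the function name and the omission of padding at the text boundaries are our own choices for illustration):

```python
def contexts(words, n=3):
    """Yield, for every word that has n neighbours on both sides,
    the focused word and its symmetric 2n+1 context window."""
    for i in range(n, len(words) - n):
        yield words[i], words[i - n : i + n + 1]

snippet = "without change of structure is not learning".split()
for focus, window in contexts(snippet):
    print(focus, window)
# the focused word always sits at position n inside its window
```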

Let us resort to the following example, which takes just two snippets from this text. The numbers represent an arbitrary enumeration of the relative positions of the words.

sequence A of words: “… without change of structure is not learning …”, relative positions in text: 53 54 55 56 57 58 59
sequence B of words: “… not have any structure inside the storage …”, relative positions in text: 19 20 21 22 23 24 25

We need the position numbers just for calculating the positional distance between words. The interesting word here is “structure.”

For the next step you have to think of the words as listed in a catalog of indexes, that is, as a set whose order is arbitrary but fixed. In this way, any of the words gets its unique numerical fingerprint.

Index  Word       Random Vector
 …     …
 1264  structure  0.270  0.938  0.417  0.299  0.991 …
 1265  learning   0.330  0.990  0.827  0.828  0.445 …
 1266  Alabama    0.375  0.725  0.435  0.025  0.915 …
 1267  without    0.422  0.072  0.282  0.157  0.155 …
 1268  storage    0.237  0.345  0.023  0.777  0.569 …
 1269  not        0.706  0.881  0.603  0.673  0.473 …
 1270  change     0.170  0.247  0.734  0.383  0.905 …
 1271  have       0.735  0.472  0.661  0.539  0.275 …
 1272  inside     0.230  0.772  0.973  0.242  0.224 …
 1273  any        0.509  0.445  0.531  0.216  0.105 …
 1274  of         0.834  0.502  0.481  0.971  0.711 …
 1275  is         0.935  0.967  0.549  0.572  0.001 …

Any of the words of a text can now be replaced by an a priori determined vector of random values from [0..1]; the dimensionality of those random vectors should be around 80 in order to approximate orthogonality among all those vectors. Just to be clear: these random vectors are taken from a fixed codebook, a catalog as sketched above, where each word is assigned to exactly one such vector.
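A sketch of such a fixed codebook (the lazy creation of entries and the fixed seed are implementation choices of ours; only the stable word-to-vector assignment and values in [0..1] are taken from the description above):

```python
import random

random.seed(0)   # the codebook must be a priori fixed and stable
DIM = 80         # roughly the dimensionality suggested above

codebook = {}

def fingerprint(word):
    """Return the fixed random vector assigned to a word,
    creating the entry on first encounter."""
    if word not in codebook:
        codebook[word] = [random.random() for _ in range(DIM)]
    return codebook[word]

# each word maps to exactly one vector, always the same one
assert fingerprint("structure") == fingerprint("structure")
assert fingerprint("structure") != fingerprint("learning")
```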

Once we have performed this replacement, we can calculate the averaged vectors per relative position of the context. In the example above, we would calculate the reference vector for position n=-3 as the average of the vectors encoding the words “without” and “not”.

Let us be more explicit. Example sequence A is first translated into positional numbers; we interpret each positional number as a column header and fill the column with the values of the respective word’s fingerprint. For the 7 positions (-3 … +3) we get 7 columns:

sequence A of words:         “… without change of structure is not learning …”
rel. positions in text:           53     54     55    56      57  58    59
grouped around “structure”:       -3     -2     -1     0       1   2     3

random fingerprints per position:

  -3      -2      -1       0       1       2       3
0.422   0.170   0.834   0.270   0.935   0.706   0.330
0.072   0.247   0.502   0.938   0.967   0.881   0.990
0.282   0.734   0.481   0.417   0.549   0.603   0.827

…further entries of the fingerprints…

The same we have to do for the second sequence B. Now we have two tables of fingerprints, both comprising 7 columns and N rows, where N is the length of the fingerprint. From these two tables we calculate the average values and put them into a new table (which is of course also of dimensions 7×N). Thus, the example above yields 7 such averaged reference vectors. If we have a dimensionality of 80 for the random vectors we end up with a matrix of [r,c] = [80,7].
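The averaging step can be sketched as follows (the dimensionality is reduced to 4 purely for readability, and `fingerprint` is a stand-in for the codebook lookup described above):

```python
import random

DIM, n = 4, 3          # toy values; the text uses DIM = 80, n = 3
random.seed(1)
codebook = {}

def fingerprint(word):
    """Fixed random vector per word (stand-in for the codebook)."""
    if word not in codebook:
        codebook[word] = [random.random() for _ in range(DIM)]
    return codebook[word]

seq_a = "without change of structure is not learning".split()
seq_b = "not have any structure inside the storage".split()

def averaged(sequences):
    """Element-wise average of the per-position fingerprint columns
    across all context sequences; returns 2n+1 columns of length DIM."""
    mats = [[fingerprint(w) for w in seq] for seq in sequences]
    return [[sum(m[c][r] for m in mats) / len(mats) for r in range(DIM)]
            for c in range(2 * n + 1)]

avg = averaged([seq_a, seq_b])
vec = [x for col in avg for x in col]   # concatenate the columns
assert len(vec) == (2 * n + 1) * DIM    # 7 × 4 = 28 (560 with DIM = 80)
```

Since both example sequences share the word “structure” at the center position, the averaged center column simply reproduces its fingerprint.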

In a final step we concatenate the columns into a single vector, yielding a vector of 7×80=560 variables. This might appear to be a large vector. Yet, it is much smaller than the whole corpus of words in a text. Additionally, such vectors can be compressed by the technique of random projection (mathematical foundations by [3], first proposed for data analysis by [4], utilized for SOMs later by [5] and [6]), which today is quite popular in data analysis. Random projection works by matrix multiplication. Our vector (1R × 560C) gets multiplied with a matrix M(r) of 560R × 100C, yielding a vector of 1R × 100C. The matrix M(r) also consists of flat random values. This technique is very interesting, because no relevant information is lost, but the vector gets shortened considerably. Of course, in an absolute sense there is a loss of information. Yet, the SOM only needs the information that is important to distinguish the observations.
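Random projection itself is just one matrix multiplication; a sketch (the flat random values follow the description above, while the seed and the function name are our own choices):

```python
import random

random.seed(2)

def random_projection(vec, out_dim):
    """Multiply a 1×n vector with an n×out_dim matrix M(r) of flat
    random values, yielding a shortened 1×out_dim vector."""
    n = len(vec)
    M = [[random.random() for _ in range(out_dim)] for _ in range(n)]
    return [sum(vec[i] * M[i][j] for i in range(n)) for j in range(out_dim)]

v = [random.random() for _ in range(560)]   # the concatenated context vector
p = random_projection(v, 100)               # compressed to 100 variables
assert len(p) == 100
```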

This technique of transferring a sequence made from items encoded on a symbolic level into a vector based on random contexts can of course be applied to any symbolic sequence.

For instance, it would be a drastic case of reductionism to conceive of the path taken by humans in an urban environment just as a sequence of locations. Humans are symbolic beings and the urban environment is full of symbols to which we respond. Yet, for the population-oriented perspective any individual path is just a possible path. Naturally, we interpret it as a random path. The path taken through a city needs to be described both by location and symbol.

The advantage of the SOM is that the random vectors that encode the symbolic aspect can be combined seamlessly with any other kind of information, e.g. the locational coordinates. That’s the property of multi-modality. Which particular combination of “properties” is then suitable to classify the paths for a given question is subject to “standard” extended modeling as described in the chapter Technical Aspects of Modeling.

The Map of Categories (Word Classes)

From these random context vectors we can now build a SOM. Similar contexts will arrange in adjacent regions.

A particular text can now be described by its differential abundance across that SOM. Remember that we have sent the random contexts of many texts (or text snippets) to the SOM. To achieve such a description, a (relative) frequency histogram is calculated, which has as many classes as the SOM has nodes. The values of the histogram are the relative frequencies (“probabilities”) for the presence of a particular text in comparison to all other texts.
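A sketch of such a fingerprint (we assume the contexts of one text have already been mapped to their best-matching nodes of the first-layer SOM; the node indices below are invented for illustration):

```python
from collections import Counter

def text_fingerprint(node_hits, node_count):
    """Relative frequency histogram over the SOM nodes: one bin per
    node, normalized so that all bins sum to 1."""
    counts = Counter(node_hits)
    total = len(node_hits)
    return [counts.get(i, 0) / total for i in range(node_count)]

# toy example: five contexts of one text fell onto these nodes
hist = text_fingerprint([0, 2, 2, 4, 2], node_count=5)
print(hist)  # [0.2, 0.0, 0.6, 0.0, 0.2]
```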

Any particular text is now described by a fingerprint that contains highly relevant information about

  • – the context of all words as a probability measure;
  • – the relative topological density of similar contextual embeddings;
  • – the particularity of texts across all contextual descriptions, again as a probability measure;

Those fingerprints represent texts and they are ready-mades for the final step, “learning” the classes by the SOM on the second layer in order to identify groups of “similar” texts.

It is clear, that this basic variant of a Two-Layer SOM procedure can be improved in multiple ways. Yet, the idea should be clear. Some of those improvements are

  • – to use a fully developed concept of context, e.g. this one, instead of a constant length context and a context without inner structure;
  • – evaluating not just the histogram as a foundation of the fingerprint of a text, but also the sequence of nodes according to the sequence of contexts; that sequence can be processed using a Markov-process method, such as HMM, Conditional Random Fields, or, in a self-similar approach, by applying the method of random contexts to the sequence of nodes;
  • – reflecting at least parts of the “syntactical” structure of the text, such as sentences, paragraphs, and sections, as well as the grammatical role of words;
  • – enriching the information about “words” by representing them not only in their observed form, but also as their close synonyms, or stuffed with the information about pointers to semantically related words as this can be taken from labeled corpuses.

We want to return briefly to the first layer. Just imagine not measuring the histogram, but instead following the indices of the contexts across the developed map with your fingertips. A particular path, or virtual movement, appears. I think that it is crucial to reflect this virtual movement in the input data for the second layer.

The reward could be significant, indeed. It offers nothing less than a model for conceptual slippage, a term which has been emphasized by Douglas Hofstadter throughout his research on analogical and creative thinking. Note that in our modified TL-SOM this capacity is not an “extra function” that had to be programmed. It is deeply built “into” the system, or in other words, it makes up its character. Besides Hofstadter’s proposal, which is based on a completely different approach, and for a different task, we do not know of any other system that would be capable of that. We even may expect that the efficient production of metaphors can be achieved by it, which is not an insignificant goal, since all practiced language is always metaphoric.

Associative Storage

We already mentioned that the method of the TL-SOM extracts important pieces of information about a text and represents them as a probabilistic measure. The SOM does not contain the whole piece of text as a single entity, or a series of otherwise unconnected entities, the words. The SOM breaks the text up into overlapping pieces, or better, into overlapping probabilistic descriptions of such pieces.

It would be a serious misunderstanding to perceive this splitting into pieces as a drawback or failure. It is the mandatory prerequisite for building an associative storage.

Any further target-oriented modeling would refer to the two layers of a TL-SOM, but never to the raw input text. Thus it can work reasonably fast for a whole range of different tasks. One of those tasks that can be solved by a combination of associative storage and true (targeted) modeling is to find an optimized model for a given text, or any text snippet, including the identification of the discriminating features. We also can turn the perspective around, addressing the query to the SOM about an alternative formulation in a given context…

From Associative Storage towards Memory

Despite its power and its potential as associative storage, the Two-Layer SOM still can’t be conceived as a memory device. The associative storage just takes the probabilistically described contexts and sorts them topologically into the map. In order to establish “memory,” further components are required that provide the goal orientation.

Within the world of self-organizing maps, simple (!) memories are easy to establish. We just have to combine a SOM that acts as associative storage with a SOM for targeted modeling. The peculiar distinctive feature of that second SOM for modeling is that it does not work on external data, but on “data” as it is available in and as the SOM that acts as associative storage.

We may establish a vivid memory in its full meaning if we establish three further components: (1) targeted modeling via the SOM principle, (2) a repository of the targeted models that have been built from (or using) the associative storage, and (3) at least a partial operationalization of a self-reflective mechanism, i.e. a modeling process that is going to model the working of the TL-SOM. Since in our framework the basic SOM module is able to grow and to differentiate, there is no limitation in principle for such a system any more, concerning its capability to build concepts, models, and (logical) habits for navigating between them. Later, we will call the “space” where this navigation takes place the “choreosteme“: drawing figures into the open space of epistemic conditionability.

From such a memory we may expect dramatic progress concerning the “intelligence” of machines. The only questionable thing is whether we should call such an entity still a machine. I guess, there is neither a word nor a concept for it.



1. Self-organizing maps have some amazing properties on the level of their interpretation, which they share especially with Markov models. As such, the SOM and Markov models are outstanding. Both the SOM and the Markov model can be conceived as devices that can be used to turn programming statements, i.e. all the IF-THEN-ELSE statements occurring in a program, into DATA. Even logic itself, or more precisely, any quasi-logic, gets transformed into data. SOM and Markov models are double-articulated (a Deleuzean notion) into logic on the one side and the empiric on the other.

In order to achieve this, full write access is necessary to the extensional as well as the intensional layer of a model. Hence, neither artificial neural networks nor, of course, statistical methods like PCA can be used to achieve the same effect.

2. It is quite important not to forget that (in our framework) information is nothing that “is out there.” If we follow the primacy of interpretation, for which there are good reasons, we also have to acknowledge that information is not a substantial entity that could be stored or processed. Information is nothing else than the actual characteristics of the process of interpretation. These characteristics can’t be detached from the underlying process, because this process is represented by the whole system.

3. Keep in mind that we only can talk about modeling in a reasonable manner if there is an operationalization of the purpose, i.e. if we perform target oriented modeling.

  • [1] Werner Heisenberg. Uncertainty Principle.
  • [2] Samuel Kaski, Timo Honkela, Krista Lagus, Teuvo Kohonen (1998). WEBSOM – Self-organizing maps of document collections. Neurocomputing 21 (1998) 101-117.
  • [3] W.B. Johnson and J. Lindenstrauss. Extensions of Lipshitz mapping into Hilbert space. In Conference in modern analysis and probability, volume 26 of Contemporary Mathematics, pages 189–206. Amer. Math. Soc., 1984.
  • [4] R. Hecht-Nielsen. Context vectors: general purpose approximate meaning representations self-organized from raw data. In J.M. Zurada, R.J. Marks II, and C.J. Robinson, editors, Computational Intelligence: Imitating Life, pages 43–56. IEEE Press, 1994.
  • [5] Papadimitriou, C. H., Raghavan, P., Tamaki, H., & Vempala, S. (1998). Latent semantic indexing: A probabilistic analysis. Proceedings of the Seventeenth ACM Symposium on the Principles of Database Systems (pp. 159-168). ACM press.
  • [6] Bingham, E., & Mannila, H. (2001). Random projection in dimensionality reduction: Applications to image and text data. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 245-250). ACM Press.


A Model for Analogical Thinking

February 13, 2012 § Leave a comment

Analogy is an ill-defined term, and quite naturally so.

If something is said to be analog to something else, one supposes not merely a distant resemblance. Such a similarity could indeed be described by a similarity mapping, based on a well-identified function. Yet, analogy is more than that, much more than just similarity.

Analogical thinking is significantly different from determining similarity, or from “selecting” a similar thought. Actually, it is not a selection at all, and it is also not based on modeling. Although it is based on experience, it is not based on identifiable models. We may even conclude that analogical thinking is (1) non-empiric, and (2) based on a constructive use of theories. In other words, and derived from our theory of theory, an analogy is itself a freshly derived model! Yet, that model is of a particular kind, since it does not contain any data. In other words, it looks like the basic definition to be used as the primer for surrogating simulated data, which in turn can be used to create SOM-based expectancy. We may also say it in simple terms: analogical thinking is about producing an ongoing stream of ideas. Folding, inventing.

In 2006, on the occasion of the yearly Presidential Lecture at Stanford, Douglas Hofstadter gave some remarkable statements in an interview, which I’d like to quote here, since they express some issues we have argued for throughout our writings.

I knew some people wouldn’t like what I was going to say, since it was diametrically opposed to the standard cognitive-science party line that analogy is simply a part of “reasoning” in the service of some kind of “problem solving” (makes me think of doing problem sets in a physics course). For some bizarre reason, don’t ask me why, people in cognitive science only think of analogies in connection with what they call “analogical reasoning” — they have to elevate it, to connect it with fancy things like logic and reasoning and truth. They don’t seem to see that analogy is dirt-cheap, that it has nothing to do with logical thinking, that it simply pervades every tiny nook and cranny of cognition, it shapes our every thinking moment. Not seeing that is like fish not perceiving water, I guess.


The point is that thinking amounts to putting one’s finger on the essence of situations one is faced with, which amounts to categorization in a very deep and general sense, and the point is that in order to categorize, one has to compare situations “out there” with things already in one’s head, and such comparisons are analogies. Thus at the root of all acts of thought, every last one of them, is the making of analogies. Cognitive scientists talk a lot about categories, but unfortunately the categories that they study are far too limited. For them, categories are essentially always nouns, and not only that, they are objects that we can see, like “tree” or “car.” You get the impression that they think that categorization is very simple, no more complicated than looking at an object and identifying the standard “features”

The most salient issues here are, in preliminary terms for the time being:

  • (1) Whenever thinking happens, this thinking is performed as “making analogy”; there are no “precise = perfectly defined” items in the brain or the mind.
  • (2) Thinking can not be equated with problem-solving and logic;
  • (3) Comparison and categorization are the most basic operations, while both of these take place in a fluid, open manner.

It is a fallacy to think that there is analogical reasoning and some other “kinds” of it (yet the journals and libraries are filled with this kind of shortcoming, e.g. [1,2,3]), or vice versa, that there is logical reasoning and something other, such as analogical reasoning. Thinking so would mean to put logic into the world (of neurons, in this case). Yet, we know that logic that is free from interpretive parts can’t be part of the real world. We always and inevitably deal just and only with quasi-logic, a semantically contaminated instance of transcendental logic. The issue is not just one about wording here. It is the self-referentiality that is always present, in multiple respects, when dealing with cognitive capacities that forces us not to be too lazy regarding the wording. Dropping the claim posited by the term “analogical reasoning” we quickly arrive at the insight that all thinking is analogical, or even, that “making analogies” is the label for the visible parts of the phenomenon that we call thinking.

The next issue mentioned by Hofstadter is about categories. Categories in the mind are NOT about the objects; they are not even about any external reference. This is a major and widespread misunderstanding among cognitive scientists, according to Hofstadter. It is clear that such referentialism, call it materialism, or naive realism, is conceptually primitive and inappropriate. We also could refer to Charles Peirce, the great American philosopher, who repeatedly stated (and was the first to do so) that signs always refer only to signs. Yet, signs are not objects, of course. Similarly, in §198 of the Philosophical Investigations Wittgenstein notes

[…] any interpretation still hangs in the air along with what it interprets, and cannot give it any support. The interpretations alone do not determine the meaning.

The materiality of a road sign should not be taken as its meaning; the semiotic sign associated with it is only partially to be found in the matter standing there near the road. The only thing that ties “us” (i.e. our thinking) to the world is modeling, whether this world is taken as the external or the internal one: it is the creation of tools (the models) for anticipation, given the expectation of weak repeatability. That means we have to believe and to trust in the first instance, which yet can not really be taken as “ties”.

The kind of modeling we have in mind is neither covered by model theory, nor by the constructivist framework, nor by the empiricist account of it. Actually, although we can at least go on believing in an ontology of models (“a model is…”) as long as we play blind man’s buff, modeling can nevertheless not be separated from entities like concepts, code, mediality or virtuality. We firmly believe in the impossibility of reductionism when it comes to human affairs. And, I guess, we could agree upon the proposal that thinking is such a human affair, even if we are going to “implement” it into a “machine.” In the end, we anyway just strive for understanding ourselves.

It is thus extremely important to investigate the issue of analogy in an appropriate model system. The only feasible model known to date is the one published by Douglas Hofstadter and Melanie Mitchell.

They describe their results on a non-technical level in the wonderful book “Fluid Concepts and Creative Analogies”, while Mitchell focuses on the more technical aspects and the computer experiments in “Analogy-Making as Perception: A Computer Model”.

Though we do not completely agree with their theoretical foundation, particularly the role they ascribe to the concept of perception, their approach is nevertheless brilliant. Hofstadter discusses in a very detailed manner why other approaches are either fake or failure (see also [4]), while Mitchell provides a detailed account of the model system, which they call “CopyCat”.


CopyCat is the last and most advanced variant of a series of similar programs. It deals with a funny example from the letter domain. The task that the program should solve is the following:

Given the transformation of a short sequence of letters, say “abc”, into a similar one, say “abd”, what is the result of applying the very “same” transformation to a different sequence, say “ijk”?

Most people will answer “ijl”. The rule seems “obvious”. Yet, there are several solutions, though there is a strong propensity towards “ijl”. One of the other solutions would be “ijd”. However, this solution is not “intellectually” appealing, for humans notably…

Astonishingly, CopyCat reproduces the probabilistic distribution of solutions provided by a population of humans.

The following extracts show further examples.

Now the important question: How does CopyCat work?

First, the solutions are indeed produced; there is no stored collection of solutions. Thus, CopyCat derives its proposals not from experience that could be expressed by empiric models.

Second, the solutions are created by a multi-layered, multi-component random process. In some respects it is reminiscent of the behavior of a developed ant colony, where different functional roles are performed by different sub-populations. In other words, it is more than just swarming behavior; there is division of labor among different populations of agents.

The third and most important component is a structure that represents a network of symmetry relations, i.e. a set of symmetry relations arranged as a graph, where the weights of the relations are dynamically adapted to the given task.

Based on these architectural principles, Copycat produces—and it is indeed a creative production—its answers.
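A drastic caricature may convey the flavor of such a stochastic architecture. The following toy sketch is emphatically not Copycat (which is far richer); the two competing rules and their weights are invented for illustration only:

```python
import random
from collections import Counter

def successor_rule(s):
    # "replace the last letter by its alphabetic successor"
    return s[:-1] + chr(ord(s[-1]) + 1)

def literal_rule(s):
    # "replace the last letter by the literal letter 'd'"
    return s[:-1] + "d"

# The weights stand in for the dynamically adapted relevance of
# relations in the "Platonic" network; the numbers are invented.
RULES = [(successor_rule, 0.9), (literal_rule, 0.1)]

def one_run(target):
    # One stochastic "run": sample a description of the change
    # "abc" -> "abd" and apply it to the target string.
    rule = random.choices([r for r, _ in RULES],
                          weights=[w for _, w in RULES])[0]
    return rule(target)

# Many independent runs yield a distribution of answers, analogous
# to the distribution Copycat (and a human population) produces.
counts = Counter(one_run("ijk") for _ in range(10_000))
print(counts.most_common())
```

Of course, in Copycat the propensities are not fixed numbers but emerge from the interaction of the agent populations with the relational network; the sketch only shows why the outcome is a distribution rather than a single answer.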


Of course, Copycat is a model system. The greatest challenge is to establish the “Platonic” sphere (Hofstadter’s term) that comprises the dynamical relational system of symmetry relations. In a real system, this sphere of relations has to be fed by other parts of the system, most likely the modeling sub-system. This sub-system has to be able to extract abstract relations from data, which then could be assembled into the “Platonic” device. All of those functional parts can be covered or served by self-organizing maps, and all of them are planned. The theory of these parts you may find scattered throughout this blog.

Copycat has been created as a piece of software. Years ago, it had been publicly available from the ftp site of the University of Illinois at Urbana-Champaign. Then it vanished. Fortunately, I was able to grab it before.

Since I rate this piece as one of the most important contributions to machine-based episteme, I created a mirror for downloading it from Google code. But be aware: so far, you need to be a programmer to run it, since it requires a development environment for the Java programming language. You can check it out from the source repository behind the link given above. In the near future I will provide a version that runs more easily as a standalone program.

  • [1] David E. Rumelhart, Adele A. Abrahamson (1973). “A model for analogical reasoning,” Cognitive Psychology 5(1), 1-28.
  • [2] John F. Sowa and Arun K. Majumdar (2003). “Analogical Reasoning,” in: A. de Moor, W. Lex, & B. Ganter (eds.), Conceptual Structures for Knowledge Creation and Communication, LNAI 2746, Springer-Verlag, pp. 16-36. Proc. Intl. Conf. Conceptual Structures, Dresden, July 2003.
  • [3] Morrison RG, Krawczyk DC, Holyoak KJ, Hummel JE, Chow TW, Miller BL, Knowlton BJ. (2004). A neurocomputational model of analogical reasoning and its breakdown in frontotemporal lobar degeneration. J Cogn Neurosci. 16(2), 260-71.
  • [4] Chalmers, D. J., R. M. French, & D. R. Hofstadter (1992). High-level perception, representation, and analogy: A critique of artificial intelligence methodology. J Exp & Theor Artificial Intelligence 4, 185-211.


Theory (of Theory)

February 13, 2012 § Leave a comment

Thought is always abstract thought,

so thought is always opposed to work involving hands. Isn’t it? It is generally agreed that there are things like theory and practice, which are believed to belong to different realms. Well, we think that this perspective is inappropriate and misleading. Deeply linked to this first problem is a second one, the distinction between model and theory. Indeed, there are ongoing discussions in current philosophy of science about those concepts.

Frequently one can meet the claim that theories are about predictions. It is indeed the received view. In this essay we try to reject precisely this received view. As an alternative, we offer a Wittgensteinian perspective on the concept of theory, with some Deleuzean, dedicatedly post-Kantian influences. This perspective we could call a theory about theory. It will turn out that this perspective not only is radically different from the received view; it also provides some important, otherwise unachievable benefits, concerning (in still rather imprecise wording) both “practical” as well as philosophical aspects. But let us first start with some examples.

Even before that, let me state clearly that there is much more about theory than can be mentioned in a single essay. Actually, this essay is based on a draft for a book on the theory of theory that comprises some 500 pages…

The motivation to think about theory derives from several hot spots. Firstly, it is directly and intrinsically implied by the main focus of the first “part” of this blog, the issue of (the possibility for a) machine-based episteme. We as humans can only know because we can willingly take part in a game that could appropriately be described as mutual and conscious theorizing-modeling induction. If machines should ever develop the capability for their own episteme, for their autonomous capability to know, they necessarily have to be able to build theories.

A second strain of motivation comes from the field of complexity. There are countless publications stating that it is not possible to derive a consistent notion of complexity, ranging from Niklas Luhmann [1986] to Hermann Haken [2012], leading either to a rejection of the idea that it is a generally applicable concept, or to an empty generalization, or to a reduction. Obviously, people are stating that there is no possibility for a theory about complexity. On the other hand, complexity is more and more accepted as a serious explanatory scheme across disciplines, from material science to biology, sociology and urbanism. Complexity is also increasingly a topic in the field of machine-based episteme, e.g. through the concept of self-organizing maps (SOM). This divergence needs to be clarified, and to be dissolved, of course.

The third thread of motivation is given by another field where theory has usually been regarded as something exotic: urbanism and architecture. Is talking about architecture, e.g. its history, without actually using this talking in the immediate context of organizing and raising a building, already “theory”? Are we allowed to talk in this way at all, thereby splitting talking and doing? Another issue in these fields is the strange subject of planning. Plans are neither models nor theory, nor operation, and planning often fails, not only in architecture, but also in the IT industry. In order to understand the status of plans, we first have to get clear about the abundant parlance that distinguishes “theory” and “practice”.

Quite obviously, a proper theory of theory in general, that is, not just a theory about a particular theory, is also highly relevant to what is known as the theory of theory change, or in terms often used in the field of Artificial Intelligence, belief revision. If we do not have a proper theory about theory at our disposal, we also will not be able to talk reasonably about what it could mean to change a belief. Actually, the topic of beliefs is so relevant that we will discuss it in a dedicated essay. For the time being, we just want to point out the relevance of our considerations here. Later, we will include a further short remark about it.

For these reasons it is vital in our opinion (and for us) to understand the concept of theory better than it is possible on the basis of current mainstream thinking on the subject.


In line with that mainstream attitude it has been said, for instance, that Einstein’s theory predicted—or: Einstein predicted from his theory—the phenomenon of gravitational lensing of light. In Einstein’s universe, there is no absoluteness regarding the straightness of a line, because space itself has a curvature that is parametrized. Another example is the so-called Standard Model in particle physics. Physicists claim that this model is a theory, and that it is the best available theory for making correct predictions about the behavior of matter. The core of this theory is given by the relation between two elements, the field and its respective mediating particle, a view which is a descendant of Einstein’s famous equation relating energy, mass and the speed of light. Yet, the field theory leads to the problem of infinite regress, which physicists hope to resolve in the LHC “experiments” currently performed at CERN in Geneva. The ultimate particle that should also “explain” gravity is called the Higgs boson. The general structure of the Standard Model, however, is a limit process: the resting mass of the particles is thought to become larger and larger, such that the Higgs boson is the last possible particle, leaving gravitation and the graviton still unexplained. There is also a pretty arrangement of the basic types of elementary particles that is reminiscent of the periodic table in chemistry. Anyway, by means of that Standard Model it is possible to build computers, or at least logical circuits, where a bit is represented by just some 20 electrons. Likewise, Einstein’s theory has a direct application in GPS, where a highly accurate common time base shared between the satellites is essential.

Despite these successes there are still large deficits of the theory. Physicists have so far not detected the gravitational waves that are said to be predicted by their theory. Worse, physics does not even offer any insight into the genesis of electric charge and magnetism. These are treated as phenomena, leaving a strange gap between the theory and the macroscopic observations. (Note that the Standard Model does NOT allow decoherence into a field, but rather only into particles.) Furthermore, physicists do not have even the slightest clue about some mysterious entities in the universe that they call “dark matter” and “dark energy”, except that they exert positive or negative gravitational force. I personally tend to rate this as one of the biggest (bad) jokes of science ever: building and running the LHC (around 12 billion $ so far) on the one hand, and at the same time taking the road back into mythic medieval language seriously. We meet dark ages in physics again, not only dark matter and dark energy.

Traveling Dark Matter in a particular context, reflecting and inducing new theories: The case of Malevich and his holy blackness.1

Anyway, that’s not our main topic here. I cited these examples just to highlight the common usage of the concept of theory, according to which a theory is a more or less mindful collection of proposals that can be used to make predictions about worldly facts.

To be Different, or not to be Different…

But what, then, is the difference between theories and models? The concept of model is itself an astonishing phenomenon. Today it is almost ubiquitous. We hardly can imagine anymore that, back in the 19th century, the concept of model was used mainly by architects. Presumably, it was the progress made in physics in the beginning of the 20th century, together with the foundational crisis in mathematics, that initiated the career of the concept of model (for an overview in German language see this collection of pages and references).

One of the usages of the concept of model refers to the “direct” derivation of predictions from empirical observations. We can take some observations about process D, e.g. an illness of the human body, where we know the outcome (cured or not) and then we could try to build an “empiric” model that links the observations to the outcome. Observations can include the treatment(s), of course. It is clear that predictions and diagnoses are almost synonyms.
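To make this concrete, here is a minimal sketch of such an “empiric” model. The data, the variables and the simple nearest-neighbor rule are entirely made up for illustration; any real medical model would of course be far more involved:

```python
# Hypothetical records of process D: (fever in °C, days of treatment) -> cured?
# All numbers are invented for the sake of the example.
observations = [
    ((39.5, 2), False),
    ((38.0, 5), True),
    ((37.2, 7), True),
    ((40.1, 1), False),
    ((38.5, 6), True),
]

def predict(case):
    # "Empiric model": predict the outcome of the most similar observed
    # case (squared Euclidean distance, 1-nearest neighbor). Prediction
    # and diagnosis coincide here, as noted in the text.
    def dist(obs):
        (fever, days), _ = obs
        return (fever - case[0]) ** 2 + (days - case[1]) ** 2
    return min(observations, key=dist)[1]

print(predict((39.8, 2)))   # resembles the uncured cases -> False
print(predict((37.5, 6)))   # resembles the cured cases -> True
```

The model links observations to outcomes “directly”, without any explicit statement about the underlying process; that is exactly the situation discussed in the following paragraphs.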

Where is the theory here? Many claim that there is no theory in modeling in general, and particularly that no theory is possible in the case of medicine and pharmacology. Statistical techniques are usually regarded as some kind of method. Since there is no useful generalization, it is believed that a “theory” would not be different from stating that the subject is alive. It is claimed that we are always directly faced with the full complexity of living organisms, thus we have to reduce our perspective. But stop: shouldn’t we take the notion of complexity here already as a theory?

For Darwin’s theory of natural selection it is also not easy to draw a separating line between the concepts of model and theory. Darwin indeed argued on a quite abstract level, which led to the situation that people think his theory can not readily be tested. Some people thus feel inclined to refer to the great designer, or to the Spaghetti monster alike. Others, notably often physicists, chemists or mathematicians, tried to turn Darwin’s theory into a system that actually could be tested. For the time being we leave this as an open issue, but we will return to it later.

Today it is generally acknowledged that measurement always implies a theory. From that we can directly conclude that the same should hold for modeling: modeling implies a theory, just as measurement implies a particular model. In the latter case the model is often actualized by the materiality or the material arrangement of the measurement device. Both the material aspects and the immaterial design aspects, which mainly concern informational filtering, establish at least implicitly a particular normativity, a set of normative rules that we can call a “model.” This aspect of the normativity of models (and of theories alike) is quite important; we should keep it in mind.

In the former relation, the implication of theories by modeling, we may expect a similar dependency. Yet, as long as we do not clearly distinguish models and theories, theories would simply be some kind of more general models. If we do not discern them, we would not need both. Actually, precisely this is the current state of affairs, at least in the mainstreams across various disciplines.

Reframing. Into the Practice of Languagability.

It is one of the stances inherited from materialism to pose questions about a particular subject in an existential, or if you like, ontological, manner. Existential questions take the form “What is X?”, where the “is” already claims the possibility of an analytical treatment, implied by the sign for equality. In turn this equality, provoked by the existential parlance, claims that the equation is a lossless representation. We are convinced that this approach destroys any chance for sustainable insights already in the first move. This holds even for the concepts of “model” or “theory” themselves. Nevertheless, the questions “What is a model?” or “What is a theory?” can frequently be met (e.g. [1] p.278).

The deeper reason for the referred difficulties is that this stance implies the primacy of the identity relation. Yet, the only possible identity relation is a=a, the tautology, which of course is empirically empty. Once we write a=b, it is not an identity relation any more. Either it is a claim, or it is based on empiric arguments, which means it is always a claim. In any case, one has to give further criteria upon which the identity a=b appears as justified. The selection of those criteria lies far outside of the relation itself. It invokes the totality of the respective life form. The only conclusion we can draw from this is that the identity relation is transcendent. Despite its necessity, it can not be part of the empirical world. The same is hence true for logic.

Claiming the identity relation for empirical facts, i.e. for any kind of experience and hence also for any thought, is self-contradictive. It implies a normativity that remains deliberately hidden. We all know about the late and always disastrous consequences of materialism on the societal level, irrespective of choosing the marxist or the capitalist flavor.

There are probably only two ways of rejecting materialism, and thus also of avoiding its implications. Both of them reject the primacy of the identity relation, yet in slightly different ways. The first one is Deleuze’s transcendental difference, which he developed in his philosophy of the differential (e.g. in Difference & Repetition, or his book about the Fold and Leibniz). The second one is Wittgenstein’s proposal to take logic as a consequence of performance, or more precisely, as an applicable quasi-logic, and to conceive of logic as a transcendental entity. Both ways are closely related, though developed independently of each other. Of course, there are common traits shared by Deleuze and Wittgenstein, such as rejecting what has been known as “academic philosophy” at their time. All that philosophy had been positioned just as “footnotes to Plato”, Kant or Hegel.

In our reframing of the concept of theory we have been inspired by both, Deleuze and Wittgenstein, yet we follow the Wittgensteinian track more explicitly in the following.

Actually, the move is quite simple. We just have to drop the assumption that entities “exist” independently. Even if we erode that idealistic independence only slightly, we are ultimately forced to acknowledge that everything we can say, know or do is mediated by language, or more generally by the conditions that imply the capability for language, in short by languagability.

In contrast to so-called “natural languages”—which actually is a revealing term—languagability is not a dualistic, bivalent off-or-on concept. It is applicable to any performing entity, including animals and machines. Hence, languagability is not only the core concept for the foundation of the investigation of the possibility of machine-based episteme. It is essential for any theory.

Following this track, we stop asking ontological questions. We even drop ontology as a whole. Questions like “What is a Theory?”, “What is Language?” etc. are almost free of any possible sense. Instead, it appears much more reasonable to accept the primacy of languagability and to ask about the language game in which a particular concept plays a certain role. The question that promises progress therefore is:

What can we say about the concept of theory as a language game?

To our knowledge, the “linguistic turn” has not been performed in philosophy of science so far, let alone in disciplines like computer science or architecture. The consequence is a considerable mess in the respective disciplines.

Theory as a Language Game

One of the first implications of the turn towards the primacy of languagability is the vanishing of the dualism between theory and practice. Any practice requires rules, which in turn can only be referred to in the space of languagability. Of course, there is more to rule-following than the rule. Speech acts have been stratified first by Austin [2] into locutionary, illocutionary and perlocutionary parts. There might be even further ones, implying evolutionary issues or the play as story-telling. (Later we will call these aspects “delocutionary.”) On the other hand, it is also true that one can not pretend to follow a rule, as Wittgenstein recognized [3].

It is interesting in this respect that the dualistic, opposing contrast between theory and practice has not been the classical view; not just by chance did it appear as late as the early 17th century [4]. Originally, theory just meant “to look at, to speculate”, a pairing that is interesting in itself.

Ultimately, rules are embedded in the totality of a life form (“Lebensform” in the Wittgensteinian, non-phenomenological sense), including the complete “system” of norms in charge at a given moment. Yet, most rules are themselves regulated, by more abstract ones that set the conditions for the less abstract ones. The result is not a perfect hierarchy, of course; compiling the collection of rules active in a Lebensform is not an analytic endeavor. We already mentioned this layered system in another chapter (about “comparing”) and labeled it “orthoregulation” there. Rules are orthoregulated; without orthoregulation rules would not be rules.

This rooting of rules in the Forms of Life (Wittgenstein), the communal aspect (Putnam), the Field of Proposals (“Aussagefeld”, Foucault) or the Plane of Immanence provoked by attempting to think consistently (Deleuze), which are just different labels for closely related aspects, prevents the ultimate justification, the justifiable idea, and the presence of logical truth values or truth functions in actual life.

It is now important to recognize, and to keep in mind, that rules about rules do not refer to any empiric entity that could be found as a material or informational fact! Rules about rules refer to the regulated rules only. Of course, usually even the meta-rules are embedded into the larger context of valuation: the whole system should work somehow, that is, the whole system should allow to create predictive models. Here we find the link to risk (avoidance) and security.

Taking an empiricist or pragmatic stance also for the “meta”-rules that are part of the orthoregulative layer, we could well say that the empiric basis of the ortho-rules is other, less abstract and less general rules.

Now we can apply the principle of orthoregulation to the subject of theory. Several implications are immediately and clearly visible, namely and most important that

  • – theories are not about the prediction of empirical “non-normative” phenomena; the subject of Popper’s falsificationism is the model, not the theory;
  • – theories can not be formalized, because they are at least partially normative;
  • – facts can’t be “explained” as far as “explanations” are conceived to be non-normative entities;

It is clear that the standard account of the status of scientific theories is not compatible with this (which actually is a compliment). Mathias Frisch [5] briefly discusses some of the issues. Particularly, he dismisses the stance that

“the content of a theory is exhausted by its mathematical formalism and a mapping function defining the class of its models.” (p.7)

This approach is also shared by the influential Bas van Fraassen, especially in his 1980 book [6]. In contrast to this claim we definitely reject that there is any necessary consistency between models and the theory from which they have been derived, or among the family of models that could be associated with a theory. Life forms (Lebensformen) can not and should not be evaluated by means of “consistency”, unless you are a social designer, such as the one who once invented a variant of idealism, practicing in and on Syracuse… The rejection of a formal relationship between theories and models includes the rejection of the set-theoretic perspective on models. Since theories are normative they can’t be formalizable, and it is close to a scandal to claim ([6], p.43) that

Any structure which satisfies the axioms of a theory…is called a model of that theory.

The problem here is mainly the claim that theories consist of or contain axioms. Norms never have been and never will be “axiomatic.”
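For contrast, the set-theoretic usage just criticized can at least be made concrete. In model theory, a structure is called a model of a theory when it satisfies the theory’s axioms. A minimal, self-contained check that the integers modulo 5 under addition satisfy the group axioms (the example is mine, not van Fraassen’s):

```python
from itertools import product

# The "theory": the group axioms, stated as checks over a finite carrier.
# The "structure": the integers modulo 5 under addition.
N = 5
carrier = range(N)
op = lambda a, b: (a + b) % N

closure       = all(op(a, b) in carrier for a, b in product(carrier, repeat=2))
associativity = all(op(op(a, b), c) == op(a, op(b, c))
                    for a, b, c in product(carrier, repeat=3))
identity      = any(all(op(e, a) == a == op(a, e) for a in carrier)
                    for e in carrier)
inverses      = all(any(op(a, b) == 0 for b in carrier)  # 0 is the identity
                    for a in carrier)

# All axioms are satisfied, hence the structure is a "model" of group
# theory in exactly the set-theoretic sense quoted above.
print(closure and associativity and identity and inverses)
```

Note how little this formal notion of “model” has to do with the normative, condition-organizing role of theories argued for in this essay; that gap is precisely the point.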

There is a theory about belief revision that has been quite influential for the discipline or field called “Artificial Intelligence” (we dismiss this term/name, since it is either empty or misleading). This theory is known under the label AGM theory, where the acronym derives from the initials of the names of its three proponents, Alchourrón, Gärdenfors, and Makinson [7]. The history of its adoption by computer scientists is a story in itself [8]; what we can take from it here is that computer scientists believe that the AGM theory is relevant for the update of so-called knowledge bases.

Despite its popularity, the AGM theory is seriously flawed, as Neil Tennant has pointed out [9] (we will criticize his results in another essay about beliefs (scheduled)). A nasty discussion, mainly characterized by mutual accusations, started (see [10] as an example), which is typical for deficient theories.

Within AGM, and similar to van Fraassen’s account of the topic, a theory is equal to a set of beliefs, which in turn is conceived as a logically closed set of sentences. There are several mistakes here. First, they are applying truth-function logic as a foundation. This is not possible, as we have seen elsewhere. Second, a belief is not a belief any more as soon as we conceive it as a proposition, i.e. a statement within logic, i.e. under logical closure. It would be a claim, not a belief. Yet, claims belong to a different kind of game. If one wanted to express the fact that we can’t know anything precisely, e.g. due to the primacy of interpretation, we simply could take the notion of risk, which is part of a general concept of model. A further defect in AGM theory, and in any similar approach that tries to formalize the notion of theory completely, is that they conflate propositional content with the form of the proposition. Robert Brandom demonstrates in an extremely thorough way why this is a mistake, and why we are forced to the view that propositional content “exists” only as a mutual assignment between entities that talk to each other (chapter 9.3.4 in [11]). The main underlying reason for this is the primacy of interpretation.

In turn we can conclude that the AGM theory, as well as any attempt to formalize theory, can be conceived as a viable theory only if the primacy of interpretation is inadequate. Yet, this creates the problem of how we are tied to the world. The only alternative would be to claim that this happens somehow “directly”. Of course, such claims are either 100% nonsense, or 100% dictatorship.

Regarding the application of the faulty AGM theory to computer science we find another problem: knowledge can’t be saved to a hard disk, as little as this is possible for information. Only a strongly reductionist perspective, which is almost a caricature of what could be called knowledge, allows one to take that route.

We already argued elsewhere that a model neither can contain the conditions of its applicability nor of its actual application. The same applies of course to theories. As a direct consequence of that we have to investigate the role of conditions (we do this in another chapter).

Theories are precisely the “instrument” for organizing the conditions for building models. It is the property of being an instrument about conditions that renders them into an entity that is inevitably embedded into community. We could even bring in Heidegger’s concept of the “Gestell” (scaffold) here, which he coined in the context of his reflections about technology.

The subject of theories is models, not proposals about the empirical world, as far as we exclude models from the empirical world. The subject of Popper’s falsificationism is the realm of models. In the chapter about modeling we determined models as tools for anticipation, given the expectation of weak repeatability. These anticipations can fail, hence they can be tested and confirmed. Inversely, we also can say that every theoretical construct that can be tested is an anticipation, i.e. a model. Theoretical constructs that cannot be tested are theories. Mathias Frisch ([5], p.42) writes:

I want to suggest that in accepting a theory, our commitment is only that the theory allows us to construct successful models of the phenomena in its domain, where part of what it is for a model to be successful is that it represents the phenomenon at issue to whatever degree of accuracy is appropriate in the case at issue. That is, in accepting a theory we are committed to the claim that the theory is reliable, but we are not committed to its literal truth or even just of its empirical consequences.

We agree with him concerning the dismissal of truth or empiric content regarding theories. Yet, the term “reliable” could still be misleading. One never would say that a norm is reliable. Norms themselves can’t be called reliable, only the following of them. One does not just obey a norm; the norm is also something that has been fixed as the result of a social process, as a habit of a social group. From a wider perspective, we probably could assign that property, since we tend to expect that a norm supports us in doing so. If a norm did not support us, it would not “work,” and in the long run it would be replaced, often in a catastrophically sweeping event. That “working” of a norm is, however, almost unobservable by the individual, since it belongs to the Lebensform. We also should keep in mind that as far as we refer to such a reliability, it is not directed towards the prediction, at least not directly; it refers just to the possibility to create predictive models.

From safe ground we can now reject all the attempts to formalize theories along the line Carnap-Sneed-Stegmüller-Moulines [12, 13, 14, 15]. The “intended usage” of a theory (Sneed/Stegmüller) cannot be formalized, since it is related to the world, not just to an isolated subject. Scientific languages (Carnap’s enterprise) are hence not possible.

Of course, it is possible to create models about modeling, i.e. to take models as an empiric subject. Yet, such models are still not a theory, even if they look quite abstract. They are simply models, which imply or require a theory. Here lies the main misunderstanding of the folks cited above.

The turn towards languagability includes the removal of the dualistic contrast between theory and practice. This dualism is replaced by a structural perspective according to which theory and practice are co-extensive. Still, there are activities that we would not call a practice or an action, so to speak before any rule. Such activities are performances. Not least, this is also the reason why performance art is… art.

Heinrich Lüber, the Swiss performance artist, standing on top of a puppet shaped as himself. What is not visible here: he stood there for 8 hours, in the water on the shore of the French Atlantic coastline.

Besides performance (art) there are no activities that would be free of rules, or equivalently, free of theory. Modeling in particular is of course a practice, quite in contrast to theory. Another important issue we can derive from our distinction is that any model implies a theory, even if the model just consists of a particular molecule, as is the case in the perception mechanisms of individual biological cells.

A question we have to distinguish sharply from that about the reach of theories is whether the models predict well. And of course, just like norms, theories too can be inappropriate.

Theories are simply there. Theories denote what can be said about the influence of the general conditions—as present in the embedding “Lebenswelt”—onto the activity of modeling.

Theories thus can be described by the following three properties:

  • (1) A theory is the (social) practice of determining the conditions for the actualization of virtuals, the result of which are models.
  • (2) A theory acts as a synthesizing milieu, which facilitates the orthoregulated instantiation of models that are anticipatively related to the real world (where the “real world” satisfies the constraints of Wittgensteinian solipsism).
  • (3) A theory is a language generating language game.

Theories, Models, and in between

Most of the constructs called “theory” are nothing but a hopeless mixture of models and theories, committing serious naturalistic fallacies by comparing empiric “facts” with normative conditions. We will give just a few examples of this.

It is generally acknowledged that some of Newton’s formulas constitute his theory of gravitation. Yet, it is not a theory, it is a model. It allows for direct and, on the mesocosmic scale, even for almost lawful predictions about falling objects or astronomical satellites. Newton’s theory, however, is given by his belief in a certain theological cosmology. Due to this theory, which entails absoluteness, Newton was unable to detect relativity.
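As a small illustration of this model character (the example and numbers are ours, assuming free fall from rest in vacuum with g = 9.81 m/s²): the “almost lawful prediction” is simply a computation whose anticipation an experiment can confirm or refute.

```python
import math

G = 9.81  # m/s^2, standard gravity near the Earth's surface

def fall_time(height_m: float) -> float:
    """Predicted time for an object dropped from rest: t = sqrt(2h/g)."""
    return math.sqrt(2.0 * height_m / G)

# The model predicts; the empirical test decides whether the anticipation holds.
print(round(fall_time(45.0), 2))  # -> 3.03 (seconds)
```

Nothing in this computation refers to the theological cosmology that, on our account, formed Newton's actual theory; the formula alone is just the model.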

Similar is the case of Kepler. For a long time (more than 20 years) Kepler’s theory entailed the belief in a pre-established cosmic harmony that could be described by Euclidean geometry, which itself was considered at that time to be a direct link to divine regions. The first model that Kepler constructed to fulfill this theory comprised the inscription of the Platonic solids into the planetary orbits. But those models failed. Based on better observational data he derived different models, yet still within the same theory. Only when he dropped the role of the geometrical approach in his theory was he able to find his laws about the celestial ellipses. In other words, he dropped most of his theological orthoregulations.

Einstein’s work about relativity, finally, is clearly a model, as there is not only one formula. Einstein’s theory is not related to the space-time structure of the macroscopic universe. Instead, the conditions for deriving the quantitative / qualitative predictions are related to certain beliefs in the non-randomness of the universe. His conflict with quantum theory is well-known: “God does not play dice.”

The contemporary Standard Model in particle physics is exactly that: a model. It is not a theory. The theory behind the Standard Model is logical flatness and materialism. It is a considerable misunderstanding of most physicists to accuse the proponents of string theory of not providing predictions. They cannot, because they are thinking about a theory. Yet, string theorists themselves do not properly understand the epistemic role of their theory either.

A particular case is given by Darwin’s theory. Darwin of course did not distinguish perfectly or explicitly between models and theories; that was not possible in his day. Yet, throughout his writings and the organization of his work we can detect that he implicitly followed that distinction. From Darwin’s writings we know that he was deeply impressed by the non-random manifoldness in the domain of life. Precisely this represented the core of his theory. His formulations about competition, sexual selection or inheritance are just particular models. In our chapter about the abstract structure of evolution we formulated a model about evolutionary processes in a quite abstract way. Yet, it is still a model, within almost the same theory that Darwin once followed.2

There is a quite popular work about the historical dynamics of theory, Thomas Kuhn’s “The Structure of Scientific Revolutions“, which is not theory, but just a model. For large parts it is not even a model, but just a bad description, for which he coined the notion of the “paradigm shift”. There is almost no reflection in it. Above all, it is certainly not a theory about theory, nor a theory about the evolution of theories. He had to fail, since he did not distinguish between theories and models to the least extent.

So, leaving these examples: how do models and theories relate practically? Is there a transition between them?

Model of Theory, Theory of Model, and Theory of Theory

I think we can derive from these examples a certain relativity regarding the life-cycle of models and theories. Theories can be transformed into models through the removal of those parts that refer to the Lebenswelt, while models can be transformed into theories if the orthoregulative part of the models gets focused (or extracted from theory-models).

Obviously, what we just did was to describe a mechanism. We proposed a model. In the same way, using the concept of the language game to derive a structure for the concept of theory amounts to a model. Plainly spoken, so far we have created a model about theory.

As we have seen, this model also comprises proposals about the transition from model to theory. This transition may take two different routes, according to our model about theory. The first route is taken if a model gets extended by habits and further, mainly socially rooted, orthoregulation, until the original model appears just as a special case. The abstract view might still be only implicit, but it may be derived explicitly if the whole family of models that are possible within those orthoregulations is concretely constructed. The second route draws upon a proceeding abstraction, introducing thereby the necessity of instantiation. It is this necessity that decouples the former model from its capability to predict something.

Both routes, either by adding orthoregulations explicitly or implicitly through abstraction, turn the former model de actio into a milieu-like environment: a theory.

As productive milieus, theories comprise all components that allow the construction and the application of models:

  • – families of models as ensembles of virtualized models;
  • – rules about observation and perception, including the processes of encoding and decoding;
  • – infrastructural elements like alphabets or indices;
  • – axiomatically introduced formalizations;
  • – procedures of negotiation, the procedures of standardization, and other orthoregulations up to arbitrary order.

The model of model, on the other hand, we already provided here, where we described it as a 6-tuple, representing different, incommensurable domains. No possible way can be thought of from one domain to any of the others. These six domains are, by their labels:

  • (1) usage U
  • (2) observations O
  • (3) featuring assignates F on O
  • (4) similarity mapping M
  • (5) quasi-logic Q
  • (6) procedural aspects of the implementation

or, taken together:
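The formula referred to here is not reproduced in this version of the text; from the six labels above it can be reconstructed as a tuple (the symbol P for the procedural aspects (6) is our assumption, since the original list gives no letter for it):

```latex
m = \langle U,\; O,\; F,\; M,\; Q,\; P \rangle
```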

This model of model is probably the most abstract and general model that is not yet a theory. It provides all the docking stations that are required to attach the realm of norms. Thus, it would be only a small step to turn this model into a theory. That step towards a theory of model would include statements about two further dimensions: (1) the formal status and (2) the epistemic role of models. The first issue is largely covered by identifying models as a category (in the sense of category theory). The second part is related to the primacy of interpretation, that is, to a world view that is structured by (Peircean) sign processes and transcendental differences (in the Deleuzean sense).

The last twist concerns the theory of theory. There are good reasons to assume that for a theory of theory we need to invoke transcendental categories. Particularly, a theory of theory can’t contain any positive definite proposal, since in this case it would automatically turn into a model. A theory of theory can be formulated only as a self-referential, self-generating structure within transcendental conditions, where this structure can act as a borderless container for any theory about any kind of Lebensform. (This is the work of the chapter about the Choreosteme.)

Remarkably, we thus could not even formulate that we could apply a theory to itself, since a theory is a positive definite thing, even if it contained only proposals about conditions (yet, this is not possible either). Of course, this play between (i) ultimately transcendent conditions, (ii) mere performance that is embedded in a life form, and finally (iii) the generation of positivity within this field constitutes a quite peculiar “three-body problem” of mental life and (proto-)philosophy. We will return to that in the chapter about the choreosteme, where we also will discuss the issue of the “image of thought” (Gilles Deleuze) or, in slightly different terms, the “idioms of thinking” (Bernhard Waldenfels).


Finally, there should be our ceterum censeo, some closing remarks about the issue of machine-based episteme, or even machine-based epistemology. Already at the beginning of this chapter we declared our motivation. But what can we derive and “take home” in terms of constructive principles?

Our general goal is to establish—or to get clear about—some minimal set of necessary conditions that would allow us to arrange a “machinic substrate” in such a way that we could assign to it, in a fully justified manner, the property of “being able to understand”.

One of the main results in this respect was that modeling is nothing that could be thought of as running independently, as an algorithm, in such a way that we could regard this modeling as sufficient for ascribing to the machine the capability to understand. More precisely, it is not even the machine that is modeling; it is the programmer, or the statistician, the data analyst etc., who switched the machine into the ON-state. For modeling, knowing and theorizing, the machine would have to act autonomously.

On the other hand, performing modeling inevitably implies a theory. We just have to keep this theory somehow “within” the machine, or more precisely, within the sign processes that take place inside the machine. The ability to build theories necessarily implies self-referentiality of the informational processes. Our perspective here is that the macroscopic effects of self-referentiality, such as the ability to build theories, or consciousness, cannot be “programmed”; they have to be a consequence of the im-/material design aspects of the processes that make up these effects…

Another insight, though not a heavily surprising one, is that the ability to build theories refers to social norms. Without social norms there is no theorizing. It is not mathematics or science that is necessary; it is just the presence and accessibility of social norms. We could briefly call it education. Here we are aligned with theories (i.e. mostly models) that point to the social origins of higher cognitive functions. It is quite obvious that some kind of language is necessary for that.

The road to machine-based episteme thus does not imply a visit to the realms of robotics. There we will meet only insects and… robots. The road to episteme leads through languagability, and anything that is implied by that, such as metaphors or analogical thinking. These subjects will be the topic of the next chapters. Yet, it also defines the programming project accompanying this blog: implementing the ability to understand textual information.



1. The image in the middle of this triptych shows the situation in the first installation at the exhibition in Petrograd in 1915, arranged by Malevich himself. He put the “Black Square” exactly at the same place where traditionally the Christian cross was to be found in Russian living rooms at that time: up in the corner under the ceiling. This way, he invoked a whole range of reflections about the dynamics of symbols and habits.

2. Other components of our theory of evolutionary processes entail the principle of complexity, the primacy of difference, and the primacy of interpretation.

This article has been created on Oct 21st, 2011, and has been republished in a considerably revised form on Feb 13th, 2012.


  • [1] Stathis Psillos, Martin Curd (eds.), The Routledge Companion to Philosophy of Science. Taylor & Francis, London and New York 2008.

  • [2] Austin, Speech Act Theory;
  • [3] Wittgenstein, Philosophical Investigations;
  • [4] etymology of “theory”; “theorein”
  • [5] Mathias Frisch, Inconsistency, Asymmetry, and Non-Locality: A Philosophical Investigation of Classical Electrodynamics. Oxford 2005.
  • [6] Bas van Fraassen, The Scientific Image. Oxford University Press, Oxford 1980.

  • [7] Alchourrón, C., Gärdenfors, P. and Makinson, D. (1985). On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50, 510-30.
  • [8] Raúl Carnota, Ricardo Rodríguez (2011). AGM Theory and Artificial Intelligence. In: Belief Revision meets Philosophy of Science. Logic, Epistemology, and the Unity of Science, Vol. 21, pp. 1-42.

  • [9] Neil Tennant (1997). Changing the Theory of Theory Change: Reply to My Critics. Brit. J. Phil. Sci. 48, 569-586.

  • [10] Hansson, S. O. and Rott, H. (1995). How Not to Change the Theory of Theory Change: A Reply to Tennant. British Journal for the Philosophy of Science, 46, pp. 361-80.
  • [11] Robert Brandom, Making it Explicit. 1994.
  • [12] Carnap
  • [13] Sneed
  • [14] Wolfgang Stegmüller
  • [15] Moulines

