Similarity

December 30, 2011

Similarity appears to be a notoriously inflationary concept.

As early as 1979, a catalog of similarity measures in information retrieval, presumably incomplete even then, listed almost 70 ways to determine similarity [1]. In contemporary philosophy, however, similarity is almost absent as a concept, probably because it is considered merely a minor technical aspect of empirical activities. Often it is also associated with naive realism, which claimed a similarity between physical reality and concepts. Similarity is also a central topic in cognitive psychology, yet it is not often discussed there, probably for the same reasons as in philosophy.

In both disciplines, understanding is usually equated with drawing conclusions. Since drawing conclusions, and describing the kinds and circumstances of doing so, is considered the subject of logic (as a discipline), it is understandable that practitioners and theoreticians alike have rated logic as the master discipline. While there has been a vivid discourse about logical aspects for many centuries, the role of similarity has been largely neglected, and where vagueness makes its way to the surface, it is “analyzed” entirely within logic. Not surprisingly either, artificial intelligence focused strongly on a direct link to propositional logic and predicate calculus for a long period of time. This link has been represented by the programming language “Prolog,” an abbreviation standing for “programming in logic.” It was established in the first half of the 1970s, in Marseille and in close exchange with the so-called Edinburgh school. Let us just note that this branch of machine learning failed disastrously. Quite remarkably, the community suffering from this failure itself named its generally attested reason the “knowledge acquisition bottleneck.” Somehow the logical approach was completely unsuitable for getting in touch with the world, which is not really surprising for anyone who has understood Wittgenstein’s philosophical work, even if only partially. Today, the logic-oriented approach is generally avoided in machine learning.

As a technical aspect, similarity is abundant in the field of so-called data mining. Yet there it is not discussed as a subject in its own right. In this field, as represented by the respective software tools, rather primitive notions of similarity are employed, importing many questionable assumptions. We will discuss them a bit later.

There is a particular problem with the concept of similarity that also endangers many other abstract terms. It appears when the concept is equated with its operationalization. Science and engineering are particularly prone to failing to recognize this distinction. It is an inevitable consequence of the self-conception of science, particularly of the hypothetico-deductive approach [cf. 2], to assign ontological weight to concepts. Nevertheless, such an assignment always commits a naturalization fallacy. Additionally, we may suggest that ontology itself is a deep consequence of an overly scientific (say: positivistic, mechanistic, etc.) world view. Dropping the positivistic stance removes ontology as a relevant attitude.

As a consequence, science is not able to reflect on the concept itself. What science can do, regardless of the discipline, is merely to propose further variants as representatives of a hypothesis, or to classify the various proposed approaches. This poses a serious secondary methodological problem, since it equally holds that there is no science without a transparent usage of the concept of similarity. Science should control the free parameters of experiments and their valuation. Somewhat surprisingly, almost the “opposite” can be observed. The fault is introduced by statistics, as we will see, and this result came as a real surprise even to me.

A special case is provided by “analytical” linguistics, where we can observe a serious case of reduction. In [3], the author chooses the title “Vagueness and Linguistics,” but also admits that “In this paper I focused my discussion on relative adjectives.” Well, vagueness can hardly be restricted to relative adjectives, even in linguistics. Even more astonishing is the fact that similarity does not appear as a subject at all in the cited article (except in a reference to another author).

In the field concerned with the theory of metaphor [cf. 4, or 5], one can find many references to similarity. In every case known to me, however, it is regarded as something “elementary” and unproblematic. Obviously, neither extra-linguistic modeling nor any kind of inner structure of similarity is recognized as important or even as possible. No transparent discourse about similarity and modeling is available from this field.

From these observations it is possible in principle to derive two different and mutually exclusive conclusions. First, we could conclude that similarity is irrelevant for understanding phenomena like language understanding or the empirical constitution. We do not believe that. Second, it could be that similarity represents a blind spot across several communities. Therefore we will try to provide a brief overview of some basic topics regarding the concept of similarity.

Etymology

Let us add some etymological considerations for a first impression. Words like “similar,” “simulation” or “same” all derive from the Proto-Indo-European (PIE) base “*sem-/*som-,” which meant “together, one,” and later, in Old English, “same.” Yet the notion of the “simulacrum” also belongs to the “same cloud”; the simulacrum is a central issue in the earliest philosophy known to us in sufficient detail (Plato).

The German word “ähnlich,” the direct translation of “similar,” derives from Old High German (althochdeutsch, before ca. 1050 AD) “anagilith” [6], a compound of an- and gilith, together meaning something like “angleichen,” for which in English we find the words adapt, align, adjust or approximate, but also “conform to” or “blend.” The similarity to “sema” (“sign”) seems to be only superficial; it is believed that sema derives from PIE “dhya” [7].

If some items are said to be “similar,” what is meant is that they are not “identical,” where identical means indistinguishable. To make them (virtually) indistinguishable, they would have to be transformed. Even from etymology we can see that similarity requires an activity before it can be attested or assigned. Similarity is nothing to be found “there”; instead, it is something that one produces in a purposeful manner. This constructivist aspect is quite important for the following considerations.

Common Usage of Similarity

In this section, we will inspect the usage of the concept of “similarity” in some areas of particular relevance. We will visit cognitive psychology, information theory, data mining and statistical modeling.

Cognitive Psychology

Let us start with the terminology developed in cognitive psychology, where one can find a richly differentiated concept of similarity. It started with the work of Tversky [8], while Goldstone has more recently provided a useful overview [9].

Tversky, a highly innovative researcher on cognition, tried to generalize the concept of similarity. His intention was to overcome the typical weakness of “geometric models,” which “[…] represent objects as points in some coordinate space such that the observed dissimilarities between objects correspond to the metric distances between the respective points.” The major assumption (and drawback) of geometric models is the (metric) representability in a coordinate space. A typical representative of what Tversky calls “geometric models” employs the nowadays widespread Euclidean distance as an operationalization of similarity.

Tversky summarizes his own proposal as follows:

A new set-theoretical approach to similarity is developed in which objects are represented as collections of features, and similarity is described as a feature-matching process. Specifically, a set of qualitative assumptions is shown to imply the contrast model, which expresses the similarity between objects as a linear combination of the measures of their common and distinctive features.

Tversky’s critique of the “geometric approach” applies only if two restrictions are in effect: (1) missing values are disregarded, which is indeed the case in most practice; (2) the dimensional interpretation is considered stable and unchangeable, i.e. no folding or warping of the data space through transformation of the measured data is applied.

Yet it is necessary neither to disregard missing values in a feature-based approach nor to dismiss dimensional warping. Here Tversky does not differentiate between the form of representation and the actual rule for establishing the similarity relation. This conflation is quite common in statements about similarity and its operationalization.

What Tversky effectively proposed is now known as binning. His approach is based on features, though in a way quite different (at first sight) from our proposal, as we will show below. Yet it is not the values of the features that are compared; instead, the two feature sets are compared at the level of the items by means of a particular ratio function. From a different perspective, the data scale used for assessing similarity is reduced to the ordinal or even the nominal scale. Tversky’s approach is thus prone to destroying information present in the “raw” signal.

An attribute (Tversky’s “feature”) that occurs in different grades or shades is translated into a small set of different, distinct and mutually exclusive features. Tversky obviously does not recognize that binning is just one of many possible ways to deal with observational data, i.e. to transform it. Applying a particular transformation based on some theory in a top-down manner is equivalent to claiming that the selected transformation is a perfect filter for the data actually given. Of course, this claim is deeply inadequate (see the chapter about technical aspects of modeling). Any untested, axiomatically imposed algorithmic filter may destroy precisely those pieces of information that would have been vital for a satisfying model. One simply cannot know this beforehand.
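The information loss caused by binning can be shown with a minimal sketch. We assume, purely for illustration, three equal-width bins over [0..1]; the names `bin_value`, `same_feature` and `EDGES` are hypothetical. Two nearly equal grades may fall into different nominal features, while two clearly different grades may be merged into one.

```python
def bin_value(x, edges):
    """Map a graded value to the index of the first bin whose upper edge exceeds it."""
    for i, edge in enumerate(edges):
        if x < edge:
            return i
    return len(edges)

# Hypothetical edges splitting [0..1] into three nominal features: low / mid / high.
EDGES = [1 / 3, 2 / 3]

def same_feature(x, y, edges=EDGES):
    """After binning, two grades either share a nominal feature or they do not."""
    return bin_value(x, edges) == bin_value(y, edges)

# 0.32 and 0.34 are nearly equal but straddle an edge;
# 0.34 and 0.65 are far apart but land in the same bin.
print(same_feature(0.32, 0.34))  # False
print(same_feature(0.34, 0.65))  # True
```

Whatever the edges are, some such pair can always be found; the choice of edges is itself an untested hypothesis about the data.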

Tversky’s approach is built on feature sets. The differences between those sets (on a nominal level) are meant to represent the similarity and are expressed by the following formula:

$$ s(a,b) = F(A \cap B,\; A - B,\; B - A) \qquad \text{(eq.1)} $$

which Tversky describes in the following way:

The similarity of a to b is expressed as a function F of three arguments: A ∩ B, the features that are common to both a and b; A-B, the features that belong to a but not to b; B-A, the features that belong to b but not to a.

This formula also reflects what he calls “contrast.” (It is similar to Jaccard’s coefficient; so to speak, an extended practical version of it.) Yet Tversky, like every other member of the community of cognitive psychologists referring to this or a similar formula, did not recognize that features treated in this way are all equally weighted. This is a consequence of sticking to set theory. Again, it is just the investigator’s fallback position of initial ignorance. In the real world, however, features are differentially weighted, thereby building a context. In the chapter about the formalization of the concept of context we propose a more adequate way to think about feature sets, though our concept of context shares important aspects with Tversky’s approach.

Tversky emphasizes that his concept does not consist of just one single instance or formula. He introduces weighting factors for the terms of eq.1, which then leads to families of similarity functions. To our knowledge this is the only instance (besides ours) arguing for a manifold regarding similarity. Yet, again, Tversky does not draw the conclusion that the chosen instance of a similarity “functional” (see below) has to be conceived of as just a hypothesis.
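Taking Tversky’s measure f to be plain set cardinality, which is exactly the equal weighting criticized above, the contrast model S(a,b) = θ·f(A∩B) − α·f(A−B) − β·f(B−A) and the ratio model S(a,b) = f(A∩B) / (f(A∩B) + α·f(A−B) + β·f(B−A)) can be sketched as follows. The weights θ, α, β generate the family of similarity functions; in the ratio model, α = β = 1 gives the Jaccard coefficient and α = β = 1/2 the Dice coefficient. Function names and example features are ours, for illustration only.

```python
def contrast_model(A, B, theta=1.0, alpha=1.0, beta=1.0):
    """Tversky's contrast model with f taken as set cardinality (all features weighted equally)."""
    A, B = set(A), set(B)
    return theta * len(A & B) - alpha * len(A - B) - beta * len(B - A)

def ratio_model(A, B, alpha=1.0, beta=1.0):
    """Tversky's ratio model; alpha = beta = 1 yields the Jaccard coefficient."""
    A, B = set(A), set(B)
    common = len(A & B)
    denom = common + alpha * len(A - B) + beta * len(B - A)
    # Convention (our assumption): two empty descriptions count as fully similar.
    return common / denom if denom else 1.0

a = {"round", "red", "sweet", "small"}
b = {"round", "red", "sour"}
print(contrast_model(a, b))           # 2 - 2 - 1 = -1.0
print(ratio_model(a, b))              # Jaccard: 2 / 5 = 0.4
print(ratio_model(a, b, 0.5, 0.5))    # Dice: 2 / 3.5, about 0.571
print(ratio_model(a, b, 1.0, 0.0))    # 2 / 4 = 0.5
print(ratio_model(b, a, 1.0, 0.0))    # 2 / 3, about 0.667: asymmetric
```

The last two lines show that unequal weights make the measure asymmetric, one of the properties Tversky used against metric models.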

In cognitive psychology (even today), the term “feature-based models” of similarity does not refer to feature vectors as they are used in data mining, nor to generalized vectors of assignates as we proposed them in our concept of the generalized model. In Tversky’s article this becomes manifest on p.330. Contemporary psychologists like Goldstone [9] distinguish four different ways of operationalizing similarity: (1) geometric, (2) feature-based, (3) alignment-based, and (4) transformational similarity. For Tversky [8] and Goldstone, the label “geometric model” refers to models based on feature vectors as they are used in data mining, e.g. with the Euclidean distance.

Our impression is that cognitive psychologists fail to think about features and similarity in a sufficiently abstract manner. Additionally, there seems to be a tendency toward the representationalist fallacy: features are only recognized as features insofar as they appear “attached” to the object for the human senses. Once this attitude is dropped, it becomes an easy exercise to subsume all four types under a feature-vector approach that (1) allows for missing values and assigns them a “cost,” and (2) is not limited to primitive distance functions like the Euclidean or Hamming distance. The underdeveloped generality is especially visible concerning the alignment-based and transformational subtypes of similarity.
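Point (1) can be sketched as follows. This is a minimal illustration under our own assumptions, not an established method: missing values (`None`) stay in the comparison, and each one contributes a fixed, hypothetical penalty `missing_cost` instead of being silently dropped. The per-feature term is a squared difference, which could be exchanged for any other comparison, in line with point (2).

```python
import math

def distance_with_missing(x, y, missing_cost=0.5):
    """Euclidean-style distance over profiles that may contain None.

    Positions where either profile is missing are not dropped; each one
    contributes the (hypothetical) penalty missing_cost to the distance.
    """
    if len(x) != len(y):
        raise ValueError("profiles must share the same feature vector")
    total = 0.0
    for xi, yi in zip(x, y):
        if xi is None or yi is None:
            total += missing_cost ** 2  # assigned cost of a missing value
        else:
            total += (xi - yi) ** 2     # ordinary per-feature comparison
    return math.sqrt(total)

# 0.3 difference on feature 2, a missing value on feature 3: sqrt(0.09 + 0.25)
print(distance_with_missing([0.2, 0.4, None], [0.2, 0.1, 0.9]))
```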

A further gap in the theory of similarity in cognitive psychology is the missing separation between the operation of comparison and the operationalization of similarity as a projective transformation into a 0-dimensional space, that is, into a scalar (a single value). In our opinion, this distinction is vital for understanding the characteristics of comparison. If one does not separate similarity from comparison, it becomes impossible to become aware of higher forms of comparison.

Information theory

A much more profound generalization of similarity, at least at first sight, has been proposed by Dekang Lin [10], based on an “information-theoretic definition of similarity that is applicable as long as there is a probabilistic model.” The main advantage of this approach is its wide applicability, even in cases where only coarse frequency data are available. Unfortunately, Lin’s proposal neglects a lot of information if accurate measurements are available in the form of feature vectors. Besides the comparison of strings and statistical frequency distributions, Lin’s approach is applicable to sets of features, but not to profile-based data as we propose them for our generalized model.
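For feature sets, Lin’s definition takes a concrete form when the features are assumed to be independent: similarity is twice the information content of the common features divided by the summed information content of both descriptions, where the information content of a feature f is −log P(f). A minimal sketch with hypothetical feature probabilities; it also exhibits the limitation noted above, since only the presence or absence of a feature enters, never its measured value.

```python
import math

def lin_similarity(A, B, prob):
    """Lin's information-theoretic similarity for feature sets.

    Assumes independent features, so the information content of a set is
    the sum of -log P(f) over its features; prob maps each feature to its
    (estimated) probability.
    """
    def info(features):
        return -sum(math.log(prob[f]) for f in features)

    denom = info(A) + info(B)
    return 2 * info(set(A) & set(B)) / denom if denom else 1.0

# Hypothetical feature frequencies estimated from some corpus.
P = {"round": 0.5, "red": 0.25, "sweet": 0.25, "sour": 0.125}
print(lin_similarity({"round", "red", "sweet"}, {"round", "red", "sour"}, P))  # 6/11
```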

Data Mining

Data mining is a distinguished set of tools and methods that are employed in a well-organized manner in order to facilitate the extraction of relevant patterns [11], either for predictive or for diagnostic purposes. Data mining (DM) is often conceived as a part of so-called “knowledge discovery,” giving rise to the famous abbreviation KDD: knowledge discovery in databases [11]. In our opinion, the term “data mining” is highly misleading, and “knowledge discovery” is even deceptive. In contrast to mining in the earth, in the case of information the valuable objects are not “out there” like minerals or gems, and knowledge can’t be “discovered” like diamonds or physical laws. Even the “retrieval” of information is impossible in principle. To think otherwise dismisses the need for interpretation and hence contradicts widely acknowledged positions in contemporary epistemology. One has to know that the terms “KDD” and “data mining” are shallow marketing terms, coined to release the dollars of naive customers. Yet KDD and DM are myths that many people believe in and that are reproduced in countless publications. As concepts, they remain utter nonsense. As a nonsensical practice deeply informed by positivism, it is harmful to society. It is more appropriate, and more down-to-earth, to call the respective activity simply diagnostic or predictive modeling (which are actually equivalent).

Any observation of entities takes place along a priori selected properties, often physical ones. This selection of properties is part of creating an operationalization, which means making a concept operable by making it measurable. Yet those properties are not “natural properties of objects.” Quite to the contrary, objecthood is created by the assignment of a set of features. This inversion is often overlooked in data mining projects, and consequently so is the eminently constructive character of data-based modeling. Hence it is likewise not correct to call it “data analysis”: an analysis does not add anything. Predictive/diagnostic models are constructed and synthesized like small machines; models may well be conceived as informational machinery. To make our point clear: nobody among the large community of machine-building engineers would support the view that any machine comes into existence just through “analysis.”

Given the importance of similarity in comparison, it is striking that in many books about data mining the notion of “similarity” does not appear a single time [e.g. 12], and in many more publications it appears only in a very superficial manner. Usually it is believed that the Euclidean distance is a sound, sufficient and appropriate operationalization of similarity. Given its prevalence, we have to take a closer look at this concept: how it works, and how it fails.

Euclidean Distance and its Failure

We have already met the idea that objects are represented along a set of selected features. In the chapter about comparison we saw that in order to compare items of a population of objects, those objects have to be compared on the basis of a selected and shared feature set. Next, it is clear that for each of the features some values can be measured. For instance, presence could be indicated by the dual pair of values 1/0. For nominal values like names, re-scaling mechanisms have been proposed [13]. Thus any observation can be transformed into a table of values, where the (horizontal) rows represent the objects and the columns describe the features.

We can also say that each of the objects contained in such a table is represented by a profile. Note that the order of the columns (features) is arbitrary, but it is constant for all objects covered by the table.

The idea now is that each of the columns represents a dimension in a Cartesian, orthogonal coordinate system. As a preparatory step, we normalize the data, i.e. for each single column the values contained in it are scaled such that the ratios remain unchanged, while the absolute values are projected into the interval [0..1].
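A minimal sketch of this normalization step, assuming min-max scaling per column (one common choice; the function name is ours). Strictly speaking, min-max scaling preserves the ratios of differences within a column; dividing each non-negative column by its maximum would instead preserve the raw ratios.

```python
def min_max_normalize(table):
    """Rescale each column of a row-wise table (objects x features) into [0..1].

    The transformation is linear, so ratios of differences between values
    within a column are preserved; a constant column is mapped to 0.
    """
    columns = list(zip(*table))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose back: rows are objects (profiles), columns are features.
    return [list(row) for row in zip(*scaled)]

profiles = [[2.0, 10.0, 5.0],
            [4.0, 30.0, 5.0],
            [6.0, 20.0, 5.0]]
print(min_max_normalize(profiles))
# [[0.0, 0.0, 0.0], [0.5, 1.0, 0.0], [1.0, 0.5, 0.0]]
```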

By means of such a representation, each of the objects (= data rows) can be conceived as a particular point in the space spanned by the coordinate system. The similarity S is then operationalized as the “inverse” of the distance d between any two of the points, S = 1 − d, where d has to be scaled into [0..1]. The distance can be calculated according to the Euclidean formula for the length of the hypotenuse of a right triangle (2D case). In this way, the points are understood as the endpoints of vectors that start at the origin of the coordinate system. Hence this space is often called “data space” or “vector space,” and the distance is called “Euclidean distance.”

Since all of the vectors lie in the positive unit hypercube (every value is in [0..1]), there is another possibility for operationalizing similarity: instead of the distance, one could take the angle between any two of those vectors. This yields the so-called cosine measure of (dis-)similarity.
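Both operationalizations can be sketched in a few lines (function names are ours). Since S = 1 − d stays within [0..1] only if d is scaled, we assume here that the Euclidean distance is divided by √k, the largest distance possible between two points of [0..1]^k.

```python
import math

def euclidean_similarity(x, y):
    """S = 1 - d, with d the Euclidean distance divided by sqrt(k).

    For profiles in [0..1]^k the largest possible distance is sqrt(k),
    so this scaling keeps S within [0..1].
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 - d / math.sqrt(len(x))

def cosine_similarity(x, y):
    """Cosine of the angle between two profile vectors (in [0..1] for non-negative data)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

print(euclidean_similarity([0.0, 0.0], [1.0, 1.0]))  # 0.0, the maximal distance
print(cosine_similarity([0.2, 0.4], [0.4, 0.8]))      # 1.0 up to rounding
```

Note that the cosine measure ignores the lengths of the vectors: two profiles that differ only by a constant factor, as in the second call, receive a similarity of 1, whereas it remains sensitive to additive offsets.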

Besides the fact that missing values are often (and wrongly) excluded from a feature-vector-based comparison, this whole procedure has a serious built-in flaw, whether it uses the cosine measure or the Euclidean distance.

Figure 1a below shows the profiles of two objects across a set of assignates (aka attributes, features, properties, fields). The embedding coordinate space has k dimensions. One can see that the black profile (representing object/case A) and the red profile (representing object/case B) are fairly similar. Note that within the method of the Euclidean distance all a_i are assumed to be independent of each other.

Figure 1a: Two objects A’ and B’ have been represented as profiles A, B across a shared feature vector a_i of size k.

Next, we introduce a third profile, representing object C. Suppose that the correlation between profiles A and C is almost perfect. This means that the inner structure of objects A and C could be considered very similar. Some additional factor might just have damped the signal, such that all values are proportionally lower by an almost constant ratio when compared to the values measured from object A.

Figure 1b: Compared to figure 1a, a third object C’ is introduced as a profile C; this profile causes a conflict about the order that should be induced by the similarity measure. There are (very) good reasons, from systems theory as well as from information theory, to consider A and C more similar to each other than either A-B or B-C. Nevertheless, employing Euclidean distance will lead to a different result, rating the pairing A-B as the most similar one.

The particular difficulty is that which pairs of observations are considered more similar to each other depends on considerations lying completely outside the chosen operationalization of similarity. Yet this dependency inverts the intended arrangement of the empirical setup: the determination of similarity should actually be used to decide about those outside considerations. Given the Euclidean distance, A and B are clearly much more similar to each other than either A-C or B-C. Using a correlative measure instead would select A-C as the most similar pairing. This effect becomes more and more serious the more assignates are used to compare the items.
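The conflict of figure 1b can be reproduced with a few made-up numbers. In this sketch (profiles and helper names are hypothetical), C is A damped by a constant ratio of 0.5, and B deviates from A by 0.1 in alternating directions. The Euclidean distance ranks A-B as the closest pair, while the Pearson correlation, chosen here as one example of a correlative measure, ranks A-C first.

```python
import math
from itertools import combinations

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation: insensitive to shifts and proportional scaling of a profile."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical profiles in the spirit of figure 1b.
A = [0.8, 0.6, 0.9, 0.4, 0.7, 0.5]
B = [0.7, 0.7, 0.8, 0.5, 0.6, 0.6]  # close to A in value, different in form
C = [0.5 * v for v in A]            # same form as A, damped by a constant ratio
profiles = {"A": A, "B": B, "C": C}

pairs = list(combinations(profiles, 2))
closest_euclid = min(pairs, key=lambda p: euclidean(profiles[p[0]], profiles[p[1]]))
closest_corr = max(pairs, key=lambda p: pearson(profiles[p[0]], profiles[p[1]]))
print(closest_euclid)  # ('A', 'B')
print(closest_corr)    # ('A', 'C')
```

Which of the two answers is “right” cannot be decided within either measure; that is exactly the dependency on outside considerations described above.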

Now imagine that there are many observations, dozens, hundreds or hundreds of thousands, that serve as a basis for deriving an intensional description of all observations. It is quite obvious that the final conclusions will differ drastically depending on the selection of the similarity measure. The choice of the similarity measure is by no means of merely technical interest. The particular problem, but also, as we will see, the particular opportunity related to the operationalization of similarity consists in the fact that there is a rather short and strong link between a technical aspect and its semantic effect.

Yet there are measures that reflect the similarity of the form of the whole set of items more appropriately, such as least-squares distances, or measures based on correlation or covariance, like the Mahalanobis distance. However, these analytic measures have the disadvantage of relying on certain global parametric assumptions, such as a normal distribution. And even in theory they do not completely resolve the situation shown in figure 1b.
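A minimal sketch of the Mahalanobis distance, restricted to two features so that the inverse of the covariance matrix can be written out by hand (the function name and sample are hypothetical). The covariance is estimated from a population sample, which is exactly where the global parametric assumptions enter. Two steps of equal Euclidean length receive very different distances depending on whether they follow or cut across the correlation in the sample.

```python
import math

def mahalanobis_2d(x, y, sample):
    """Mahalanobis distance between two 2-feature profiles.

    The covariance matrix is estimated from sample (a list of 2-feature
    rows); with two features its inverse can be written out explicitly.
    """
    n = len(sample)
    m0 = sum(r[0] for r in sample) / n
    m1 = sum(r[1] for r in sample) / n
    s00 = sum((r[0] - m0) ** 2 for r in sample) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in sample) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in sample) / (n - 1)
    det = s00 * s11 - s01 ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    d0, d1 = x[0] - y[0], x[1] - y[1]
    # Quadratic form d^T S^-1 d with S^-1 = [[s11, -s01], [-s01, s00]] / det
    q = (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det
    return math.sqrt(q)

# Hypothetical, strongly correlated population sample.
sample = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 5.0)]
along = mahalanobis_2d((3.0, 3.0), (4.0, 4.0), sample)   # step follows the correlation
across = mahalanobis_2d((3.0, 3.0), (4.0, 2.0), sample)  # step cuts across it
print(along, across)  # small versus large, although both steps have length sqrt(2)
```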

We just mentioned that the coherence of value items may be regarded as a form. Thus it is quite natural to use a similarity measure that is derived from geometry or topology and that does not suffer from any particular analytic a priori assumption. One such measure is the Hausdorff metric or, more generally, the Gromov-Hausdorff metric. Having been developed in geometry, they find their “natural” application in image analysis, such as the partial matching of patterns to larger images (aka finding “objects” in images). For the comparison of profiles we have to interpret them as figures in a 2-dimensional space, with |a_i|/2 coordinate points. Two such figures can then be compared. The Hausdorff distance is also quite interesting because it allows one to compare whole sets of observations: not only two paired observations (profiles) interpreted as coordinates in ℝ², but also three observations as ℝ³, or a whole set of n observations, arranged as a table, as a point cloud in ℝⁿ. Assuming compactness, i.e. a macroscopic world without gaps, we may also interpret them as curves. This allows one to compare whole subsets of observations at once, which is a quite attractive feature for the analysis of relational data. As far as we know, nobody has ever used the Hausdorff metric in this way.
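The Hausdorff distance between two finite point sets is the largest distance from a point of either set to its nearest point in the other set. The text leaves the exact embedding of a profile open; purely for illustration we assume here that a profile of k values becomes the k points (i/(k-1), value_i), which is only one possible reading (the |a_i|/2 construction mentioned above would pair values differently). All names are ours; the brute-force search is quadratic in the number of points.

```python
import math

def directed_hausdorff(P, Q):
    """Largest distance from a point of P to its nearest point in Q."""
    return max(min(math.dist(p, q) for q in Q) for p in P)

def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(P, Q), directed_hausdorff(Q, P))

def profile_to_points(profile):
    """Hypothetical embedding of a profile (k >= 2) as a 2-D figure: (position, value)."""
    k = len(profile)
    return [(i / (k - 1), v) for i, v in enumerate(profile)]

A = [0.8, 0.6, 0.9, 0.4, 0.7, 0.5]
B = [0.7, 0.7, 0.8, 0.5, 0.6, 0.6]
print(hausdorff(profile_to_points(A), profile_to_points(B)))  # about 0.1
```

Because `hausdorff` accepts point sets of any dimension, the same function applies unchanged to the point clouds in ℝ³ or ℝⁿ mentioned above.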

Epistemologically, it is interesting that a topologically inspired assessment of data provides a formal link between feature-based observations and image processing. Maybe this is relevant for the subjective impression of thinking in “images,” though nobody has ever been able to “draw” such an image… In this way, the idea of “form” in thought could acquire a significant meaning.

Yet already in his article published more than 30 years ago, Tversky [8] noted that the metric approach is barely convincing. He writes (p.329):

The applicability of the dimensional assumption is limited, […] minimality is somewhat problematic, symmetry is apparently false, and the triangle inequality is hardly compelling.

It is of utmost importance to understand that the selection of the similarity measure, as well as the selection of the features used to calculate it, are by far the most important factors in the determination, or better, the hypothetical presupposition, of the similarity between the profiles (objects) to be compared. The similarity measure and the feature selection are far more important than the selection of a particular method, i.e. a particular way of organizing the application of the similarity measure. Saying “more important” also means that the differences in the results are much larger between different similarity measures than between methods. From a methodological point of view it is thus quite important that the similarity measure is “accessible” and not buried in a “heap of formulas.”

Similarity measures that are based only on dimensional interpretation and coordinate spaces are not able to represent issues of form and differential relations, which are also (and better) known as “correlation.” Of course, approaches other than correlation that reflect the form aspect of the internal relations of a set of variables (features) would do the job, too. We just want to emphasize that the assumption of perfect independence among the variables is “silly” in the sense that it contradicts the “game” that the modeler actually pretends to play. More often than not, this leads to irrelevant results. The serious aspect, however, is that this deficiency remains invisible when comparing results between different models built according to the Euclidean dogma.

There is only one feasible conclusion from this: similarity can’t be regarded as a property of actual pairings of objects. The similarity measure is a free parameter in modeling, that is, nothing other than a hypothesis, though on the structural level. As a hypothesis, however, it needs to be tested for adequacy.
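One minimal way to treat the similarity measure as a testable hypothesis, assumed here for illustration and not the procedure of this text, is to compare candidate measures by leave-one-out nearest-neighbour accuracy on labelled observations. In the hypothetical data below, the class depends on the form of a profile (rising or falling), not on its amplitude.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def loo_accuracy(data, labels, similarity):
    """Leave-one-out 1-nearest-neighbour accuracy under a given similarity function."""
    hits = 0
    for i, x in enumerate(data):
        best = max((j for j in range(len(data)) if j != i),
                   key=lambda j: similarity(x, data[j]))
        hits += labels[best] == labels[i]
    return hits / len(data)

# Hypothetical labelled profiles: the class is given by the form, not the amplitude.
data = [[0.1, 0.2, 0.3], [0.5, 0.7, 0.9], [0.3, 0.2, 0.1], [0.9, 0.7, 0.5]]
labels = ["rising", "rising", "falling", "falling"]

print(loo_accuracy(data, labels, lambda x, y: -euclidean(x, y)))  # 0.0
print(loo_accuracy(data, labels, pearson))                        # 1.0
```

The Euclidean hypothesis is rejected here and the correlative one survives; with other data the outcome may well be reversed, which is the point of testing rather than presupposing.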

Similarity in Statistical Modeling

In statistical modeling the situation is even worse. Usually, the subject of statistical modeling is not the individual object or its representation. The reasoning in statistical modeling differs strongly from that in predictive modeling. Statistics compares populations, or at least groups as estimates of populations. Depending on the scale of the data, the amount of data, the characteristics of the data and the intended argument, a specialized method has to be picked from a large variety of potential methods. Often the selected method also has to be parameterized. As a result, the whole process of creating a statistical model is more “art” than science. Results of statistical “analysis” are only approximately reproducible across analysts. It is indeed something of an irony that at the heart of quantitative science one finds a non-scientific methodological core.

Anyway, our concern is similarity. In statistical modeling no similarity function is visible at all. All one can see is the result and statements like “population B is not part of population A, with a probability of being a false positive of 3%.” Yet the final argument that populations can (or can’t) be discerned is obviously also an argument about the probability of a correct assignment of the members of the compared populations. Hence it is also clearly an argument about the group-wise as well as the individual similarity of the objects. The really bad thing is that the similarity function is barely visible at all. Often it is some kind of simple difference between values. The main point is that it is not possible to parametrize the hidden similarity function, except by choosing the alpha level of the test. It is “distributed” across the whole procedure of the respective method. In its most important aspect, any statistical method has to be regarded as a black box.
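To make the hidden comparison tangible, consider Welch’s two-sample t statistic, used here only as a familiar example and not as a claim about any particular study. The only comparison between the groups that it performs is the difference of their means, scaled by the standard error; in the hypothetical data below, two groups with entirely different members are indistinguishable to it.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples.

    The implicit comparison is the difference of the group means, scaled by
    the combined standard error; individual values enter only through the
    means and variances.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Two hypothetical groups with identical means but very different members.
g1 = [0.5, 0.5, 0.5, 0.5]
g2 = [0.25, 0.75, 0.0, 1.0]
print(welch_t(g1, g2))  # 0.0: the test sees no difference at all
```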

These problems with statistical modeling persist regardless of the general framework, i.e. whether one chooses a frequentist or a Bayesian attitude. Recently, Alan Hájek [14] showed that statistics, in all its flavors, suffers from the reference class problem. Cheng [15] correctly notes about the reference class problem that

“At its core, it observes that statistical inferences depend critically on how people, events, or things are classified. As there is (purportedly) no principle for privileging certain categories over others, statistics become manipulable, undermining the very objectivity and certainty that make statistical evidence valuable and attractive …”

So we can see that the reference class problem is just a corollary of the fact that the similarity function is not given explicitly and hence is not accessible. Thus Cheng seeks an unfulfillable salvation by invoking the very cause of the defect: statistics. He writes:

I propose a practical solution to the reference class problem by drawing on model selection theory from the statistics literature.

Although he is right in pointing to the necessity of model selection, he fails to recognize that statistics can’t help with this task. We find it interesting that Cheng has been writing for the community of legal theorists. This sheds a bright light on the relevance of an appropriate theory of modeling.

As a consequence we conclude that statistical methods should not be used as the main tool for any diagnostic/predictive modeling of real-world data. The role of statistical methods in predictive/diagnostic modeling is just the same as that of any other transformation: they are biased filters whose adequacy has to be tested, nothing less and, above all, definitely nothing more. Statistics should be used only within completely controllable, hence completely closed environments, such as simulations or “data experiments.”

The Generalized View

Before we start, we would like to recall the almost trivial point that the concept of similarity makes sense exclusively in the context of diagnostic/predictive modeling, where “modeling” refers to the generalized model, which in turn is part of a transcendental structure.

Having briefly discussed the relation of the concept of similarity to some major domains of research, we may now turn to the construction/description of a proper concept of similarity. The generalized view that we are going to argue for should help in determining the appropriate mode of speaking about similarity.

Identity

Identity is often seen as the counterpart of similarity, or as some kind of simple asymptotic limit of it. Yet these two concepts are so deeply incommensurable that they cannot be related at all.

One could suggest that identity is a relation indicating a particular result of a comparison, namely indistinguishability. We could then also say that under any possible transformation applied to identical items, the respective items remain indistinguishable. Yet if we compare two items, we refer to the concept of similarity, from which we want to distinguish identity. Thus it is clear that identity and similarity are structurally different. There is no way from one to the other.

In other words, the language game of identity excludes any possibility of comparison. We can set identity only by definition, axiomatically. This means not only that the two concepts can’t be related to each other; additionally, we see that the subjects of the two concepts are categorically different. Identity is meaningful only as an axiomatically introduced equality of symbols.

In still other words we could say that identity is restricted to axiomatically defined symbols in formal surrounds, while similarity is applicable only in empirical contexts. Similarity is not about symbols, but about measurement and the objects constructed from it.

This has profound consequences.

First, identity can’t be regarded as a kind of limit which similarity would asymptotically approach. For any two objects that have been rated as being “equal,” notably through some sort of comparison, it is thus possible to find a perspective under which they are no longer equal.

Second, it is impossible to take an existential stance towards similarity. Similarity is the result of an action, of a method or technique that is embedded in a community. Hence it is not possible to assign similarity an ontic dimension. Similarity is not part of any possible ontology.

We can’t ask “What is similarity?”, nor can we even pretend to determine “the” similarity of two subjects. “Similarity” is a very particular language game, much like close relatives of it, such as vagueness. We can only ask “How to speak about similarity?”

Third, it is now clear that there is no easy way from a probabilistic description to a propositional reference. We already introduced this in another chapter, and we will deal with it in detail elsewhere. There is no such transition within a single methodology. We see again how far-reaching Wittgenstein’s conclusion about the relation between the world and logic is. The categorical separation between identity and similarity, or between the empirical and the logical, can’t be overestimated. For our endeavor of a machine-based epistemology it is of vital interest to find a sound theory for this transition, which so far has not even been recognized as a problematic subject in any of the relevant research areas.

Practical Aspects

Above we have seen that any particular similarity measure should be conceived as part of a general hypothesis about the best way to create an optimized model. Within such a hypothesizing setting we can distinguish two domains embedded into a general notion of similarity. We could align these two modes with the distinction Peirce introduced with regard to uncertainty: probability and verisimilitude [16]. Thus, the first domain concerns the variation along assignates that are shared by the two items. From the perspective of any of the compared items, there is complete information about the extension of the world of that item. Any matter is a matter of degree and probability, as Peirce understood it. Taking the perspective of Deleuze, we could also call it possibility.

The second domain is quite different. It is concerned with the difference in the structure of the world as it is accessible to each of the compared items, where this difference is represented by a partial non-matching of the assignates that provide the space for measurement. Here we meet Tversky’s differential ratio, which builds upon differences in the sets of assignates (“features,” as he called them) and can also be used to express differential commonality.
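Tversky’s ratio model [8] can be written as S(A, B) = f(A∩B) / (f(A∩B) + α·f(A−B) + β·f(B−A)), where f measures the salience of a feature set and α, β ≥ 0 weight the distinctive features of each item. The following Python sketch is only an illustration under a simplifying assumption: f is taken to be plain set cardinality, and the feature names are hypothetical. It shows how non-matching assignates enter the scalar, and how unequal weights make the comparison directional.

```python
def tversky_ratio(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky's ratio model, assuming f(X) = |X| as the salience measure."""
    common = len(a & b)   # assignates shared by both items
    only_a = len(a - b)   # distinctive assignates of the first item
    only_b = len(b - a)   # distinctive assignates of the second item
    denominator = common + alpha * only_a + beta * only_b
    if denominator == 0:
        return 0.0  # no assignates at all: nothing to compare
    return common / denominator


# Hypothetical feature sets for a "variant" and a richer "prototype".
variant = {"wings", "beak"}
prototype = {"wings", "beak", "feathers", "flies"}

# alpha = beta = 1 yields the Jaccard index; alpha = beta = 0.5 yields Dice.
print(tversky_ratio(variant, prototype, alpha=1.0, beta=1.0))  # 0.5
# Weighting only the first item's distinctive features makes S asymmetric.
print(tversky_ratio(variant, prototype, alpha=1.0, beta=0.0))  # 1.0
print(tversky_ratio(prototype, variant, alpha=1.0, beta=0.0))  # 0.5
```

Setting α ≠ β is what turns the ratio into a directional comparison: the variant is fully “like” the prototype, but not the other way round.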

Yet the two domains are not separated from each other in an absolute manner. The logarithm is a rather simple analytic function with some unique properties. For instance, it is not defined for argument values in (-∞, 0]. The zero (“0”), however, can in turn serve as a double articulation that allows us to express two very different things: (1) through linear normalization, the lowest value of the range, and (2) the (still symbolic) absence of a property. Applying the logarithm, the value “0” is transformed into a missing value, because the logarithm is not defined for arg = 0; that is, we turn the symbolic absence into a quasi-physical absence. The extremely (!) valuable consequence is that by means of the logarithmic transformation we can change the feature vector on the fly in a context-dependent manner, where “context” (i) can denote any relation between variables or values therein, and (ii) may be related to certain segments of observations. Even the (extensional) items within an (intensional) class or empirical category may be described by dynamically regulated sets of assignates (features). In other words, the logarithmic transformation provides a plain way towards abstraction. Classes as clusters need not comprise only items homogenized by an identical feature set. Hence, it is a very powerful means in predictive modeling.
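A minimal Python sketch of this mechanism, under assumptions of our own: raw values are assumed to be linearly normalized so that 0 is both the lowest value and a possible symbolic absence; a non-positive value is mapped to None (Python’s math.log would otherwise raise a ValueError); and the mapping 1 / (1 + mean absolute difference) is merely an illustrative choice. After the transformation, each comparison uses only those assignates that are present for both items, so the effective feature set changes from pair to pair.

```python
import math
from typing import List, Optional, Sequence


def log_assignates(values: Sequence[float]) -> List[Optional[float]]:
    """Log-transform a feature vector; 0 (or less) becomes a missing value (None)."""
    return [math.log(v) if v > 0 else None for v in values]


def similarity_on_present(x: Sequence[Optional[float]],
                          y: Sequence[Optional[float]]) -> Optional[float]:
    """Compare two items only along assignates that are present in both.

    The mapping 1 / (1 + mean absolute difference) is an illustrative assumption.
    """
    pairs = [(p, q) for p, q in zip(x, y) if p is not None and q is not None]
    if not pairs:
        return None  # no shared assignates: no basis for a scalar
    mean_abs_diff = sum(abs(p - q) for p, q in pairs) / len(pairs)
    return 1.0 / (1.0 + mean_abs_diff)


a = [1.0, 0.0, 4.0]  # second assignate absent for item a
b = [1.0, 7.0, 4.0]

# Raw comparison: the 0 is treated as a value and counts as a large difference.
print(similarity_on_present(a, b))  # approx. 0.3
# log_assignates(a) == [0.0, None, math.log(4.0)]: the 0 drops out,
# and a and b agree completely on the assignates they share.
print(similarity_on_present(log_assignates(a), log_assignates(b)))  # 1.0
```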

Given the two domains in the practical aspects of similarity measures, it becomes clearer that we indeed need to insist on a separation of the assignates and the mapping similarity function, as we did in the chapter about comparison. We reproduce Figure 2b from that chapter:

Figure 2: Schematic representation of the comparison of two items. Items are compared along sets of “attributes,” which have to be assigned to the items, indicated by the symbols {a} and {b}.

The sets of assignates, symbolized as {a} and {b} for items A and B, don’t comprise just the “observable” raw “properties.” Of course, all those properties are selected and assigned by the observer, with the consequence that the observed items are literally established only through this measurement step. Additionally, the assigned attributes, or better “assignates,” also comprise all transformations of raw, primary assignates, building two extended sets of assignates. The similarity function then imposes a rule for calculating the scalar (a single value) that finally serves as a representation of the respective operationalization. This function may represent any kind of mapping between the extended sets of assignates.

Such a mapping (function) could consist of a compound of weighted partial functions, according to the proposals of Tversky or Cheng, and a particular profile mapping. Sets {a} and {b} need not be equal, of course. One could even apply the concept of formalized contexts instead of a set of equally weighted items. Nevertheless, there remains the apriori of the selection of the assignates, which precedes the calculation of the scalar. In practical modeling this selection will almost certainly lead to the removal of most of the measured “features.”
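The following Python sketch illustrates such a compound mapping under explicit assumptions: items are represented as dictionaries from (hypothetical) assignate names to values; the a-priori selection is a plain filter applied before anything is computed; the set-based part is a Jaccard-type commonality standing in for Tversky’s proposal; and the profile part is the Pearson correlation over shared assignates, rescaled to [0, 1]. The weights are free parameters, not values taken from Tversky or Cheng.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Item = Dict[str, float]                  # assignate name -> value (hypothetical)
Partial = Callable[[Item, Item], float]  # one partial similarity function


def select(item: Item, chosen: Sequence[str]) -> Item:
    """A-priori selection of assignates; it precedes any calculation of the scalar."""
    return {k: v for k, v in item.items() if k in chosen}


def commonality(a: Item, b: Item) -> float:
    """Set-based part: share of assignates present for both items (Jaccard)."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def profile(a: Item, b: Item) -> float:
    """Profile part: Pearson correlation over shared assignates, mapped to [0, 1]."""
    keys = sorted(set(a) & set(b))
    if len(keys) < 2:
        return 0.0
    xs = [a[k] for k in keys]
    ys = [b[k] for k in keys]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no form to correlate
    return (sxy / math.sqrt(sxx * syy) + 1.0) / 2.0


def compound(parts: List[Tuple[float, Partial]]) -> Partial:
    """Weighted compound of partial functions, normalized by the total weight."""
    total = sum(w for w, _ in parts)

    def similarity(a: Item, b: Item) -> float:
        return sum(w * f(a, b) for w, f in parts) / total

    return similarity


sim = compound([(0.5, commonality), (0.5, profile)])
A = {"x": 1.0, "y": 2.0, "z": 3.0, "noise": 9.0}
B = {"x": 2.0, "y": 4.0, "z": 6.0, "w": 1.0}

print(sim(A, B))  # approx. 0.8: unequal sets lower the set-based part
chosen = ["x", "y", "z"]
print(sim(select(A, chosen), select(B, chosen)))  # 1.0 after selection
```

The example also shows why the selection is an apriori: the same compound function yields a different scalar once the assignate sets have been chosen differently.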

Above we said that any similarity measure must be considered as a free parameter in modeling, that is, as nothing else than a hypothesis. For the sake of abstraction and formalization, this requires that we generalize the single similarity function into a family of functions, which we call a “functional.” In category-theoretic terms we could also call it a “functor.” The functor of all similarity functions would then be part of the functor representing the generalized model.
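As a hedged illustration of treating the similarity function as a free parameter, the Python sketch below builds a small family of similarity functions indexed by assignate weights and lets a leave-one-out 1-nearest-neighbour check against a target label decide which member to keep. The family (negative weighted squared distance), the weight grid, and the toy data are assumptions made up for illustration; the point is only that the chosen member is a tested hypothesis relative to a purpose, not a given.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Similarity = Callable[[Vector, Vector], float]
Labeled = Tuple[Vector, str]


def weighted_member(weights: Vector) -> Similarity:
    """One member of a hypothetical family: negative weighted squared distance."""
    def similarity(x: Vector, y: Vector) -> float:
        return -sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
    return similarity


def loo_accuracy(similarity: Similarity, data: List[Labeled]) -> float:
    """Leave-one-out 1-nearest-neighbour accuracy with respect to the target label."""
    hits = 0
    for i, (x, label) in enumerate(data):
        others = [d for j, d in enumerate(data) if j != i]
        _, predicted = max(others, key=lambda d: similarity(x, d[0]))
        hits += predicted == label
    return hits / len(data)


def select_member(data: List[Labeled], grid: Sequence[Vector]) -> Tuple[Vector, float]:
    """Treat each member of the family as a hypothesis and keep the best-tested one."""
    scored = [(w, loo_accuracy(weighted_member(w), data)) for w in grid]
    return max(scored, key=lambda s: s[1])


# Toy data: the label depends on the first assignate only; the second is noise.
data = [((0.0, 5.0), "a"), ((0.1, 0.0), "a"), ((0.2, 9.0), "a"),
        ((1.0, 0.1), "b"), ((1.1, 5.1), "b"), ((0.9, 9.1), "b")]
grid = [(1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]

print(select_member(data, grid))  # ((1.0, 0.0), 1.0)
```

With a different target variable the same family would, in general, favor a different member.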

Formal Aspects

In the chapter about the category of models we argued that models cannot be conceived in a set-theoretic framework. Instead, we propose to describe models and the relations among them on the level of categories, namely the category of functors. In plain words, models are transformative relations, or, in a term from category theory, arrows. Similarity is a dominant part of those transformative relations.

Against this background (or: within this framing), we could say that similarity is a property of the arrow, while a particular similarity function represents a particular transformation. By expressing similarity as a value we effectively map these properties of the arrow to a scalar, which could be a tangible value or an abstract scalar. Even more condensed, we could say that in a general perspective:

Similarity can be conceived as a mapping of relations onto a scalar.

This scalar should not be misunderstood as the value of the categorical “arrow.” Arrows in category theory are not vectors in a coordinate system. The assessment of similarity thus can’t be taken as just a kind of simple arithmetic transformation. As we already said above from a different perspective, similarity is not a property of objects.

Since similarity makes sense only in the context of comparing, hence in the context of modeling, we can also recognize that the value of this scalar depends on the purpose and its operationalization, the target variable. Similarity is nothing that could be measured. It is entirely the result of an intention.

Similarity and the Symbolic

Similarity is more appropriately understood as the actualization of a potential. Since the formal result of this actualization is a scalar, i.e. a primitive with only a simple structure, this actualization also prepares the ground for the possibility of a new symbolization. The similarity scalar is able to take three quite different roles. First, it can act as a criterion to impose a differential semi-order under ceteris paribus conditions for modeling. Actual modeling may be strongly dominated by arbitrary, but nevertheless stable habits. Second, the similarity scalar could also be taken as an ultimate “abbreviation” of a complex activity. Third, and finally, the scalar may well appear as a quasi-material entity, because there is so little inner structure to it.

It is the “Similarity-Game” that serves as a ground for hatching symbols.

Imagine playing this game according to the Euclidean rules. Nobody could expect rich or interesting results, of course. The same holds if “similarity” is misunderstood as a technical issue that could be represented or determined as a closed formalism.2

It is clear that these results are also quite important for understanding the working of metaphors in practiced language. Actually, we think that there is no other mode of speaking in “natural,” i.e. practiced, languages than the metaphorical mode. The understanding of similarity as a ground for hatching symbols directly leads to the conclusion that words and arrangements of words do not “represent” or “refer to” something. Even more concisely, we may say that neither things nor signs or symbols are able to carry references. Everything is created in the mind. Yet, still refuting radical constructivism, we suggest that the tools for this creative work are all taken from the public.

Conclusions

As usual, we finally address the question of the relevance of our results for the topic of machine-based epistemology.

From a technical perspective, the most salient insight is probably the relativity of similarity. This relativity renders similarity a strictly non-ontological concept (we think anyway that the idea of a “pure” ontology is based on a great misunderstanding). Although it is claimed thousands of times each day that “the” similarity has been calculated, such a “calculation” is not possible. The reason is simply that (1) it is not just a calculation like, for instance, the calculation of the Easter date, and (2) there is nothing like “the” similarity that could be calculated.

In any implementation that provides means for the comparison of items we have to care for an appropriate generality. Similarity should never be implemented as a formula, but instead as a (templated or abstract) object. Another (not only) technical aspect concerns the “form-factor” of the compared profiles, which becomes increasingly important the more assignates are used to compare the items. Any implementation of a similarity measure should respect this by increasing the weight of such “correlational” aspects accordingly.
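A minimal Python sketch of this recommendation, under assumptions of our own: Similarity is an abstract base class, so that the measure is an exchangeable object rather than a hard-coded formula, and the concrete FormWeightedSimilarity blends a magnitude part (based on Euclidean distance) with a profile (“form-factor”) part (Pearson correlation mapped to [0, 1]). The weight of the form part grows with the number n of assignates as n / (n + k); this schedule and the constant k are illustrative assumptions, not a rule given in the text.

```python
import math
from abc import ABC, abstractmethod
from typing import Sequence


class Similarity(ABC):
    """Similarity as an object: an exchangeable hypothesis, not a fixed formula."""

    @abstractmethod
    def __call__(self, x: Sequence[float], y: Sequence[float]) -> float:
        ...


class FormWeightedSimilarity(Similarity):
    """Blend of magnitude and profile form; the form weight grows with n."""

    def __init__(self, k: float = 5.0) -> None:
        self.k = k  # hypothetical constant: how fast the form part comes to dominate

    def form_weight(self, n: int) -> float:
        return n / (n + self.k)

    @staticmethod
    def magnitude(x: Sequence[float], y: Sequence[float]) -> float:
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
        return 1.0 / (1.0 + distance)

    @staticmethod
    def form(x: Sequence[float], y: Sequence[float]) -> float:
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        if sxx == 0 or syy == 0:
            return 0.0  # a flat profile carries no form
        return (sxy / math.sqrt(sxx * syy) + 1.0) / 2.0

    def __call__(self, x: Sequence[float], y: Sequence[float]) -> float:
        if len(x) != len(y) or not x:
            raise ValueError("items must share a non-empty set of assignates")
        w = self.form_weight(len(x))
        return (1.0 - w) * self.magnitude(x, y) + w * self.form(x, y)


sim = FormWeightedSimilarity(k=5.0)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [11.0, 12.0, 13.0, 14.0, 15.0]  # same form, shifted level
print(sim.form_weight(5), sim.form_weight(45))  # 0.5 0.9
print(round(sim(x, y), 3))  # 0.521
```

Because callers need to depend only on the abstract Similarity interface, another member of the family can be swapped in without touching the modeling code.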

From a philosophical perspective there are several interesting issues to mention. It should be clear that our notion of similarity does not follow the realist account. Our notion of similarity is not directed towards the relation of “objects” in the “physical world” and “concepts” in the “mental world.” Please excuse the inflationary usage of quotation marks, yet it is not possible otherwise to repel realism in such sentences. Indeed, we think that similarity can’t be applied to concepts at all. Trying to do so [e.g. 17], one would commit a double categorical mistake. First, concepts may arise exclusively as an embedment (not: entailment) of symbols, which in turn require similarity as an operational field. It is impossible to apply similarity to concepts without further interpretation. Second, concepts can’t be positively determined; they are best conceived as transcendental choreostemic poles. This categorically excludes the application of the concept of similarity to the concept of concepts. A naturalization by means of (artificial) neuronal structures [e.g. 18] misses the point even more dramatically.3 “Concept” and “similarity” are mutually co-extensive, “similarly” to space and time.

As always, we think that there is a primacy of interpretation, hence it is useless to talk about a “physical world as it is as-such.” We do not deny that there is a physical outside, of course. Through extensive individual modeling that is not only shared by a large community, but also has to provide some anticipatory utility, we may even achieve insights, i.e. derive concepts, that one could call “similar” with regard to a world. But again, this would require a position outside of the world AND outside of the concepts used and the language practiced. Such a position is not available. “Objects” do not “exist” prior to interpretation. “Objecthood” derives from (abstract) substance by adding a lot of particular, often “structural,” assignates within the process of modeling, most saliently by imposing purpose and the respective instance of the concept of similarity.

There are two conclusions from that. First, similarity is a purely operational concept; it does not imply any kind of relation to an “existent” reference. Second, it would be wrong to limit similarity (and modeling, anticipation, comparison, etc.) to external entities like material “objects.” With the exception of pure logic, we always have to interpret. We interpret by using a word in thought; we interpret even if our thoughts are shapeless. Thinking is an open, processual system of cascaded modeling relations. Modeling starts with the material interaction between material aspects of “objects” or bodies; it takes place throughout the perception of external differences, the transduction and translation of internal signals, and the establishment of intensions and concepts in our associative networks, up to the ideas of inference and propositional content.

Our investigations ended by describing similarity as a scalar. This dimensionless appearance should not be misunderstood in a representationalist manner, that is, as an indication that similarity does not have a structure. Our analysis revealed important structural aspects that relate to many areas in philosophy.

In other chapters we have seen that modeling and comparing are inevitable actions. Due to their transcendental character we may even say that they are inevitable events. As subjects, we can’t evade the necessity of modeling. We can do it in a diagnostic attitude, directed backward in time, or in a predictive or anticipatory attitude, directed forward in time. Both directions are connected through learning and bridged by Peirce’s sign situation, but any kind of starting point reduces to modeling. Stated this abruptly, it may sound strange that modeling and comparing are deeply inscribed into the event-structure of the world. Yet there are good reasons to think so.

Against this background, similarity denotes a hot spot for the actualization of intentions. As an element in modeling it is the operation that transports purposes into the world and its perception. Even more concentrated, we may call similarity the carrier of purpose. For all other elements of modeling besides purpose and similarity, one can refer to “necessities,” such as material constraints or limitations regarding time and energy. (Obviously, contingent selections are nothing one can speak about other than by naming them; they are singularities.)

Given this, it is clear that the relative neglect of similarity compared to logic should be corrected. Similarity is the (abstract) hatching-ground for symbols, so to say, in Platonic terms, the sky for the ideas.

Notes

1. The geometrical approach is largely equal to what today is known as the feature vector approach, which is part of any dimensional mapping. Examples are multi-dimensional scaling, principal component analysis, or self-organizing maps.

2. Category theory provides a formalism that is not closed, since categories can be defined in terms of category theory. This self-referentiality is unique among formal approaches. Examples of closed formalisms are group theory, functional analysis, or calculi like the λ-calculus.

3. Besides that, Christoph Gauker provided further arguments that concepts cannot be conceived as regions of similarity spaces [19].

  • [1] McGill, M., Koll, M., and Noreault, T. (1979). An evaluation of factors affecting document ranking by information retrieval systems. Final report for grant NSF-IST-78-10454 to the National Science Foundation, Syracuse University.
  • [2] Wesley C. Salmon.
  • [3] Robert van Rooij (2011), Vagueness and linguistics. In: G. Ronzitti (ed.), The vagueness handbook. Springer, New York.
  • [4] Lakoff
  • [5] Haverkamp (ed.), Metaphorologie
  • [6] Duden. Das Herkunftswörterbuch. Die Etymologie der Deutschen Sprache. Mannheim 1963.
  • [7] Oxford Encyclopedia of Semiotics: Semiotic Terminology. Available online, last accessed 29.12.2011.
  • [8] Amos Tversky (1977), Features of Similarity. Psychological Review, Vol.84, No.4.
  • [9] Goldstone. Comparison. Springer, New York 2010.
  • [10] Dekang Lin (1998), An Information-Theoretic Definition of Similarity. In: Proceedings of the 15th International Conference on Machine Learning (ICML), pp. 296-304.
  • [11] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth (1996), From Data Mining to Knowledge Discovery in Databases. American Association for Artificial Intelligence, pp. 37-54.
  • [12] Thomas Reinartz (1999), Focusing Solutions for Data Mining: Analytical Studies and Experimental Results in Real-World Domains (LNCS). Springer, Berlin.
  • [13] G. Nakaeizadeh (ed.) (1998), Data mining. Physica, Weinheim.
  • [14] Alan Hájek (2007), The Reference Class Problem is Your Problem Too. Synthese 156: 185-215.
  • [15] Edward K. Cheng (2009), A Practical Solution to the Reference Class Problem. Columbia Law Review, Vol.109: 2081-2105.
  • [16] Peirce, Stanford Encyclopedia.
  • [17] Tim Schroeder (2007), A Recipe for Concept Similarity. Mind & Language, Vol.22 No.1, pp. 68–91.
  • [18] Churchland
  • [19] Christoph Gauker (2007), A Critique of the Similarity Space Theory of Concepts. Mind & Language, Vol.22 No.4, pp. 317–345.

۞

Vagueness: The Structure of Non-Existence.

December 29, 2011

For many centuries now, clarity has been the major goal of philosophy.

updated version featuring new references

It drove the first instantiation of logics by Aristotle, who devised it as a cure for mysticism, which was considered a kind of primary chaos in human thinking. Clarity was the intended goal in the second enlightenment as a cure for scholastic worries, and among many other places we find it in Wittgenstein’s first work, now directed at philosophy itself. In each of those instances, logics served as a main pillar in pursuing the goal of clarity.

Vagueness seems to act as an opponent to this intention, lurking behind the scenes in any comparison, which is why one may regard it as ubiquitous in cognition. There are whole philosophical and linguistic schools dealing with vagueness as their favorite subject. Heather Burnett (UCLA) recently provided a rather comprehensive overview [1] of the various approaches, including her own proposals to solve some puzzles of vagueness in language, particularly related to relative and absolute adjectives and their context-dependency. In the domain of scientific linguistics, vagueness is characterized by three related properties: being fuzzy, being borderline, or being susceptible to the sorites (heap) paradox. A lot of rather different solutions have been proposed so far [1,2], most of them technically quite demanding; yet none has been generally accepted as convincing.

We take the mere fact that there are many incommensurable theories, models and attitudes about vagueness as a clear indication of a still unrecognized framing problem. Actually, in the end we will see that the problem of vagueness in language does not “exist” at all. We will provide a sound solution that does not refer just to the methodological level. If we replace vagueness by the more appropriate term of indeterminacy, we readily recognize that we can’t speak about vague and indeterminate things without implicitly talking about logics. In other words, the issue of (non-linguistic) vagueness triggers the question about the relation between logics and world. We will investigate this topic elsewhere.

Regarding vagueness, let us consider just two examples. The first one is Peter Unger’s famous example regarding clouds [3]. Where does a cloud end? This question can’t be answered. Close inspection and accurate measurement do not help. It seems as if the vagueness is a property of the “phenomenon” that we call “cloud.” If we conceive it as a particular type of object, we may attribute to it a resemblance to what is called an “open set” in mathematical topology, or to the integral over asymptotic functions. Bertrand Russell, however, would have called this the fallacy of verbalism [4, p.85].

Vagueness and precision alike are characteristics which can only belong to a representation, of which language is an example. […] Apart from representation, whether cognitive or mechanical, there can be no such thing as vagueness or precision;

For Russell, properties like “vague” can’t be ascribed to objects. Vague is a property of the representation, not of the object. Thus, when Unger concludes that there are no ordinary things, he falls prey to several misunderstandings at once, as we will see. We could add that open sets, i.e. sets without a definable border, are not vague at all.

As the second example we take a widespread habit in linguistics when addressing the problem of vagueness, as exemplified by supervaluationism. This system has the consequence that borderline cases of vague terms yield statements that are neither true nor false. Although the model induces a truth-value gap, it nevertheless keeps the idea of truth values fully intact. All linguistic models about vagueness assume that it is appropriate to apply the idea of truth values, predicates and predicate logics to language.
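To make the mechanism concrete, here is a hedged Python sketch of supervaluation for “x is bald.” Each admissible precisification is modeled, as an assumption of ours, as a sharp cut-off on the number of hairs, and the cut-off values are hypothetical. A statement is supertrue if it holds under every precisification, superfalse if it holds under none, and otherwise falls into the gap. The last line shows the point made above: the excluded middle “bald or not bald” remains supertrue even for a borderline case, so the apparatus of truth values stays intact.

```python
from typing import Callable, Sequence

Precisification = int  # hypothetical: maximal hair count still counted as "bald"


def supervaluate(holds: Callable[[Precisification], bool],
                 admissible: Sequence[Precisification]) -> str:
    """Classify a statement by its truth values across all admissible precisifications."""
    if not admissible:
        raise ValueError("at least one admissible precisification is required")
    verdicts = {holds(t) for t in admissible}
    if verdicts == {True}:
        return "supertrue"
    if verdicts == {False}:
        return "superfalse"
    return "neither"  # borderline case: truth-value gap


def bald(hairs: int) -> Callable[[Precisification], bool]:
    """'x is bald' evaluated relative to one sharp cut-off t."""
    return lambda t: hairs <= t


admissible = [1000, 2000, 3000, 4000, 5000]  # hypothetical cut-offs

print(supervaluate(bald(500), admissible))     # supertrue
print(supervaluate(bald(10_000), admissible))  # superfalse
print(supervaluate(bald(2500), admissible))    # neither
print(supervaluate(lambda t: bald(2500)(t) or not bald(2500)(t), admissible))  # supertrue
```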

As far as I can tell from all the sources I have inspected, any approach in linguistics to vagueness operates within two very strong assumptions. The first basic assumption is that (1) the concept of “predicates” can be applied to an analysis of language. From that basic assumption three secondary ones derive: (1.1) Language is a means to transfer clear statements. (1.2) It is possible to use language in a way that no vagueness appears. (1.3) Words are items that can be used to build predicates.

Besides this first assumption of the “predicativity” of language, linguistics further assumes that words could be definite and non-ambiguous. Yet this is not a basic assumption; the basic assumption underlying it is (2) that the purpose of language is to transfer meaning unambiguously. However, all three aspects of that assumption are questionable: that language has a purpose, that it serves as a tool or even a medium to transfer meaning, and that it does so unambiguously.

To summarize, linguistics employs two strong assumptions:

  • (1) The concept of apriori determinable “predicates” can be applied to an analysis of language.
  • (2) The purpose of language is to transfer meaning unambiguously.

Our position is that both assumptions are deeply inappropriate. We have already dealt with the second one elsewhere, so we focus on the first one here. We will see that the “problematics of vagueness” is non-existent. We do not claim that there is no vagueness, but we deny that it is a problem. There are also no serious threats from linguistic paradoxes, because these paradoxes are simply a consequence of “silly” behavior.

We will provide several examples of that, but its structure is the following. The problematics consists of a performative contradiction to the rules one has set up before. One should not pretend to play a particular game by fixing the rules according to one’s own interests, only to violate those rules a moment later. Of course, one could create a play or game from this, too. Lewis Carroll wrote two books about the bizarre consequences of such a setting. Let us again listen to Russell’s arguments, now to his objection against the “paradoxicity” of “baldness,” which is usually subsumed under the sorites (heap) paradox.

It is supposed that at first he was not bald, that he lost his hairs one by one, and that in the end he was bald; therefore, it is argued, there must have been one hair the loss of which converted him into a bald man. This, of course, is absurd. Baldness is a vague conception; some men are certainly bald, some are certainly not bald, while between them there are men of whom it is not true to say they must either be bald or not bald. The law of excluded middle is true when precise symbols are employed, but it is not true when symbols are vague, as, in fact, all symbols are.

Now, describing the heap (Greek: sorites) or the hair of “balding” men by referring to countable parts of the whole, i.e. either sand particles or single hairs, contradicts the conception of baldness. Confronting both in a direct manner (removing hair after hair) mixes two different games. Mixing soccer and tennis is “silly,” especially after the participants have declared that they intend to play soccer; mixing vagueness and counting is silly, too, for the same reason.

This should make clear why the application of the concept of “predicates” to vague concepts, i.e. concepts that are defined apriori as vague, is simply absurd. Remember that even a philosopher as innovative as Russell, co-author of so abstract a work as the Principia Mathematica, needed several years to accept Wittgenstein’s analysis that the usage of symbols in the Principia is self-contradictory, because actualized symbols are never free of semantics.

Words are Non-Analytic Entities

First, I would like to recall an observation expressed first, or at least most popularly, by Augustinus. His concern was the notion of time. I’ll give a sketch of it in my words. As long as he simply uses the word, he knows perfectly well what time is. Yet as soon as he starts to think about time, trying to get an analytic grip on it, he increasingly loses touch and understanding, until he does not know anything about it at all.

This phenomenon is not limited to the analysis of a concept like time, which some even conceive as a transcendental “something.” The phenomenon of disappearance under close inspection is not unknown. We meet it in Carroll’s character of the Cheshire cat, and we meet it in quantum physics. Let us call this phenomenon the CQ-phenomenon.

Ultimately, the CQ-phenomenon is a consequence of the self-referentiality of language and of the self-referentiality of the investigation of language. It is not possible to apply a scale to itself without getting into serious trouble, like fundamental paradoxicity. The language game of “scale” implies a separation of observer and observed that can’t be maintained in the cases of the cat, the quantum, or language. Of course, there are ways to avoid such difficulties, but only at high cost. For instance, strong regulations or very strict conventions can be imposed on the investigation of such areas and on the application of self-referential scales; among these areas one may count linguistics, sociology, the cognitive sciences, and of course quantum physics. Actually, positivism is nothing else than such a “strong convention.” Yet even with such strong conventions in place, the results of such investigations are surprising and arbitrary, far from being a consequence of rationalist research, because self-referential systems are always immanently creative.

It is more than salient that linguists create models about vagueness that are subsumed under language. This position is deeply nonsensical; it not only purports ontological relevance for language, it also implicitly claims a certain “immediacy” for the linkage between language and empirical aspects of the world.

Our position differs strongly from that: models are entities that are “completely” outside of language. Of course, the two are not separable from each other. We will deal elsewhere with this mutual dependency in more detail and within a more appropriate framing. Regardless of how modeling and language are related, they definitely cannot be related in the way linguistics implicitly assumes. It is impossible to use language to transfer meaning, because it is in principle not possible to transfer meaning at all. Of course, this opens the question of what then is “transferred.”

This brings us to the next objection against the presumed predicativity of language, namely its role in social intercourse, from which the CQ-phenomenon can’t be completely separated.

Language: What can be Said

Many things and thoughts are not explicable. Many things can only be demonstrated, but not expressed in any kind of language. Yet despite these two severe constraints, we may use language not only to explicitly speak about such things, but also to create what can only be demonstrated.

Robert Brandom’s work [5] may well be regarded as a further leap forward in the understanding of language and its practice. He proposes the inferentialist position, with which our positioning of the model is completely compatible. According to Brandom, we always have to infer a lot of things from received words during a discourse. We even have to signal that we expect those things to be inferred. The only thing we can try in a language-based interaction is to increasingly confine the degrees of freedom of possible models that are created in the interactees’ minds. Yet achieving a certain state of resonance, or the feeling that one understands each other, does NOT imply that the models are identical. All that could be said is that the resonating models in the two interacting minds allow a certain successful prediction of the further course of the interaction. Here, we should be very clear about our understanding of the concept of model. You will find it in the chapters about the generalized model and the formal status of models (as a category).

Since Austin [6] it has been well known that language is not equal to the series of graphical or phonic signals, simply because language is a social activity, both structural and performative. An illocutionary act is part of any utterance and any piece of text in a natural language, sometimes even in the case of a formal language. Yet it is impossible to speak about that dimension in language.

A text is even more than a “series” of Austinian or Searlean speech acts. The reason for this is a certain aspect of embodiment: only entities stuffed with memory can use language. Now, receiving a series of words immediately establishes a more or less volatile immaterial network in the “mind” of the receiving entity as well as in that of the “sending” entity. This network has properties about which it is absolutely impossible to speak, despite the fact that these networks somehow represent the ultimate purpose, or “essence,” of natural language. We can’t speak about that, we can’t explicate it, and we simply commit a categorical mistake if we apply logics and tools from logics like predicates in the attempt to understand it.

Logics and Language

These phenomena clearly prove that logics and language are different things. They are deeply incommensurable, despite the fact that they can’t be completely separated from each other, much like modeling and language. The structure of the world shows up in the structure of logics, as Wittgenstein mentioned. There are good reasons to take Wittgenstein seriously on this. According to the Tractatus, the coupling between world and logics can’t be a direct one [7].

In contrast to the world, logics is not productive. “Novelty” is not a logical entity. Pure logics is a transcendental system about the usage of symbols, precisely because any usage would already require interpretation. Logical predicates are not something that needs to be interpreted. These games are simply different games.

In his talk to the Jowett Society, Oxford, in 1923, Bertrand Russell, exhibiting an attitude quite different from that of the Principia and largely following the line drawn by Wittgenstein, writes [4, p.88]:

Words such as “or” and “not” might seem, at first sight, to have a perfectly precise meaning: “p or q” is true when p is true, true when q is true, and false when both are false. But the trouble is that this involves the notions of “true” and “false”; and it will be found, I think, that all the concepts of logic involve these notions, directly or indirectly. Now “true” and “false” can only have a precise meaning when the symbols employed—words, perceptions, images, or what not—are themselves precise. We have seen that, in practice, this is not the case. It follows that every proposition that can be framed in practice has a certain degree of vagueness; that is to say, there is not one definite fact necessary and sufficient for its truth, but a certain region of possible facts, any one of which would make it true. And this region is itself ill-defined: we cannot assign to it a definite boundary.

This is exactly what we meant before: “Precision” concerning logical propositions is not achievable as soon as we refer to symbols that we use. Only symbols that can’t be used are precise. There is only one sort of such symbols: transcendental symbols.

Mapping logics to language, as happens so frequently in linguistics, probably even as an acknowledged practice in the treatment of vagueness, means reducing language to logics. One changes the frame of reference, much like Zeno does in his self-generated pseudo-problems, much like Cantor (note 1) [8] and his fellow Banach (note 2) [9] did (in contrast to Dedekind (note 3) [10]), or what Taylor (note 4) did [11]. 3-dimensionality produces paradoxes in a 2-dimensional world, not only faulty projections. It is not really surprising that awkward paradoxes appear through the positivistic reduction of language to logics. Positivism implies violence, not only in the case of linguistics.

We can now understand why it is almost silly to apply a truth-value methodology to the analysis of language. The problem of vagueness is not a problem; it is built deeply into the blueprint of “language” itself. It is almost trivial to make remarks such as Russell did [4, p.87]:

The fact is that all words are attributable without doubt over a certain area, but become questionable within a penumbra, outside which they are again certainly not attributable.

And it really should be superfluous to cite this 90-year-old piece. Quite remarkably, it is not.

Language as a Practice

Wittgenstein emphasized repeatedly that language is a practice. Language is not a structure, so it is equivalent neither to logics nor to grammar, nor even to grammatology. In practices we need models for prediction or diagnosis, and we need rules; we frequently apply habits, which may even become symbolized.

Thus, we may again ask what is happening when we talk to each other. First, we exclude those models that we now understand to be inappropriate:

  • Logics is incommensurable with language.
  • Language, as well as any of its constituents, can’t be made “precise.”

As a consequence, language (and all of its constituents) is something that can’t be completely explicated. Large parts of language can only be demonstrated. Of course, we do not deny that a discourse reflects “propositional content,” as Brandom calls it ([5], ch. 8.6.2). This propositional or conceptual content is given by the various kinds of models appearing in a discourse, models that are being built, inferred, refined, symbolized and finally externalized. As soon as we externalize a model, however, it is no longer language. We will investigate the dynamic route between concepts, logics and models in another chapter. Here, and for the time being, we may state that applying logics as a tool to language mistakes propositional content for propositional structure.

Again: what happens if I point to the white area up in the air against the blue background that we call the sky, and then call out “Oh, look, a cloud!”? Do I mean that there is an object called “cloud”? Even an object at all? No, definitely not. Claiming that there are “cloud-constituters,” that we do not measure exactly enough, that there is no proper thing we could call a “cloud” (Unger [3]), that our language has a defect, etc.: none of these purported “solutions” to the problem [for an overview see 2] helps in the slightest.

Anybody who has hiked in the mountains knows the fog at high altitudes. From lower regions, however, the same phenomenon is seen as a cloud. This hints that the language game “cloud” also comprises information about the physical relational properties (position, speed, altitude) of the speaker.

What this utterance does is invite my partner in discourse to interpret a particular, presumably shared sensory input, and to interpret me and my interpretations as well. We may infer that the language game “cloud” contains a marker, linked both to the structure and to the semantics of the word, indicating that (1) there is an “object” without sharp borders, and (2) no precise measurement should be performed. The symbolic value of “cloud” is such that there is no space for a different interpretation. What the word “cloud” indicates is not the “object” but a particular procedure, or class of procedures, that I as the primary speaker suggest when saying “Oh, there is a cloud.” By means of such procedures a particular style of modeling is “induced” in my discourse partner, a particular way to actualize an operationalization, leading to a representation of the signals from the external world that allows both partners to increase their mutual “understanding.” Yet even this “understanding” is not directed at the proposed object. This scheme transparently describes the inner structure of what Charles S. Peirce called a “sign situation.” Neither accuracy, nor precision, nor vagueness is a relevant dimension in such mutually induced “activities,” which we may call a Peircean “sign.” They are completely secondary, a symptom of use and of openness.

Russell correctly proposes that all words in a language are vague. Yet we would like to extend his proposal, drawing on the image of thought that we explicate throughout our writings here. Elsewhere we have already cited the Lagrangian trick in abstraction. Lagrange became aware of the power of a particular replacement operation: in a proposal or proposition, constants can always be replaced by appropriate procedures plus further constants. This increases the generality and abstractness of the representation. Our proposal, which extends Russell’s insight, is aligned with this scheme:

Different words are characterised (among other factors) by different procedures to select a particular class (mode) of interpretation.

Such procedures are themselves a kind of model, necessary in addition to the models implied in performing the interpretation of the actual phenomenon. The mode of interpretation comprises the selection of the scale employed in the operationalization, viz. measurement. Coarser scales imply a more profound underdetermination, a larger variety of possible and acceptable models, and a stronger feeling of vagueness.
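The claim that coarser scales leave more models open can be illustrated with a toy measurement. The following Python sketch is a hypothetical illustration, not an implementation of the proposal: it assumes that a word does nothing more than select a measurement resolution (the words and resolutions in `RESOLUTION_FOR_WORD` are invented for the example), and it counts how many values on a 1 mm grid remain compatible with a single observation made on the selected scale.

```python
"""Toy model: a word selects a scale; coarser scales leave more values open.

All words and resolutions below are hypothetical, chosen for illustration only.
"""

# Hypothetical mapping from a word to the resolution (in mm) it suggests.
RESOLUTION_FOR_WORD = {"exactly": 1, "about": 50, "roughly": 500}


def observe(true_mm: int, resolution_mm: int) -> int:
    """Measure a value on a scale with the given resolution (returns a bucket index)."""
    return true_mm // resolution_mm


def compatible_values(bucket: int, resolution_mm: int) -> range:
    """All values on the 1 mm grid that are consistent with the observed bucket."""
    return range(bucket * resolution_mm, (bucket + 1) * resolution_mm)


def open_models(word: str, true_mm: int) -> int:
    """Number of candidate values left open after measuring on the word's scale."""
    resolution = RESOLUTION_FOR_WORD[word]
    return len(compatible_values(observe(true_mm, resolution), resolution))


if __name__ == "__main__":
    for word in RESOLUTION_FOR_WORD:
        print(word, open_models(word, 1823))
```

For a true value of 1823 mm this prints 1, 50 and 500 open values for the three words. The true value always remains among the compatible ones; what the coarser scale gives up is only determinacy, which is the underdetermination described above.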

Note that all these models are outside of language. In our opinion it does not make much sense to instantiate the model inside of language and then to claim a necessarily quite opaque “interpretation function,” as Burnett [1] extensively demonstrates (if I understood her correctly). Our proposal is also more general (and more abstract) than Burnett’s, since we emphasize the procedural selection of interpretation models (note that models are not functions!). The necessary models for words like “taller,” “balder” or “cloudy” are not part of language and can’t be defined in terms of linguistic concepts. I would not call that a “cognitivist” stance, though. We conceive of it just as a consequence of the transcendental status of models. This proposal is linked to two further issues. First, it implies accepting the necessity of models as a condition; in turn, we have to clarify our attitude towards the philosophical concept of the condition. Second, it implies the necessity of an instantiation, its actualization as the move from the transcendental to the applicable, which in turn invokes further transcendental concepts, as we will argue and describe here.

Saying this, we could add that models are not confined to “epistemological” affairs. As the relation between language (as a practice) and the “generalized” model shows, there is more to it than a kind of “generalized epistemology.” The generalization of epistemology can’t be conceived as a kind of epistemology at all, as we will argue in the chapter about the choreosteme. The particular relation between language and model as we have outlined it should also make clear that “models” are not limited to the categorization of observables in the outer world. It also applies, now in more classic terms, to the roots of what we can know without observation (e.g. Strawson, p.112 in [12]). It is not possible to act, to think, or to know without implying models, because it is not possible to act, to think or to know without transformation. This gives rise to model as a category and to the question of the ultimate conditionability of language, actions, or knowing. In our opinion, and in contrast to Strawson’s distinction, it is not appropriate to separate “knowledge from observation” and “knowledge without observation.” Insisting on such a separation would immediately also abandon the insight into the mutual dependency of models, concepts, symbols and signs, among many other things. In short, we would fall back directly into the mystic variant of idealism (cf. Frege’s hyper-platonism), implying also some “direct” link between language and idea. We rate such disrespect of the body, of matter and of mediating associativity as inappropriate and of little value.

It would be quite interesting to conduct a comparative investigation of the conceptual life cycle of pictorial information in contrast to textual information along the line opened by such a “processual indicative.” Our guess is that the textual “word” may have a quite interesting visual counterpart. But we have to work on this later and elsewhere.

Our extension also leads to the conclusion that “vague” is not a logical “opposite” of “accurate,” or of “precise” either. Here we differ (not only) from Bertrand Russell’s position. So to speak, the vagueness of language applies here too. In our perspective, “accurate” simply symbolizes the indicative to choose a particular class of models that a speaker suggests to the partner in discourse. Nothing more, but also nothing less. Models cannot be the “opposite” of other models. Words (or concepts) like “vague” or “accurate” just explicate the necessity of such a choice. Most of the words in a language refer to that choice only implicitly. Adjectives, whether absolute or relative, are bivalent with respect to the explicitness or implicitness of the choice of procedure, depending on the context.

For us it feels quite nice to discover a completely new property of words as they occur in natural languages. We call it “processual indicative.” A “word” without such a processual indicative on the structural level would not be a “word” any more. Either it reduces to a symbol, or even an index, or the context degenerates from a “natural” language (spoken and practiced in a community) into a formal language. The “processual indicative” of the language game “word” is a grammatical property (grammar here as philosophical grammar).

Nuisance, Flaws, and other Improprieties

In a letter written around 1908, that is, well after his major works, Charles S. Peirce answered a question about the position or status of his own work by saying that he tended to label it idealistic materialism. Notably, Peirce founded what is known today as American pragmatism. Both the idealistic note and the reference to materialism have to be taken in an extremely abstract sense to justify such a label. Of course, Peirce himself was able to handle such abstract levels.

Usually, however, idealism and pragmatism strongly contradict each other. This is especially true when it comes to engineering or, more generally, to the problematics of deviation, or the problems posed by deviation, if you prefer.

Obviously, linguistics is blind, or even self-deceptive, regarding its domain-specific “flaw”: vagueness. Linguists treat vagueness as a kind of flaw or nuisance, at least as a kind of annoyance that needs to be overcome. As we already mentioned, there are many incommensurable proposals for overcoming it, yet one option is missing: checking whether it is a flaw at all, and which conditions or assumptions lead to the proposal that vagueness is indeed a flaw.

Taking just one step back, it is quite obvious that logical positivism and its heritage are the cause of the flaw. The problem “appeared” in the early 1960s, when positivism was prevailing. Dropping the assumptions of positivism also removes the annoyance of vagueness.

Engineering a new device is a demanding task, and there are two fundamentally different approaches to it. The first one, more idealistic in character, starts with an analytic representation, that is, a formula, or more likely, a system of formulas. Any influence that is not covered by that formula is shifted either into the premises or into the so-called noise: influences about which nothing “could” be known and which drive the system under modeling in an unpredictable direction. Since this approach starts with a formula, that is, an analytic representation, we can also say that it starts under the assumption of representability, or identity. In fact, whenever you hear designers, engineers or politicians speak about “disturbances,” it is more than obvious that they follow the idealistic approach, which in turn follows a philosophy of identity.

The second approach is very different from the first one, since it does not start with identity. Instead, it starts with the acknowledgement of difference. Pragmatic engineering does not work against nuisances; it works precisely within and along them. Thus, there is no such thing as a nuisance, a flaw, an annoyance, etc. There is just fluctuation. Instead of assuming the structural constant labeled “ignorance,” as represented by the concept of noise, there is a procedure that is able to digest any fluctuation. A “disturbance” is nothing that can be observed as such. Quite in contrast, it is just and only a consequence of a particular selection of a purpose. Thus, pragmatic engineering leads to completely different structures than those that would be generated under idealistic assumptions. The difference between both remains largely invisible in all cases where the information part is negligible (which actually is never the case), but it is vital to consider it in any context where formalization deals with information, whether in linguistics or in machine-learning.

The issue relates to “cognition” too, understood here as the naively and phenomenologically observable precipitation of epistemic conditions. From everyday experience, but also as researchers in “cognitive sciences,” we know, or at least could agree on the proposal, that cognition is astonishingly stable. The traditional structuralist view, as Smith & Jones call it [13], takes this stability as the starting point and as the target of the theory. The natural consequence is that this theory rests on the a priori assumption of a strict identifiability of observable items and of the results of cognitive acts, which are usually called concepts and knowledge. In other words, the idea that knowledge is about identifiable items is nothing else than a petitio principii: since it serves as the underlying assumption, it is no surprise that the result in the end exhibits the same quality. Yet there is a (not so) little problem, as Smith & Jones correctly identified (pp. 184-185):

The structural approach pays less attention to variability (indeed, under a traditional approach, we design experiments to minimize variability) and not surprisingly, it does a poor job explaining the variability and context sensitivity of individual cognitive acts. This is a crucial flaw.  […]

Herein lies our discontent: If structures control what is constant about cognition, but if individual cognitive acts are smartly unique and adaptive to the context, structures cannot be the cause of the adaptiveness of individual cognitions. Why, then, are structures so theoretically important? If the intelligence – and the cause of real-time individual cognitive acts – is outside the constant structures, what is the value of postulating such structures?

The consequence the authors draw is to conceive of cognition as a process. They cite the work of Freeman [14] on the perception of smell:

They found that different inhalants did not map to any single neuron or even group of neurons but rather to the spatial pattern of the amplitude of waves across the entire olfactory bulb.

Being heir to naive phenomenology (phenomenology is always naive) and to its main pillar, the “identifiability of X as X”, obviously leads to conclusions that are disastrous for the traditional theory: it vanishes.

Given these difficulties, positivists are trying to adapt. Yet people still dream of semantic disambiguation as a mechanical technique or, likewise, dream (as Fregean worshipers) of eradicating vagueness from language by explaining it away.

One of the paradoxes dealt with over and over again is the already mentioned Sorites (Greek for “heap”) paradox. When is a heap a heap? Closely related to it are constructions like Wang’s Paradox [15]: 0 is small; if n is small, then n+1 is also small. Hence there is no number that is not small. How to deal with that?
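The dilemma can be made concrete. The Python sketch below assumes, for illustration only, that “small” is modelled as a crisp predicate on natural numbers, which is exactly the assumption questioned in what follows. Searching a finite range shows that such a predicate either breaks the tolerance premise at some number, or never breaks it, in which case every number in the range counts as small. The cutoff of 10 and all function names are hypothetical.

```python
from typing import Callable, Optional


def tolerance_violation(is_small: Callable[[int], bool], upper: int) -> Optional[int]:
    """Return the first n < upper with is_small(n) true but is_small(n + 1) false."""
    for n in range(upper):
        if is_small(n) and not is_small(n + 1):
            return n
    return None  # tolerance holds everywhere in the searched range


def sorites_chain(m: int) -> list[str]:
    """Spell out the derivation of 'm is small' from small(0) and m tolerance steps."""
    steps = ["small(0) [premise]"]
    for n in range(m):
        steps.append(f"small({n + 1}) [tolerance applied to small({n})]")
    return steps


# A sharp cutoff (hypothetical: "small" means "less than 10") violates tolerance at 9.
print(tolerance_violation(lambda n: n < 10, 1000))
# A predicate that never violates tolerance calls every number small.
print(tolerance_violation(lambda n: True, 1000))
# Chaining the premises reaches any number; for 1000 the derivation has 1001 lines.
print(len(sorites_chain(1000)))
```

Both horns are unacceptable for the ordinary word “small”: a sharp boundary between 9 and 10 is arbitrary, and a predicate true of every number no longer distinguishes anything. This is the dilemma that Wang’s paradox exposes.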

Certainly, it does not help to invoke the famous “context dependency” as a potential cure. De Jaegher and van Rooij recently wrote [16]:

“If, as suggested by the Sorites paradox, fine-grainedness is important, then a vague language should not be used. Once vague language is used in an appropriate context, standard axioms of rational behaviour are no longer violated.”

Yet what could “appropriate” mean? For an endeavor like the one De Jaegher and van Rooij have started, appropriateness would need to be determined by some means that cannot itself be affected by vagueness. But how could that be done for items of language? They continue:

“The rationale for vagueness here is that vague predicates allow players to express their valuations, without necessarily uttering the context, so that the advantage of vague predicates is that they can be expressed across contexts.”

At first sight, this seems plausible. Yet any part of language can be used in any context, so all of language is vague. The unfavorable consequence for De Jaegher & van Rooij is that their attempt is not even a self-disorganizing argument; it has the unique power of being self-vanishing: their endeavor of expelling vagueness is doomed to fail before it has even started. Their main failure, however, is that they take for granted the a priori assumption that vagueness and crispness are “real” entities existing somehow before any perception, such that language could be “infected” or affected by them. Note that this is not a statement about linguistics; it is one about philosophical grammar.

It also does not help to insist on “tolerance”. Van Rooij [17] recently stated that “vagueness is crucially related with tolerant interpretation”. Van Rooij desperately tries to hide his problem: the expression “tolerant interpretation” is almost completely empty. What should it mean to interpret something tolerantly as X? Not as X? Also a bit as Y? How, then, would we exchange ideas, and how could it be that we agree exactly on something? The problem is just moved around a corner, but not addressed in any reasonable manner. Yet there is a second objection to “tolerant interpretation”.

Interpretation of vague terms by a single entity must always fail. What is needed are TWO interpretations that are played out as a negotiation in language games. Two entities, whether humans or machines, have to agree, i.e. they also have to be able to perform the act of agreeing, in order to resolve the vagueness of items in language. It is better to drop vagueness altogether and simply to say that at least two entities must necessarily be “present” to play a language game. This “presence” is, of course, an abstract semiotic one. It is given in any Peircean sign situation. Since signs refer only and always to other signs, vagueness is, in other words, not a difficulty that needs to be “tolerated”.

Dummett [15] spent more than 20 pages examining the problem of vagueness. To date it is one of the most thorough examinations, but unfortunately it has not been received or recognized by contemporary linguistics. There is still a debate about it, but no further development of it. Dummett essentially proves that vagueness is not a defect of language but a “design feature”. First, he proposes a new logical operator “definitely” in order to deal with the particular quality of indeterminateness of language. Yet this removes neither vagueness nor its problematic, “that is, the boundaries between which acceptable sharpenings of a statement or a predicate range are themselves indefinite.” (p.311)

He concludes that “vague predicates are indispensable”; they are not eliminable in principle without losing language itself. Invoking tolerance helps no more than selecting “appropriate contexts” does, though both were proposed to get rid of the problem. What linguists propose (at least those adhering to positivism, i.e. nowadays nearly all of them) is to “carry around a colour-chart, as Wittgenstein suggested in one of his examples” (Dummett). This would turn observational terms into legitimated ones by definition. Of course, the “problem” of vagueness would vanish, but along with it also any possibility to speak and to live. (Any apparent similarity to real persons, politicians or organizations such as the E.U. is indeed intended.)

Linguistics, and the cognitive sciences as well, will fail to provide any valuable contribution as long as they apply the basic condition of the positivist attitude: that subjects could be separated from each other in order to understand the whole. The whole here is the Lebensform (form of life) working underneath, or beyond, connected cognitions (Foucault’s field of proposals, Deleuze’s sediments). It is almost ridiculous to try to explain anything regarding language under the assumption of identifiability and applicability of logics.

Smith and Jones close their valuable contribution with the following statement, abandoning the naive realism-idealism that has been exhibited so eloquently by van Rooij and his co-workers nearly 20 years later:

On a second level, we questioned the theoretical framework – the founding assumptions – that underlie the attempt to define what “concepts really are.” We believe that the data on developing novel word interpretations – data showing the creative intelligence of dynamic cognition – seriously challenge the view of cognition as represented knowledge structures. These results suggest that perception always matters in a deep way: Perception always matters because cognition is always adaptive to the here-and-now, and perception is our only means of contact with the here-and-now reality.

There are a number of interesting corollaries here, which we will not pursue now. For instance, it would be a categorical mistake to talk about noise in complex systems. Another consequence is that engineering, linguistics or philosophy based on the a priori concept of identity is not able to make reasonable proposals about evolving and developing systems, quite in contrast to a philosophy that starts with difference (as a transcendental category, see Deleuze’s work, particularly [18]).

We can now understand that idealistic engineering imposes its judgments far too early. Consequently, idealistic engineering commits the naturalistic fallacy in the same way as much of linguistics does, at least as far as the latter starts with the positivistic assumption of the possibility of positive assumptions such as identifiability. The conclusion for the engineering of machine-based episteme is quite obvious: we cannot start with identified or even identifiable items, and where it seems that we meet them, as in the case of words, we have to take their identifiability as a delusion or illusion. We could also say that the only feasible input for a machine that is supposed to “learn” is made of vague items for which there is only a probabilistic description. Even more radically, we can see that without fundamentally embracing vagueness no learning is possible at all. This is the real reason for the failure of “strong” or “symbolic” AI.

Conclusions for Machine-based Epistemology

We started with a close inspection and a critique of the concept of vagueness and ended up in a contribution to the theory of language. Once again we see that language is not just about words, symbols and grammar. There is much more in it and about it that we must understand to bring language into contact with (brain) matter.

Our results clearly indicate, against the mainstream in linguistics and large parts of (mainly analytic) philosophy, that words can’t be conceived as parts of predicates, i.e. clear proposals, and language can’t be used as a vehicle for the latter. This again justifies an initial probabilistic representation of those grouped graphemes (phonemes) as they can be taken from a text, and which we call “words.” Of course, the transition from a probabilistic representation to the illusion of propositions is not a trivial one. Yet, it is not words that we can see in the text, it is just graphemes. We will investigate the role and nature of words at some later point in time (“Waves, Words, and Images”, forthcoming).

Secondly, we discovered a novel property or constituent of words, namely a selection function (or a class thereof) that indicates the style of interpretation with regard to the implied style of presumed measurement. We called it the processual indicative. Such a selection results in invoking either clear-cut relations or boundaries, or indeterminable ones. Any implementation of language understanding necessarily has to implement such a property for all words. In all of the approaches known so far, this function is non-existent, leading to serious paradoxes and disabilities.

A quite nice corollary of these results is that words can never be taken as a reference. It is perhaps more appropriate to conceive of words as symbols for procedural packages, recipes and prescriptions on how to arrange certain groups of models. Taken this way, van Fraassen’s question of how words acquire reference is itself based on a drastic misunderstanding, deeply informed by positivism (remember that it was van Fraassen who invented this weird thing called supervaluationism). There is no such “reference.” Instead, we propose to conceive of words as units consisting of (visible) symbols and a “Lagrangean” differential part. This new conception of words remains completely compatible with Wittgenstein’s view of language as a communal practice; yet it avoids some difficulties that Wittgenstein struggled with throughout his life. Their core may be found in PI §201, describing the paradox of rule following. For us, this paradox simply vanishes. Our model of words as symbolic carriers of “processual indicatives” also sheds light on what Charles S. Peirce called a “sign situation,” whose structure he was not able to elucidate any further. Our inferentialist scheme lucidly describes the role of the symbolic as a quasi-material anchor, from which we can proceed via models as targets of the “processual indicative” to the meaning as a mutually ascribed resonance.

The introduction of the “processual indicative” also allows us to understand why, despite the vagueness of words and concepts, it is possible to achieve very precise descriptions. The precision, however, is just a “feeling,” as is the case for “vagueness,” dependent on a particular discursive situation. Larger amounts of “social” rules that can be invoked to satisfy the “processual indicative” allow for more precise statements. If, however, these rules are themselves indeterminate, more or less funny situations may quite often occur (or disastrous misunderstandings as well).

The main conclusion, finally, refers to the social aspect of a discourse. It is largely unknown how two “epistemic machines” will perceive, conceive of and act upon each other. Early experiments by Luc Steels involved mini robots that were far too primitive to draw any valuable conclusion for our endeavor. And Stanisław Lem’s short story on “personetics” [19] does not contain any hint about implementational issues… Thus, we first have to implement it…

Notes

1. One of Cantor’s paradoxes claims that a 2-dimensional space can be mapped entirely onto a 1-dimensional space without projection errors or overlaps. All of Cantor’s work is “absurd,” since it mixes two games that a priori have been separated: countability and non-countability. The dimensions paradox appears because Cantor conceives of real numbers as determinable, hence countable, entities. However, by his own diagonal argument, real numbers are uncountably (“supra-countably”) infinite. Real numbers are not determinable, hence they can’t be “re-ordered,” or put along a 1-dimensional line. It’s a “silly” contradiction. We conclude that such paradoxes are pseudo-paradoxes.

2. The Banach-Tarski (BT) pseudo-paradox has the same structure as Cantor’s dimensional pseudo-paradox. The surface of a sphere is broken apart into a finite number of “individual” pieces; yet those pieces are not of determinate shape. Banach and Tarski then prove that 2 spheres can be created from the pieces of 1 sphere. No surprise at all: the pieces are not of determinate shape, they are complicated: they are not usual solids but infinite scatterings of points. It is “silly” first to speak about pieces of a sphere, but then to dissolve those pieces into Cantor dust. Countability and uncountability collide. Thus there is no coherence, so the pieces can be anything. The BT paradox is even wrong: from such premises an infinite number of balls could be created from a single ball, not just a second one.

3. Dedekind derives natural numbers as actualizations from their abstract uncountable differentials, the real numbers.

4. Taylor’s paradox brings scales into conflict. A switch is toggled repeatedly, each time after a decreasing period, such that the next period is just half as long as the current one. After n toggling events, with n growing without bound, what is the state of the switch? Mathematically, it is not defined (1 AND 0); statistically it is 1/2. Again, countability, which implies a physical act ultimately limited by the speed of light, is contrasted with infinitely small quantities, i.e. uncountability. According to Gödel’s incompleteness theorems, for any formal system it is possible to construct paradoxes by setting up “silly” games which do not obey the self-imposed a priori assumptions.
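The setting of note 4 can be written out as a short calculation. Assuming, for illustration only, that the first period lasts T and that the switch starts in state 0, all toggles happen within the finite time 2T; the state s_n after n toggles keeps alternating and has no limit, while its running average converges to 1/2, which is one way to read the “statistical” value given above:

```latex
t_N = \sum_{k=0}^{N-1} \frac{T}{2^{k}} = 2T\left(1 - 2^{-N}\right) \longrightarrow 2T \quad (N \to \infty)

s_n = n \bmod 2 \in \{0, 1\}, \qquad \lim_{n \to \infty} s_n \ \text{does not exist}

\frac{1}{N} \sum_{n=1}^{N} s_n = \frac{\lceil N/2 \rceil}{N} \longrightarrow \frac{1}{2} \quad (N \to \infty)
```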

This article has been created on Dec 29th, 2011, and has been republished in a considerably revised form on March 23rd, 2012.

References

  • [1] Heather Burnett, The Puzzle(s) of Absolute Adjectives – On Vagueness, Comparison, and the Origin of Scale Structure. Denis Paperno (ed). “UCLA Working Papers in Semantics,” 2011; version referred to is from 20.12.2011. available online.
  • [2] Brian Weatherson (2009), The Problem of the Many. Edward N. Zalta (ed.), The Stanford Encyclopedia of Philosophy. available online, last access 28.12.2011.
  • [3] Peter Unger (1980), The Problem of the Many.
  • [4] Bertrand Russell (1923): Vagueness, Australasian Journal of Psychology and Philosophy, 1(2), 84-92.
  • [5] Robert Brandom, Making it Explicit. 1994.
  • [6] J. L. Austin. Speech Act Theory.
  • [7] Colin Johnston (2009). Tractarian objects and logical categories. Synthese 167: 145-161.
  • [8] Cantor
  • [9] Banach
  • [10] Dedekind
  • [11] Taylor
  • [12] Peter Strawson, Individuals: An Essay in Descriptive Metaphysics. Methuen, London 1959.
  • [13] Linda B. Smith, Susan S. Jones (1993). Cognition Without Concepts. Cognitive Development, 8, 181-188. available here.
  • [14] Freeman, W.J. (1991). The physiology of perception. Scientific American. 264. 78-85.
  • [15] Michael Dummett, Wang’s Paradox (1975). Synthese 30 (1975) 301-324. available here.
  • [16] Kris De Jaegher, Robert van Rooij (2011). Strategic Vagueness, and appropriate contexts. Language, Games, and Evolution, Lecture Notes in Computer Science, 2011, Volume 6207/2011, 40-59, DOI: 10.1007/978-3-642-18006-4_3
  • [17] Robert van Rooij (2011). Vagueness, tolerance and non-transitive entailment. In: Understanding Vagueness – Logical, Philosophical and Linguistic Perspectives, Petr Cintula, Christian Fermüller, Lluís Godo, Petr Hájek (eds.), College Publications, 2011.
  • [18] Gilles Deleuze, Difference and Repetition.
  • [19] Stanisław Lem, Personetics. Reprinted in: Douglas Hofstadter, Daniel Dennett (eds.), The Mind’s I.

۞

Technical Aspects of Modeling

December 21, 2011

Modeling is not only inevitable in an empirical world,

it is also a primary practice. Being in the world as an empirical being thus means continuous modeling. Modeling is not an event-like activity; it is much more like collecting energy in a photovoltaic device. This does not apply only to living systems; it should also apply to economic organizations.

Modeling thus comprises much more than selecting some data from a source and applying a method or algorithm to them. You may conceive of that difference metaphorically as the difference between a machine and a plant for producing some goods. In this chapter we will first identify and briefly characterize the elements of continuous modeling; then we will show the overall arrangement of those elements as well as the structure of the modeling core process. We will then go one step further down, to the level of the properties that software for (continuous) modeling should comprise.

You will find many more details and a thorough discussion of the various design decisions for the respective software system in the attached document “The SPELA-Approach to Predictive Modeling.” The acronym stands for “Self-Configuring Profile-Based Evolutionary Learning Approach.” The document also describes how the results of modeling based on SPELA may be used properly for reasoning about the data.

Elements of Modeling

As we have shown in the chapter about the generalized model, a model needs a purpose. This targeted modeling and its internal organization are the subject of this chapter. Here we will not deal with the problem of modeling unstructured data such as texts or images. Understanding language tokens requires a considerable extension of the modeling framework, even though modeling as outlined here remains an important part of understanding language tokens. Those extensions mainly concern an appropriate probabilization of what we experience as words or sentences. We discuss this elsewhere, both in a more technical and in a fully contextualized form.

Goal-oriented modeling can be automated to a great extent, if an appropriate perspective to the concept of model is taken (see chapters about the generalized model, and model as categories).

Such automated modeling also can be run as a continuous process. Its main elements are the following:

  • (1) post-measurement phase: selecting and importing data;
  • (2) extended classification by a core process group: building an intensional representation;
  • (3) reflective post-processing (validation), meta-modeling, based on a (self-)monitoring repository;
  • (4) harvesting results and/or aligning measurement.

Overall Organization

The elements of (continuous, automated) modeling need to be arranged according to the following multi-layered, multi-loop organizational scheme:

Figure 1: Organizational elements for automated, continuous modeling. L<n> = loop levels; T = transformation of data; S = segmentation of data; R = detecting pairwise relationships between variables and identifying them as mathematical functions F(d) = f(x,y), F(d) being a non-linear function that improves the discriminant power represented by the typology derived in S; PP = post-processing, e.g. creating dependency path diagrams. These elements are connected through 4 levels of loops, L1 through L4, where L1 = finding a semi-ordered list of optimized models, L2 = introducing the relationships found by R into the data transformation, L3 = additional sampling of raw data based on the post-processing of the core process group (active sampling), e.g. for cross-validation purposes, and finally L4 = adapting the objective of the modeling process based on the results presented to the user. Feedback level L4 may be automated by selecting from pre-configured modeling policies, where a policy is a set of rules and sets of parameters controlling the feedback levels L1 through L3 as well as the core modules. DB = some kind of data source, e.g. a database.

This scheme may differ from anything you have seen so far about modeling. Common software packages, whether commercial (SPSS, SAS, S-Plus, etc.) or open source (R, Weka, Orange), do not natively support this scheme. Some of them would allow for a similar scheme, but it is hard to accomplish. For instance, the transformation part is not properly separated and embedded in the overall process, and there is no possibility to screen for pairwise relationships that are then automatically actualized as further transformations of the “data.” There is no meta-data and no abstraction inherent to the process. As a consequence, literally everything is left to the user, rendering those software packages into gigantic formalisms. On the other hand, this comes as little surprise, given the current paradigm of deterministic computing.

The main reason for the incapability of all of these packages, however, is the inappropriate theory behind them. Neither the paradigm of statistics nor that of “data mining” is applicable at all to the task of automated and continuous modeling.

Anyway, next we will describe the loops appearing in the scheme. The elements of the core process we will describe in detail later.

Here we should mention another process-oriented approach to predictive modeling, the CRISP-DM scheme, which goes back to an industry initiative of the late 1990s (the consortium included NCR). CRISP-DM stands for Cross-Industry Standard Process for Data Mining. However, CRISP-DM is of a hopelessly solipsistic character and of little value.

Before we start we should note that the scheme above reflects an ideal and rather simple situation. More often than not, a nested, if not fractal structure appears, especially regarding loop levels L1 and L2.

Loop Level 1: Resonance

Here we find the associative structure, e.g. a self-organizing map. An important requirement is that this mechanism works bottom-up; a consequence of this is that it is an approximate mechanism.

The purpose of this element is to perform a segmentation of the available data, given a particular data space as defined by the “features,” or, more appropriately, the assignates (see the chapter about the generalized model for this).
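To make the bottom-up, approximate character of such a segmentation tangible, here is a minimal sketch of a plain textbook SOM in Python/NumPy. It is not the modified SOM referred to later in this essay (that modification is not reproduced), and the function names `train_som` and `segment`, the grid size, and the learning-rate and neighborhood schedules are all our assumptions. Each observation only nudges its best-matching node and that node’s grid neighbors; the final assignment of observations to nodes is the segmentation.

```python
import numpy as np

def train_som(data, grid=(3, 3), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular self-organizing map (textbook variant).
    Returns node weights of shape (rows*cols, n_features) and the grid coordinates."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # initialize node weights from randomly chosen observations
    weights = data[rng.choice(len(data), rows * cols, replace=True)].astype(float)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.1     # neighborhood shrinks over time
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # distance on the grid
            h = np.exp(-dist2 / (2 * sigma ** 2))               # neighborhood kernel
            weights += lr * h[:, None] * (x - weights)          # pull nodes toward x
            step += 1
    return weights, coords

def segment(data, weights):
    """Assign each observation to its best-matching node, i.e. its segment."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
weights, coords = train_som(X)
labels = segment(X, weights)
# observations from the two well-separated groups should end up in different nodes
print(sorted(set(labels[:20].tolist())), sorted(set(labels[20:].tolist())))
```

Note that the map only sorts the observations within the given data space; it cannot change that space, which is the job of the transformation element T.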

It is important to understand that the segmentation mechanism cannot change the structure of the available data space. Loop level L1 also provides what is called the transition from extensional to intensional description.

L1 also performs feature selection. In a set of features F_O, representing descriptional “dimensions” or aspects of the observations O, many features are not related to the intended target of the model. Hence they introduce noise and have to be removed; in other words, the remaining features are selected.

In many applications there are large numbers of variables, especially if L2 is repeated, resulting in a vast number of possible selections. The number of possible combinations from the set of assignates easily exceeds 10^20, and sometimes even 10^100; the latter is a larger quantity than the number of subatomic particles in the observable universe. The only way to find a reasonable proposal for a “good” selection is by means of an evolutionary mechanism. Formal, probabilistic approaches will fail.
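The orders of magnitude follow from simple counting: n candidate assignates admit 2^n - 1 non-empty selections. The short Python check below is our illustration; it counts plain subsets only and ignores the further variables that L2 adds, so real search spaces are even larger. The helper names are hypothetical.

```python
def n_feature_subsets(n_features: int) -> int:
    """Number of non-empty subsets that can be selected from n_features."""
    return 2 ** n_features - 1

def features_needed(threshold: int) -> int:
    """Smallest number of features whose non-empty subsets outnumber `threshold`."""
    n = 1
    while n_feature_subsets(n) <= threshold:
        n += 1
    return n

print(features_needed(10 ** 20))   # 67
print(features_needed(10 ** 100))  # 333
```

So already 67 assignates yield more than 10^20 selections, and 333 yield more than 10^100, which is why exhaustive search is out of the question.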

The result of this step is a segmentation that can be represented as a table. The rows represent profiles of prototypes, while the columns show the selected features (for further details see the section about the core process below).

Loop Level 2: Hypothetico-deductive Transformation

This step starts with a fixed segmentation based on a particular selection ℱ out of F_O. The prototypes identified by L1 are the input data for a screening that employs analytic transformations of values within (mostly) pairwise selected variables, such as f(a,b) = a*b or f(a,b) = 1/(a+b). Given a fixed set of analytic functions, a complete screening is performed over all possible combinations. Typically, several million individual checks are performed.

It is very important to understand that the input is not the original data, but the data on the level of the intensional description, i.e. a first-order abstraction of the data.

Once the most promising transformations have been identified, they are introduced automatically into the set of original transformations in the element T of figure 1.
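A complete pairwise screening of this kind can be sketched in Python as follows. The sketch assumes a prototype table stored as a dict of columns (one value per prototype) and, as a simple stand-in for “most promising,” ranks each candidate variable by the absolute Pearson correlation with the target; the approach described here would instead judge candidates by the cost of the goal-oriented segmentation. `PAIR_FUNCTIONS`, `screen_pairs` and the toy data are hypothetical.

```python
import itertools
import math

# Hypothetical library of analytic pair functions (the text names a*b and 1/(a+b)).
PAIR_FUNCTIONS = {
    "a*b": lambda a, b: a * b,
    "1/(a+b)": lambda a, b: 1.0 / (a + b) if a + b != 0 else math.nan,
    "a-b": lambda a, b: a - b,
}

def abs_correlation(xs, ys):
    """|Pearson correlation| over the pairs where xs is defined; 0.0 if degenerate."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not math.isnan(x)]
    if len(pairs) < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def screen_pairs(prototypes, target, top=3):
    """Apply every pair function to every pair of variables of the prototype table
    and rank the resulting candidate variables by |correlation| with the target."""
    results = []
    for a, b in itertools.combinations(list(prototypes), 2):
        for name, f in PAIR_FUNCTIONS.items():
            values = [f(x, y) for x, y in zip(prototypes[a], prototypes[b])]
            results.append((abs_correlation(values, target), name, a, b))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top]

prototypes = {"x": [1.0, 2.0, 3.0, 4.0], "y": [3.0, 1.0, 4.0, 2.0], "z": [2.0, 5.0, 1.0, 3.0]}
target = [3.0, 2.0, 12.0, 8.0]   # toy target that happens to equal x*y on these prototypes
print(screen_pairs(prototypes, target)[0])
```

The winning candidate (here x*y) would then be fed back as a new transformation into element T, which is what closes loop L2.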

Loop Level 3: Adaptive Sampling

See the legend of Figure 1.

Loop Level 4: Re-Orientation

While the use aspects are of course already reflected by the target variable and the selected risk structure, there is a further important aspect concerning the “usage” of models. Up to level 3, the whole modeling process can be run in an autonomous manner. This is not so on level 4.

Level 4 and its associated loop have been included in the modeling scheme as a dedicated means for re-orientation. The results of an L3 modeling raid could lead to “insights” that change the preference structure of the user. Upon this change in her/his preferences, the user could choose a different risk structure, or even a different target, perhaps also in order to create a further model with a complementary target.

These choices are obviously dependent on external influences such as organizational issues, or limitations / opportunities regarding the available resources.

Structure of the Modeling Core Process

Figure 2: Organizational elements of the modeling core process: 1. Transformation of Data, 2. Goal-oriented Segmentation, 3. Artificial Evolution, 4. Dependencies. Keywords shown in the bottom row: P = putative property (“assignate”), F = arbitrary function, var = “raw” variable(s); profiles, prototypes, concepts; combinatorial exploration of associations between variables; complete calculation of relations as analytic functions.

Transformation of Data

This step performs a purely formal, arithmetic and hence analytic transformation of values in a data table. Examples are:

  • the log-transformation of a single variable, shifting the mode of the distribution to the right and thus allowing for a better discrimination of small values; it can also be used to create missing values in order to adaptively filter certain values, and thus observations;
  • combinatorial synthesis of new variables from two or more variables, resulting in a stretching, warping or folding of the parameter space;
  • separating the values of one variable into two new and mutually exclusive variables;
  • binning, that is, reducing the scale of a variable, say from numeric to ordinal;
  • any statistical measure or procedure that changes the quality of an observation: the resulting values no longer reflect observations but instead represent a weight relative to the statistical measure.
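Several of these transformations can be sketched in a few lines of Python. The sketch assumes a table stored as a dict of columns, with `None` marking a missing value; all function names are hypothetical and the z-score merely stands in for “any statistical measure.” The usage at the end shows the effect discussed next: one raw variable becomes six.

```python
import bisect
import math

def log_transform(values, floor=0.0):
    """Natural log of each value; values <= floor (or missing) become None,
    so the transformation doubles as an adaptive filter."""
    return [math.log(v) if v is not None and v > floor else None for v in values]

def split_variable(values, threshold):
    """Split one variable into two mutually exclusive ones (below / at or above)."""
    low = [v if v < threshold else None for v in values]
    high = [v if v >= threshold else None for v in values]
    return low, high

def bin_variable(values, edges):
    """Numeric -> ordinal: the bin index is the number of (sorted) edges <= value."""
    return [bisect.bisect_right(edges, v) for v in values]

def zscore(values):
    """Replace values by their deviation from the mean in standard-deviation units."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [0.0 if sd == 0 else (v - mean) / sd for v in values]

table = {"income": [1.0, 10.0, 100.0, 0.0]}
table["log_income"] = log_transform(table["income"])
table["income_low"], table["income_high"] = split_variable(table["income"], 10.0)
table["income_bin"] = bin_variable(table["income"], [1.0, 50.0])
table["income_z"] = zscore(table["income"])
print(len(table))  # 6 variables derived from 1
```

Each derived column is only a hypothesis; whether it helps can be decided only by the targeted segmentation.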

A salient effect of the transformation of data is the increase in the number of variables. Also note that any of those analytic transformations destroys a little of the total information, although it also leads to a better discriminability of certain sub-spaces of the parameter space. Most important, however, is to understand that any analytic transformation is conceived as a hypothesis. Whether it is appropriate or not can be revealed ONLY by means of a targeted (goal-oriented) segmentation, which implies a cost function that in turn comprises the operationalization of risk (see the chapter about the generalized model).

Each of the resulting variables consists of assignates, i.e. assigned attributes or features. Due to the transformation, they comprise not just the “raw” or primary properties from the observer’s first contact with the observed, but also all of the transformations applied to such raw properties (aka variables). This results in an extended set of assignates.

We can now also see that transformations of measured data take the same role as measurement devices. Initial differences in signals are received and selectively filtered according to the quasi-material properties of the device. The first step in Figure 2 thus also represents what could be called generalized measurement.

Transforming data by whatever algorithm or analytic method does NOT create a model. In other words, the model aspect of statistical models does not lie in the statistical procedure, precisely because statistical models are not built upon associative mechanisms. The same is true for the widespread “physicalist” modeling, e.g. in the social sciences or urbanism. In these areas, measured data are often represented by a “formula,” i.e. a purely formal denotation, often in the form of a system of differential equations. Such systems are not models by themselves, because they are analytic rewritings of the data. The model aspect of such formulas is instantiated only by associating parts of the measured data with a target variable as an operationalization of the purpose. Without a target variable there is no purpose; without purpose, no model; without a model, no association, hence no prediction, no diagnostics, and no notion of risk whatsoever. Formal approaches always need further side-conditions and premises before they can be applied. Yet it is silly to come up with conditions for instantiating “models” after the model has been built, since those conditions would inevitably lead to a different model. The modeling aspect, again, is moved completely to the person(s) applying the model; hence such modeling is deeply subjective, implying serious and, above all, completely invisible risks regarding reproducibility and stability.

We conclude that the pretended modeling by formal methods has to be rated as bad practice.

Goal-oriented Segmentation

The segmentation of the data can be represented as a table. The rows represent profiles of prototypes, while the columns show the selected assignates (features); for further details see below in the section about the core process (will be added at a future date!).

In order to allow for a comparison of the profiles, the profiles have to be “homogeneous” with respect to their normalized variance. The standard SOM tends to collect “waste” or noise in some clusters, i.e. deeply dissimilar observations are collected in a single group precisely because of their dissimilarity. Here we find one of the important modifications of the standard SOM as it is widely used. The effect of this modification is substantial. For other design issues around the Self-organizing Map see the discussion here.
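The modification itself is not described here, but the problem it addresses can at least be diagnosed. The following Python sketch assumes that “normalized variance” means, per variable, the within-cluster variance divided by the overall variance, averaged over all variables; a cluster whose score is far above the median score is flagged as a likely “waste” cluster. The threshold `factor` and all function names are hypothetical.

```python
import statistics

def _variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def normalized_cluster_variances(rows, labels):
    """Per cluster: mean over variables of (within-cluster variance / overall variance)."""
    n_vars = len(rows[0])
    overall = [_variance([r[j] for r in rows]) for j in range(n_vars)]
    scores = {}
    for lab in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == lab]
        ratios = [_variance([m[j] for m in members]) / overall[j]
                  for j in range(n_vars) if overall[j] > 0]
        scores[lab] = sum(ratios) / len(ratios) if ratios else 0.0
    return scores

def waste_clusters(rows, labels, factor=2.0):
    """Flag clusters whose normalized variance exceeds `factor` times the median score."""
    scores = normalized_cluster_variances(rows, labels)
    median = statistics.median(scores.values())
    return [lab for lab, s in scores.items() if s > factor * median]

rows = [(0.0, 0.0), (0.1, 0.1), (0.0, 0.1),      # compact cluster 0
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),      # compact cluster 1
        (0.0, 9.0), (9.0, 0.0), (4.0, 4.5)]      # dissimilar observations lumped together
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(waste_clusters(rows, labels))  # [2]
```

Profiles of flagged clusters should not be compared with the others as if they were of the same kind.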

Artificial Evolution

Artificial evolution is necessary, even inevitable, for screening the vast parameter space.
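The evolutionary mechanism is not specified in detail here. As a minimal sketch, assuming a plain genetic algorithm over bit masks (one bit per assignate), the Python code below shows the principle. In the approach described in this essay the fitness would be the cost of the target-oriented segmentation, so that feature selection and segmentation happen in one process; `toy_fitness` and the set `RELEVANT` are hypothetical stand-ins that let the example run on its own.

```python
import random

def evolve_selection(n_features, fitness, pop_size=20, generations=30,
                     mutation_rate=0.05, seed=0):
    """Genetic algorithm over bit masks (1 = assignate selected).
    Returns the best mask found and the best fitness of each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]           # truncation selection
        children = [ranked[0][:]]                   # elitism: the best mask survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness), history

RELEVANT = {0, 3}  # hypothetical: only features 0 and 3 relate to the target

def toy_fitness(mask):
    """Stand-in for the segmentation cost: reward relevant, penalize noisy features."""
    hits = sum(mask[i] for i in RELEVANT)
    noise = sum(g for i, g in enumerate(mask) if i not in RELEVANT)
    return hits - 0.1 * noise

best, history = evolve_selection(8, toy_fitness)
print(best, history[-1])
```

Because the best mask of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.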

Dependencies

See Loop Level 2 above.

Bad Habits

In the practice of modeling one can find bad habits regarding any of the elements, loops and steps outlined above. Beginning with the preparation of data there is the myth that missing values need to be “guessed” before an analysis could be done. What would be the justification for the selection of a particular method to “infer” a value that is missing due to incomplete measurement? What do people expect to find in such data? Of course, filling gaps in data before creating a model from it is deep nonsense.

Another myth, still in the early phases of the modeling process, is the belief that analytical methods applied to measurement data “create” a model. They don’t. They just destroy information. As soon as we align the working of the modeling mechanism with some target variable, the whole endeavor is no longer analytic. Yet without a target variable we would not create a model, just rewritten measurement values that do not even measure “anything”: measurement also needs a purpose. So it would be silly first to pretend to do measurement and then to drop that intention by removing the target variable. All of statistics works like that. Whatever statistics is doing, it is not modeling. Whoever uses statistics uses just a rewriting tool; the modeling itself remains deeply opaque, based on personal preferences, in short: unscientific.

People increasingly recognize that clustering is indispensable for modeling. Yet many people, particularly in the biological sciences (all the -omics), believe that there is a meaningful distinction between unsupervised and supervised clustering, and that both varieties produce models. That’s deeply wrong. One cannot apply, say, K-means clustering or a SOM without a target variable, that is, a cost function, just for checking whether there is “something in the data.” Any clustering algorithm applies some criteria to separate the observations. Why then should anyone believe that precisely the more or less opaque, but surely purely formal, criteria of an arbitrary clustering algorithm match the data at hand perfectly? Of course, nobody should believe that. Instead of surrendering blindly to some arbitrary algorithmic properties, one should think of those criteria as free parameters that have to be tested according to the purpose of the modeling activity.
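Treating the criteria of a clustering algorithm as free parameters can be sketched for the simplest case, the number of clusters k in plain k-means. In the Python/NumPy sketch below, k is chosen by a purpose-driven cost instead of being fixed in advance. The cost function (size-weighted within-cluster variance of the target plus a per-cluster penalty), the penalty value, the candidate list and the toy data are assumptions for illustration, not the cost function of the approach described in this essay.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means with random initial centers; returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # an empty cluster keeps its old center
                centers[j] = X[labels == j].mean(axis=0)
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def target_cost(labels, y, penalty=0.05):
    """Purpose-driven cost: size-weighted within-cluster variance of the target
    plus a hypothetical penalty per cluster against over-segmentation."""
    clusters = np.unique(labels)
    within = sum((labels == j).mean() * y[labels == j].var() for j in clusters)
    return float(within + penalty * len(clusters))

def choose_k(X, y, candidates=(2, 3, 4, 5, 6)):
    """Treat k as a free parameter and select it by the target-based cost."""
    costs = {k: target_cost(kmeans(X, k), y) for k in candidates}
    return min(costs, key=costs.get), costs

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, (15, 2)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([0.0, 1.0, 0.0], 15)   # the target, i.e. the purpose of the modeling
best_k, costs = choose_k(X, y)
print(best_k, costs)
```

The same scheme extends to any other criterion of the clustering method (distance measure, initialization, neighborhood size): each becomes a parameter evaluated against the target rather than an unquestioned property of the algorithm.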

Another widespread misbehavior concerns what is called “feature selection.” It is common practice first to apply logistic regression to reduce the number of properties, and then, in a completely separate second step, to apply some kind of “pattern matching” approach. Of course, the logistic regression acts as a kind of filter. But is this filter compatible with the second method, and is it appropriate to the data and the purpose at hand? You will never find out, because you have applied two different methods. It is thus impossible to play the ceteris paribus game. It appears understandable to follow the split-method approach if you have just paper and pencil at your disposal. It is inexcusable to do so if computers are available.

In contrast to the split-method approach, one should use a single method that is able to perform feature selection AND data segmentation in the same process.

There are further problematic attitudes concerning the validation of models, especially concerning sampling and risk, which we won’t discuss here.

Conclusion

In this essay we provide the first general and complete scheme for target-oriented modeling. The main structural achievements comprise (1) the separation of analytic transformation, (2) associative sorting, (3) evolutionary optimization of the selection of assignates and (4) the constructive and combinatorial derivation of new assignates.

Note that any (computational) procedure of modeling fits into this scheme, even this scheme itself. Ultimately, any modeling results in a supervised mapping. In the chapters about the abstract formalization of models as categories we argue that models are level-2-categories.

It is precisely this separation that allows for an autonomous execution of modeling once the user has determined her target and the risk that appears acceptable. How these dimensions of purpose and safety are configured and handled depends completely on the context (whether external and organizational, or internal and more psychological) and on individual habits.

From the perspective of our general interest in machine-based epistemology we can clearly see that target-oriented modeling by itself does not contribute much to that capability. Modeling, even if it creates new hypotheses, and even if we can reject the claim that modeling is an analytic activity, necessarily remains within the borders of the space determined by the purpose and the observations.

There is no particular difficulty in running even advanced modeling in an autonomous manner. Performing modeling is an almost material performance. Defining the target and selecting a risk attitude, of course, are not. Thus, in any predictive or diagnostic modeling the crucial point is to determine those. The risk attitude in particular implies unrooted beliefs and thus an embedding into social processes. Frequently, humans even change the target in order to comply with certain limits concerning risk. Thus, in commercial projects risk should be the only dimension one has to talk about when it comes to predictive / diagnostic modeling. Discussing methods or tools is nothing but silly.

It is pretty clear that approaching the capability for theory-building needs more than modeling, although target oriented modeling is a necessary ingredient. We will see in further chapters how we can achieve that. The important step will be to drop the target from modeling. The result will be a pre-specific modeling, or associative storage, which serves as a substrate for any modeling that is serving a particular target.

This article was first published 21/12/2011, last revision is from 5/2/2012

۞

The SOM and the Symbolic

December 20, 2011

Symbols are at the roots of intelligence, language, or culture.

Once these roots have been revealed and described, one could control any of those puzzling phenomena. For instance, we would understand how to create intelligence, how to do science in the best (if not the correct) way, and how to optimally develop and plan our society.

All of the claims and hopes listed above are victims of a particular misunderstanding: that symbols are entities that could be naturalized. The symbol is not a thing that you could write down, or something which you could carry around.

The Setting

In order to develop the relation between the symbolic and abstract associative structures (for which we take the self-organizing map as a salient example), we first have to identify our hypothesis about symbols, and second we should describe the problem as it is perceived in the fields of “computational intelligence,” or “artificial intelligence.”

Above we said that the symbol is not a thing that you could write down or carry around. This is true for the symbolon, σύμβολον, the name-giving mythological object; it is true for the x in mathematics, or for the Ferrari driven by a young Italian male. It is true even though you can read the opposite in dictionaries.

The etymological roots of the classic languages are much more precise. In Latin, the symbolum meant a sign, but also a mark, token, or, of course, symbol, while in ancient Greek the sumbolon meant “a sign by which one infers something.” It is not possible to carry around a symbol because it is not possible to carry around a sign, as we know since the great works of Charles S. Peirce [1].

Yet, what do we carry around and write down? Simply shapes, or, more abstractly, forms, I would say. Those forms serve as a kind of anchor with several purposes, and they can provide these services only through a significant and signifying double articulation. The form itself just allows for an unambiguous identification through modeling and concept matching. The particular form of the form is not important at all, yet it is not completely arbitrary. The reason is the involvement of modeling, which needs a particular context. Different symbols need different kinds of contexts to a different extent. A Ferrari needs more external context than a Kanji sign, perhaps.

The whole process from a particular identifiable form up to the process of matching concepts we have to call “symbol.” Yet, similar things could be said about signs. Symbols are situations, mediated events, of a particular quality. The difference between symbols and signs is just a difference in the degree of fixation in the chain of interpretation. That fixation is not based on particular absolute necessities, of course, it is dependent on a cultural dynamics, say contingencies, eventually making their way into deontic contexts.

Our hypothesis is that symbols in particular, in strong contrast to the chain of signification by “signs,” provide a chain of references that ends up in the material world again.

This hypothesis distinguishes our position from many (if not all) other writings about symbols, including Cassirer’s. Only a very short note about Cassirer’s position(s) is indicated here. Although our position is sometimes quite similar to Cassirer’s, for example with respect to the role of the externalization of mental processes, or the role of doing and acting [2, p.239], we reject the role he assigned to philosophy [3, p.110] as a “philosophy of man” and a kind of “tool” (he doesn’t use this term) to provide insight into human culture as an “organic whole.” We also reject the strong a priori of logic that he assumes.

“By ‘symbolic pregnance,’ then, we are to understand the way in which a perceptual experience, as a ‘sensory’ experience, at the same time contains a particular non-intuitive ‘meaning’ and brings it to immediate, concrete representation.” [4, p.235]

For us, that assumption of a direct, “immediate” representation is deeply inappropriate, if not referring to an impossibility. It neglects the primacy of interpretation and the primacy of modeling. We should always remember that it is this primacy that saves us from being deterministic machines. Despite his reference to perception, Cassirer remains blind to both, which is not surprising and fully consistent with his idealist attitude.

Cassirer misunderstands symbols as schemes, precisely because he is blind to modeling. His (anti-Kantian) claim that any kind of reference to the world depends on symbolization thus just starts from the wrong end. He cannot conceive of different grades of symbolization; he cannot conceive of the relation (and the difference) between concepts, notions, words and symbols; he cannot talk about the relation between immanence and the virtual (it seems that there is only immanence for him); and so on. His position is straightforwardly developed as an idealistic philosophy of signs. Due to his almost hyper-idealistic attitude, as induced by the claim of “immediacy,” Cassirer also cannot understand the relation between the mental and the communal, or in short, the issue of rule-following in cultural settings. For all these reasons, Cassirer’s philosophy of symbols is not of any relevance for our considerations here, even though he still serves as a kind of inevitable measure for many approaches in the philosophy of the symbolic.

Intention and Hypothesis

It is pretty clear, by intuition so to speak, that the role of symbols is vital for any understanding of higher mental processes, whether they are related to a brain or to a machine. Cassirer’s proposal that animals do not or cannot have symbols at their disposal is deeply misleading, as is the link between consciousness and the symbolic, which he thought to be the essence of the human spirit.

There are many myths and misconceptions concerning the relationship between mental processes and symbols. They do not start with the Whorf hypothesis [5] (syntax determines the conditions of experience), and they do not end with the unfortunate model of “mentalese,” developed by Fodor [6] and Pinker [7], among others. Although Fodor vigorously refutes (or tries to refute) the position of Whorf, the reason for their failure (Fodor’s as well as Whorf’s) is the same. Both obviously assume that symbols can exist, and that they can even exist outside of usage.

Symbols belong to the class of immaterial entities that we usually call pointers. As such, the concept of symbol is related to the concept of reference. Again, we meet van Fraassen [8] here, but of course the question is much older and has been posed by many philosophers from Aristotle [9] to Kant [10]. Famous, too, is the serious critique that Wittgenstein, as early as in the Tractatus [11], addressed to Russell’s way of introducing symbols in the Principia Mathematica.

Here we should briefly dwell on the usage of the notion of “symbol” in everyday contexts of the human Lebensform. We already met the Ferrari, which is not only a symbol of luxury but also of machismo. All around our cities we find thousands of pictograms, which could be considered symbols themselves, but which are also a symbol in their totality. Of course, we find symbols not only in cities but also in villages, and, as Cassirer correctly observed, everywhere in social intercourse and in human culture. This includes fields like logic and science, of course. Many people know the meaning of the logical symbol ¬, standing for negation, or of the mathematical symbol ℕ, standing for the natural numbers. Mathematics is full of symbols; it apparently even is a science of systems of symbols.

From these observations we may derive that there is one single property shared among all sorts and instances of symbols (it is not the only one, of course): symbols serve as abbreviations. A symbol indicates that there is a strict reference between two entities, one of which, the subject of the symbol, is taken to be comparatively persistent, i.e. unchangeable. Without this persistence and stability, subjects cannot be symbolized. The persistence of symbols is much stronger than the persistence required by signs. Actually, it is so strong that one could misunderstand symbols as grammatical entities. This was precisely Wittgenstein’s critique of Russell’s Principia.

Yet we may take this notion of the persistent stability of the subject of symbols as an indicator of a quasi-materiality that indeed could instantiate as matter, but that could also be rendered into something like strong cultural structures. The traditional point of view, the way things appear, may well be taken as a material substance.

It is this play, unfolding between quasi-materiality and reference, that after all allows for mathematics and its proofs. The same link, however, also easily gives rise to the misunderstanding that logic could be applied to reality.

The subjects of symbols need to be stable up to quasi-materiality. Symbols themselves then just represent that stability, and thus even the reference represented by symbols takes a quasi-material form. Nevertheless, symbols should not be conceived as “indices,” or, in Peirce’s terms, as secondness. An index is a simple sign, incapable of continuing the chain of reference, and thus also incapable of leading back to quasi-materiality. An index is not an abbreviation; it is an equivalence by definition. Sometimes a set of individual indices is also called “an index,” e.g. in computer science. Such a set could then again be symbolized.

Above we said that the concept of “symbol” describes a particular process of referencing that is rooted two-fold in quasi-materiality. Here we have to be more precise: we replace quasi-materiality by irreversibility. This allows us to subsume matter and actions, or purposes, respectively. We can see now that a symbol is like a sign with a purpose, except that normal signs never have a dedicated purpose. To put it even more sharply (and thus a bit imprecisely), we could say that symbols imply reference, while signs imply interpretation and modeling. Using signs is more complicated than using symbols, at least for higher-order signs (see Peirce’s classification) such as rhea-signs. Yet, a direct comparison is actually not quite feasible; they simply play different roles, and both are necessary in “thinking.”

If you have already read the chapters about modeling (generalized model, model as category), you know about the salient role that we assign to the model. For us it is a practical and transcendental primary. From that perspective, symbols are not precursors of signs, but formally degenerate semi-signs.

This result is quite important for, as well as fully consistent with, the view we will develop in the chapter about generalized conditions. It also fits nicely with the observable role of mathematics and logic in our human Lebensform.

Our position here, particularly the aspect of linking symbols to action, is similar to that of Robert Clowes [12], who developed his perspective by following and extending the work of Lev Vygotsky [13]. Yet, Clowes does not distinguish symbols from signs, which for us are clearly distinct. Moreover, he does not recognize the aspect of irreversibility in the concept of actions, and thus neither their quasi-material characteristics nor their inherent relationships to logic. Presumably mainly due to these missing distinctions, Clowes replaces the symbol-grounding obstacle by the question “How do semiotic symbols come to play a role in thinking?” We again meet the old question, cited frequently here, about the reference of signs and its relationship to meaning. After all, we think that this question is almost as ill-posed as the symbol grounding problem, just from the opposite direction. Remarkably enough, though Clowes insists on a central role of interpretation, he does not further investigate its structure, and hence modeling does not play any role in his proposals.

The Pseudo-Problem(s)

Everybody active in machine learning, computational intelligence, or artificial intelligence knows about the so-called symbol grounding problem. In [14] we can read:

According to a widely held theory of cognition called “computationalism,” cognition (i.e., thinking) is just a form of computation. But computation in turn is just formal symbol manipulation: symbols are manipulated according to rules that are based on the symbols’ shapes, not their meanings. How are those symbols (e.g., the words in our heads) connected to the things they refer to? It cannot be through the mediation of an external interpreter’s head, because that would lead to an infinite regress, just as looking up the meanings of words in a (unilingual) dictionary of a language that one does not understand would lead to an infinite regress. [14]

“Formal manipulation” of symbols just means that symbols are replaced by other symbols according to an axiomatically founded system of possible equations between expressions. Nothing is added or removed through such a “manipulation.”

The problem was first (and more exactly) posed by Stevan Harnad [15], by raising the question:

How can the semantic interpretation of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meanings in our heads? How can the meanings of the meaningless symbol tokens, manipulated solely on the basis of their (arbitrary) shapes, be grounded in anything but other meaningless symbols?

In his article, whose first section he notably titles “From Behaviorism to Cognitivism,” Harnad also proposes a solution that comes in two parts.

[…] (1) “iconic representations” , which are analogs of the proximal sensory projections of distal objects and events, and (2) “categorical representations”, which are learned and innate feature-detectors that pick out the invariant features of object and event categories from their sensory projections. Elementary symbols are the names of these object and event categories, assigned on the basis of their (non-symbolic) categorical representations.

We barely agree with his perspective, although at first glance it seems appealing. Even the problem statement is strange. Why should anyone attempt a “semantic interpretation of a formal system”? We should not forget that the distinction between syntax, semantics and pragmatics was introduced by a convinced positivist (and behaviorist): Charles Morris. Only recently have a few people started to correct this extremely harmful distinction (Peter Janich, Robert Brandom).

Harnad seemingly believes that his solution is also a sufficient one, and thus completely overlooks the necessity of some communality, even if only a restricted one. A single brain, or a single simulation of an artificial neural network, cannot symbolize anything. By assuming otherwise, Harnad commits the same categorical mistake that is unfortunately quite common in computer science and cognitive science, and which ultimately leads to the private-language fallacy.

The misunderstanding is even visible in the label itself. Symbols can’t be grounded at all. There is neither a “ground” as a plane of stability nor a “location” as a kind of origin from which symbols could emerge. Only the versions mistakenly reduced through naturalization, which are more “somethings” than symbols, could be thought of as having a point of origin or an a priori fixed reference. As we already said, symbols require communality despite their quasi-materiality. Communality cannot provide “origins,” except perhaps in radical bureaucrazies, or dictatorships. Note that both organizational forms are characterized by extensive bodies of written, i.e. positively determined, rules.

Symbols and Associative (Quasi-)Matter

Symbols imply a compound consisting of quasi-material and immaterial aspects. In this they are indeed indispensable, though in a way quite different from what Cassirer proposed. They link us to irreversibility (deliberately avoiding here the notions of “world”, “reality”, or even “materiality”). Metaphorically, they provide us with a “handle” attached to the world, allowing us to carry that world around (not the symbols!). Their character is “dual” in several directions.

By means of their duality of materiality and immateriality, they provide us with the possibility to relate to the world, which relates them closely to associative structures. Without associative structures there are no (primary) models; on the other hand, without symbols there is no possibility to create compound models.

The relationship between models and symbols is rich, complicated and co-generative. In the chapter about associativity we saw that models and associativity overlap considerably. So, the pas de deux turns into a men…

First, we have seen that modeling requires the assignment of “properties” in order to instantiate measurement and initial (also often called “raw”) data. This assignment thus refers to symbols that are defined outside of and a priori to modeling. Primary models then have to establish well-separated classes by means of an idealization process; these ideal classes are based on empirical classes, or, in other words, intensions. Next, these ideals have to be named, which again requires symbols from the outside. Once the ideals have been named they can be used, and if this usage is repeated and provides stability due to sufficient anticipative power, they eventually may be symbolized. This symbolization is both contingent and communal. Yet, such symbols should not be misunderstood as carriers of common sense.

A sufficiently large and rich collection of symbols, or the capability for simulative instantiation of symbols, then allows for higher-order modeling. Higher-order modeling, however, rests on the stream of symbols, not on physical data. Higher-order modeling increasingly draws on relations between symbols and also returns increasingly idealized results, until finally only immanent relations are left. Such constructs may be called concepts.

Conclusion: Filling the Gap

It is very important always to remember the interdependency of models, symbols and concepts. There is no possible actualization of a “pure” concept, or a “pure” model. For example, research in physics heading towards a “Grand Unified Theory” is bare nonsense for structural reasons alone, and, if actually carried out, utter and dangerous hubris. The impossibility of an actualization of “pure” concepts, models, or symbols leads us to the further result that those labels denote transcendental entities.

It may seem that the interdependency mentioned above can be “sidestepped” only in the case of primary models that are built solely through the associative power of networks on the level of the mere body, whether as software or as neuroware. Yet, this is not really a sidestepping; in its individuation, such a model merely draws on an ontogenetic a priori. Any individual is embedded into an evolutionary/historical process that itself represents the dynamics between models, symbols and concepts, just on a larger time scale, or on stacks of those.

As a kind of conclusion we could state that our results may also be regarded as a reconstruction of the linkage between concepts and their references. We have seen that the growth of that linkage requires a kind of abstract compartmentalization into models (as categories), intensions, names, semi-signs, and signs, including a particular dynamics between those parts. It now becomes quite lucid why mathematics is so important, for science, but also for society at large. Without semi-signs and their investigation there would not only be a chasm between primary models and concepts; concepts as we know them would even be absent. “Concepts” would be freely floating radicals, giving rise to a manifold of metamorphoses (Homer, and then Ovid, playing with it). We even have a name for such conditions: the mythical age. As a corollary, we also could say that phantasy emerges where symbols become volatile or even retreat. In our investigation about vagueness we will find that words play a particular role in corroborating the fixation of symbols.

Last, but not least, this growth is a social (and sociogenetic) process. Actually, it should not be necessary to point out that clarifying the relationship between model, symbol and concept is essential for any progress towards a machine-based epistemology.

This whole landscape between the body, its modeling, the symbols and the concepts we now can incorporate into the rather different perspective of generalized conditions. This arrangement describes the ultimate border for any onto-epistemic action, which we will investigate here.

  • [1] Charles S. Peirce
  • [2] Ernst Cassirer, Philosophie der symbolischen Formen. Band II, Darmstadt 1977
  • [3] Ernst Cassirer, Versuch über den Menschen. Meiner Verlag, Hamburg 2007, p. 110.
  • [4] Ernst Cassirer, Philosophie der symbolischen Formen. Band III, Darmstadt 1982
  • [5] John B. Carroll (ed.), Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf, MIT Press, Cambridge (MA) 1964.
  • [6] Jerry Fodor, The Modularity of Mind, 1985.
  • [7] Steven Pinker, The Language Instinct. 1995.
  • [8] Bas van Fraassen, “Putnam’s Paradox: Metaphysical Realism Revamped and Evaded.” Philosophical Perspectives, vol. 11, Blackwell, Boston 1997, pp. 17-42.
  • [9] Aristotle
  • [10] Kant
  • [11] Ludwig Wittgenstein, Tractatus Logico-Philosophicus. [1918]
  • [12] Robert Clowes, Semiotic Symbols and the Missing Theory of Thinking, in: Tony Belpaeme, Stephen J. Cowley, Karl F. MacDorman (eds.), Symbol Grounding, Benjamins Publ., Amsterdam 2009, pp. 107-126.
  • [13] Lev S. Vygotsky, Thought and Language. Alex Kozulin (ed.), Revised Edition, MIT Press, Cambridge (MA) 1986.
  • [14] Symbol Grounding, Wikipedia, available online
  • [15] Stevan Harnad (1990) The Symbol Grounding Problem. Physica D 42: 335-346. available online

۞

Associativity

December 19, 2011 § Leave a comment

Initially, the meaning of ‘associativity’ seems to be pretty clear.

According to common sense, it denotes the capacity or the power to associate entities, to establish a relation or a link between them. Yet, there is a different meaning in mathematics that almost appears to mock common sense. Due to these very divergent meanings, we first have to clarify our usage before discussing the concept.

A Strange Case…

In mathematics, associativity is defined as a neutrality of the results of a compound operation with respect to the “bundling,” or association, of the individual parts of the operation. The formal statement is:

A binary operation ∘ (relating two arguments) on a set S is called associative if it satisfies the associative law:

x∘(y∘z) = (x∘y)∘z for all x, y, z ∈ S

This, however, is just the opposite of “associative,” as it demands independence from any particular association. If there were any capacity to establish an association between any two elements of S, it should not make any difference.
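For readers less familiar with the law, here is a small sketch of how the associative law can be tested for a given binary operation. It only checks a finite sample of elements, so it can refute associativity but never prove it for an infinite set; the function name `is_associative` and the sample range are our own illustrative choices.

```python
from itertools import product
from operator import add, mul, sub


def is_associative(op, elements):
    """Return True if op(x, op(y, z)) == op(op(x, y), z) for every sampled triple."""
    return all(op(x, op(y, z)) == op(op(x, y), z)
               for x, y, z in product(elements, repeat=3))


sample = range(-3, 4)
print(is_associative(add, sample))  # True
print(is_associative(mul, sample))  # True
print(is_associative(sub, sample))  # False: 1 - (2 - 3) = 2, but (1 - 2) - 3 = -4
```

Subtraction fails because the result depends on how the operands are bundled; addition and multiplication are indifferent to the bundling, which is exactly what I-associativity demands.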

Perhaps some mathematician in the 19th century hated the associative power of so many natural structures. Subsequently, modernism contributed its own part to establishing the corruption of the obvious etymological roots.

In mathematics the notion of associativity (let us call it I-associativity in order to indicate the inverted meaning) is an important part of fundamental structures like “classic” groups or categories.

Groups are important since they describe the basic symmetries within the “group” of operations that together form an algebra. Groups cover anything that can be done with sets. Note that the central property of sets is their enumerability. (Hence, a notion of “infinite” sets is nonsense; it simply contradicts itself.) Yet, there are examples of quite successful, that is, abundantly used, structures that are not based on I-associativity, the most famous of them being the Lie algebra, whose bracket operation is not associative. Lie algebras capture the continuous symmetries of Lie groups, hence they are much more general than the Abelian group that essentially emerged from the generalization of geometry. Even in the case of Lie algebras or other “non-associative” structures, however, the term still refers to the inverted meaning.
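To make the contrast concrete, a standard textbook example of a non-associative operation tied to continuous symmetry is the cross product on three-dimensional vectors, which serves as the Lie bracket of the Lie algebra of rotations in space. The following minimal sketch (our own illustration; the helper `cross` is hypothetical, written out by hand) shows that the two bracketings give different results.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)


e1, e2 = (1, 0, 0), (0, 1, 0)

left = cross(e1, cross(e1, e2))   # e1 x (e1 x e2) = e1 x e3 = -e2
right = cross(cross(e1, e1), e2)  # (e1 x e1) x e2 = 0 x e2 = 0
print(left, right)  # (0, -1, 0) (0, 0, 0)
```

The unit vectors are chosen so the arithmetic can be followed by hand: e1 × e1 vanishes, so one bracketing collapses to zero while the other does not.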

With respect to categories we can say that so far, and quite unfortunately, there is not yet something like a category theory that would not rely on I-associativity, a fact that is quite telling in itself. Of course, category theory is also quite successful, yet…

Well, anyway, we would like to indicate that we are not dealing with I-associativity here in this chapter. In contrast, we are interested in the phenomenon of associativity as it is indicated by the etymological roots of the word: The power to establish relations.

A Blind Spot…

In some way the particular horror creationes so abundant in mathematics is comprehensible. If a system were to start establishing relations, it would also establish novelty by means of that relation (something that simply did not exist before). So far, mathematics has not been able to deal symbolically with the phenomenon of novelty.

Nevertheless, it is astonishing that a Google search for the term “associativity” reveals only slightly more than 500 links (Dec. 2011), the vast majority of which simply copy the Wikipedia entry on the mathematical notion of I-associativity. Some other links are related to computer science, which basically refer to the same issue, just sailing under a different flag. Remarkably, only one (1) single link, from an open-source robotics project [1], mentions associativity in the sense we use here.

Not surprisingly, one finds an intense linkage between “associative” and “memory,” though not in the absolute number of sources (again around 600), but in the number of citations. According to Google Scholar, Kohonen’s work on the Self-Organizing Map [2] has been cited more than 9,000 times, followed by Anderson’s account of human memory [3] with 2,700 citations.

Of course, there are many entries on the web referring to the word “associative,” which, however, is an adjective. Our impression is that the capability to associate has not made its way into a more formal consideration, nor has it been regarded as a capability that deserves a dedicated investigation. This deficit may well be considered a continuation of a much older story of a closely related neglect, namely that of the relation, as Mertz pointed out [4, ch. 6], since associativity is just the dynamic counterpart of the relation.

Formal and (Quasi-)Material Aspects

In a first attempt, we could conceive of associativity as the capability to impose new relations between some entities. For Hume (in his “Treatise”, see Deleuze’s book about him), association was close to what Kant later dubbed “faculty”: the power to do something, in this case to relate ideas. However, such wording is inappropriate, as we have seen (or: will see) in the chapters about modeling and about categories and models. Speaking about relations and entities implies set theory, yet models and modeling can’t be covered by set theory, or only very exceptionally so. Since category theory seems to match the requirements and the structure of models much better, we also adopt its structure and its wording.

Associativity then may be taken as the capability to impose arrows between objects A, B, C such that at least A ⊆ B ⊆ C, but usually A ⋐ B ⋐ C, and furthermore A ≃ C, where “≃” means “taken to be identical despite non-identity”. In set theoretic terms we would have used the notion of the equivalence class. Such arrows may be identified with the generalized model, as we are arguing in the chapter about the category of models. The symbolized notion of the generalized abstract model looks like this (for details jump over to the page about modeling):

[Eq. 1: symbolized form of the generalized abstract model (not reproduced here)]

where U=usage; O=potential observations; F=featuring assignates on O; M=similarity mapping; Q=quasi-logic; P=procedural aspects of implementation.

Those arrows representing the (instances of a generalized) model are functors that are mediating between categories. We also may say that the model imposes potentially a manifold of partially ordered sets (posets) onto the initial collection of objects.

Now we can start to address our target, the structural aspects of associativity, more directly. We are interested in the necessary and sufficient conditions for establishing an instance of an object that is able (or develops the capability) to associate objects in the aforementioned sense. In other words, we need an abstract model for it. Here, however, we are not interested in the basic, that is, transcendental, conditions for the capability to build up associative power.

Let us start more practically, though still on the immaterial level. The best candidates we can think of are Self-Organizing Maps (SOM) and particularly parameterized Reaction-Diffusion Systems (RDS); both can be subsumed into the class of associative probabilistic networks, which we describe in more technical detail in another chapter. Of course, not all networks exhibit the emergent property of associativity. We may roughly distinguish between associative networks and logistic networks [5]. Both SOM and RDS are also able to create manifolds of partial orderings. Another example from this family is the Boltzmann machine, which, however, has some important theoretical and practical drawbacks, even in its generalized form.

Next, we depict the elementary processes of SOM and RDS, respectively. SOM and RDS can be seen as instances located at the distant endpoints of a particular scale, which expresses the topology of the network. The topology expresses the arrangement of quasi-material entities that serve as persistent structure, i.e. as a kind of memory. In the SOM, these entities are called nodes and they are positioned in a more or less fixed grid (albeit there is a related variant, the neural gas, where the grid is more fluid). The nodes do not move around. In contrast to the SOM, the entities of an RDS float around freely. Yet, RDS are simulated much like the SOM, assuming cells in a grid and equipping each of them with a certain memory.
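The remark that RDS are simulated on a grid of cells, each holding a small memory, can be illustrated with a minimal sketch. We assume the Gray-Scott model, one particular reaction-diffusion system chosen here for brevity; the function names and the parameter values (du, dv, f, k) are illustrative choices of ours, not taken from the text. Time stepping is a plain explicit Euler scheme with step size 1.

```python
import numpy as np


def laplacian(z):
    """Discrete 5-point Laplacian with periodic (wrap-around) boundaries."""
    return (np.roll(z, 1, axis=0) + np.roll(z, -1, axis=0) +
            np.roll(z, 1, axis=1) + np.roll(z, -1, axis=1) - 4.0 * z)


def gray_scott(n=64, steps=2000, du=0.16, dv=0.08, f=0.035, k=0.065, seed=0):
    """Simulate the Gray-Scott reaction-diffusion system on an n x n grid.

    Each grid cell holds its local 'memory': the concentrations u and v.
    """
    rng = np.random.default_rng(seed)
    u = np.ones((n, n))
    v = np.zeros((n, n))
    # A small central perturbation plus noise breaks the initial symmetry.
    c = n // 2
    u[c - 4:c + 4, c - 4:c + 4] = 0.5
    v[c - 4:c + 4, c - 4:c + 4] = 0.25
    v += 0.01 * rng.random((n, n))
    for _ in range(steps):
        uvv = u * v * v  # local reaction term (u + 2v -> 3v)
        # Diffusion spreads each field to neighboring cells; reaction acts locally.
        u += du * laplacian(u) - uvv + f * (1.0 - u)
        v += dv * laplacian(v) + uvv - (f + k) * v
    return u, v
```

Every cell obeys the same simple local rule, while any pattern only appears at the level of the whole grid. Depending on f and k, Gray-Scott systems are known to produce spot-like or stripe-like patterns; we state this as a general expectation, not as a verified result of this particular parameter set.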

Inspecting those elementary processes, we of course again find transformations. More important, however, is another structural property common to both of them. Both networks are characterized by a dynamically changing field of (attractive) forces. Only the locality of those forces differs between SOM and RDS, leading to a greater degree of parallelism in RDS and to multiple areas of the same quality. In SOMs, each node is unique.

The forces in both types of networks, however, exhibit the property of locality, i.e. there are one or more centers where the force is strong, and a neighborhood that is established through a stochastic decay of the strength of this force. Usually, in SOM as well as in RDS, the decay is assumed to be radially symmetric, but this is not a necessary condition.
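The locality of forces described above can be sketched for the SOM. This is a minimal sketch under our own assumptions: the function `train_som`, the linear decay schedules for learning rate and radius, and the Gaussian, radially symmetric neighborhood are common textbook choices, not the author's implementation. Here the decay of the force around the center is a deterministic kernel; the stochastic element enters only through random initialization and the random order in which data are presented.

```python
import numpy as np


def train_som(data, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a minimal Self-Organizing Map on `data` (shape: n_samples x dim).

    Each node on a fixed rows x cols grid stores a weight vector (its 'memory').
    """
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every node, used to measure neighborhood distance.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):  # random presentation order
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)              # learning rate shrinks over time
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks too
            # Best-matching unit: the node whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Radially symmetric Gaussian decay of the 'force' around the BMU.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull every node toward x, weighted by its neighborhood strength.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights
```

One would call it, for instance, with a data set of two well-separated clusters and then check which nodes become best-matching units for which cluster; replacing the Gaussian `h` by an anisotropic kernel would be a one-line change, in line with the remark that radial symmetry is not a necessary condition.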

After all, are we now allowed to ask ‘Where does this associativity come from?’ The answer is clearly ‘no.’ Associativity is a holistic property of the arrangement as a whole. It is the result of the copresence of properties like:

  • stochastic neighborhoods that host an anisotropic and monotone field of forces;
  • a certain, small memory capacity of the nodes; note that the nodes are not “points”: in order to have a memory they need some corporeality. In turn, this opens the way to think of a separation between the function of that memory and a variable host that provides a container for that memory;
  • strong flows, i.e. a large number of elementary operations acting on that memory, producing excitatory waves (long-range correlations) of finite velocity.

The result of the interaction of those properties cannot be described on the level of the elements of the network itself, or of any of its parts. What we will observe is a complex dynamics of patterns due to the superposition of antagonistic forces, which are modeled either explicitly, in the case of RDS, or more implicitly, in the case of SOM. Thus both networks also exhibit the property of self-organization, though this aspect is much more pronounced in RDS than in the SOM. The important issue is that the whole network, and even more importantly, the network and its local persistence (“memory”), “causes” the higher-level phenomenon.

We also could say that it is the quasi-material body that is responsible for the associativity of the arrangement.

The Power of a Capability

So, what is this associativity thing about? As we have said above, associativity imposes a potential manifold of partial orderings upon an arbitrary open set.

Take a mixed herd of gnus and zebras as the open set without any particular ordering, put some predators like hyenas or lions into this herd, and you will get multiple partially ordered sub-populations. In this case, the associativity emerges through particular rules of defense, attack and differential movement. The result of the process is a particular probabilistic order, clearly an immaterial aspect of the herd, despite the fact that we are dealing with fleshy animals.

The interesting thing in both the SOM and the RDS is that a quasi-body provides a capability that transforms an immaterial arrangement. The resulting immaterial arrangement is nothing else than information. In other words, something specific, namely a persistent contrast, has been established from something larger and unspecific, i.e. noise. Taking the perspective of the results, i.e. with respect to the resulting information, we can always see that the association creates new information. The body, i.e. the materially encoded filters and rules, has a greater weight in RDS, while in the case of the SOM the stabilization aspect is more dominant. In any case, the associative quasi-body introduces breaks of symmetry, establishes them and stabilizes them. If this symmetry breaking is aligned to some influences, feedback or reinforcement acting from the surroundings onto the quasi-body, we may well call the whole process (a simple form of) “learning.”

Yet, this change in the informational setup of the whole “system” is mirrored by a material change in the underlying quasi-body. Associative quasi-bodies are therefore representatives of the transition from the material to the immaterial, or in more popular terms, of the body-mind dualism. As we have seen, there is no conflict between those categories, as the quasi-body showing associativity provides a double-articulating substrate for differences. Moreover, we can see that these differences are transformed from a horizontal difference (such as 7-5=2) into vertical, categorical differences (such as the differential). If we would like to compare those vertical differences we need … category theory! …or a philosophy of the differential!

Applications

Early in the 20th century, the concept of association was adopted by behaviorism. Simply recall Pavlov’s dog and the experiments of Skinner and Watson. The key term in behaviorism, a belated echo of 17th-century hyper-mechanistics (support of a strictly mechanical world view), is conditioning, which appears in various forms. Yet, conditioning always remains a 2-valued relation, practically achieved as an imprinting, a collision between two inanimate entities, despite the wording of behaviorists who equate their conditioning with “learning by association.” What else should learning be? Nevertheless, behaviorist theory commits the mistake of thinking that this “learning” is a passive act. As you can see here, psychologists still strongly believe in this weird concept. They write: “Note that it does not depend on us doing anything.” Utter nonsense, nothing else.

In contrast to imprinting, imposing a functor onto an open set of indeterminate objects is not only an exhausting activity, it is also a multi-valued “relation,” or simply, a category. If we were to analyze the process of imprinting, we would find that even “imprinting” can’t be covered by a 2-valued relation.

Nevertheless, other people took the medium as the message. For instance, Steven Pinker criticized the view that association is sufficient to explain the capability of language. In doing so, he commits the same mistake as the behaviorists, just from the opposite direction. How else should we acquire language, if not by some kind of learning, even if it is a particular type of learning? Pinker’s blind spot seems to be randomization, i.e. he is not able to leave the actual representation of a “signal” behind.

Another field of application for the concept of associativity is urban planning or urbanism, albeit associativity is rarely recognized as a conceptual or even as a design tool [cf. 6]. It is obvious that urban environments can be conceived as a multitude of high-dimensional probabilistic networks [7].

Machines, Machines, Machines, ….Machines?

Associativity is a property of a persistent (quasi-material) arrangement to act onto a volatile stream (e.g. information, entropy) in such a way as to establish a particular immaterial arrangement (the pattern, or association), which in turn is reflected by material properties of the persistent layer. Equivalently we may say that the process leading to an association is encoded into the material arrangement itself. The establishment of the first pattern is the work of the (quasi-)body. Only for this reason is it possible to build associative formal structures like the SOM or the RDS.

Yet, the notion of “machine” would be misplaced. We observe strict determinism only on the level of the elementary micro-processes. Each of the vast number of individual micro-events is uniquely parameterized, sharing only the same principle or structure. In such cases we cannot speak of a single machine any more, since a mechanical machine has a singular and identifiable state at any point in time. The concept of “state” holds neither for RDS nor for SOM. What we see here is much more like a vast population of similar machines, none of which is stable across time. Instead, we need to adopt the concept of mechanism, as it is used in chemistry, physiology, or biology at large. Since both principles, SOM and RDS, show the phenomenon of self-organization, we cannot even say that they represent a probabilistic machine. The notion of the “machine” can’t be applied to SOM or RDS, despite the fact that we can write down the principles for the micro-level in simple and analytic formulas. Yet, we can’t assume any kind of mechanics for the interaction of those micro-machines.

It is now exciting to see that a probabilistic, self-organizing process used to create a model by means of associating principles loses the property of being a machine, even as it runs on a completely deterministic machine, the simulation of a Universal Turing Machine.

Associativity is a principle that transcends the machine, and even the machinic (Guattari). Assortative arrangements establish persistent differences, hence we can say that they create proto-symbols. Without associativity there is no information. Of course, the inverse is also true: Wherever we find information or an assortment, we also must expect associativity.

۞

  • [1] iCub
  • [2] Kohonen, Teuvo, Self-Organization and Associative Memory. Springer Series in Information Sciences, vol. 8, Springer, New York 1988.
  • [3] Anderson J.R., Bower G.H., Human Associative Memory. Erlbaum, Hillsdale (NJ) 1980.
  • [4] Mertz, D. W., Moderate Realism and its Logic, New Haven: Yale 1996.
  • [5] Wassermann, K. (2010), Associativity and Other Wurban Things – The Web and the Urban as merging Cultural Qualities. 1st international workshop on the urban internet of things, in conjunction with: internet of things conference 2010 in Tokyo, Japan, Nov 29 – Dec 1, 2010.
  • [6] Dean, P., Rethinking representation. The Berlage Institute report No. 11, episode Publ. 2007.
  • [7] Wassermann, K. (2010). SOMcity: Networks, Probability, the City, and its Context. eCAADe 2010, Zürich. September 15-18, 2010.

The Category of Models

December 18, 2011 § Leave a comment

It is models that link us to the world.

We can not do (anything) without them. The model is one of the four transcendental (as well as “basic”) conditions of anything, as we are arguing in another chapter. We propose the concept of model as a proper operationalization of interpretation. At the same time, modeling comprises a practical dimension that exceeds that of the other conditions. Models, and modeling may be conceived as an eminently generative double-articulation, much in the sense as Deleuze and Guattari introduced the term in their book Mille Plateaux [1]. So, there is a good reason to say that models articulate us to the world.

Putting models into such a prominent place may raise certain objections. One could say, o.k., models are important, but lifting them out in this way amounts to the claim that epistemology is the main philosophical discipline. I think such a characterization does not reflect the status of the generalized model appropriately. We will discuss this in much more detail in the chapter about the choreostemic constitution. For now, we just want to propose not to neglect that the operation of comparison may be considered a very fundamental operation. Actually, there are good reasons to think that it is the operation of comparison, including the implied concept of similarity, that saves us from being mere propositional logic, hence deterministic machines. Regardless of what is happening in our brains and minds, comparison and hence modeling is taking place everywhere and all the time, even if it feels as if we were applying propositional logic.

Generally speaking, concrete models may be conceived as a particular class of transformations. A generalization of the notion of model could probably also be taken as a generalization of the notion of transformation. This should be checked, of course.

Taking an abstract perspective, we could ask about the (formal) properties of a particular class of transformations, or of its generalization, and about its difference from other classes. As with the formalization of the model itself, such an endeavor of formalization serves just a single purpose: to get clear (1) about what could, in principle, be said about the object of formalization (in this case, models) and what not, and (2) about the appropriate way to speak about models, that is, which structures could not be applied without falling back into inconsistencies. Insofar as there are no singularized items among the exemplars of the class we call “models,” we may also ask whether any proposals can be made about the relationships between models, that is, about the possibilities of combining them.

Looking for a formal theory in which the notion of transformation is central in one way or another, we find topology and category theory. Since the latter is more general than the former, we suggest taking a brief look at mathematical category theory. Category theory is also said to be the abstract theory of functions, that is, of mappings between sets. Models, on the other hand, also perform a mapping between two sets.

Category Theory

Before stepping deeper into the subject I have to admit that I am not a mathematician. Unfortunately so, given the subject of this chapter. Thus, all we can do here is to justify some suggestions, mainly about the level at which we can start to talk correctly about models. Of course, we feel that there is much more to the relationship between models and category theory than we can accomplish here.

Equally obviously, we can’t provide anything that could count as an introduction; even the entries on Wikipedia are more complete. Our attempts here are based on the book by Steve Awodey [2].

Category theory was first proposed in the early 1940s. Today, it is an eminently important tool in mathematics, since it provides the means to formalize (viz., to speak mathematically about) the relations between mathematical objects.

A mathematical category consists of the following two kinds of elements [2, p.4]:

  • Objects: A, B, C, …;
  • Arrows: f, g, …, relating the objects A, B, C in a directed manner;

Given A, B, C and the arrows f and g, the following conditions are needed to establish a category:

  • for each arrow f, there are given objects dom(f) and cod(f), called the “domain” and “codomain” of f; we write f : A → B to indicate that A = dom(f) and B = cod(f). The domain is the object that serves as the argument of f (“input”), while the codomain is the object in which the results of f lie;
  • given the arrows f : A → B and g : B → C, that is, cod(f) = dom(g), there is an arrow g∘f : A → C, called the composite of f and g;
  • for each object A, there is given an arrow 1_A : A → A, called the identity of A.

These data are required to satisfy the following laws:

  • associativity
    h∘(g∘f) = (h∘g)∘f
    for all f : A → B, g : B → C, h : C → D
  • unit
    f∘1_A = f = 1_B∘f
    for all f : A → B

Anything satisfying these definitions is considered a category. The objects need not be sets and the arrows need not be functions. The arrows are also called “morphisms”, and composition ∘ is a primitive operation.
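To make the definition concrete, here is a minimal Python sketch (our own illustration, not taken from [2]) of the small category formed by A, B, C with the arrows f and g. Arrows are plain strings, the identities are named "1A", "1B", "1C", the composite g∘f is named "gf", and the dictionary comp[(g, f)] plays the role of the composition operator; all of these names are assumptions made for illustration. The function checks that every composable pair has a composite with the right domain and codomain, and that the unit and associativity laws hold.

```python
from itertools import product

# A small category given explicitly: arrows with (domain, codomain),
# identities per object, and a composition table comp[(g, f)] = g∘f.
arrows = {
    "1A": ("A", "A"), "1B": ("B", "B"), "1C": ("C", "C"),
    "f": ("A", "B"), "g": ("B", "C"), "gf": ("A", "C"),
}
identity = {"A": "1A", "B": "1B", "C": "1C"}


def dom(a: str) -> str:
    return arrows[a][0]


def cod(a: str) -> str:
    return arrows[a][1]


comp = {}
for a in arrows:
    comp[(identity[cod(a)], a)] = a  # 1_B∘a = a
    comp[(a, identity[dom(a)])] = a  # a∘1_A = a
comp[("g", "f")] = "gf"              # the only non-trivial composite


def composable(g: str, f: str) -> bool:
    return cod(f) == dom(g)


def check_category() -> bool:
    # every composable pair has a composite with dom(f) and cod(g)
    for g, f in product(arrows, repeat=2):
        if composable(g, f):
            h = comp.get((g, f))
            if h is None or dom(h) != dom(f) or cod(h) != cod(g):
                return False
    # unit law: f∘1_A = f = 1_B∘f
    for f in arrows:
        if comp[(f, identity[dom(f)])] != f or comp[(identity[cod(f)], f)] != f:
            return False
    # associativity: h∘(g∘f) = (h∘g)∘f whenever both sides are defined
    for h, g, f in product(arrows, repeat=3):
        if composable(g, f) and composable(h, g):
            if comp[(h, comp[(g, f)])] != comp[(comp[(h, g)], f)]:
                return False
    return True


print(check_category())  # True
```

Because this small category has at most one arrow between any two objects, associativity holds automatically here; the sketch only shows that the laws are finite, checkable conditions on a table of arrows.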

“Anything” here means, for example, mathematical groups, graphs, number spaces, differentiable manifolds and, last but not least, categories themselves. In a somewhat limited sense, one could say that a category is a generalized (mathematical) group, or, more precisely, a generalized monoid, since inverses are not required. Arrows (morphisms) need not be mappings; in the category Rel, for example, the arrows between two sets are the relations between them. In fact, category theory builds a framework for propositions about morphisms, i.e. transformations between structures; it is a “theory about arrows,” as Awodey phrased it.

The objects A, B and C, together with the arrows f and g, their composite g∘f and the identities, establish a category. Remarkably, such a category is more than a relation between two objects, which is important for any possibility of building “daisy chains” of overlapping categories. A basic category of this kind can be visualized as a triangle of arrows: f : A → B, g : B → C, and g∘f : A → C.

Note that the category theoretic notion of “diagram” is a very special one, to which we return later.
If we take A, B and C to be classical sets and the arrows to be ordinary functions between them, composition is simply the composition of functions.

There are also deep relations to (quasi-) logics. Category theory may thus be considered a theory of abstract mappings and, insofar as it is itself symbolized, an algebra of abstract mappings, where abstract mappings include any kind of transformation.

Different categories can be related to each other. If such a relation is a structure-preserving mapping, it is called a functor. In [2, p.155] we find: “For fixed categories 𝒞 and 𝒟 [not just “simple” objects as above], we can regard the functors 𝒞 → 𝒟 as the objects of a new category, and the arrows between these objects [actually: functors] are what we are going to call natural transformations. They are to be thought of as different ways of “relating” functors to each other, …”. Likewise, a natural isomorphism (natural equivalence) is an isomorphism in this category of functors.
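The idea of a natural transformation can be illustrated in a loose, programming-flavored way, with Python types and functions standing in for the objects and arrows of a category (an informal analogy, not a rigorous category). The hypothetical functions fmap_list and fmap_maybe act as two functors (lists, and tuples holding zero or one element), and safe_head is a uniform family of arrows between them. The naturality square says it does not matter whether we first map a function and then take the head, or first take the head and then map.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


# Functor "List": a function f acts by being mapped over every element.
def fmap_list(f: Callable[[A], B], xs: list) -> list:
    return [f(x) for x in xs]


# Functor "Maybe" (tuples of length 0 or 1): f acts on the element, if present.
def fmap_maybe(f: Callable[[A], B], m: Tuple) -> Tuple:
    return tuple(f(x) for x in m)


# A natural transformation List ⇒ Maybe: one component per type,
# all given by the same rule "keep the first element, if any".
def safe_head(xs: list) -> Tuple:
    return (xs[0],) if xs else ()


def naturality_holds(f: Callable[[A], B], xs: list) -> bool:
    # naturality square: safe_head ∘ List(f) == Maybe(f) ∘ safe_head
    return safe_head(fmap_list(f, xs)) == fmap_maybe(f, safe_head(xs))


print(naturality_holds(lambda n: n * 10, [3, 1, 2]))  # True
print(naturality_holds(str, []))                      # True
```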

In contrast to the theory of (analytic) functions, category theory allows us to express morphisms (“transitions”, “transformations”) between structures of different “kinds.” Interestingly, such cross-structural morphisms are well defined in category theory; if, for instance, we describe the relation that sends each group to its underlying set, thereby forgetting the group structure, we have a “forgetful” functor.

The concept of functor developed into one of the most important concepts in (structural) mathematics. In [3] we find: “The Yoneda lemma is an abstract result on functors of the type morphisms into a fixed object. […] It allows the embedding of any category into a category of functors […] defined on that category. It also clarifies how the embedded category, of representable functors and their natural transformations, relates to the other objects in the larger functor category. It is an important tool that underlies several modern developments in algebraic geometry and representation theory. It is named after Nobuo Yoneda.”

Thus the Yoneda lemma allows for a reduction of complexity: instead of studying a particular category C, one can study the set-valued functors on it (which themselves form a category). In the opposite direction, it allows us to transfer (generalize) concepts from the category of sets to arbitrary categories, even if those cannot be conceived as “sets”, i.e. as enumerable items.
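For a partially ordered set viewed as a category (exactly one arrow a → b whenever a ≤ b), the Yoneda lemma, Nat(Hom(−, X), F) ≅ F(X) for any functor F : 𝒞ᵒᵖ → Sets, takes a very tangible form: the representable functor Hom(−, x) amounts to the set of elements below x, and x ≤ y holds exactly when the set for x is contained in the set for y. The sketch checks this for the divisors of 12 ordered by divisibility; the choice of poset and the function names are assumptions made for illustration.

```python
# A poset viewed as a category: objects are the divisors of 12,
# and there is exactly one arrow a → b iff a divides b.
objects = [1, 2, 3, 4, 6, 12]


def arrow(a: int, b: int) -> bool:
    return b % a == 0


def hom_into(x: int) -> frozenset:
    # In a poset, the representable functor Hom(-, x) is determined by
    # the set of objects that have an arrow into x (the "down-set" of x).
    return frozenset(a for a in objects if arrow(a, x))


def embedding_reflects_order() -> bool:
    # Yoneda embedding for posets: an arrow x → y exists iff Hom(-, x) ⊆ Hom(-, y),
    # i.e. iff there is a natural transformation Hom(-, x) ⇒ Hom(-, y).
    return all(arrow(x, y) == (hom_into(x) <= hom_into(y))
               for x in objects for y in objects)


print(sorted(hom_into(6)))         # [1, 2, 3, 6]
print(embedding_reflects_order())  # True
```

Nothing is lost when each object is replaced by its representable functor: the order can be read off completely from the inclusions between these sets, which is the sense in which the embedding is full and faithful.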

This is very close to a self-referential situation, which is called a 2-category in category theory. A 2-category is a category with “morphisms between morphisms,” that is, in layman’s terms, transformations between transformations. Even higher abstractions are possible: higher category theory is the part of category theory at a higher order, which, quite remarkably, means that some equalities are replaced by explicit arrows in order to be able to study the structure behind those equalities explicitly [4].

It is indeed a remarkable step, since this ultimately allows one to start with an “empty structure,” which in some way is equivalent to the transcendental difference as Deleuze conceived it in “Difference and Repetition.” It also allows us to get rid of the a priori of logic, astonishingly by means of mathematics itself.

Models and Categories

Models are generative transformations. In some way, they start either in the undefined or in another model, i.e. in the same category. In this respect they resemble the (compressed) form of the Yoneda lemma, the Yoneda embedding X ↦ Hom(−, X). Models thus must not be conceptualized as “initial” or “primitive” objects. As such, they would be items of sets, which are reduced, “primitive” categories. Yet this would be rather inappropriate, since it would require an axiom stating (claiming) their existence; in other words, formal concepts and ontological claims would be (inappropriately) equated, leading to considerable mess. Additionally, as we have already seen in the chapter about the generalized model, models can’t be generalized into (mathematical) groups.

Now note that, in contrast to the definition of a group, inverse elements are not part of the definition of a category, and recall that the inverse element was precisely the requirement that our notion of model could not satisfy!

There are indeed good arguments that models could be formalized as (category theoretic) functors, or better, as a particular category of functors. Replacing set theory (including groups) by category theory and its notions allows us to study the structure of equivalence instead of introducing it axiomatically. One stops talking about equivalence and becomes interested in isomorphism instead. Indeed, any equivalence should always be understood as an equivalence (just) by definition. There is no equivalence in the empirical world, or in any aspect of our relation to the world. Ultimately, we may treat things as equivalent not when they match in “all” their criteria, but only when we can’t measure them, in other words, when they are indiscernible because they lie outside any available scale of comparison.

Isomorphism, in contrast, requires some criteria that first have to be set or selected. This process may be repeated (recursively), as the rules for modeling may be considered results of models. Again, we have an equivalent structure in our theory, which we called “ortho-regulation.” In neither case are we threatened by infinite regress, since in both cases the progressive abstraction comes to an end quite naturally.

Yet, in this way, category theory clearly allows for a constructive attitude towards equivalence. This situation is very similar to our notion of assignates in modeling. Funnily enough, the notion of the 2-category gives rise to statements like “The objects of the 2-category are called theories, the 1-morphisms f : A → B are called models of the A in B, and the 2-morphisms are called morphisms between models.” [5] We will return to this in our discussion about the theory of theory. From the perspective of category theory, theories are closely related to “diagrams”; diagrams, in turn, are the categorical analogue of an indexed family in set theory. An indexed family of sets is a collection of sets, indexed by a fixed set; a diagram is a collection of objects and morphisms, indexed by a fixed category, or, equivalently, a functor from a fixed index category to some category.
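Read as data, a diagram is simply a functor from a small index category into a category of sets. The following sketch, with entirely hypothetical records, indexes three sets by a “span” shape (one index object with arrows p and q to two others) and checks that each index arrow is sent to a total function between the corresponding sets. Since this index category has no composites besides trivial ones with identities, preservation of composition and identities is automatic here and is not checked separately.

```python
# Index category J: a "span" with objects 0, 1, 2 and the non-identity
# arrows p: 0 → 1 and q: 0 → 2 (identity arrows are left implicit).
J_arrows = {"p": (0, 1), "q": (0, 2)}

# A diagram D: J → Sets assigns a set to each index object ...
D_obj = {
    0: {"ann", "bob", "cy"},   # hypothetical records
    1: {"north", "south"},     # hypothetical regions
    2: {"young", "old"},       # hypothetical age groups
}
# ... and a function (here: a dictionary) to each index arrow.
D_arr = {
    "p": {"ann": "north", "bob": "south", "cy": "north"},
    "q": {"ann": "young", "bob": "old", "cy": "old"},
}


def is_diagram() -> bool:
    # every index arrow i → j must become a total function D(i) → D(j)
    for name, (i, j) in J_arrows.items():
        f = D_arr[name]
        if set(f) != D_obj[i] or not set(f.values()) <= D_obj[j]:
            return False
    return True


print(is_diagram())  # True
```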

The possibility of self-referential definitions also opens a fresh view onto the concept of data. Data are no longer “givens,” as the etymological roots suggest. Instead, in category theoretic terms, data are the domain and codomain of particular functors, which we usually call “models.” Since data are compounds, it is natural to conceive of them as categories, too.

We already said that models “are” functors, i.e. models may be conceived most appropriately as functors; more precisely, they are functors between two categories C and D, where those categories are related bodies of data. These data C and D have different factual structures, but their abstract structural value is the same. Only for this reason can we “transform” them. If two categories are related through a pair of functors F : 𝒞 → 𝒟 and U : 𝒟 → 𝒞 such that the arrows F(X) → Y in 𝒟 correspond naturally and one-to-one to the arrows X → U(Y) in 𝒞, the functors are called adjoint (F is left adjoint to U), and we say that the two categories are related by an adjunction.

𝒞 ⇄ 𝒟, with F (m2) : 𝒞 → 𝒟 and U (m1) : 𝒟 → 𝒞

In our case, U (based on model m1) and F (based on model m2), as well as 𝒞 and 𝒟, even belong to the same family of structures.
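A standard textbook instance of an adjunction is the pair “free monoid ⊣ forgetful functor”: monoid homomorphisms F(A) → M correspond one-to-one to plain functions A → U(M). The sketch illustrates this bijection under stated assumptions: Python lists with concatenation serve as the free monoid F(A), the target monoid M is taken to be the integers under addition, and the helper names extend and restrict are hypothetical. It illustrates the general notion only, not the particular model-functors m1 and m2 discussed above.

```python
from functools import reduce
from typing import Callable, List, TypeVar

A = TypeVar("A")

# Assumed target monoid M: integers with addition and unit 0.
UNIT = 0


def op(x: int, y: int) -> int:
    return x + y


def extend(f: Callable[[A], int]) -> Callable[[List[A]], int]:
    """Turn a plain function f: A -> U(M) into the homomorphism F(A) -> M."""
    def h(xs: List[A]) -> int:
        return reduce(op, (f(x) for x in xs), UNIT)
    return h


def restrict(h: Callable[[List[A]], int]) -> Callable[[A], int]:
    """Turn a homomorphism F(A) -> M back into a function on generators."""
    return lambda a: h([a])


def word_length(word: str) -> int:  # a plain function str -> U(M)
    return len(word)


h = extend(word_length)
print(h(["cat", "horse"]))                                     # 8
print(h(["cat"] + ["horse"]) == op(h(["cat"]), h(["horse"])))  # True: preserves the operation
print(h([]) == UNIT)                                           # True: preserves the unit
print(restrict(h)("horse") == word_length("horse"))            # True: restrict undoes extend
```

The two helpers are mutually inverse, which is exactly the one-to-one correspondence of arrows that defines the adjunction in this case.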

Now, Awodey states [2, p.253] that

… every adjunction describes, in a “syntax invariant” way, a notion of an “algebra” for an abstract “equational theory.”

[quotation marks by Awodey]

Awodey then emphasizes [2, p.255] that concepts defined by adjoints can themselves be defined without referring to more “complicated” (say, derived or semantic) concepts like limits, quantifiers, hom-sets, infinite conditions, etc.

As can be seen from the formal definition, a category is an arrangement in which the composition of arrows fulfills the law of associativity: h∘(g∘f) = (h∘g)∘f.

The important question for us here is whether this law also holds for models, which we have set to be equivalent to arrows. Why is this important? First, we think that it is important to be able to conceptualize the relationships between models in a formal manner. Second, we saw that group theory does not provide the means to talk generally about the relationships between models. Third, we see that category theory provides concepts that seem to characterize the structure of models quite well. So, the intention is to keep models aligned with category theory. Put differently, we could also ask in which way we should talk about models and their relationships, and what we should avoid in order to keep this alignment alive.

In our previous investigation of the relationships between models and groups we already mentioned that a combination of models will fulfill the law of associativity only if the models are completely disjoint. Here, “disjoint” means that the respective arrows f, g, and h result in differently structured solution spaces. Usually, one can regard solution spaces as secondary data spaces, which are disjoint if a transition between them would require folding one or both of them. We can’t prove it here, but we guess that neither a difference in methods nor a difference in the variables used (assignates) represents a “necessary” condition for disjointness.

If we accept that it is meaningful at all to take the perspective of category theory in investigating generalized models, then the previous statement has an important consequence for the practice of modeling. In order to be able to conceive of models as a category we have to preserve associativity. In common parlance, however, models are perceived as different if (1) they use different constitutive variables, or if (2) they have been built using different methods. Yet, according to our results, in many cases we simply have different parts of the same, still incomplete model. Perfectly disjoint models need to be able to indicate the data to which they are applicable, and they need to indicate different data as applicable.

The important consequence, then, is that a sound modeling process should be built as an iterative process, which establishes a sorting, rating and selection of the input data, and which builds disjoint models on disjoint (primary) input data. Otherwise we would not achieve an instance of the category of models; we would just build a single mapping, a 3-valued relation, a single arrow. Yet again, we would be trapped in set theory, which we know to be inappropriate. To put it briefly, we may say that

Category theory directly implies comparative theory.

There is still another correlate to this result. Modeling, correctly understood (where “correctly” means: in a way that avoids formal inconsistencies), has to include a self-focusing mechanism. From other contexts we know this property as an “idealistic” style of thinking. For us it implies that we have to implement such an “idealizing” mechanism, which works for any “subject” “contained” in a SOM.

Categories, Conditions and Machines

These results now seem to be quite important for our perspective on modeling as part of machine-based epistemology.

First, we may take it as a further confirmation of the generality of the concept of model, at least if taken in the general form we’ve proposed here. To arrive at a model we need not start with a formal theory, contrary to what logical constructivism believes. Of course, we still need theories for modeling, but for quite different reasons. No generalization that would exceed the general notion of a model is either necessary or possible.

Second, the formal properties of models revealed by category theory emphasize their character as a double-articulating entity between matter and symbols. Since the formal properties of categories comprise a “cartesian closedness”, we are formally allowed to conceive of modeling as a purely mechanical activity, within the choreostemic conditions. Hence, we also conclude (1) that modeling can be automated, (2) that modeling could even take place in “purely” material arrangements, and (3) that any model comprises an associative part (here, “associative” does not refer to the “law of association” as in group theory).

Last, but not least, we will have to investigate two further issues: the generalization of the notion of the condition, and the role of the entity we usually call “theory.”

This article was first published on 18/12/2011; the last revision is from 25/12/2011.

۞

Conditions

December 2, 2011

If anything implies at least one condition that is outside of itself, how, then, are we to speak about conditions?

Conditions are a strange something. Everybody uses them as a sort of scaffold, for instance in the form of logical premises. In science, however, everyone tries to kick them out of the game. Degraded into “side-conditions,” they are first vigorously controlled, then declared irrelevant. In physical contexts as well as in the context of logic, taking conditions seriously leads to self-referentiality, which in nature provides the road into chaos and complexity. In logic, however, self-referentiality is often considered its devilish counterpart and hence treated as a taboo. Conditions can’t be investigated by means of logic.

Conditions, and the conditional alike, can be written down in many pretty ways. Yet it is quite difficult to explain them, to explain how we arrive at them. They are certainly a kind of nightmare for philosophers, at least for those who, in one way or another, believe in existence as a primary foundation. Not believing in existence as a primary foundation does not at all mean not believing in existence, or denying it. Yet it makes a dramatic difference whether existence is set as a primary foundation or as an implied quality. We discuss this in another chapter.

This question about the foundations is so important because rationality is tightly bound to it, and with it the possibility of knowing. Since the beginnings of philosophy, and probably even long before, the ability to say “I did this because…” has been considered the ultimate justification for human societies. Giving a reason for one’s own acts establishes the act as an act; it thus also establishes the conditions for the possibility of freedom. At least, so it was believed. If we could not justify our acts, well, then everything about human society would be unsecured. At least, so it was believed.

People quickly discovered that some justifications are more tenable than others. If there is such an experienceable difference, is it then possible to think that there is a superior condition, one that rules all others? Something like a last condition? Something that causes any other thing or change? Or, complementary to that, is it possible to gain a priori truth? Generally, it was always considered vital not to use unnecessary reasons in arguments. Aristotle always tried to build his systems on a single principle. About 1500 years after its invention, this sparsity argument was rediscovered by William of Occam (Ockham), a great medieval philosopher. Nowadays it is popular under the name “Occam’s razor”.

Consistently, Aristotle also formulated the idea of the “Unmoved Mover”: an independent, divine, eternal, unchanging, immaterial substance. Yet, in this construct he was mainly concerned with material things. If the “unmoved mover” were transferred to social or mental aspects of human life, a plethora of problems would arise, primarily about freedom, determinism and ethics. It does not help much to introduce the idea of God into the discourse about conditions. God would then be the condition, more directly than not, also of evil. Don’t think about war, just think of the victims of the Inquisition. You get the point. Post-medieval philosophy recognized the problems that arise when the idea of God is related to any kind of condition, when Leibniz tried to justify that idea in his Theodicy.

Leibniz himself, but also Descartes among others, took God out of the world of facts and matters and put it into the construction of the world. Consequently, everything was considered to be perfectly construed; everything was, and had to be, considered a machine, a quite complicated one, but still. The discussion about determinism persists to this day. All of our writings here about the possibility of machine-based epistemology are related to this issue, too. Of course, we reject determinism etc., but anyone thinking about the relation of matter and mind inevitably relates herself to the issue of determinism.

The idea of a single, monopolistic foundation for any thinkable justification introduces some serious problems. These problems are first structural in nature, then ethical.

A foundation could do its work only if, additionally, a necessity is at work. In other words, the language game of “foundation” remains intact only if there is no choice, not even the possibility of choosing an instantiation. The foundation must not be abstract, and it has to be formulated in a positive way. A good example is provided by the Ten Commandments. The very idea of a foundation presupposes the exclusion of choice and freedom for itself.

We may summarize these structural problems:

  • (1) The principle has to be formulated as a principle of construction; formally we call that “positive definite”.
  • (2) Everything becomes dependent on it.
  • (3) The need for further justification has to be impossible; otherwise it would not be a foundation.

It is difficult to overestimate the severity of the first part of the problem. Even Aristotle’s “unmoved mover” has to act; there is a positive determination of some aspect of that entity: it has the faculty to establish movement and change. This, in turn, implies conditions that could only be cast aside by an (arbitrary) definition.

The third part of the problem can be dissolved by claiming some kind of absolute externalization, e.g. a divine origin, or the hyper-platonic sphere of “existent” ideas. Yet, since a “God” always remains somehow abstract (even by definition), one additionally needs rules for instantiating the divinity in real life. These rules, of course, also have to be of divine origin. Again, by definition, these rules can’t be discussed. Quite obviously, such divine rules are completely arbitrary, even though they produce societal structures in such a way that these same arbitrary rules gain predictive power. You may understand that this “solution” is not an acceptable one for us.

In mathematics, first principles are called axioms. Usually, and traditionally, they are founded outside of mathematics, referring to something that could not be denied. Such undeniable “subjects” are usually taken from everyday life, i.e. the sphere of experiences accessible to a human body, which is considered not to be part of mathematics. It is interesting to recall what has happened to mathematics since the late 19th century. The French Revolution and the establishment of industrialization brought a broad liberation and a long period filled with successful science. Mathematics played an important role in those developments. Out of the achievements in the unification of mathematics, namely the research into and the creation of new algebras and their abstract embedding, the group, by people like Clifford, Lie, Peano, Abel or Klein, arose the question of the foundation of mathematics. Could there be a single axiom that could be used to justify the correctness of mathematics? Hilbert then proposed his famous program, the Hilbert program, a set of tasks to be solved in order to guarantee the correctness of mathematics.

We have to put at least 2300 years of philosophical argumentation onto the table in order to grasp the shock released by the results of Goedel’s early work, the incompleteness theorems. Their claim is simple, even though the proof was very difficult at the time. Goedel proved that the Hilbert program poses impossible goals: a consistent formal system rich enough to express arithmetic cannot be complete, and it cannot prove its own consistency. If you assume consistency, you lose completeness; if you prefer completeness, you lose safety, i.e. you will likely be caught by arbitrariness. This result is devastating for any theory of explicable universalia, be it a religious theory like the idea of a God, or be it mathematics. Whenever you strive for a constructive argument that should influence everything, you will meet the same problem.

The problem cannot be abolished by retreating to a structural and abstract level of first principles. As long as these rules are positive definite, they imply arbitrariness. In other words, even the universality of ethics and the clarity of logic are threatened by it. How, then, can we rely on them? How can we justify a political act, or any act, since any act erodes the freedom of another person? How can we conduct an argument if logic is unsafe and the idea of necessity vanishes? Elsewhere we will delve a bit deeper into the problems of necessity and logic, which in turn are related to the issues of causality and information. Here, and for now, we have to postpone these big issues for a while.

Instead, we will return to our core subject, the possibility of machine-based epistemology. The question is: what are the conditions for that possibility? As we have seen in the chapters about Non-Turing Computation and Growth, these conditions are not trivial. In particular, we can’t program it. Thus, as a second, more strategic line of inquiry, we have to ask how we can speak about those conditions.

The consequence of the above is quite clear. Whatever we take as these conditions, or however we speak about them, they cannot consist of positive rules. The whole system needs to be a system of constraints, not a system of positive settings. Yet these constraints must have a particular quality, since they themselves can’t be formulated as an identifiable (countable) limit or as a physical quality.

We already mentioned that positive axioms have to be avoided, since they introduce arbitrariness. Nevertheless, we have to choose some starting point. We need an instance of a Wittgenstein ladder. Yet it should not be necessary to throw it away after we have achieved some insights; it should simply vanish, disappearing by its own working.

Closely related to that, we can also derive that there can’t be an outside position from which we could impose rules as such conditions, or rules about how to talk about them. There is only an inside. Any justification of the system refers only and necessarily to itself. The structure we are going to build is fully self-referential, and it has to remain consistent under this condition of self-referentiality without sending us into the abyss of infinite regress. Usually, philosophy excludes any kind of self-referentiality because it is believed to be identical to, or as evil as, infinite regress. Arguments have to come to an end. Interestingly enough, here philosophers always followed the logical structure of finite algorithms. Yet, not coming to a definite end does not imply that there is meaningful progress.

There is yet a further structural condition we have to fulfill, and which we can know before actually starting to define it. First, it is clear that our theory of the condition will have one or several aspects. These aspects (even a single aspect would do the same) establish a kind of abstract space. Yet this space cannot be a presentative space, i.e. a space into which we could put things; we would immediately contradict our first principle, the pure negativity of the principles. This space can’t have a structure that would allow for coordinates. Instead, it must be impossible to determine a coordinate. As exotic as this requirement may seem, it can be instantiated; at least we can derive that the structure of the space is made from a differential. To give a metaphorical example, the space we are going to construct does not look like a map of a landscape. Such a map would contain locations, coordinates. Instead, the map should contain only something like possible accelerations. Yet it is NOT a map of the values of accelerations at particular locations. The concept of a location does not make any sense in our space. Another, almost metaphorical way to put it would be to conceive of this space as a space of Laplace operators. “Points” in this space are not geometrical points (enumerable idealized particles under an order relation). Yet the space is dense. Each point acts as an operator or, more graspably, like a tunnel or wormhole.

Let us summarize these conditions crisply and clearly:

  • (1) We can use only negatively formulated principles; they must not be normative in any respect.
  • (2) Even in their negativity these principles must not be “crisp” in order to avoid structural positivity.
  • (3) There is no outside position; thus the resulting arrangement needs to be stabilized by self-referentiality.
  • (4) The abstract space is defined as a space of operators.

Further side conditions are imposed on our project by our intention to remain consistent with everything we have said throughout this collection. Before we start, we’d like to say that the way we introduce the structure here is only one of many possible ways. Yet, after several years of using this structure as a kind of orientation, we are convinced that it is stable and useful.

Some basic Observations

Let us start by referring briefly to some simple observations; in the end we will put them all together.

The first one was mentioned by Augustine when he tried to reflect upon time []. He said, in our words, that time appears to him as a perfectly clear concept as long as he uses it. Yet, as soon as he starts to investigate it more closely, this clarity vanishes almost completely. The interesting thing now is that this happens with any immaterial concept we can address with our languages. If we do not have a material reference for a concept (as we do for, say, “stone”), we quickly get caught by serious difficulties once we start to ask about its meaning or its essence, its dynamics or its heredity. More often than not, the difficulties arise even for material things. It is almost impossible to give a positive definition of a chair. There is always some exception. We could conclude that concepts cannot be defined in a positive manner.

The second observation concerns what we call a model. People say that a model is some representation of another thing, often simpler or more abstract than what is being represented. Such descriptions are (simply) wrong, as we have seen in the chapter about modeling. Things are experienceable only through models. There is no such thing as the thing-as-such, and there is no process we could call representation besides modeling. What is relevant in the context of this chapter are the issues of the generation and usage of models.

Let us imagine we have a model of a building. It is an architectural model, which can be used only in a particular way and for particular purposes. It is a model for a wider context. The model itself can’t define the rules for its own application. That is, the architectural model is different from what we call architecture. Of course, we could build a model of architecture, but that model would no longer be a model of a building.

Similarly, no particular model can itself define how it is to be created. If we generalize the argument to any kind of formal model (a really large class), we see that a model can define neither its symbols nor the way we use symbols to create it. Thus we can conclude that any model has two open ports at which it is not self-contained.

Both ports, the required symbols as well as the modes of usage, refer to a community. From the perspective of the community, and also from that of its members, the model is a particular kind of device that mediates the communicative relationships between the members of the community. Models establish a distinct type of mediality. Of course, models are not the only media. Our example just demonstrates that the mediality of things cannot be reduced to the things themselves, even if those things are concepts or models.

These three aspects, concept, model and mediality, are not reducible to one another; each refers to the other two as its condition. Together they imply a further categorical aspect: the future. Actions are always directed toward the future, never toward the past; the same is true for any epistemic act, for epistemology itself, and also for knowledge. Using something implies a future. Yet what is interesting about the future is not its temporal aspect. For our discussion of conditionability, its correlate, virtuality, is much more relevant.

Virtuality could be conceived as pure potentiality, irrespective of “concrete” facts and processes. From Bergson and Deleuze we can learn that we have to distinguish the possible from the potential and, relatedly, the real from the actual. The possible is always real, even if it is not factual; we can clearly imagine it. The potential, however, we cannot depict. Yet it does not denote a completely unknown domain either. Aristoteles and later Bergson and Deleuze clearly recognized that any discourse about change requires a notion of virtuality. Virtuality is the non-physical consequence of time and change. Conversely, we cannot talk about change without the concept of virtuality. In still other terms, and again implying the Deleuzean position¹, we could say that “the” virtual refers to the energy created by the tension between immanence and transcendence.

Asking about the condition, we found four aspects that seem to be involved in conditionability: model, concept, mediality and virtuality. Obviously, conditionability refers to the issue of sufficient reason and, of course, to the possibility of knowing. For now, we have to stop here and postpone the further development of these issues, since we first have to discuss some other topics. The thread will be taken up again in the chapter “A Deleuzean Move.”

Notes

1. Deleuze, Guattari, Was ist Philosophie? 1991, p. 63.

۞

Where Am I?

You are currently viewing the archives for December, 2011 at The "Putnam Program".