Urban Reason II: Scopes & Scapes

September 19, 2012

Architecture is strongly based on models.

Everybody knows that architecture could not be practiced without models. This particularly strong relation between model and practice led to the use of the concept of “architecture” in areas quite different from building houses, for instance regarding software systems, the design of processes, or organizational design. Since the advent of urbanism in the middle of the 20th century, this relation between architecture and the model became more and more problematic, the main reason being that the categories of the “observer” and the “observed” lost their very possibility. In the case of urban culture they can’t be separated without implying considerable costs.

This opened the question of how to position urbanism, and so far there is still no (acceptable) answer to it. Positioning urbanism includes any possibility of relating ourselves to what we call the city, or urban arrangement, our expectations, hopes and fears about it, personally or politically, from the design perspective or the inhabitant’s perspective (again, as far as those could be separated). For sure, scientism doesn’t provide the full answer, if any at all. The further question is why science must fail here despite being an important ingredient in dealing with the city. In any case, the use of models when dealing with the city is inevitable, just as it is for any other relation to the world. Yet, which kinds of models are appropriate, and even more importantly, how do we structure and organize our talk about them? Which kind of theoretical stance would be appropriate?

Among others, Koolhaas and his OMA/AMO setup have been working for a long time now to find new approaches. The other question is whether any answer to the former issue of positioning urbanism can be found within architecture or urbanism itself. Koolhaas’ guess is not quite positive, as he displayed in his Junkspace. As an an-architect, Koolhaas has other means at his disposal than architecture itself, such as writing or film-making, to investigate the problematic field of the urban as a quality.

The general idea I am going to propose here is fundamentally different from common approaches in urbanism. Roughly speaking, it follows the grand cultural perspective, considering the Form of Life (as conceived by Wittgenstein) as an ineluctable “fact”. From this perspective, we radicalize Koolhaas’ rhetorical question “What ever happened to Urbanism?” (in S,M,L,XL), proposing to deny the reasonability of an “-ism” regarding the city and the Urban, simply because “The City” does not exist anymore.

The “architecture” of the argument uses philosophical techniques to organize conceptual elements which in turn refer to contributions from the sciences. The outcome should allow us to keep everything about the city in a single perspective, without totalizing or dominating any particular stance or attitude. In other words, we will not provide a recipe for achieving a solution in any particular case. Instead, in the end we will provide a conceptual basis for deriving such solutions, a conceptual tool box, a techné. In still other words, it is, as always, I suppose, a matter of organizing the use of language.

This Essay

This essay will collect some arguments in favor of the reasonability of the program that we call “Urban Reason”. We begin with a (very brief) discussion of the status of the model and of theory in architecture and urbanism. We conclude the first part by suggesting that there is no theory about the Urban. The second part, “Departure…”, explores the site of departure towards an Urban Reason. This site is illuminated by the observation of the inseparability of language and the form of life. Both affect the way of thinking and even what we can think at all. Now, if the form of life is Urban, what and how could we think? Finally, the third part, “Approaching…”, introduces the notion of the critique. Only the critique of the concept of “reason” allows us to take an appropriate stance towards it. The final section provides a brief outlook on the effects of the turn towards Urban Reason.

One of the consequences of that perspectival turn towards Urban Reason is a detachment of the Urban (see this footnote) as a quality from certain kinds of built environment (that we call city). In other words, our approach is heading towards a non-representational conceptualization of the city and the Urban. I am deeply convinced—as Deleuze also always was, we will return to this issue—that this dismissal of the representational attitude is mandatory for any attempt to understand what is going on in our urban culture. Koolhaas demonstrated it some years ago in his essay “The Generic City”. Generally speaking, I don’t see any other possibility for going non-representational with regard to the Urban than by means of the proposed turn. Without it, any approach to the city will get stuck in naivety, always constrained by the illusion of the particularity of the phenomenon, even if the aspiring urbanist starts to engage in empirical counting activities. On the other hand, addressing the quality of the Urban just by philosophical means establishes what we will call the “binding problem”: the Urban requires a particular construction to enable philosophy to get a grip on it.

The Scope of Current Approaches to Theory

Actually, the problematic field established by the model, as a practice and as a concept, has been part of architecture since Vitruvius, as Werner Oechslin demonstrates [1]. Thus, in architectural writing we can find traces of a discussion that spans, with some gaps, more than 2000 years. Some sciences have not even detected that field up to today. We may even say that architecture becomes architecture only through this problematic field. For only the model opens the process of building into the divergence of the question of form on the one hand and the status of architecture as a theoretical concern on the other. Hence, in the same move as the model, regardless of its actualization, brings us to the form, it also forces us to think about theory. How do we come to build that model and this form? As we have argued in an earlier essay, Oechslin as well emphasizes that theory is not antipodal to practice. Instead, now in my words, theory is linked to the irreversibility of the act through the model. In turn, any practice implies a theory, and of course, also models. Oechslin writes:

The model is definitely located in such an intermediate area made from abstract conceptions and contingent realities. ([1], p.131).1

This lets us guess that, regarding architecture, there is definitely something more to the model than just the physical model, the act of representation designed to convince the sponsor of the project. As the master of the history of architecture, Oechslin refers directly to Vitruvius, and as well to authors of the Renaissance, in his “Idea materialis” [1], where he writes as a closing remark:

In the Vitruvian precincts and in the succession to Alberti the model has been discussed particularly with regard to the (anticipating) sensory perception, and therefore often also called Visierung. […] the model, which often seems to be reduced to an image of itself, has lost its speculative power in the ‘process of becoming’. ([1], p.155).2

Werner Oechslin, an amicable person filled with incredible energy, runs a likewise incredible library and foundation on the history and theory of architecture. Hundreds of books from all times can be found there. It is indeed a sacred place, somehow, as far as we may consider culture and its book-like stuff one of the most important parts of the conditio humana.

So, how does Oechslin conceive of “Architectural Theory”? On the website of the library foundation the following can be found [2]:

This project systematically collects and evaluates the literature of architectural theory, pursuing comprehensive coverage of the discipline and a catalogue (census) of all printed sources. The project is the basis for specific individual investigations regarding particular aspects and questions of the formation of architectural theory (such as drawings, models, relations between image and text, the genesis of concepts, strategies of design, etc.). The census is based on research done since 1989 at the Institute for the History and Theory of Architecture at the ETH Zurich. [my emphasis]

In earlier essays we argued that probably the only reasonable way to conceive of theories is as orthoregulation of modeling. According to this perspective, theories are not related to empirical issues, but just to the practice of modeling. Theories do not contain hypotheses at all, since hypotheses are always about something experienceable. Oechslin’s description almost perfectly represents that. We have to be perfectly clear about this status of theory! Many proclaimed theories are in fact just models, e.g. Newton’s “theory” of gravitation. In fact, up to today we do not have such a theory of gravitation at our disposal. What is missing in Oechslin’s explication is the embedding in language as a life form. The issue is only implicitly invoked.

In a more elaborate account of theory in architecture that serves as the introduction to the Vitruv Colloquium, Oechslin still does not bring in language. Yet, he cites Aristotle’s formula “Habitus faciendi cum ratione” (Nicomachean Ethics). Oechslin leaves this untranslated, and wisely so, since facere could mean {produce, erect, build, exert, act, make, do} and ratio {cause, modality, calculation, reason, clarification, explanation, invoice, principle, theory, proportion}. Note that the Latin ratio is already a translation of the Greek logos, or logike, which adds further dimensions. Anyway, the implication is clear.

An appropriate concept of theory denies the separation of theory and practice. We may regard theory as almost the same as practice. What cannot be subsumed under theory is performance, which is an answer to the “resistance of the existential”. The existential, however, can be part neither of any theory nor of any kind of model. We can’t even speak about it, nor could we point to it or demonstrate it. Realism, deconstructivism and phenomenology—which are closely related to each other—all fail in their attempt to take an appropriate stance towards the existential. To be clear, this is not a matter of attitude, it is a matter of methodology.

Above we already introduced the question “How do we come to build that model and this form?” as the hallmark of a theory of architecture. This question about the “How do we come to …” asks about the conditions of doing so. An eminently important part of these conditions is language and languagability. How do we speak about this how-to? About this practice? How, if not by philosophical means, should we address that question? Architectural theory is not possible without references to philosophy. This, of course, holds for biology or physics in the same manner.

For 2000 years architectural theory has been a theoretical engagement targeting architectural questions, that is, questions about the form of an individual building and its close surround. This tradition led to Junkspace. The medium that created Junkspace was swarm architecture. Quite obviously, we have to adapt the scope of our theoretical concerns. The scope of architectural theory—which decidedly includes a corresponding and inseparable practice, as we have seen above—can no longer be that of individual buildings. And this scope is the city and the quality of the Urban.

A theory about the city, and even more so about the Urban3, poses a serious challenge, though. For large parts of culture relate to it, or are even already major constituents of it. A theory about culture, however, would have to be a self-referential theory. In our piece called “A Deleuzean Move” we tried to develop such a structure, which is no longer related specifically to any kind of theory about the urban.

David Shane, in his “Recombinant Urbanism” [3], devotes considerable effort to clarifying his concept of theory. It is not the only feature that makes his book so outstanding. Although he does not completely arrive at a general or generally applicable concept of “theory”, his efforts come close to what we described earlier (“Theory (of Theory)”), and in the further course of his synthetic investigations he tightly follows his theoretical outline. Yet, he calls his theory a “theory about the city”, not a theory about the Urban. According to what we said in the preceding paragraph, he is completely correct about that. Throughout his book he demonstrates how to build models about the city. Shane’s contribution may probably even be conceived as the only theory of the city we currently have available.

Yet, here we are not interested in a theory of the city, that is, a theory about modeling and investigating urban arrangements, which would merely double Shane’s great work. Our goal is a quite different one. In a preliminary fashion we could say that we are interested in the foundations of urbanism. A “City Theory” like that of Shane is certainly an important part of urbanism. Yet, it can’t be considered the only part. First, urbanism is not only about the almost “physical” mechanisms of urban agglomerations. A collection of buildings is no more a city than a collection of trees is already a forest. The important things about a particular city, and as well about the Urban, are far beyond traffic control or the legislative regulations about erecting buildings4, albeit such rules and controls—though again not as particulars—are necessary ingredients to allow for the emergence of the Urban. Of course, the same holds for the practice of erecting buildings itself, stripped of relational concerns. This was clearly recognized by Fumihiko Maki as early as 1964 [6]:

There is nothing less urbane, nothing less productive of cosmopolitan mixture than raw renewal, which displaces, destroys, and replaces, in that mechanistic order.

Secondly, for addressing the Urban it is not sufficient to think about how we speak about models of the city. Such would represent the more scientific and reductionist attitude that takes the city and its urban processes as an observable. Yet, such a separation is not a sound alternative, because the scientific description is—by its own definition—only about the sayable. One could easily misunderstand this as a rejection of science as a whole. Of course, I don’t opt for that. Science may well practice analyticity and reductionism within a defined framework and an established community that adheres to scientific methodology. But science should not attempt to export its standards as the structure of choice for any other area. Outside science, science is just an element (in the sense we discussed it here). Nevertheless, science excludes any aspect of performance and the demonstrable apriori. Reducing cities to their scientifically observable aspects could even be regarded as a methodological fault when it comes (i) to the qualitative aspects of urban life and, more importantly, (ii) to the conditions of the Urban and the way of speaking that we could employ regarding any putative theory of the Urban.

The foundations of urbanism comprise the topic of the conditions for the possibility of creating models about the change of urban environments, and here we deliberately include the social, political and cultural aspects at large. Hence, without those foundations we can’t hope to get any reasonable grip on what is going on in the cities, the emphasis here being on reasonable. The difference is the same as we have discussed previously (“Koolhaas the story-teller”) with regard to story-telling. It makes a huge difference whether one is simply part of a story, or provides an arbitrary something that is then assimilated by the story, or deals consciously with the Urban.

In the remainder of this writing I will present a brief outline of a potential argumentation that would support our conviction that the concept of Urban Reason is a reasonable program.

Departure to Urban Reason

As one of the more salient starting points for such arguments, though there are certainly others, one could take the inseparability of language and the life form in the Wittgensteinian sense. Since the times of ancient Rome humans have experienced the particular conditions of urban life. These conditions concern everything, from the supply of food, water and energy up to the social aspects of life and questions of organization and power. It is certainly not an exaggeration to say that everything that could be conceived as human culture today is specifically related to the form of the city. Today, and certainly for a long time now, the Urban stains the rural; the countryside of the Urban is everything that is not the city, be it even the Sahara or the Amazon jungle. The rural is the surround, a dislocated source for a diversity of fundamental streams: water, energy (be it electricity, be it food), for some parts also space or a particular quality of time, for which there is no replenishment achievable within most urban agglomerations.

The city and its surround represent entangled forms of life; yet the cultural dynamics, particularly as semiosis (the generation of signs5) or as mediagenesis (the generation of media and mediality6), is clearly dominated by the Urban. Think of books, theater, the arts, the press, the construct of the news, etc. All of that and—most significant regarding our interests here—all the related thinking and living belongs to the quality of the Urban; it contributes to it and it derives from it. Note that it would be missing the point to say that these qualities could be “found only in the city”, since the book and its companions are constitutive of the Urban itself. Separating the whole of the Urban from its drivers results in a tautology. The locational, or better, territorial way of speaking is modernist, analytic, and has, at most, not yet left the 19th century behind.

We may express it in a condensed manner: in the city we experience thinking; it is within the practice of the abstract Urban that thinking happens, and where densified thinking takes place, there we may experience or attribute the Urban. Some of the conditioning requirements for those bursts upon densification are the abstract associativity, the networks, the streams, the concepts that are kept flying around, the vortices and clinamen appearing on those streams, etc.

All of this determines and deeply affects thinking, language and the life form, and hence also the kind of rationality and reason that could arise and emerge from it. The relationship between thinking and life form is not limited to urban life, of course; it is a quite general principle. The novelty here is that it happens as a particular urban issue on a global scale, instead of its previously regional instantiation within a particular rural.

So, if we for now accept the idea that there is a specific instance of thinking in the cultural environment of the city, constituting an Urban Reason, and including the way to deal with the “resistance of the existential”, then we can start to ask particular questions that are not possible without that move. This move towards Urban Reason would allow us to develop urbanity along a completely different storyline. We may even say that it constitutes the possibility for such a storyline at all. Koolhaas’ notion of “The Generic City”, provided as an imaginary script for a movie, now appears as a very early precursor of that.

A quite interesting topic is presented by the concept and the practice of trust. Trust builds a bridge between the animal-corporeal and the medial-cultural. Along with the development of the city since the 12th century, trust became more and more probabilized. We may even turn the perspective around and conceive of the city as an organizational form for probabilizing trust. In some agglomerations this endeavor fails, and it is difficult, if not impossible, to regard such agglomerations as urban, or as a city, at all. All shades and grades between the two poles can be observed, of course. The successful probabilization of trust may be the most important difference between the urban and the non-urban.

The changed concept of trust also changes the concept of politics, or governmentality, as Foucault identified it. The late Foucault was increasingly interested in governmentality and its relation to the exertion of power. Long before, he had started his journey towards bio-power with investigations about things, order and violence, continuing, after a broader assimilation of Wittgensteinian philosophy, with his particular concept of historicity. Bio-power refers to a certain attitude and assignment of importance to the concept of the body, namely the biological aspect of the body. His fears and projections have not fully materialized (so far), yet the importance of the question about the body and its status remains intact. We just have to ask about the body, and of course the model of the body (e.g. [7]).

So far, there is no discussion at all in urbanism about the relation between form and government, the exertion of power and the organization of probabilized trust. Neither monarchies nor elite-constrained oligarchies as their modernized form—think of the E.U.—, in short, no kind of strongly centralized government, could be considered an adequate form for Urban societies. Just think of the difference between Tokyo (in fact 23 autonomous wards operating under the same label) and Moscow, or, vice versa, the resemblance between Tokyo and the political organization of Switzerland with its 26 cantons (despite all differences…).

Approaching the Critique of Urban Reason

Given the concept’s reasonability we may ask: how, then, could we go about a critique of Urban Reason?

Of course, Immanuel Kant’s investigation of reason and rationality immediately comes to mind, with his distinction into pure reason, practical reason, ethics and aesthetics, if it is allowed to talk in such a coarse manner about his work. Yet, I don’t think that the Kantian way is suitable any longer, for at least three reasons.

First, Kant was strongly influenced by physics and a kind of first-level scientism, seriously affected (and limited) by thinking in cause-and-effect schemata. Kant did not have at his disposal the concept of probabilization as we can use it today. Neither was the population established as a life form—it had just been invented with the French Revolution when Kant was writing the concluding parts—nor could he have been aware of the realm of information. Physics served Kant as an ideal, yet physics is still not able to say anything about complexity and emergence. Today we could even reason, as we did above, that science itself doesn’t represent a generalizable image of thought at all. At best, it provides an acceptable contribution.

Secondly, the Kantian distinction is vulnerable to idealism and all its detrimental consequences. For starting with “pureness” always relies on identity as the ruling transcendental principle. Identity thinking is methodologically faulty and politically disastrous. We had to wait for Deleuze, who successfully demonstrated how philosophy, thinking and acting could be re-oriented towards the principle of transcendental difference [8]. Accordingly, Kant did not recognize the working of abstraction through the differential. Thus, Kant always had serious difficulties in linking the idea, the abstract, the concept to the dimension of practice and performance.

Thirdly, and this is related to the second point, Kant came too early to be able to recognize the role of language. Without incorporating the linguistic turn (in its non-analytical form, of course) it may prove quite difficult (if not hopeless) to find a suitable link between mental life (whether internal or external), practice and performance (down to logistics, politics and the so-called “public space”) and the philosophical “habit”. The combination of these three missing issues in Kantian philosophy—probabilization, transcendental difference, the linguistic turn—causes a fourth one, which is the blindness towards mediality.

Saying this, I feel obliged to emphasize the great achievements of Kantian philosophy. Firstly, there is the concept of transcendence, or more precisely, the working of transcendence and its continuous presence in any thought. Secondly, and that’s a methodological trick, Kant didn’t engage in explaining or describing reason; instead he introduced philosophy as a technique, as a critique. After specifying it, we should check its conditions and consequences, we should “criticize” it.

The concept of Urban Reason is thus probably less a concept than a particular image of culture. Deleuze once proposed a new image of thought that he based on the notion of transcendental difference. He directed this image against the “dogmatic image of thought” and the closely related syndrome of representationalist thinking. Yet, even if we refer to the image of thought as a “framework” or a habit, or even as a philosophical stance (whatever this could mean), we could compare it to other such arrangements. We already proposed a proto-philosophical structure that guarantees a conceptual consistency for all its derivatives and applications. We developed it in a Deleuzean perspective and called it the “choreostemic space”. We argued that this space allows us to map and to compare not only any style of thinking, but rather any stance towards the world, without falling prey to a methodological petitio principii. Thus, we will also have to investigate the attractors of Urban Reason as a framework, as well as the particular instances of Urban Reason as they arise in particular (classes of) urban arrangements. I would expect, even before the development of Urban Reason (as a framework) has started, that such an abstract cartography will yield important insights into the long-term dynamics of cities.

Even as we dismiss the Kantian distinction, we may nevertheless distinguish different stages in the instantiation of Urban Reason, until we arrive at a practical or political guideline, or even at a utilization in an empirical research program. A general and exemplary outline of those steps will be given in the next essay.

Conclusion and Outlook

For now we have to ask about the questions that could be uniquely addressed on the basis of Urban Reason. Of course, we can provide just some examples, as the full list is possibly quite large, or even practically infinite.

First of all, and not of least importance, the perspective of Urban Reason allows us to address the relation between abstract categories about the Urban (“Urban Theory”) and the practical concerns that appear in a city for any kind of stakeholder. Today, the lack of such a suitable bridge between category and operation may constitute one of the major problems of urbanism. The missing of an appropriate binding between the two also contributes to the tendency of urbanism to take a largely reductionist attitude.

Thus, the practical affairs in Urban Reason in terms of “actions taken” are largely influenced by a varying mixture of three attitudes, which supposedly are: (i) a make-up of values mostly due to historical constraints, seen in its most extreme form in the case of Singapore, (ii) an unreflected alignment to arbitrary contingencies, determined by the structure of local political processes (e.g. Munich, Berlin, Tokyo or also Zurich), or finally (iii) ideological considerations (most salient examples: Paris, Los Angeles, Chicago, Hanoi, Shanghai, Stone Town Zanzibar).

None of these motivational centers addresses the city as a life form in its own right. No wonder we can observe any degree and any kind of violence in the urban processes on any of the time scales, illegitimate as well as legitimate ones, indeed so much so that nowadays violence and the Urban often appear as close relatives. It may well be expected that solving the “binding problem” of urbanism provides an improved capability to navigate through the evo-devo of the city.

Solving the binding problem of urbanism also means that urbanism could integrate concepts from other disciplines more readily. Here I refer not only to concepts from the hard sciences, but rather to holistic conceptualizations or areas like literary studies or even philosophy (taken here as a technique for asking about the conditionability of issues). A relatively significant topic is that of differentiation. Currently, urbanism does not even have the means to talk appropriately about it, mainly due to the fact that physics (still) prevails as the ideal. Yet, physical differentiation refers just to the level of the existential, to be or not to be. Physics is a deeply non-relational science and thus totally unsuitable to guide any research program in urbanism. Differentiation includes growth (of different kinds), partial deletion, transformation, but also the issues of individuation, associativity, emergence or fluidity, among others. While there are already practical adoptions of the topic of differentiation, mainly triggered by the state of market affairs in architecture7, an appropriate theory is not available. On the other hand, differentiation could not be conceived as a purely political topic either, for this would neglect the autonomy, meta-stability and persistence of the city as a complex system. Once, in his short piece “What ever happened to Urbanism?” (part of S,M,L,XL), Koolhaas pointed in a somewhat desperate manner to this fact:

Together, all attempts to make a new beginning have only discredited the idea of a new beginning. A collective shame in the wake of this fiasco has left a massive crater in our understanding of modernity and modernization.

What makes this experience disconcerting and (for architects) humiliating is the city’s defiant persistence and apparent vigor, in spite of the collective failure of all agencies that act on it or try to influence it—creatively, logistically, politically.

The professionals of the city are like chess players who lose to computers. A perverse automatic pilot constantly outwits all attempts at capturing the city, exhausts all ambitions of its definition, ridicules the most passionate assertions of its present failure and future impossibility, steers it implacably further on its flight forward. Each disaster foretold is somehow absorbed under the infinite blanketing of the urban.

At this point I would again like to emphasize that Urban Reason and its critique is not an analytical endeavor. It should not be misunderstood as a kind of logic, or a set of fixed rules, nor as a kind of rationality at all. Story-telling in ancient Baghdad at night is as much a kind of reason as contemporary mathematics is. Thus, instead of drawing on logic, it may be much more appropriate to conceive of “Urban Reason” in terms of Foucault’s concept of the field of proposals and propositions, where arrangements of proposals, in short: stories, are made from proper elements. This will allow us to find a proper organization for the layout of the genealogy of our critique… which we will start with in one of the next pieces, as soon as possible.


Notes

1. German orig.: “Das Modell liegt durchaus in einem solchen Zwischenbereich von abstrakten Vorstellungen und kontingenten Wirklichkeiten.”

2. German orig.: “Das Modell wurde im vitruvianischen Umfeld und in der Nachfolge Albertis insbesondere im Hinblick auf die Hilfestellung für die (antizipierende) Sinneswahrnehmung diskutiert und deshalb auch häufig Visierung genannt. […] dem Modell, das oftmals auch nur auf ein Bild seiner selbst reduziert erscheint, ist die spekulative Potenz im ‚Prozess des Werdens‘ abhanden gekommen.” ([1] p.155)

3. We use the capital “U” if we refer to the urban as a particular quality and as a concept, in order to distinguish it from the ordinary adjective.

4. For a collection of such rules cf. Axel Lehnerer [4].

5. Here we refer, as always, to the conception of the sign as it has been developed by Charles S. Peirce. The differences to de Saussure’s concept of the sign are tremendous. The Peircean sign is open, dynamic, volatile and refers only to other signs, never directly to an object, as the phenomenological structure of de Saussure’s sign does. Thus, the Peircean sign is largely synonymous with the interpretation situation and the respective processes themselves.

6. Vera Bühlmann argued for an intimate relationship between mediality as a transcendental and practical entity and architecture, coining the label of “inhabiting media” [5].

7. There is a growing awareness in architectural research and education, particularly in Europe, that architecture might be more and more engaged in transformation processes of existing buildings or arrangements of buildings instead of building anew. Cf. the master courses titled “Planning and Building within Assets” at the University of Siegen (Germany) (orig. “Planen und Bauen im Bestand”).

References

  • [1] Werner Oechslin, Architekturmodell »Idea materialis«, in: Wolfgang Sonne (ed.), Die Medien und die Architektur. Deutscher Kunstverlag, Berlin 2011, pp. 131-155.
  • [2] Website of the Werner Oechslin Library Foundation. Last accessed 29 Sep 2012.
  • [3] David Shane, Recombinant Urbanism. 2005.
  • [4] Axel Lehnerer 2010. Thesis, ETH Zürich.
  • [5] Vera Bühlmann, inhabiting media. Thesis, University of Basel (CH) 2009.
  • [6] Fumihiko Maki, Investigations in Collective Form. 1964. Cited after: Rem Koolhaas, Singapore Songlines [9].
  • [7] Klaus Wassermann, The Body as Form – or: Desiring the Modeling Body. In: Vera Bühlmann, Martin Wiedmer (eds.), pre-specifics: Some comparatistic investigations on research in design and art. JRP Ringier, Zürich 2008, pp. 351-360. Available online.
  • [8] Gilles Deleuze, Difference and Repetition.
  • [9] Rem Koolhaas, Singapore Songlines. In: Rem Koolhaas, Bruce Mau, S,M,L,XL. 1995, pp. 1009-1089.

۞


Modernism, revisited (and chunked)

July 19, 2012

There can be no doubt that nowadays “modernism”, due to a series of intensive waves of adoption and criticism, returning as echoes from unexpected grounds, is used as a label, as a symbol. It allows one to induce, to claim or to disapprove of conformity in previously unprecedented ways; it helps to create subjects, targets and borders. Nevertheless, it is still an unusual symbol, as it points to a complex history, in other words to a putative “bag” of culture(s). As a symbol, or label, “modernity” does not point to a distinct object, process or action. It invokes a concept that emerged through history and is still doing so. Even as a concept, it is a chimaera. Still unfolding from practice, it has not yet moved completely into the realm of the transcendental, to join other concepts in the fields most distant from any objecthood.

This Essay

Here we continue the investigation of the issues raised by Koolhaas’ “Junkspace”. Our suggestion upon the first encounter was that Koolhaas himself struggles with his attitude towards modernism, even though he openly blames it for creating Junkspace. (Software as it is currently practiced is definitely part of it.) His writing bearing the same title thus gives just a proper list of effects and historical coincidences—nothing less, but also nothing more. In particular, he provides no suggestions about how to find or construct a different entry point into the problematic field of “building urban environments”.

In this essay we will try to outline what a possible—and constructive—archaeology of modernism could look like, with a particular application to urbanism and/or architecture. The decisions about where to dig and what to build have been, of course, subjective. Of course, our equipment is, as almost always in archaeology, rather small, suitable for details, not for surface mining or the like. That is, our attempts are not directed towards any kind of completeness.

We will start by applying a structural perspective, which will yield the basic set of presuppositions that characterizes modernism. This will be followed by a discussion of four significant aspects, for which we will hopefully be able to demonstrate the way of modernist thinking. These four areas concern patterns and coherence, meaning, empiricism and machines. The third major section will deal with some aspects of contemporary “urbanism” and how Koolhaas relates to that, particularly with respect to his “Junkspace”. Note, however, that we will not perform a literary study of Koolhaas’ piece, as most of his subjects there can be easily deciphered on the basis of the arguments as we will show them in the first two sections.

The final section then comprises a (very) brief note about a possible future of urbanism, which perhaps has actually already been lifting off. We will provide just some very brief suggestions in order not to appear (too) presumptuous.


1. A structural Perspective

In accordance with its heterogeneity, the usage of the symbol “modernity” is fuzzy as well. While the journal Modernism/modernity, published by Johns Hopkins University Press, concentrates “on the period extending roughly from 1860 to the mid-twentieth century,” galleries for “Modern Art” around the world consider the historical period from the post-Renaissance (conceived as the period between 1400 and roughly 1900) up to today, usually not distinguishing modernism from post-modernism.

In order to understand modernism we have to take the risk of proposing a structure behind the merely symbolical. Additionally, and accordingly, we should resist the abundant temptation to define a particular origin of it. Foucault called those historians who were addicted to the calendar and the idea of the origin, the originator, or more abstractly the “cause”, “historians in short trousers” (meaning a particular intellectual infantilism, probably a certain inability to think abstractly enough) [1]. History does not realize a final goal either, and similarly it is sheer nonsense to claim that history came to an end. As in any other evolutionary process, historical novelty builds on the leftovers of preceding times.

After all, the usage of symbols and labels is a language game. It is precisely a modernist misunderstanding to dissect history into phases. Historical phases are not out there, and never have been. It is far more appropriate to conceive of history as waves, yet not of objects or ideas, but of probabilities. So, the question is: what happened in the 19th century such that it became possible to objectify a particular wave? Is it possible to give any reasonable answer here?

Following Foucault, we may try to reconstruct the sediments that fell out from these waves like the ripples of sand in the shallow water on the beach. Foucault’s main invention, put forward in his “Archaeology” [1], is the concept of the “field of proposals”. This field is not 2-dimensional, it is high-dimensional, yet not of a stable dimensionality. In many respects, we could conceive of it as a historian’s extension of the Form of Life, as Wittgenstein used to call it. Later, Foucault would include the structure of power, its exertion and objectifications—governmentality—into this concept.

Starting with the question of power, we can see an assemblage that is typical for the 19th century and the last phase of the 18th. The invention of popular rights, even the invention of the population as a conscious and practiced idea, itself an outcome of the French Revolution, is certainly key for any development since then. We may even say that its shockwaves, and the only slightly less shocking echoes of these waves, haunted us until the end of the 20th century. Underneath the French Revolution we find the claim of independence that traces back to the Renaissance, formed into philosophical arguments by Leibniz and Descartes. First, however, it brought the Bourgeois, a strange configuration of tradition and the claim of independence, bringing forth the idea of societal control as a transfer from the then emerging intensification of the idea of the machine. Still exhibiting class-consciousness, it was at the roots of the modernists’ rejection of tradition. Yet, even the Bourgeois builds on the French Revolution (of course) and the assignment of a strictly positive value to the concept of densification.

Without the political idea of the population, the positive value of densification, and the counter-intuitive yet prevailing co-existence of the ideas of independence and control, neither the direction nor the success of the sciences and their utilization in the field of engineering could have emerged as they actually did. Consequently, right at the end of the hot phase of the French Revolution, it was argued by Fourcroy in 1794 that it would be necessary to found an “École Polytechnique”1. Densification, liberalism and engineering brought another novelty of this amazing century: the first spread of mass media, newspapers in that case, which were theorized only about 100 years later.

The rejection of tradition as part of the answer to the question “What’s next?” is perhaps one of the strongest sentiments of the modernist in the 19th century. It even led to considerable divergence of attitudes across domains within modernism. For instance, while the arts rejected realism as a style building on “true representation,” technoscience embraced it. Yet, despite the rejection of immediate visual representations in the arts, the strong emphasis on objecthood and apriori objectivity remained fully in charge. Think of Kandinsky’s “Punkt und Linie zu Fläche” (1926), or the strong emphasis on pure color (Malevich), even on the idea of purity itself, then somewhat paradoxically called abstractness, or the ideas of the Bauhaus movement about the possibility and necessity of objectifying rules of design based on dot, line, area, form, color, contrast etc. The proponents of the Bauhaus, even their contemporary successors in Weimar (and elsewhere), never understood that the claim to objectivity, particularly in design, is impossible to satisfy; it is a categorical fault. Just to avoid a misunderstanding that itself would be a fault of the same category: I personally find Kandinsky’s work mostly quite appealing, as well as some of the work by the Bauhaus guys, yet for completely different reasons than he (they) might have been dreaming of.

Large parts of the arts rejected linearity, while technoscience took it as its core. Yet, such divergences are clearly in the minority. In all domains, the rejection of tradition was based on an esteem for the idea of independence and resulted predominantly in an emphasis on finding new technical methods to produce unseen results. While the emphasis on method definitely enhances the practice of engineering, it is not innocent either. Deleuze sharply rejects the saliency of methods [10]:

Method is the means of that knowledge which regulates the collaboration of all the faculties. It is therefore the manifestation of a common sense or the realisation of a Cogitatio natura, […] (p.165)

Here, Deleuze does not condemn methods as such. Undeniably, it is helpful to explicate them, to erect a methodology, to symbolize them. Yet, culture should not be subordinated to methods, not even sub-cultures.

The leading technoscience of those days was physics, closely followed by chemistry, if it is at all reasonable to separate the two. It brought the combustion engine (from Carnot to Daimler), electricity (from Faraday to Edison, Westinghouse and Tesla), the control of temperature (Kelvin, Boltzmann), the elevator, and consequently the first high-rise buildings, along with a food industry. In the second half of the 19th century it was fashionable for newspapers to maintain a section showcasing the greatest advances and successes of technoscience of the previous week.

In my opinion it is eminently important to understand the linkage between the abstract ideas, growing from a social practice as their soil-like precursory condition, and the success of a particular kind of science. Independence, control and population on the one side; the molecule and its systematics, the steam and the combustion engine, electricity and the fridge on the other side. It was not just energy (in the form of wood and coal) that could now be distributed: electricity meant an open potential, a potential for almost anything [2]. Together they established a new Form of Life which nowadays could be called “modern,” despite the fact that its borders blur, if we could assume their existence at all. Together, combined into a cultural “brown bag,” these ingredients led to an acceleration, not least due to the mere physical densification, the increase in the sheer size of the population, produced (literally so) by advances in the physical and biomedical sciences.

At this point we should remind ourselves that factual success legitimizes neither the expectation of sustainable success nor reasoning about any kind of universal legitimacy of the whole setup. The first figure would simply represent naivety, the second the naturalistic fallacy, which seduces us to conclude from the actual (“what is”) to the deontic and the normative (“what should be”).

As a practice, the modern condition is itself dependent on a set of beliefs. These can neither be questioned nor discussed at all from within the “modern attitude,” of course. Precisely this circumstance makes it so difficult to talk with modernists about their beliefs. They are not only structurally invisible; something like a belief is almost categorically excluded by their very set of conditioning beliefs. Once accepted, these conditions can’t be accessed anymore; they are transcendental to any further argument put forward within the area claimed by these conditions. For philosophers, this figure of thought, the transcendental condition, takes the role of a basic technique. Other people, like urbanists and architects, might well be much less familiar with it, which could explain their struggles with theory.

What are these beliefs to which a proper modernist adheres? My list would look like the one given below. The list itself is, of course, neither a valuation nor an evaluation.

  • independence, ultimately taken as a metaphysical principle;
  • belief in the primacy of identity over difference, leading to the primacy of objects over relations;
  • linearity, additivity and reduction as the methods of choice;
  • analyticity and “lawfulness” for descriptions of the external world;
  • belief in positively definable universals, hence the rejection of belief as a sustaining mental figure;
  • the belief in the possibility of a finally undeniable justification;
  • belief that the structure of the world follows a bi-valent logic2, represented by the principle of objective causality, hence also a “logification” and “physicalization” of the concept of information as well as of meaning; consequently, meaning is conceived as being attached to objects;
  • the claim of a primacy of ontology and existential claims—as highlighted by the question “What is …?”—over instances of pragmatics that respect Forms of Life—characterized by the question “How to use …?”;
  • logical “flatness” and the denial of the creativity of material arrangements; representationalism;
  • belief in the universal arbitrariness of evolution;
  • belief in a divine creator or some replacement, like the independent existence of ideas (here the circle closes).

It now becomes even clearer that it is not quite reasonable to assign a birth date to modernism. Some of those ideas and beliefs have been around for centuries before their assembly into the 19th-century habit. Thus, modernism is nothing more, yet also nothing less, than a name for the evolutionary history of a particular arrangement of attitudes, beliefs and arguments.

From this perspective it also becomes clear why it is somewhat difficult to separate so-called post-modernism from modernism. Post-modernism takes an as yet undecided position towards the issue of abstract metaphysical independence. Independence and the awareness of relations have not yet amalgamated; both are still, well, independent in post-modernism. It makes a huge, if not to say cosmogonic, difference to set the relation as the primary metaphysical element. Of course, Foucault was completely right in rejecting the label of being a post-modernist. Foucault dropped the central element of modernism—independence—completely, and very early in his career as an author, thinking about the human world as horizontal (actual) and vertical (differential) embeddings. The same is obviously true for Deleuze, or Serres. Less so for Lyotard and Latour, and definitely not for Derrida, who practices a schizo-modernism, undulating between independence and relation. Deleuze and Foucault have never been modern, to paraphrase Latour, and it would be a serious misunderstanding to attach the label of post-modernism to their oeuvre.

As a historical fact we may summarize modernism by two main achievements: first, the professionalization of engineering and its rhizomatically pervasive implementation, and second, the mediatization of society, first through the utilization of mass media, then by means of the world wide web. Another issue is that many people confess to following it as if it were a program, turning it into a movement. And it is here that the difficulties start.

2. Problems with Modernism

We are now going to deal with some of the problems that are necessarily associated with the belief set that is so typical for modernism. In one way or another, any basic belief is burdened by its own specific difficulties. There is no universal or absolute way out of that. Yet, modernism is not just an attitude; by now it has also turned into a large-scale societal experiment. Hence, there are not only some empirical facts; we also meet impacts on the lives of human beings (before any consideration of moral aspects). Actually, Koolhaas provided precisely a description of them in his “Junkspace” [3]. Perhaps modernism is also more prone to the strong polarity of positive and negative outcomes, as its underlying set of beliefs is also particularly strong. But this is, of course, only a quite weak suggestion.

In this section we will investigate four significant aspects. Together they hopefully provide a kind of fingerprint of “typical” modernist thinking—and its failure. These four areas concern patterns and coherence, empiricism, meaning and machines.

Before we start with that I would like to briefly visit the issue raised by the role of objects in modernism. The metaphysics of objects in modernism is closely related to the metaphysical belief in independence as a general principle. If you start to think “independence” you necessarily end up with separated objects. “Things” as negotiated entities barely exist in modernism, and if so, then only as a kind of error-prone, preliminary social approximation to the physical setup. It is otherwise not possible to balance objects and relations as concepts. One of them must take the primary role.

Setting objects as primary against the relation has a range of problematic consequences. In my opinion, these consequences are inevitable. It is important to see that neither the underlying beliefs nor their consequences can be separated from each other. For a modernist, it is impossible to drop one of these and to keep the others without stepping into the tomb of internal inconsistency!

The idea of independence, whether in its implicit or its explicit version, can be traced back at least to Scholasticism, probably even to classical antiquity, where it appeared as Platonic idealism (albeit this would be an oversimplification). To its full extent it unfolded through the first golden age of the dogma of the machine in the early 17th century, e.g. in the work of Harvey or the philosophy of Descartes. Leibniz recognized its difficulties. For him perception is an activity. If objects were conceived as purely passive, they would not be able to perceive, and hence not to build any relation at all. Thus, the world can’t be made of objects, since there is a world external to the human mind. He remained, however, caught in theism, which brought him to the concept of monads as well as to the concept of infinitesimal numbers. The concept of the monads should not be underestimated, though. Ultimately, they serve the purpose of immaterial elements that bear the ability to perceive and to transfer it to actual bodies, whether equipped with a mind or not.

The following centuries brought just a tremendous technical refinement of Cartesian philosophy, although there have been phases in which people resisted its ideas, as for instance many did in the Baroque.

Setting objects as primary against the relation is at the core of phenomenology as well, and also, though in a more abstract version, of idealism. Husserl came up with the idea of the “phenomenon” that impresses us directly, or intuitively, without any interpretation. Similarly, the Kantian “Erhabenheit” (sublimity), then heightened by Romanticism, is out there as an independent instance, before any reason or perception may start to work.

So, what is the significance of setting objects as primary constituents of the world? Where do we have to expect which effects?

2.1. Dust, Coherence, Patterns

When interpreted as a natural principle, or as a principle of nature, the idea of independence provokes and supports the physical sciences. Independence matches perfectly with physics, yet it is also an almost perfect mismatch for the biological sciences, insofar as they are not reducible to physics. The same is true for the social sciences. Far from being able to recognize their own conditionability, most sociologists just practice methods taken more or less directly from physics. Just recall their strange addiction to statistics, which is nothing other than a methodology of independence. Instead of asking for the abstract and factual genealogy of the difference between independence and coherence, between the molecule and harmony, they dropped any primacy of the relation, even its mere possibility.

The effects in architecture are well known. On the one hand, modernism led to an industrialization, which is reaching its final heights in the parametricism of Schumacher and Hadid, among others. Yet, by no means is there any necessity that industrialization leads to parametricism! On the other hand, if in the realm of concepts there is no such thing as a primacy of relation, only dust, then there is also no form, only function, or at least a maximized reduction of any form, as first presented by Mies van der Rohe. The modularity in this ideology of the absence of form is not that of living organisms; it is that of crystals. Not only does the Seagram Building look exactly like the structural model of sodium chloride. Of course, it represents a certain radicality. Note that it doesn’t matter whether the elementary cells of the crystal follow straight lines, or whether there is some curvature in their arrangements. Strangely enough, for a modernist there is never a particular intention in producing such stuff. Intentions are not needed at all if the objects bear the meaning. The modernist’s expectation is that everything the human mind can accomplish under such conditions is just uncovering the truth. Crystals just happen to be there, whether in modernist architecture or in the physico-chemistry of minerals.

Strictly speaking, it is deeply non-modern, perhaps ex-modern, to investigate the question of why even modernists find structures or processes like the following mysteriously (not: mystically!) beautiful, or at least interesting. Well, I do not know, of course, whether they indeed felt like that, or whether they just pretended to do so. At least they said so… Here are the artefacts3:

Figure 1: (a, left) Michael Hansmeyer, column [4]; (b, right) Turing/McCabe pattern (for details see this).

These structures are neither natural nor geometrical. Their common structural trait is the local instantiation of a mechanism, that is, a strong dependence on the temporal and spatial local context: subdivision in case (a), and a probabilistically instantiated set of “chemical” reactions in case (b). For the modernist mindset they are simply annoying. They are there, but there is no analytical tool available to describe them as an “object” or to describe their genesis. Indeed, neither example shows “objects” with perceivable properties that would be well-defined for the entity as a whole. Rather, they represent a particular temporal cut in the history of a process. Without considering their history—which includes the contingent unfolding of their deep structure—they remain completely incomprehensible, despite the fact that on the microscopic level they are well-defined, even deterministic.
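To make the notion of a purely local mechanism concrete, here is a minimal sketch in Python of a classic two-component reaction-diffusion system (Gray-Scott). This is an illustrative assumption on my part—it is neither Hansmeyer’s subdivision procedure nor the McCabe variant, which works with multi-scale averaging—but it shows the same point: every update rule only looks at a cell and its immediate neighbours, while the spots and stripes that emerge exist only on the level of the collective and of the process history.

```python
import numpy as np

def laplacian(z):
    # Discrete Laplacian with periodic boundaries: a cell only "sees" its four neighbours.
    return (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
            np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)

def gray_scott(n=128, steps=5000, Du=0.16, Dv=0.08, f=0.035, k=0.065):
    """Iterate strictly local 'chemical' update rules for two fields U and V."""
    U = np.ones((n, n))
    V = np.zeros((n, n))
    # A small perturbed seed region; the eventual pattern depends on this contingent history.
    r = slice(n // 2 - 5, n // 2 + 5)
    U[r, r], V[r, r] = 0.50, 0.25
    V += 0.01 * np.random.rand(n, n)
    for _ in range(steps):
        uvv = U * V * V
        U += Du * laplacian(U) - uvv + f * (1 - U)
        V += Dv * laplacian(V) + uvv - (f + k) * V
    return V  # The resulting spots/stripes are nowhere stated in the rules above.

pattern = gray_scott()
print(pattern.shape, float(pattern.min()), float(pattern.max()))
```

Nothing in these few lines names or describes the resulting pattern; it can only be obtained by running the history, which is exactly the point made above about mechanisms versus objects.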

From the perspective of primary objects they are separated from comprehensibility by the chasm of idealism, or should we say hyper-idealistic conditioning? Yet, for both there exists a set of precise mathematical rules. The difference to machines is just that these rules describe mechanisms, but nothing like the shape, nothing on the level of the entirety. The effect of these mechanisms on the level of the collective, however, can’t be described by the rules for the mechanism. They can’t be described at all by any kind of analytical approach, as it is possible for instance in many areas of physics and, consequently, in engineering, which so far is by definition always engaged in building and maintaining fully determinate machines. This notion of the mechanism, including the fact that only the concept of mechanism allows for a thinking that is capable of comprehending emergence and complexity—and, philosophically, potential—is maybe one of the strongest differences between modernist thinking and “organicist” thinking (which has absolutely nothing to do with bubble architecture), as we may call it preliminarily.

Here it is probably appropriate to cite the largely undervalued work of Charles Jencks, who was one of the first in the domain of architecture/urbanism to propose the turn to complexity. Yet, since he did not have a well-explicated formulation (based on an appropriate elementarization) at his disposal, we have been able neither to bring his theory “down to earth” nor to connect it to more abstract concepts. People like Jencks, Venturi, “parts of” Koolhaas (and me:)—or Deleuze or Foucault in philosophy—have never been modernists. Except for the historical fact that they live(d) in a period that followed the blossoming of modernism, there is no other justification to call them or their thinking “post-modern”. It is not the use of clear arguments that they reject, it is the underlying set of beliefs.

In modernism, that is, in the practice of the belief set as shown above, collective effects are excluded apriori, metaphysically as well as methodologically, as we will see. Statistics is by definition not able to detect “patterns”. It is an analytic technique, of which people believe that its application excludes any construction. This is of course a misbelief: the constructive steps are just shifted into the side-conditions of the formulas, resulting in a deep methodological subjectivity concerning the choice of a particular technique, or formula, respectively.

This affects the perspective onto society as well as individual perception and thought. Slightly metaphorically spoken, everything is believed to be (conceptual) dust, and to remain dust. The belief in independence, fired perhaps by a latent skepticism since Descartes, has invaded the methods and the practices. At most, so the belief, one could find different kinds of dust, or different sizes of the hives of dust, governed by a time-inert, universal law. In turn, wherever laws are imposed on “nature”, the subject matter turns into conceptual dust.

Something like a Language Game, even in combination with transcendental conditionability, must be almost incomprehensible for a modernist. I think they do not even see its possibility. While analytic philosophy is largely the philosophy that developed within modernism (one might say that it is thus not philosophy at all), the philosophical stances of Wittgenstein, Heidegger or Deleuze are outside of it. The instances of misunderstanding Wittgenstein as a positivist are countless! Closely related to the neglect of collective effects is the dismissal of the inherent value of the comparative approach. Again, that’s not an accusation. It’s just the description of an effect that emerges as soon as the above belief set turns into a practice.

The problem with modernism is indeed tricky. On the one hand it made engineering blossom. Engineering, as it has been conceived since then, is a strictly modernist endeavor. With regard to the physical aspects of the world it works quite well, of course. In any other area it is doomed to fail, for the very same reasons, unfortunately. Engineering of informational aspects is thus as impossible as the engineering of architecture or the engineering of machine-based episteme, not to mention the attempt to enable machines to deal with language. Or to deal with the challenges emerging in urban culture. Just to avoid misunderstandings: engineering is helpful for finding technical realizations for putative solutions, but it never can deliver any kind of solution itself, except for the effect that people assimilate and re-shape the products of urban engineering through their usage, turning them into something different than intended.

2.2. Meaning

The most problematic effects of the idea of “primary objects” are probably the following:

  • – the rejection of creational power of unconscious or even purely material entities;
  • – the idea that meaning can be attached to objects;
  • – the idea that objects can be represented and must be represented by ideas.

These strong consequences do not concern just epistemological issues. In modernism, “objectivity” has nothing to do with the realm of the social. It can be justified universally and on purely formal grounds. We already mentioned that this may work in large parts of physics—it is challenged in quantum physics—but certainly not in most biological or social domains.

In his investigation of thought, Deleuze identifies representationalism ([10], p.167) as one of eight major presuppositions of large parts of philosophy, especially of idealism in the line from Plato, Hegel, and Frege up to Carnap:

(1) the postulate of the principle, or the Cogitatio natura universalis (good will of the thinker and good nature of thought); (2) the postulate of the ideal, or common sense (common sense as the concordia facultatum and good sense as the distribution which guarantees this concord); (3) the postulate of the model, or of recognition (recognition inviting all the faculties to exercise themselves upon an object supposedly the same, and the consequent possibility of error in the distribution when one faculty confuses one of its objects with a different object of another faculty); (4) the postulate of the element, or of representation (when difference is subordinated to the complementary dimensions of the Same and the Similar, the Analogous and the Opposed); (5) the postulate of the negative, or of error (in which error expresses everything which can go wrong in thought, but only as the product of external mechanisms); (6) the postulate of logical function, or the proposition (designation is taken to be the locus of truth, sense being no more than the neutralised double or the infinite doubling of the proposition); (7) the postulate of modality, or solutions (problems being materially traced from propositions or, indeed, formally defined by the possibility of their being solved); (8) the postulate of the end, or result, the postulate of knowledge (the subordination of learning to knowledge, and of culture to method). Together they form the dogmatic image of thought.

Deleuze by no means attacks the utility of these elements in principle. His point is just that these elements work together and should not be taken as primary principles. The effect of these presuppositions is disastrous:

They crush thought under an image which is that of the Same and the Similar in representation, but profoundly betrays what it means to think and alienates the two powers of difference and repetition, of philosophical commencement and recommencement. The thought which is born in thought, the act of thinking which is neither given by innateness nor presupposed by reminiscence but engendered in its genitality, is a thought without image.

As an engineer, you will probably have noticed issue (5). Elsewhere in this essay we already dealt with the fundamental misconception of starting from an expected norm instead of from an open scale without imposed values. Only the latter attitude allows for inherent adaptivity. Adaptive systems will never fail, because failure is conceptually impossible for them. Instead, they will cease to exist.

The rejection of the negative, which includes the rejection of the opposite as well as of dialectics, the norm, or the exception, is particularly important if we think about foundations of whatever kind (think of Hegel, Marx, attac, etc.) or about political implications. We already discussed the case of Agamben.

Deleuze finally arrives at this “new imageless image of thought” by understanding difference as a transcendental category. The great advantage of this move is that it does not imply a necessity of symbols and operators as primary, as would be the case if we took identity as primary. The primary identical is either empty (a=a), that is, without any significance for the relation between entities, or it needs symbolification and at least one operator. In practice, however, a whole battery of models, classifications and the assumptions underlying them is required to support the claim of identity. As these assumptions are not justifiable within the claim of identity itself, they must be set, which results in the attempt to define the world. Obviously, attempting this would be quite problematic. It is even self-contradictory if contrasted with the modernist’s claim of objectivity. Setting difference as primary, Deleuze not only avoids the trap of identity and pre-established harmony in the hive of objects, but also subordinates the object to the relation. Here he meets with Wittgenstein and Heidegger.

Together, the presupposition of identity and objecthood is necessarily, and in a bidirectional manner, accompanied by another quite abundant misunderstanding, according to which logic should be directly applicable to the world. World here is of course “everything” except logic, that is, (claimed) objects, their relations, measurement, ideas, concepts and so on. Analytic philosophy, positivism, external realism and the larger movement of modernism all apply the concept of bi-valent logic to empirical entities. It is not really a surprise that this leads to serious problems and paradoxes, which however are pseudo-paradoxes. For instance, universal justification requires knowledge. Without logical truth within knowledge, universal justification can’t be achieved. The attempt to define knowledge as consisting of positive content failed, though. Next, the formula of “knowledge as justified belief” was proposed. In order not to fall prey to the Gettier problem, belief itself would have to be objectified. Precisely this happened in analytic philosophy, when Alchourron et al. (1985) published their dramatically (and overly) reduced operationalization of “belief”. Logic is a condition, it is transcendental to its usage. Hence, it is inevitable to instantiate it. By means of instantiation, however, semantics invades just as inevitably.

Ultimately, due to the presupposed primacy of identity, modernists are faced with a particular difficulty in dealing with relations. Objects and their role should not be dependent on their interpretation. As a necessary consequence, meaning—and information—must be attached to objects as quasi-physical properties. There is but one single consequence: tyranny. Again, it is not surprising that at the heights of modernism bureaucratic tyranny was established several times.

Some modernists would probably allow for interpretation. Yet only as a means, not as a condition, not as a primacy. Concerning their implications, the difference between these stances is a huge one. If you take interpretation simply as a means, keeping the belief in the primacy of objects, you still adhere to the idea of “absolute truth” within the physical world. Ultimately, interpretation would be degraded into an error-prone “method”, which ideally should have no influence on the recognition of truth, of course. The world, at least the world that goes beyond merely physical aspects, appears as a completely different one if relations, and thus interpretation, are set as primary. Obviously, this also implies a categorical difference regarding the way one approaches that world, e.g. in science, or the way one conceives of the possible role of design. It is nothing else than a myth that a designer, architect, or urbanist designs objects. The practitioners in these professions design potentials, namely potentials for the construction of meaning by the future users and inhabitants (cf. [5]). There is nothing a designer can do to prevent a particular interpretation or usage. Koolhaas concludes that regarding Junkspace this may lead to a trap, or a kind of betrayal [3]:

Narrative reflexes that have enabled us from the beginning of time to connect dots, fill in blanks, are now turned against us: we cannot stop noticing—no sequence is too absurd, trivial, meaningless, insulting… Through our ancient evolutionary equipment, our irrepressible attention span, we helplessly register, provide insight, squeeze meaning, read intention; we cannot stop making sense out of the utterly senseless… (p.188)

I think that on the one hand Koolhaas here accepts the role of interpretation, yet, somewhat contradictorily, he is not able to recognize that it is precisely the primacy of interpretation that enables a transformation through assimilation, hence the way out of Junkspace. Here he remains modernist to the full extent.

The deep reason being that for the object-based attitude there is no possibility at all to recognize non-representational coherence. (Thus, a certain type of illiteracy regarding complex texts is prevalent among “true” modernists…)

2.3. Shades of Empiricism

Science, as we understand it today—yet at least partially also as we practice it—is based on the so-called hypothetico-deductive approach of empiricism (cf. [6]). Science is still taken as a synonym for physics by many, even in the philosophy of science, with only very few exceptions. There, the practice and the theory of the life sciences are not only severely underrepresented; quite frequently biology is still reduced to physics. Physicists, and their philosophical co-workers, often claim that the whole world can be reduced to a description in terms of quantum mechanics (among many others cf. [7]). A closely related reduction, only slightly less problematic, is given by the materialist’s claim that mental phenomena should be explained completely in biological terms, that is, using only biological concepts.

The belief in empiricism is implemented in the methodological framework that is called “statistics”. The vast majority of statistical tests rest on the assumption that observations and variables are independent of each other. Some tests are devised to test for independence, or dependence, but this alone does not help much. Usually, if dependency is detected, the subsequent tests are rearranged so as to fit the independence assumption again. In other words, any actual coherence is first assumed to be nonexistent. By means of the method itself, the coherence is indeed destroyed. Yet, once it is destroyed, you will never get it back. It is quite simple: the criteria for any such construction are just missing.
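
A tiny numerical sketch may illustrate the point; the data and the measure of coherence (lag-1 autocorrelation) are merely illustrative assumptions. Enforcing independence by permuting a strongly coherent series leaves every marginal summary untouched, while the coherence itself is irrecoverably gone.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.cumsum(rng.normal(size=1000))    # a strongly autocorrelated ("coherent") series
x_shuffled = rng.permutation(x)         # "independence" enforced by permutation

def lag1_autocorr(z):
    z = z - z.mean()
    return float(np.dot(z[:-1], z[1:]) / np.dot(z, z))

# Marginal summaries are identical ...
print(np.isclose(x.mean(), x_shuffled.mean()), np.isclose(x.var(), x_shuffled.var()))
# ... but the coherence has been destroyed, and nothing in the shuffled data
# would allow us to reconstruct it.
print(round(lag1_autocorr(x), 3), round(lag1_autocorr(x_shuffled), 3))
```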

From this perspective, statistics is not scientific according to science’s own measures; due to its declared non-critical and non-experimental stance it actually looks more like ideology. For a scientific method would perform an experiment to test whether something could be assumed or not. As Nobel laureate Konrad Lorenz said: I never needed statistics to do my work. What would be needed instead is a method that is structurally independent of any independence assumption regarding the observed data. Such a method would propose patterns if there are sufficiently dense hints, and none otherwise, without proposing one or the other apriori. From that perspective, it is more the representationalism in modernism that brings the problem.

This framework of statistics is far from being homogeneous, though. Several “interpretations” are fiercely discussed: frequentism, bayesianism, uncertainty, or propensity. Yet, each of them faces serious internal inconsistencies, as Alan Hájek convincingly demonstrated [8]. To make a long story short (the long version you can find over here), it is not possible to build a model without symbols, without concepts that require interpretation and further models, and outside a social practice, or without an embedding into such. Modernists usually reject such basics and eagerly claim even universal objectivity for their data (hives of dust). More than 50 years ago, Quine proved that believing otherwise should be taken as nothing else than a dogma [9]. This dogma can be conceived as a consequence of the belief that objects are the primary constituents of the world.

Of course, the social embedding is especially important in the case of social affairs such as urbanism. The claim that any measurement of data, then treated by statistical modeling (wrongly called “analysis”), could convey any insight per se is nothing but pretentious.

Dealing with data always results in some kind of construction, based on some methods. Methods, however, respond differentially to data; they filter. In other words, even applying “analytical” methods involves interpretation, often even a strong one. Unfortunately for the modernist, he has excluded the possibility of the primacy of interpretation altogether, because there are only objects out there. This hurdle is quickly overcome, of course, by the belief that meaning lies outside of interpretation. As a result, they believe that there is a necessary progress towards truth. For modernists: here you may jump back to subsection 3.2. …

2.4. Machines

For Le Corbusier a house is much like a “machine for living in”. According to him, a building has clear functions that could be ascribed apriori, governed by universal relations, or even laws. Recently, people engaged in the building economy recognized that it may turn problematic to assign a function apriori, as it simply limits the sales arguments. As a result, any function tends to be stripped away from the building as well as from the architecture itself. The “solution” becomes a more general one. Yet, in contrast to an algebraic equation that is instantiated before being used, the building actually exists after building it. It is there. And up to today, not in a reconfigurable form.

Actually, the problem is not created by the tendency towards more general, or even pre-specific solutions. It turns critical only if this generality amalgamates with the modernist attitude. The category of machines, which is synonymous with ascribing or assigning a function (understood as usage) apriori, doesn’t accept any reference to luxury. A machine that contained properties or elements that don’t bear any function, at least temporarily, other than pleasure (which does not exist in a world that consists only of objects) would be badly built. Minimalism is not just a duty, it even belongs to the grammar of modernism. Minimalism is the actualization and representation of mathematical rigidity, which is also a necessity, as it is the only way to use signs without interpretation. At least, that is the belief of modernists.

The problem with minimalism is that it effectively excludes evolution. Either the produce fits perfectly or not at all. Perfectness of the match can only be expected if the user behaves exactly as expected, which represents nothing else than dogmatism, if not worse. Minimalism in form excludes alternative interpretations and usages, deliberately so; it even has to exclude the possibility of the alternative. How else to get rid of alternatives? Koolhaas rightly got it: by nothingness (minimalism), or by chaos.

3. Urbanism, and Koolhaas.

First, we have of course to make clear that we will be able to provide only a glimpse of the field invoked by this header. Further, our attempts here should not be understood as a proposal to separate architecture from urbanism. In both theory and implementation they overlap more and more. When Koolhaas explains the special situation of the Casa da Música in Porto, he refers to processes like the continuation of certain properties and impressions from the surroundings into the inside of the building. Inversely, any building, even any persistent object in a city, shifts the qualities of its urban surround.

Rem Koolhaas, once a journalist, then architect, and now for more than a decade additionally someone doing comparative studies on cities, has performatively demonstrated—by means of his writings such as “S,M,L,XL”, “Generic City” or “Junkspace”—that a serious engagement with the city can’t be practiced as a disciplinary endeavor. Human culture has moved irrevocably into a phase where culture largely means urban culture. Urbanists may be seen as a vanishing species that became impossible due to the generality of the field. “Culturalist” is neither a proper domain nor a suitable label. Or perhaps they moult into organizers of research in urban contexts, similarly as architects are largely organizers for creating buildings. Yet, there is an important difference: architects may still believe that they externalize something. Such a belief is impossible for urbanists, because they are part of the culture. It is thus questionable whether a project like the “Future Cities Laboratory” should indeed be called such. It is perhaps only possible to do so in Singapore, but that’s the subject of one of the next essays.

Rem Koolhaas wrote “Delirious New York” before turning to architecture and urbanism as a practitioner. There, he praised the diversity and manifoldness that, in or by means of his dreams, added up to the deliriousness of Manhattan, and probably his own as well.

Without any doubt, the particular quality of Manhattan is its empowering density, which is not actualizing as the identical, but rather as heterotopia, as divergence. In some way, Manhattan may be conceived as the urban precursor of the internet [11], built first in steel, glass and concrete. Vera Bühlmann writes:

Manhattan Space is, if not already everywhere, then at least potentially everywhere in the internet, and moreover no longer limited to three, perhaps even spatial, dimensions.4

Urbanism is in urgent need of an advanced theory that refers to the power of networks. It was perhaps this “network process” that brought Koolhaas to explore the anti-thesis of the wall and the plane, the absolute horizontal and vertical separation. I say anti-thesis because Delirious New York itself behaves quite ambiguously, half-way between Hegelian, (post-)structuralist dialectics and utopia on the one side, and on the other an affirmation of heterotopias as a more advanced level of conceptualizing alienating processes, which are always also processes of selection and individuation in both directions, towards the medium and towards the “individual”. Earlier scholars like Aldo Rossi came too early to go in that direction, as networks were not yet recognizable as part of the Form of Life. Even Shane refers only implicitly to their associative power (nor does he refer to complexity). And Koolhaas was not, and probably still is not, aware of this problematics.

Recently, I proposed one possible approach to building such a theory, along with the according concepts, terms and practices (for more details see [12]). It is rather important to distinguish two very basic forms of networks: logistic and associative networks. Logistic networks are used everywhere in modernist reasoning about cities and culture. Yet, they exclusively refer to the network as a machine, suitable for optimizing the transport of anything. Associative networks are completely different. They do not transfer anything; they swallow, assimilate, rearrange, associate and, above all, they learn. Any associative network can learn anything. The challenge, particularly for modernist attitudes, is that it can’t be controlled what exactly an associative network is going to learn. The interesting thing about it is that the concept of associative networks provides a bridge to the area of advanced “machine”-learning and to the Actor-Network Theory (ANT) of Bruno Latour. The main contribution of ANT is its emphasis on agency, even the agency of those mostly mineral material arrangements that are usually believed to have no mental capacity.
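
For readers who prefer a concrete toy: the following is a minimal sketch of an associative structure in the sense used here, a self-organizing map of the kind underlying [12]. It is not the model from [12] itself; grid size, learning rate and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

grid_w, grid_h, dim = 10, 10, 3              # a 10x10 map of 3-dimensional nodes
nodes = rng.random((grid_w * grid_h, dim))   # the network's internal "relations"
coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)

def train(data, epochs=20, lr0=0.5, radius0=5.0):
    # The map does not transport its inputs anywhere; it assimilates them,
    # rearranging its nodes so that neighbouring nodes respond to similar data.
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)
        radius = radius0 * np.exp(-t / epochs)
        for x in rng.permutation(data):
            best = np.argmin(((nodes - x) ** 2).sum(axis=1))    # best-matching node
            d2 = ((coords - coords[best]) ** 2).sum(axis=1)     # distance on the map
            h = np.exp(-d2 / (2 * radius ** 2))[:, None]        # neighbourhood kernel
            nodes[:] += lr * h * (x - nodes)                    # local assimilation

data = rng.random((500, dim))   # whatever is fed in, the map will learn it
train(data)
```

What the map ends up learning depends entirely on the stream of observations, not on a plan imposed from outside, which is precisely the point made above.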

It is clear that an associative network cannot be perceived at all under the strictly practiced presupposition of independence that is typical for modernism. Upon its implementation, the belief set of modernism tends to destroy the associativity, hence also the almost inevitable associations between the more or less mentally equipped actors in urban environments.

When applied to cities, it breaks up relations, deliberately. Any interaction of high-rise buildings, so typical for Manhattan, is precluded intentionally. Any transfer is optimized along one single parameter: time, and secondarily space as a resource. Note that optimization always requires the apriori definition of a single objective function. As soon as one allowed for multiple goals, one would be faced with the necessity of weighting and of assigning subjective expectations, which are subjective precisely due to the necessity of interpretation. In order to exclude even the possibility of this, modernists hastily agree to optimize time, once understood as a transcendental condition, now treated as a resource under the assignment of scarcity and physicality.
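
The point about weighting can be made in a few lines; the options and the two goals below are purely hypothetical numbers, chosen only to show that the “optimum” is an artefact of the subjectively chosen weights.

```python
import numpy as np

# Three hypothetical design options, scored on two goals:
# [time saved, social interaction afforded]
options = np.array([
    [0.9, 0.2],   # A: fast, little interaction
    [0.5, 0.6],   # B: balanced
    [0.2, 0.9],   # C: slow, rich interaction
])

def best(weights):
    # Scalarizing several goals into one objective makes the ranking depend
    # entirely on the (interpretive) choice of weights.
    return "ABC"[int(np.argmax(options @ np.asarray(weights)))]

print(best([1.0, 0.0]))   # optimizing time alone  -> 'A'
print(best([0.3, 0.7]))   # weighting interaction  -> 'C'
```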

As Aldo Rossi remarked already in the 1960s [13], the modernist tries to evacuate any presence of time from the city. It is not just that history is cut off and buried, largely under false premises and wrong conclusions, reducing history to institutional traditions (remember, there is no interpretation for a modernist!). In some way, it would even have been easy to predict Koolhaas’ Junkspace already at the end of the 19th century. Well, the Futurists did it, semi-paradoxically, though. Quite consistently, Futurism was only a short phase within modernism. This neglect of time in modernism is by no means a “value” or an intention. It is a direct logical consequence of the presupposed belief set, particularly independence, logification and the implied neglect of context.

Dis-assembling the associative networks of a city results inevitably in the modernist urban conceptual dust, ruled by the paradigm of scarce time and the blindness against interpretation, patterns and non-representational coherence. This is, in a nutshell, what I would like to propose as the deep grammar of Junkspace, as it has been described by Koolhaas. Modernism did nothing else than build and actualize its conceptual dust. We may call it tertiary chaos, which has been—in its primary form—equal to the initial state of indiscernibility concerning the cosmos as a whole. Yet, this time it has been dictated by modernists. Tertiary chaos thus can be set equal to the attempt to make any condition for the possibility of discernibility vanish.

Modernists may not be aware that there is not only already a theory of discernibility, which equals the Peircean theory of the sign, but also an adaptation and application of it to urbanism and architecture. Urbanists probably know the name “Venturi”, but I seriously doubt that semiotics is on their radar. If modernists talk about semiotics at all, they usually refer to the structuralist caricature of it, as it has been put forward by de Saussure, establishing a closed version of the sign as a “triangle”. Peircean signs—and these have been used by Venturi—establish an interpretive situation. They do not refer to objects, but just to other signs. Their reference to the world is provided through instances of abstract models and a process of symbolification, which includes learning as an ability that precedes knowledge. (More detail here in this earlier essay.) Unfortunately, Venturi’s concepts have scarcely been updated, except perhaps in the context of media facades [14]. Yet, media facades are mostly, and often vastly, misunderstood as the possibility to display adverts. There are good arguments supporting the view that there is more to them [15].

Modernists, including Koolhaas, employ a strange image of evolution. For him (them), evolution is pure arbitrariness, both regarding the observable entities and processes and regarding the future development. He claims to detect “zero loyalty—and zero tolerance—toward configuration” ([3] p.182). In the same passage he simultaneously, and contradictorily, misses the “‘original’ condition” and blames history for its corruptive influence: “History corrupts, absolute history corrupts absolutely.” All of that is put into the context of a supposedly “permanent evolution” (his quotation marks). Most remarkably, even biologists like S.J. Gould, claiming to be evolutionary biologists, hold that evolution is absolutely arbitrary. Well, the only way out of the contrasting fact that there is life in the form we know it is then to assume some active divine involvement. Precisely this was the stance of Gould. People like Gould (and perhaps Koolhaas) commit the representationalist fault, which excludes them from recognizing (i) the structural tendency of any evolution towards more general solutions, and (ii) that there is an evolution of evolvability. The modernist attitude towards evolution can again be traced back to the belief in the metaphysical independence of objects, but our interest here is different.

Understanding evolution as a concept has only little to do with biology and the biological model that is called “natural evolution”. Natural evolution is just an instance of evolution in physico-chemical and then biological matter. Bergson was the first to address evolution as a concept [16], notably in the context of abstract memory. In a previous essay we formalized that approach and related it to biology and machine-learning. At its basis, it requires a strictly non-representational approach. Species and organisms are expressed in terms of probability. Our conclusion was that in a physical world evolution inevitably takes place if there are at least two different kinds or scales of memory. Only on that abstract level can we adopt the concept of evolution into urbanism, that is, into any cultural context.

Memory can’t be equated with tradition, institutions or even the concrete left-overs of history, of course. These are just instances of memory. It is of utmost importance here not to contaminate the concept of memory again with representationalism. This memory is constructive. Memory that is not constructive is not memory, but a stock, a warehouse (although these are also kinds of storage and contribute as such to memory). Memory is inherently active and associative. Such memory is the basic, non-representative element of a generally applicable evolutionary theory.

Memory cannot be “deposited” into almost geological layers of sediments, quite in contrast to the suggestions of Eisenman, whom Rajchman follows closely in his “Constructions”.

The claim of “storable memory” is even more disastrous than the claim that information could be stored. These are not objects and items that are independent of an interpretation; they are processes of constructive, or guided, interpretation. Both “storages” would only become equal to the respective immaterial processes under the condition of a strictly deterministic set of commands. Even the concept of the “rule” is already too open to serve the modernist claim of storable memory.

It is immediately clear that this dynamic concept of memory is highly relevant for any theory about urban conditions. It provides a general language from which to derive particular models and instances of association, stocks and flows, that are not reducible to storage or transfers. We may even expect that whenever we meet some kind of material storage in an urban context, we should also expect association. The only condition for that being that there are no modernists around… Yet, storage without memory, that is, without activity, remains dead, much like, but even less than, a crystal. Cripples in the sand. The real relevance of stocks and flows becomes visible only in the realm of the non-representational, the non-material, if we conceive of it as waves in abstract density, that is, as media, conveying the potential for activity as a differential. Physicalists and modernists like Christiaanse or Hillier will never understand that. Just think of the naïve empirics they are performing around the world, calling it cartography.

This includes deconstructivism as well. Derrida’s deconstructivism can be read as a defensive war against the symbolification of the new, the emerging, the complex, the paradox of sense. His main weapon is the “trace”, of which he explicitly states that it could not be interpreted at all. Thus Derrida, as a master of logical flatness and modernist dust, is the real enemy of progress. Peter Sloterdijk, the prominent contemporary German “philosopher”5, once called Derrida the “Old Egyptian”. Nothing would fit Derrida better, who lives in the realm of shadows and for whom life is just a short transitory phase, hopefully “survived” without too many injuries. The only metaphor possible on that basis is titanic geology. Think of some of Eisenman’s or Libeskind’s works.

Figure 2: Geologic-titanic shifts induced by the logical flatness of deconstructivism

a: Peter Eisenman, Aronoff Center for Design and Art in Cincinnati (Ohio) (taken from [11]); the parts of the building are treated as blocks, whose dislocation is reminiscent of geological sediments (or the work of titans).

b: Daniel Libeskind, Victoria and Albert Museum Boilerhouse Extension. Secondary chaos, inducing Junkspace through its isolationist “originality”, conveying “defunct myths” (Koolhaas in [3], p.189).

Here we finish our exploration of generic aspects of the structure of modernist thinking. Hopefully, the sections so far are sufficiently suited to provide some insights about modernism in general, and the struggles Koolhaas is fighting with in “Junkspace”.

4. Redesigning Urbanism

Redesigning urbanism, that is, unlocking it from modernist phantasms, is probably much simpler than it may look at first sight. Well, not exactly simple, at least for modernists. Everything is about the presuppositions. Dropping the metaphysical belief in independence without getting trapped by esotericism or mysticism might well be the cure. Of course, metaphysical independence needs to be removed from any level and any aspect of urbanism, starting from the necessary empirical work, which of course is already an important part of the construction work. We already mentioned that the notion of “empirical analysis” pretends neutrality, objectivity (as independence from the author) and validity. Yet, this is pure illusion. Independence should also be abandoned in its form of searching for originality or uniqueness, of trying to set an unconditional mark in the cityscape. By that we don’t refer to morphing software, of course.

The antidote against isolationism, analyticity and logic is already well-known. To provide coherence you have to defy splintering and abjure the belief in (conceptual) dust. The candidate tool for it is story-telling, albeit in a non-representational manner, respecting difference and heterotopias from the beginning. In turn this also means to abandon utopias and a-topias, and to embrace complexity and a deep concept of prevailing differentiation (in a subsequent essay we will deal with that). As citizens, we are not interested in non-places and deserts of spasmodic uniqueness (anymore), or in the mere “solution of problems” either (see Deleuze about the dogmatic image of thought as cited above). Changing the perspective from the primacy of analysis to the primacy of story-telling immediately reveals the full complexity of the respective Form of Life, to which we refer here as a respectful philosophical concept.

It is probably pretentious to speak of urbanism as a totality in this way. There are, of course, and always have been, people who engaged with the urban condition based on a completely different set of beliefs, righteously non-modern. Those people start with the pattern and never tear it apart. Those people are able to distinguish structure, genesis and appearance. In biology, this distinction has been instantiated into the perspectives of the genotype, the phenotype, and, in bio-slang, evo-devo, the compound made from development, growth and evolution. These are tied together (necessarily) by complexity. In philosophy, the respective concepts are immanence, the differential, and the virtual.

For urbanism, take for instance the work of David Shane (“Recombinant Urbanism“). Shane’s work, which draws much on Kelly’s, is a (very) good starting point not only for any further theoretical work, but also for practical work.

As a practitioner, one has to defy the seduction of the totality of a master plan, as the renowned parametricists actualize it in Istanbul, or as Christiaanse and his office did recently at Zürich main station. Both produce pure awfulness, castles of functional uniformity, because they express the totality of the approach even visually. Even in Singapore’s URA (Urban Redevelopment Authority), the master plan has been relativised in favor of a (slightly) more open conceptualization. Designers have to learn not that less is more, but rather that partial nothingness is more. Deliberate non-planning, as Koolhaas has repeatedly emphasized. This should not be taken representationally, of course. It does not make any sense to grow “raw nature”, jungles within the city, neither for the city nor for the “jungle”. Before a crystal can provide soil for real life, it must decay, precisely because it is a closed system (see figure 3 below). Adaptive systems replace parts, melt holes to build structures, without decaying at all. We will return to this aspect of differentiation in a later article.

Figure 3: Pruitt-Igoe (St. Louis), being blasted in 1972. Charles Jencks called this event “one of the deaths of modernism”. This was not the only tear-down there. Laclede, a neighborhood nearby Pruitt-Igoe, made from small, single-flat houses, failed as well, the main reasons being an unfortunate structure of the financial model and political issues, namely the separation of “classes” and apartheid. (See this article.)

The main question for finding a practicable process therefore is: which questions should we address in order to build an analytics, under the umbrella of story-telling, that avoids the shortfalls of modernism?

We might again take a look at biology (as a science). Like urbanism, biology is also confronted with a totality. We call it life. How to address reasonable, that is, fruitful questions to that totality? Biology already found a set of answers, which nevertheless are not respected by the modernist version of this science, mainly expressed as genetics. The first insight was that “nothing in biology makes sense except in the light of evolution” [17]. Which would be the respective question for urbanism? I can’t give an answer here, but it is certainly not independence. This we can know through the lesson told by “Junkspace”. Another, almost ridiculous anti-candidate is sustainability, as far as it is conceived in terms of scarcity of mainly physical resources instead of social complexity. Perhaps we should remember the history of the city beyond its “functionality”. Yet, that would mean to first develop an understanding of (abstract) evolution, to instantiate that, and then to derive a practicable model for urban societies. What does it mean to be social, what does it mean to think, both taken as practices in a context of freedom? Biology then developed a small set of basic contexts along which any research should be aligned, without losing the awareness (hopefully) that there are indeed four such contexts. These have been clearly stated by Nobel laureate Tinbergen [18]. According to him, research in biology is suitably structured by four major perspectives: phylogeny, ontogeny, physiology and behavior. Are there similarly salient dimensions for structuring thought in urbanism, particularly in a putative non-modernist (neither modernist nor post-modernist) version? Particularly interesting are, imho, the intersections of such sub-domains.

Perhaps differentiation (as a concept) is indeed a (the) proper candidate for the grand perspective. We will discuss some aspects of this in the next essay: it includes growth and its modes, removal, replacement, deterioration, the problem of the generic, the difference between development and evolution, and a usable concept of complexity, to name but a few. In the philosophy of Gilles Deleuze, particularly in A Thousand Plateaus, Difference and Repetition and The Fold, we can already find a good deal of theoretical work on the conceptual issues around differentiation. Differentiation includes learning, individually and collectively (I do NOT refer to swarm ideology here, nor to collectivist mysticism either!!!), which in turn would bring the (abstract) mental into any consideration of urbanism. Yet, wasn’t mankind differentiating and learning all the time? The challenge will be to find a non-materialist interpretation of those in these materialist times.

Notes

1. Cited after [11]

2. Its core principles are the principle of the excluded middle (PEM) and the principle of non-contradiction (PNC). Both principles are equivalent to the concept of macroscopic objects, albeit only in a realist perspective, i.e. under the presupposition that objects are primary against relations. This is, of course, quite problematic, as it excludes an appropriate conceptualisation of information.

Both the PEM and the PNC allow for the construction of paradoxes like the Taylor Paradox. Such paradoxes may be conceived as “Language Game colliders”, that is, as conceptual devices which commit a mistake concerning the application of the grammar of language games. Usually, they bring countability and the sign for non-countability into conflict. First, it is a fault to compare a claim with a sign; second, it is stupid to claim contradicting proposals. Note that here we are allowed to speak of “contradiction”, because we are following the PNC as it is suggested by the claim itself. The Taylor Paradox is of course, like any other paradox, a pseudo-problem. It appears only due to an inappropriate choice or handling of the conceptual embedding, or due to the dismissal of the concept of the “Language Game”, which mostly results in the implicit claim of the existence of a “Private Language”.

3. Vera Bühlmann, “Articulating quantities, if things depend on whatever can be the case“, lecture held at The Art of Concept, 3rd Conference: CONJUNCTURE — A Series of Symposia on 21st Century Philosophy, Politics, and Aesthetics, organized by Nathan Brown and Petar Milat, Multimedia Institute MAMA, Zagreb, Croatia, June 15–17, 2012.

4. German orig.: “Manhattan Space ist, wenn schon nicht überall, so doch im Internet potentiell überall, und zudem nicht mehr auf drei vielleicht gar noch räumliche Dimensionen beschränkt.”

5. Peter Sloterdijk does not like to be called a philosopher.

References

  • [1] Michel Foucault, Archaeology of Knowledge. Routledge 2002 [1969].
  • [2] Vera Bühlmann, Printed Physics, de Gruyter, forthcoming.
  • [3] Rem Koolhaas (2002). Junkspace. October, Vol. 100, “Obsolescence”, pp. 175-190. MIT Press
  • [4] Michael Hansmeyer, his website about these columns.
  • [5] “Pseudopodia. Prolegomena to a Discourse of Design”. In: Vera Bühlmann and Martin Wiedmer, pre-specifics. Some Comparatistic Investigations on Research in Art and Design. JRP|Ringier Press, Zurich 2008, pp. 21-80 (English edition). Available online.
  • [6] Wesley C. Salmon, Causality and Explanation. Oxford University Press, Oxford 1998.
  • [7] Michael Epperson (2009). Quantum Mechanics and Relational Realism: Logical Causality and Wave Function Collapse. Process Studies, 38(2): 339-366.
  • [8] Alan Hájek (2007). The Reference Class Problem is Your Problem Too. Synthese 156 (3):563-585.
  • [9] W.v.O. Quine (1951), Two Dogmas of Empiricism. The Philosophical Review 60: 20-43.
  • [10] Gilles Deleuze, Difference and Repetition. Columbia University Press, New York 1994 [1968].
  • [11] Vera Bühlmann, inhabiting media. Thesis, University of Basel (CH), 2009.
  • [12] Klaus Wassermann (2010). SOMcity: Networks, Probability, the City, and its Context. eCAADe 2010, Zürich. September 15-18, 2010. (pdf)
  • [13] Aldo Rossi, The Architecture of the City. MIT Press, Cambridge (Mass.) 1982 [1966].
  • [14] Christoph Kronhagel (ed.), Mediatecture, Springer, Wien 2010. pp.334-345.
  • [15] Klaus Wassermann, Vera Bühlmann, Streaming Spaces – A short expedition into the space of media-active façades. In: Christoph Kronhagel (ed.), Mediatecture, Springer, Wien 2010, pp. 334-345. Available here.
  • [16] Henri Bergson, Matter and Memory. (Matière et Mémoire 1896) transl. N.M. Paul & W.S. Palmer. Zone Books 1990.
  • [17] Theodore Dobzhansky, Genetics and the Origin of Species, Columbia University Press, New York 1951 (3rd ed.) [1937].
  • [18] Niko Tinbergen (1963). On Aims and Methods in Ethology, Z. Tierpsych., (20): 410–433.

۞

Transformation

May 17, 2012 § Leave a comment

In the late 1980s there was a funny, or strange, if you like, discussion in the German public about a particular influence of the English language on the German language. That discussion not only got teachers engaged in higher education going; even „Der Spiegel“, Germany’s (still) leading weekly news magazine, damned the respective „anglicism“. What I am talking about here concerns the attitude to „sense“. At that time, a good 20 years ago, it was considered impossible to say „dies macht Sinn“, engl. „this makes sense“. Speakers of German at that time understood the “make” as “to produce”. Instead, one was told, the correct phrase had to be „dies ergibt Sinn“, in a literal, but impossible translation something like „this yields sense“, or even „dies hat Sinn“, in a literal, but again wrong and impossible translation, „this has sense“. These former ways of building a reference to the notion of „sense“ feel awkward to many (most?) speakers of German today. Nowadays, the English version of the meaning of the phrase has replaced the old German one, and one can even find in the “Spiegel“ the analogue of “making” sense.

Well, the issue here is not just one of historical linguistics or of style. The differences that we can observe here are deeply buried in the structure of the respective languages. It is hard to say whether such idioms in the German language are due to the history of German Idealism, or whether this particular philosophical stance developed on the basis of the structures in the language. Perhaps a bit of both, one could say from a Wittgensteinian point of view. Anyway, we may and can relate such differences in “contemporary” language to philosophical positions.

It is certainly by no means an exaggeration to conclude that the cultures differ significantly in what their languages allow to be expressible. Such a thing as an “exact” translation is not possible beyond trivial texts or a use of language that is very close to physical action. Philosophically, we may assign a scale, or a measure, to describe the differences mentioned above in probabilistic terms, and this measure spans between pragmatism and idealism. This contrast also deeply influences philosophy itself. Any kind of philosophy comes in those two shades (at least), often expressed or denoted by the attributes „continental“ and „anglo-american“. I think these labels just hide the relevant properties. This contrast of course applies to the reading of idealistic or pragmatic philosophers itself. It really makes a difference (1980s German: „it is a difference“) whether a native English speaking philosopher reads Hegel, or a German native; whether a German native is reading Peirce, or an American; whether Quine conducts research in logic, or Carnap. The story quickly complicates if we take into consideration French philosophy and its relation to Heidegger, or the reading of modern French philosophers in contemporary German-speaking philosophy (which is almost completely absent).1

And it becomes even more complicated, if not complex and chaotic, if we consider the various scientific sub-cultures as particular forms of life, formed by and forming their own languages. In this way it may well seem to be rather impossible—at least, one feels tempted to think so—to understand Descartes, Leibniz, Aristotle, or even the pre-Socratics, not to speak about the Cro-Magnon culture2, albeit it is probably more appropriate to reframe the concept of understanding. After all, it may itself be infected by idealism.

In the chapters to come you may expect the following sections. As we did before, we’ll try to go beyond the mere technical description, providing the historical trace and the wider conceptual frame:

A Shift of Perspective

Here, I need this reference to the relativity as it is introduced in—or by—language for highlighting a particular issue. The issue concerns a shift in preference, from the atom, the point, from matter, substance, essence and metaphysical independence towards the relation and its dynamic form, the transformation. This shift concerns some basic relationships of the weave that we call “Lebensform” (form of life), including the attitude towards those empirical issues that we will deal with in a technical manner later in this essay, namely the transformation of “data”. There are, of course, almost countless aspects of the topos of transformation, such as evolutionary theory, the issue of development, or, in the more abstract domains, mathematical category theory. In some way or another we have already dealt with these earlier (for category theory, for evolutionary theory). These aspects of the concept of transformation will not play a role here.

In philosophical terms the described difference between German and English, and the change of the respective German idiom, marks the transition from idealism to pragmatism. This corresponds to the transition from a philosophy of primal identity to one where difference is transcendental. In the same vein, we could also set up the contrast between logical atomism and the event as philosophical topoi, or between favoring existential approaches and ontology against epistemology. Even more remarkably, we also find an opposing orientation regarding time. While idealism, materialism, positivism or existentialism (and all similar attitudes) are heading backwards in time, and only backwards, pragmatism and, more generally, a philosophy of events and transformation is heading forward, and only forward. It marks the difference between settlement (in Heideggerian terms „Fest-Stellen“, English something like „fixing at a location“, putting something into the „Gestell“3) and anticipation. Settlements are reflected by laws of nature in which time does not—and shall not—play a significant role. All physical laws, and almost all theories in contemporary physics, are symmetric with respect to time. The “law perspective” blinds against the concept of context, quite obviously so. Yet, being blinded against context also disables any adequate reference to information.

In contrast, within a framework that is truly based on the primacy of interpretation and thus following the anticipatory paradigm, it does not make sense to talk about “laws”. Notably, issues like the “problem” of induction exist only in the framework of the static perspective of idealism and positivism.

It is important to understand that these attitudes are far from being just “academic” distinctions. There are profound effects to be found on the level of empirical activity, in how data are handled and using which kinds of methods. Furthermore, they can’t be “mixed” once one of them has been chosen. Although we may switch between them sequentially, across time or across domains, we can’t practice them synchronously, as the whole setup of the form of life is influenced. Of course, we do not want to rate one of them as the “best”; we just want to make clear that there are particular consequences of that basic choice.

Towards the Relational Perspective

As late as 1991, Robert Rosen’s work on „Relational Biology“ was anything but close at hand [1]. As a mathematician, Rosen was interested in the problem of finding a proper way to represent living systems by formal means. As a result of this research, he strongly proposed the “relational” perspective. He identifies Nicolas Rashevsky as its originator, who first mentioned it around 1935. It really sounds strange that relational biology had to be (re-)invented. What else than relations could be important in biology? Yet, still today atomistic thinking is quite abundant; think alone of the reductionist approaches in genetics (which fortunately have come under serious attack meanwhile4). Or think of the still prevailing helplessness in various domains to conceive appropriately of complexity (see our discussion of this here). Being aware of relations means that the world is not conceived as made from items that are described by inputs and outputs with some analytics, or say deterministics, in between. Only of such items could it be said that they “function”. The relational perspective abolishes the possibility of reducing real “systems” to “functions”.

As is already indicated by the appearance of Rashevsky, there is, of course, a historical trace for this shift, a kind of soil emerging from intellectual sediments.5 While the 19th century could be considered as characterized by the topos of the population (of atoms)—cf. the line from Laplace and Carnot to Darwin and Boltzmann—we can observe a spawning awareness for the relation in the 20th century. Wittgenstein’s Tractatus started to oppose Frege and has always been in stark contrast to logical positivism, then accompanied by Zermelo (“axiom” of choice6), Rashevsky (relational biology), Turing (morphogenesis in complex systems), McLuhan (media theory), String Theory in physics, Foucault (field of propositions), and Deleuze (transcendental difference). Comparing Habermas and Luhmann on the one side—we may label their position as idealistic functionalism—with Sellars and Brandom on the other—who have been digging into the pragmatics of the relation as it is present in humans and their culture—we find the same kind of difference. We could also include Gestalt psychology as a kind of precursor to the party of “relationalists”, mathematical category theory (as opposed to set theory) and some strains of the behavioral sciences. Researchers like Ekman & Scherer (FACS), Kummer (sociality expressed as dynamics in relative positions), or Colmenares (play) focused on the relation itself, going far beyond the implicit reference to the relation as a secondary quality. We may add David Shane7 for architecture and Clarke or Latour8 for sociology. Of course, there are many, many other proponents who helped to grow the topos of the relation, yet, even without a detailed study, we may guess that compared to the main streams they still remain comparatively few.

These differences can hardly be overestimated in the fields of information science, computer science, data analysis, or machine-based learning and episteme. It makes a great difference whether one bases the design of an architecture or the design of use on the concept of interfaces, most often defined as a location of full control, notably in both directions, or on the concept of behavioral surfaces9. In the field of empirical activities, that is, modeling in its wide sense, it yields very different setups and consequences whether we start with the assumption of independence between our observables or between our observations, or whether we start with no assumptions about the dependency between observables, or observations, respectively. The latter is clearly the preferable choice in terms of intellectual soundness. Even if we stick to the first of the two alternatives, we should NOT use methods that work only if that assumption is satisfied. (It is some kind of mystery that people believe that doing so could be called science.) The reason is pretty simple. We do not know anything about the dependency structures in the data before we have finished modeling. It would inevitably result in a petitio principii if we put “independence” into the analysis, wrapped into the properties of the methods. We would just find… guess what. After destroying facts—in the Wittgensteinian sense understood as relationalities—into empiristic dust we will not be able to find any meaningful relation at all.

Positioning Transformation (again)

Similarly, if we treat data as a "true" mapping of an outside "reality", as "givens" that are at most distorted a bit by more or less noise, we will never find multiplicity in the representations that we could derive from modeling, simply because it would contradict the prejudice. We also would not recognize all the possible roles of transformation in modeling. A measurement device acts as a filter10, and as such it does not differ from any analytic transformation of the data. From the perspective of the associative part of modeling, where the data are mapped to desired outcomes or decisions, "raw" data are simply not distinguishable from "transformed" data, unless the treatment itself were encoded as data as well. Correspondingly, we may consider any data transformation by algorithmic means as an additional measurement device, responding to particular qualities in the observations in its own right. It is this equivalence that allows for the change from the linear to a circular and even a self-referential arrangement of empiric activities. Long-term adaptation, I would say any adaptation at all, is based on such a circular arrangement. The only thing we had to change in order to gain these new possibilities was to drop the "passivist" representationalist realism11.

Usually, the transformation of data is considered as an issue that is a function of discernibility as an abstract property of data (people don't talk like that, of course; it is our way of speaking here). Today, the respective aphorism coined by Bateson has already become proverbial, despite its simplistic shape: information is the difference that makes the difference. Depending on the context in which data are handled, this potential discernibility is addressed in different ways. Let us distinguish three such contexts: (i) data warehousing, (ii) statistics, and (iii) learning as an epistemic activity.

In data warehousing one is usually faced with a large range of different data sources and data sinks, or consumers, where the difference between these sources and sinks simply relates to the different technologies and formats of the databases. The warehousing tool should "transform" the data such that they can be used in the intended manner on the side of the sinks. The storage of the raw data as measured from the business processes, and the efforts to provide any view onto these data, has to satisfy two conditions (in the current paradigm). It has to be neutral—data should not be altered beyond the correction of obvious errors—and its performance, simply in terms of speed, has to be scalable, if not even independent of the data load. The activities in data warehousing are often circumscribed as "Extract, Transform, Load", abbreviated ETL. There are many and large software solutions for this task, commercial ones and open source (e.g. Talend). The effect of DWH is to disclose the potential for an arbitrary and quickly served perspective onto the data, where "perspective" means just re-arranged columns and records from the database. Apart from cleaning and simple arithmetic operations, the individual bits of data themselves remain largely unchanged.

In statistics, transformations are applied in order to satisfy the conditions for particular methods. In other words, the data are changed in order to enhance discernibility. Most popular is the log-transformation, which spreads out the small values and compresses the large ones. Two different small values that are located close to each other are separated better after a log-transformation; hence it makes sense to apply the log-transformation to data whose distribution is crowded at small values and drags a long tail towards the large ones. Other transformations aim at a particular distribution, such as the z-score, or Fisher's z-transformation. Interestingly, there is a further class of powerful transformations that is usually not conceived as such. Residuals are defined as the deviation of the data from a particular model. In linear regression this is the (vertical) distance to the regression line, whose square enters the fitting criterion.

The concept, however, can be extended to those data which do not "follow" the investigated model. The analysis of residuals has two aspects, a formal one and an informal one. Formally, it is used as a complex test of whether the investigated model fits or whether it does not. The residuals should not show any evident "structure". That's it. There is no institutional way back to the level of the investigated model; there are no rules about that which could be negotiated in a community yet to be established. The statistical framework is a linear one, which could be seen as a heritage from positivism. It is explicitly forbidden to "optimize" a correlation by multiple actualization. Yet, informally the residuals may give hints on how to change the basic idea as represented by the model. Here we find a circular setup, where the strategy is to remove any rule-based regularity, i.e. discernibility, from the data.
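To make the three kinds of statistical transformation just mentioned concrete—the log-transformation, a rescaling such as the z-score, and residuals relative to a fitted model—here is a minimal sketch in Python; the synthetic data and the chosen parameters are merely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic data crowded at small values, with a long tail towards large ones
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# 1) log-transformation: spreads out the densely packed small values
x_log = np.log(x)

# 2) z-score: rescales to zero mean and unit variance
x_z = (x_log - x_log.mean()) / x_log.std()

# 3) residuals as a transformation: signed deviation of y from a fitted line
y = 2.0 * x_log + rng.normal(scale=0.5, size=x.size)
slope, intercept = np.polyfit(x_log, y, deg=1)
residuals = y - (slope * x_log + intercept)

print(np.round(x_log[:3], 2), np.round(x_z[:3], 2), np.round(residuals[:3], 2))
```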

The effect of this circular arrangement takes place completely in the practicing human, as a kind of refinement. It can't be found anywhere in the methodological procedure itself in a rule-based form. This brings us to the third area, epistemic learning.

In epistemic learning, any of the potentially significant signals should be rendered in such a way as to allow for an optimized mapping towards a registered outcome. Such outcomes often come as dual, i.e. two-valued, outcomes, or as a small group of ordinal values in the case of multi-constraint, multi-target optimization. In epistemic learning we thus find the separation of transformation and association in its most prominent form, even though data warehousing and statistics are also intended to be used for enhancing decisions. Yet, their linearity simply does not allow for any kind of institutionalized learning.

This arbitrary restriction to the linear methodological approach in formal epistemic activities results in two related, quite unfavorable effects: first, the shamanism of "data exploration", and second, the infamous hell of methods. One can indeed find thousands, if not tens of thousands of research or engineering articles trying to justify a particular new method as the most appropriate one for a particular purpose. These methods themselves, however, are never identified as a "transformation". Authors are all struggling for the "best" method, while the whole community neglects the possibility—and the potential—of combining different methods after shaping them as transformations.

The laborious and never-ending training necessary to choose from the huge number of possible methods is then called methodology… The situation is almost paradoxical. First, the methods are claimed to tell something about the world, although this is not possible at all, if only because those methods are analytic. It is an idealistic hope, one that was already demolished by Hume. Above all, only analytic methods are considered to be scientific. Then, given the large population of methods, the choice of a particular one becomes aleatory, which renders the whole activity into a deeply non-scientific one. Additionally, it is governed by the features of some software, or the skills of the user of such software, not by a conceptual stance.

Now remember that any method is also a specific filter. Obviously, nothing can be known about the beneficiality of a particular method before the prediction that is based on the respective model has been validated. This simple insight renders "data exploration" meaningless. It can only play its role within linear empirical frameworks, which are inappropriate anyway. Data exploration is suggested to be done "intuitively", often using methods of visualization. Yet, those methods are severely restricted with regard to the graspable dimensionality. More than 6 to 8 dimensions can't be "visualized" at once. Compare this to the 2^n (n: number of variables) possible models and you immediately see the problem. Besides, the only effect of visualization is just a primitive form of clustering. Additionally, visual inputs are images, above all, and as images they can't play a well-defined epistemological role.12

Complementary to the non-concept of "exploring" data13, and equally misconceived, is the notion of "preparing" data. At least, it must be rated as misconceived as far as it comprises transformations beyond error correction and arranging data into tables. The reason is the same: we can't know whether a particular "cleansing" will enhance the predictive power of the model—in other words, whether it comprises potential information that supports the intended discernibility—before the model has been built. There is no possibility to decide which variables to include before having finished the modeling. In some contexts the information accessible through a particular variable could be relevant or even important. Yet, if we conceive transformations as preliminary hypotheses, we can't call them "preparation" any more. "Preparation" for what? For proving the petitio principii? Certainly the peak of all preparatory nonsense is the "imputation" of missing values.

Dorian Pyle [11] calls such introduced variables “pseudo variables”, others call them “latent” or even “hidden variables”.14 Any of these labels is inappropriate, since the transformation is nothing else than a measurement device. Introduced variables are just variables, nothing else.

Indeed, these labels are reliable markers: whenever you meet a book or article dealing with data exploration, data preparation, the "problem" of selecting a method, or, likewise, of selecting an architecture within a meta-method like Artificial Neural Networks, you can know for sure that the author is not really interested in learning and reliable predictions. (Or that he or she is not able to distinguish analysis from construction.)

In epistemic learning the handling of residuals is somewhat inverse to their treatment in statistics, again as a result of the conceptual difference between the linear and the circular approach. In statistics one tries to prove that the model, say: transformation, removes all the structure from the data such that the remaining variation is pure white noise. Unfortunately, there are two drawbacks with this. First, one has to define the model before removing the noise and before checking the predictive power. Secondly, the test for any possibly remaining structure again takes place within the atomistic framework.

In learning we are interested in the opposite. We are looking for such transformations as remove the noise in a multi-variate manner such that the signal-to-noise ratio is strongly enhanced, perhaps even to the proto-symbolic level. Only after the de-noising due to the learning process, that is after a successful validation of the predictive model, is the structure then described for the (almost) noise-free data segment15 as an expression that is complementary to the predictive model.

In our opinion an appropriate approach would actualize as an instance of epistemic learning that is characterized by the following points (a minimal sketch in code follows the list):

  • – conceiving any method as transformation;
  • – conceiving measurement as an instance of transformation;
  • – conceiving any kind of transformation as a hypothesis about the “space of expressibility” (see next section), or, similarly, the finally selected model;
  • – the separation of transformation and association;
  • – the circular arrangement of transformation and association.
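What such a circular arrangement of transformation and association could look like is sketched below; the candidate transformations, the deliberately simple nearest-neighbour scoring and the greedy loop are illustrative assumptions, not a description of the actual engine:

```python
import numpy as np

def validate(X, y):
    """Associative part: a deliberately simple 1-nearest-neighbour score on a
    crude hold-out, standing in for any predictive model and its validation."""
    n = len(y)
    test_idx = np.arange(0, n, 5)
    hits = 0
    for i in test_idx:
        mask = np.arange(n) != i
        d = np.linalg.norm(X[mask] - X[i], axis=1)
        hits += int(y[mask][np.argmin(d)] == y[i])
    return hits / len(test_idx)

# candidate transformations: each one is a hypothesis, not a "preparation"
transforms = {
    "log":    lambda v: np.log(np.abs(v) + 1e-9),
    "square": lambda v: v ** 2,
    "tanh":   np.tanh,
}

def circular_modeling(X, y, rounds=3):
    """Circular arrangement: transform -> associate/validate -> keep or drop."""
    best = validate(X, y)
    for _ in range(rounds):
        improved = False
        for j in range(X.shape[1]):
            for name, f in transforms.items():
                candidate = np.column_stack([X, f(X[:, j])])
                score = validate(candidate, y)
                if score > best:          # keep the hypothesis only if it helps
                    X, best, improved = candidate, score, True
        if not improved:
            break
    return X, best

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] ** 2 + X[:, 1] > 1.0).astype(int)
X_enriched, score = circular_modeling(X.copy(), y)
print(X_enriched.shape, round(score, 2))
```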

The Abstract Perspective

We now have to take a brief look at the mechanics of transformations in the domain of epistemic activities.16 For doing this, we need a proper perspective. As such we choose the notion of space. Yet, we would like to emphasize that this space is not necessarily Euclidean, i.e. flat, or open like the Cartesian space, i.e. with quantities running to infinity. Besides, dimensions need not be thought of as being "independent", i.e. orthogonal to each other. Distance measures need to be defined only locally, yet without implying ideal continuity. There might be a certain kind of "graininess" defined by a distance D, below which the space is not defined. The space may even contain "bubbles" of lower dimensionality. So, it is indeed a very general notion of "space".

Observations shall be represented as "points" in this space. Since these "points" are not independent from the efforts of the observer, they are not dimensionless. To put it more precisely, they are like small "clouds", best described as probability densities for "finding" a particular observation. Of course, this "finding" is an inextricable mixture of "finding" and "constructing". It does not make much sense to distinguish both on the level of such cloudy points. Note that the cloudiness is not a problem of accuracy in measurement! A posteriori, that is, subsequent to introducing an irreversible move17, such a cloud could also be interpreted as an open set comprising the provoked observation and virtual observations. It should be clear by now that such a concept of space is very different from the Euclidean space that nowadays serves as the base concept for any statistics or data mining. If you think that conceiving of such a space is unnecessary or even nonsense, then think about quantum physics. In quantum physics we are also faced with the breakdown of the separation of observer and observable, and physicists ended up quite precisely with spaces as we described them above. These spaces are then handled by various means of renormalization.18 In contrast to the abstract yet still physical space of quantum theory, our space need not even contain an "origin". Elsewhere we called such a space an aspectional space.

Now let us take the important step of becoming interested in only a subset of these observations. Assume we want to select a very particular set of observations—they are still clouds of probabilities, made from virtual observations—by means of prediction. This selection can be conceived in two different ways. The first way is the one that is commonly applied and consists of the reconstruction of a "path". Since in the contemporary epistemic life form of "data analysts" Cartesian spaces are used almost exclusively, all these selection paths start from the origin of the coordinate system. The endpoint of the path is the point of interest, the "outcome" that should be predicted. As a result, one first gets a mapping function from predictor variables to the outcome variable. All possible mappings form the space of mappings, which is a category in the mathematical sense.

The alternative view does not construct such a path within a fixed coordinate system, i.e. within a space with fixed properties. Quite to the contrary, the space itself gets warped and transformed until very simple figures appear, which represent the various subsets of observations according to the focused quality.

Imagine an ordinary, small, blown-up balloon. Next, imagine a grid in the space enclosed by the balloon’s hull, made by very thin threads. These threads shall represent the space itself. Of course, in our example the space is 3d, but it is not limited to this case. Now think of two kinds of small pearls attached to the threads all over the grid inside the balloon, blue ones and red ones. It shall be the red ones in which we are interested. The question now is what can we do to separate the blue ones from the red ones?

The way to proceed is pretty obvious, though the solution itself may be difficult to achieve. What we can try is to warp and to twist, to stretch, to wring and to fold the balloon in such a way that the blue pearls and the red pearls separate as nicely as possible. In order to purify the groups we may even consider compressing some regions of the space inside the balloon such that they turn into singularities. After all this work—and beware, it is hard work!—we introduce a new grid of threads into the distorted space and dissolve the old ones. All pearls automatically attach to the threads closest nearby, stabilizing the new space. Again, conceiving of such a space may seem weird, but again we can find a close relative in physics, the Einsteinian space-time. Gravitation effectively warps that space, though in a continuous manner. There are famous empirical proofs of that warping of physical space-time.19
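A toy illustration of this difference in code: two classes that no simple figure separates in the original flat space become separable by a single threshold after the space has been "warped", here, as a deliberately simple assumption, by re-expressing the points in polar-like coordinates:

```python
import numpy as np

rng = np.random.default_rng(1)

# "red" pearls form a ring around the "blue" pearls: no straight cut separates them
n = 300
radius = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
angle = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
label = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])  # 0 = blue, 1 = red

# "warping" the space: re-express every point by its distance from the centre
# and its angle; in the warped space a single threshold separates the pearls
warped = np.column_stack([np.hypot(X[:, 0], X[:, 1]), np.arctan2(X[:, 1], X[:, 0])])

threshold = 1.5
predicted = (warped[:, 0] > threshold).astype(int)
print("separation accuracy in the warped space:", float((predicted == label).mean()))
```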

Analytically, these two perspectives, the path reconstruction on the one hand and the space warping on the other, are (almost) equivalent. The perspective of space warping, however, offers a benefit that is not to be underestimated. We arrive at a new space for which we can define its own properties, and in which we again can define measures that are different from those possible in the original space. The path reconstruction does not offer such a derived space. Hence, once the path is reconstructed, the story stops. It is a linear story. Our proposal thus is to change perspective.

Warping the space of measurability and expressibility is an operation that inverts the generation of cusp catastrophes20 (see Figure 1 below). Thus it transcends the cusp catastrophes. In the perspective of path reconstruction one has to avoid the phenomena of hysteresis and cusps altogether, hence losing a lot of information about the observed source of data.

In the Cartesian space, and the path reconstruction methodology related to it, all operations are analytic, that is, organized as symbolic rewriting. The reason for this is the necessity that the paths remain continuous and closed. In contrast, space warping can be applied locally. Warping spaces when dealing with data is not an exotic or rare activity at all. It happens all the time. We know it even from (simple) mathematics, when we define different functions, including the empty function, for different sets of input parameter values.

The main consequence of changing the perspective from path reconstruction to space warping is an enlargement of the set of possible expressions. We can do more without the need to call it “heuristics”. Our guess is that any serious theory of data and measurement must follow the opened route of space warping, if this theory of data tries to avoid positivistic reductionism. Most likely, such a theory will be kind of a renormalization theory in a connected, relativistic data space.

Revitalizing Punch Cards and Stacks

In this section we will introduce the outline of a tool that allows one to follow the circular approach in epistemic activities. Basically, this tool is about organizing arbitrary transformations. While for analytic (mathematical) expressions there are expression interpreters, it is also clear that analytic expressions form only a subset of the set of all possible transformations, even if we consider the fact that many expression interpreters have grown into some kind of programming language, or script language. Indeed, Java contains an interpreting engine for JavaScript by default, and there are several quite popular ones for mathematical purposes. One could also conceive of mathematical packages like Octave (open source), MatLab or Mathematica (both commercial) as such expression interpreters, even though their most recent versions can do much, much more. Yet, MatLab & Co. are not quite suitable as a platform for general-purpose data transformation.

The structural metaphor that proved to be as powerful as it was sustainable for more than 10 years now is the combination of the workbench with the punch card stack.

Image 1: A Punched Card for feeding data into a computer

Any particular method, mathematical expression or arbitrary computational procedure resulting in a transformation of the original data is conceived as a "punch card". This provides a proper modularization, and hence standardization. Actually, the role of these "functional compartments" is extremely standardized, at least enough to define an interface for plugins. Like the ancient punch cards made from paper, each card represents a more or less fixed functionality. Of course, this functionality may be defined by a plugin that itself connects to MatLab…

Besides, again like the ancient punch cards, the virtualized versions can be stacked. For instance, we first put the treatment for missing values onto the stack, simply to ensure that all NULLs are written as -1. The next card then determines minimum and maximum in order to provide the data for linear normalization, i.e. the mapping of all values into the interval [0..1]. Then we add a card for compressing the "fat tail" of the distribution of values in a particular variable. Alternatively we may use a card to split the "fat tail" off into a new variable! Finally we apply the card, that is, the plugin, for normalizing the data to both the original and the new data column.

I think you got the idea. Such a stack is not only maintained for each of the variables, it is created on the fly according to the needs, as these are detected by simple rules. You may think of the cards also as the set of rules that describe the capabilities of agents, which constantly check the data to see whether they could apply their rules. You may also think of these stacks as a device that works like a tailored distillation column, as it is used for fractional distillation in petro-chemistry.

Image 2: Some industrial fractional distillation columns for processing mineral oil. Depending on the number of distillation steps, different products result.

These stacks of parameterized procedures and expressions represent a generally programmable computer, or more precisely an operating system, quite similar to a spreadsheet, although the purpose of the latter, and hence its functionality, actualizes in a different form. The whole thing may even be realized as a language! In this case, one would not need a graphical user interface anymore.
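A minimal sketch of the "punch card stack" idea; the card interface, the example cards and their parameters are merely illustrative assumptions and not the actual plugin interface of the tool described here:

```python
import numpy as np

class Card:
    """A 'punch card': a parameterized, standardized transformation plugin."""
    def apply(self, column: np.ndarray) -> np.ndarray:
        raise NotImplementedError

class MissingToValue(Card):
    """Ensure that all NULLs (here: NaN) are written as a fixed value."""
    def __init__(self, fill: float = -1.0):
        self.fill = fill
    def apply(self, column):
        out = column.astype(float).copy()
        out[np.isnan(out)] = self.fill
        return out

class CompressFatTail(Card):
    """Compress large values, a stand-in for any treatment of 'fat tails'."""
    def apply(self, column):
        return np.sign(column) * np.log1p(np.abs(column))

class MinMaxNormalize(Card):
    """Map all values linearly into the interval [0..1]."""
    def apply(self, column):
        lo, hi = np.min(column), np.max(column)
        return (column - lo) / (hi - lo) if hi > lo else np.zeros_like(column)

class Stack:
    """Cards are stacked and applied in order, like the ancient punch cards."""
    def __init__(self, cards):
        self.cards = cards
    def apply(self, column):
        for card in self.cards:
            column = card.apply(column)
        return column

# one stack per variable, assembled on the fly by simple rules
raw = np.array([0.2, np.nan, 5.0, 120.0, 3.3])
stack = Stack([MissingToValue(fill=-1.0), CompressFatTail(), MinMaxNormalize()])
print(stack.apply(raw))
```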

The effect of organizing the transformation of data in this way, by means of plugins that follow the metaphor of the "punch card stack", is dramatic. Introducing transformations and testing them can be automated. At this point we should mention the natural ally of the transformation workbench, the maximum likelihood estimation of the most promising transformations that combine just two or three variables into a new one. All three parts—the transformation stack engine, the dependency explorer, and the evolutionarily optimized associative engine (which is able to create a preference weighting for the variables)—can be put together in such a way that finding the "optimal" model can be run in a fully automated manner. (Meanwhile the SomFluid package has grown into a stage where it can accomplish this… download it here, but you still need some technical expertise to get it running.)

The approach of the “transformation stack engine” is not just applicable to tabular data, of course. Given a set of proper plugins, it can be used as a digester for large sets of images or time series as well (see below).

Transforming Data

In this section we now will take a more practical and pragmatic perspective. Actually, we will describe some of the most useful transformations, including their parameters. We do so, because even prominent books about “data mining” have been handling the issue of transforming data in a mistaken or at least seriously misleading manner.21,22

If we consider the goal of the transformation of numerical data—increasing the discernibility of assignated observations—we will recognize that we may identify a rather limited number of types of such transformations, even if we consider the space of possible analytic functions that combine two (or three) variables.

We will organize the discussion of the transformations into three sub-sections, whose subjects are of increasing complexity. Hence, we will start with the (ordinary) table of data.

Tabular Data

Tables may comprise numerical data or strings of characters. In its general form a table may even contain whole texts, a complete book in any of the cells of a column (but see the section about unstructurable data below!). If we want to access the information carried by the string data, we sooner rather than later have to translate them into numbers. Unlike numbers, string data, and the relations between data points made from string data, must be interpreted. As a consequence, there are always several, if not many, different possibilities for that representation. Besides referring to the actual semantics of the strings, which could be expressed by means of the indices of some preference orders, there are also two important techniques of automatic scaling available, which we will describe below.

Besides string data, dates are a further multi-dimensional category of data. A date encodes not just a serial number relative to some (almost) arbitrarily chosen base date, which we can use to express the age of the item represented by the observation. We also have, of course, the day of week, day of month, number of week, number of month, and not to forget the season as an approximate class. It depends a bit on the domain whether these aspects play any role at all. Yet, think about the rhythms in the city or on the stock markets across the week, or the "black Monday/Tuesday/Friday effect" in production plants or hospitals, and it becomes clear that we usually have to represent the single date value by several "informational extracts".

A last class of data types that we have to distinguish are time values. We already mentioned the periodicity in other aspects of the calendar. In which pair of time values do we find the closer similarity: T1(23:41, 0:05) or T2(8:58, 15:17)? With any naive distance measure the values of T2 are evaluated as much more similar than those in T1. What we have to do is to set a flag for "circularity" in order to calculate the time distances correctly.
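A small sketch of such "informational extracts" from a date and of a time distance with the "circularity" flag set; the chosen extracts are just examples:

```python
from datetime import date

def date_features(d: date, base: date = date(2000, 1, 1)) -> dict:
    """Derive several 'informational extracts' from a single date value."""
    return {
        "serial": (d - base).days,           # age relative to an arbitrary base date
        "day_of_week": d.isoweekday(),       # 1 = Monday ... 7 = Sunday
        "day_of_month": d.day,
        "week_of_year": d.isocalendar()[1],
        "month": d.month,
        "season": (d.month % 12) // 3,       # 0 = winter, 1 = spring, 2 = summer, 3 = autumn
    }

def circular_time_distance(t1: str, t2: str) -> int:
    """Distance between two times of day in minutes with the 'circularity' flag set:
    23:41 and 0:05 are 24 minutes apart, not 23 hours and 36 minutes."""
    def minutes(t):
        h, m = map(int, t.split(":"))
        return h * 60 + m
    d = abs(minutes(t1) - minutes(t2))
    return min(d, 24 * 60 - d)

print(date_features(date(2012, 9, 19)))
print(circular_time_distance("23:41", "0:05"))    # T1 -> 24
print(circular_time_distance("8:58", "15:17"))    # T2 -> 379
```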

Numerical Data: Numbers, just Numbers?

Numerical data are data for which, in principle, any value from within a particular interval could be observed. If such data are symmetrically and normally distributed, then we have little reason to suspect that there is something interesting within this sample of values. As soon as the distribution becomes asymmetrical, it starts to become interesting. We may observe "fat tails" (large values are "over-represented"), or multi-modal distributions. In both cases we could suspect that there are at least two different processes, one dominating the other differentially across the peaks. So we should split the variable into two (we call this "deciling") and, ceteris paribus, check the effect on the predictive power of the model. Typically one splits the values at the minimum between the peaks, but it is also possible to implement an overlap, where some records are present in both of the new variables.

Long tails indicate some aberrant behavior of the items represented by the respective records, or, as in medicine, even pathological contexts. Strongly skewed distributions often indicate organizational or institutional influences. Here we could compress the long tail, log-shift, and then split the variable, that is, decile it into two.21
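A minimal sketch of such a split at the minimum between two peaks; the histogram-based estimate of the "valley" and the optional overlap are illustrative assumptions:

```python
import numpy as np

def split_at_valley(x, bins=30, overlap=0.0):
    """Split a (possibly bimodal) variable into two new variables at the least
    populated histogram bin in the central part of its range; values outside
    a part are encoded as NaN, and 'overlap' widens both parts a little."""
    counts, edges = np.histogram(x, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    inner = slice(bins // 5, bins - bins // 5)           # ignore the outer fifths
    valley = centers[inner][np.argmin(counts[inner])]
    left = np.where(x <= valley + overlap, x, np.nan)
    right = np.where(x >= valley - overlap, x, np.nan)
    return left, right, valley

rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(6.0, 1.0, 500)])
left, right, valley = split_at_valley(x, overlap=0.25)
print(round(float(valley), 2),
      int(np.count_nonzero(~np.isnan(left))),
      int(np.count_nonzero(~np.isnan(right))))
```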

In some domains, like finance, we find special values at which symmetry breaks. For ordinary money values, 0 is such a value. We know in advance that we have to split the variable into two, because the semantic and the structural difference between +50$ and -75$ is much bigger than that between 150$ and 2500$… probably. As always, we transform the variable such that we create additional variables as kinds of hypotheses, whose (positive) contribution to the predictive power of the model we then have to evaluate.

In finance, but also in medicine, and more generally in any system that is able to develop meta-stable regions, we have to expect such points (or regions) with an increased probability of breaking symmetry, and hence a strong semantic or structural difference. René Thom first described similar phenomena in the theory that he labeled "catastrophe theory". In 3d you can easily think of a cusp catastrophe as a hysteresis in the x-z direction that is, however, gradually smoothed out in the y-direction.

Figure 1: Visualization of folds in parameter space, leading to catastrophes and hystereses.

In finance we are faced with a whole culture of rule following. The majority of market analysts use the same tools, for instance "stochastics," or a particularly parameterized MACD, for deriving "signals", that is, indicators for points of action. The financial industries have been hiring a lot of physicists, and this population sticks to largely the same mathematics, such as GARCH combined with Monte Carlo simulations. Approaches like fractal geometry are still regarded as exotic.23

Or think about option prices, where we find several symmetry breaks built in by the contract. These points have to be represented adequately in dedicated, that is, derived variables. Again, we can't emphasize it enough, we HAVE to do so as a kind of performative hypothesizing. The transformation of data by creating new variables is, so to speak, the low-level operationalization of what later may grow into a scientific hypothesis. Creating new variables poses serious problems for most methods, which may count as a reason why many people don't follow this approach. Yet, for our approach it is not a problem, definitely not.

In medicine we often find "norm values". Potassium in blood serum may take any value within a particular range without reflecting any physiologic problem… if the person is healthy. If there are other risk factors, the story may be a different one. The ratio of potassium and glucose in serum provides an example of a significant marker… if the person already has heart problems. By means of such risk markers we can introduce domain-specific knowledge. And that's actually a good message, since we can identify our own "markers" and represent them as transformations. The consequence is pretty clear: a system that is supposed to "learn" needs a suitable repository for storing and handling such markers, represented as a relational system (graph).

Let us briefly return to the norm ranges. A small difference outside the norm range should be rated much more strongly than the same difference within the norm range. This may lead to the weight functions shown in the next figure, or more or less similar ones. For a certain range of input values, the norm range, we leave the values unchanged; the output weight equals 1. Outside of this range we transform them in a way that emphasizes the difference to the respective boundary value of the norm range. This could be done in different ways.

Figure 2: Examples for output weight configurations in norm-range transformation

Actually, this rationale of the norm range can be applied to any numerical data. As an estimate of the norm range one could use the central 80% quantile range, centered around the median and realized as the ±40% quantiles around it. On the level of model selection, this will result in a particular sensitivity for multi-dimensional outliers, notably without defining apriori any criterion of what an outlier should be.
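One possible way to express such a norm-range weighting as a transformation; both the quantile-based estimate of the norm range and the quadratic emphasis outside of it are illustrative choices:

```python
import numpy as np

def norm_range_transform(x, lower=None, upper=None, emphasis=2.0):
    """Leave values inside the norm range unchanged (weight 1); outside of it,
    emphasize the difference to the nearest boundary of the norm range.
    If no boundaries are given, the central 80% quantile range is used
    (the +/-40% quantiles around the median)."""
    x = np.asarray(x, dtype=float)
    if lower is None or upper is None:
        lower, upper = np.quantile(x, [0.10, 0.90])
    out = x.copy()
    below = x < lower
    above = x > upper
    out[below] = lower - emphasis * (lower - x[below]) ** 2
    out[above] = upper + emphasis * (x[above] - upper) ** 2
    return out, (lower, upper)

rng = np.random.default_rng(3)
serum_potassium = rng.normal(4.2, 0.8, 200)       # hypothetical lab values
transformed, norm_range = norm_range_transform(serum_potassium, lower=3.5, upper=5.1)
print(norm_range, round(float(transformed.min()), 2), round(float(transformed.max()), 2))
```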

From Strings to Orders to Numbers

Many data come as some kind of description or label. Such data are described as nominal data. Think for instance about the drugs prescribed to a group of patients included in an investigation of risk factors for a disease, or think about the name or the type of restaurants in an urbanological/urbanistic investigation. Nominal data are quite frequent in behavioral, organizational or social data, that is, in contexts that are established mainly on a symbolic level.

Performing measurements only on the nominal scale should be avoided, yet sometimes it cannot be circumvented. It can be avoided at least partially by including further properties that can be represented by numerical values. For instance, instead of using only the names of cities in a data set, one can use the geographical location or the number of inhabitants, or, when referring to places within a city, one can use descriptors that cover some properties of the respective area, such as density of traffic, distance to similar locations, price level of consumer goods, economic structure etc. If a direct measurement is not possible, estimates can do the job as well, provided the certainty of the estimate is expressed. The certainty can then be used to generate surrogate data. If the fine-grained measurement creates further nominal variables, they can be combined to form a scale. Such enrichment is almost always possible, irrespective of the domain. One should keep in mind, however, that any such enrichment is nothing else than a hypothesis.

Sometimes, data on the nominal level—technically a string of alphanumerical characters—already contain valuable information. For instance, they contain numerical values, as in the names of cars. If we deal with things like the names of molecules, where these names often come as compounds, reflecting the fact that molecules themselves are compounds, we can calculate the distance of each name to a virtual "average name" by applying a technique called "random graph". Of course, in the case of molecules we would have a lot of properties available that can be expressed as numerical values.

Ordinal data are closely related to nominal data. Essentially, there are two flavors of them. In the case of the least valuable of them, the numbers do not express a numerical value; the cipher is just used as a kind of letter, indicating that there is a set of sortable items. Sometimes, the values of an ordinal scale represent some kind of similarity. Although this variant is more valuable, it can still be misleading, because the similarity may not scale isodistantly with the numerical values of the ciphers. Undeniably, there is still a remnant of a "name" in it.

We are now going to describe some transformations to deal with data from low-level scales.

The least action we have to apply to nominal data is a basic form of encoding: we use integer values instead of the names. The next, though only slightly better, level would be to reflect the frequency of the encoded item in the ordinal value. One would, for instance, not encode the name as an arbitrary integer value, but as the log of its frequency. A much better alternative, however, is provided by the descendants of correspondence analysis. These are called Optimal Scaling and the Relative Risk Weight. The drawback of these methods is that some information about the predicted variable is necessary. In the context of modeling, by which we always understand target-oriented modeling—as opposed to associative storage24—we usually find such information, so the drawback is not too severe.

First, to optimal scaling (OSC). Imagine a variable, or "assignate" as we prefer to call it25, which is scaled on the nominal or a low ordinal scale. Let us assume that there are just three different names or values. As already mentioned, we assume that a purpose has been selected and hence a target variable as its operationalization is available. Then we could set up the following table (the figures denote frequencies).

Table 1: Summary table derived from a hypothetical example data set. av(i) denote three nominally scaled assignates.

| outcome (tv) | av1 | av2 | av3 | marginal sum |
|---|---|---|---|---|
| ta | 140 | 120 | 160 | 420 |
| tf (focused) | 30 | 10 | 40 | 80 |
| marginal sum | 170 | 130 | 200 | 500 |

From these figures we can calculate the new scale values as the frequency of the focused outcome relative to the column marginal sum of the respective assignate,

osc(av(i)) = tf(av(i)) / (ta(av(i)) + tf(av(i))).

For the assignate av1 this yields osc(av1) = 30/170 ≈ 0.176.

Table 2: Here, various encodings are contrasted.

| assignate | literal encoding | frequency | normalized log(freq) | optimal scaling | normalized OSC |
|---|---|---|---|---|---|
| av1 | 1 | 170 | 0.62 | 0.176 | 0.809 |
| av2 | 2 | 130 | 0.0 | 0.077 | 0.0 |
| av3 | 3 | 200 | 1.0 | 0.200 | 1.0 |

Using these values we could replace any occurrence of the original nominal (ordinal) values by the scaled values. Alternatively—or better, additionally—we could sum up all values for each observation (record), thereby collapsing the nominally scaled assignates into a single numerically scaled one.

Now we will describe the RRW. Imagine a set of observations {o(i)} where each observation is described by a set of assignates a(i). Also let us assume that some of these assignates are on the binary level, that is, the presence of this quality in the observation is encoded by "1", its absence by "0". This usually results in sparsely filled (regions of) the data table. Depending on the size of the "alphabet", even more than 99.9% of all values could simply be equal to 0. Such data cannot be grouped in a reasonable manner. Additionally, if there are further assignates in the table that are not binary encoded, the information in the binary variables would be almost completely neglected unless a rescaling like the RRW is applied.

The raw RRW relates the frequency of an assignate among the focused outcomes to its frequency among the remaining outcomes, using the row marginals,

rrw(av(i)) = ( tf(av(i)) / Σ tf ) / ( ta(av(i)) / Σ ta ).

For the assignate av1 this yields rrw(av1) = (30/80) / (140/420) ≈ 1.13.

As you can see, the RRW uses the marginals of the rows, while optimal scaling uses the marginals of the columns. Thus, the RRW uses slightly more information. Assuming a table made from binary assignates av(i), which could be summarized into Table 1 above, the formula yields the following RRW factors for the three binary scaled assignates:

Table 3: Relative Risk Weights (RRW) for the frequency data shown in table 1.

| assignate | raw RRW(i) | RRW(i) | normalized RRW |
|---|---|---|---|
| av1 | 1.13 | 0.33 | 0.82 |
| av2 | 0.44 | 0.16 | 0.00 |
| av3 | 1.31 | 0.36 | 1.00 |

The ranking of the av(i) based on the RRW is equal to that returned by the OSC; even the normalized score values are quite similar. Yet, while in the case of nominal variables assignates are usually not collapsed, this is always done in the case of binary variables.
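The two scalings can be reproduced directly from the figures of Table 1; the following sketch encodes the formulas as used above, with a simple min-max rescaling for the normalized columns (the intermediate RRW(i) column of Table 3 involves a further rescaling that is not reproduced here):

```python
import numpy as np

# frequencies from Table 1: rows = (ta, tf), columns = (av1, av2, av3)
ta = np.array([140.0, 120.0, 160.0])   # non-focused outcome
tf = np.array([30.0, 10.0, 40.0])      # focused outcome

# optimal scaling: focused frequency relative to the column marginal
osc = tf / (ta + tf)                          # -> [0.176, 0.077, 0.200]

# raw relative risk weight: focused share relative to the non-focused share (row marginals)
rrw_raw = (tf / tf.sum()) / (ta / ta.sum())   # -> [1.13, 0.44, 1.31]

def minmax(v):
    return (v - v.min()) / (v.max() - v.min())

print(np.round(osc, 3), np.round(minmax(osc), 3))
print(np.round(rrw_raw, 2), np.round(minmax(rrw_raw), 2))
```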

So, let us summarize these simple methods in the following table.

Table 4: Overview about some of the most important transformations for tabular data.

| Transformation | Mechanism | Effect, New Value | Properties, Conditions |
|---|---|---|---|
| log-transform | analytic function | | |
| analytic combination | explicit analytic function (a,b) → f(a,b) | enhanced signal-to-noise ratio of the relationship between predictors and predicted, 1 new variable | targeted modeling |
| empiric combinational recoding | simple clustering methods like KNN or K-means applied to a small number of assignates | distance from cluster centers and/or cluster center as new variables | targeted modeling |
| deciling | upon evaluation of properties of the distribution | 2 new variables | |
| collapsing | based on extreme-value quantiles | 1 new variable, better distinction for data in frequent bins | |
| optimal scaling | numerical encoding and/or rescaling using marginal sums | scaling of the assignate enhanced from nominal to numerical | targeted modeling |
| relative risk weight | dto. | collapsing sets of sparsely filled variables | targeted modeling |

Obviously, the transformation of data is not an analytical act, on either side. On one side it refers to structural and hence semantic assumptions, while on the other it introduces hypotheses about those assumptions. Numbers are never just values, much as sentences and words do not consist just of letters. After all, the difference between the two is probably smaller than one might initially presume. Later we will address this aspect from the opposite direction, when it comes to the translation of textual entities into numbers.

Time Series and Contexts

Time series data are the most valuable data. They allow the reconstruction of the flow of information in the observed system, either between variables intrinsic to the measurement setup (reflecting the "system") or between treatment and effects. In recent years, the so-called "causal FFT" has gained some popularity.

Yet, modeling time series data poses the same problematics as tabular data. We do not know apriori which variables to include, or how to transform variables in order to reflect particular parts of the information in the most suitable way. Simply pressing an FFT onto the data is nothing but naive. The FFT assumes a harmonic oscillation, or a combination thereof, which certainly is not appropriate in general. Even if we interpret a long series of FFT terms as an approximation to an unknown function, it is by no means clear whether the stationarity26 that is then assumed is indeed present in the data.

Instead, it is more appropriate to represent the aspects of a time series in multiple ways. Often, there are many time series available, one for each assignate. This brings the additional problem of careful evaluation of cross-correlations and auto-correlations, and all of this under the condition that it is not known apriori whether the evolution of the system is stationary.

Fortunately, the analysis of multiple time series, even from non-stationary processes, is quite simple if we follow the approach as outlined so far. Let us assume a set of assignates {a(i)} for which we have their time series measurements available, given at equidistant measurement points. A transformation is then constructed by a method m that is applied to a moving window of size md(k). All moving windows, of whatever size, are adjusted such that their endpoints meet at the measurement point at time t(m(k)). Let us call this point the prediction base point, T(p). The transformed values consist either of the residuals between the method's values and the measurement data, or of the parameters of the method fitted to the moving window. An example of the latter case is given by wavelet coefficients, which provide a quite suitable, multi-frequency perspective onto the development up to T(p). Of course, the time series data of different assignates could be related to each other by any arbitrary functional mapping.
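A minimal sketch of such a construction of contexts: for each prediction base point T(p), features are derived from moving windows of several sizes that all end at T(p); the chosen features—window means, slopes of a simple regression, and residual variances—are merely illustrative stand-ins for wavelet coefficients and the like:

```python
import numpy as np

def context_features(series, window_sizes=(5, 10, 20)):
    """For every possible prediction base point T(p), derive features from
    moving windows that all end at T(p). Returns (table, base_points)."""
    series = np.asarray(series, dtype=float)
    start = max(window_sizes)
    rows, base_points = [], []
    for tp in range(start, len(series)):
        feats = []
        for w in window_sizes:
            win = series[tp - w:tp]
            t = np.arange(w)
            slope, intercept = np.polyfit(t, win, deg=1)
            residuals = win - (slope * t + intercept)
            feats += [win.mean(), slope, residuals.var()]
        rows.append(feats)
        base_points.append(tp)
    return np.array(rows), np.array(base_points)

# toy non-stationary series
rng = np.random.default_rng(11)
x = np.cumsum(rng.normal(size=300)) + 0.02 * np.arange(300)
table, tps = context_features(x)

# a possible target: the value a few steps after T(p)
horizon = 5
target = x[tps[:-horizon] + horizon]
print(table.shape, target.shape)
```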

The target value for the model could be any set of future points relative to t(m(k)). The model may predict a single point, an average some time in the future, the volatility of the future development of the time series, or even the parameters of a particular mapping function relating several assignates. In the latter case the model would predict several criteria at once.

Such transformations yield a table that contains a lot more variables than were originally available. The ratio may grow up to 1:100 in complex cases like the global financial markets. Just to be clear: if you measure, say, the index values of 5 stock markets, some commodities like gold, copper, precious metals and "electronics metals", the money market, bonds and some fundamentals alike, that is approx. 30 basic input variables, even a superficial analysis would have to inspect 3000 variables… Yes, learning and gaining experience can take quite a bit! Learning and experience do not become cheaper just because we use machines to achieve them. Only the exploring gets easier nowadays, no longer requiring lifetimes. The reward consists of stable models about complex issues.

Each point in time is reflected by the original observational values and a lot of variables that express the most recent history relative to the point in time represented by the respective record. Any of the synthetic records may thus be interpreted as a set of hypotheses about the future development, where each hypothesis comes as a multidimensional description of the context up to T(p). It is then the task of the evolutionarily optimized variable selection based on the SOM to select the most appropriate hypothesis. Any subgroup contained in the SOM then represents comparable sets of relations between the past relative to T(p) and the respective future as it is operationalized into the target variable.

Typical transformations in such associative time series modeling are

  • – moving average and exponentially decaying moving average for de-seasoning or de-trending;
  • – various correlational methods: cross- and auto-correlation, including the result parameters of the Bartlett test;
  • – Wavelet-, FFT-, or Walsh- transforms of different order, residuals to the denoised reconstruction;
  • – fractal coefficients like the Lyapunov exponent or the Hausdorff dimension;
  • – ratios of simple regressions calculated over moving windows of different size;
  • – domain-specific markers (think of technical stock market analysis, or the ECG).

Once we have expressed a collection of time series as series of contexts preceding the prediction point T(p), the further modeling procedure does not differ from the modeling of ordinary tabular data, where the observations are independent from each other. From the perspective of our transformation tool, these time series transformations are nothing else than "methods"; they do not differ from other plugin methods with respect to the procedure calls in their programming interface.

"Unstructurable" "Data": Images and Texts

The last type of data for which we briefly would like to discuss the issue of transformation is “unstructurable” data. Images and texts are the main representatives for this class of entities. Why are these data “unstructurable”?

Let us answer this question from the perspective of textual analysis. Here, the reason is obvious; actually, there are several obvious reasons. Patrizia Violi [17], for instance, emphasizes that words create their own context, upon which they are then going to be interpreted. Douglas Hofstadter extended the problematics to thinking at large, arguing that for any instance of analogical thinking—and he claimed any thinking to be analogical—it is impossible to define criteria that would allow one to set up a table. Here on this site we have argued repeatedly that it is not possible to define any criteria apriori that would capture the "meaning" of a text.

Besides, understanding language, as well as understanding texts, can't be mapped to the problematics of predicting a time series. In language, there is no such thing as a prediction point T(p), and there is no positively definable "target" which could be predicted. The main reason for this is the special dynamics between context (background) and proposition (figure). It is a multi-level, multi-scale thing. It is ridiculous to apply n-grams to text and then hope to catch anything "meaningful". The same is true for any statistical measure.

Nevertheless, using language, that is, producing and understanding it, is based on processes that select and compose. In some way there must be some kind of modeling. We already proposed a structure, or rather an architecture, for this in a previous essay.

The basic trick consists of two moves: Firstly, texts are represented probabilistically as random contexts in an associative storage like the SOM. No variable selection takes place here, no modeling and no operationalization of a purpose is present. Secondly, this representation is then used as a basis for targeted modeling. Yet, the "content" of this representation does not consist of "language" data anymore. Strikingly different, it contains data about the relative location of language concepts and their sequence as they occur as random contexts in a text.
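A sketch of the first move, loosely in the spirit of WebSom-style random context vectors; the tokenization, the window size and the dimensionality of the random word codes are all simplifying assumptions:

```python
import re
import numpy as np

def random_contexts(text, dim=90, window=1, seed=0):
    """Represent a text as a set of 'random contexts': each word receives a fixed
    random code vector, and each position in the text is described by the
    concatenated codes of its left neighbours, the word itself and its right
    neighbours. No variable selection, no target, no semantics assumed."""
    rng = np.random.default_rng(seed)
    tokens = re.findall(r"[a-z]+", text.lower())
    codes = {}
    def code(word):
        if word not in codes:
            codes[word] = rng.normal(size=dim)
        return codes[word]
    contexts = []
    for i in range(window, len(tokens) - window):
        parts = [code(tokens[j]) for j in range(i - window, i + window + 1)]
        contexts.append(np.concatenate(parts))
    return np.array(contexts), tokens

text = "the city is not a tree the city is a form of life"
ctx, tokens = random_contexts(text)
print(ctx.shape)   # (number of positions, (2*window+1)*dim)
```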

The basic task in understanding language is to accomplish the progress from a probabilistic representation to a symbolic tabular representation. Note that any tabular representation of an observation is already on the symbolic level. In the case of language understanding precisely this is not possible: We can’t define meaning, and above all, not apriori. Meaning appears as a consequence of performance and execution of certain rules to a certain degree. Hence we can’t provide the symbols apriori that would be necessary to set up a table for modeling, assessing “similarity” etc.

Now, instead of a probabilistic, non-structured representation we could also say an arbitrary, unstable structure. From this we should derive a structured, (proto-)symbolic, and hence tabular and almost stable structure. The trick to accomplish this consists of using the modeling system itself as a measurement device, and thus also as a "root" for further reference in the models that then become possible. Kohonen and colleagues demonstrated this crucial step in their WebSom project. Unfortunately (for them), they then actualized several misunderstandings regarding modeling. For instance, they misinterpreted associative storage as a kind of model.

The nice thing with this architecture is that once the symbolic level has been achieved, any of the steps of our modeling approach can be applied without any change, including the automated transformation of “data” as described above.

Understanding the meaning of images follows the same scheme. The fact that there are no words renders the task more complicated and more simple at the same time. Note that so far there is no system that has learned to "see", to recognize and to understand images, although many titles claim that the proposed "system" can do so. All computer vision approaches are analytic by nature, hence they are all deeply inadequate. The community is running straight into the same method hell as the statisticians and the data miners did before, mistaking transformations for methods, conflating transformation and modeling, etc. We discussed these issues at length above. Any of the approaches might be intelligently designed, but all are victimized by the representationalist fallacy, and probably even by naive realism. Due to the fact that the analytic approach is first, second and third mainstream, the probabilistic and contextual bottom-up approach is missing so far. In the same way as a word is not equal to the grapheme, a line is not defined on the symbolic level in the brain. Here again we meet the problem of analogical thinking, even on the most primitive graphical level. When is a line still a line, when is a triangle still a triangle?

In order to start in the right way we first have to represent the physical properties of the image along different dimensions, such as textures, edges, or salient points, and all of those across different scales. Probably one can even detect salient objects by some analytic procedure. From any of the derived representations the random contexts are derived and arranged as vectors. A single image is then represented as a table that contains random contexts derived from the image as a physical entity. From here on, the further processing scheme is the same as for texts. Note that there is no such property as "line" in this basic mapping.

In the case of texts and images the basic transformation steps thus consist in creating the representation as random contexts. Fortunately, this is "only" a question of the suitable plugins for our transformation tool. In both cases, for texts as well as images, the resulting vectors can grow considerably. Several thousands of implied variables must be expected. Again, there is already a solution, known as random projection, which allows one to compress even very large vectors (say 20,000+) into one of, say, at most 150 variables, without losing much of the information that is needed to retain the distinctive potential. Random projection works by multiplying a vector of size N with a matrix of uniformly distributed random values of size N×M, which results in a vector of size M. Of course, M is chosen suitably (100+). The reason why this works is that with that many dimensions, random vectors are approximately orthogonal to each other! Of course, the resulting fields in such a vector do not "represent" anything that could be conceived as a reference to an "object". Internally, however, that is, from the perspective of a (population of) SOMs, it may well be used as an (almost) fixed "attribute". Yet, neither the missing direct reference nor the subjectivity poses a problem, as meaning is not a mental entity anyway. Q.E.D.
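The random projection step itself takes only a few lines; the dimensions used here follow the figures in the text, while the Gaussian random matrix is just one common choice among several:

```python
import numpy as np

def random_projection(vectors, target_dim=150, seed=0):
    """Compress high-dimensional vectors by multiplying them with a fixed random
    matrix; pairwise distances are approximately preserved because random
    directions in very high-dimensional spaces are nearly orthogonal."""
    rng = np.random.default_rng(seed)
    n, dim = vectors.shape
    R = rng.normal(size=(dim, target_dim)) / np.sqrt(target_dim)
    return vectors @ R

rng = np.random.default_rng(1)
contexts = rng.random((50, 20000))        # 50 observations, 20'000 implied variables
compressed = random_projection(contexts)  # -> shape (50, 150)

# the distance between two observations is roughly preserved
d_before = np.linalg.norm(contexts[0] - contexts[1])
d_after = np.linalg.norm(compressed[0] - compressed[1])
print(compressed.shape, round(float(d_before), 1), round(float(d_after), 1))
```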

Conclusion

Here in this essay we discussed several aspects related to the transformation of data as an epistemic activity. We emphasized that an appropriate attitude towards the transformation of data requires a shift in perspective and the choice of another vantage point. One of the more significant changes in attitude concerns, perhaps, dropping the positivist approach as one of the main pillars of traditional modeling. Remember that statistics is such a positivist approach. In our perspective, statistical methods are just transformations, nothing less, but above all also nothing more, characterized by a specific set of rather strong assumptions and conditions for their applicability.

We also provided some important practical examples of the transformation of data, whether tabular data derived from independent observations, time series data, or "unstructurable" "data" like texts and images. In line with the proposed approach we also described a prototypical architecture for a transformation tool that could be used universally. In particular, it allows a complete automation of the modeling task, as it could be used, for instance, in the field of so-called data mining. The possibility of automated modeling is, of course, a fundamental requirement for any machine-based episteme.

Notes

1. The only reason why we do not refer to cultures and philosophies outside Europe is that we do not know sufficient details about them. Yet, I am pretty sure that taking into account Chinese or Indian philosophy would only render the situation more severe.

2. It was Friedrich Schleiermacher who first observed that even the text becomes alien and at least partially autonomous to its author due to the necessity and inevitability of interpretation. Thereby he founded hermeneutics.

3. In German language these words all exhibit a multiple meaning.

4. In the last 10 years (roughly) it became clear that the gene-centered paradigms are not only not sufficient [2], they are even seriously defective. Evelyn Fox Keller draws a detailed trace of this weird paradigm [3].

5. Michel Foucault [4]

6. The "axiom of choice" is one of the founding axioms in mathematics. Its importance can hardly be overestimated. Basically, it assumes that "something is choosable". The notion of "something choosable" is then used to construct countability as a derived domain. This implies three consequences. First, it avoids assuming countability, that is, the effect of a preceding symbolification, as a basis for set theory. Secondly, it puts performance in first place. These two implications render the "axiom of choice" into a doubly-articulated rule, offering two docking sites, one for mathematics, and one for philosophy. In some way, it thus cannot count as an "axiom". Those implications are, for instance, fully compatible with Wittgenstein's philosophy. For these reasons, Zermelo's "axiom" may even serve as a shared point (of departure) for a theory of machine-based episteme. Finally, the third implication is that through the performance of the selection the relation—notably a somewhat empty relation—is conceived as a predecessor of countability and the symbolic level. Interestingly, this also relates to Quantum Darwinism and String Theory.

7. David Grahame Shane's theory on cities and urban entities [5] is probably the only theory in urbanism that is truly a relational theory. Additionally, his work is full of relational techniques and concepts, such as the "heterotopia" (a term coined by Foucault).

8. Bruno Latour developed the Actor-Network-Theory [6,7], while Clarke evolved “Grounded Theory” into the concept of “Situational Analysis” [8]. Latour, as well as Clarke, emphasize and focus the relation as a significant entity.

9. behavioral coating, and behavioral surfaces;

10. See Information & Causality about the relation between measurement, information and causality.

11. "Passivist" refers to the inadequate form of realism according to which things exist as-such, independently from interpretation. Of course, interpretation does not affect the material dimension of a thing. Yet, it changes its relations insofar as the relations of a thing, the Wittgensteinian "facts", are visible and effective only if we actively assign significance to them. The "passivist" stance conceives itself as a re-construction instead of a construction (cf. Searle [9]).

12. In [10] we developed an image theory in the context of the discussion about the mediality of facades of buildings.

13. the nonsense of "non-supervised clustering"

14. In his otherwise quite readable book [11], though it may serve only as an introduction.

15. This can be accomplished by using a data segment for which the implied risk equals 0 (positive predictive value = 1). We described this issue in the preceding chapter.

16. hint to particle physics…

17. See our previous essay about the complementarity of the concepts of causality and information.

18. For an introduction of renormalization (in physics) see [12], and a bit more technical [13]

19. see the Wiki entry about so-called gravitational lenses.

20. Catastrophe theory is a concept invented and developed by the French mathematician René Thom as a field of differential topology. cf. [14]

21. In their book, Witten & Frank [15] recognized the importance of transformation and included a dedicated chapter about it. They also explicitly mention the creation of synthetic variables. Yet they also explicitly retreat from it as a practical means, for reasons of computational complexity (here: the time needed to perform a calculation in relation to the amount of data). After all, their attitude towards transformation is somewhat that towards an unavoidable evil. They do not recognize its full potential. And as a cure for the selection problem they propose SVMs and their hyperplanes, which is definitely a poor recommendation.

22. Dorian Pyle [11]

23. see Benoit Mandelbrot [16].

24. Using almost meaningless labels, target-oriented modeling is often called supervised modeling, as opposed to “non-supervised modeling”, where no target variable is being used. Yet, such “non-supervised modeling” does not yield a model, since the pragmatics of the concept of “model” invariably requires a purpose.

25. About assignates: often called properties, or features; see the chapter about modeling.

26. Stationarity is a concept in empirical system analysis or description which denotes the expectation that the internal setup of the observed process will not change over time within the observed period. If a process is rated as “stationary” upon a dedicated test, one could select one particular method or model to reflect the data. Of course, we again meet the chicken-and-egg problem: we can decide about stationarity only by means of a completed model, that is, after the analysis. As a consequence, we should not use linear methods, or methods that depend on independence, for checking stationarity before applying the “actual” method; such a procedure cannot count as a methodology at all. The modeling approach should rather be stable against non-stationarity. Yet, the problem of the reliability of the available data sample remains, of course. As a means to “robustify” the resulting model against the unknown future, one can apply surrogating. Ultimately, however, the only cure is a circular, or recurrent, methodology that incorporates learning and adaptation as a structure, not as a result.

References
  • [1] Robert Rosen, Life Itself: A Comprehensive Inquiry into the Nature, Origin, and Fabrication of Life. Columbia University Press, New York 1991.
  • [2] Nature Insight: Epigenetics, Supplement Vol. 447 (2007), No. 7143 pp 396-440.
  • [3] Evelyn Fox Keller, The Century of the Gene. Harvard University Press, Boston 2002. see also: E. Fox Keller, “Is There an Organism in This Text?”, in P. R. Sloan (ed.), Controlling Our Destinies. Historical, Philosophical, Ethical, and Theological Perspectives on the Human Genome Project, Notre Dame (Indiana), University of Notre Dame Press, 2000, pp. 288-289.
  • [4] Michel Foucault, The Archaeology of Knowledge. 1969.
  • [5] David Grahame Shane. Recombinant Urbanism: Conceptual Modeling in Architecture, Urban Design and City Theory
  • [6] Bruno Latour. Reassembling The Social. Oxford University Press, Oxford 2005.
  • [7] Bruno Latour (1996). On Actor-network Theory. A few Clarifications. in: Soziale Welt 47, Heft 4, p.369-382.
  • [8] Adele E. Clarke, Situational Analysis: Grounded Theory after the Postmodern Turn. Sage, Thousand Oaks, CA 2005.
  • [9] John R. Searle, The Construction of Social Reality. Free Press, New York 1995.
  • [10] Klaus Wassermann & Vera Bühlmann, Streaming Spaces – A short expedition into the space of media-active façades. in: Christoph Kronhagel (ed.), Mediatecture, Springer, Wien 2010. pp.334-345. available here
  • [11] Dorian Pyle, Data Preparation for Data Mining. Morgan Kaufmann, San Francisco 1999.
  • [12] John Baez (2009). Renormalization Made Easy. Webpage
  • [13] Bertrand Delamotte (2004). A hint of renormalization. Am.J.Phys. 72: 170-184. available online.
  • [14] Tim Poston & Ian Stewart, Catastrophe Theory and Its Applications. Dover Publ. 1997.
  • [15] Ian H. Witten & Eibe Frank, Data Mining. Practical Machine Learning Tools and Techniques (2nd ed.). Elsevier, Oxford 2005.
  • [16] Benoit Mandelbrot & Richard L. Hudson, The (Mis)behavior of Markets. Basic Books, New York 2004.
  • [17] Patrizia Violi (2000). Prototypicality, typicality, and context. in: Liliana Albertazzi (ed.), Meaning and Cognition – A multidisciplinary approach. Benjamins Publ., Amsterdam 2000. p.103-122.

۞

Prolegomena to a Morphology of Experience

May 2, 2012 § Leave a comment

Experience is a fundamental experience.

The very fact of this sentence demonstrates that experience differs from perception, much like phenomena differ from objects. It also demonstrates that there can’t be an analytic treatment or even solution of the question of experience. Experience is not only related to sensual impressions, but also to affects, activity, attention1 and associations. Above all, experience is deeply linked to the impossibility of knowing anything for sure or, likewise, apriori. This insight is etymologically woven into the word itself: the Greek “peira” means “trial, attempt, experience”, and it also informs the roots of “experiment” and “peril”.

In this essay we will focus on some technical aspects that underlie the capability to experience. Before we go in medias res, I have to make clear the rationale for doing so, since, quite obviously, experience cannot be reduced to those technical aspects, to which for instance modeling belongs. Experience is more than the techné of sorting things out [1] and even more than the techné of the genesis of discernability, but at the same time this techné plays a particular, if not foundational role in and for the epistemic process, its choreostemic embedding and their social practices.

Epistemic Modeling

As usual, we take the primacy of interpretation as one of the transcendental conditions, that is, a condition we can‘t go beyond, even on the „purely“ material level. As a suitable operationalization of this principle, still a quite abstract one and hence calling for situative instantiation, we choose the abstract model. In epistemic practice, modeling does not, indeed never could, refer to data that are supposed to „reflect“ an external reality. If we perform modeling as a pure technique, we are just modeling; yet creating a model for whatsoever purpose, so to speak „modeling as such“, or purposed modeling, is not sufficient to establish an epistemic act, which would also include the choice of the purpose and the choice of the risk attitude. Such a reduction is typical for functionalism, or for positions that claim a principled computability of epistemic autonomy, as for instance the computational theory of mind does.

Quite in contrast, purposed modeling in epistemic individuals already presupposes the transition from probabilistic impressions to propositional, or say, at least symbolic representation. Without performing this transition from potential signals, that is, mediated „raw“ physical fluctuations in the density of probabilities, to the symbolic it is impossible to create a structure, be it for instance a feature vector as a set of variably assigned properties, „assignates“, as we called them previously. Such a minimal structure, however, is mandatory for purposed modeling. Any (re)presentation of observations to a modeling method is thus already subsequent to prior interpretational steps.

Our abstract model that serves as an operationalization of the transcendental principle of the primacy of interpretation thus must also provide, or comprise, the transition from differences into proto-symbols. Proto-symbols are not just intensions or classes; they are, so to speak, non-empiric classes that have been derived from empiric ones by means of idealization. Proto-symbols are developed into symbols by means of the combination of naming and an associated practice, i.e. a repeatable or reproducible performance, or, still in other words, by rule-following. Only on the level of symbols may we then establish a logic, or claim absolute identity. Here we also meet the reason for the fact that in any real-world context a “pure” logic is not possible, as there are always semantic parts serving as a foundation of its application. Speaking about “truth-values” or “truth-functions” is meaningless, to say the least. Clearly, identity as a logical form is a secondary quality and thus quite irrelevant for the booting of the capability of experience. Such extended modeling is, of course, not just a single instance; it is itself a multi-leveled thing. It even starts with those properties of the material arrangement known as the body that allow for an informational perspective. The most prominent candidate principle for such a structure is the probabilistic, associative network.

Epistemic modeling thus consists of at least two abstract layers: first, the associative storage of random contexts (see also the chapter “Context” for their generalization), where no purpose is imposed onto the materially pre-processed signals, and second, the purposed modeling. I am deeply convinced that such a structure is the only way to evade the fallacy of representationalism2. A working actualization of this abstract bi-layer structure may comprise many layers and modules.
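To make this bi-layer structure a bit more tangible, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the class names, the similarity-based recall and the externally supplied `target_fn` that stands in for the chosen purpose; it is not a description of any actual implementation.

```python
import numpy as np

class AssociativeLayer:
    """First layer: purpose-free storage of random contexts. Observations are
    kept only as similarity-addressable profiles; no target is involved."""
    def __init__(self):
        self.contexts = []

    def absorb(self, observation):
        self.contexts.append(np.asarray(observation, dtype=float))

    def recall(self, probe, k=3):
        # return the k stored contexts that are most similar to the probe
        probe = np.asarray(probe, dtype=float)
        distances = [np.linalg.norm(c - probe) for c in self.contexts]
        return [self.contexts[i] for i in np.argsort(distances)[:k]]

class PurposedModel:
    """Second layer: only here a purpose (target) and, implicitly, a risk
    attitude enter the game."""
    def __init__(self, associative_layer, target_fn):
        self.assoc = associative_layer
        self.target_fn = target_fn          # encodes the chosen purpose

    def classify(self, probe):
        neighbours = self.assoc.recall(probe)
        votes = [self.target_fn(n) for n in neighbours]
        return max(set(votes), key=votes.count)
```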

Yet, once one accepts the primacy of interpretation, and there is little to say against it, if anything at all, then we are led directly to epistemic modeling as a mandatory constituent of any interpretive relationship to the world, for primitive operations as well as for the rather complex mental life we experience as humans, with regard to our relationships to the environment as well as with regard to our inner reality. Wittgenstein emphasized in his critical solipsism that the conception of reality as inner reality is the only reasonable one [3]. Epistemic modeling is the only way to keep meaningful contact with the external surrounds.

The Bridge

In its technical parts experience is based on an actualization of epistemic modeling. Later we will investigate the role and the usage of these technical parts in detail. Yet, the gap between modeling, even if conceived as abstract, epistemic modeling, and experience is so large that we first have to shed some light on the bridge between these concepts. There are issues with experience beyond the merely technical issues of modeling, and they are no less relevant for the technical issues, too.

Experience comprises both more active and more passive aspects, both with regard to performance and to structure. These dichotomies must not be taken as ideally separated categories, of course. Besides, the basic distinction between active and passive parts is not a new one either. Kant distinguished receptivity and spontaneity as two complementary faculties that combine in order to bring about what we call cognition. Leibniz, in contrast, emphasized the necessity of activity even in basic perception; nowadays, his view has been greatly confirmed by the research on sensing in organic (animals) as well as in in-organic systems (robots). Obviously, the relation between activity and passivity is not a simple one, as soon as we are going to leave the bright spheres of language.3

In the structural perspective, experience unfolds in a given space that we could call the space of experiencibility4. That space is spanned, shaped and structured by open and dynamic collections of any kind of theory, model, concept or symbol, as well as by the mediality that is “embedding” them. Yet, experience also shapes this space itself. The situation is a bit reminiscent of relativistic space in physics, or of social space in humans, where the embedding of one space into another affects both participants, the embedded as well as the embedding space. These aspects we should keep in mind for our investigation of the mechanisms that contribute to experience and to the experience of experience. As you can see, we again refute any kind of ontological stance, even to its smallest degrees.5

Now, when going to ask about experience and its genesis, there are two characteristics of experience that force us to avoid the direct path. First, there is the deep linkage of experience to language. We must get rid of language for our investigation in order to avoid the experience of finding just language behind the language, or behind what we call upfront “experience”; yet, we should not forget about language either. Second, there is the self-referentiality of the concept of experience, which actually renders it into a strongly singular term. Once there are even only tiny traces of the capability for experience, the whole game changes, burying the initial roots and mechanisms that are necessary for the booting of the capability.

Thus, our first move consists in a reduction and linearization, which we have to catch up with later again, of course. We will achieve that by setting everything into motion, so-to-speak. The linearized question thus is heading towards the underlying mechanisms6:

How do we come to believe that there are facts in the world? 7

What are—now viewed from the outside of language8—the abstract conditions and the practiced moves necessary and sufficient for the actualization of such statements?

Usually, the answer will refer to some kind of modeling. Modeling provides the possibility for the transition from the extensional epistemic level of particulars to the intensional epistemic level of classes, functions or categories. Yet, modeling does not provide sufficient reason for experience. Sure, modeling is necessary for it, but it is more closely related to perception, though not equivalent to it either. Experience as a kind of cognition thus can’t be conceived as a kind of “high-level perception”, quite contrary to the suggestion of Douglas Hofstadter [4]. Instead, we may conceive experience, in a first step, as the result of and the activity around the handling of the conditions of modeling.

Even in his earliest writings, Wittgenstein prominently emphasized that it is meaningless to conceive of the world as consisting from “objects”. The Tractatus starts with the proposition:

The world is everything that is the case.

Cases, in the Tractatus, are states of affairs that could be made explicit into a particular (logical) form by means of language. From this perspective one could derive the radical conclusion that without language there is no experience at all. Although we won’t agree with such a thesis, language is a major factor contributing to some often unrecognized puzzles regarding experience. Let us very briefly return to the issue of language.

Language establishes its own space of experiencibility, basically through its unlimited expressibility that induces hermeneutic relationships. It is probably mainly due to this particular experiential sphere that language blurs or even blocks a clear sight onto the basic aspects of experience. Language can make us believe that there are phenomena as some kind of original stuff, existing “independently” out there, that is, outside of human cognition.9 Yet, there is no such thing as a phenomenon or even an object that would “be” before experience, and for us humans not even before or outside of language. It is not even reasonable to speak about phenomena or objects as if they would exist before experience. De facto, it is almost non-sensical to do so.

Both objects as specified entities and phenomena at large are consequences of interpretation, in turn deeply shaped by cultural imprinting, and thus heavily depending on language. Refuting that consequence would mean to refute the primacy of interpretation, which would fall into one of the categories of either naive realism or mysticism. Phenomenology as an ontological philosophical discipline is nothing but a misunderstanding (as is ontology altogether); since phenomenology without ontological parts must turn into some kind of Wittgensteinian philosophy of language, it simply vanishes. Indeed, when already teaching in Cambridge, Wittgenstein once told a friend to report his position to the visiting Schlick, whom he refused to meet on this occasion, as “You could say of my work that it is phenomenology.” [5] Yet, what Wittgenstein called “phenomenology” is completely situated inside language and its practicing, and although there might be a weak Kantian echo in his work, he never supported Husserl’s position of synthetic universals apriori. There is even some likelihood that Wittgenstein, strongly feeling to be constantly misunderstood by the members of the Vienna Circle, put this forward in order to annoy Schlick (a bit), at least to pay him back in kind.

Quite in contrast, in a Wittgensteinian perspective facts are a sort of collectively compressed beliefs about relations. If everybody believes in a certain model, of whatever reference and of almost arbitrary expectability, then there is a fact. This does not mean, however, that we get drowned by relativism. There are still the constraints implied by the (unmeasured and unmeasurable) utility of anticipation, both in its individual and its collective flavor. On the other hand, yes, this indeed means that the (social) future is not determined.

More accurately, there is at least one fact, since the primacy of interpretation generates at least the collectivity as a further fact. Since facts are taking place in language, they do not just “consist” of content (please excuse such awful wording), there is also a pragmatics, and hence there are also at least two different grammars, etc.etc.

How do we, then, individually construct concepts that we share as facts? Even if we would need the mediation by a collective, a large deal of the associative work takes place in our minds. Facts are identifiable, thus distinguishable and enumerable. Facts are almost digitized entities, they are constructed from percepts through a process of intensionalization or even idealization and they sit on the verge of the realm of symbols.

Facts are facts because they are considered as being valid, be it among a collective of people, across some period of time, or for a range of material conditions. This way they turn into a kind of apriori from the perspective of the individual, and there is only that perspective. Here we find the locus situs of several related misunderstandings, such as direct realism, Husserlian phenomenology, positivism, the thing as such, and so on. The fact is even synthetic, either by means of “individual”10 mental processes or by the working of a “collective reasoning”. But, of course, it is by no means universal, as Kant concluded on the basis of Newtonian science, or even as Schlick did in 1930 [6]. There is neither a universal real fact, nor a particular one. It does not make sense to conceive of the world as consisting of independent objects.

As a consequence, when speaking about facts we usually studiously avoid the fact of risk. Participants in the “fact game” implicitly agree on the abandonment of negotiating affairs of risk. Despite the fact that empiric knowledge never can be considered as being “safe” or “secured”, during the fact game we always behave as if it could. Doing so is the more or less hidden work of language, which removes the risk (associated with predictive modeling) and replaces it by metaphorical expressibility. Interestingly, here we also meet the source field of logic. It is obvious (see Waves & Words) that language is neither an extension of logics, nor is it reasonable to consider it as a vehicle for logic, i.e. for predicates. Quite to the contrary, the underlying hypothesis is that (practicing) language and (weaving) metaphors are the same thing.11 Such a language becomes a living language that (as Gier writes [5])

“[…] grows up as a natural extension of primitive behavior, and we can count on it most of the time, not for the univocal meanings that philosophers demand, but for ordinary certainty and communication.”

One might just modify Gier’s statement a bit by specifying „philosophers“ as idealistic, materialistic or analytic philosophers.

In “On Certainty” (OC, §359), Wittgenstein speaks of language as expressing primitive behavior and contends that ordinary certainty is “something animal”. This now we may take as a bridge that provides the possibility to extend our asking about concepts and facts towards the investigation of the role of models.

Related to this there is a pragmatist aspect that is worth mentioning. Experience is a historicizing concept, much like knowledge. Both concepts are meaningful only in hindsight. As soon as we consider their application, we see that both of them refer only to one half of the story about the epistemic aspects of „life“. The other half of the epistemic story, directly implied by the inevitable need to anticipate, is predictive or, equivalently, diagnostic modeling. Abstract modeling in turn implies theory, interpretation and orthoregulated rule-following.

Epistemology thus should not be limited to „knowledge“, the knowable and its conditions. Epistemology has explicitly to include the investigation of the conditions of what can be anticipated.

In a still different way we thus may re-pose the question about experience as the transition from epistemic abstract modeling to the conditions of that modeling. This would include the instantiation of practicable models as well as the conditions for that instantiation, and also the conditions of the application of models. In technical terms this transition is represented by a problematic field: the model selection problem, or in more pragmatic terms, the model (selection) risk.

These two issues, the prediction task and the condition of modeling now form the second toehold of our bridge between the general concept of experience and some technical aspects of the use of models. There is another bridge necessary to establish the possibility of experience, and this one connects the concept of experience with languagability.

The following list provides an overview about the following chapters:

These topics are closely related to each other, indeed so closely that other sequences would be justifiable too. Their interdependencies also demand a bit of patience from you, the reader, as the picture will be complete only when we arrive at the results of modeling.

A last remark may be allowed before we start to delve into these topics. It should be clear by now that any kind of phenomenology is deeply incompatible with the view developed here. There are several related stances, e.g. the various shades of ontology, including the objectivist conception of substance. They are all rendered as irrelevant and inappropriate for any theory about episteme, whether in its machine-based form or regarding human culture, whether as practice or as reflecting exercise.

The Modeling Statement

As the very first step we have to clearly state the goal of modeling. From the outside that goal is pretty clear: given a set of observations and the respective outcomes, or targets, create a mapping function such that the observed data allow for a reconstruction of the outcome in an optimized manner. Finding such a function can be considered a simple form of learning if the function is „invented“; in most cases it is not learning but just the estimation of pre-defined parameters.12 In a more general manner we also could say that any learning algorithm is a map L from data sets to a ranked list of hypothesis functions, as sketched below. Note that accuracy is only one of the possible aspects of that optimization. Let us call this for convenience the „outer goal“ of modeling. Would such a mapping be perfect within reasonable boundaries, we would have found automatically a possible transition from probabilistic presentation to propositional representation. We could consider the induction of a structural description from observations as completed. So far the secret dream of Hans Reichenbach, Carl Gustav Hempel, Wesley Salmon and many of their colleagues.
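Rendered as a piece of Python, the „outer goal“ is nothing but a map from a data set to a ranked list of hypothesis functions. The ranking criterion (here simply accuracy) and the externally supplied candidate hypotheses are assumptions of this sketch.

```python
from typing import Callable, List, Sequence, Tuple
import numpy as np

Hypothesis = Callable[[np.ndarray], np.ndarray]   # maps observations to outcomes

def learn(data: np.ndarray, targets: np.ndarray,
          candidates: Sequence[Hypothesis]) -> List[Tuple[Hypothesis, float]]:
    """The map L: data set -> ranked list of hypothesis functions.
    Accuracy is used for ranking, but any aspect of the optimization could."""
    scored = []
    for h in candidates:
        accuracy = float(np.mean(h(data) == targets))
        scored.append((h, accuracy))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```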

The said mapping function will never be perfect. The reasons for this comprise the complexity of the subject, noise in the measured data, unsuitable observables or any combinations of these. This induces a wealth of necessary steps and, of course, a lot of work. In other words, a considerable amount of apriori and heuristic choices have to be taken. Since a reliable, say analytic mapping can’t be found, every single step in the value chain towards the model at once becomes questionable and has to be checked for its suitability and reliability. It is also clear that the model does not comprise just a formula. In real-world situations a differential modeling should be performed, much like in medicine a diagnosis is considered to be complete only if a differential diagnosis is included. This comprises the investigation of the influence of the method’s parameterization onto the results. Let us call the whole bunch of respective goals the „inner goals“ of modeling.

So, being faced with the challenge of such an empirical mess, what does the statement about the goals of the „inner modeling“ look like? We could for instance demand to remove the effects of the shortfalls mentioned above, which cause the imperfect mapping: complexity of the subject, noise in the measured data, or unsuitable observables.

To make this more concrete we could say, that the inner goals of modeling consist in a two-fold (and thus synchronous!) segmentation of the data, resulting in the selection of the proper variables and in the selection of the proper records, where this segmentation is performed under conditions of a preceding non-linear transformation of the embedding reference system. Ideally, the model identifies the data for which it is applicable. Only for those data then a classification is provided. It is pretty clear that this statement is an ambitious one. Yet, we regard it as crucial for any attempt to step across our epistemic bridge that brings us from particular data to the quality of experience. This transition includes something that is probably better known by the label „induction“. Thus, we finally arrive at a short statement about the inner goals of modeling:

How to conclude and what to conclude from measured data?

Obviously, if our data are noisy and if our data include irrelevant values, any further conclusion will be unreliable. Yet, for any suitable segmentation of the data we need a model first. From this it directly follows that a suitable procedure for modeling can’t consist of just a single algorithm, or a „one-shot procedure“. Any single-step approach suffers from lots of hidden assumptions that influence the results and their properties in unforeseeable ways. Modeling that could be regarded as more than just an estimation of parameters by running an algorithm is necessarily a circular and, dependent on the amount of variables, possibly open-ended process, as the sketch below illustrates.
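The following sketch shows only the loop structure of such a circular procedure; the two helpers `build_model` and `evaluate` are assumed to exist and are not specified here.

```python
import numpy as np

def circular_modeling(X, y, build_model, evaluate, max_rounds=10):
    """X: observations (numpy array), y: targets (numpy array).
    `evaluate` is assumed to return (quality, kept_variables, kept_records),
    i.e. the provisional model itself proposes the segmentation for the
    next round."""
    variables = list(range(X.shape[1]))
    records = list(range(X.shape[0]))
    model, history = None, []
    for _ in range(max_rounds):
        model = build_model(X[np.ix_(records, variables)], y[records])
        quality, keep_vars, keep_recs = evaluate(model, X, y)
        history.append((list(variables), quality))
        if list(keep_vars) == variables and list(keep_recs) == records:
            break                              # a fixed point has been reached
        variables, records = list(keep_vars), list(keep_recs)
    return model, history
```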

Predictability and Predictivity

Let us assume a set of observations S obtained from an empirical process P. Then this process P should be called “predictable” if the results of the mapping function f(m), which serves as an instance of a hypothesis h from the space of hypotheses H, coincide with the outcomes of the process P in such a way that f(m) forms an expectation with a deviation d<ε for all f(m). In this case we may say that f(m) predicts P. This deviation is also called “empirical risk”, and the purpose of modeling is often regarded as minimizing the empirical risk (ERM).
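In Python, and with a 0/1 loss as a stand-in for any loss function, the two notions read roughly as follows; the function names are of course only illustrative.

```python
import numpy as np

def empirical_risk(f, observations, outcomes, loss=lambda p, y: float(p != y)):
    """Mean loss of the hypothesis f on the sample S; minimizing this
    quantity over the space of hypotheses is what ERM refers to."""
    return float(np.mean([loss(f(x), y) for x, y in zip(observations, outcomes)]))

def predicts(f, observations, outcomes, eps):
    """P counts as 'predictable' by f in the above sense if every deviation
    between the expectation f(x) and the outcome y stays below eps."""
    return all(abs(f(x) - y) < eps for x, y in zip(observations, outcomes))
```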

There are then two important questions. Firstly, can we trust f(m), since f(m) has been built on a limited number of observations? Secondly, how can we make f(m) more trustworthy, given the limitation regarding the data? Usually, these questions are handled under the label of validation. Yet, validation procedures are not the only possible means to get an answer here. It would be a misunderstanding to think that it is the building or construction of a model that is problematic.

The first question can be answered only by considering different models. For obtaining a set of different models we could apply different methods. That would be o.k. if prediction were our sole interest. Yet, we also strive for structural insights, and from that perspective we should not, of course, use different methods to get different models. The second possibility for addressing the first question is to use different sub-samples, which turns simple validation into a cross-validation. Cross-validation provides an expectation for the error (or the risk). Yet, in order to compare across methods one actually should describe the expected decrease in “predictive power”13 for different sample sizes (an independent cross-validation per sample size). The third possibility for answering question (1) is related to the former and consists in adding noised, surrogated (or simulated) data. This prevents the learning mechanism from responding to empirically consistent, but nevertheless irrelevant noisy fluctuations in the raw data set. The fourth possibility is to look for models of equivalent predictive power which are, however, based on a different set of predicting variables. This possibility is not accessible for most statistical approaches such as Principal Component Analysis (PCA). Whatever method is used to create different models, models may be combined into a “bag” of models (called “bagging”), or, following an even more radical approach, into an ensemble of small and simple models. This is employed for instance in the so-called Random Forest method.
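The second possibility, an independent cross-validation per sample size, can be sketched with a few lines of Python; `fit` and `predict` stand for whatever associative method is used and are assumptions of this sketch.

```python
import numpy as np

def predictive_power_decay(X, y, fit, predict, sizes=(0.3, 0.5, 0.7, 0.9),
                           repeats=20, seed=0):
    """Expected accuracy (and its spread) per training-sample size, estimated
    by repeated random sub-sampling: one independent validation per size."""
    rng = np.random.default_rng(seed)
    n, decay = len(y), {}
    for frac in sizes:
        scores = []
        for _ in range(repeats):
            idx = rng.permutation(n)
            cut = int(frac * n)
            train, test = idx[:cut], idx[cut:]
            model = fit(X[train], y[train])
            scores.append(np.mean(predict(model, X[test]) == y[test]))
        decay[frac] = (float(np.mean(scores)), float(np.std(scores)))
    return decay
```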

Commonly, if a model passes cross-validation successfully, it is considered to be able to “generalize”. In contrast to the common practice, Poggio et al. [7] demonstrated that standard cross-validation has to be extended in order to provide a characterization of the capability of a model to generalize. They propose to augment

CVloo stability with stability of the expected error and stability of the empirical error to define a new notion of stability, CVEEEloo stability.

This makes clear that the approach of Poggio et al. addresses the learning machinery, not any longer just the space of hypotheses. Yet, they do not take the free parameters of the method into account. We conclude that their proposed approach still remains an uncritical one; thus I would consider such a model as not completely trustworthy. Of course, Poggio et al. are definitely pointing in the right direction. We recognize a move away from naive realism and positivism, towards a critical methodology of the conditional. Maybe philosophy and the natural sciences find common ground again by riding the information tiger.

Checking the stability of the learning procedure leads to a methodology that we called “data experiments” elsewhere. The data experiments do NOT explore the space of hypotheses, at least not directly. Instead they create a map for all possible models. In other words, instead of just asking about predictability we now ask about the differential predictivity in the space of models.

From the perspective of a learning theory, the significance of Poggio’s move can’t be overestimated. Statistical learning theory (SLT) [8] explicitly assumes that a direct access to the world is possible (via an identity function, the perfectness of the model). Consequently, SLT focuses (only) on the reduction of the empirical risk. Any learning mechanism following the SLT is hence uncritical about its own limitation. SLT is interested in the predictability of the system-as-such, thereby, not really surprisingly, committing the mistake of pre-19th century idealism.

The Independence Assumption

The independence assumption [I.A.], or linearity assumption, acts mainly on three different targets. The first of them is the relationship between observer and observed, while its second target is the relationship between observables. The third target finally regards the relation between individual observations. This last aspect of the I.A. is the least problematic one. We will not discuss this any further.

Yet, the first and the second one are the problematic ones. The I.A. is deeply buried into the framework of statistics and from there it made its way into the field of explorative data analysis. There it can be frequently met for instance in the geometrical operationalization of similarity, the conceptualization of observables as Cartesian dimensions or independent coefficients in systems of linear equations, or as statistical kernels in algorithms like the Support Vector Machine.

Of course, the I.A. is just one possible stance towards the treatment of observables. Yet, taking it as an assumption, we will not include any parameter into the model that reflects the dependency between observables. Hence, we will never detect the most suitable hypothesis about the dependency between observables. Instead of assuming the independence of variables throughout an analysis it would be methodologically much more sound to address the degree of dependency as a target. Linearity should not be an assumption, it should be a result of an analysis.

The linearity or independence assumption carries another assumption with it under its hood: the assumption of the homogeneity of variables. Variables, or assignates, are conceived as black-boxes, with unknown influence onto the predictive power of the model. Yet, usually they exert very different effects on the predictive power of a model.

Basically, it is very simple. The predictive power of a model depends on the positive predictive value AND the negative predictive value, of course; we may also use the closely related terms sensitivity and specificity. Accordingly, some variables contribute more to the positive predictive value, others help to increase the negative predictive value. This easily becomes visible if we perform a detailed type-I/II error analysis. Thus, there is NO way to avoid testing those combinations explicitly, even if we assume the initial independence of variables.
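All four measures can be read directly off a confusion matrix; the following small helper (used here only for illustration) reproduces, for instance, the values of Table 1a further below.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Positive/negative predictive value, sensitivity and specificity
    from the four cells of a confusion matrix."""
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return ppv, npv, sensitivity, specificity

# the values of Table 1a below:
# confusion_metrics(100, 3, 28, 1120)  ->  approx. (0.971, 0.976, 0.781, 0.997)
```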

As we already mentioned above, the I.A. is just one possible stance towards the treatment of observables. Yet, its status as a methodological sine qua non that additionally is never reflected upon renders it into a metaphysical assumption. It is in fact an irrational assumption, which induces serious costs in terms of the structural richness of the results. Taken together, the independence assumption represents one of the most harmful habits in data analysis.

The Model Selection Problem

In the section “Predictability and Predictivity” above we already emphasized the importance of the switch from the space of hypotheses to the space of models. The model space unfolds as a condition of the available assignates, the size of the data set and the free parameters of the associative (“modeling”) method. The model space supports a fundamental change of attitude towards the model. Based on the denial of the apriori assumption of independence of observables, we identified the idea of a singular best model as an ill-posed phantasm. We thus move onwards from the concept of a model as a mapping function towards ensembles of structurally heterogeneous models that together, as a distinguished population, form a habitat, a manifold in the sphere of the model space. With such a structure we no longer need to arrive at a single model.

Methods, Models, Variables

The model selection problem addresses two sets of parameters that are actually quite different from each other. Model selection should not be reduced to the treatment of the first set, of course, as it happens at least implicitly for instance in [9]. The first set refers to the variables as known from the data, sometimes also called the „predictors“. The selection of the suitable variables is the first half of the model selection problem. The second set comprises all free parameters of the method. From the methodological point of view, this second set is much more interesting than the first one. The method’s parameters are apriori conditions to the performance of the method, which additionally usually remain invisible in the results, in contrast to the selection of variables.

For associative methods like the SOM or other clustering methods, the effect of de-/selecting variables can be easily described. Just take all the objects in front of you, for instance on the table, or in your room. Now select an arbitrary purpose and assign this purpose as a degree of support to those objects. For now, we have constructed the target. Then we go “into” the objects, that is, we describe them by a range of attributes that are present in most of the objects. Dependent on the selection of a subset from these attributes we will arrive at very different groups. The groups now represent the target more or less well; that’s the quality of the model. Obviously, this quality differs across the various selections of attributes. It is also clear that it does not help to just use all attributes, because some of the attributes just destroy the intended order; they add noise to the model and decrease its quality.

As George observes [10], since its first formulation in the 1960s a considerable, if not large, number of approaches for dealing with the variable selection problem has been proposed. Although George himself seems to distinguish the two sets of parameters, throughout the discussion of the different approaches he always refers just to the first set, the variables as included in the data. This is not a failure of the said author, but a problem of the statistical approach. Usually, the parameters of statistical procedures are not accessible; like any analytic procedure, they work as they work. In contrast to Self-organizing Maps, and even to Artificial Neural Networks (ANN) or genetic procedures, analytic procedures can’t be modified in order to achieve a critical usage. In some way, with their mono-bloc design they perfectly fit into the representationalist fallacy.

Thus, using statistical (or other analytic) procedures, the model selection problem consists of the variable selection problem and the method selection problem. The consequences are catastrophic: if statistical methods are used in the context of modeling, the whole statistical framework turns into a black box, because the selection of a particular method can’t be justified in any respect. In contrast to that quite unfavorable situation, methods like the Self-Organizing Map provide access to any of their parameters. Data experiments are only possible with methods like the SOM or ANN. Not the SOM or the ANN are „black boxes“, but the statistical framework must be regarded as such. Precisely this is also the reason for the still ongoing quarrels about the foundations of the statistical framework. There are two parties, the frequentists and the Bayesians. Yet, both are struck by the reference class problem [11]. From our perspective, the current dogma of empirical work in science needs to be changed.

The conclusion is that statistical methods should not be used at all to describe real-world data, i.e. for the modeling of real-world processes. They are suitable only within a fully controlled setting, that is, within a data experiment. The first step in any kind of empirical analysis thus must consist of a predictive modeling that includes the model selection task.14

The Perils of Universalism

Many people dealing with the model selection task are misled by a further irrational phantasm, caused by a mixture of idealism and positivism: the phantasm of the single best model for a given purpose.

Philosophers of science long ago recognized, starting with Hume and ultimately expressed by Quine, that empirical observations are underdetermined. The actual challenge posed by modeling is given by the fact of empirical underdetermination. Goodman felt obliged to construct a paradox from it. Yet, there is no paradox, there is only the phantasm  of the single best model. This phantasm is a relic from the Newtonian period of science, where everybody thought the world is made by God as a miraculous machine, everything had to be well-defined, and persisting contradictions had to be rated as evil.

Secondarily, this moults into the affair of (semantic) indetermination. Plainly spoken, there are never enough data. Empirical underdetermination results in the actuality of strongly diverging models, which in turn gives rise to conflicting experiences. For a given set of data, in most cases it is possible to build very different models (ceteris paribus, choosing different sets of variables) that yield the same utility, or say predictive power, as far as this predictive power can be determined by the available data sample at all. Such ceteris paribus difference will not only give rise to quite different tracks of unfolding interpretation, it is also certainly in the close vicinity of Derrida’s deconstruction.

Empirical underdetermination thus results in a second-order risk, the model selection risk. Actually, the model selection risk is the only relevant risk. We can’t change the available data, and data are always limited, sometimes just by their puniness, sometimes by the restrictions on dealing with them. Risk is not attached to objects or phenomena, because objects “are not there” before interpretation and modeling. Risk is attached only to models. Risk is a particular state of affairs, and indeed a rather fundamental one. Once a particular model would tell us that there is an uncertainty regarding the outcome, we could take measures to deal with that uncertainty. For instance, we hedge it, or organize some other kind of insurance for it. But hedging has to rely on the estimation of the uncertainty, which is dependent on the expected predictive power of the model, not just the accuracy of the model given the available data from a limited sample.

Different, but equivalent selections of variables can be used to create a group of models as „experts“ on a given task to decide on. Yet, the selection of such „experts“ is not determinable on the basis of the given data alone. Instead, further knowledge about the relation of the variables to further contexts or targets needs to be consulted.

Universalism is usually unjustifiable, and claiming it instead usually comes at huge costs, caused by undetectable blindnesses once we accept it. In contemporary empiricism, universalism—and the respective blindness—is abundant also with regard to the role of the variables. What I am talking about here is context, mediality and individuality, which, from a more traditional formal perspective, is often approximated by conditionality. Yet, it becomes more and more clear that the Bayesian mechanisms are not sufficient to cover the complexity of the concept of variables. Just to mention the current developments in the field of probability theory, I would like to refer to Brian Weatherson, who favors and develops the so-called dynamic Keynesian models of uncertainty [10]. Yet, we regard this only as a transitional theory, despite the fact that it will have a strong impact on the way scientists will handle empiric data.

The mediating individuality of observables (as deliberately chosen assignates, of course) is easy to observe once we drop universalism qua independence of variables. Concerning variables, universalism manifests in an indistinguishability of the choices made to establish the assignates with regard to their effect on the system of preferences. Some criterion C will induce the putative objects as distinguished ones only if another assignate A has pre-sorted them. Yet, it would be a simplification to consider the situation in the Bayesian way as P(C|A). The problem with it is that we can’t say anything about the condition itself. Yet, we need to “play” with (actually not “control”) the conditionability, the inner structure of these conditions. As it is with the “relation”, which we already generalized into randolations, making it thereby measurable, we also have to go into the condition itself in order to defeat idealism even on the structural level. An appropriate perspective onto variables would hence treat them as a kind of media. This mediality is not externalizable, though, since observables themselves precipitate from the mediality, then as assignates.

What we can experience here is nothing else than the first advents of a real post-modernist world, an era where we emancipate from the compulsive apriori of independence (this does not deny, of course, its important role in the modernist era since Descartes).

Optimization

Optimizing a model means to select a combination of suitably valued parameters such that the preferences of the users in terms of risk and implied costs are served best. The model selection problem is thus the link between optimization problems, learning tasks and predictive modeling. There are indeed countless procedures for optimization. Yet, the optimization task in the context of model selection is faced with a particular challenge: its mere size. George begins his article in the following way:

A distinguishing feature of variable selection problems is their enormous size. Even with moderate values of p, computing characteristics for all 2^p models is prohibitively expensive and some reduction of the model space is needed.

Assume for instance a data set that comprises 50 variables. From that, about 1.13e15 models are possible; assume further that we could test 10‘000 models per second, then we would still need more than 3‘500 years to check all models. Usually, however, building a classifier on a real-world problem takes more than 10 seconds, which would result in some 3.6e8 years in the case of 50 variables. And there are many instances where one is faced with many more variables, typically 100+, and sometimes going even into the thousands. That’s what George means by „prohibitively“.
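The arithmetic can be checked in a few lines:

```python
# Size of the model space for p = 50 variables, and the time needed to
# enumerate it at the two rates mentioned above.
p = 50
n_models = 2 ** p                          # 1_125_899_906_842_624, about 1.13e15

seconds_per_year = 365.25 * 24 * 3600
print(n_models / 1e4 / seconds_per_year)   # about 3.6e3 years at 10'000 models/s
print(n_models * 10 / seconds_per_year)    # about 3.6e8 years at 10 s per model
```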

There are many proposals to deal with that challenge. All of them fall into three classes: they use either (1) some information-theoretic measure (AIC, BIC, CIC etc. [11]), or (2) likelihood estimators, i.e. they conceive of the parameters themselves as random variables, or (3) they are based on probabilistic measures established upon validation procedures. Particularly the instances from the first two of those classes are hit by the linearity and/or the independence assumption, and also by unjustified universalism. Of course, linearity should not be an assumption, it should be a result, as we argued above. Hence, there is no way to avoid the explicit calculation of models.

Given the vast number of combinations of symbols it appears straightforward to conceive of the model selection problem from an evolutionary perspective. Evolution always creates appropriate and suitable solutions from the available „evolutionary model space“. That space is of size 2^30‘000 in the case of humans, a „much“ larger number than the number of species ever existent on this planet. Not a single viable configuration could have been found by pure chance; genetics-based alignment and navigation through the model space is much more effective than chance. Hence, the so-called genetic algorithms might appear on the radar as the method of choice.

Genetics, revisited

Unfortunately, for the variable selection problem genetic algorithms15 are not suitable. The main reason for this is still the expensive calculation of single models. In order to set up the genetic procedure, one needs at least 500 instances to form the initial population, whereas any solution for the variable selection problem should arrive at a useful result with fewer than 200 explicitly calculated models. The great advantage of genetic algorithms is their capability to deal with solution spaces that contain local extrema. They can handle even solution spaces that are inhomogeneously rugged, simply for the reason that recombination in the realm of the symbolic does not care about numerical gradients and criteria. Genetic procedures are based on combinations of symbolic encodings. The continuous switch between the symbolic (encoding) and the numerical (effect) is nothing else than the precursor of the separation between genotypes and phenotypes, without which there would not be even simple forms of biological life.

For that reason we developed a specialized instantiation of the evolutionary approach (implemented in SomFluid). Described very briefly we can say that we use evolutionary weights as efficient estimators of the maximum likelihood of parameters. The estimates are derived from explicitly calculated models that vary (mostly, but not necessarily ceteris paribus) with respect to the used variables. As such estimates, they influence the further course of the exploration of the model space in a probabilistic manner. From the perspective of the evolutionary process, these estimates represent the contribution of the respective parameter to the overall fitness of the model. They also form a kind of long-term memory within the process, something like a probabilistic genome. The short-term memory in this evolutionary process is represented by the intensional profiles of the nodes in the SOM.

For the first initializing step, the evolutionary estimates can themselves be estimated by linear procedures like PCA, or by non-parametric procedures (Kruskal-Wallis, Mann-Whitney, etc.), and are available after only a few explicitly calculated models (model here means „ceteris paribus selection of variables“).

These evolutionary weights reflect the changes of the predictive power of the model when variables are added to or removed from the model. If the quality of the model improves, the evolutionary weight increases a bit, and vice versa. In other words, not the apriori parameters of the model are considered, but just the effect of the parameters. The procedure is an approximating repetition: fix the parameters of the model (method specifics, sampling, variables), calculate the model, record the change of the predictive power as compared to the previous model. A minimal sketch of this update is given below.
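The sketch below only illustrates the idea of such an update and of its use for sampling the next variable set; it is not the mechanism actually implemented in SomFluid, and the learning rate, the clipping and the function names are assumptions.

```python
import numpy as np

def update_evolutionary_weights(weights, used_vars, delta_power, rate=0.1):
    """weights: numpy array with one entry per variable (the 'probabilistic
    genome'). Every variable used in the just-calculated model gets nudged
    by the observed change of predictive power vs. the previous model."""
    for v in used_vars:
        weights[v] = np.clip(weights[v] + rate * delta_power, 0.0, 1.0)
    return weights

def sample_variable_set(weights, size, seed=1):
    """Variables with larger weights are more likely to enter the next
    ceteris-paribus model during the exploration of the model space."""
    rng = np.random.default_rng(seed)
    probs = weights / weights.sum()
    return rng.choice(len(weights), size=size, replace=False, p=probs)
```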

Upon the probabilistic genome of evolutionary weights there are many different ways one could take to implement the “evo-devo” mechanisms, let it be the issue of how to handle the population (e.g. mixing genomes, aspects of virtual ecology, etc.), or the translational mechanisms, so to speak the “physiologies” that are used to proceed from the genome to an actual phenotype.

Since many different combinations are being calculated, the evolutionary weight represents the expectable contribution of a variable to the predictive power of the model, under whatsoever selection of variables that represents a model. Usually, a variable will not improve the quality of the model irrespective of the context. Yet, if a variable indeed did so, we not only would say that its evolutionary weight equals 1, we also might conclude that this variable is a so-called confounder. Including a confounder into a model means that we use information about the target which will not be available when applying the model for the classification of new data; hence the model will fail disastrously. Usually it is not possible for a procedure to identify confounders by itself; that it becomes possible here is just a further benefit of dropping the independence-universalism assumption. It is also clear that the capability to do so is one of the cornerstones of autonomous learning, which includes the capability to set up the learning task.

Noise, and Noise

Optimization raises its own follow-up problems, of course. The most salient of these is so-called overfitting. This means that the model gets suitably fitted to the available observations by including a large number of parameters and variables, but it will return wrong predictions if it is going to be used on data that are even only slightly different from the observations used for learning and estimating the parameters of the model. The model represents noise, random variations without predictive value.

As we have described above, Poggio believes that his criterion of stability overcomes the defects with regard to the model as a generalization from observations. Poggio might be too optimistic, though, since his method still remains confined to the available observations.

In this situation, we apply a methodological trick. The trick consists in turning the problem into a target of investigation, which ultimately translates the problem into an appropriate rule. In this sense, we consider noise not as a problem, but as a tool.

Technically, we destroy the relevance of the differences between the observations by adding noise of a particular characteristic. If we add a small amount of normally distributed noise, probably nothing will change, but if we add a lot of noise, perhaps even of secondarily changing distribution, this will result in the mere impossibility to create a stable model at all. The scientific approach is to describe the dependency between those two unknowns, so to say, to set up a differential between the noise (a model for the unknown) and the model (of the unknown). The rest is straightforward: creating various data sets that have been changed by imposing different amounts of noise of a known structure, and plotting the predictive power against the amount of noise. This technique can be combined with surrogating the actual observations via a Cholesky decomposition.
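A sketch of both ingredients, the noise differential and the Cholesky-based surrogates, again with `fit`, `predict` and `score` as assumed stand-ins for the chosen method:

```python
import numpy as np

def power_vs_noise(X, y, fit, predict, score,
                   levels=(0.0, 0.1, 0.3, 1.0, 3.0), seed=2):
    """Rebuild the model on data distorted by increasing amounts of Gaussian
    noise and record the resulting predictive power (a simplified sketch:
    training and evaluation use the same distorted data)."""
    rng = np.random.default_rng(seed)
    scale = X.std(axis=0)
    curve = {}
    for lvl in levels:
        Xn = X + rng.normal(0.0, lvl, size=X.shape) * scale
        model = fit(Xn, y)
        curve[lvl] = score(y, predict(model, Xn))
    return curve

def surrogate(X, seed=3):
    """Cholesky-based surrogate: random data with (approximately) the same
    mean and covariance structure as the original observations."""
    rng = np.random.default_rng(seed)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    L = np.linalg.cholesky(cov + 1e-9 * np.eye(cov.shape[0]))
    return X.mean(axis=0) + rng.standard_normal(size=X.shape) @ L.T
```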

From all available models, then, those are preferred that combine a suitable predictive power with a suitable degree of stability against noise.

Résumé

In this section we have dealt with the problematics of selecting a suitable subset from all available observables (neglecting for the time being that model selection involves the method’s parameters, too). Since mostly we have more observables at our disposal than we actually presume to need, the task could be simply described as simplification, aka Occam’s Razor. Yet, it would be terribly naive to first assume linearity and then selecting the “most parsimonious” model. It is even cruel to state [9, p.1]:

It is said that Einstein once said

Make things as simple as possible, but not simpler.

I hope that I succeeded in providing some valuable hints for accomplishing that task, which above all is not a quite simple one. (etc.etc. :)

Describing Classifiers

The gold standard for describing classifiers is believed to be the Receiver Operating Characteristic, or ROC for short. Particularly, the area under the curve is compared across models (classifiers). The following Figure 1 demonstrates the mechanics of the ROC plot.

Figure 1: Basic characteristics of the ROC curve (reproduced from Wikipedia)

Figure 2. Realistic ROC curves, though these are typical for approaches that are NOT based on sub-group structures or ensembles (for instance ANN or logistic regression). Note that models should not be selected on the basis of the area under the curve. Instead, the true positive rate (sensitivity) at a false positive rate FPR=0 should be used for that. As a further criterion, which would indicate the stability of the model, one could use the slope of the curve at FPR=0.
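Computing the ROC points and the suggested selection criterion, the sensitivity that is still reachable while the false positive rate is exactly 0, takes only a few lines; ties between scores are handled crudely here, so this is a sketch rather than a replacement for a proper ROC routine.

```python
import numpy as np

def roc_points(scores, labels):
    """Empirical true/false positive rates for every threshold over the scores."""
    order = np.argsort(scores)[::-1]
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels == 1)
    fps = np.cumsum(labels == 0)
    tpr = tps / max(tps[-1], 1)
    fpr = fps / max(fps[-1], 1)
    return fpr, tpr

def tpr_at_zero_fpr(scores, labels):
    """Selection criterion suggested above: the sensitivity at FPR = 0."""
    fpr, tpr = roc_points(scores, labels)
    mask = fpr == 0
    return float(tpr[mask].max()) if mask.any() else 0.0
```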

Utilization of Information

There is still another harmful aspect of the universalistic stance in data analysis as compared to a pragmatic stance. This aspect considers the „reach“ of the models we are going to build.

Let us assume that we would accept a sensitivity of approx. 80%, but we also expect a specificity of >99%. In other words, the costs for false positives (FP) are defined as very high, while the costs for false negatives (FN, unrecognized preferred outcomes) are relatively low. The ratio of the costs for these errors, or in short the error cost ratio err(FP)/err(FN), is high.

Table 1a: A Confusion matrix for a quite performant classifier.

Symbols: test=model; TP=true positives; FP=false positives; FN=false negatives; TN=true negatives; ppv=positive predictive value, npv=negative predictive value. FP is also called type-I error (analogous to “rejecting the null hypothesis when it is true”), while FN is called type-II error (analogous to “accepting the null hypothesis when it is false”), and FN/(TP+FN) is called the type-II error rate, sometimes labeled as β-error, where (1-β) is called the “power” of the test or model. (download XLS example)

             condition Pos     condition Neg
test Pos     100 (TP)          3 (FP)             0.971   ppv
test Neg     28 (FN)           1120 (TN)          0.976   npv
             0.781             0.997
             sensitivity       specificity

Let us further assume that there are observations of our preferred outcome that we can‘t distinguish well from other cases of the opposite outcome that we try to avoid. They are too similar, and due to that similarity they form a separate group in our self-organizing map. Let us assume that the specificity of these clusters is at 86% only and the sensitivity is at 94%.

Table 1b: Confusion matrix describing a sub-group formed inside the SOM, for instance as it could be derived from the extension of a “node”.

             condition Pos     condition Neg
test Pos     0 (50)            0 (39)             0.0 (0.56)   ppv
test Neg     50 (0)            39 (0)             0.44 (1.0)   npv
             0.0 (1.0)         1.0 (0.0)
             sensitivity       specificity

Yet, this cluster would not satisfy our risk attitude. If we used the SOM as a model for the classification of new observations, and a new observation fell into that group (by means of similarity considerations), the implied risk would violate our attitude. Hence, we have to exclude such clusters. In the ROC this cluster represents a value further to the right on the X-axis, i.e. at a higher false positive rate (lower specificity).

Note that in the case of acceptance of the subgroup as a contributor to a positive prediction, the number of false negatives is always 0 a posteriori, and in the case of denial the number of true positives is always set to 0 (and accordingly the figures for the condition negative).

There are several important points here, which are related to each other. Actually, we should be interested only in sub-groups with a specificity close to 1, such that our risk attitude is well served [13]. Likewise, we should not try to optimize the quality of the model across the whole range of the ROC, but only for the sub-groups with an acceptable error cost ratio. In other words, we use the available information in a very specific manner.

As a consequence, we have to set the ECR before calculating the model. Setting the ECR after the selection of a model results in a waste of information, time and money. For this reason it is strongly indicated to use methods that are based on building a representation by sub-groups. This again rules out statistical methods as they always take into account all available data. Zytkow calls such methods empirically empty [14].
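
As an illustration only (the data structures and names are hypothetical, not the author's implementation): if each SOM node carries the counts of positive and negative training cases mapped onto it, the acceptance of a node as a contributor to positive predictions can be decided by a node-level purity threshold that stands in for the specificity/ECR condition fixed before modeling.

    # Accept a SOM node as a "positive" contributor only if its node-level
    # purity satisfies a threshold fixed *before* the model was calculated.
    def select_positive_nodes(nodes, min_node_ppv=0.99):
        """nodes: list of dicts, each with counts of positive and negative
        training cases mapped onto the node (its 'extension')."""
        selected = []
        for node in nodes:
            pos, neg = node["n_pos"], node["n_neg"]
            total = pos + neg
            if total == 0:
                continue
            node_ppv = pos / total          # purity w.r.t. the preferred outcome
            if node_ppv >= min_node_ppv:    # stand-in for the ECR-derived criterion
                selected.append(node)
        return selected

    # hypothetical usage: a new observation is classified as "positive" only if
    # its best-matching node is among the selected nodes; otherwise the model abstains.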

The possibility to build models of a high specificity is a huge benefit of sub-group based methods like the SOM.16 To understand this better let us assume we have a SOM-based model with the following overall confusion matrix.

              condition Pos    condition Neg
test Pos      78               1                0.9873   ppv
test Neg      145              498              0.7745   npv
              0.350            0.998
              sensitivity      specificity

That is, the model recognizes around 35% of all preferred outcomes. It does so on the basis of sub-groups that all satisfy the respective ECR criterion. Thus we know that the implied risk of any classification is very low too. In other words, such models recognize whether it is allowed to apply them. If we apply them and get a positive answer, we also know that it is justified to apply them. Once the model identifies a preferred outcome, it does so without risk. This lets us miss opportunities, but we won’t be trapped by false expectations. Such models we could call auto-consistent.

In a practical project aiming at an improvement of the post-surgery risk classification of patients (n>12'000) in a hospital, we have been able to demonstrate that the achievable, validated rate of implied risk can be lower than 10^-4 [15]. Such a low rate is not achievable by statistical methods, simply because there are far too few incidents of wrong classifications. The subjective cut-off points in logistic regression are not quite suitable for such tasks.

At the same time, and that’s probably even more important, we get a suitable segmentation of the observations. All observations that can be identified as positive do not suffer from any risk. Thus, we can investigate the structure of the data for these observations, e.g. as particular relationships between variables, such as correlations etc. But, hey, that job is already done by the selection of the appropriate set of variables! In other words, we not only have a good model, we also have found the best possibility for a multi-variate reduction of noise, with a full consideration of the dependencies between variables. Such models can be conceived as reversed factorial experimental design.

The property of auto-consistency offers a further benefit as it is scalable, that is, “auto-consistent” is not a categorical, or symbolic, assignment. It can be easily measured as sensitivity under the condition of specificity > 1-ε, ε→0. Thus, we may use it as a random measure (it can be described by its density) or as a scale of reference in case of any selection task among sub-populations of models. Additionally, if the exploration of the model space does not succeed in finding a model of a suitable degree of auto-consistency, we may conclude that the quality of the data is not sufficient. Data quality is a function of properly selected variables (predictors) and reproducible measurement. We know of no other approach that would be able to inform about the quality of the data without referring to extensive contextual “knowledge”. Needless to say that such knowledge is never available and encodable.
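
A sketch of this graded measure, under the assumed formalization suggested by the text (sensitivity counted only if the side condition on specificity is met), could look like this:

    # "Auto-consistency" as a graded measure: the sensitivity a model reaches
    # under the side condition that its specificity exceeds 1 - eps.
    def auto_consistency(sensitivity, specificity, eps=1e-3):
        return sensitivity if specificity >= 1.0 - eps else 0.0

    # Given a population of explicitly tested models, the resulting values can
    # be treated as a distribution; if even the best candidate scores (close to)
    # zero, the data quality (variables and/or measurement) is probably insufficient.
    candidates = [(0.35, 0.998), (0.62, 0.971), (0.41, 0.999)]  # hypothetical (sens, spec) pairs
    scores = [auto_consistency(se, sp) for se, sp in candidates]
    best = max(scores)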

There are only weak conditions that need to be satisfied. For instance, the same selection of variables needs to be used within a single model for all similarity considerations. This rules out all ensemble methods in which different selections of variables are used for each item in the ensemble, for instance decision tree methods (a SOM with its sub-groups is already "ensemble-like", yet all sub-groups are affected by the same selection of variables). It is further required to use a method that performs the transition from extensions to intensions on a sub-group level, which rules out analytic methods, and even Artificial Neural Networks (ANN). The way to establish auto-consistent models is not open to ANN. Furthermore, the error cost ratio must be set before calculating the model, and the models have to be calculated explicitly, which removes linear methods from the list, such as Support Vector Machines with linear kernels (regression, ANN, Bayes). If we want to access the rich harvest of auto-consistent models we have to drop the independence hypothesis and we have to refute any kind of universalism. But these costs are rather low, indeed.

Observations and Probabilities

Here we developed a particular perspective onto the transition from observations to intensional representations. There are of course some interesting relationships of our point of view to the various possibilities of “interpreting” probability (see [16] for a comprehensive list of “interpretations” and interesting references). We also provide a new answer to Hume’s problem of induction.

Hume posed the question: how often should we observe a fact until we could consider it as lawful? This question, called the "problem of induction", points in the wrong direction and will trigger only irrelevant answers. Hume, still living in times of absolute monarchism, in a society deeply structured by religious beliefs, established a short-cut between the frequency of an observation and its propositional representation. The actual question, however, is how to achieve what we call an "observation".

In very simple, almost artificial cases like the die there is nothing to interpret. The die and its values are already symbols. It is in some way inadequate to conceive of a die or of throwing dice as an empirical issue. In fact, we know beforehand what could happen. The universe of the die consists of precisely 6 singular points.

Another extreme are so-called single-case observations of structurally rich events, or processes. An event, or a setting, should be called structurally rich if there are (1) many different outcomes, and (2) many possible assignates to describe the event or the process. Such events or processes will not produce any outcome that could be expected by symbolic or formal considerations. Obviously, it is not possible to assign a relative frequency to a unique, singular, or non-repeatable event. Unfortunately, however, as Hájek points out [16], any actual sequence can be conceived of as a singular event.

The important point now is that single-case observations are also not sufficiently describable as an empirical issue. Ascribing propensities to objects-in-the-world demands a wealth of modeling activities and classifications, which have to be completed apriori to the observation under scrutiny. So-called single-case propensities are not a problem of probabilistic theory, but one of the application of intensional classes and their usage as means for organizing one's own expectations. As we said earlier, probability as it is used in probability theory is not a concept that could be applied meaningfully to observations, where observations are conceived of as primitive "givens". Probabilities are meaningful only in the closed world of available, subjectively held concepts.

We thus have to distinguish between two areas of application for the concept of probability: the observational part, where we build up classes, and the anticipatory part, where we are interested in a match of expectations and actual outcomes. The problem obviously arises by mixing them through the notion of causality.17 Yet, there is absolutely no necessity between the two areas. The concept of risk probably allows for a resolution of the problems, since risk always implies a preceding choice of a cost function, which necessarily is subjective. Yet, the cost function and the risk implied by a classification model are also the pivotal point for any kind of negotiation, whether this takes place on a material, hence evolutionary scale, or within a societal context.

The interesting, if not salient, point is that the subjectively available intensional descriptions and classes depend on one's risk attitude. We may observe the same thing only if we have acquired the same system of related classes and the same habits of using them. Only if we apply extreme risk aversion will we achieve a common understanding about facts (in the Wittgensteinian sense, see above). This then is called science, for instance. Yet, it still remains a misunderstanding to equate this common understanding with objects as objects-out-there.

The problem of induction thus must be considered a seriously ill-posed problem. It is a problem only for idealists (who then solve it in a weird way), or for realists who are naive about the epistemological conditions of acting in the world. Our proposal for the transition from observations to descriptions is based on probabilism on both sides, yet on either side there is a distinct flavor of probabilism.

Finally, a methodological remark shall be allowed, closely related to what we already described in the section about "noise" above. The perspective on "making experience" that we have been proposing here introduces a significant twist.

Above we already mentioned Alan Hájek's diagnosis that the frequentist and the Bayesian interpretations of probability suffer from the reference class problem. In this section we extended Hájek's concerns to the concept of propensity. Yet, if the problem shows such a high prevalence we should not conceive of it as a hurdle but should try to treat it dynamically, as a rule. The reference class is only a problem as long as (1) either the actual class is required as an external constant, or (2) the abstract concept of the class is treated as a fixed point. According to the rule of Lagrange-Deleuze, any constant can be rewritten into a procedure (read: rules) and less problematic constants. Constants, or fixed points, on a higher abstract level are less problematic, because the empirically grounded semantics vanishes.

Indeed, the problem of the reference class simply disappears if we put the concept of the class, together with all the related issues of modeling, as the embedding frame, the condition under which any notion of probability can make sense at all. The classes themselves are results of "rule-following", which admittedly is blind, but whose parameters are also transparently accessible. In this way, probabilistic interpretation is always performed in a universe that is closed and in principle fully mapped. We need probabilistic methods just because that universe is of huge size. In other words, the space of models is a Laplacean Universe.

Since statistical methods and similar interpretations of probability are analytical techniques, our proposal for a re-positioning of statistics into such a Laplacean Universe is also well aligned with the general habit of Wittgenstein’s philosophy, which puts practiced logic (quasi-logic) second to performance.

The disappearance of the reference class problem should be expected if our relations to the world are always mediated through the activity with abstract, epistemic modeling. The usage of probability theory as a “conceptual game” aiming for sharing diverging attitudes towards risks appears as nothing else than just a particular style of modeling, though admittedly one that offers a reasonable rate of success.

The Result of Modeling

It should be clear by now that the result of modeling is much more than just a single predictive model. Regardless of whether we take the scientific perspective or a philosophical vantage point, we need to include operationalizations of the conditions of the model that reach beyond the standard empirical risk expressed as "false classification". Appropriate modeling provides not only a set of models with well-estimated stability and of different structures; a further goal is to establish models that are auto-consistent.

If the modeling employs a method that exposes its parameters, we can even avoid the "method hell", that is, the results are not only reliable, they are also valid.

It is clear that only auto-consistent models are useful for drawing conclusions and for building up experience. If variables are just weighted without actually being removed, as for instance in approaches like Support Vector Machines, the resulting models are not auto-consistent. Hence, there is no way towards a propositional description of the observed process.

Given the population of explicitly tested models it is also possible to describe the differential contribution of any variable to the predictive power of a model. The assumption of neutrality or symmetry of that contribution, as it is for instance applied in statistical learning, is a simplistic perspective onto the variables and the system represented by them.

Conclusion

In this essay we described some technical aspects of the capability to experience. These technical aspects link the possibility for experience to the primacy of interpretation that gets actualized as the techné of anticipatory, i.e. predictive or diagnostic, modeling. This techné does not address the creation or derivation of a particular model by means of employing one or several methods. The process of building a model could be fully automated anyway. Quite differently, it focuses on the parametrization, validation, evaluation and application of models, particularly with respect to the task of extracting a rule from observational data. This extraction of rules must not be conceived as a "drawing of conclusions" guided by logic. It is a constructive activity.

The salient topics in this practice are the selection of models and the description of the classifiers. We emphasized that the goal of modeling should not be conceived as the task of finding a single best model.

Methods like the Self-organizing Map, which are based on a sub-group segmentation of the data, can be used to create auto-consistent models, which also represent an optimally de-noised subset of the measured data. This data sample could be conceived of as if it had been found by a factorial experimental design. Thus, auto-consistent models also provide quite valuable hints for the setup of the Taguchi method of quality assurance, which could be seen as a precipitation of organizational experience.

In the context of an exploratory investigation of observational data one first has to determine the suitable observables (variables, predictors) and, by means of the same model(s), the suitable segment of observations before drawing domain-specific conclusions. Such conclusions are often expressed as contrasts in location or variation. In the context of designed experiments, as e.g. in pharmaceutical research, one first has to check the quality of the data, then to de-noise the data by removing outliers by means of the same data segmentation technique, before null hypotheses about expected contrasts can be tested.

As such, auto-consistent models provide a perfect basis for learning and for extending the "experience" of an epistemic individual. According to our proposals this experience does not suffer from the various problems of traditional Humean empiricism (the induction problem), or of contemporary (defective) theories of probabilism (mainly the problem of reference classes). Nevertheless, our approach remains fully empirico-epistemological.

Notes

1. Like many other philosophers, Lyotard emphasized the indisputability of an attention for the incidental, not as a perception-as, but as an aisthesis, as a forming impression. See: Dieter Mersch, ›Geschieht es?‹ Ereignisdenken bei Derrida und Lyotard. Available online, last accessed May 1st, 2012. Another recent source arguing in the same direction is John McDowell's "Mind and World" (1996).

2. The label “representationalism” has been used by Dreyfus in his critique of symbolic AI, the thesis of the “computational mind” and any similar approach that assumes (1) that the meaning of symbols is given by their reference to objects, and (2) that this meaning is independent of actual thoughts, see also [2].

3. It would be inadequate to represent such a two-fold "almost" dichotomy as a 2-axis coordinate system, even if such a representation would be a metaphorical one only; rather, it should be conceived as a tetrahedral space, given by two vectors passing nearby without intersecting each other. Additionally, the structure of that space must not be expected to be flat; it looks much more like an inhomogeneous hyperbolic space.

4. “Experiencibility” here not understood as an individual capability to witness or receptivity, but as the abstract possibility to experience.

5. In the same way we reject Husserl's phenomenology. Phenomena, much like the objects of positivism or the thing-as-such of idealism, are not "out there"; they are results of our experiencibility. Of course, we do not deny that there is a materiality that is independent from our epistemic acts, but that does not explain or describe anything. In other words, we propose to go subjective (see also [3]).

6. Again, mechanism here should not be misunderstood as a single deterministic process as it could be represented by a (trivial) machine.

7. This question refers to the famous passage in the Tractatus that "The world is everything that is the case." Cases, in the terminology of the Tractatus, are facts as the existence of states of affairs. We may say, there are certain relations. In the Tractatus, Wittgenstein excluded relations that could not be explicated by the use of symbols, as expressed by the 7th proposition: "Whereof one cannot speak, thereof one must be silent."

8. We must step outside of language in order to see the working of language.

9. We just have to repeat it again, since many people develop misunderstandings here. We do not deny the material aspects of the world.

10. "individual" is quite misleading here, since our brain and even our mind is not in-divisible in the atomistic sense.

11. thus, it is also not reasonable to claim the existence of a somehow dualistic language, one part being without ambiguities and vagueness, the other one establishing ambiguity deliberately by means of metaphors. Lakoff & Johnson started from a similar idea, yet they developed it into a direction that is fundamentally incompatible with our views in many ways.

12. Of course, the borders are not well defined here.

13. “predictive power” could be operationalized in quite different ways, of course….

14. Correlational analysis is not a candidate to resolve this problem, since it can’t be used to segment the data or to identify groups in the data. Correlational analysis should be performed only subsequent to a segmentation of the data.

15. The so-called genetic algorithms are not algorithms in the narrow sense, since there is no well-defined stopping rule.

16. It is important to recognize that Artificial Neural Networks are NOT belonging to the family of sub-group based methods.

17. Here another circle closes: the concept of causality can't be used in a meaningful way without considering its close amalgamation with the concept of information, as we argued here. For this reason, Judea Pearl's approach towards causality [17] is seriously defective, because he completely neglects the epistemic issue of information.

References
  • [1] Geoffrey C. Bowker, Susan Leigh Star. Sorting Things Out: Classification and Its Consequences. MIT Press, Boston 1999.
  • [2] William Croft, Esther J. Wood, Construal operations in linguistics and artificial intelligence. In: Liliana Albertazzi (ed.), Meaning and Cognition. Benjamins Publ., Amsterdam 2000.
  • [3] Wilhelm Vossenkuhl. Solipsismus und Sprachkritik. Beiträge zu Wittgenstein. Parerga, Berlin 2009.
  • [4] Douglas Hofstadter, Fluid Concepts And Creative Analogies: Computer Models Of The Fundamental Mechanisms Of Thought. Basic Books, New York 1996.
  • [5] Nicholas F. Gier, Wittgenstein and Deconstruction, Review of Contemporary Philosophy 6 (2007); first publ. in Nov 1989. Available online.
  • [6] Henk L. Mulder, B.F.B. van de Velde-Schlick (eds.), Moritz Schlick, Philosophical Papers, Volume II (1925-1936), Series: Vienna Circle Collection, Vol. 11b, Springer, Berlin New York 1979.
  • [7] Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee & Partha Niyogi (2004). General conditions for predictivity in learning theory. Nature 428, 419-422.
  • [8] Vladimir Vapnik, The Nature of Statistical Learning Theory (Information Science and Statistics). Springer 2000.
  • [9] Herman J. Bierens (2006). Information Criteria and Model Selection. Lecture notes, mimeo, Pennsylvania State University. Available online.
  • [10] Brian Weatherson (2007). The Bayesian and the Dogmatist. Aristotelian Society Vol. 107, Issue 1pt2, 169-185. Draft available online.
  • [11] Edward I. George (2000). The Variable Selection Problem. J Am Stat Assoc, Vol. 95 (452), pp. 1304-1308. Available online as research paper.
  • [12] Alan Hájek (2007). The Reference Class Problem is Your Problem Too. Synthese 156(3): 563-585. Draft available online.
  • [13] Lori E. Dodd, Margaret S. Pepe (2003). Partial AUC Estimation and Regression. Biometrics 59(3), 614-623.
  • [14] Zytkow J. (1997). Knowledge = concepts: a harmful equation. 3rd Conference on Knowledge Discovery in Databases, Proceedings of KDD-97, pp. 104-109. AAAI Press.
  • [15] Thomas Kaufmann, Klaus Wassermann, Guido Schüpfer (2007). Beta error free risk identification based on SPELA, a neuro-evolution method. Presented at ESA 2007.
  • [16] Alan Hájek, "Interpretations of Probability", The Stanford Encyclopedia of Philosophy (Summer 2012 Edition), Edward N. Zalta (ed.). Available online.
  • [17] Judea Pearl, Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press, Cambridge 2008 [2000].

۞

Data

February 28, 2012 § Leave a comment

There are good reasons to think that data appear

as the result of friendly encounters with the world.

Originally, "data" has been conceived as the "given", or as things that are given, if we follow the etymological traces. That is not quite surprising since it is closely related to the concept of the date as a point in time. And what, if not time, could be something that is given? The concept of the date is, on the other hand, related to computation, at least if we consider etymology again. Towards the end of the Middle Ages, the problems around the calculation of the next Easter date(s) triggered the first institutionalized recordings of rule-based approaches that have been called "computation." At that time, it was already a subject for specialists…

Yet, the cloud of issues around data also involves things. But "things" are not something invariably given, so to speak as a part of an independent nature. In Nordic languages there is a highly interesting link to constructivism. Things originally denoted some early kind of parliament. The Icelandic "alþingi", or transposed "Althingi", is the oldest parliamentary institution in the world still extant, founded in 930. If we take this thread further it is clear that things refer to entities that have been recognized by the community as subjects for standardization. That's the job of parliaments or councils. Said standardization comprises the name, rules for recognizing it, and rules for using or applying it, or simply, how to refer to it, e.g. as part of a semiosic process. That is, some kind of legislation, or norming, if not to say normalization. (That's not a bad thing in itself, unless a society is too eager in doing so; standardization is a highly relevant condition for developing higher complexity, see here.) And, back to the date, we fortunately also know about a quite related usage of the "date" as in "dating", or to make a date, in other words, to fix the (mostly friendly) issues with another person…

The wisdom of language, as Michel Serres once coined it (somewhere in his Hermes series, I suppose), knew everything, it seems. Things are not, because they remain completely beyond even any possibility to perceive them if there is no standard to treat the differential signals they provide. This "treatment" we usually call interpretation.

What we can observe here in the etymological career of “data” is nothing else than a certain relativization, a de-centering of the concept away from the absolute centers of nature, or likewise the divine. We observe nothing else than the evolution of a language game into its reflected use.

This now is just another way to abolish ontology and its existential attitude, at least as far as it claims an "independent" existence. In order to become clear about the concept of data, about what we can do with it, or even how to use data, we have to arrive at a proper level of abstraction, which in itself is not a difficult thing to do.

This, however, also means that "data processing" can't be conceived in the way we conceive, for instance, the milling of grain. Data processing should be taken much more as a "data thinging" than as a data milling, or data mining. There is a deep relativity in the concept of data, because it is always an interpretation that creates them. It is nonsense to naturalize them in the infamous equation "information = data + meaning"; we already discussed that in the chapter about information. Yet, this process probably did not reach its full completion, especially not in the discipline of so-called computer "sciences". Well, every science started as some kind of Hermetism or craftsmanship…

Yet, one still might say that at a given point in time we come upon encoded information, we encounter some written, stored, or somehow else materially represented structured differences. Well, ok, that's true. However, and that's a big however: we still can NOT claim that the data are something given.

This raises a question: what are we actually doing when we say that we "process" data? At first sight, and many people think so, this processing of data produces information. But again, it is not a processing in the sense of milling. This information thing is not the result of some kind of milling. It needs constructive activities and calls for affected involvement.

Obviously, the result or the produce of processing data is more data. Data processing is thus a transformation. Probably it is appropriate to say that "data" is the language game for "transforming the possibility for interpretation into its manifold." Nobody should wonder about the fact that there are more and more "computers" all the time and everywhere. Besides the fact that the "informationalization" of any context allows for improved generality as well as for improved accuracy (they excluded each other in the mechanical age), the conceptual role of data itself produces a built-in acceleration.

Let us leave the trivial aspects of digital technology behind, that is, everything that concerns mere re-arrangement and re-combination without losing or adding anything. Of course, creating a pivot table may lead to new insights since we suddenly (and simply) can relate things that we couldn't relate without pivoting. Nevertheless, it is mere re-arrangement, despite being helpful. It is clear that pivoting itself does not produce any insight, of course.

Our interest is in machine-based episteme and its possibility. So, the natural question is: How to organize data and its treatment such that machine-based episteme is possible? Obviously this treatment has to be organized and developed in a completely autonomous manner.

Treating Data

In so-called data mining, which only can be considered a somewhat childish misnomer, people often report that they spend most of the time in preparing data. Up to 80% of the total project time budget is spent on "preparing data". Nothing else could render the inappropriate concepts behind data mining more visible than this fact. But one step at a time…

The input data to machine learning are often considered to be extremely diverse. In the first place, we have to distinguish between structured and unstructured data; secondly, we have to consider unstructured qualities like text or images, as well as the different scales of expression.

Table 1: Data in the Quality Domain

structured data: things like tables, or schemes, or data that could be brought into such a form in one way or another; often related to physical measurement devices or organizational issues (or habits)
unstructured data: entities that in principle can't be brought into a structured form before processing them. It is impossible to extract the formal "properties" of a text before interpreting it; those properties we would have to know beforehand in order to set up any kind of table into which we could store our "measurement". Hence, unstructured data can't be "measured". Everything is created and constructed "on the fly", sailing while building the raft, as Deleuze (Foucault?) put it once. Any input needs to be conceived of as, and presented to the learning entity in, a probabilized form.

Table 2: Data in the Scale Domain

real-valued scale: numeric, like 1.232; mathematically: real numbers, (ir)rational numbers, etc.; infinitely many different values
ordinal scale: enumerations, orderings, limited to a rather small set of values, typically n<20, such as 1, 2, 3, 4; mathematically: natural numbers, integers
nominal scale: singular textual tokens, such as "a", "abc", "word"
binary scale: only two values are used for encoding, such as 1/0, or yes/no, etc.

Often it is proposed to regard the real-valued scale as the most dense one, hence it is the scale that could be expected to transport the largest amount of information. Despite the fact that this is not always true, it surely allows for a superior way to describe the risk in modeling.

That's not all, of course. Consider for instance domains like the financial industry. Here, all the data are marked by a highly relevant point of anisotropy regarding the scale: the zero. As soon as something becomes negative, it belongs to a different category, albeit it could be quite close to another value if we consider just the numeric value. It is such domain-specific issues that contribute to the large efforts people spend on the preparation of data. It is clear that any domain is structured by, and knows about, a lot of such "singular" points. People then claim that they have to be specialists in the respective domain in order to be able to prepare the data.

Yet, that’s definitely not true, as we will see.

In order to understand the important point we have to understand a further feature of data in the context of empirical analysis. Remember that in empirical analysis we are looking primarily for a mapping function which transforms values from measurement into values of a prediction or diagnosis, in short, into the values that describe the outcome. In medicine we may measure physiological data in order to achieve a diagnosis, and doing so is almost identical to the way other people perform measurements in an organization.
Measured data can be described by means of a distribution. A distribution simply describes the relative frequency of certain values. Let us resort to the following two examples. Here you see simple frequency histograms, where each bin reflects the relative frequency of the values falling into the respective bin.

What is immediately striking is that both are far from analytical distributions like the normal distribution. They are both strongly rugged, far from being smooth. We can also see that they have more than one peak, even if it is not clear how many peaks there are.

Actually, in data analysis one meets such conditions quite often.

Figure 1a. A frequency distribution showing (at least) two modes.

Figure 1b. A sparsely filled frequency distribution

So, what to do with that?

First, the obvious anisotropy renders any trivial transformation meaningless. Instead, we have to focus precisely on those inhomogeneities. In a process perspective we may reason that the data that have been measured by a single variable actually stem from at least two different processes, or that the process is non-stationary and switches between (at least two) different regimes. In either case, we split the variable into two, applying a criterion that is intrinsic to the data. This transformation is called deciling, and it is probably the third-most important transformation that could be applied to data.

Well, let us apply deciling to the data shown in Figure 1a.

Figure 2a,b: Distributions after deciling a variable V0 (as of Figure 1a) into V1 and V2. The improved resolution for the left part is not shown.

The result is three variables, and each of them “expresses” some features. Since we can treat them (and the values comprised) independently, we obviously constructed something. Yet, we did not construct a concept, we just introduced additional potential information. At that stage, we do not know whether this deciling will help to build a better model.
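
A sketch of such a split, under the assumption that the intrinsic criterion is the deepest valley between the two most prominent modes of the frequency distribution (this is one possible reading of the procedure, not a reference implementation):

    import numpy as np

    def split_at_valley(values, bins=50):
        """Split a variable into two new variables at the deepest valley
        between its two most prominent modes (missing values elsewhere)."""
        values = np.asarray(values, dtype=float)
        counts, edges = np.histogram(values, bins=bins)
        peak1 = int(counts.argmax())
        masked = counts.copy()
        masked[max(0, peak1 - 2):peak1 + 3] = 0          # blank out the first peak's neighborhood
        peak2 = int(masked.argmax())
        left, right = sorted((peak1, peak2))
        valley = left + int(counts[left:right + 1].argmin())
        cut = edges[valley]
        v1 = np.where(values <= cut, values, np.nan)     # e.g. V1 in Figure 2a
        v2 = np.where(values > cut, values, np.nan)      # e.g. V2 in Figure 2b
        return v1, v2, cut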

Variable V1 (Figure 2a, left part) can be transformed further, by shifting the values to the right through applying a log-transformation. A log-transformation increases the differences between small values and decreases the differences between large values, and it does so in a continuous fashion. As a result, the peak of the distribution will move more to the right (and it will also be less prominent). Imagine a large collection of bank accounts, most of them filled with amounts between 1'000 and 20'000, while some host 10'000'000. If we map all those values onto the same width, the small amounts can't be well distinguished any more, and we have to do that mapping, called linear normalization, with all our variables in order to make variances comparable. It is mandatory to transform such skewed distributions, with most of their mass sitting at small values, into a new variable in order to access the potential information represented by them. Yet, as always in data analysis, before we have completed the whole modeling cycle down to validation we cannot know whether a particular transformation will have any effect, let alone a positive one, on the power of our model.

The log transformation has a further quite neat feature: it is defined only for positive values. Thus, if we apply a transformation that creates negative values for some of the observed values and subsequently apply a log-transform, we create missing values. In other words, we disregard some parts of the information that originally had been available in the data. So, a log-transform can be used to

  • – render items discernible in skewed distributions with most of their mass at small values, and to
  • – deliberately blend out parts of the information by a numeric transformation.

These two possible achievements make the log-transform one of the most frequently applied transformations.
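
Both uses can be illustrated together with the linear normalization mentioned above; the following is illustrative only, not part of any particular toolkit:

    import numpy as np

    def log_and_normalize(values):
        """Log-transform a skewed, strictly positive variable and rescale it
        linearly to [0, 1]; non-positive inputs become missing values (NaN)."""
        v = np.asarray(values, dtype=float)
        logged = np.where(v > 0, np.log(v, where=v > 0), np.nan)  # log undefined for <= 0
        lo, hi = np.nanmin(logged), np.nanmax(logged)
        return (logged - lo) / (hi - lo)                          # linear normalization

    accounts = [1_000, 5_000, 20_000, 10_000_000]                 # the bank account example
    print(log_and_normalize(accounts))                            # small amounts stay distinguishable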

The most important transformation in predictive modeling is the construction of new variables by combining a small number (typically 2) of the hitherto available ones, either analytically by some arithmetics or, more generally, by any suitable mapping, including the SOM, from n variables to 1 variable. Yet, this will be discussed at a later point (in another chapter, for an overview see here). The trick is to find the most promising of such combinations of variables, because obviously the number of possible combinations is almost infinitely large.

Anyway, the transformed data will be subject to an associative mechanism, such as the SOM. Such mechanisms are based on the calculation of similarities and the comparison of similarity values. That is, the associative mechanism does not consider any of the tricky transformations; it just reflects the differences in the profiles (see here for a discussion of that).

Up to this point the conclusion is quite clear. Any kind of data preparation just has to improve the distinguishability of individual bits. Since we do not know anything about the structure of the relationship between the measurement, the prediction and the outcome we try to predict anyway, there is nothing else we could do in advance. Secondly, this means that there is no need to import any kind of semantics. Now remember that transforming data is an analytic activity, while it is the association of things that is a constructive activity.

There is a funny effect of this principle of discernibility. Imagine an initial model that comprises two variables v-a and v-b, among some others, for which we have found that the combination a*b provides a better model. In other words, the associative mechanism found a better representation for the mapping of the measurement to the outcome variable. Now first remember that all values for any kind of associative mechanism have to be scaled to the interval [0..1]. Multiplying two sets of such values introduces a salient change if both values are small or if both values are large. So far, so good. The funny thing is that the same degree of discernibility can be achieved by the transformative coupling v-a/v-b, by the division. The change is orthogonal to that introduced by the multiplication, but that is not relevant for the comparison of profiles. This simple effect nicely explains a "psychological" phenomenon… actually, it is not psychological but rather an empirical one: one can invert the proposal about a relationship between any two variables without affecting the quality of the prediction. Obviously, it is not the transformative function as such that we have to consider as important. Quite likely, it is the form aspect of the warping of the data space qua transformation that we should focus on.
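
A small sketch of this point (the variable names are hypothetical): both the product and the ratio of two [0,1]-scaled variables are offered to the associative mechanism as additional candidate columns; whether either of them improves the model is decided only by the full modeling cycle down to validation.

    import numpy as np

    rng = np.random.default_rng(0)
    v_a = rng.uniform(0.01, 1.0, 1000)                # already normalized to (0, 1]
    v_b = rng.uniform(0.01, 1.0, 1000)

    product = v_a * v_b                               # small*small vs large*large stand out
    ratio = v_a / v_b                                 # the "inverted" proposal
    ratio = (ratio - ratio.min()) / (ratio.max() - ratio.min())  # rescale back to [0, 1]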

All of those transformation efforts exhibit two interesting phenomena. First, we apply them all as a hypothesis, which describes the relation between the data, the (more or less) analytic transformation, the associative mechanism, and the power of the model. If we can improve the power of the model by selecting just the suitable transformations, we also know which transformations are responsible for that improvement. In other words, we carried out a data experiment, which, and that's the second point to make here, revealed a structural hypothesis about the system we have measured. Structural hypotheses, however, could qualify as pre-cursors of concepts and ideas. This switching back and forth between the space of hypotheses H and the space of models (or the learning map L, as Poggio et al. [1] call it) is what lends modeling its experimental character.

Thus we end up with the insight that any kind of data preparation can be fully automated, which is quite contrary to the mainstream. For the mere possibility of machine-based episteme it is nevertheless mandatory. Fortunately, it is also achievable.

One last word (or two) on transformations. A transformation is nothing else than a method, and importantly, vice versa. This means that any method is just: a potential transformation. Secondly, transformations are by far, and I mean really by far, more important than the choice of the associative method. There is almost no (!) literature about transformations, while almost all publications are about the proclaimed features of a "new" method. Such method hell is dispensable. The chosen method just needs to be sufficiently robust, i.e. it should not (preferably: never) introduce a method-specific bias, or alternatively, it should allow controlling as many of its internal parameters as possible. Thus we chose the SOM. It is the most transparent and general method to associate data into groups for establishing the transition from extensions to intensions.

Besides the choice of the final model, the construction of a suitable set of transformations is certainly one of the main jobs in modeling.

Automating the Preparation of Data

How to automate the preparation of data? Fortunately, this question is relatively easy to answer: by machine-learning.

What we need is just a suitable representation of the problematics. In other words, we have to construct some properties that together potentially describe the properties of the data, especially the frequency distribution.

We have had good experience with applying curve fitting to the distribution in order to create a fingerprint that describes the properties of the values represented by a variable. For instance, a 5th-order polynomial, together with a negative exponential and a harmonic fit (trigonometric functions), are essential parts of such a fingerprint (don't forget the first derivatives, and the deviations from the fitted models). Further properties are the count and location of empty bins. The resulting vector typically comprises some 30 variables and thus contains enough information for learning the appropriate transformation.
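
A rough sketch of such a fingerprint is given below; the concrete choice and ordering of the properties is an assumption based on the description above, not the exact recipe used in the project, and assumes NumPy and SciPy.

    import numpy as np
    from scipy.optimize import curve_fit

    def distribution_fingerprint(values, bins=30):
        counts, edges = np.histogram(values, bins=bins, density=True)
        x = 0.5 * (edges[:-1] + edges[1:])

        poly = np.polyfit(x, counts, deg=5)                      # 5th-order polynomial fit
        poly_dev = counts - np.polyval(poly, x)                  # deviation from that model

        def neg_exp(t, a, b):                                    # negative exponential model
            return a * np.exp(-b * t)
        try:
            exp_par, _ = curve_fit(neg_exp, x, counts, p0=(counts.max(), 1.0), maxfev=5000)
        except RuntimeError:
            exp_par = np.array([np.nan, np.nan])

        def harmonic(t, a, w, phi):                              # simple harmonic model
            return a * np.sin(w * t + phi)
        try:
            har_par, _ = curve_fit(harmonic, x, counts, p0=(counts.std(), 1.0, 0.0), maxfev=5000)
        except RuntimeError:
            har_par = np.array([np.nan, np.nan, np.nan])

        empty = counts == 0
        return np.concatenate([
            poly,                                                # 6 polynomial coefficients
            np.gradient(counts)[:5],                             # first derivatives (truncated)
            [poly_dev.std()],                                    # deviation from the polynomial
            exp_par, har_par,                                    # exponential and harmonic parameters
            [empty.sum(), empty.argmax() if empty.any() else -1] # count / first location of empty bins
        ])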

Conclusion

We have seen that the preparation of data can be automated. Only very few domain-specific rules need to be defined apriori, such as the anisotropy around zero for the financial domain. Yet, the important issue is that they indeed can be defined apriori, outside the modeling process, and fortunately, they are usually quite well-known.

The automation of the preparation of data is not an exotic issue. Our brain does it all the time. There is no necessity for an expert data-mining homunculus. Referring to the global scheme of targeted modeling (in the chapter about technical aspects) we now have completed the technical issues for this part. Since we already handled the part of associative storage, “only” two further issues on our track towards machine-based episteme remain: the issue of the emergence of ideas and concepts, and secondly, the glue between all of this.

From a wider perspective we definitely experienced the relativity of data. It is not appropriate to conceive data as “givens”. Quite in contrast, they should be considered as subject for experimental re-combination, as kind of an invitation to transform them.

Data should not be conceived as a result of experiments or measurements, some kind of immutable entities. Such beliefs are directly related to naive realism, to positivism or the tradition of logical empiricism. In contrast, data are the subject or the substrate of experiments of their own kind.

Once the purpose of modeling is given, the automation of modeling thus is possible. Yet, this "purpose" can at first be quite abstract, and usually it is something that results from social processes. It is a salient and open issue, not only for machine-based episteme, how to create, select or achieve a "purpose."

Even as it still remains within the primacy of interpretation, it is not clear so far whether targeted modeling can contribute here. We guess, not so much, at least not on its own. What we obviously need is a concept for "ideas".

  • [1] Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee & Partha Niyogi (2004). General conditions for predictivity in learning theory. Nature 428: 419-422 (25 March 2004).

۞

Theory (of Theory)

February 13, 2012 § Leave a comment

Thought is always abstract thought,

so thought is always opposed to work involving hands. Isn’t it? It is generally agreed that there are things like theory and practice, which are believed to belong to different realms. Well, we think that this perspective is inappropriate and misleading. Deeply linked to this first problem is a second one, the distinction between model and theory. Indeed, there are ongoing discussions in current philosophy of science about those concepts.

Frequently one can meet the claim that theories are about predictions. It is indeed the received view. In this essay we try to reject precisely this received view. As an alternative, we offer a Wittgensteinian perspective on the concept of theory, with some Deleuzean, dedicatedly post-Kantian influences. This perspective we could call a theory about theory. It will turn out that this perspective not only is radically different from the received view, it also provides some important, otherwise unachievable benefits, (in still rather imprecise wording) concerning both "practical" as well as philosophical aspects. But let us first start with some examples.

Even before that, let me state clearly that there is much more about theory than can be mentioned in a single essay. Actually, this essay is based on a draft for a book on the theory of theory that comprises some 500 pages…

The motivation to think about theory derives from several hot spots. Firstly, it is directly and intrinsically implied by the main focus of the first “part” of this blog on the issue of (the possibility for a) machine-based episteme. We as humans only can know because we can willingly take part in a game that could be appropriately described as mutual and conscious theorizing-modeling induction. If machines ever should develop the capability for their own episteme, for their autonomous capability to know, they necessarily have to be able to build theories.

A second strain of motivation comes from the the field of complexity. There are countless publications stating that it is not possible to derive a consistent notion of complexity, ranging from Niklas Luhmann [1986] to Hermann Haken [2012] (see []), leading either to a rejection of the idea that it is a generally applicable concept, or to an empty generalization, or to a reduction. Obviously, people are stating that there is no possibility for a theory about complexity. On the other hand, complexity is more and more accepted as a serious explanatory scheme across disciplines, from material science to biology, sociology and urbanism. Complexity is also increasingly a topic in the field of machine-based episteme, e.g. through the concept of self-organizing maps (SOM). This divergence needs to be clarified, and to be dissolved, of course.

The third thread of motivation is given by another field where theory has usually been regarded as something exotic: urbanism and architecture. Is talking about architecture, e.g. its history, without actually using this talking in the immediate context of organizing and raising a building, already "theory"? Are we allowed to talk in this way at all, thereby splitting talking and doing? Another issue in these fields is the strange subject of planning. Plans are neither models nor theory, nor operation, and planning often fails, not only in architecture, but also in the IT industry. In order to understand the status of plans, we first have to get clear about the abundant parlance that distinguishes "theory" and "practice".

Quite obviously, a proper theory of theory in general, that is, not just a theory about a particular theory, is also highly relevant for what is known as the theory of theory change, or, in terms often used in the field of Artificial Intelligence, belief revision. If we do not have a proper theory about theory at our disposal, we also will not talk reasonably about what it could mean to change a belief. Actually, the topic of beliefs is so relevant that we will discuss it in a dedicated essay. For the time being, we just want to point out the relevance of our considerations here. Later, we will include a further short remark about it.

For these reasons it is vital in our opinion (and for us) to understand the concept of theory better than it is possible on the basis of current mainstream thinking on the subject.

Examples

In line with that mainstream attitude it has been said, for instance, that Einstein's theory predicted (or: Einstein predicted from his theory) the phenomenon of gravitational lensing of light. In Einstein's universe, there is no absoluteness regarding the straightness of a line, because space itself has a curvature that is parametrized. Another example is the so-called Standard Model, or Standard Interpretation, in particle physics. Physicists claim that this model is a theory and that it is the best available theory for making correct predictions about the behavior of matter. The core of this theory is given by the relation between two elements, the field and its respective mediating particle, a view which is a descendant of Einstein's famous equation about energy, mass and the speed of light. Yet, the field theory leads to the problem of infinite regress, which they hope to solve in the LHC "experiments" currently performed at CERN in Geneva. The ultimate particle that should also "explain" gravity is called the Higgs boson. The general structure of the Standard Model, however, is a limit process: the rest mass of the particles is thought to become larger and larger, such that the Higgs boson is the last possible particle, leaving gravitation and the graviton still unexplained. There is also a pretty arrangement of the basic types of elementary particles that is reminiscent of the periodic table in chemistry. Anyway, by means of that Standard Model it is possible to build computers, or at least logical circuits, where a bit is represented by just some 20 electrons. Besides, Einstein's theory has a direct application in the GPS, where a highly accurate common time base shared between the satellites is essential.

Despite these successes there are still large deficits of the theory. Physicists say that they have not so far detected the gravitational waves that are said to be predicted by their theory. Well, physics does not even offer any insight about the genesis of electric charges and magnetism. These are treated as phenomena, leaving a strange gap between the theory and the macroscopic observations (note that the Standard Model does NOT allow decoherence into a field, but only into particles). Besides, physicists do not have even the slightest clue about some mysterious entities in the universe that they call "dark matter" and "dark energy", except that they exert positive or negative gravitational force. I personally tend to rate this as one of the largest (bad) jokes of science ever: building and running the LHC (around 12 billion $ so far) on the one hand, and at the same time taking the road back into mythic medieval language seriously. We meet dark ages in physics once again, not only dark matter and dark energy.

Traveling Dark Matter in a particular context, reflecting and inducing new theories: The case of Malevich and his holy blackness.1

Anyway, that’s not our main topic here. I cited these examples just to highlight the common usage of the concept of theory, according to which a theory is a more or less mindful collection of proposals that can be used to make predictions about worldly facts.

To be Different, or not to be Different…

But what, then, is the difference between theories and models? The concept of model is itself an astonishing phenomenon. Today it is almost ubiquitous. We can hardly imagine anymore that not so long ago, back in the 19th century, the concept of model was used mainly by architects. Presumably, it was the progress made in physics in the beginning of the 20th century, together with the foundational crisis in mathematics, that initiated the career of the concept of model (for an overview in German language see this collection of pages and references).

One of the usages of the concept of model refers to the “direct” derivation of predictions from empirical observations. We can take some observations about process D, e.g. an illness of the human body, where we know the outcome (cured or not) and then we could try to build an “empiric” model that links the observations to the outcome. Observations can include the treatment(s), of course. It is clear that predictions and diagnoses are almost synonyms.

Where is the theory here? Many claim that there is no theory in modeling in general, and particularly that no theory is possible in the case of medicine and pharmacology. Statistical techniques are usually regarded as some kind of method. Since there is no useful generalization, it is believed that a "theory" would not be different from stating that the subject is alive. It is claimed that we are always directly faced with the full complexity of living organisms, thus we have to reduce our perspective. But stop: shouldn't we take the notion of complexity here already as a theory?

For Darwin's theory of natural selection it is also not easy to draw a separating line between the concepts of model and theory. Darwin indeed argued on a quite abstract level, which led to the situation that people think his theory can not be readily tested. Some people thus feel inclined to refer to the great designer, or to the Spaghetti monster alike. Others, notably often physicists, chemists or mathematicians, tried to turn Darwin's theory into a system that actually could be tested. For the time being we leave this as an open issue, but we will return to it later.

Today it is generally acknowledged that measurement always implies a theory. From that we directly can conclude that the same should hold for modeling. Modeling implies a theory, as measurement implies a particular model. In the latter case the model is often actualized by the materiality or the material arrangement of the measurement device. Both, the material aspects together with the immaterial design aspects that mainly concern informational filtering, establish at least implicitly a particular normativity, a set of normative rules that we can call “model.” This aspect of normativity of models (and of theories alike) is quite important, we should keep this in mind.

In the former relation, the implication of theories by modeling, we may expect a similar dependency. Yet, as long as we do not clearly distinguish models and theories, theories would simply be some kind of more general models. If we do not discern them, we would not need both. Actually, precisely this is the current state of affairs, at least in the mainstream across various disciplines.

Reframing. Into the Practice of Languagability.

It is one of the stances inherited from materialism to pose questions about a particular subject in an existential, or if you like, ontological, manner. Existential questions take the form “What is X?”, where the “is” already claims the possibility of an analytical treatment, implied by the sign for equality. In turn this equality, provoked by the existential parlance, claims that this equation is a lossless representation. We are convinced that this approach destroys any chance for sustainable insights already in the first move. This holds even for the concepts of “model” or “theory” themselves. Nevertheless, the questions “What is a model?” or “What is a theory?” can be frequently met (e.g. [1] p.278)

The deeper reason for the aforementioned difficulties is that this stance implies the primacy of the identity relation. Yet, the only possible identity relation is a=a, the tautology, which of course is empirically empty. Although we can write a=b, it is not an identity relation any more. Either it is a claim, or it is based on empirical arguments, which means it is always a claim. In any case, one has to give further criteria upon which the identity a=b appears as justified. The selection of those criteria lies far outside of the relation itself. It invokes the totality of the respective life form. The only conclusion we can draw from this is that the identity relation is transcendent. Despite its necessity it can not be part of the empirical world. The same is hence true for logic.

Claiming the identity relation for empirical facts, i.e. for any kind of experience and hence also for any thought, is self-contradictory. It implies a normativity that remains deliberately hidden. We all know about the late and always disastrous consequences of materialism on the societal level, irrespective of choosing the Marxist or the capitalist flavor.

There are probably only two ways of rejecting materialism and thus also of avoiding its implications. Both of them reject the primacy of the identity relation, yet in slightly different ways. The first one is Deleuze's transcendental difference, which he developed in his philosophy of the differential (e.g. in Difference & Repetition, or his book about the Fold and Leibniz). The second one is Wittgenstein's proposal to take logic as a consequence of performance, or more precisely, as an applicable quasi-logic, and to conceive of logic as a transcendental entity. Both ways are closely related, though developed independently from each other. Of course, there are common traits shared by Deleuze and Wittgenstein, such as rejecting what has been known as "academic philosophy" in their time. All that philosophy had been positioned just as "footnotes to Plato", Kant or Hegel.

In our reframing of the concept of theory we have been inspired by both Deleuze and Wittgenstein, yet in the following we pursue the Wittgensteinian track more explicitly.

Actually, the move is quite simple. We just have to drop the assumption that entities “exist” independently. Even if we erode that idealistic independence only slightly, we are ultimately forced to acknowledge that everything we can say, know or do is mediated by language, or more generally by the conditions that imply the capability for language, in short by languagability.

In contrast to so-called “natural languages”—which actually is a revealing term—languagability is not a dualistic, bivalent off-or-on concept. It is applicable to any performing entity, including animals and machines. Hence, languagability is not only the core concept for the foundation of the investigation of the possibility of machine-based episteme; it is essential for any theory.

Following this track, we stop asking ontological questions. We even drop ontology as a whole. Questions like “What is a Theory?”, “What is Language?” etc. are almost free of any possible sense. Instead, it appears much more reasonable to accept the primacy of languagability and to ask about the language game in which a particular concept plays a certain role. The question that promises progress therefore is:

What can we say about the concept of theory as a language game?

To our knowledge, the “linguistic turn” has not been performed in the philosophy of science so far, let alone in disciplines like computer science or architecture. The consequence is a considerable mess in the respective disciplines.

Theory as a Language Game

One of the first implications of the turn towards the primacy of languagability is the vanishing of the dualism between theory and practice. Any practice requires rules, which in turn can only be referred to in the space of languagability. Of course, there is more to rule-following than the rule itself. Speech acts were first stratified by Austin [2] into locutionary, illocutionary and perlocutionary parts. There might be even further ones, implying evolutionary issues or the play as story-telling. (Later we will call these aspects “delocutionary.”) On the other hand, it is also true that one cannot pretend to follow a rule, as Wittgenstein recognized [3].

It is interesting in this respect that the dualistic, opposing contrast between theory and practice has not been the classical view; not by chance it appeared as late as the early 17th century [4]. Originally, “theory” just meant “to look at, to speculate,” a pairing that is interesting in itself.

Ultimately, rules are embedded in the totality of a form of life (“Lebensform” in the Wittgensteinian, non-phenomenological sense), including the complete “system” of norms in charge at a given moment. Yet, most rules are themselves regulated by more abstract ones that set the conditions for the less abstract ones. The result is of course not a perfect hierarchy; collecting the rules that are active in a Lebensform is not an analytic endeavor. We already mentioned this layered system in another chapter (about “comparing”) and labeled it “orthoregulation” there. Rules are orthoregulated; without orthoregulation, rules would not be rules.

This rooting of rules in the Forms of Life (Wittgenstein), the communal aspect (Putnam), the Field of Proposals (“Aussagefeld”, Foucault) or the Plane of Immanence provoked by attempting to think consistently (Deleuze), which are just different labels for closely related aspects, prevents the ultimate justification, the justifiable idea, and the presence of logical truth values or truth functions in actual life.

It is now important to recognize and to keep in mind that rules about rules do not refer to any empirical entity that could be found as a material or informational fact! Rules about rules refer to the regulated rules only. Of course, usually even the meta-rules are embedded into the larger context of valuation; the whole system should work somehow, that is, it should allow to create predictive models. Here we find the link to risk (avoidance) and security.

Taking an empiricist or pragmatic stance also for the “meta”-rules that are part of the orthoregulative layer, we could well say that the empirical basis of the ortho-rules consists of other, less abstract and less general rules.

Now we can apply the principle of orthoregulation to the subject of theory. Several implications are immediately and clearly visible, namely and most importantly that

  • – theories are not about the prediction of empirical “non-normative” phenomena; the subject of Popper’s falsificationism is the model, not the theory;
  • – theories cannot be formalized, because they are at least partially normative;
  • – facts can’t be “explained” as far as “explanations” are conceived as non-normative entities;

It is clear that the standard account of the status of scientific theories is not compatible with that (which actually is a compliment). Mathias Frisch [5] briefly discusses some of the issues. Particularly, he dismisses the stance that

“the content of a theory is exhausted by its mathematical formalism and a mapping function defining the class of its models.” (p.7)

This approach is also shared by the influential Bas van Fraassen, especially in his book of 1980 [6]. In contrast to this claim we definitely reject that there is any necessary consistency between models and the theory from which they have been derived, or among the family of models that could be associated with a theory. Forms of life (Lebensformen) cannot and should not be evaluated by means of “consistency,” unless you are a social designer who, for instance, has been inventing a variant of idealism to be practiced in and on Syracuse… The rejection of a formal relationship between theories and models includes the rejection of the set-theoretic perspective on models. Since theories are normative they cannot be formalized, and it is close to a scandal to claim ([6], p.43) that

Any structure which satisfies the axioms of a theory…is called a model of that theory.

The problem here is mainly the claim that theories consist of or contain axioms. Norms never have been and never will be “axiomatic.”

There is a theory about belief revision that has been quite influential for the discipline or field called “Artificial Intelligence” (we dismiss this term/name, since it is either empty or misleading). This theory is known under the label AGM theory, where the acronym derives from the initials of its three proponents Alchourrón, Gärdenfors, and Makinson [7]. The history of its adoption by computer scientists is a story in itself [8]; what we can take from it here is that computer scientists believe the AGM theory to be relevant for the update of so-called knowledge bases.

Despite its popularity, the AGM theory is seriously flawed, as Neil Tennant has pointed out [9] (we will criticize his results in another essay about beliefs (scheduled)). A nasty discussion started, mainly characterized by mutual accusations (see [10] as an example), which is typical for deficient theories.

Within AGM, and similar to van Fraassen’s account of the topic, a theory is equated with a set of beliefs, which in turn is conceived as a logically closed set of sentences. There are several mistakes here. First, truth-functional logic is applied as a foundation. This is not possible, as we have seen elsewhere. Second, a belief is not a belief any more as soon as we conceive it as a proposition, i.e. a statement within logic, i.e. under logical closure. It would be a claim, not a belief. Yet, claims belong to a different kind of game. If one wanted to express the fact that we can’t know anything precisely, e.g. due to the primacy of interpretation, we could simply take the notion of risk, which is part of a general concept of model. A further defect in AGM theory, and in any similar approach that tries to formalize the notion of theory completely, is that it conflates propositional content with the form of the proposition. Robert Brandom demonstrates in an extremely thorough way why this is a mistake, and why we are forced to the view that propositional content “exists” only as a mutual assignment between entities that talk to each other (chapter 9.3.4 in [11]). The main underlying reason for this is the primacy of interpretation.

In turn we can conclude that the AGM theory, as well as any attempt to formalize theory, can be conceived as a viable theory only if the primacy of interpretation is inadequate. Yet, this creates the problem of how we are tied to the world. The only alternative would be to claim that this happens somehow “directly.” Of course, such claims are either 100% nonsense or 100% dictatorship.

Regarding the application of the faulty AGM theory in computer science we find another problem: knowledge can’t be saved to a hard disk, no more than information can. Only a strongly reductionist perspective, which is almost a caricature of what could be called knowledge, allows to take that route.

We already argued elsewhere that a model can neither contain the conditions of its applicability nor those of its actual application. The same of course applies to theories. As a direct consequence we have to investigate the role of conditions (we do this in another chapter).

Theories are precisely the “instrument” for organizing the conditions for building models. It is the property of being an instrument about conditions that renders them into an entity that is inevitably embedded into community. We could even bring in Heidegger’s concept of the “Gestell” (scaffold) here, which he coined in the context of his reflections about technology.

The subject of theories are models, not proposals about the empirical world, as far as we exclude models from the empirical world. The subject of Popper’s falsificationism is the realm of models. In the chapter about modeling we determined models as tools for anticipation, given the expectation of weak repeatability. These anticipations can fail, hence they can be tested and confirmed. Inversely, we also can say that every theoretical construct that can be tested is an anticipation, i.e. a model. Theoretical constructs that cannot be tested are theories. Mathias Frisch ([5], p.42) writes:

I want to suggest that in accepting a theory, our commitment is only that the theory allows us to construct successful models of the phenomena in its domain, where part of what it is for a model to be successful is that it represents the phenomenon at issue to whatever degree of accuracy is appropriate in the case at issue. That is, in accepting a theory we are committed to the claim that the theory is reliable, but we are not committed to its literal truth or even just of its empirical consequences.

We agree with him concerning the dismissal of truth or empirical content regarding theories. Yet, the term “reliable” could still be misleading. One would never say that a norm is reliable; norms themselves can’t be called reliable, only the following of them. One does not just obey a norm; the norm is also something that has been fixed as the result of a social process, as a habit of a social group. On a wider perspective, we probably could assign that property, since we tend to expect that a norm supports us in doing so. If a norm did not support us, it would not “work,” and in the long run it would be replaced, often in a catastrophically sweeping event. That “working” of a norm is, however, almost unobservable by the individual, since it belongs to the Lebensform. We also should keep in mind that, as far as we would refer to such a reliability, it is not directed towards the prediction, at least not directly; it refers just to the possibility to create predictive models.

From safe grounds we can now reject all attempts to formalize theories along the line Carnap-Sneed-Stegmüller-Moulines [12, 13, 14, 15]. The “intended usage” of a theory (Sneed/Stegmüller) cannot be formalized, since it is related to the world, not just to an isolated subject. Scientific languages (Carnap’s enterprise) are hence not possible.

Of course, it is possible to create models about modeling, i.e. to take models as an empirical subject. Yet, such models are still not a theory, even if they look quite abstract. They are simply models, which imply or require a theory. Here lies the main misunderstanding of the folks cited above.

The turn towards languagability includes the removal of the dualistic contrast between theory and practice. This dualism is replaced by a structural perspective according to which theory and practice are co-extensive. Still, there are activities that we would not call a practice or an action, so to speak before any rule. Such activities are performances. Not least this is also the reason why performance art is… art.

Heinrich Lüber, the Swiss performance artist, standing on top of a puppet shaped as himself. What is not visible here: he stood there for 8 hours, in the water on the shore of the French Atlantic coastline.

Besides performance (art) there are no activities that would be free of rules, or equivalently, free of theory. Modeling in particular is of course a practice, quite in contrast to theory. Another important issue we can derive from our distinction is that any model implies a theory, even if the model just consists of a particular molecule, as is the case in the perception mechanisms of individual biological cells.

A question we have to distinguish sharply from that about the reach of theories is whether the models predict well. And of course, just like norms, theories too can be inappropriate.

Theories are simply there. Theories denote what can be said about the influence of the general conditions—as present in the embedding “Lebenswelt”—onto the activity of modeling.

Theories thus can be described by the following three properties:

  • (1) A theory is the (social) practice of determining the conditions for the actualization of virtuals, the result of which are models.
  • (2) A theory acts as a synthesizing milieu, which facilitates the orthoregulated instantiation of models that are anticipatively related to the real world (where the “real world” satisfies the constraints of Wittgensteinian solipsism).
  • (3) A theory is a language generating language game.

Theories, Models, and in between

Most of the constructs called “theory” are nothing else than a hopeless mixture of models and theories, committing serious naturalistic fallacies by comparing empirical “facts” with normative conditions. We will give just a few examples of this.

It is generally acknowledged that some of Newton’s formulas constitute his theory of gravitation. Yet, it is not a theory, it is a model. It allows for direct and, on the mesocosmic scale, even almost lawful predictions about falling objects or astronomical satellites. Newton’s theory, however, is given by his belief in a certain theological cosmology. Due to this theory, which entails absoluteness, Newton was unable to conceive of relativity.

The case of Kepler is similar. For a long time (more than 20 years) Kepler’s theory entailed the belief in a pre-established cosmic harmony that could be described by Euclidean geometry, which at that time was itself considered a direct link to divine regions. The first model that Kepler constructed to fulfill this theory comprised the inscription of the Platonic solids into the planetary orbits. But those models failed. Based on better observational data he derived different models, yet still within the same theory. Only when he dropped the role of the geometrical approach in his theory was he able to find his laws about the celestial ellipses. In other words, he dropped most of his theological orthoregulations.

Einstein’s work about relativity, finally, is clearly a model, even if it consists of more than one formula. Einstein’s theory is not related to the space-time structure of the macroscopic universe. Instead, the conditions for deriving the quantitative and qualitative predictions are related to certain beliefs in the non-randomness of the universe. His conflict with quantum theory is well known: “God does not play dice.”

The contemporary Standard Model in particle physics is exactly that: a model. It is not a theory. The theory behind the Standard Model is logical flatness and materialism. It is a considerable misunderstanding when physicists accuse proponents of string theory of not providing predictions. They cannot, because they are thinking about a theory. Yet, string theorists themselves do not properly understand the epistemic role of their theory either.

A particular case is given by Darwin’s theory. Darwin of course did not distinguish perfectly or explicitly between models and theories; it was not possible for him in those days. Yet, throughout his writings and the organization of his work we can detect that he implicitly followed that distinction. From Darwin’s writings we know that he was deeply impressed by the non-random manifoldness in the domain of life. Precisely this represented the core of his theory. His formulations about competition, sexual selection or inheritance are just particular models. In our chapter about the abstract structure of evolution we formulated a model about evolutionary processes in a quite abstract way. Yet, it is still a model, within almost the same theory that Darwin once followed.2

There is a quite popular work about the historical dynamics of theories, Thomas Kuhn’s “The Structure of Scientific Revolutions,” which is not a theory, but just a model. For large parts it is not even a model, but just a poor description, for which he coined the paradigm of the “paradigm shift.” There is almost no reflection in it. Above all, it is certainly not a theory about theory, nor a theory about the evolution of theories. Kuhn had to fail, since he does not distinguish between theories and models to the least extent.

So, leaving these examples, how do models and theories relate practically? Is there a transition between them?

Model of Theory, Theory of Model, and Theory of Theory

I think we can derive from these examples a certain relativity regarding the life-cycle of models and theories. Theories can be transformed into models by removing those parts that refer to the Lebenswelt, while models can be transformed into theories if the orthoregulative part of the models gets focused (or extracted from theory-models).

Obviously, what we just did was to describe a mechanism. We proposed a model. In the same way it represents a model to use the concept of the language game for deriving a structure for the concept of theory. Plainly spoken, so far we created a model about theory.

As we have seen, this model also comprises proposals about the transition from model to theory. This transition may take two different routes, according to our model about theory. The first route is taken if a model gets extended by habits and further, mainly socially rooted, orthoregulations, until the original model appears just as a special case. The abstract view might still be only implicit, but it may be derived explicitly if the whole family of models that is possible within those orthoregulations is concretely constructed. The second route draws upon proceeding abstraction, thereby introducing the necessity of instantiation. It is this necessity that decouples the former model from its capability to predict something.

Both routes, either by adding orthoregulations explicitly or implicitly through abstraction, turn the former model de actio into a milieu-like environment: a theory.

As productive milieus, theories comprise all components that allow the construction and the application of models:

  • – families of models as ensembles of virtualized models;
  • – rules about observation and perception, including the processes of encoding and decoding;
  • – infrastructural elements like alphabets or indices;
  • – axiomatically introduced formalizations;
  • – procedures of negotiation, procedures of standardization and other orthoregulations up to arbitrary order.

The model of model, on the other hand, we already provided here, where we described it as a 6-tuple representing different, incommensurable domains. No possible way can be thought of from one domain to any of the others. These six domains are, by their label:

  • (1) usage U
  • (2) observations O
  • (3) featuring assignates F on O
  • (4) similarity mapping M
  • (5) quasi-logic Q
  • (6) procedural aspects of the implementation

or, taken together:
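
The tuple itself apparently appeared here as an image; as a hedged reconstruction from the six domains listed above (the symbol P for the procedural aspects of the implementation is a placeholder, since the list does not assign one), it could be written as:

$$ m \,=\, \langle\, U,\; O,\; F,\; M,\; Q,\; P \,\rangle $$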

This model of model is probably the most abstract and general model that is not yet a theory. It provides all the docking stations that are required to attach the realm of norms. Thus, it would be only a small step to turn this model into a theory. That step towards a theory of model would include statements about two further dimensions: (1) the formal status and (2) the epistemic role of models. The first issue is largely covered by identifying models as a category (in the sense of category theory). The second part is related to the primacy of interpretation, that is, to a world view that is structured by (Peircean) sign processes and transcendental differences (in the Deleuzean sense).

The last twist concerns the theory of theory. There are good reasons to assume that for a theory of theory we need to invoke transcendental categories. Particularly, a theory of theory can’t contain any positive definite proposal, since in this case it would automatically turn into a model. A theory of theory can be formulated only as a self-referential, self-generating structure within transcendental conditions, where this structure can act as a borderless container for any theory about any kind of Lebensform. (This is the work of the chapter about the Choreosteme.)

Remarkably, we thus cannot even say that we could apply a theory to itself, since a theory is a positive definite thing, even if it contained only proposals about conditions (yet, this is not possible either). Of course, this play between (i) ultimately transcendent conditions, (ii) mere performance that is embedded in a life form and finally (iii) the generation of positivity within this field constitutes a quite peculiar “three-body problem” of mental life and (proto-)philosophy. We will return to that in the chapter about the choreosteme, where we also will discuss the issue of “images of thoughts” (Gilles Deleuze) or, in slightly different terms, the “idioms of thinking” (Bernhard Waldenfels).

Conclusion

Finally, there should be our ceterum censeo, some closing remarks about the issue of machine-based episteme, or even machine-based epistemology. Already at the beginning of this chapter we declared our motivation. But what can we derive and “take home” in terms of constructive principles?

Our general goal is to establish—or to get clear about—some minimal set of necessary conditions that would allow a “machinic substrate” in such a way that we could assign to it the property of “being able to understand” in a fully justified manner.

One of the main results in this respect was that modeling is nothing that could be thought of as running independently, as an algorithm, in such a way that we could regard this modeling as sufficient for ascribing to the machine the capability to understand. More precisely, it is not even the machine that is modeling; it is the programmer, or the statistician, the data analyst etc., who switched the machine into the ON state. For modeling, knowing and theorizing, the machine would have to act autonomously.

On the other hand, performing modeling inevitably implies a theory. We just have to keep this theory somehow “within” the machine, or more precisely, within the sign processes that take place inside the machine. The ability to build theories necessarily implies self-referentiality of the informational processes. Our perspective here is that the macroscopic effects of self-referentiality, such as the ability to build theories, or consciousness, cannot be “programmed”; they have to be a consequence of the im-/material design aspects of the processes that make up these effects…

Another insight, though not a heavily surprising one, is that the ability to build theories refers to social norms. Without social norms there is no theorizing. It is not mathematics or science that is necessary; it is just the presence and accessibility of social norms. We could briefly call it education. Here we are aligned with theories (i.e. mostly models) that point to the social origins of higher cognitive functions. It is quite obvious that some kind of language is necessary for that.

The road to machine-based episteme thus does not imply a visit to the realms of robotics. There we will meet only insects and… robots. The road to episteme leads through languagability, and anything that is implied by that, such as metaphors or analogical thinking. These subjects will be the topic of the next chapters. Yet, it also defines the programming project accompanying this blog: implementing the ability to understand textual information.


Notes

1. The image in the middle of this triptych shows the situation of the first installation at the exhibition in Petrograd in 1915, arranged by Malevich himself. He put the “Black Square” exactly at the place where traditionally the Christian cross was to be found in Russian living rooms at that time: up in the corner under the ceiling. This way, he invoked a whole range of reflections about the dynamics of symbols and habits.

2. Other components of our theory of evolutionary processes entail the principle of complexity, and the primacy of difference and the primacy of interpretation.

This article has been created on Oct 21st, 2011, and has been republished in a considerably revised form on Feb 13th, 2012.

References

  • [1] Stathis Psillos, Martin Curd (eds.), The Routledge Companion to Philosophy of Science. Taylor & Francis, London and New York 2008.
  • [2] Austin, Speech Act Theory.
  • [3] Wittgenstein, Philosophical Investigations.
  • [4] Etymology of “theory”; “theorein”.
  • [5] Mathias Frisch, Inconsistency, Asymmetry, and Non-Locality: A Philosophical Investigation of Classical Electrodynamics. Oxford 2005.
  • [6] Bas van Fraassen, The Scientific Image. Oxford University Press, Oxford 1980.
  • [7] Alchourrón, C., Gärdenfors, P. and Makinson, D. (1985). On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50, 510-530.
  • [8] Raúl Carnota and Ricardo Rodríguez (2011). AGM Theory and Artificial Intelligence. In: Belief Revision Meets Philosophy of Science. Logic, Epistemology, and the Unity of Science, Vol. 21, 1-42.
  • [9] Neil Tennant (1997). Changing the Theory of Theory Change: Reply to My Critics. British Journal for the Philosophy of Science, 48, 569-586.
  • [10] Hansson, S. O. and Rott, H. (1995). How Not to Change the Theory of Theory Change: A Reply to Tennant. British Journal for the Philosophy of Science, 46, 361-380.
  • [11] Robert Brandom, Making It Explicit. 1994.
  • [12] Carnap
  • [13] Sneed
  • [14] Wolfgang Stegmüller
  • [15] Moulines

۞

The Self-Organizing Map – an Intro

October 20, 2011 § Leave a comment

A Map that organizes itself:

Is it the paradise for navigators, or is it the hell?

Well, it depends, I would say. As a control freak, or as someone engaged in warfare like Shannon in the early 1940s, you would probably vote for hell. And indeed, there are presumably only very few structures that have been so strongly neglected by information scientists as the self-organizing map. Of course, there are some reasons for that. The other type of navigator, the one more likely to enjoy the SOM, is of the type of Odysseus, or Baudolino, the hero in Umberto Eco’s novel of the same name.

More seriously, the self-organizing map (SOM) is a powerful and even today (2011) still underestimated structure, though it is meanwhile rapidly gaining popularity. This chapter serves as a basic introduction to the SOM, along with a first discussion of the strengths and weaknesses of its original version. Today there are many versions around, mainly in research; the most important ones I will briefly mention at the end. It should be clear that there are tons of articles on the web. Yet, most of them focus on the mathematics of the more original versions, but do not describe or discuss the architecture itself, or provide a suitable high-level interpretation of what is going on in a SOM. So I will not repeat the mathematics; instead I will try to explain it also for non-engineers, without using mathematical formulas. Actually, the mathematics is not the most serious thing about it anyway.

Brief

The SOM is a bundle comprising a mathematical structure and a particularly designed procedure for feeding multi-dimensional (multi-attribute) data into it, data that are prepared as a table. The number of attributes can reach tens of thousands. Its purpose is to infer the best possible sorting of the data in a 2- (or 3-) dimensional grid. Yet, both preconditions, dimensionality and data as a table, are not absolute and may be overcome by certain extensions to the original version of the SOM. The sorting process groups more similar records closer together. Thus we can say that a body of high-dimensional data (organized as records from a table) is mapped onto 2 dimensions, thereby enforcing a weighting of the properties used to describe (essentially: create) the records.

The SOM can be parametrized such that it is a very robust method for clustering data. The SOM exhibits an interesting duality, as it can be used for basic clustering as well as for target-oriented predictive modeling. This duality opens interesting possibilities for realizing a pre-specific associative storage. The SOM is particularly interesting due to its structure and hence due to its extensibility, properties that most other methods do not share with the SOM. Though substantially different from other popular structures like Artificial Neural Networks, the SOM may be included in the family of connectionist models.

History

The development that finally led to the SOM started around 1973 in a computer science lab at Helsinki University. It was Teuvo Kohonen who became aware of certain memory effects of correlation matrices. Until 1979, when he first published the principle of the Self-Organizing Map, he dedicatedly adopted principles known from the human brain. A few further papers followed, and a book about the subject in 1983. Then the SOM wasn’t readily adopted for at least 15 years. Its direct competitor for acceptance, the backpropagation Artificial Neural Network (B/ANN), was published in 1985, after neural networks had been rediscovered in physics, following investigations of spin glasses and certain memory effects there. Actually, the interest in simulating neural networks dates back to 1941, when von Neumann, Hebb, McCulloch, and also Pitts, among others, met at a conference on the topic.

For a long time the SOM wasn’t regarded as a “neural network,” and this was considered a serious disadvantage. The first part of the diagnosis indeed was true: Kohonen never tried to simulate individual neurons, as was the goal for all simulations of ANN. The ANN research has been deeply informed by physics, cybernetics and mathematical information theory. Simulating neurons is simply not adequate; it is kind of silly science. Above all, most ANN are just a very particular type of “network,” as there are no connections within a particular layer. In contrast, Kohonen tried to grasp a more abstract level: the population of neurons. In our opinion this choice is much more feasible and much more powerful as well. In particular, a SOM can represent not only “neurons,” but any population of entities which can exchange and store information. More about that in a moment.

Nowadays, the methodology of SOM can be rated as well adopted. More than 8’000 research papers have been published so far, with increasing momentum, covering a lot of practical domains and research areas. Many have demonstrated the superiority or greater generality of SOM as compared to other methods.

Mechanism

The mechanism of a basic SOM is quite easy to describe, since there are only a few ingredients.
First, we need data. Imagine a table, where observations are listed in rows, and the column headers describe the variables that have been measured for each observed case. The variables are also called attributes, or features. Note that in the case of the basic (say, the Kohonen) SOM the structure given by the attributes is the same for all records. Technically, the data have to be normalized per column such that the minimum value is 0 and the maximum value is 1. Note that this normalization ensures comparability of different sub-sets of observations. It represents just the most basic transformation of the data, while many other transformations are possible: logarithmic re-scaling of the values of a column in order to shift the mode of the empirical distribution, splitting a variable by value, binarization, scaling of parameters that are available only on a nominal niveau, or combining two or several columns by a formula are further examples (for details please visit the chapter about modeling). In fact, the transformation of data (I am not talking here about the preparation of data!) is one of the most important ingredients for successful predictive modeling.
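
As a minimal sketch of this per-column normalization (assuming numpy and a plain numeric table; the function name is illustrative and not part of any SOM library):

```python
import numpy as np

def normalize_columns(table: np.ndarray) -> np.ndarray:
    """Rescale every column of the data table to the range [0, 1] (min-max normalization)."""
    col_min = table.min(axis=0)
    col_max = table.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns
    return (table - col_min) / span
```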

Second, we create the SOM. Basically, and in its simplest form, the SOM is a grid where each cell has 4 edges (squares, rectangles) or 6 edges (hexagonal layout). The grid consists of nodes and edges. Nodes serve as a kind of container, while edges work as a kind of fiber for spreading signals. In some versions of the SOM the nodes can range freely, or they can randomly move around a little bit.

An important element of the architecture of a SOM now is that each node gets assigned the same structure as we know from the table. As a consequence, the vectors collected in the nodes can easily be compared by some function (just wait a second for that). In the beginning, each node gets randomly initialized. Then the data are fed into the SOM.

This data feeding is organized as follows. A randomly chosen record is taken from the table and then compared to all of the nodes. There is always a best matching node. The record then gets inserted into this node. Upon this insertion, which is a kind of hiding, the values in the node’s structure vector are recalculated, e.g. as the (new) mean for all values across all records collected in a node (container). The trick now is to change not just the winning node where the data record has been inserted, but also all nodes of the close surround, though with a strength that decreases with the distance.

This small activity of searching the best matching node, insertion and information spreading is done for all records, and possibly repeated. The spreading of information to the neighboring nodes is a crucial element of the SOM mechanism. This spreading is responsible for the self-organization. It also represents a probabilistic coupling in a network. Of course, there are some important variants of it, see below, but basically that’s all. Ok, there is some numerical bookkeeping, optimizations to search the winning nodes etc., but these measures are not essential for the mechanism.
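
To make the mechanism concrete, here is a minimal sketch of such a training loop in Python/numpy. Grid size, learning rate and neighborhood radius are illustrative choices of mine, not prescriptions from the text, and the per-node record lists and the numerical bookkeeping mentioned above are omitted:

```python
import numpy as np

def train_som(data, rows=10, cols=10, epochs=20, lr=0.5, radius=3.0, seed=0):
    """Minimal SOM: random init, best matching node search, neighborhood update."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    grid = rng.random((rows, cols, n_features))        # each node carries a value vector
    yy, xx = np.mgrid[0:rows, 0:cols]                  # node coordinates on the 2D grid

    for epoch in range(epochs):
        decay = np.exp(-epoch / epochs)                # shrink learning rate and radius over time
        for record in rng.permutation(data):           # feed the records in random order
            dist = np.linalg.norm(grid - record, axis=2)              # compare record to every node
            br, bc = np.unravel_index(dist.argmin(), dist.shape)      # best matching node
            grid_dist2 = (yy - br) ** 2 + (xx - bc) ** 2
            influence = np.exp(-grid_dist2 / (2 * (radius * decay) ** 2))
            grid += (lr * decay) * influence[..., None] * (record - grid)  # spread the update
    return grid
```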

As a result one will find similar records in the same node, or in its direct neighbors. It has been shown that the SOM is topology preserving, that is, the SOM is smooth with regard to the similarity of neighboring nodes. The data records inside a node form a list, which is described by the node’s value vector. That value vector could be said to represent a class, or intension, which is defined by its empirical observations, the cases, or extension.

After feeding all data to the SOM the training is finished. For a SOM it is easy to run in a continuous mode, where the feed of incoming data never “stops.” Now the SOM can be used to classify new records. A new record simply needs to be compared to the nodes of the SOM, i.e. to the value vector of the nodes, but NOT to all the cases (SOM is not case-based reasoning, but type-based reasoning!). If the records contained a marker attribute, e.g. indicating the quality of the record, you will also get the expected quality for a new record of unknown quality.
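
A sketch of how such a classification could look, reusing the grid returned by the training sketch above (the function names and the handling of the marker column are illustrative assumptions):

```python
import numpy as np

def best_matching_node(grid, record):
    """Coordinates of the node whose value vector is most similar to the record."""
    dist = np.linalg.norm(grid - record, axis=2)
    return np.unravel_index(dist.argmin(), dist.shape)

def expected_marker(grid, record_without_marker, marker_index):
    """Estimate a marker attribute (e.g. quality) for a record of unknown quality."""
    feature_idx = [i for i in range(grid.shape[2]) if i != marker_index]
    dist = np.linalg.norm(grid[:, :, feature_idx] - record_without_marker, axis=2)
    r, c = np.unravel_index(dist.argmin(), dist.shape)
    return grid[r, c, marker_index]   # read the expectation from the winning node's value vector
```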

Properties of the SOM

The SOM belongs to the class of clustering algorithms. It is very robust against missing values in the table, and unlike many other methods it does NOT require any settings regarding the clusters, such as size or number. Of course, this is a great advantage and a property of logical consistency. Nodes may remain empty, while the node value vector of the empty node is still well-defined. This is a very important feature of the SOM, as it represents the capability to infer potential yet unseen observations. No other method is able to behave like this. Other properties can be invoked by means of possible extensions of the basic mechanism (see below).

As already said, nodes collect similar records of data, where a record represents a single observation. It is important to understand that a node does not equal a cluster. In our opinion, it does not make much sense to draw boundaries around one or several nodes and thereby propose a particular “cluster.” Such a boundary should be set only upon an external purpose. Inversely, without such a purpose, it is sense-free to conceive of a trained SOM as a model. At best, it would represent a pre-specific model, and it is a great property of the SOM that it is able to create such a thing.

The learning is competitive, since different nodes compete for a single record. Yet, it is also cooperative, since upon an insert operation information is exchanged between neighboring nodes.

The reasoning of the SOM is type-based, which is much more favorable than case-based reasoning. It is also more flexible than ANN, which just provide a classification, without any distinction between extension and intension. SOM, but not ANN, can be used in two very different modes: either just for clustering or grouping individual observations without any further assumptions, or for targeted modeling, that is for establishing a predictive/diagnostic link between several (or many) basic input variables and one (or several) target variable(s) that represent the outcome of a process according to experience. Such a double usage of the same structure is barely accessible for any other associative structure.

Another difference is that ANN are much more suitable to approximate single analytic functions, while SOM are suitable for general classification tasks, where the parameter space and/or the value (outcome) space could even be discontinuous or folded.

A large advantage over many other methods is that the similarity function and the cost function are explicitly accessible. For ANN, SVM or statistical learning this is not possible. Similarly, the SOM automatically adapts its structure according to the data, i.e. it is also possible to change the method within the learning process, adaptively and in a self-organized way.
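
Since the similarity function is an explicit ingredient rather than something buried inside the method, it can be exchanged without touching the rest of the mechanism; a hedged sketch (a training loop like the one above would simply accept the chosen function as a parameter):

```python
import numpy as np

def euclidean(grid, record):
    """Per-node Euclidean distance between the record and each node's value vector."""
    return np.linalg.norm(grid - record, axis=2)

def manhattan(grid, record):
    """Per-node Manhattan (city-block) distance, an alternative similarity function."""
    return np.abs(grid - record).sum(axis=2)

# a training loop could then call similarity(grid, record) instead of hard-coding one norm
```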

As a result we can conclude that the design of the SOM method is much more transparent than that of any other of the advanced methods.

Competing Comparable Methods

SOM are either more robust, more general or simpler than any other method, while the quality of classification is closely comparable. Among the competing methods are artificial neural networks (ANN), principal component analysis (PCA), multi-dimensional scaling (MDS), and adaptive resonance theory networks (ART). Important ideas of ART networks can be merged with the SOM principle, keeping the benefits of both. PCA and MDS are based on statistical correlation analysis (covariance matrices), i.e. they import all the assumptions and necessary preconditions of statistics, notably the independence of the observational variables. Yet, the goal is precisely to identify such dependencies, so it is not quite feasible to presuppose their absence! SOM do not suffer from such limitations imposed by strange assumptions; besides, it has recently been shown that SOM are just a generalization of PCA.

Of course, there are many other methods, like Support Vector Machines (SVM) with statistical kernels, or tree forests; yet these methods are purely statistical in nature, with no structural part in them. Besides, they do not provide access to the similarity function as the SOM does.

A last word about the particular difference between ANN and SOM. SOM are truly symmetrical networks, where each unit has its own explicit memory about observations, while the linkage to other units on the same level of integration is probabilistic. That means that the actual linkage between any two units can be changed dynamically within the learning process. In fact, a SOM is thus not a single network like a fishing net; it is much more appropriate to conceive of it as a representation of a manifold of networks.

Contrary to those advanced structural properties, the so-called Artificial Neural Networks are explicitly directional networks. Units represent individual neurons and do not have storage capacities. A unit does not know anything about things like observations. Conceptually, these units are thus on a much lower level than the units in a SOM. In ANN they cannot have “inner” structure. The same is true for the links between the units. Since they have to be programmed in an explicit manner (which is called “architecture”), the topology of the connections cannot be changed during learning at the runtime of the program.

In ANN, information flows through the units in a directed manner (as in the case of natural neurons). It is almost impossible there to create a connectivity within a single layer of units as dense as in a SOM. As a consequence, ANN do not show the capability for self-organization.

Taken as a whole, ANN seem to be under the representationalist delusion. In order to achieve the same general effects and abstract phenomena as the SOM is able to, very large ANN would be necessary. Hence, pure ANN are not really a valid alternative for our endeavor. This does not rule out the possibility to use them as components within a SOM or between SOMs.

Variants and Architectures

Here are some extensions and improvements of the SOM.

Homogenized Extensional Diversity
The original version of the SOM tends to collect “bad” records, those not matching well anywhere else, into a single cluster, even if the records are not similar at all. In this case it is not allowed to compare nodes any more, since the mean/variance on the level of the node no longer describes the internal variance on the level of the collected records. The cure for that misbehavior is rather simple: the cost function controlling the matching of a record to the most similar node needs to contain the variability within the set of records (the extension of the type represented by the node) collected by the node. Besides, merging and splitting of nodes as described for structural learning helps effectively. In the scientific literature there is as yet no reference for this extension of the SOM.
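
Since there is, as noted, no literature reference for this, the following is a purely illustrative sketch of a matching cost that includes the extensional variability a record would add to a node (all names and the weighting are assumptions of mine):

```python
import numpy as np

def matching_cost(node_vector, node_records, record, variance_weight=0.5):
    """Distance to the node's value vector plus the within-node variability after the insert."""
    distance = np.linalg.norm(np.asarray(node_vector) - record)
    extended = list(node_records) + [record]               # records the node would hold
    variability = float(np.mean(np.var(np.array(extended), axis=0)))
    return distance + variance_weight * variability        # the record goes to the node with minimal cost
```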

Structural Learning
One of the most basic extensions to the original mechanism is to allow for splitting and merging of nodes according to some internal divergence criterion. A SOM made from such nodes is able to adapt structurally to the input data, not just statistically. This feature is inspired by the so-called ART networks [1]. Similarly, merging and splitting of “nodes” of a SOM was proposed by [2], though not in the vein of ART networks.
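
A hedged sketch of what such an internal divergence criterion could look like (the thresholds and names are illustrative assumptions, not taken from [1] or [2]):

```python
import numpy as np

def should_split(node_records, max_variance=0.05):
    """Split a node once the records it has collected have become too heterogeneous."""
    if len(node_records) < 2:
        return False
    return float(np.mean(np.var(np.array(node_records), axis=0))) > max_variance

def should_merge(vector_a, vector_b, min_distance=0.1):
    """Merge two neighboring nodes whose value vectors have become nearly identical."""
    return float(np.linalg.norm(np.asarray(vector_a) - np.asarray(vector_b))) < min_distance
```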

Nested SOM
Since the SOM represents populations of neurons, it is easy and straightforward to think about nesting of SOMs. Each node would contain a smaller SOM. A node may even contain any other parametrized method, such as Artificial Neural Networks. The node value vector then would not exhibit the structure of the table, but instead would contain the parameters of the enclosed algorithm. One example of this is the so-called mn-SOM [3].

Growing SOM
Usually, data are not evenly distributed. Hence, some nodes grow much more than others. One way to cope with this situation is to let the SOM grow automatically. Many different variants of growth could be thought of, and some have already been implemented. Our experiments point in the direction of a “stem cell” analog.

Growing SOMs were first proposed by [4], while [5] provides an exploratory implementation. Concerning growing SOM, it is very important to understand the concept (or phenomenon) of growth. We will discuss possible growth patterns and the consequences for possible new growing SOMs elsewhere. Just for now we can say that any kind of SOM structure can grow and/or differentiate.

SOM gas, mobile nodes
The name already says it: the topology of the grid making up the SOM is not fixed. Nodes may even range around quite freely, as in the case of the SOM gas.

Chemical SOM
Starting from mobile nodes, we can think about a small set of properties of nodes which are not directly given by the data structure. These properties can be interpreted as chemicals creating something like a Gray-Scott reaction-diffusion model, i.e. a self-organizing fluid dynamics. The possible effects are (i) a differential opacity of the SOM for transmitted information, (ii) the differentiation into fibers and networks, or (iii) the optimization of the topological structure as a standard part of the life cycle of the SOM. The mobility can be controlled internally by means of a “temperature,” or, expressed as a metaphor, the fixed SOM would partially melt. This may help to reorganize a SOM. In the scientific literature there is as yet no reference for this extension of the SOM.

Evolutionary Embedded SOM with Meta-Modeling
SOM can be embedded into an evolutionary learning about the most appropriate selection of attributes. This can be extended even towards the construction of structural hypotheses about the data. While other methods could also be embedded in a similar manner, the results are drastically different, since most methods do not learn structurally. Coupling evolutionary processes with associative structures was proposed a long time ago by [6], albeit only in the context of the optimization of ANN. While this is quite reasonable, we additionally propose to use evolution in a different manner and for different purposes (see the chapter about evolution).

[1] ART networks
[2] merging splitting of nodes
[3] The mn-SOM
[4] Growing SOM a
[5] Growing SOM b
[6] evolutionary optimization of artificial neural networks

۞
