Sample Classification In The Social Sciences Research Paper.
Classification is the assignment of objects to classes. For example, an educational researcher might want to establish a taxonomy of teaching styles that covers all possible approaches to teaching. A psychologist studying personality might be interested in whether children can be grouped into categories according to their patterns, or profiles, of personality traits. A sociologist might be interested in whether certain combinations of characteristics of urban areas (average socioeconomic status, crime rate, building types, etc.) occur much more often than other combinations. A biologist might want to study whether animals showing a particular phenotype have specific combinations, or patterns, of genetic codes. In all these cases, objects (teachers, children, urban areas, animals) are classified based on their patterns of some observable characteristics (teaching behaviors, personality traits, city characteristics, genes).
The task of classifying objects poses several problems. The objects to be classified, the properties based on which they are classified, and the way of assessing similarities among objects have to be specified. The aim is to identify individual classes, to decide how many classes are warranted, and to establish procedures for identifying which class each object should be assigned to. Eventually, the success of the classification system must be evaluated. There are problems associated with each of these steps, and different solutions have been suggested for each problem.
In a sense, this paper recreates the process of establishing and testing a classificatory system. First, we define the topic more precisely, clarify the terminology, and provide some examples. Then, we present an outline of the first steps in a classification process and discuss one of the most debated issues: What is the concept of a class? After this, we deal briefly with procedures for assigning objects to classes and with the—frequently neglected—question of how to evaluate the resulting classification system.
1. Definition Of The Topic
1.1 Terminology
Depending on the research tradition, the objects to be classified into a system are called elements, cases, units, exemplars, specimens or items. They are the sources or ‘carriers’ of properties, characteristics or variables. These properties may be dichotomous or polytomous, qualitative or quantitative. A property can only be useful in a classification if it varies within the set of objects, that is, if at least two different values (categories, states, labels) of the respective property occur in the sample. When more than one property is used to characterize an object, the object can be described as a vector of values, a profile, a set of symptoms, or a pattern of features.
Sometimes, the data do not consist of objects and their properties, but of measures of relations between objects, such as their similarity, likeness, or belonging together. For example, a researcher might ask participants to rate the similarity between different politicians. The similarity ratings can then be used to classify politicians into groups. Based on this classification, the researcher can then study on which features people’s perceptions of similarities between politicians are based.
The crucial assumption underlying classification is that objects are elements of a class, of a set, of a partition or—in biology—of a taxon. In other terminologies, the terms ‘category’ or ‘cluster’ are also used. Classification is the process of finding classes and of assigning entities to these classes. The end-product of this order-creating process, however, is often also referred to as ‘classification.’ To stress this distinction, the term ‘classification system’ can be used for the end-product, although in clinical psychology and biology the word ‘taxonomy’ is more common. Identification is the assignment of a specific case or object to (usually only) one of the classes.
1.2 Limits Of Discussion
In studying classification in the behavioral and social sciences, we have to distinguish between two questions. (a) What theory and empirical evidence are available about how people classify objects? (b) What theories and methods are used to create classification systems? The first question will not be dealt with here. The second question refers to the formal and empirical procedures used for defining classes and the rules that have evolved for assigning cases to classes. This second question is what concerns us here; it will be discussed from a conceptual rather than a statistical point of view.
One commercially ‘booming’ application of classification that this paper does not cover is biometric authentication. The aim of these methods is to use combinations of characteristics of individuals to identify each individual uniquely. Conceptually, then, the task is to assign each of a very large number of cases to a class, where there may be as many classes as cases. In this process, an observed pattern of features, or even an observed set of such patterns, is matched against a stored list of patterns. For example, DNA is used in this way in forensic criminology, as are records of the past modus operandi of individual criminals in detective work. This is, however, a rather unusual kind of classification, because the number of classes is intended to equal the number of cases, which is rarely so in the social and behavioral sciences.
1.3 Purposes Of Classification
The fundamental purpose of classification is to find structure. Typically, a large number of objects is reduced to a much smaller set of classes without too much loss of information about the objects. The data thus summarized allow objects to be identified, at least in part, through the class to which they belong. Specifying the boundaries describing a class has several advantages. One is that limits to generalization can be established, and another is that it becomes possible to generate predictions about how different classes are composed and how class membership relates to other variables.
2. Some Examples Of Classifications
The best-known examples of classifications come from the natural sciences rather than the social sciences. A well-known, still used, and expanding classification is Mendeleev’s periodic table of the elements. It can be viewed as a prototype of all taxonomies in that it satisfies the following evaluative criteria: (a) Theoretical foundation: A theory determines the classes and their order. (b) Objectivity: The elements can be observed and classified by anybody familiar with the periodic table. (c) Completeness: All elements find a unique place in the system, and the system implies a list of all possible elements. (d) Simplicity: Only a small amount of information is used to establish the system and identify an object. (e) Predictions: The values of variables not used for classification can be predicted (number of electrons and atomic weight), as well as the existence of relations and of objects hitherto unobserved. Thus, the validity of the classification system itself becomes testable.
Another successful classification system is biological taxonomy. Indeed, most attempts to formalize classification have some intellectual roots in this tradition (Sokal and Sneath 1963). The result of such classification is frequently depicted as a ‘phylogenetic tree,’ today often the result of comparative genomics. In biological taxonomy, however, theory is not so strong as to warrant completeness, as in the Table of Elements (e.g., how should one deal with archaebacteria?). Moreover, the identification of a specimen requires information from morphology and sometimes from behavioral observation. In addition, the system abounds with nested criteria. And, compared with physics, predictions of future developments or of ‘missing links’ in biological taxonomy are vague. However, the classes of the phylogenetic system are still useful because, at the very least, they indicate boundaries to generalization.
In the behavioral and social sciences, hundreds of classifications are published every year. Noteworthy examples are Bloom’s taxonomy of educational objectives (Krathwohl et al. 1964), as well as the DSM (Diagnostic and Statistical Manual of Mental Disorders) and ICD (International Classification of Diseases) classification systems used in psychology and psychiatry. None of these systems has been formally derived, however. Instead, they were generated on the basis of ‘experience.’ The resulting classes are heterogeneous and admit many exceptions. The phenomenon called ‘comorbidity,’ the simultaneous presence of two or more disorders in the same patient, also shows that these classification systems are not yet optimal. If comorbidity is the rule rather than the exception, the classification system loses plausibility and practicability.
3. Preparing The Basis Of A Classification
3.1 Selecting The Cases
At the beginning of the process of developing a classification, two main questions arise. (a) Which elements are to be differentiated in a classification? One searches for a (complete, if possible) list of cases to be classified. This list is called the ‘extension’ of a classification. (b) Which properties characterize the cases? The list of these properties is called the ‘intension’ of the classification. The answers to these questions already determine, in part, the results of the classification. To quote Hartigan (1982, p. 2): ‘Clearly, the selection of variables to measure will determine the final classification. Some informal classification is necessary before data collection: deciding what to call an object, deciding how to classify measurements from different objects as being of the same variable, deciding that a variable on different objects has the same value.’ Sometimes, no well-defined population of objects is available from which to sample, and a preliminary selection has to be made intuitively. In such cases, future applications of the classification system may result in more or different classes from those originally obtained.
3.2 Specifying The Properties
The question of feature selection (selecting the properties on which the classification will be based) arises at two points in the classification process. First, as mentioned above, it arises at the very beginning. The second opportunity to select properties comes when an established procedure is tested for identification. The problem is very similar to one in regression analysis: Which variables should be retained because they discriminate best between the classes (see Pankhurst 1991)? Even if computational problems play no role, using too many properties can still be problematic if measuring them is expensive or dangerous. At both points, reliability is a very important issue: the less reliably the properties are measured, the more difficult it becomes to identify classes.
Another important question is whether the values of the properties should be transformed before searching for classes. The results of most classification procedures will be influenced by transformations. If differences in variability between the variables are of substantive importance, no transformations that equate variability across variables should be used. The use of transformations is also called ‘a priori weighting.’ ‘A posteriori weighting’ refers to cases in which different variables are given different emphasis in the identification process.
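Because most procedures compare objects through distances or similarities, rescaling the variables can change which objects count as most alike. The sketch below assumes z-standardization as the transformation and Euclidean distance as the dissimilarity; the income and schooling values are invented for illustration. It shows how equating the variability of two variables changes an object’s nearest neighbor.

```python
import statistics

def z_standardize(column):
    """Rescale one variable to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mean) / sd for x in column]

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical data for three objects: income (euros) and years of schooling.
income = [30000.0, 31000.0, 45000.0]
schooling = [10.0, 18.0, 11.0]

raw = list(zip(income, schooling))
std = list(zip(z_standardize(income), z_standardize(schooling)))

# Raw data: income dominates, so object 0 is closest to object 1.
print(euclidean(raw[0], raw[1]) < euclidean(raw[0], raw[2]))   # True
# Standardized data: schooling now counts equally, so object 0 is closest to object 2.
print(euclidean(std[0], std[2]) < euclidean(std[0], std[1]))   # True
```

If the larger spread of income is substantively meaningful, the raw-scale result may be the appropriate one. That is the situation in which the text advises against equating variability.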
Especially in routine applications, a good strategy for selecting the properties to be retained in the final classification may be to find a minimal set of variables sufficient to discriminate between all the classes. Such a minimal set need not be unique. If it is not, one will often prefer the set with the fewest practical problems, replacing some properties with others and taking further aspects, such as minimizing costs, into account (see Pankhurst 1991 for an algorithm).
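The search for a minimal discriminating set can be illustrated with a brute-force sketch. This is not Pankhurst’s algorithm. It simply tries all feature subsets of increasing size against a hypothetical ‘feature by classes’ identification matrix and keeps those under which no two class profiles coincide. The cost figures are also invented, to show how a choice among several minimal sets might be made.

```python
from itertools import combinations

def minimal_discriminating_sets(reference):
    """Return every smallest subset of feature indices under which all class
    profiles remain distinct. Brute force, so only suitable for few features."""
    profiles = list(reference.values())
    n_features = len(profiles[0])
    for size in range(1, n_features + 1):
        found = []
        for subset in combinations(range(n_features), size):
            projected = {tuple(p[i] for i in subset) for p in profiles}
            if len(projected) == len(profiles):  # no two classes collapse together
                found.append(subset)
        if found:
            return found
    return []  # some classes are indistinguishable even with all features

# Hypothetical identification matrix: class -> presence (1) / absence (0) of 4 features.
reference = {"A": (1, 0, 1, 0), "B": (1, 1, 0, 0), "C": (0, 1, 1, 1), "D": (0, 0, 0, 1)}
sets = minimal_discriminating_sets(reference)
print(sets)  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

# Hypothetical measurement costs per feature; pick the cheapest minimal set.
cost = [5, 1, 1, 3]
print(min(sets, key=lambda s: sum(cost[i] for i in s)))  # (1, 2)
```

Here five different pairs of features each suffice, which illustrates the non-uniqueness mentioned above. The exhaustive search grows exponentially with the number of features, which is why dedicated algorithms exist.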
3.3 Determining The Similarity
After the objects to be classified and their relevant properties have been selected, the similarity between objects is determined. Similarity is a key concept in classification. As was mentioned earlier, there are two basic ways to obtain similarity measures: The researcher can either collect similarity judgments from participants, or derive similarity measures from the empirical co-occurrence of properties. Methods for obtaining similarity judgments in the context of the first approach—similarity ‘in the eyes of the participants’—are discussed in the context of data theory (Coombs 1964). ‘Proximity’ measures can also be derived from confusion or generalization data, association probabilities, substitutability ratings, sorting procedures, and so on. The second case—‘similarity in the mind of researchers’—amounts to comparing the feature patterns of objects and describing the similarity between objects using similarity coefficients. A large, and still growing, number of these coefficients exist, and monographs on classification devote a lot of space to them. Therefore, the choice of one particular coefficient should be explicitly justified. An analysis of the metric properties of coefficients is given by Gower and Legendre (1986). To choose a coefficient, one may refer to their axiomatic foundation (Baulieau 1999).
Another important distinction in the selection of similarity coefficients concerns ‘negative matches,’ that is, whether the agreement of two objects in the absence, rather than the presence, of a property should count toward their similarity. Jaccard’s (1908) coefficient excludes negative matches.
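The difference that negative matches make can be seen by comparing Jaccard’s coefficient with the simple matching coefficient, which counts shared absences as agreement. In the sketch below, the two binary profiles are hypothetical. Both objects lack most of the eight properties, so the two coefficients diverge sharply.

```python
def match_counts(x, y):
    """Cross-tabulate two binary profiles: a = both present, b and c = mismatches,
    d = both absent (negative matches)."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d

def simple_matching(x, y):
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)   # negative matches count as agreement

def jaccard(x, y):
    a, b, c, _ = match_counts(x, y)    # negative matches are ignored
    return a / (a + b + c) if a + b + c else float("nan")

# Two hypothetical objects described by 8 mostly absent properties.
x = [1, 1, 0, 0, 0, 0, 0, 0]
y = [1, 0, 1, 0, 0, 0, 0, 0]
print(simple_matching(x, y))  # 0.75
print(jaccard(x, y))          # 0.3333333333333333
```

Whether 0.75 or 0.33 is the more sensible similarity depends on whether the shared absence of a property is substantively informative. That is the decision the text asks researchers to make explicitly.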
Although most classification methods make use of similarity information, clustering models exist that do not refer to similarity. Another aspect that might be taken into account is the concept of similarity used: Why only use pairwise co-occurrences, and not higher order contingencies (Daws 1996)?
4. Establishing A Classification
After a measure of similarity has been selected, the next step is the actual classification of the objects on the basis of their similarities. Formally (e.g., Biggs 1999), classification can be thought of as equivalent to (a) partitioning a set into subsets, (b) classifying a set of objects, or (c) distributing a set of objects into a set of ‘boxes.’ These perspectives differ markedly in their implications for classification. For example, in most mathematical conceptualizations, each element is assigned to exactly one class. Some clustering procedures, however, allow for residual elements that are not considered clusterable.
Depending on the approach to classification that a researcher has chosen, certain considerations and precautions are necessary. For example, similarity judgment data may not fulfill some necessary assumptions. Generally, related objects are to be located in the same class. A relation R, where xRy means ‘x is related to y,’ is an equivalence relation if it is reflexive (xRx), symmetric (xRy → yRx), and transitive (xRy and yRz → xRz). Empirical similarity judgments do not necessarily fulfill these properties, but some of them can be tested statistically. Another important consideration applies if the objects are classified by using rules referring to features. In such cases, these rules need to be free of contradictions (for a test, see Feger 1994).
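A direct check of the three properties can be sketched for a single 0/1 matrix of ‘belongs together’ judgments. This is a deterministic check, not one of the statistical tests the text refers to. The judgment matrix is hypothetical and shows the typical failure: A goes with B and B goes with C, yet A is not judged to go with C.

```python
def relation_properties(rel):
    """Check whether a 0/1 'belongs together' matrix is reflexive, symmetric
    and transitive, i.e., whether it is an equivalence relation."""
    n = len(rel)
    reflexive = all(rel[i][i] == 1 for i in range(n))
    symmetric = all(rel[i][j] == rel[j][i] for i in range(n) for j in range(n))
    transitive = all(
        rel[i][k] == 1
        for i in range(n) for j in range(n) for k in range(n)
        if rel[i][j] == 1 and rel[j][k] == 1
    )
    return {"reflexive": reflexive, "symmetric": symmetric, "transitive": transitive}

# Hypothetical judgments: 0 goes with 1 and 1 goes with 2, but 0 is not judged to go with 2.
judged = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
print(relation_properties(judged))
# {'reflexive': True, 'symmetric': True, 'transitive': False}
```

When transitivity fails in this way, the judgments cannot be turned into disjoint classes without overriding some of them.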
Only a few substantive theories in the behavioral and social sciences allow one to deduce the number and kind of classes needed to describe a given range of phenomena. In many cases, therefore, inductive procedures have to be used to generate classes. For this, one needs a concept of what constitutes a class. Many researchers apply inductive classification methods without ever explicitly considering the class concept that their method implies. The following part of the paper briefly discusses the class concepts implied in frequently used methods for finding classes. The list is not complete, and ‘cluster analysis proper’ dominates the discussion because it is used far more often than the other approaches.
Before discussing class concepts in detail, one more general distinction needs to be made. If classes are defined by properties of objects, two levels of definition can be distinguished. A general definition specifies the relationship between the properties and the classes. Specific definitions provide detailed translations of the general definition into formal operations for assigning the objects to the classes. Obviously there can be many different specific definitions. General definitions can be ordered by the kind and amount of variability they allow among objects within the class. There are two general positions with respect to within-class variability. The ‘monothetic’ position (Sutcliffe 1993) assumes that a class is defined by one or a few necessary properties. The ‘polythetic’ counter-position (Gyllenberg and Koski 1996) assumes that some properties of a specified total set, not necessarily the same for every object, are sufficient. According to this position, a property is shared by most, but not necessarily all, objects of a given class. Proponents of the monothetic camp tend to stress that some properties are more important than others, and that these properties should be used to establish the classification. The opposite position assumes equal importance of all properties. As a third type of general definition, one may add definitions referring to a ‘prototype,’ that is, the most typical example of a class or a hypothetical mean object. In this last case, ‘closeness’ or similarity determines class membership, and the prototype may be defined with or without allowing for variation in its properties.
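The contrast between the two general definitions can be made concrete as two membership rules. The sketch below is illustrative only. It assumes objects are represented as sets of the properties they possess, and it operationalizes the polythetic rule as ‘at least k of the listed properties’, which is one simple reading rather than a definition taken from the cited authors.

```python
def monothetic_member(obj, necessary):
    """Monothetic rule: the object must possess every defining property."""
    return all(p in obj for p in necessary)

def polythetic_member(obj, feature_set, k):
    """Polythetic rule: any k properties from the set suffice, and different
    members may satisfy the rule with different properties."""
    return len(obj & feature_set) >= k

features = {"a", "b", "c", "d"}
x = {"a", "b", "c"}   # hypothetical object lacking d
y = {"b", "c", "d"}   # hypothetical object lacking a

print(monothetic_member(x, {"a"}), monothetic_member(y, {"a"}))               # True False
print(polythetic_member(x, features, 3), polythetic_member(y, features, 3))  # True True
```

Under the polythetic rule, x and y both belong to the class even though no single property other than b and c is shared by both, and neither a nor d is necessary.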
Given properties as the base for a classification, the actual observations often are represented as a data matrix containing, for example, the objects as the columns and their properties as the rows. The cells of the matrix contain either the values ‘0’ or ‘1’ to indicate the absence or presence of properties, or they contain frequencies, durations, intensities, or symbols (in the case of qualitative polytomous items) that indicate the type or degree of the respective property in the respective object. As this enumeration shows, the procedure can accommodate data of all scale types. The goal now is to find a ‘feature by classes’ matrix, called—corresponding to its purpose—the reference or identification matrix, or simply ‘a classification.’
4.1 Concepts Of Classes
Cluster analysis proper. When authors (e.g., Everitt 1993) illustrate the concept of a cluster, they often use two-dimensional graphs to show the clusters as clouds of points (representing the objects). The clouds can have various forms; generally there are ‘gaps’ between the clusters that contain no data points, so that the clusters are isolated from one another. While such explanations of the cluster concept seem intriguing as they invoke classical ‘gestalt’ concepts, it is important to remember that the properties of (good) figures are defined by several ‘laws,’ not just one or two axioms or rules, as in cluster concepts.
Helpful as visualizations are, the more general definition of a cluster does not refer to any particular conception of space, be it dimensional, metric or Euclidean. Set theory defines a cluster as a maximal subset of elements for which the proximities within the subset are larger than those between any element of the subset and any element outside it. As was discussed above, proximities are information about the extent to which objects ‘belong together,’ and could be expressed in many different ways, for example, as similarities, distances, ranks, or binary information about set membership. More than one such subset may exist; subsets may be disjoint or overlapping; and they may or may not be hierarchically ordered. Given this very broad conceptualization, social scientists have access to a large number of clustering procedures. The large number of options reveals that no ‘one and only’ definition of a cluster can be found. Presumably, the availability of so many approaches is one reason for the paucity of comparative studies on methods of clustering.
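The set-theoretic condition can be checked mechanically for any candidate subset. The sketch below assumes proximities are similarities (larger means closer) stored in a symmetric matrix, and the 4 × 4 matrix is hypothetical. It tests only the ‘within exceeds between’ condition, not maximality.

```python
def is_cluster(prox, members):
    """True if every within-subset proximity exceeds every proximity between
    a member and a non-member (proximities are similarities: larger = closer)."""
    members = set(members)
    outsiders = [j for j in range(len(prox)) if j not in members]
    within = [prox[i][j] for i in members for j in members if i < j]
    between = [prox[i][j] for i in members for j in outsiders]
    if not within or not between:
        return True  # singletons and the whole set satisfy the condition trivially
    return min(within) > max(between)

# Hypothetical similarities among four objects.
prox = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
print(is_cluster(prox, {0, 1}), is_cluster(prox, {2, 3}), is_cluster(prox, {0, 1, 2}))
# True True False
```

No coordinates, dimensions, or metric are needed, which matches the point that the general definition does not presuppose any particular conception of space.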
Clustering procedures can be classified as ‘leading to a structure that is either hierarchical or non-hierarchical.’ The most frequently applied classification procedures are hierarchical and disjoint, and they assign each object to exactly one class. The best-known hierarchical procedures are agglomerative: in a series of partitions, they successively fuse the objects into classes at increasing levels of dissimilarity. Each step provides a set of classes, from which the researcher has to choose. Once a fusion is made, it is irrevocable, so the early fusions should be very reliable. Additive clustering (Shepard and Arabie 1979), by contrast, allows objects to be members of any number of classes. Here, the classes might be interpreted as properties (Lee 1999).
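The agglomerative logic can be sketched in a few lines. The text does not prescribe a fusion rule, so single linkage (the distance between two classes is the distance between their closest members) is assumed here as one common choice. The four objects, placed at hypothetical positions 0, 1, 5 and 6 on a line, are for illustration only.

```python
def single_linkage(dist):
    """Agglomerative clustering with single linkage.

    `dist` is a symmetric dissimilarity matrix. Returns the fusions as
    (level, merged_cluster) pairs in the order they occur; fusions are never undone.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: dissimilarity of the two closest members
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        history.append((d, merged))
    return history

positions = [0, 1, 5, 6]  # hypothetical one-dimensional data
dist = [[abs(p - q) for q in positions] for p in positions]
for level, cluster in single_linkage(dist):
    print(level, sorted(cluster))
# 1 [0, 1]
# 1 [2, 3]
# 4 [0, 1, 2, 3]
```

Each line of output corresponds to one partition in the series. Cutting the sequence after the second fusion yields the two classes {0, 1} and {2, 3}. Other fusion rules (complete or average linkage, for instance) can produce different hierarchies from the same data.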
The cluster concept treated thus far is based on similarity, formally represented either in a space or by set theory. A close relative is prototype theory, popular in cognitive research. A prototype can be defined as a vector of values of selected properties; usually a list of cases serving as exemplars of this prototype is also available. One fundamental assumption of the prototype-oriented approach can be formulated as follows: If there is high similarity among a set of patterns, these patterns are also similar to an observed or inferred prototypical pattern. An inferred pattern could, for example, be the vector of mean values, which will typically show high similarity to every pattern in the set. The idea of inferring the prototypical pattern from the data forms a bridge to the similarity-based conception. But the researcher has to be more active in abstracting and defining a specific instance as the prototype.
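An inferred prototype of the kind just described can be sketched as the vector of mean values of each class’s exemplars, with new cases assigned by closeness to the prototypes. The exemplar values are hypothetical, and squared Euclidean distance is assumed as the measure of closeness.

```python
def mean_prototype(patterns):
    """Infer a prototype as the vector of mean values of the exemplars."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]

def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def closest_prototype(x, prototypes):
    """Assign x to the class whose prototype is closest (smallest distance)."""
    return min(prototypes, key=lambda label: squared_distance(x, prototypes[label]))

# Hypothetical exemplars (two properties each) for two classes.
prototypes = {
    "A": mean_prototype([[1, 2], [3, 2], [2, 5]]),
    "B": mean_prototype([[8, 8], [9, 7], [7, 9]]),
}
print(prototypes)                             # {'A': [2.0, 3.0], 'B': [8.0, 8.0]}
print(closest_prototype([3, 3], prototypes))  # A
```

Note that the inferred prototype [2.0, 3.0] is not itself one of the exemplars. This is the sense in which the prototype can be a ‘hypothetical mean object’.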
Contingencies of higher order than similarities between the properties are exploited in some other generalizations of the concept of similarity-based clustering, such as Configural Frequency Analysis (Krauth and Lienert 1973) and Pattern-Analytic Clustering (McQuitty 1987). For example, Configural Frequency Analysis identifies combinations of properties that occur more often than expected from some specified base model.
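The core comparison in Configural Frequency Analysis can be sketched for binary properties, assuming total independence of the properties as the base model. This sketch is descriptive only: it lists configurations whose observed frequency exceeds the expected one, whereas CFA proper declares ‘types’ only after a significance test (for example a binomial or chi-square test). The ten cases are hypothetical.

```python
from collections import Counter
from itertools import product

def configural_types(cases):
    """Compare observed configuration frequencies with those expected if the
    binary properties were independent; return over-represented configurations
    as {configuration: (observed, expected)}."""
    n = len(cases)
    k = len(cases[0])
    observed = Counter(tuple(c) for c in cases)
    p_one = [sum(c[i] for c in cases) / n for i in range(k)]  # marginal P(present)
    result = {}
    for config in product((0, 1), repeat=k):
        expected = n
        for value, p in zip(config, p_one):
            expected *= p if value == 1 else 1 - p
        if observed[config] > expected:
            result[config] = (observed[config], expected)
    return result

# Hypothetical data: two properties that tend to occur together or not at all.
cases = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0), (0, 1)]
print(configural_types(cases))  # {(0, 0): (4, 2.5), (1, 1): (4, 2.5)}
```

Both properties are present in half of the cases, so independence predicts each of the four configurations 2.5 times. The two concordant configurations occur more often than that, which is exactly the kind of higher-order contingency that similarity coefficients between pairs of objects do not capture.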
A recent and strengthening trend is to use mixture models for clustering. These methods were originally intended to base classification on a model that permits statistical inference, but they have since found wider uses. The basic idea can be illustrated with the following example: Assume that a sample of measurements of body height is drawn from a human population. While it is known that every case is either male or female, gender is not recorded for individual respondents. It is nevertheless possible, based on the distribution of heights in the total sample, to estimate the parameters of the separate height distributions for men and for women. This is done by treating the height distribution in the total sample as a weighted sum of the height distributions for women and for men, where the weights are the probabilities that a measurement comes from a woman or from a man. ‘Thus the density function of height has been expressed as a superposition of two conditional density functions; it is known as a finite mixture density.’ (Everitt 1993, p. 110).
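One common way to estimate such a finite mixture is the EM (expectation-maximization) algorithm. The sketch below assumes two normal components and uses simulated heights; the means of 163 and 180 cm, the standard deviation of 6 cm, and the starting values are arbitrary choices for illustration. Gender is never given to the algorithm. It only recovers two components, and labeling the lower one ‘women’ is an interpretation.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_two_normal_mixture(xs, mu=(160.0, 180.0), sigma=(10.0, 10.0), weight=0.5, iters=200):
    """Fit a two-component normal mixture by EM.

    weight is the mixing proportion of component 0; returns (weight, mu, sigma, resp),
    where resp[i] is the estimated probability that xs[i] belongs to component 0.
    """
    mu, sigma = list(mu), list(sigma)
    for _ in range(iters):
        # E-step: posterior probability of component 0 for every measurement
        resp = []
        for x in xs:
            p0 = weight * normal_pdf(x, mu[0], sigma[0])
            p1 = (1 - weight) * normal_pdf(x, mu[1], sigma[1])
            resp.append(p0 / (p0 + p1))
        # M-step: re-estimate mixing proportion, means and standard deviations
        n0 = sum(resp)
        n1 = len(xs) - n0
        weight = n0 / len(xs)
        mu[0] = sum(r * x for r, x in zip(resp, xs)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, xs)) / n1
        sigma[0] = math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0)
        sigma[1] = math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1)
    return weight, mu, sigma, resp

# Simulated heights (cm) from two groups whose membership is then "forgotten".
rng = random.Random(1)
heights = [rng.gauss(163, 6) for _ in range(500)] + [rng.gauss(180, 6) for _ in range(500)]
weight, mu, sigma, resp = fit_two_normal_mixture(heights)
print(round(weight, 2), [round(m, 1) for m in mu], [round(s, 1) for s in sigma])
```

The returned probabilities `resp` are what turn the density model into a classification: each measurement can be assigned to the component with the higher posterior probability, or left unassigned if neither probability is convincing.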
Mixture models are based on a ‘space’ concept rather than a ‘similarity’ concept; clusters are regions of relative point densities in this space. The assumptions for mixture models are comparable with those of the general linear model: cardinal scale level and multivariate normal (or similar) distributions of the data. A comparatively common mixture model for categorical data is latent class analysis (De Soete 1993).
To conclude this classification of class concepts, one further concept needs to be mentioned. This conception, models for block structure, is close to the raw data matrix and the Aristotelian tradition. A block is a maximal rectangular submatrix combining some objects and some properties with the same (or similar) values in the cells of the data matrix. The scale level of the values is not fixed; and the similarity concept is not invoked in the analytical procedure. In a block, the set of partially similar objects corresponds to the extension of a concept or class. The set of partially similar properties corresponds to the intension. The symmetry in the definitions of intension and extension is fully exploited and preserved (see Feger and De Boeck 1993).
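Whether a given set of objects and properties forms a block can be checked directly on the data matrix. The sketch assumes exact equality of cell values (not the ‘similar values’ variant) and, unlike the example orientation given earlier, puts objects in rows and properties in columns. The small matrix is hypothetical. Because every submatrix of a constant submatrix is constant, it is enough to test whether any single row or column could be added.

```python
def is_block(matrix, rows, cols):
    """True if the submatrix given by `rows` x `cols` has one constant value."""
    values = {matrix[r][c] for r in rows for c in cols}
    return len(values) == 1

def is_maximal_block(matrix, rows, cols):
    """A block is maximal if no further object (row) or property (column)
    can be added without breaking constancy."""
    if not is_block(matrix, rows, cols):
        return False
    extra_rows = [r for r in range(len(matrix)) if r not in rows]
    extra_cols = [c for c in range(len(matrix[0])) if c not in cols]
    grows_by_row = any(is_block(matrix, list(rows) + [r], cols) for r in extra_rows)
    grows_by_col = any(is_block(matrix, rows, list(cols) + [c]) for c in extra_cols)
    return not (grows_by_row or grows_by_col)

# Hypothetical data matrix: 3 objects (rows) x 4 properties (columns).
data = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
print(is_maximal_block(data, [0, 1], [0, 1]))  # True: objects {0,1} x properties {0,1}
print(is_maximal_block(data, [0], [0]))        # False: object 1 could be added
```

In a maximal block, the listed rows play the role of the extension and the listed columns the role of the intension, so both are found in one step rather than via a similarity matrix.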
4.2 Evaluation Of A Clustering Result
Although model evaluation is only a part of the overall evaluation of a classification (see Sect. 5), it is an important one. As Dunn and Everitt (1982, p. 94) state: ‘Since clustering techniques will generate a set of clusters even when applied to random, unclustered data, the question of validating and evaluating becomes of great importance.’ Jain and Dubes (1988) classify the criteria of validation as follows:
External criteria measure performance by matching a clustering structure to a priori information…. Internal criteria assess the fit between the structure and the data, using only the data themselves…. Relative criteria decide which of two structures is better in some sense, such as being more stable or appropriate for the data.
Considerable progress has been made in internal statistical cluster evaluation. Statistical procedures exist for testing the existence of ‘natural’ clusters, for testing the adequacy of computed classifications, and for the determination of a suitable number of clusters (see, e.g., Bock 1996). A very plausible way to evaluate any solution, independent of the clustering approach used, is to reproduce, or ‘derive,’ from the solution all information that the solution gives about raw data that would fit with the solution, and then to compare this information with the actual raw data.
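For a partition of binary or categorical data, the ‘reproduce and compare’ idea can be sketched as follows. The solution predicts each object’s profile as the modal value of its class on every property, and the share of data cells reproduced correctly is reported. This is one simple instance chosen for illustration, not a procedure from the cited literature. The data and both candidate partitions are hypothetical.

```python
from collections import Counter

def reproduction_rate(data, labels):
    """Share of data-matrix cells reproduced by a partition, when each object's
    profile is predicted by the modal value of its class on every property."""
    classes = {}
    for row, lab in zip(data, labels):
        classes.setdefault(lab, []).append(row)
    modal = {
        lab: [Counter(col).most_common(1)[0][0] for col in zip(*rows)]
        for lab, rows in classes.items()
    }
    hits = sum(
        v == m
        for row, lab in zip(data, labels)
        for v, m in zip(row, modal[lab])
    )
    return hits / (len(data) * len(data[0]))

data = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
print(round(reproduction_rate(data, ["A", "A", "B", "B"]), 3))  # 0.833
print(round(reproduction_rate(data, ["A", "B", "A", "B"]), 3))  # 0.667
```

The first partition reproduces more of the raw data than the second. Such rates should only be compared for solutions with the same number of classes, since putting each object in its own class trivially reproduces every cell.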
4.3 Procedures To Assign Cases To Classes
Procedures to assign single cases to classes are needed for two purposes. One is to assign newly observed cases to the classes of an existing classification. The other is to evaluate a classification by taking ‘old’ cases from the original sample on which the classification was based and checking which class they would be assigned to. In both cases, the question is: Into which class should the case be placed? In practice, experts (e.g., physicians) are often consulted to answer this question. In other cases, numerical procedures (‘automatic classification’) are used. Here, the properties may be used either sequentially, as in a diagnostic key, or simultaneously, through some type of matching between the case and the existing classes (see Dunn and Everitt 1982, especially on diagnostic keys). Identification with certainty is frequently impossible, ‘either because too many characters are variable within taxa or because all assessments of character states are subject to error,’ and therefore ‘probabilistic identification methods are often used’ (Dunn and Everitt 1982, p. 112). Of the probabilistic procedures, the Bayes approach and discriminant analysis are especially well known.
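The Bayes approach can be sketched for binary features. Prior class probabilities are combined with the probability of the observed feature pattern in each class to give posterior probabilities of class membership. The sketch adds a simplifying assumption not made in the text, namely that features are independent within each class (a ‘naive’ Bayes model). The priors and feature probabilities are hypothetical.

```python
def bayes_identify(case, priors, likelihoods):
    """Posterior class probabilities for a case with binary features.

    Assumes features are independent within classes (naive Bayes);
    likelihoods[c][i] is P(feature i present | class c).
    """
    joint = {}
    for c, prior in priors.items():
        p = prior
        for x, q in zip(case, likelihoods[c]):
            p *= q if x == 1 else 1 - q
        joint[c] = p
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}

priors = {"A": 0.7, "B": 0.3}                     # hypothetical base rates
likelihoods = {"A": [0.9, 0.2], "B": [0.3, 0.8]}  # hypothetical feature probabilities
post = bayes_identify([1, 1], priors, likelihoods)
print({c: round(p, 3) for c, p in post.items()})  # {'A': 0.636, 'B': 0.364}
```

The second feature points toward class B, but the higher prior of class A still tips the balance. With probabilities as close as these, a rule might also reasonably postpone the decision rather than assign the case.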
Other placement rules can be used if they are transparent, unambiguous, and do not lead to contradictions. For example, the principle of ‘nearest neighbor’ computes the distances of a new pattern to all existing classes. It assigns the new case to the class to which the distance is shortest (for details and other rules, see Looney 1997). Rules may also include options such as rejecting a case as ‘not classifiable’ or postponing a decision until more information is available. Most rules currently applied are compensatory, but rules could also be disjunctive or conjunctive, requiring at least one value to reach a high amount, or all values to surpass a given minimum (Coombs 1964).
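Several of these placement rules can be sketched together. The nearest-neighbor function below reads ‘distance to a class’ as the Euclidean distance to the class’s closest member, one possible reading, and adds a hypothetical `reject_above` threshold for the ‘not classifiable’ option. The three small functions contrast compensatory, conjunctive and disjunctive decisions on a single profile of scores. All data and thresholds are invented.

```python
import math

def nearest_neighbour(case, classes, reject_above=None):
    """Assign `case` to the class containing its nearest member; reject the case
    as 'not classifiable' if even that distance exceeds `reject_above`."""
    best_label, best_dist = None, math.inf
    for label, members in classes.items():
        for m in members:
            d = math.dist(case, m)
            if d < best_dist:
                best_label, best_dist = label, d
    if reject_above is not None and best_dist > reject_above:
        return "not classifiable"
    return best_label

def compensatory(scores, threshold):
    return sum(scores) / len(scores) >= threshold  # high values offset low ones

def conjunctive(scores, minimum):
    return all(s >= minimum for s in scores)       # every value must pass

def disjunctive(scores, high):
    return any(s >= high for s in scores)          # one high value suffices

classes = {"A": [[0, 0], [1, 0]], "B": [[5, 5], [6, 5]]}
print(nearest_neighbour([2, 0], classes))                    # A
print(nearest_neighbour([20, 20], classes, reject_above=5))  # not classifiable

scores = [9, 2, 7]
print(compensatory(scores, 5), conjunctive(scores, 5), disjunctive(scores, 8))
# True False True
```

The same profile is accepted under the compensatory and disjunctive rules and rejected under the conjunctive one. This illustrates why the choice of rule has to be stated explicitly.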
Different rules lead to different results, especially if the classes vary in their a priori probabilities, if the distributions and covariances of the variables differ greatly, and if the number of observations is small. The single most important criterion for evaluating an assignment procedure is the number of correct classifications. But the proportion of correct assignments obtained on the cases used to construct the rule (whose complement is the ‘apparent error rate’) is optimistically biased, and it does not take into account the assignments that would be correct by chance. If base rates of class membership are known, the predictions have to perform better than the base rates (Pires and Branco 1997).
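The comparison with base rates can be illustrated with a small computation. The sketch compares the hit rate of a hypothetical assignment rule with the hit rate of always predicting the most frequent class; the true and predicted labels are invented so that the two rates coincide.

```python
from collections import Counter

def accuracy(true, predicted):
    """Proportion of cases assigned to their correct class."""
    return sum(t == p for t, p in zip(true, predicted)) / len(true)

def base_rate(true):
    """Hit rate obtained by always predicting the most frequent class."""
    return Counter(true).most_common(1)[0][1] / len(true)

true = ["A"] * 8 + ["B"] * 2
predicted = ["A"] * 7 + ["B"] + ["A", "B"]   # hypothetical assignments
print(accuracy(true, predicted))  # 0.8
print(base_rate(true))            # 0.8
```

An 80% hit rate sounds respectable, but here it is no better than ignoring the case information altogether and assigning everyone to class A.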
5. Evaluating A Classification
While every step of the classification process can be evaluated (Milligan 1996), two stages have received special attention: the evaluation of class definitions and of the identification procedure. Both have been mentioned previously. There also exist procedures to evaluate the overall performance of a classification system. The main method is ‘cross validation,’ using a new sample of data comparable to the old one, or splitting the original sample randomly into two halves and using one half to evaluate the classification obtained in the other half.
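The split-half idea can be sketched for the identification step. Class means are computed from one random half of the cases, and the sketch then checks how many cases in the other half are assigned back to their own class. Two choices are assumptions for illustration: the split is stratified by class so that every class appears in both halves, and assignment is to the nearest class mean. The data are hypothetical and deliberately well separated.

```python
import random

def stratified_halves(labels, seed=0):
    """Randomly split case indices into two halves, separately within each class."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    first, second = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = len(idx) // 2
        first += idx[:cut]
        second += idx[cut:]
    return first, second

def class_means(cases, labels):
    groups = {}
    for x, lab in zip(cases, labels):
        groups.setdefault(lab, []).append(x)
    return {lab: [sum(col) / len(xs) for col in zip(*xs)] for lab, xs in groups.items()}

def split_half_hit_rate(cases, labels, seed=0):
    """Build class means on one half; report the share of cases in the other
    half that are assigned (by nearest mean) back to their own class."""
    first, second = stratified_halves(labels, seed)
    means = class_means([cases[i] for i in first], [labels[i] for i in first])

    def assign(x):
        return min(means, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, means[lab])))

    hits = sum(assign(cases[i]) == labels[i] for i in second)
    return hits / len(second)

# Hypothetical, well-separated data: 10 cases per class.
cases = [[i * 0.1, 0.0] for i in range(10)] + [[10 + i * 0.1, 10.0] for i in range(10)]
labels = ["A"] * 10 + ["B"] * 10
print(split_half_hit_rate(cases, labels))  # 1.0
```

With real data, the hit rate in the held-out half is typically lower than the hit rate obtained on the cases used to build the rule, and it is the held-out figure that should be reported.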
Another way of evaluating the results of a classification process is by comparing the results of using different classification procedures. Usually the researcher has several choices in the classification process, and is not forced by theory to select just one option. Examples include multiple options about the selection of cases and variables, of similarity coefficients, of clustering models and of identification rules. With computers, it is easy to try several combinations of such choices. Confidence that the classification captures substantial information in the data grows with the amount of agreement in the results from different combinations of choice options (for the evaluation and comparison of solutions, see Everitt 1993). To aid the interpretation of the resulting class structure, Milligan (1996, p. 346) suggests deliberately adding ‘ideal types,’ that is, characteristic patterns constructed by the researcher, to the data, and to assess what clusters these patterns are assigned to.
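Agreement between the partitions produced by different combinations of choices can be quantified with the Rand index. This is the proportion of object pairs that two partitions treat alike, either both placing the pair in the same class or both separating it. The partitions below are hypothetical. Because the index only looks at pairs, the arbitrary class labels do not matter.

```python
from itertools import combinations

def rand_index(p, q):
    """Proportion of object pairs on which two partitions agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(p)), 2))
    agree = sum((p[i] == p[j]) == (q[i] == q[j]) for i, j in pairs)
    return agree / len(pairs)

p = [0, 0, 1, 1]
print(rand_index(p, [5, 5, 7, 7]))  # 1.0  (same partition, different labels)
print(rand_index(p, [0, 1, 0, 1]))  # 0.3333333333333333
```

The plain Rand index is not corrected for chance agreement. An adjusted version that subtracts the agreement expected by chance is often preferred when comparing solutions.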
Can a classification be wrong? In most cases, a classification is just a systematic description and, as such, may or may not be useful. New observations may require changes in the classification. But if there is a theory about the definitions of the classes, and the theory is strong enough to allow specific predictions, then these predictions can be falsified, which may lead to revisions of the classification system.
6. Conclusions
In the past more than in the present, opinions fundamentally critical of the possibility of classification in the social sciences have been expressed. For example, Galt and Smith (1976, p. 58) stated: ‘Because they usually lack measurable dimensions, social entities are difficult to classify, and any given system of classification will inevitably be arbitrary.’ Numerical classification does indeed require measurement or, more generally, the interpretation of observations as variables. Every variable is itself a classification, defined as a set of mutually exclusive and jointly exhaustive classes, and the ‘categories’ of a variable are referred to as its values. Where variables in this formal sense are not available, one might consider a heuristic equivalent in the abstraction from, and ordering of, observations: the ‘ideal type,’ as introduced by Max Weber.
Variables used to establish a classification may be discrete or continuous. For a classification to be justified, the frequency or density distributions of the properties should not be uniform but should show one or several peaks. When the joint distribution of more than one variable is considered, some combinations of values may then be more frequent than others, perhaps even more frequent than would be expected from the marginal distributions alone. This is one of the fundamental phenomena enabling the formal definition of classes.
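The last comparison, observed frequencies of value combinations against those expected from the marginals, can be made concrete for categorical variables. The expected count of a combination is N times the product of the marginal proportions of its values, which is the count expected if the variables were independent. This is the basic idea behind configural frequency analysis (Krauth and Lienert 1973). The sketch below omits that method's significance tests. The function name and the data are hypothetical.

```python
from collections import Counter
from math import prod


def configuration_table(records):
    """Map each observed combination of values to (observed, expected) counts.

    `expected` assumes independence: N times the product of the marginal
    proportions of the combination's values. Combinations that never occur
    (observed count 0) are not listed by this sketch.
    """
    n = len(records)
    n_vars = len(records[0])
    observed = Counter(map(tuple, records))
    # One frequency table per variable (the marginal distributions).
    marginals = [Counter(record[k] for record in records) for k in range(n_vars)]
    table = {}
    for config, count in observed.items():
        expected = n * prod(marginals[k][value] / n for k, value in enumerate(config))
        table[config] = (count, expected)
    return table


# Hypothetical data: two dichotomous properties of ten urban areas.
areas = (
    [("high", "low")] * 4
    + [("low", "high")] * 4
    + [("high", "high")] * 1
    + [("low", "low")] * 1
)

for config, (obs, exp) in sorted(configuration_table(areas).items()):
    flag = "more often than expected" if obs > exp else "as or less often than expected"
    print(config, obs, exp, flag)
```

In this toy example every combination has an expected count of 2.5. The combinations ("high", "low") and ("low", "high") each occur 4 times, so they stand out as candidate classes.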
While classification is a ‘process,’ the temporary result is a ‘structure.’ Dynamic aspects, such as the development of a property and other trends and changes, might be included in the definitions of variables. This does not, however, make a formal classification a process model. In this sense, classification is static. It temporarily fixes the—sometimes turbulent—streams of information. Changing a classification means the (re-)interpretation of some substantive area.
Bibliography:
- Arabie P, Hubert L J, De Soete G (eds.) 1996 Clustering and Classification. World Scientific, Singapore
- Baulieau F B 1999 Two variant axiom systems for presence/absence based dissimilarity coefficients. Journal of Classification 14: 159–70
- Biggs N L 1999 Discrete Mathematics. rev. edn. Clarendon Press, Oxford, UK
- Bock H-H 1996 Probability models and hypothesis testing in partitioning cluster analysis. In: Arabie P, Hubert L J, De Soete G (eds.) Clustering and Classification. World Scientific, Singapore, pp. 377–453
- Coombs C H 1964 A Theory of Data. Wiley, New York
- Daws J T 1996 The analysis of free-sorting data: Beyond pairwise cooccurrences. Journal of Classification 13: 57–80
- De Soete G 1993 Using latent class analysis in categorization research. In: van Mechelen I, Hampton J, Michalski R S, Theuns P (eds.) Categories and Concepts. Academic Press, London, pp. 309–30
- Dunn G, Everitt B S 1982 An Introduction to Mathematical Taxonomy. Cambridge University Press, Cambridge, UK
- Everitt B S 1993 Cluster Analysis. 3rd edn. Edward Arnold, London
- Feger H 1994 Structure Analysis of Co-occurrence Data. Shaker, Aachen, Germany
- Feger H, De Boeck P 1993 Categories and concepts: Introduction to data analysis. In: van Mechelen I, Hampton J, Michalski R S, Theuns P (eds.) Categories and Concepts. Academic Press, London, pp. 203–23
- Galt A H, Smith L J 1976 Models and the Study of Social Change. Wiley, New York
- Gower J C, Legendre P 1986 Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification 3: 5–48
- Gyllenberg M, Koski T 1996 Numerical taxonomy and the principle of maximum entropy. Journal of Classification 13: 213–29
- Hartigan J A 1982 Classification. In: Kotz S, Johnson N L, Read C B (eds.) Encyclopedia of Statistical Sciences. Wiley, New York, Vol. 2, pp. 1–10
- Jaccard P 1908 Nouvelles recherches sur la distribution florale. Bulletin de la Société Vaudoise des Sciences Naturelles 44: 223–70
- Jain A K, Dubes R C 1988 Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs, NJ
- Krathwohl D R, Bloom B S, Masia B B 1964 Taxonomy of Educational Objectives. Longman, London
- Krauth J, Lienert G A 1973 KFA—Die Konfigurationsfrequenzanalyse. Alber, Freiburg, Germany
- Lee M D 1999 An extraction and regularization approach to additive clustering. Journal of Classification 16: 255–81
- Looney C G 1997 Pattern Recognition Using Neural Networks. Oxford University Press, New York
- McQuitty L L 1987 Pattern-Analytic Clustering: Theory, Method, Research and Configural Findings. University Press of America, New York
- Milligan G W 1996 Clustering validation: Results and implications for applied analyses. In: Arabie P, Hubert L J, De Soete G (eds.) Clustering and Classification. World Scientific, Singapore, pp. 341–75
- Pankhurst R J 1991 Practical Taxonomic Computing. Cambridge University Press, Cambridge, UK
- Pires A M, Branco J A 1997 Comparison of multinomial classification rules. Journal of Classification 14: 137–45
- Shepard R N, Arabie P 1979 Additive clustering representations of similarities as combinations of discrete overlapping properties. Psychological Review 86: 87–123
- Sokal R R, Sneath P H A 1963 Principles of Numerical Taxonomy. Freeman, San Francisco
- Sutcliffe J P 1993 Concept, class, and category in the tradition of Aristotle. In: van Mechelen I, Hampton J, Michalski R S, Theuns P (eds.) Categories and Concepts. Academic Press, London, pp. 35–65