Statistical Sufficiency Research Paper


In the language of statistical theory, a statistic is a function of a set of data, and a sufficient statistic contains as much information about the statistical model as the original set of data. Statistical sufficiency has served as a powerful concept in the theory of inference in helping to clarify the role of models and data, and in providing a framework for data reduction.



1. Introduction

Statistical sufficiency is a concept in the theory of statistical inference that is meant to capture an intuitive notion of summarizing a large and possibly complex set of data by relatively few summary numbers that carry the relevant information in the larger data set. For example, a public opinion poll might sample one or two thousand people, but will typically report simply the number of respondents and the percentage of respondents in each of a small number of categories. A study to investigate the effectiveness of a new treatment for weight loss may well report simply the average weight loss of a group of patients receiving the new treatment. This study would usually report as well some measure of the range of weight loss observed; variability in this setting would be a crucial piece of information for evaluating the effectiveness of the new treatment.

In order to make precise the vague notion of an informative data summary, it is necessary to have a means of defining what we mean by relevant information. This in turn is highly dependent on the use we wish to make of the data. The formal mechanism for doing this is to use a statistical model to represent a (usually idealized) version of the problem we are studying. Statistical sufficiency and information are defined relative to this model. A statistical model is a family of probability distributions, the central problem of statistical inference being to identify which member of the family generated the data currently of interest. The basic concepts of statistical models and sufficiency were set out in Fisher (1922, 1925) and continue to play a major role in the theory of inference.

For example, we might in an idealized version of the weight-loss problem assume that the potential weight loss for any particular patient can be described by a model in the family of normal distributions. This specifies a mathematical formula for the probability that the weight loss of an individual patient falls in any possible interval, and the mathematical formula depends on two unknown parameters, the mean and standard deviation. If we are willing to further assume that the same statistical model applies to all the potential recipients of the weight-loss treatment, then the data collected in a careful study will enable us to make an inference on the two parameters that depends only on two data summaries: the average weight loss in the data and the standard deviation of the weight loss in the data.

The normal distribution is also referred to as the ‘bell curve,’ but there is in fact a family of bell curves, each differing from the others by a simple change of mean, which fixes the location of the bell curve, and standard deviation, which fixes the scale of the bell curve. Standardized tests for measuring IQ are designed so that the scores follow a bell-curve model with parameter values for the mean and standard deviation of 100 and 15, respectively. The parameters for the potential weight loss under our hypothetical new treatment are unknown, and the role of the data collected on the treatment is to shed some light on their likely values. Should the weight-loss bell curve turn out to be centered at or near zero, one would be wise not to invest too heavily in the stock of the company marketing the treatment just yet. As long as we are willing to accept the bell-curve model for the weight-loss data, all the information about that model is contained in two numbers: the average and the standard deviation of the weight loss recorded in our data set.
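To make the bell-curve calculation concrete, the following minimal sketch (assuming Python with scipy is available, and using the IQ parameters quoted above; the particular intervals are illustrative choices, not from the original text) computes the probability that a score falls in a given interval.

```python
# A minimal sketch of interval probabilities under a bell-curve model,
# using the IQ example: mean 100, standard deviation 15.
from scipy.stats import norm

iq = norm(loc=100, scale=15)  # the normal distribution N(100, 15^2)

# Probability a score falls within one standard deviation of the mean.
p_middle = iq.cdf(115) - iq.cdf(85)
print(f"P(85 < IQ < 115) = {p_middle:.3f}")  # about 0.683

# Probability of a score more than two standard deviations above the mean.
print(f"P(IQ > 130) = {iq.sf(130):.3f}")     # about 0.023
```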

An idealized model that lies behind most media reporting of polls is a Bernoulli or binomial model, in which we assume that each potential respondent chooses one of two categories with the same fixed but unknown probability, independently of the other respondents. In such an idealized version of the problem, complete information about this unknown probability is contained in the number of responses and the proportion of responses in each of the two categories.
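The claim that these two summaries carry all the model-relevant information can be checked directly. In the sketch below (assuming Python with numpy; the sample size, true probability, and seed are arbitrary illustrative choices), the Bernoulli log-likelihood computed from the full response vector coincides with the version computed from n and the number of ‘yes’ answers alone.

```python
# A sketch of the binomial idealization of a poll: the log-likelihood of
# the full 0/1 response vector depends on the data only through n and the
# number of 'yes' answers, so those two numbers carry all the information.
import numpy as np

rng = np.random.default_rng(0)
responses = rng.binomial(1, 0.55, size=1000)   # hypothetical raw poll data

def loglik(theta, y):
    # log f(y; theta) = sum_i [ y_i log(theta) + (1 - y_i) log(1 - theta) ]
    return np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))

def loglik_from_summary(theta, n, t):
    # The same quantity computed from the summary (n, t = sum of y_i).
    return t * np.log(theta) + (n - t) * np.log(1 - theta)

theta = 0.5
print(loglik(theta, responses))
print(loglik_from_summary(theta, len(responses), responses.sum()))  # identical
```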

These two examples are by construction highly idealized, and it is easy to think of a number of complexities that have been overlooked. An educated reader of a report on the weight-loss study would want to know how the patients were selected to receive the new treatment, whether the patients were in any way representative of a population of interest to that reader, whether any comparison was made with alternative, possibly conventional, treatments, the length of time over which the study was conducted, and so on. Public opinion polls typically offer more than two categories of response, even to a ‘yes/no’ question, including something equivalent to ‘don’t know’ or ‘won’t say,’ and also often break down the results by gender, geographic area, or some other classification. Randomized selection of respondents by computerized telephone polls is very different from a ‘reader response’ survey of the kind often conducted by magazines and media outlets. These and a wide variety of other issues are studied in the fields of experimental design and sample survey design, by statisticians and researchers in a variety of subject matter areas, from economics to geophysics. See also Observational Studies: Overview and Experimental Design: Overview.

Statistical sufficiency is concerned with a much narrower issue: formalizing the notion that, for a given family of models, a complex set of data can be summarized by a smaller collection of numbers that leads to the same statistical inference. The following sections describe statistical sufficiency with reference to families of models and related concepts in the theory of statistics. Section 2 provides the main definitions and examples, Sect. 3 some related and more advanced concepts, and Sect. 4 a brief conclusion.

2. Statistical Models And Sufficiency

2.1 Basic Notation

We describe a statistical model by a random variable Y and a family F of probability distributions for Y. A random variable is a variable that takes values on a sample space, and a probability distribution describes the probability of observing a particular value or range of values in the sample space. In a great many applications the sample space is either discrete, taking values on a countable set such as (some subset of) the nonnegative integers, or continuous, taking values on (some subset of) the real line R or p-dimensional Euclidean space Rᵖ. A typical data set will be a number of realizations of Y, and the goal is to use these observations to deduce which member of F generated them. The members of F are often conveniently indexed by the probability mass function or probability density function f(y), where

$$f(y) = \operatorname{Prob}(Y = y) \qquad (1)$$

if the sample space is discrete, and

$$f(y) = \frac{d}{dy}\operatorname{Prob}(Y \le y) \qquad (2)$$

if the sample space is continuous. If Y is a vector then Prob(Y ≤ y) is defined component-wise, i.e. Prob(Y₁ ≤ y₁,…, Yₚ ≤ yₚ).

Very often the mathematical form of the probability distributions in F is determined up to a finite number of unknown constants called parameters, so we write Fθ = {f(·; θ)}, or more precisely Fθ = {f(·; θ); θ ∈ Θ}, where Θ is a set of possible values for θ, such as the real line R, the space Rᵖ of real-valued vectors of fixed length p, the interval [0, 1], and so on. The problem of using data to make an inference about the probability distribution is now the problem of making an inference about the parameter θ, and is referred to as parametric inference. Sufficiency also plays a role in nonparametric inference, as we illustrate at the end of this section.

We assume in what follows that the random variable Y = (Y₁,…, Yₙ) is a vector of length n with observed value y = (y₁,…, yₙ). A statistic is a function T = t(Y) whose distribution can be computed from the distribution of Y. Examples of statistics include the mean, T₁ = n⁻¹ΣYᵢ, the variance, T₂ = (n − 1)⁻¹Σ(Yᵢ − T₁)², the standard deviation, T₃ = √T₂, the range, T₄ = max Yᵢ − min Yᵢ, and so on. The probability distribution of any statistic is obtained from the probability distribution for Y by a mathematical calculation.
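As a small illustration, the following sketch (assuming Python with numpy; the simulated sample is an arbitrary choice) computes the statistics T₁–T₄ just defined.

```python
# Computing the example statistics T1-T4 on a simulated sample.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=50)   # hypothetical data

n = len(y)
t1 = y.mean()                                  # T1: sample mean
t2 = ((y - t1) ** 2).sum() / (n - 1)           # T2: sample variance
t3 = np.sqrt(t2)                               # T3: standard deviation
t4 = y.max() - y.min()                         # T4: range

print(t1, t2, t3, t4)
```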

2.2 Sufficiency And Minimal Sufficiency

A statistic T = t(Y) is sufficient for θ in the family of models Fθ = {f(y; θ); θ ∈ Θ} if and only if its conditional distribution

$$f_{Y \mid T}(y \mid t; \theta) = \frac{f_Y(y; \theta)}{f_T(t; \theta)} \qquad (3)$$

does not depend on θ.

Example 1. Suppose Y₁,…, Yₙ are independent and identically distributed with probability mass function

$$f(y_i; \theta) = \theta^{y_i}(1 - \theta)^{1 - y_i}, \qquad y_i = 0, 1; \quad 0 < \theta < 1 \qquad (4)$$

which can be used to model the number of successes in n independent Bernoulli trials, when the probability of success is θ. The joint density for Y = (Y₁,…, Yₙ) is

$$f(y; \theta) = \prod_{i=1}^{n} \theta^{y_i}(1 - \theta)^{1 - y_i} = \theta^{\sum y_i}(1 - \theta)^{n - \sum y_i} \qquad (5)$$

and the marginal density for T = ΣYᵢ is

$$f_T(t; \theta) = c(t, n)\, \theta^{t}(1 - \theta)^{n - t}, \qquad t = 0, 1, \ldots, n \qquad (6)$$

where c(t, n) = n!/{t!(n − t)!} is the number of vectors y of length n with t ones and n − t zeroes. The conditional density of Y, given T, is the ratio of (5) and (6) and is thus free of θ.
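This conditional-distribution property can be checked by simulation. The sketch below (assuming Python with numpy; the sample size, values of θ, and number of replications are illustrative choices) conditions simulated Bernoulli vectors on T = t and shows that each arrangement of ones and zeroes appears with roughly equal frequency, whatever the value of θ.

```python
# A simulation sketch of Example 1: conditioning on T = sum(Y_i) makes
# the distribution over arrangements of 0s and 1s uniform, whatever theta is.
import numpy as np
from collections import Counter

def conditional_freqs(theta, n=4, t=2, reps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.binomial(1, theta, size=(reps, n))
    keep = y[y.sum(axis=1) == t]               # condition on T = t
    return Counter(map(tuple, keep))

for theta in (0.3, 0.8):
    freqs = conditional_freqs(theta)
    total = sum(freqs.values())
    # Each of the C(4,2) = 6 arrangements should appear with relative
    # frequency ~ 1/6, independently of theta.
    print(theta, {k: round(v / total, 3) for k, v in sorted(freqs.items())})
```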

The intuitive notion that the definition is meant to encapsulate is that once the sufficient statistic T is observed, i.e. once we condition on T, no further information about θ is available from the original data Y.

The definition of sufficiency in terms of the conditional distribution is not very convenient to work with, especially in the case of continuous sample spaces where some care is needed in the definition of a conditional distribution. The following result is more helpful.

Factorization theorem. A statistic T = t(Y) is sufficient for θ in the family of models Fθ = {f(y; θ); θ ∈ Θ} if and only if there exist functions g(t; θ) and h(y) such that for all θ ∈ Θ

$$f(y; \theta) = g\{t(y); \theta\}\, h(y) \qquad (7)$$

Example 2. Let Y₁,…, Yₙ be independent, identically distributed from the normal distribution with mean µ and standard deviation σ:

$$f(y_i; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\left\{-\frac{(y_i - \mu)^2}{2\sigma^2}\right\} \qquad (8)$$

The joint density of Y = (Y1,…, Yn) is

$$f(y; \theta) = \sigma^{-n} \exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{n}(y_i - \bar{y})^2 + n(\bar{y} - \mu)^2\right\}\right] (2\pi)^{-n/2} \qquad (9)$$

where ȳ = n⁻¹Σyᵢ is the sample mean, θ = (µ, σ), and Θ = R × R⁺. From (9) we see that T = {Ȳ, Σ(Yᵢ − Ȳ)²} is sufficient for θ, with g(t; θ) identified with all but the final factor in (9).
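The factorization can be verified numerically. In the sketch below (assuming Python with numpy; the simulated data and the evaluation point for (µ, σ) are arbitrary choices), the normal log-likelihood computed from the full sample agrees with the version computed from the sufficient statistic (ȳ, Σ(yᵢ − ȳ)²) alone.

```python
# A numerical check of the factorization in Example 2: the normal
# log-likelihood from the full sample equals the version computed from
# the two summaries (ybar, ss) alone.
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(3.0, 1.5, size=30)
n, ybar, ss = len(y), y.mean(), ((y - y.mean()) ** 2).sum()

def loglik_full(mu, sigma, y):
    return np.sum(-np.log(sigma) - (y - mu) ** 2 / (2 * sigma ** 2)) \
        - 0.5 * len(y) * np.log(2 * np.pi)

def loglik_sufficient(mu, sigma, n, ybar, ss):
    # log g(t; theta) + log h(y), using only the sufficient statistic.
    return -n * np.log(sigma) - (ss + n * (ybar - mu) ** 2) / (2 * sigma ** 2) \
        - 0.5 * n * np.log(2 * np.pi)

print(loglik_full(2.5, 1.0, y))
print(loglik_sufficient(2.5, 1.0, n, ybar, ss))   # agrees to rounding error
```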

Strictly speaking, there are a number of other sufficient statistics in both the examples above. In Example 2, (Σ₁⁴Yᵢ, Σ₅ⁿYᵢ, Σ₁⁷(Yᵢ − Ȳ)², Σ₈ⁿ(Yᵢ − Ȳ)²) is also sufficient, as is, trivially, the original vector Y itself. Clearly (Ȳ, Σ₁ⁿ(Yᵢ − Ȳ)²) is in some sense ‘smaller’ than these other sufficient statistics, and is to be preferred for that reason. A minimal sufficient statistic is defined to be a function of every other sufficient statistic. One-to-one functions of a minimal sufficient statistic are also minimal sufficient and are not distinguished from it, so in Example 2 (Σ₁ⁿYᵢ, Σ₁ⁿYᵢ²) is also minimal sufficient. Any statistic defines a partition of the sample space, in which all sample points leading to the same value of the statistic fall in the same element of the partition, and the minimal sufficient statistic defines the coarsest partition that retains sufficiency.

Since the central problem of parametric statistical inference is to reason from the data back to the parameter θ, extensive use is made of the likelihood function, which is (proportional to) the joint density of the data, regarded as a function of θ:

$$L(\theta; y) \propto f(y; \theta) \qquad (10)$$

The factorization theorem suggests, and it can be proved, that the minimal sufficient statistic in the family of models Fθ is the likelihood statistic L(·; Y). This result is easily derived from the factorization theorem if Θ is a finite set, but extending it to cases of more practical interest involves some mathematical intricacies (Barndorff-Nielsen 1978, Chap. 4).

Example 3. To illustrate the concept of sufficiency in a nonparametric setting, suppose that Y₁,…, Yₙ are independent, identically distributed on R with probability density function f; i.e. F consists of all probability distributions which have densities with respect to Lebesgue measure. Denote by Y₍·₎ the vector of ordered values Y₍·₎ = (Y₍₁₎,…, Y₍ₙ₎), where Y₍₁₎ ≤ Y₍₂₎ ≤ … ≤ Y₍ₙ₎. Then Y₍·₎ is sufficient for F, as

$$f_{Y \mid Y_{(\cdot)}}(y \mid y_{(\cdot)}) = \frac{1}{n!} \qquad (11)$$

is the same for all members of F.

Example 4. Let Y₁,…, Yₙ be independent, identically distributed according to the uniform distribution on (θ − 1, θ + 1), where θ ∈ R. The order statistic Y₍·₎ is sufficient, as this is a special case of Example 3 above, but it is not minimal sufficient. The minimal sufficient statistic is determined from the likelihood function:

$$L(\theta; y) = 2^{-n}\, \mathbf{1}\{y_{(n)} - 1 < \theta < y_{(1)} + 1\} \qquad (12)$$

from which we see that the minimal sufficient statistic is T = (Y₍₁₎, Y₍ₙ₎), as this is a one-to-one function of the likelihood statistic.
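A short sketch (assuming Python with numpy; the true θ and the sample size are illustrative choices) makes the point explicit: the likelihood in (12) can be evaluated from (y₍₁₎, y₍ₙ₎) alone.

```python
# A sketch of Example 4: the likelihood for the uniform(theta-1, theta+1)
# model is 2^{-n} on the interval (y_max - 1, y_min + 1) and zero elsewhere,
# so it depends on the data only through (y_min, y_max).
import numpy as np

rng = np.random.default_rng(3)
theta_true = 4.0
y = rng.uniform(theta_true - 1, theta_true + 1, size=20)
y_min, y_max = y.min(), y.max()

def likelihood(theta, y_min, y_max, n):
    return 2.0 ** (-n) if (y_max - 1 < theta < y_min + 1) else 0.0

for theta in (3.2, 4.0, 4.8, 5.5):
    print(theta, likelihood(theta, y_min, y_max, len(y)))
```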

2.3 Exponential Families

A random vector Y ∈ Rᵖ is said to be distributed according to an exponential family of dimension k if the probability mass or density function can be expressed in the form

$$f(y; \theta) = \exp\left\{\sum_{i=1}^{k} \varphi_i(\theta)\, t_i(y) + c(\theta) + d(y)\right\} \qquad (13)$$

for some functions φ(θ) = {φ₁(θ),…, φₖ(θ)}, t(y) = {t₁(y),…, tₖ(y)}, c(θ), and d(y).

Many common distributions can be expressed in exponential family form, including Example 1 with φ(θ) = log{θ/(1 − θ)}, t(y) = Σyᵢ, and Example 2 with

$$\varphi(\theta) = \left(\frac{\mu}{\sigma^2},\; -\frac{1}{2\sigma^2}\right), \qquad t(y) = \left(\sum y_i,\; \sum y_i^2\right) \qquad (13.1)$$

Other examples of exponential families include the Poisson, geometric, negative binomial, multinomial, exponential, gamma, and inverse Gaussian distributions. Generalized linear models are regression models built on exponential families that have found wide practical application. See Dobson (1990) or McCullagh and Nelder (1989) and Analysis of Variance and Generalized Linear Models.

From (13) we can see that exponential families are closed under independent, identically distributed sampling, and that the minimal sufficient statistic for θ is the k-dimensional vector T = Σt(Yᵢ). Exponential families are said to permit a sufficiency reduction under sampling: we can replace the original n observations by a vector of length k. Possibly for this reason exponential families are used widely as models in applied work.
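The sufficiency reduction is easy to exhibit for a concrete exponential family. The sketch below (assuming Python with numpy and scipy; the Poisson model and parameter values are illustrative choices) writes the Poisson log-likelihood once from the raw data and once from (n, T = Σyᵢ); the two differ only by a θ-free constant.

```python
# The sufficiency reduction for an exponential family, using the Poisson
# model f(y; theta) = exp{ y log(theta) - theta - log(y!) }: here
# phi(theta) = log(theta) and t(y) = y, so T = sum(y_i) for an iid sample.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(4)
y = rng.poisson(3.0, size=40)

def loglik_full(theta, y):
    return np.sum(y * np.log(theta) - theta - gammaln(y + 1))

def loglik_reduced(theta, n, t):
    # Depends on the data only through (n, T); the term sum(log y_i!) is
    # free of theta and dropped, so the two versions differ by a constant.
    return t * np.log(theta) - n * theta

n, t = len(y), y.sum()
const = -gammaln(y + 1).sum()
print(loglik_full(2.5, y))
print(loglik_reduced(2.5, n, t) + const)   # identical
```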

A partial converse of this is also true: if a finite-dimensional minimal sufficient statistic exists in independent sampling from the same density, and the range of values over which the density is positive does not depend on θ, then that density must be of exponential family form.

Versions of the material in this section appear in most mathematical statistics textbooks. Suggested references, roughly in order of increasing difficulty, are Azzalini (1996), Casella and Berger (1990), Lehmann and Casella (1998), Cox and Hinkley (1974), Pace and Salvan (1997), and Barndorff-Nielsen and Cox (1994).

3. Some Related Concepts

3.1 Ancillary Statistics

A companion concept for the reduction of data to a smaller number of quantities is that of ancillarity. A statistic A = a(Y) is said to be ancillary for θ in the family of models Fθ if the marginal distribution of A,

$$\operatorname{Prob}\{a(Y) \le a; \theta\} \qquad (14)$$

is free of θ. If every other ancillary statistic is a function of a given ancillary statistic, the given statistic is a maximal ancillary statistic.

Example 4 (continued). In independent sampling from the uniform distribution on (θ − 1, θ + 1), T = (Y₍₁₎, Y₍ₙ₎) is a minimal sufficient statistic, and A = Y₍ₙ₎ − Y₍₁₎ is a maximal ancillary statistic.
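Ancillarity of the range can be checked by simulation. In the sketch below (assuming Python with numpy; the values of θ, the sample size, and the number of replications are illustrative choices), the simulated distribution of A = Y₍ₙ₎ − Y₍₁₎ is the same for every θ.

```python
# A simulation sketch of the continuation of Example 4: the range
# A = Y_(n) - Y_(1) has the same distribution whatever the location theta,
# which is what makes it ancillary.
import numpy as np

rng = np.random.default_rng(5)

def simulate_range(theta, n=10, reps=100_000):
    y = rng.uniform(theta - 1, theta + 1, size=(reps, n))
    return y.max(axis=1) - y.min(axis=1)

for theta in (0.0, 7.3, -50.0):
    a = simulate_range(theta)
    print(theta, round(a.mean(), 3), round(a.std(), 3))  # same for every theta
```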

If an ancillary statistic exists, all the information about θ is in the conditional distribution of Y, given the ancillary statistic. If a minimal sufficient statistic exists, all the information about θ is in the marginal distribution of T. These two data reductions, to A or to T, are complementary, and can sometimes be combined.

Example 5. Let Y₁,…, Yₙ be independent, identically distributed from the one-parameter location family f(yᵢ; θ) = f₀(yᵢ − θ), θ ∈ R, where f₀(·) is known. Then the vector of residuals A = (Y₁ − Ȳ,…, Yₙ − Ȳ) is ancillary. An extension of this to the location-scale family is possible: let f(yᵢ; θ) = σ⁻¹f₀{σ⁻¹(yᵢ − µ)}; then

$$A = \left(\frac{Y_1 - \bar{Y}}{S}, \ldots, \frac{Y_n - \bar{Y}}{S}\right) \qquad (15)$$

where S² = Σ(Yᵢ − Ȳ)², is a maximal ancillary statistic. The normal distribution of Example 2 is a location-scale model, with minimal sufficient statistic T = (Ȳ, S²), which in this case is independent of A.
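A simulation along the same lines (assuming Python with numpy; the normal model is used here as a convenient location-scale family, and the parameter settings are illustrative) shows that the distribution of the standardized residual vector in (15) does not change as (µ, σ) varies.

```python
# A simulation sketch of Example 5: the standardized residual vector
# A = ((Y_1 - Ybar)/S, ..., (Y_n - Ybar)/S) has a distribution that does
# not depend on (mu, sigma) in a location-scale family such as the normal.
import numpy as np

rng = np.random.default_rng(6)

def first_residual(mu, sigma, n=5, reps=200_000):
    y = rng.normal(mu, sigma, size=(reps, n))
    ybar = y.mean(axis=1, keepdims=True)
    s = np.sqrt(((y - ybar) ** 2).sum(axis=1, keepdims=True))
    a = (y - ybar) / s
    return a[:, 0]                       # first coordinate of A

for mu, sigma in ((0.0, 1.0), (10.0, 0.5), (-3.0, 8.0)):
    a1 = first_residual(mu, sigma)
    # Mean and spread of the first coordinate are the same for all (mu, sigma).
    print(mu, sigma, round(a1.mean(), 3), round(a1.std(), 3))
```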

The importance of dimension-reduction concepts such as ancillarity and sufficiency for the theory of statistics is that inference for a k-dimensional parameter is then encapsulated by a probability distribution on a k-dimensional space. This immediately provides tests of significance, confidence intervals, and point estimates for the unknown parameter. From this point of view ancillarity may appear to be the more central notion in the theory of statistics, although sufficiency is widely regarded as more natural and intuitive.

It should be noted that inference from a Bayesian point of view automatically provides a k-dimensional distribution for inference about a k-dimensional parameter, namely the posterior distribution given the observed data y. This is achieved at the expense of constructing a probability model for the unknown parameter θ. See Bayesian Statistics.

3.2 Asymptotic Sufficiency

The normal distribution is used widely for modelling data from a variety of sources, but is also very important in the theory of inference in providing approximate inferences, when exact inferences may be unavailable. While the likelihood function is equivalent to the minimal sufficient statistic, a special role in approximate inference is played by some functions computed from the likelihood function.

The maximum likelihood estimator of θ in the model f(y; θ) is defined by

$$\hat{\theta} = \hat{\theta}(Y) = \arg\sup_{\theta \in \Theta} L(\theta; Y) \qquad (16)$$

and the observed information function for θ is

$$j(\theta) = -\frac{\partial^2 \log L(\theta; Y)}{\partial \theta^2} \qquad (17)$$

Under regularity conditions on the family of models for Y = (Y₁,…, Yₙ) we have the following asymptotic result for a scalar parameter θ as n → ∞:

$$j^{1/2}(\theta_0)\,(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, 1) \qquad (18)$$

which means that for sufficiently large n the distribution of θ̂ − θ₀ is arbitrarily close to a normal distribution with mean zero and variance j⁻¹(θ₀), under sampling from f(y; θ₀). A similar result is available for vector θ.
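A simulation sketch of (18) for the Bernoulli model of Example 1 follows (assuming Python with numpy; the parameter settings are illustrative, and the observed information is evaluated at the maximum likelihood estimate, an asymptotically equivalent variant): the standardized estimator is approximately standard normal.

```python
# A simulation sketch of (18) for the Bernoulli model: standardizing the
# maximum likelihood estimator thetahat = Ybar by the observed information
# j(thetahat) = n / (thetahat (1 - thetahat)) gives approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(7)
theta0, n, reps = 0.3, 500, 50_000

y = rng.binomial(1, theta0, size=(reps, n))
thetahat = y.mean(axis=1)
j = n / (thetahat * (1 - thetahat))        # observed information at the MLE
z = np.sqrt(j) * (thetahat - theta0)

print(round(z.mean(), 3), round(z.std(), 3))   # close to 0 and 1
```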

The maximum likelihood estimator has by construction the same dimension as the unknown parameter θ, so (18) provides a means of constructing an inference for θ using the normal approximation. This suggests that a notion of asymptotic sufficiency could be formalized, the maximum likelihood estimator having this property. This formalization relies on showing that the likelihood function is asymptotically of normal form, with minimal sufficient statistic θ̂; see Fraser (1976, Chap. 8). The argument is very similar to that showing that the posterior distribution for θ is asymptotically normal. More generally, in models with smooth likelihood functions, a Taylor series expansion of the likelihood function around θ̂ suggests a refinement of asymptotic sufficiency: the statistic θ̂ together with the second and higher-order derivatives of the likelihood function evaluated at θ̂ forms a set of approximately sufficient statistics in a certain sense (Barndorff-Nielsen and Cox 1994, Chap. 7).

In general models the maximum likelihood estimator is not the minimal sufficient statistic, although it is always a (many-to-one) function of the minimal sufficient statistic. Much of the recent development of the parametric theory of inference has been concerned with combining sufficiency or approximate sufficiency with ancillarity or an appropriate notion of approximate ancillarity, to effect a dimension reduction with minimal loss of information. In general models if the maximum likelihood estimator is used in place of the minimal sufficient statistic, information is lost. The idea of recovering this information, via conditioning on an ancillary statistic, was put forth in Fisher (1934). With some exceptions this idea was not exploited fully in the statistical literature until about 1980, when it was discovered that an asymptotic version of Fisher’s argument holds in very general situations. The subsequent 20 years saw continued development and refinement of Fisher’s original idea. A brief overview is given in Reid (2000), and the books Barndorff-Nielsen and Cox (1994) and Pace and Salvan (1997) give a detailed account.

3.3 Inference In The Presence Of Nuisance Parameters

Many statistical models have vector valued parameters θ, but often some components or scalar valued functions of θ are of particular interest. For example, in evaluating the hypothetical new weight-loss treatment under a normal model, the mean parameter µ is in the first instance of more interest than the standard deviation σ. In studying results from a poll we may wish to introduce parameters describing various aspects of the population, such as gender, age group, income bracket, and so on, but the main purpose of the poll is to construct an inference about the proportion likely to vote for our candidate, for example.

The idealized modelling of these types of situations is to describe θ as consisting of two components, θ = (ψ, λ): parameters of interest ψ and nuisance parameters λ. An extension of this would be to define ψ = g(θ) as determined by a set of constraints on θ, but only the component version will be considered here.

There are in the literature several different definitions of sufficiency for a parameter of interest ψ in the presence of a nuisance parameter λ. To illustrate some of the issues we consider again the normal distribution, with θ = (µ, σ) and T = (Ȳ, S²). We have

$$f_T(\bar{y}, s^2; \mu, \sigma) = f_{\bar{Y}}(\bar{y}; \mu, \sigma)\, f_{S^2}(s^2; \sigma) \qquad (19)$$

where fȲ(·) is the density of a normal distribution with mean µ and standard deviation σ/√n, and fS²(·) is proportional to the density of a chi-squared distribution on n − 1 degrees of freedom. As the conditional distribution of S², given Ȳ, is free of µ, we could define Ȳ as sufficient for µ, by analogy to (3). However, it is not sufficient for µ in the sense of providing everything we need for inference about µ, since its distribution depends also on σ. In this latter sense S² is sufficient for σ.
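The structure of (19) can be seen in simulation. The sketch below (assuming Python with numpy; parameter settings are illustrative) shows that Ȳ and S² are essentially uncorrelated under normal sampling and that the distribution of S² is unchanged as µ varies.

```python
# A simulation sketch of the factorization (19): under normal sampling,
# Ybar and S^2 are independent, and the distribution of S^2 depends on
# sigma but not on mu.
import numpy as np

rng = np.random.default_rng(8)

def ybar_s2(mu, sigma=2.0, n=10, reps=100_000):
    y = rng.normal(mu, sigma, size=(reps, n))
    ybar = y.mean(axis=1)
    s2 = ((y - ybar[:, None]) ** 2).sum(axis=1)
    return ybar, s2

for mu in (0.0, 5.0, -20.0):
    ybar, s2 = ybar_s2(mu)
    corr = np.corrcoef(ybar, s2)[0, 1]
    # S^2 has the same mean (~ (n-1) sigma^2 = 36) for every mu, and is
    # essentially uncorrelated with Ybar.
    print(mu, round(s2.mean(), 2), round(corr, 3))
```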

There have been a number of attempts to formalize the notions of partial sufficiency and partial ancillarity, but none has been particularly successful. However, the notion of dimension reduction through exact or approximate sufficiency and ancillarity has played a very important role in the theory of parametric inference.

Sufficiency in the presence of nuisance parameters is considered in Barndorff-Nielsen (1978, Chap. 4), Jorgensen (1997) and Bhapkar (1991). A discussion of partitions like (19) and the role of conditional and marginal inference is given in Reid (1995).

4. Conclusion

A sufficient statistic is defined relative to a statistical model, usually a parametric model, and provides all the information in the data about that model or the parameters of that model. In many cases the sufficient statistic provides a substantial reduction of complexity or dimension from the original data, and as a result the concept of sufficiency makes the task of statistical inference simpler. The related concept of ancillarity has a similar function. Recent developments in likelihood inference have emphasized the construction of approximately sufficient and approximately ancillary statistics in cases where further dimension reduction is needed after invoking sufficiency and ancillarity. Excellent textbook references for the definitions and examples include Casella and Berger (1990) and Azzalini (1996). More advanced accounts of recent work are given in Barndorff-Nielsen and Cox (1994) and Pace and Salvan (1997).

Bibliography:

  1. Azzalini A 1996 Statistical Inference. Chapman and Hall, London
  2. Barndorff-Nielsen O E 1978 Information and Exponential Families in Statistical Theory. Wiley, New York
  3. Barndorff-Nielsen O E, Cox D R 1994 Inference and Asymptotics. Chapman and Hall, London
  4. Bhapkar V P 1991 Sufficiency, ancillarity and information in estimating functions. In: Godambe V P (ed.) Estimating Functions. Oxford University Press, Oxford, UK
  5. Casella G, Berger R L 1990 Statistical Inference. Brooks-Cole, Pacific Grove, CA
  6. Cox D R, Hinkley D V 1974 Theoretical Statistics. Chapman and Hall, London
  7. Dobson A J 1990 An Introduction to Generalized Linear Models. Chapman and Hall, London
  8. Fisher R A 1922 On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A 222: 309–68. Reprinted in: Kotz S, Johnson N L (eds.) 1992 Breakthroughs in Statistics. Springer, New York
  9. Fisher R A 1925 Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society 22: 700–25
  10. Fisher R A 1934 Two new properties of mathematical likelihood. Proceedings of the Royal Society Series A 144: 285–307
  11. Fraser D A S 1976 Probability and Statistics: Theory and Applications. Duxbury Press, North Scituate, MA
  12. Jorgensen B 1997 The rules of conditional inference: Is there a universal definition of nonformation? Journal of the Italian Statistical Society 3: 355–84
  13. Lehmann E L, Casella G 1998 Theory of Point Estimation, 2nd edn. Springer, New York
  14. McCullagh P, Nelder J A 1989 Generalized Linear Models, 2nd edn. Chapman and Hall, London
  15. Pace L, Salvan A 1997 Principles of Statistical Inference From a Neo-Fisherian Perspective. World Scientific, Singapore
  16. Reid N 1995 The roles of conditioning in inference (with discussion). Statistical Science 10: 138–57
  17. Reid N 2000 Likelihood. Journal of the American Statistical Association 95: 1335–40
