Event-History Analysis in Discrete Time Research Paper

The practical motivation for considering discrete-time models of event histories (Allison 1982, Thompson 1977) comes from the fact that such longitudinal data are often recorded and presented in a discrete-time form. Thus, for example, it is common to collect information on the values of considered individual status variables at regularly spaced time points and to report such measurements directly as the data. A time unit is then fixed, typically to be a day, a week, a month, or a year, thus creating a regular time lattice. A slightly different but related reason why event-history data often appear in a discrete-time form is that the event times themselves are reported in a rounded form. Then the events which happened inside a time interval of unit length are somewhat arbitrarily assigned to either of its end points or, in so-called actuarial methods, to the middle. As a consequence, the reported event times are often tied, with several individuals sharing the same value. As an example, one can consider the changes of a person’s employment status during the work history, typically consisting of spells of employment and unemployment, and finally ending with a permanent transition out of the work force. In order to be completely accurate, the reporting might have to be done day by day. More realistically, it can only be done monthly, and then a number of different alternatives exist.

For example, one could directly report the employment status on the last day of the month. A drawback of such reporting would be that, if the status can undergo frequent changes, the status at the end of the month might be purely accidental and not represent the entire month in an adequate way. An alternative would be to set a threshold x and ask whether the person has been employed for at least x days during the month. Such a solution may not be completely satisfactory either, as a low value of x does not distinguish people who were employed only briefly from those who were employed during the entire month, with a similar but opposite problem arising for a high value of x. Such problems are of less concern if changes in the status variable are less frequent (as with marital status), or even irreversible.

1. Discrete-time Hazards

The main advantages of linking a statistical model directly with such data are that model construction becomes relatively straightforward, and that no special methods are required for dealing with the ties. For an illustration, suppose for simplicity that there are only two states for each individual, say alive and dead, and that the chosen time unit is one month. Then the lifetime T of an individual, in a discretized time scale, can be thought of as being determined by a sequence of Bernoulli trials, resulting in a sequence of ‘zeros’ as long as the individual is alive and ending with a ‘one’ for the month during which the individual dies. The basic element for setting up a corresponding statistical model is the conditional probability of recording ‘one’ in a unit interval given that all preceding intervals resulted in ‘zeros’, that is, the probability of death during the tth month for a person still alive after t − 1 months. Because of the discrete-time lattice used, such discrete-time hazards can be simply expressed as conditional probabilities:

 $$h(t) = \Pr(T = t \mid T \ge t) \qquad (1)$$

By applying the chain multiplication rule of conditional probabilities progressively for increasing values of t, one finds readily that the unconditional probability for the survival is given by the product

 $$\Pr(T > t) = \prod_{s \le t} [1 - h(s)] \qquad (2)$$

Note that moreover

 $$\Pr(T = t) = \Pr(T > t - 1)\, h(t) \qquad (3)$$

These are discrete-time analogues of the well-known exponential formulas for the survival probability and density, respectively, which are valid in continuous time and express a lifetime distribution in terms of the corresponding hazard rate (see, e.g., Andersen et al. 1993, Kalbfleisch and Prentice 1980). In fact, the exponential formulas can be obtained as limits when the time lattice used in the discretization becomes more and more dense, thereby making the discrete-time hazards h(t) numerically small.
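As a minimal numerical sketch of formulas (2) and (3), the following Python fragment turns a sequence of discrete-time hazards into survival probabilities and point probabilities, and then checks the limiting behaviour on a refined lattice. The hazard values, the function names, and the constant-rate example are illustrative assumptions, not taken from the sources cited above.

```python
import math

def survival_and_pmf(hazards):
    """Return (survival, pmf) for discrete-time hazards h(1), ..., h(n).

    survival[k] is Pr(T > k + 1), built up by the product formula (2);
    pmf[k] is Pr(T = k + 1), obtained from formula (3).
    """
    survival, pmf = [], []
    alive = 1.0  # Pr(T > 0) = 1
    for h in hazards:
        pmf.append(alive * h)   # Pr(T = t) = Pr(T > t - 1) h(t)
        alive *= 1.0 - h        # Pr(T > t) = Pr(T > t - 1) [1 - h(t)]
        survival.append(alive)
    return survival, pmf

# Hypothetical monthly hazards for a four-month follow-up.
hazards = [0.10, 0.20, 0.25, 0.50]
survival, pmf = survival_and_pmf(hazards)

def discretized_survival(lam, t, m):
    """Survival to time t on a lattice of step 1/m with hazard lam/m per step.

    Assumes a constant continuous-time hazard rate lam (an illustrative choice).
    """
    return (1.0 - lam / m) ** round(t * m)

# On a dense lattice the product approaches the exponential formula exp(-lam * t).
gap = abs(discretized_survival(0.5, 2.0, 10_000) - math.exp(-0.5 * 2.0))
```

The point probabilities and the final survival probability together sum to one, which is a quick consistency check on any hazard sequence passed to `survival_and_pmf`.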

2. Regression Models and Likelihood Inference

When considering data consisting of lifetimes of several individuals, the number of such Bernoulli trials equals the total number of person-months at risk, and under natural independence conditions across individuals, a statistical model for the entire data can be formulated in terms of equally many Bernoulli trials, with the corresponding discrete-time hazards depending on age. This gives rise to a simple likelihood expression of a product form, in which tied lifetimes in the data pose no problem. The structure of the likelihood remains essentially unchanged if the discrete-time hazards of the considered individuals depend on covariates, as long as the covariates are fixed in time. One is then dealing with a regression problem, in which the observed development of the response variables (lifetimes) is explained in terms of the corresponding realized covariate values in the data. A convenient framework for considering such regression models, including the possibility of using standard software, is offered by generalized linear models (for a general account and references, see Fahrmeir and Tutz (1994), and for asymptotic properties of the estimators, see, e.g., Arjas and Haara (1987)). In the important special case of a logistic link function, if the considered time unit is short enough to make the discrete-time hazards h(t) small and the intercept of the linear expression is allowed to depend on the interval, the model corresponds closely to the proportional hazards model of Cox (1972).

If the covariates can themselves change in time, the same logic and structure of the likelihood can be used as long as the covariates are external (Kalbfleisch and Prentice 1980) in the sense that their development can be thought of as being exogenous from the point of view of the studied event-history data of individuals. This is because there is then no need for an explicit model of the covariate process, as the resulting contributions to the likelihood expression do not depend on the actual parameters of interest and can therefore be coalesced into a proportionality constant. The same logic applies if the individual event histories are right censored and the censoring mechanism is assumed to be noninformative. If the model also involves time-dependent internal covariates, special care is needed in modeling and inference to make sure that their development in time does not confound the regression problem and the corresponding conclusions, which are often given a causal form. The issue of confounding, in the case of sequential designed experiments, has been considered in a series of papers by Robins (e.g., Robins 1997). Sometimes, in order to give useful causal interpretations, it becomes necessary to build the regression model in two or more levels representing, for example, group level and individual level effects. For a recent contribution in this direction, see Barber et al. (2000).

The same general ideas apply also in situations in which the considered status or response variables are no longer just binary. For example, the status of an individual could assume more than two possible values, and the same status could repeat itself many times during the follow-up. As noted above, a person’s employment status could first alternate several times between the states employed and unemployed, until it finally reaches the absorbing state outside the work force because of retirement. What is needed, however, is a convenient form of conditioning, which carries information about an individual’s earlier history and present status and thereby influences the assessment of the discrete-time hazards associated with the future development, always one time unit ahead. For an interesting case study, applying a variety of inferential techniques, see Fahrmeir and Wagenpfeil (1996).
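The person-month construction described in this section can be sketched in a few lines of Python. The sketch below expands each lifetime into its Bernoulli trials and evaluates the product-form likelihood (on the log scale) under a logistic link with interval-specific intercepts. The data, the single fixed covariate `x`, and all function names are hypothetical, and right censoring is assumed to be noninformative.

```python
import math

def person_periods(records):
    """Expand (months, died, x) records into (t, x, y) Bernoulli trials.

    months: number of months the person was observed at risk;
    died:   True if death occurred in the last observed month,
            False if the history was right censored after it;
    x:      a covariate that is fixed in time.
    """
    rows = []
    for months, died, x in records:
        for t in range(1, months + 1):
            y = 1 if (died and t == months) else 0
            rows.append((t, x, y))
    return rows

def logit_hazard(alpha_t, beta, x):
    """Discrete-time hazard under a logistic link."""
    return 1.0 / (1.0 + math.exp(-(alpha_t + beta * x)))

def log_likelihood(rows, alpha, beta):
    """Sum of Bernoulli log-likelihood terms over all person-months."""
    total = 0.0
    for t, x, y in rows:
        h = logit_hazard(alpha[t], beta, x)
        total += math.log(h) if y == 1 else math.log(1.0 - h)
    return total

def empirical_logits(rows):
    """Interval-specific intercepts fitted with beta = 0: logit(deaths / at risk).

    Requires 0 < deaths < at risk in every interval, which holds for the
    hypothetical data below.
    """
    at_risk, deaths = {}, {}
    for t, _, y in rows:
        at_risk[t] = at_risk.get(t, 0) + 1
        deaths[t] = deaths.get(t, 0) + y
    return {t: math.log(deaths[t] / (at_risk[t] - deaths[t])) for t in at_risk}

# Hypothetical records; note the tied lifetimes (two deaths in month 2).
records = [(1, True, 1.0), (2, True, 0.0), (2, True, 1.0),
           (3, False, 0.0), (3, True, 1.0), (2, False, 0.0)]
rows = person_periods(records)
alpha_hat = empirical_logits(rows)
```

With `beta` fixed at zero, the fitted hazard for each month is simply the number of deaths divided by the number at risk in that month. Estimating `beta` as well would require a numerical optimizer or standard generalized linear model software, which this sketch leaves out.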

3. Structure of the Likelihood

Let us now make this idea more explicit. Indexing the individuals in the data by $i \in I$ and denoting the value of the corresponding status or response variable at time $t$ by $X_t(i)$, we are led to consider conditional probabilities for a transition from a state $X_t(i)$ at time $t$ to a state $X_{t+1}(i)$ at the next time point $t+1$. Denoting by $H_t(i)$ the pre-$t$-history of individual $i$, which contains all observations made on $i$ up to and including time $t$, including the past values $X_s(i)$ ($s \le t$) and those of possible covariates considered in the analysis, the first task is to specify the conditional probabilities $\Pr[X_{t+1}(i) \mid H_t(i)]$. The following natural conditional independence assumptions will then often be valid:

(a) The variables $X_{t+1}(i)$, $i \in I$, are conditionally mutually independent given the collective pre-$t$-past $\{H_t(i), i \in I\}$ in the data.

(b) Knowledge of the individual’s own pre-$t$-history $H_t(i)$ is sufficient for predicting $X_{t+1}(i)$ in the sense that

 $$\Pr[X_{t+1}(i) \mid H_t(j),\, j \in I] = \Pr[X_{t+1}(i) \mid H_t(i)] \qquad (4)$$

Under conditions (a) and (b), the likelihood expression arising from discrete-time event history data becomes a product of terms of the form $\Pr[X_{t+1}(i) \mid H_t(i)]$, where the product is taken over all individuals $i$ and all times $t$. (A tacit assumption here has been that if the covariates registered in $H_t(i)$ are themselves time dependent, they are either exogenous or their evolution is modeled in a way which does not involve the actual parameter of interest.) This makes the statistical inference of the model parameters in principle a straightforward problem, for example, by applying maximum likelihood. If, moreover, the condition

(c) Knowledge of the current state $X_t(i)$ is sufficient in the sense that

 $$\Pr[X_{t+1}(i) \mid H_t(i)] = \Pr[X_{t+1}(i) \mid X_t(i)] \qquad (5)$$

holds, the model can be formulated in terms of a discrete-time Markov chain (see, e.g., Cinlar 1975). Information contained in the observed past history $H_t(i)$ can then be stored in a covariate vector of a fixed length, and the likelihood becomes a product of transition probabilities $\Pr[X_{t+1}(i) \mid X_t(i)]$, again taken over both $i$ and $t$.
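Under condition (c), and assuming in addition that the transition probabilities do not depend on time or on covariates, the likelihood depends on the data only through the transition counts. The Python sketch below illustrates this for the employment example, with hypothetical state labels E (employed), U (unemployed), and O (outside the work force) and made-up monthly histories. In this simplified setting the maximum likelihood estimates are the observed transition proportions.

```python
import math
from collections import Counter

def transition_counts(histories):
    """Count one-step transitions (current, next) over all individuals and times."""
    counts = Counter()
    for states in histories:
        for current, nxt in zip(states, states[1:]):
            counts[(current, nxt)] += 1
    return counts

def transition_mle(counts):
    """Estimate each transition probability by its observed proportion."""
    totals = Counter()
    for (current, _), n in counts.items():
        totals[current] += n
    return {(current, nxt): n / totals[current]
            for (current, nxt), n in counts.items()}

def log_likelihood(counts, probs):
    """Log of the product of transition probabilities over all i and t."""
    return sum(n * math.log(probs[pair]) for pair, n in counts.items())

# Hypothetical monthly status histories; O is treated as absorbing,
# so each history stops once O is reached.
histories = [
    ["E", "E", "U", "E", "O"],
    ["U", "U", "E", "E", "E"],
    ["E", "U", "U", "O"],
]
counts = transition_counts(histories)
probs = transition_mle(counts)
max_loglik = log_likelihood(counts, probs)
```

Letting the transition probabilities depend on covariates or on time would replace the simple proportions by a regression model, for example a multinomial logistic one, fitted to the same person-period rows.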

In spite of the seemingly general applicability of such a discrete-time approach to likelihood inference, the following two conventions, both usually involving an approximation, seem necessary in interpreting such models. First, censoring and possible other forms of controlling the number of individuals at risk are always thought to happen at times which are integer multiples of the chosen time unit. Second, if the model involves time-dependent covariates which are used as regressors of the observed status or response variable, their measured values need to be interpreted as ‘prevailing (constant) conditions for exposure’ during each considered time interval of unit length (see Sandefur and Tuma 1987). In particular, if the response variable can feed back and influence the values of the covariates, then in order to preserve a causal interpretation of the regression model, the covariate values must be determined already at the beginning of each time interval, thus avoiding the potential problem of conditioning on variables which are intermediate between a contemplated cause and the effect.

4. Embedding into Continuous-time Models

Another, and perhaps a more serious, drawback from a modeling perspective concerns the interpretation of the discrete-time parameter. In almost any conceivable applied context, and corresponding to common human perception, time is viewed as being continuous. Based on this perception, and even in situations in which the data are in a discrete-time form, a more natural alternative for statistical modeling of event histories starts from the idea that there is a true underlying status process in continuous time, and that the recorded event-history (panel) data then consist of a series of measurements from this underlying process taken at a number of typically prescheduled time points. The term current status data is sometimes used to emphasize this aspect. The development of a person’s true employment status, marital status, support of political parties, or presence or absence of a chronic disease, as functions of time, would be natural examples. Indeed, such individual status histories would be the focus of a statistical analysis if only they could be observed, while discrete-time event history data consist merely of incomplete information, typically drawn from a sequence of cross-sections of such histories. As a consequence of such imprecision, individual event histories in continuous time are generally unidentifiable from discrete-time data. In particular, the exact times of occurrence are then not recorded, and if the status of a person has changed back and forth within a unit interval, this possibility may not even become explicit in the data (see, e.g., Stasny 1986).

Nevertheless, it may be possible to embed the data successfully into a larger statistical model in continuous time. The fact that the individual underlying continuous-time developments cannot generally be determined uniquely from the data can often be compensated for with some form of data augmentation. In other words, we can then consider an entire collection, or distribution, of sample paths which are consistent with the data. The practical solution may involve some form of simulation (multiple imputations), while a theoretical motivation for such techniques comes from Bayesian statistical inference. Exact numerical computations quickly become infeasible in practice, however. This leads to an application of sampling-based Markov chain Monte Carlo methods, in which the possible realizations of the individual processes are drawn by simulation. Such data augmentation methods have been applied to data describing recurrent bacterial infections by Auranen et al. (2000), but the same approach applies more generally to problems in which it is natural to relate discrete-time event history data to underlying individual developments in continuous time.
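The loss of information caused by panel observation can be illustrated with a small forward simulation in Python. The sketch assumes a hypothetical two-state process with exponentially distributed holding times and a common switching rate, observed only at integer time points. It is not the data augmentation method of Auranen et al. (2000); it shows only why several continuous-time paths can be consistent with the same discrete-time record.

```python
import random

def simulate_jumps(rate, horizon, rng):
    """Jump times in (0, horizon) of a two-state process.

    Holding times are drawn as independent exponentials with the given
    rate; this is an illustrative assumption.
    """
    jumps = []
    t = rng.expovariate(rate)
    while t < horizon:
        jumps.append(t)
        t += rng.expovariate(rate)
    return jumps

def status_at(time, jumps, start=0):
    """State (0 or 1) at a given time: the start state flipped once per earlier jump."""
    flips = sum(1 for s in jumps if s <= time)
    return start if flips % 2 == 0 else 1 - start

def panel_observations(jumps, horizon, start=0):
    """Status recorded at the integer time points 0, 1, ..., horizon."""
    return [status_at(k, jumps, start) for k in range(horizon + 1)]

rng = random.Random(1)  # fixed seed so that the run is reproducible
jumps = simulate_jumps(rate=1.5, horizon=12, rng=rng)
panel = panel_observations(jumps, horizon=12)

# Changes visible in the panel can never exceed the true number of jumps:
# an even number of jumps inside a unit interval leaves no trace at all.
observed_changes = sum(a != b for a, b in zip(panel, panel[1:]))
hidden_jumps = len(jumps) - observed_changes
```

A data augmentation scheme runs in the opposite direction: given the panel, it draws jump times that are consistent with the recorded states, and repeats this inside a Markov chain Monte Carlo sampler.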

5. Conclusion

Depending on the context, modeling of event-history data may emphasize more the event times at which some particular events of interest occurred, or the states in which some studied process happened to be at the times at which it was measured. To stress the former aspect, one often speaks of survival or duration data. The second aspect is similarly stressed by talking about repeated measurements or longitudinal data. When the time variable in the data and in the model is discrete, it may sometimes be difficult to make a clear distinction between these two basic forms. On the other hand, the likelihood expressions will be of the same canonical product form in both cases, so that likelihood inference will remain the same. Recommended reading on these and closely related issues includes Allison (1982), Barber et al. (2000), Diggle et al. (1994), Kalbfleisch and Prentice (1980), and Sandefur and Tuma (1987).


  1. Allison P D 1982 Discrete-time methods for the analysis of event histories. In: Leinhardt S (ed.) Sociological Methodology. Jossey-Bass, San Francisco, CA, pp. 61–98
  2. Andersen P K, Borgan O, Gill R D, Keiding N 1993 Statistical Models Based on Counting Processes. Springer, New York
  3. Arjas E, Haara P 1987 A logistic regression model for hazard: Asymptotic results. Scandinavian Journal of Statistics 14: 1–18
  4. Auranen K, Arjas E, Leino T, Takala A K 2000 Transmission of pneumococcal carriage in families: A latent Markov process model for binary longitudinal data. Journal of the American Statistical Association 95: 1044–53
  5. Barber J S, Murphy S, Axinn W G, Maples J 2000 Discrete-time multilevel hazard analysis. Sociological Methodology 30: 201–35
  6. Cinlar E 1975 Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs, NJ
  7. Cox D R 1972 Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 34: 187–220
  8. Diggle P J, Liang K-Y, Zeger S L 1994 Analysis of Longitudinal Data. Oxford University Press, Oxford, UK
  9. Fahrmeir L, Tutz G 1994 Multivariate Statistical Modeling Based on Generalized Linear Models. Springer, New York
  10. Fahrmeir L, Wagenpfeil S 1996 Smoothing hazards and time-varying effects in discrete duration and competing risk models. Journal of the American Statistical Association 91: 1584–94
  11. Kalbfleisch J D, Prentice R 1980 The Statistical Analysis of Failure Time Data. Wiley, New York
  12. Robins J 1997 Causal inference from complex longitudinal data. In: Berkane M (ed.) Latent Variable Modeling and Applications to Causality. Lecture Notes in Statistics 120. Springer, New York, pp. 69–117
  13. Sandefur G D, Tuma N B 1987 How data type affects conclusions about individual mobility. Social Science Research 16: 301–28
  14. Stasny E A 1986 Estimating gross flows using panel data with nonresponse: An example from the Canadian labor force survey. Journal of the American Statistical Association 81: 42–7
  15. Thompson W A Jr 1977 Treatment of grouped observations in life studies. Biometrics 33: 463–70