Economic Microdatabases Research Paper

Additions to scientific knowledge generally require interactions between theory and measurement. For example, the Hubble Telescope provides astronomers with information about the universe, biologists have the Human Genome Project to provide them with a mapping of the genetic characteristics of the human species, and physicists have large-scale particle accelerators that allow experiments designed to identify types of matter that could not have been observed in the past because the instrumentation was not powerful enough. It is a common observation among scientists that the rate of scientific progress is a function of the degree to which new measurement techniques enable us to observe events that could not have been observed in the past.

What are the social science counterparts of the Hubble Telescope, the Human Genome Project, and the particle accelerator? It turns out that, in the social sciences generally and in economics in particular, the evolution of the sample survey provides a close counterpart to the natural sciences’ powerful measurement devices.

In this research paper, the evolution of sample survey data in economics, often called microdata, is examined; databases that represent households as well as those that represent business establishments are discussed; the distinction between cross-sectional and longitudinal microdatabases is examined; and the principal features of a selection of the most important databases, both in the USA and in other countries, are summarized. The paper concludes with a brief assessment of future developments.

1. History of Microeconomic Databases

The earliest use of quantitative empirical information in the development of economics as a science was probably in the living standards surveys conducted in the UK during the nineteenth and early twentieth centuries. These early survey studies were descriptions of income, expenditures, and nutrition in cities or towns, and were designed to help understand poverty. Perhaps the best known studies are those by Charles Booth (1891–7) and by Seebohm Rowntree (1902), the latter of which examined York. Earlier work, more in the mode of intensive case studies than surveys, was carried out by Frederic LePlay (1855). Another set of very early studies consisted of establishment or firm surveys focused on measuring production. Systematic work along those lines was done in the USA during the early part of the twentieth century by Simon Kuznets and his colleagues at the National Bureau of Economic Research (Kuznets 1941). The Kuznets studies were basically macro descriptions of change over time in the level of production in the US economy, along with a variety of industry sub-aggregates. The information in these early studies was typically collected from sample surveys (or censuses) of business establishments, conducted by the US Census Bureau.

A distinguishing feature of these early establishment measurements is that the basic microdata were not available to outside analysts. These studies were solely used to produce aggregate estimates for the economic system as a whole, and were not thought of as sources of information about micro-level economic behavior.

Over the twentieth century, there was an explosion of economic measurements based on sample surveys. In the early part of the century the focus was mainly on business establishments and on the measurement of aggregate economic activity, as previously noted. Later in the twentieth century, starting with the Great Depression of the 1930s, microeconomic databases, particularly those relating to households, began to appear with increasing frequency. In 1935 and 1936, for example, a massive study of household expenditures was undertaken by the US Bureau of Labor Statistics (BLS), in an attempt to get a better understanding of the impact of the Depression on living standards. The Current Population Survey (CPS), which provides the basic measure of unemployment in the US, began in the early 1940s. The BLS conducted consumer expenditure surveys designed to produce weights for the Consumer Price Index as early as 1890, in 1935–36 as just noted, and in their current form in 1950, the early 1960s, and the early 1970s.

All of the sets of microdata just noted were designed and collected by governmental units. During the latter part of the twentieth century, these governmental efforts began to be supplemented with microdata collections funded by governments but designed and conducted in the private sector, typically involving academic survey research units, such as the National Opinion Research Center (NORC) at the University of Chicago and the Survey Research Center (SRC) at the University of Michigan. These private sector microdata sets were almost entirely household data, and included the series of studies of consumer net worth titled the Survey of Consumer Finances (begun in the late 1940s), the National Longitudinal Surveys of labor market activity, the Panel Study of Income Dynamics, and the Retirement History Survey (all begun in the late 1960s). During the 1970s, 1980s, and 1990s there was a dramatic increase in both public and private household microdata collections. Finally, the last decades of the twentieth century saw the beginnings of publicly available business establishment microdata in the USA.

2. Structure of Microeconomic Databases

Microeconomic databases belong to one of four possible categories, depending on whether they sample households or firms, and whether they are cross-sectional or longitudinal. A survey is longitudinal if the same household or establishment is followed through time, and cross-sectional if a new sample is drawn for each study, so that the same household or establishment is never in consecutive studies except by chance. Studies can also be focused on households or individuals, often called demographic surveys, or they can examine business firms or establishments, often called economic surveys.
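The distinction between the two designs can be made concrete with a minimal Python sketch. It assumes a hypothetical list of (wave, unit_id) records, which is an illustrative format and not one used by any of the surveys discussed here, and computes the share of units in each wave that reappear in the next wave. A longitudinal design should show a share near one, while independently drawn cross-sections should show a share near zero.

```python
from collections import defaultdict


def overlap_between_waves(records):
    """For each pair of consecutive waves, return the share of units in the
    earlier wave that reappear in the later wave.

    `records` is a hypothetical list of (wave, unit_id) tuples.
    """
    units_by_wave = defaultdict(set)
    for wave, unit_id in records:
        units_by_wave[wave].add(unit_id)

    waves = sorted(units_by_wave)
    shares = {}
    for earlier, later in zip(waves, waves[1:]):
        kept = units_by_wave[earlier] & units_by_wave[later]
        shares[(earlier, later)] = len(kept) / len(units_by_wave[earlier])
    return shares


# Illustrative data only: the same households re-interviewed (longitudinal)
# versus a fresh sample drawn in each wave (cross-sectional).
panel = [(1, "h1"), (1, "h2"), (2, "h1"), (2, "h2")]
repeated_cross_section = [(1, "h1"), (1, "h2"), (2, "h3"), (2, "h4")]

print(overlap_between_waves(panel))
print(overlap_between_waves(repeated_cross_section))
```

Real surveys often fall between these extremes, for example through rotating samples or attrition, so the share is better read as a descriptive diagnostic than as a strict classifier.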

The great bulk of the microdatabases available to the scientific and policy communities for analysis consist of household or individual microdatabases, not establishment or firm databases. These household microdatabases tended to be cross-sectional earlier in the twentieth century, and increasingly longitudinal later in the century.

In part, this is due to the fact that the early databases were likely to be focused on a particular policy variable, where measurement of that variable was the principal focus of the study. Thus, for example, the Current Population Survey (CPS) has a very large monthly sample of cases because its principal objective is to estimate the unemployment rate with great precision, not only for the country as a whole, but also for states and other geographic areas. The CPS has limited information with which to explain employment rates, but the very large sample provides a rich description of the level, change, and regional variation in unemployment rates. In contrast, microdata like the National Longitudinal Surveys or the Panel Study of Income Dynamics were designed as longitudinal microdatabases able to track and eventually model the dynamics of change over time.

These tendencies (for household microdata to be used more than firm or establishment data and for analysis to be increasingly longitudinal) are clearly visible in the published literature (Manser 1998, Stafford 1986). Over the period 1984–1993, for example, there were over 550 articles in the major US economic journals dealing with household microdata, compared to a little over 100 using firm or establishment microdata. Moreover, the great bulk of the latter were not based on nationally representative establishment data collected by the Census or by the BLS, but were typically local studies with the data collected by the researcher.

The very limited use of establishment data in the published literature compared to household data is probably due to three factors. First, until very recently it was difficult, if not impossible, for researchers to access establishment microdata in the USA, due to the (accurate) perception that privacy and confidentiality considerations were a much more serious problem for establishments than for households; after all, it is impossible to hide GM or AT&T in a microdata set. Second, the theory of household or individual behavior is much better developed than the theory of business behavior, probably due in part to the greater availability of rich household microdata. Third, microdata sets for establishments are likely to be produced in the public sector and to be designed to track change over time in policy-relevant aggregate variables. Thus they will be relatively strong on well-measured policy variables, and relatively weak on a rich set of explanatory variables, a combination that makes analysis of the dataset less likely to show up in scientific journals.

The tendency for household studies to be increasingly longitudinal is probably due to two factors. First, it took the economics profession some time to discover the advantages of longitudinal over cross-sectional analysis, and to develop the appropriate statistical tools. Second, the earlier sets of public microdata were strongest at providing careful measurements of the policy-relevant variables that justified the survey, which requires only cross-sectional data; they were typically less strong on including the explanatory variables essential to successful longitudinal modeling.

The principal features of a number of microdatabases are summarized, including databases where the major focus is on measurement of a dependent variable (where the design is apt to be cross-sectional), and others where the focus is on providing a rich explanation of behavior (where the focus is likely to be on careful longitudinal measurement of a very large set of explanatory variables). The series that we cover in this discussion are mainly household data series available in the USA, but some information on microdatabases available from Western Europe, Asia, and the developing world is provided.

The series covered include the Surveys of Consumer Finances (SCF); the Survey of Consumer Attitudes (SCA); the National Longitudinal Surveys (NLS); the Panel Study of Income Dynamics (PSID); the Survey of Income and Program Participation (SIPP); the Consumer Expenditure Survey (CES); the Time Use Survey (TU); the Current Population Survey (CPS); the Health and Retirement Study (HRS); the Luxembourg Income Study (LIS); the Living Standards Measurement Surveys (LSMS); and the British, German, and European versions of the Panel Study of Income Dynamics (the British Household Panel Survey, BHPS; the German Socio-Economic Panel, GSOEP; and the European Household Panel Survey, EHPS). A brief discussion of establishment databases is also included.

2.1 Survey of Consumer Finances (SCF)

The Survey of Consumer Finances, sponsored by the Federal Reserve Board, was initiated just after World War II to provide some insight into the likely behavior of consumers who had accumulated large amounts of assets during the war, when many types of consumer goods and services were unavailable and thus a large fraction of income was saved. The early surveys contain detailed descriptions of asset ownership and amounts of assets held across a rich spectrum of financial and tangible assets—financial assets like savings accounts, checking accounts, CDs, stocks and bonds, in later years money market accounts, IRAs and Keoghs, etc., along with tangible assets like investment real estate, businesses and farms, and vehicles.

Growing dissatisfaction with certain features of the SCF data, in particular with the fact that total financial asset holdings as measured by the survey were much lower than total financial asset holdings as measured in the Federal Reserve Board Flow of Funds accounts, substantially reduced Federal Reserve Board support after 1960. The series was continued with less financial detail and broader sponsorship during the 1960s, and basically came to an end in its original form in 1969. The series was revived in 1983 with a sampling feature aimed at remedying the substantial underestimate of asset holdings shown by all of the previous surveys. This design feature was the addition of a high-income, high-wealth sample of households selected from statistical records derived from tax files. SCFs of this sort were conducted in 1983, 1989, 1992, 1995, and 1998 and are scheduled on an every-third-year basis. In addition to the detailed asset data, the SCF contains a comprehensive income section, has occasionally obtained substantial data on the pension holdings of respondents from the companies providing those pension plans, has a relatively standard set of household demographics, and typically contains other variables of analytic interest, such as savings behavior, subjective health status, inheritances received, and intended bequests.

2.2 Survey of Consumer Attitudes (SCA)

Surveys of Consumer Attitudes began in the USA in the late 1940s, originally as a ‘soft’ introduction to the questions about detailed asset holdings and income sources obtained in the Surveys of Consumer Finances. These attitude surveys moved to an intermittent quarterly basis in the early 1950s, to a regular quarterly basis in the early 1960s, and to a regular monthly basis in 1978.

The basic content of these surveys includes a set of measures designed to reflect consumer assessments of their current financial situation compared to the past, their expectations about future financial conditions, their expectations about business conditions during the next year and the next five years, and their assessments of whether the present is a good or bad time to buy durable goods. Three of these core consumer attitude questions are part of the US statistical system of leading economic indicators. The SCA contains substantially more data than the indicator series, including information about expected price change, perceptions of the effectiveness of current economic policy, assessments of buying conditions for houses and cars, expectations about changes in unemployment rates, etc. Data on consumer attitudes based on US experience are now routinely collected in a number of other countries, including Austria, Australia, Belgium, Canada, China, Czech Republic, Denmark, Finland, France, Germany, Great Britain, Greece, Hungary, Ireland, Italy, Japan, Luxembourg, Norway, Poland, Russia, Spain, South Africa, Sweden, Switzerland, and Taiwan. (Information about the SCA can be obtained from their website.)

3. National Longitudinal Surveys

The Bureau of Labor Statistics at the US Department of Labor sponsors the National Longitudinal Surveys (NLS). The surveys began in 1966 with studies of the cohort of men aged 45–59 and of the cohort of women aged 30–44. Young men and women aged 14–24 were added in the late 1960s, and young men and women in the age cohorts of 14–22 (called NLSY) were added in 1979. The children of the 1979 cohort were added in 1986. Of the six cohorts, data collection on four is continuing, while data collection on the remaining two (the original older male cohort aged 45–59, and the cohort of young men aged 14–24 in 1966) has been terminated.

The NLS surveys combine respondent interviews with a series of separately fielded administrative data collections, especially for the NLSY and the children of the NLSY cohorts. The information collected directly from NLS sample members includes information about work, family, educational experience, home ownership status, pensions, delinquency, and basic demographic information. The administrative data collection includes school characteristics obtained for the young women and young men cohorts and for the NLSY group, as well as for the cohort of children of the NLSY. In addition, school transcripts including coursework and attendance records were collected for the NLSY respondents in the early 1980s, and similar information is being obtained for children of the NLSY. Finally, aptitude and achievement scores from standardized tests were transcribed from school records in the late 1960s for the young men and young women cohorts, in the early 1980s for NLSY respondents, and during 1995 for the children of the NLSY cohort. Summary pension plan descriptions are collected for the mature women respondents and/or their spouses who report pension plan coverage, and death certificate information was collected in the early 1990s for most of the deceased members of the older men cohort. (More information about the NLS can be obtained from their website.)

3.1 Panel Study of Income Dynamics (PSID)

The PSID, started in 1968, was based on combining a sample from the Survey of Economic Opportunity (SEO), originally conducted by the Department of Health and Human Services in 1966 and 1967, with a probability sample of individuals 18 years of age and older. The sample was heavily overweighted from the beginning with relatively poor households, since the SEO sample was designed to study poverty and had a disproportionate number of poor households.

The PSID has unique design features that have contributed to its status as (probably) the most widely used microdataset in the world. A critical design feature is the way in which the PSID continues to be cross-sectionally representative of the full US population while maintaining the longitudinal characteristic that enables analysts to trace the dynamics of change over time. Representativeness for the US population (except for new immigrants) is maintained by following PSID family members who leave their original sample households and begin their own households. This feature ensures a continued representation of newly formed households in the sample, accurately reflecting how the population changes with the birth of newly formed households. Thus PSID members are born, live their lives, and die just as individuals in the population do, and PSID traces the original sample individuals through their entire lifespan accompanied by whichever family they happen to be attached to: the original family in which they were a child, a new family when they formed their own household, etc.
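The following rule described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the PSID's actual procedure: the data structure (a dictionary from household identifiers to sets of member identifiers), the function name, and the identifiers are all hypothetical. The point it shows is that when a sample member leaves to form a new household, that new household enters the sample instead of the member being dropped.

```python
def follow_split_offs(households, moves):
    """Apply a simplified 'follow the sample member' rule.

    households: dict mapping a hypothetical household id to a set of
        sample member ids.
    moves: list of (person_id, new_household_id) pairs for members who
        leave their current household to form their own.

    Returns a new dict in which each mover is followed into the new
    household, so newly formed households stay represented.
    """
    # Copy the member sets so the caller's data is not modified.
    updated = {hid: set(members) for hid, members in households.items()}
    for person, new_hid in moves:
        # Remove the mover from whichever household currently holds them.
        for members in updated.values():
            members.discard(person)
        # Add the mover to the (possibly new) household.
        updated.setdefault(new_hid, set()).add(person)
    return updated


# Illustrative data only: a child leaves household "A" to form household "B".
original = {"A": {"parent", "child"}}
after = follow_split_offs(original, [("child", "B")])
print(after)
```

A sample that instead dropped movers would gradually lose young, newly formed households, which is the representativeness problem the PSID design avoids.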

The PSID content is directed mainly at labor force experience and economic status, with a standard set of demographics. The study is especially rich in terms of data on jobs, job transitions, and detailed sources of income from work.

In recent years, the PSID has added significant modules on new topics. Health status, health insurance coverage, and functional health began to be collected in the early 1990s; wealth measures were obtained at five-year intervals starting in 1984; and detailed pension and savings data were first collected in 1989. The wealth measures have been particularly successful. Finally, the PSID has had a major influence on microdata developments in other countries. Counterparts to the PSID can be found in the British Household Panel Survey (BHPS), the German Socio-Economic Panel (GSOEP), and the European Household Panel Survey (EHPS).

3.2 Survey of Income and Program Participation (SIPP)

The SIPP was begun in the early 1980s, with the basic objective of providing much richer income information for households at the lower end of the income distribution who might be eligible for various types of government public assistance programs. Thus the survey contains considerable detail on work experience, participation in a variety of public welfare programs, and on sources of income, along with family composition, a few health measures, and a standard set of demographics.

The SIPP basically has a cross-sectional design, although a number of observations are obtained for each respondent household in order to obtain very detailed information about income flows and work experience. Thus the SIPP is conducted three times a year, with respondents providing information about the most recent four-month period. Respondents stay in the sample for approximately six quarters, although in recent years the duration of stay has been extended so that a more genuinely longitudinal feature is available. In addition to income, work experience, program participation, and some health items, the SIPP also has a wealth module that is administered several times to each participating household. Selected outgoing rotation groups from the SIPP are currently being followed longitudinally to observe the impact of the 1990s welfare reform on income, labor force participation, and poverty. (More information about the SIPP can be obtained from their website.)

3.3 Consumer Expenditure Survey (CES)

The Consumer Expenditure Survey, conducted by the Bureau of Labor Statistics, has the principal objective of providing weights for the calculation of the Consumer Price Index. These studies have a long history: the first survey was conducted in 1888–1891, the next in 1901 to get a better measure of price inflation in food, the third in 1917–1919 to provide weights for a cost-of-living index, the next two during the depression of the 1930s to look at poverty issues, and the sixth, seventh and eighth in 1950, the early 1960s, and early 1970s, respectively. The current version began in the early 1980s, when the survey was modified to be a continuous study with new samples appearing approximately every other year.

The survey goes into enormous detail on household expenditures across a long list of major product classifications (e.g., clothing, food, recreation, travel, insurance, cars, household durables, etc.) and provides estimates of consumer expenditures over past time periods for these categories. In addition, respondents are asked to keep a detailed diary, where they report each item purchased over a two-week period. The diary information and the expenditure data for the product classifications (e.g., clothing for the respondent, for the spouse, for any children, etc.) are then integrated to produce an estimate of total consumer spending. In addition to consumption data, the CES includes detailed information about income, and also has some asset data along with the standard demographics. (More information about the CES can be obtained from their website.)

3.4 The Time Use Survey (TU)

Time Use studies have a relatively lengthy history, going back to the very early part of the twentieth century, when studies of local communities in the US, and of cities in the then USSR, were conducted. As a major survey effort designed to provide national data on both market and nonmarket activities, the Time Use studies effectively started in 1965, when a cross-national study of urban time use was organized by a set of researchers under the direction of the Hungarian sociologist Alexander Szalai (1972). The countries included Belgium, Bulgaria, Czechoslovakia, France, Federal Republic of Germany, German Democratic Republic, Hungary, Peru, Poland, USA, USSR, and Yugoslavia. Subsequently, time use studies in the US with roughly comparable methodologies were conducted during 1975–1976, and during 1981–1982. Other time use studies in the USA were conducted in the late 1980s and early 1990s, although some design differences in these later studies make comparability with the earlier 1965, 1975–1976, and 1981–1982 studies difficult. Most European (and some Asian) countries conduct time use studies periodically, typically every 5 or 10 years, with a common survey design.

One feature of the time use studies that differentiates them from most other microdatabase efforts is the difference in design between the US and other countries. All studies that focus explicitly on time use collect data for a (usually retrospective) 24-hour period in the form of a time diary, as contrasted to a series of questions about the frequency of a set of activities. For the most part, these studies have been used as parts of national accounting systems, in which the major focus is on attempting to estimate the volume of non-market work, travel, leisure activities, etc. as a supplement to the National Income and Product Accounts. Thus the major focus, particularly in Europe, is on the macroeconomic uses of time diary data.

In contrast, time diary data in the US have been designed for use in a microeconomic as well as a macroeconomic framework. For that purpose, data for a single day consist mainly of statistical noise, and multiple time diaries for different days of the week and seasons of the year need to be collected for each sample member. Thus the US design specified that data were to be collected for two weekdays, one Saturday and one Sunday, weighted to produce an estimate of time use during a typical week. In contrast, time diary studies in Europe typically (but not always) collect data for a single day for each respondent; in some recent studies multiple time diaries are collected for each respondent, often getting information for both a weekday and a weekend day in the same survey. Data archives on time use studies are maintained at Essex University in the UK.
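The weekly weighting in the US design can be sketched in a few lines of Python. The text says only that the four diaries were weighted to produce an estimate for a typical week; the specific weights used here (the mean of the weekday diaries counted five times, plus one Saturday and one Sunday) are an assumption made for illustration, and the function name and figures are hypothetical.

```python
def weekly_minutes(weekday_diaries, saturday, sunday):
    """Estimate minutes per week spent on one activity from time diaries.

    weekday_diaries: minutes reported on each sampled weekday
        (two diaries in the design described above).
    saturday, sunday: minutes reported on the sampled weekend days.

    Assumed weighting: the weekday mean stands in for each of the five
    weekdays; Saturday and Sunday each count once.
    """
    if not weekday_diaries:
        raise ValueError("at least one weekday diary is required")
    weekday_mean = sum(weekday_diaries) / len(weekday_diaries)
    return 5 * weekday_mean + saturday + sunday


# Illustrative figures only: market work reported on two weekdays,
# a short Saturday shift, and no work on Sunday.
estimate = weekly_minutes([480, 420], saturday=120, sunday=0)
print(estimate)
```

Averaging over several diaries per respondent is what reduces the day-to-day noise that makes a single diary unsuitable for microeconomic analysis.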

4. Current Population Survey (CPS)

The Current Population Survey, probably the most widely reported survey in the US if not in the world, is basically designed to produce an estimate of the unemployment rate for the nation as a whole, as well as for geographic subdivisions like states or SMSAs. The CPS, started in the early 1940s, is designed to produce estimates of work status during the prior week, including time spent looking for jobs as well as time spent working, for a very large (about 50,000 cases) monthly sample. The survey is a very simple and very brief instrument that collects virtually nothing besides the policy-relevant variables (unemployment and work status for the adult population in the USA) from which the unemployment rate for the US as a whole, or for particular states or SMSAs, can be calculated. The CPS does have additional survey content, since there is a series of annual supplements that include such variables as a detailed assessment of family income, assessments of health status for family members, etc. But the core CPS is a very brief interview focused almost entirely on employment and job search experience.

4.1 Health and Retirement Study (HRS)

The Health and Retirement Study, sponsored by the National Institute on Aging, was begun in 1992 with the recognition that the US statistical system had virtually no current data focused on the work situation, health status, family responsibilities, and pension status of the very large cohort of Baby Boom individuals (those born between 1946 and 1964) who would be starting to retire in the early twenty-first century. The behavior of this cohort was thought to be critically important for assessing the societal stresses associated with the prospective dramatic change in the proportion of the working population to the older dependent population.

The study, which collects information every other year, was originally focused on the US population between the ages of 51 and 61 (those in the birth cohorts of 1931 to 1941). This design would provide information with which to model the labor force participation decisions of individuals who were (mainly) not yet retired but who would retire shortly. In subsequent years, the survey was modified from one designed to follow an original cohort through its retirement experience, to a continuing study of the US population of age 50 and older. It was designed to follow the retirement experience of successive cohorts of individuals starting with those born between 1931 and 1941 and continuing with those born up through the Baby Boom years. Thus the current study includes the original cohort of those born between 1931 and 1941 (and their spouses if married), and includes other cohorts added in subsequent years (those born before 1923, those born between 1924 and 1930, and most recently those born between 1942 and 1947).

The content areas of the HRS were subject to much more than the usual amount of developmental work. Survey content is decided by a set of researchers organized into working groups and headed by a Steering or Oversight Committee. The criteria that were uniformly adopted were that variables that played well-defined roles in modeling retirement decisions (or other critical decisions, such as saving rates) were eligible for inclusion, but other variables were not. Thus, the working groups pulled together sets of variables measuring job characteristics and work history; family structure and intrafamily transfers; health status and health insurance; economic status, including both income and wealth; several batteries of cognitive tests; a set of probability questions relating to continued work activity, future health status, longevity, bequests, etc.; and detailed information about pensions. The magnitude, duration, and cost of this planning effort are unique, and the effort gives every indication of having produced a very high quality dataset; it is certainly possible that the HRS planning model represents the future of microeconomic database design activities.

4.2 Luxembourg Income Study (LIS)

The Luxembourg Income Study represents one of the few attempts to collect comparable data on economic status across a large sample of countries. The basic idea is to collect national data on all income sources from as wide a range of countries as feasible, process the data so as to make it as comparable as possible across countries, and then make these data available to the research and policy communities for analysis of differences in economic status and income distribution. The core of the LIS activities is to process the data collected by national statistical agencies, to have a staff of trained analysts interact with their counterparts in these national agencies, and to develop measurements of income and income components that are as comparable as possible across countries.

The most important goal for LIS is harmonization—reshaping and reclassifying the components of income or definitions of household structure into comparable categories. Such harmonization allows the researcher to address important social issues without having to invest countless hours in getting every variable that will be analyzed into a comparable format.
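Harmonization of this kind can be pictured as a mapping from country-specific variable names to a common set of categories. The Python sketch below is a minimal illustration under stated assumptions: the country labels, variable names, and category names are all hypothetical and are not taken from the LIS database or its documentation.

```python
# Hypothetical mapping from each country's own income component names
# to a shared set of categories. None of these names come from LIS.
HARMONIZATION_MAP = {
    "country_a": {
        "wages": "labor_income",
        "self_employment": "labor_income",
        "child_benefit": "public_transfers",
    },
    "country_b": {
        "salaire": "labor_income",
        "allocation_familiale": "public_transfers",
        "pension_publique": "public_transfers",
    },
}


def harmonize(country, record):
    """Reclassify one household's income components into common categories.

    record: dict of country-specific component name -> amount.
    Raises KeyError if a component has no mapping, so that unmapped
    variables are noticed instead of being silently dropped.
    """
    mapping = HARMONIZATION_MAP[country]
    totals = {}
    for name, amount in record.items():
        category = mapping[name]
        totals[category] = totals.get(category, 0) + amount
    return totals


household_a = harmonize(
    "country_a",
    {"wages": 30000, "self_employment": 5000, "child_benefit": 1200},
)
household_b = harmonize(
    "country_b",
    {"salaire": 28000, "allocation_familiale": 900, "pension_publique": 0},
)
print(household_a)
print(household_b)
```

Once both records share the same category names, a researcher can compare them directly without first reconciling each national classification by hand. A real harmonization would also have to reconcile currencies, reference periods, and household definitions, which this sketch leaves out.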

Since its beginning, the LIS project has grown into a cooperative research project with a membership that includes countries in Europe, North America, the Far East, and Australia. The LIS database now contains information for more than 25 countries for one or more years, covering over 90 datasets over the period 1968 to 1997. The countries currently in the LIS database include: Australia, Austria, Belgium, Canada, Czech Republic, Denmark, Finland, France, Germany, Hungary, Ireland, Israel, Italy, Luxembourg, The Netherlands, Norway, Poland, Portugal, ROC Taiwan, Russia, Slovak Republic, Spain, Sweden, Switzerland, UK, and the USA. Negotiations are currently under way to add data from Korea, Japan, New Zealand, and other countries.

4.3 Living Standards Measurement Study (LSMS)

The World Bank established the Living Standards Measurement Study in 1980 to explore ways of improving the type and quality of household data collected by government statistical offices in developing countries. The objectives of the LSMS were to develop new methods for monitoring progress in raising levels of living, to identify the consequences for households of current and proposed government policies, and to improve communications between survey statisticians, analysts, and policy makers. To accomplish these objectives, LSMS activities have encompassed a range of tasks concerned with the design, implementation, and analysis of household surveys in developing countries. The main objective of LSMS surveys is to collect household data that can be used to assess household welfare, to understand household behavior, and to evaluate the effects of various policies on the living conditions of the population. Accordingly, LSMS surveys collect data on many dimensions of household well-being, including consumption, income, savings, employment, health, education, fertility, nutrition, housing, and migration. Three different kinds of questionnaires are normally used: the household questionnaire, which collects detailed information about household members; the community questionnaire, in which key community leaders and groups are asked about community infrastructure; and the price questionnaire, in which market vendors are asked about prices.

Because welfare is measured by consumption in most LSMS research on poverty, the measurement of consumption is strongly emphasized in the questionnaires. There are detailed questions on cash expenditures, on the value of food items grown at home or received as gifts, and on the ownership of housing and durable goods. A wide range of income information is also collected, including detailed questions about wages, bonuses, and various types of in-kind compensation.
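To make the idea of a consumption-based welfare measure concrete, the following is a minimal sketch in Python of how the components listed above might be added into a per capita consumption aggregate. It is an illustration only, not the LSMS procedure: the record layout, the field names, and the treatment of housing and durable goods as imputed annual use values are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class HouseholdRecord:
    """Hypothetical annual household record; all field names are illustrative."""
    cash_expenditure: float      # purchases paid for in cash
    home_produced_food: float    # imputed value of food grown at home
    food_gifts_received: float   # imputed value of food received as gifts
    housing_use_value: float     # ASSUMPTION: imputed rent for owned housing
    durables_use_value: float    # ASSUMPTION: annual service flow from durables
    household_size: int


def total_consumption(h: HouseholdRecord) -> float:
    """Sum cash and imputed (non-cash) consumption components."""
    return (
        h.cash_expenditure
        + h.home_produced_food
        + h.food_gifts_received
        + h.housing_use_value
        + h.durables_use_value
    )


def per_capita_consumption(h: HouseholdRecord) -> float:
    """Divide total household consumption by the number of members."""
    if h.household_size <= 0:
        raise ValueError("household_size must be positive")
    return total_consumption(h) / h.household_size


# Made-up values, for illustration only.
example = HouseholdRecord(1200.0, 300.0, 50.0, 400.0, 50.0, 4)
print(per_capita_consumption(example))
```

The point of the sketch is that home-grown food and gifts enter the aggregate alongside cash spending, so a household with little cash expenditure is not automatically classified as poor.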

LSMS-type surveys have been conducted in a long list of countries, including Peru, Ivory Coast, Ghana, Mauritania, Jamaica, Bolivia, Morocco, Pakistan, Venezuela, Russia, Guyana, Nicaragua, South Africa, Vietnam, Tanzania, Kyrgyzstan, Ecuador, Romania, and Bulgaria. For more information on the Living Standards Measurement Surveys, see Grosh and Glewwe 1995.

5. Establishment Microdata

Central statistical bureaus produce virtually all nationally representative establishment datasets. In the USA, which has a uniquely decentralized statistical system, establishment data are typically collected by the Census Bureau and designed (and commissioned) either by the Census Bureau or by the Bureau of Labor Statistics (BLS). As noted in the introductory remarks for this essay, the major use of establishment databases tends to be macroeconomic rather than microeconomic, in that they are used to produce national estimates and industry distributions of production, wage costs, employment, fringe benefits, work hours, etc.

The history of establishment databases sponsored by the BLS goes back to the 1800s, starting with occupational pay surveys. The Current Employment Statistics series (CES or 790) began at BLS in 1915, the unemployment insurance series (ES 202) was started in 1935, the Occupational Employment Statistics (OES) survey in 1971, the Employment Cost Index (ECI) series in 1976, the Employee Benefits Survey (EBS) in 1980, and the hours at work survey in 1981. All of these surveys are designed to produce aggregate statistics from which monthly, quarterly, or annual estimates of change can be calculated and used by policy makers and business forecasters. The microdata underlying these aggregates are typically unavailable to the research community, although there are occasional exceptions under tightly controlled circumstances.

The establishment databases for which the Census Bureau is responsible consist mainly of a census of establishments in all industries conducted every five years (available starting in 1963) and an annual survey of manufacturing establishments (available starting in 1972). The census data for all industries are designed to provide, among other statistics, benchmark data for total production, price deflators, and product detail. The annual surveys of manufacturing provide data on shipments, wage costs, capital expenditures, materials consumption, and energy use.

These census data are, as with the BLS data described earlier, designed to produce aggregate industry statistics relevant to policy makers and forecasters. However, in recent years Census has tried to increase the use of the microdata in these surveys by creating data enclaves—Longitudinal Research Data Centers. These enclaves exist not only at the Census Bureau, but have also been installed at a number of other locations (Boston, Pittsburgh, Los Angeles, and Berkeley). The intent is to encourage researchers throughout the country to use the microdata, with the expectation that such research will not only turn up interesting scientific findings but will also suggest improvements in the basic data.

There is somewhat greater use of establishment microdata in Western European countries than in the USA, largely because there appears to be a bit less concern with privacy (confidentiality) considerations in Europe. In some countries, establishment microdata can be combined with individual microdata, permitting, for example, analyses of the way in which worker wage trajectories vary with industry.
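As a rough illustration of the kind of linked analysis described above, the sketch below joins worker records to establishment records and compares average wage growth across industries. Everything in it is hypothetical: the identifiers, the field names, and the wage figures are invented for the example, and real linked employer-employee files are far richer and subject to access controls.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical establishment file: establishment id -> industry.
establishments = {"E1": "manufacturing", "E2": "retail"}

# Hypothetical worker file with wages observed at two points in time.
workers = [
    {"worker_id": "W1", "establishment_id": "E1", "wage_first": 20.0, "wage_last": 25.0},
    {"worker_id": "W2", "establishment_id": "E1", "wage_first": 10.0, "wage_last": 11.0},
    {"worker_id": "W3", "establishment_id": "E2", "wage_first": 10.0, "wage_last": 10.5},
]


def mean_wage_growth_by_industry(workers, establishments):
    """Link each worker to an establishment, then average wage growth by industry."""
    growth = defaultdict(list)
    for w in workers:
        industry = establishments.get(w["establishment_id"])
        if industry is None:
            continue  # worker cannot be matched to an establishment record
        growth[industry].append(w["wage_last"] / w["wage_first"] - 1.0)
    return {industry: mean(values) for industry, values in growth.items()}


for industry, g in mean_wage_growth_by_industry(workers, establishments).items():
    print(f"{industry}: {g:.1%}")
```

The join on an establishment identifier is the step that confidentiality rules make difficult in practice, which is why such analyses are more common where linked files are available to researchers.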

6. The Future of Economic Microdata

Developments during the last decades of the twentieth century suggest that the twenty-first century is likely to see a continued expansion in the use of longitudinal household microdata sets, along with a vigorous growth in the use of establishment databases for micro-level analysis. These developments will be the result of three compelling forces: first, the increased ability of economists to understand behavior based on the combination of richer theoretical insights and the greater availability of relevant data; second, the increased involvement of academic economists in the design of microeconomic datasets; and third, the development of more effective strategies for safeguarding privacy and confidentiality while still permitting researchers access to microdata files.


  1. Booth C 1891–1897 Life and Labours of the People in London. Macmillan, London and New York
  2. Davis S J, Haltiwanger J, Schuh S 1996 Job Creation and Destruction. The MIT Press, Cambridge, MA
  3. Duncan G J, Hofferth S, Stafford F P (Forthcoming) Evolution and change in family income, wealth and health: The panel study of income dynamics, 1968–2000 and beyond. In: House J S, Kahn R L, Juster F T, Schuman H, Singer E (eds.) A Telescope on Society: Survey Research and Social Science in the 20th and 21st Centuries. University of Michigan Press, Ann Arbor, Chap. 6
  4. Grosh M E, Glewwe P 1995 A Guide to Living Standards Measurement Study Surveys and Their Datasets. World Bank, Washington, DC
  5. Kuznets S S 1941 National Income and its Composition, 1919–1938. National Bureau of Economic Research, New York
  6. Le Play F 1855 Les Ouvriers Européens, 6 Vols
  7. Manser M E 1998 Existing labor market data: Current and potential research uses. In: Haltiwanger J, Manser M E, Topel R (eds.) Labor Statistics Measurement Issues. University of Chicago Press, Chicago
  8. Rowntree B S 1902 Poverty: A Study of Town Life, 2nd edn. Macmillan, London
  9. Smeeding T M 1999 Problems of international availability of microdata: The experience of the Luxembourg income study (LIS) infrastructure project. In: Chlumsky J B, Schimpl-Neimanns B, Wagner G G (eds.) Kooperation zwischen Wissenschaft und amtlicher Statistik: Praxis und Perspektiven. Metzler Poeschel Publishers, Wiesbaden, Germany, pp. 186–98
  10. Stafford F P 1986 Forestalling the demise of empirical economics: The role of microdata in labor economics research. In: Ashenfelter O C, Layard R (eds.) Handbook of Labor Economics. Elsevier, New York, Vol. 1, pp. 387–423
  11. Szalai A 1972 The Use of Time. Mouton, The Hague
  12. Woodbury R (ed.). 2000 The health and retirement study, part I—history and overview. Research Highlights, 7 (May 2000).