International Data Archives Research Paper

Data archives are professional institutions for the acquisition, preparation, preservation, and dissemination of social and behavioral data. These data are commonly stored as electronic records. Evolving in the 1950s with the growth of an ethic of data sharing among social scientists, they have enabled thousands of researchers to undertake a wide variety of analyses of data for which they were not the primary data collectors. This reuse multiplies many times over the return on the original investments in data collection. Many archives also provide training and data services, including statistical analysis and consultation.

1. The Contents Of Data Archives

Most data archives offer numerically coded data, but those data are not necessarily quantitative. Numbers may represent either quantitative measurements, such as dollars or years of school completed, or merely serve to label categories, such as sex and occupation. Textual, sound, visual, geographic information system (GIS), and physiological data are also now found in some archives. Two main kinds of data are offered.

Microdata are measurements taken on many distinct units of observation, such as persons, families, companies, governmental units, transactions, events, conflicts, and dyadic and group relationships. Each unit of observation, for example, a particular person, generates one or more ‘records,’ each of which contains measurements on a body of variables pertaining to that person, such as occupation or income. It is common for data records to contain hundreds of variables, some measured directly by observation and some derived from other measurements (such as the prestige score of the respondent’s occupation). Microdata are typically based upon censuses and sample surveys, although direct observation, administrative records, news articles, regulatory instruments, and experiments are also used and occasionally are combined with survey data. Microdata generally are rendered anonymous by removing both direct identifiers (names and addresses) and indirect identifiers that might permit sound guesses as to the identity of a respondent, whether a person or an organization.
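The anonymization step described above can be illustrated with a minimal Python sketch. The field names and the split between direct and indirect identifiers are assumptions made for the example; in practice each archive decides per study which fields could permit identification, and dropping fields is only the simplest of the measures it might take.

```python
# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "address"}
INDIRECT_IDENTIFIERS = {"employer", "birth_date"}


def anonymize(record):
    """Return a copy of a microdata record with identifying fields removed."""
    drop = DIRECT_IDENTIFIERS | INDIRECT_IDENTIFIERS
    return {field: value for field, value in record.items() if field not in drop}


# One invented respondent record; occupation is a numeric code.
respondent = {
    "name": "A. Example",
    "address": "1 Sample Street",
    "employer": "Sample Mills",
    "birth_date": "1950-01-01",
    "occupation": 412,
    "income": 38000,
}

public_record = anonymize(respondent)
print(public_record)
```

Only the substantive variables (here the occupation code and income) remain in the record prepared for release; the original record is left unchanged.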

Macrodata are aggregated microdata measurements in the form of matrices, tables, and charts. They are the result of statistical operations, such as a count of the persons of a certain ethnic heritage in a geographical area or the sum of an entire industry’s revenue. This form of data presentation is used widely for the reporting of official data at fine levels of geographic identification, as it permits concealing the identities of individual respondents. However, because it is not possible to disaggregate macrodata and perform a different analysis on the underlying microdata, macrodata provide considerably less analytical power to the researcher. Macrodata may be merged with microdata to facilitate contextual analyses.
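A short Python sketch may make the relationship between the two kinds of data concrete. The records, region names, and helper functions below are invented for illustration. The first function collapses person-level records into regional counts and sums; the second attaches a regional mean to each person, one simple form of contextual merge.

```python
from collections import defaultdict

# Invented microdata: one record per person.
microdata = [
    {"region": "North", "income": 30000},
    {"region": "North", "income": 50000},
    {"region": "South", "income": 40000},
]


def aggregate(records):
    """Collapse person-level records into one count and income total per region."""
    table = defaultdict(lambda: {"persons": 0, "total_income": 0})
    for rec in records:
        cell = table[rec["region"]]
        cell["persons"] += 1
        cell["total_income"] += rec["income"]
    return dict(table)


def add_context(records, table):
    """Attach each person's regional mean income as a contextual variable."""
    merged = []
    for rec in records:
        cell = table[rec["region"]]
        mean_income = cell["total_income"] / cell["persons"]
        merged.append({**rec, "region_mean_income": mean_income})
    return merged


macrodata = aggregate(microdata)
merged = add_context(microdata, macrodata)
print(macrodata)
print(merged[0])
```

The aggregation runs in one direction only: the table for North records two persons and a total, from which the two individual incomes cannot be recovered.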

The archives also contain the documentation that is required to use these data accurately. Most important are ‘codebooks,’ documents that give conceptual meaning to numerical codes and describe the process of measurement in detail. Increasingly, electronic codebooks are linkable by software to statistical analysis programs such as SPSS, SAS, and STATA, relieving the user of programming tasks that are intellectually arid but fraught with potential error.
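As a rough illustration of what a machine-readable codebook does, the Python sketch below maps numeric codes to their meanings. The variable names, code values, and dictionary layout are hypothetical and do not follow any particular archive's or statistical package's format.

```python
# Hypothetical codebook: categorical variables carry value labels,
# quantitative variables carry a unit of measurement.
CODEBOOK = {
    "sex": {
        "kind": "categorical",
        "labels": {1: "male", 2: "female", 9: "not ascertained"},
    },
    "educ_years": {
        "kind": "quantitative",
        "unit": "years of school completed",
    },
}


def decode(record, codebook):
    """Translate category codes into labels; leave quantitative values as numbers."""
    decoded = {}
    for variable, value in record.items():
        entry = codebook[variable]
        if entry["kind"] == "categorical":
            decoded[variable] = entry["labels"][value]
        else:
            decoded[variable] = value
    return decoded


decoded_record = decode({"sex": 2, "educ_years": 12}, CODEBOOK)
print(decoded_record)
```

Here the code 2 is only a label for a category, while 12 is a genuine quantity. The codebook is what tells the analyst, or the software, which is which.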

Archives make their greatest investments in the preparation of data and its documentation in order that analyses can be undertaken later, even much later, without personal consultations with the original data collector. This preparation involves identifying and resolving undocumented codes, correcting technical errors in data entry, adding substantive information about measurement processes and response categories, and preparing full sampling descriptions, all in consultation with the original data collector. Major changes in format may be made for the convenience of data users.

As the size of archival collections grew, it became essential to create finding aids of various kinds (e.g., catalogs and indices), now on the World Wide Web for interactive access. In recent years, the catalogs of some of the data archives worldwide have been made interoperable, enabling an interactive search of the holdings of several data archives in one operation. Some archives’ collections also include ancillary materials, such as detailed records of fieldwork, basic frequency distributions, statistical analyses of the data, methodological notes, and publications based on the data.

As data archives developed, a specialized profession of data librarian/archivist emerged, blending knowledge and skills from information technology, librarianship, statistics, and the social sciences. These professionals staff most national and many campus data archives. The International Association for Social Science Information Service and Technology (IASSIST) is their central professional organization. Among the organization’s goals are the promotion of archiving and the development of standards for the preservation of data.

2. The Establishment Of Data Archives

The intellectual leaders of what came to be known as the ‘data archives movement’ came from both the academy and the private sector. From the academy came the founder of modern comparative politics, Stein Rokkan (Norway); two of the pioneers of the empirical study of political behavior, Philip E. Converse and Warren E. Miller (United States); and a leader of comparative research on social structure, Erwin K. Scheuch (West Germany). Converse and Miller sought to share with their colleagues the election data they were collecting and also to use the new model of the National Election Studies to foster the collection of data on the re-emerging European democracies. Rokkan and Scheuch sought to describe and understand the movement toward a united Europe. All recognized the great potential of data archives to facilitate comparative and time-series research as well as to aid the social and behavioral sciences to become more transparent and self-correcting.

The private sector saw its polling data as integral to the functioning of modern democracies and held that understanding public opinion was crucial for a free society. One of the founders of modern polling, Elmo B. Roper, organized the first of the social science data archives before the 1950s when he decided to share with social scientists and others the data from his polls. These collections of survey data were under development by 1947 at Williams College (Massachusetts), based upon holdings that Roper had been accumulating for 10 years. Williams College formally established the Roper Center for Public Opinion Research in 1957. Roper was joined in contributions by two other leaders of public opinion polling in the US, George H. Gallup and Archibald Crossley, and by many of the other leading survey firms.

A succession of archives followed. The Central Archive for Empirical Social Research (ZA), based at the University of Cologne in the former West Germany, was next in 1960, followed in 1962 by the Steinmetz Archive in Amsterdam, the Netherlands, and the Inter-university Consortium for Political Research (ICPR), at the University of Michigan, Ann Arbor. By the beginning of the 1970s several more European countries had data archives. By 1964 the International Social Science Council (ISSC) had sponsored a second conference on Social Science Data Archives and had a standing Committee on Social Science Data, both of which stimulated the data archives movement. By the beginning of the twenty-first century, most developed countries and some developing countries had organized formal and well-functioning national data archives. In addition, college and university campuses often have ‘data libraries’ that make data available to their faculty, staff, and students; most of these bear minimal archival responsibility, relying for that function on a national institution.

Data archives usually are administratively housed in research universities or in government science or information agencies. Funding for these operations varies widely in scale and stability. The financially largest of the data archives, the Inter-university Consortium for Political and Social Research (formerly ICPR), had in 2000 an annual budget of almost $7 million that was about equally divided between (a) income from grants and contracts and (b) income from tuition and annual dues paid by more than 350 institutions. Many archives have both long-term governmental support and income from user fees; others subsist on small annual grants or support from individual universities. Since 1977 the archives have been organized loosely under the International Federation of Data Organizations (IFDO), which is an associate member of the ISSC. IFDO provides for cooperation among the archives and has materially helped nascent archives, particularly by providing technical and organizational assistance to new archivists in developing countries. European archives are organized more closely under the Council of European Social Science Data Archives (CESSDA), which has provided a basis for cooperative planning and execution of projects, technical assistance, and training. There is no comparable North American organization, although in the 1960s there was the Council of Social Science Data Archives. The archives are largely national in funding, organization, and substantive scope; there is no single international archive of social science data to which all social and behavioral scientists can turn, nor is there as yet a complete ‘union catalog.’ There was discussion in the late 1950s of organizing central data archives for all of Europe, but the present nationally based system emerged in its stead.

Because archives’ holdings have now accumulated to encompass decades of observation in some societies, it may be anticipated that future historians will be among the heaviest users of data archives. So also will sociologists and political scientists involved in comparative and time-series research. Urban anthropologists and humanists are finding in data archives information about values, knowledge, and mass culture that was previously unavailable to them. There is also a substantial level of usage by journalists, policy analysts, and citizens. Usage can be expected to increase substantially as broader categories of researchers realize the value of archived resources and as technological progress eases access to data.

3. The Ethic Of Data Sharing And Its Benefits To Science

Both natural and social scientists have long recognized the value of sharing data with their contemporaries and of preserving data for their successors. Records of astronomical observations have been kept for several thousand years. During the 1957 International Geophysical Year, the natural sciences established a system of World Data Centers that still exists today. Historical parish, census, and tax records, although not originally designed as scientific data collections, were nevertheless a foundation for the development of actuarial science and demography in the eighteenth century. From the nineteenth century onwards, cultural anthropologists carefully documented and preserved their observations, some of which were incorporated into the Human Relations Area Files.

Data archives have made much possible: using their resources, social scientists ask diverse research questions that go well beyond those posed by the original data collectors, address new questions to old data, apply newly developed statistical techniques to data gathered long before the methods were developed, construct time series and repeated cross-sections, and perform meta-analyses. Investigators have used data archives in replication research to correct errors and omissions in the published literature and even to ferret out fraud. Archived data have been the foundation for thousands of monographs, research articles, theses, dissertations, and reports. The agencies that support research have benefited because the archives have enabled researchers to avoid needless duplication of data collection. As social surveys are relatively expensive by the modest standards of the social sciences, many social scientists have been enabled to do research that would otherwise have been impossible. Without the archives, education at the graduate level would be seriously deficient, for it is rare that a graduate student can afford to undertake primary data collection on a large scale; for at least 40 years, analytical theses and dissertations have been based upon archived data. Undergraduate training in the social sciences has increasingly come to resemble the laboratory-oriented training of the natural sciences, with archived data serving in place of biological specimens and chemicals.

Underlying this system of data archives is the ethic of data sharing: scholars should share with other scholars a crucial part of their intellectual property, the data that they have collected. Increasingly, this ethic is codified in formal rules, such as the requirement at the US National Science Foundation that publicly funded data be made available in a public archive in due time after the end of data collection. Journals in economics, political science, and sociology often require that the data supporting an article be made publicly available. The ISSC has stated ‘Full and open sharing of the full suite of datasets for all social scientists is a fundamental goal.’ The International Council for Science (ICSU) recommends ‘… as a general policy the fundamental principle of full and open exchange of data and information for scientific and educational purposes.’

One of the several reasons for such norms is the high value that science puts upon the ability to perform independent replications of published analyses. In 1901 Sir Francis Galton (quoted in Stigler 2000, p. 657) observed:

I have begun to think that no one ought to publish biometric results, without lodging a well arranged and well bound manuscript copy of all his data, in some place where it should be accessible, under reasonable restrictions, to those who desire to verify his work

Today much of the social and behavioral science community shares Galton’s view. Indeed, when the raw data are not available, there can arise a cloud of suspicion (or worse) about a researcher’s published work. Had Cyril Burt’s raw data from his study of the inheritance of intelligence among separated twins not been burned by his secretary upon his death, Burt’s reputation might have survived, or at least he might have escaped the most serious charge, that he knowingly fabricated his data.

4. Concerns For Privacy And Confidentiality

Data archives have been governed historically by a high professional standard for the protection of the privacy and confidentiality of the individual subjects of research. Today, they are also governed by both laws and restrictions imposed by data collectors. The laws arise from societal concern over protecting the privacy and confidentiality of individuals and organizations. The restrictions arise from researchers’ legitimate concerns with protecting their intellectual property.

Most data archives require a prior commitment by the user of the data to use the data solely for statistical purposes and never to seek to use the data for purposes of identifying a person or an organization. Such pledges could be electronically signed on the Internet, permitting instant access to the data. However, this easy procedure is not the practice worldwide and is unlikely to be the practice in the near future. There are indications that access to data may become more, not less, difficult, primarily because of privacy concerns that arise with particular force now that technology permits extremely rapid and broad dissemination of data. In addition, there is concern about the growing ability to combine information about individuals from several sources of information. Governments differ greatly in data protection practices and laws. The US system provides open access to microdata that has been properly made anonymous, but this is the exception rather than the rule. Some governments consider demographic, microeconomic, and macroeconomic data to be matters of national security or heritage. Public opinion polls and data collected under academic auspices are more often made available for data sharing than are official microdata. Stringent data protections, sometimes involving criminal penalties, exist in many countries. In 2000 there was an ongoing dispute between the European Union and the United States over the protection of personal data collected from European citizens over the Internet.

The most common restriction arising from intellectual property concerns is the provision of a reasonable length of time for the original data collector to have exclusive access to the data, often one year. Other restrictions can effectively block access. Some archives will not accept data deposited with onerous restrictions, while others are forced by law or custom to do so. Several kinds of restrictions may be imposed: (a) the original investigator will permit the data to be used for scientific research and teaching, but not for publication until a ‘special release’ is personally obtained from the original investigator; (b) the depositor will decide whether to grant permission for specific scientific research and teaching uses of the data on the basis of requests; and (c) the depositor may require prior review (and perhaps censorship) of any publications. Such conditions run counter to the underlying ethic favoring the sharing of data and work against the ready conduct of research. Further inter-archival cooperation in addressing such issues is warranted. Moreover, the archives must strive to ensure that privacy concerns are not inappropriately permitted to limit the kinds of questions that researchers ask.

5. New Ways Of Protecting Privacy And Confidentiality

Concern over the protection of individual privacy and confidentiality, particularly of official data, has led to the emergence of secure data enclaves. In such enclaves the researcher does not deal directly with the data but instead has highly controlled access to the results of analyses. The raw data never leave the enclave or pass through the researcher’s hands. One of the oldest of these arrangements is that of the Luxembourg Income Study (LIS), which supports comparative research on the economic status of the populations of more than twenty countries. The stringent data protection laws of the Grand Duchy of Luxembourg presume that all personal data, privately or publicly collected, are sensitive. The law further requires the licensing of computer systems that will be used to process personal data. Under these circumstances many governments have agreed to provide access through LIS to data that they would never archive for public dissemination.

The US National Institute on Aging and National Science Foundation are now providing support to data enclaves in the United States, permitting a level of access to detailed Census and other data that was previously unknown. Controls are physical as well as technical and involve searching the contents of materials entering and leaving highly secure data facilities, screening of computer analyses for intentional or inadvertent disclosure of individual information, and limits on the detail of information that may be extracted from the data files. It is unclear what role data archives will play in constructing and operating secure data enclaves, as their aim has always been to allow researchers considerable flexibility in their data analyses. Nevertheless, there may be economic efficiencies to be obtained from embedding secure data enclaves in data archives.
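One way to picture the screening of analysis output is a minimum cell count rule, sketched below in Python. This is an illustrative assumption rather than a description of how any particular enclave works: the threshold of 5, the function name, and the table layout are all hypothetical, and real facilities apply their own, often more elaborate, disclosure rules.

```python
# Assumed threshold: cells describing fewer than this many respondents
# are withheld from the released table.
MIN_CELL_COUNT = 5


def screen_table(counts, min_count=MIN_CELL_COUNT):
    """Return a copy of a frequency table with small cells replaced by None."""
    return {
        cell: (n if n >= min_count else None)
        for cell, n in counts.items()
    }


# Invented cross-tabulation of region by occupation.
requested = {
    ("North", "farmer"): 42,
    ("North", "judge"): 2,
    ("South", "farmer"): 17,
}

released = screen_table(requested)
print(released)
```

The cell with only two respondents is withheld because so small a group could allow individuals to be identified, while the larger cells are released unchanged.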

6. Technological Changes In Data Archives

The information technology infrastructure of data archives has undergone the same remarkable transformation as was experienced recently in the worlds of business and government. When data archives began, the principal data storage medium was the punched card, also known as the Hollerith card from its development by Herman Hollerith for use in processing the 1890 US Census. This technology persisted basically unchanged for 60 years. Magnetic media (such as low-density tapes) were introduced into data processing in 1951, when the US Census Bureau installed ‘the world’s first electronic general purpose data processing computer,’ the UNIVAC I. Tapes were in extensive use by data archives by 1965, and their ever-increasing capacity (due to higher recording density) made them practical for storing and copying large-scale data collections that punched cards could never have accommodated. By the mid-1980s the desktop computer was beginning to have the capability of running complex statistical software, with high-speed central processing units, large random access memory, and hard disks capable of handling such large files as those from the International Social Survey Program and the General Social Survey. Risking obsolescence, most archives took many years to adjust fully to the new computing environment. However, major archives now provide services directly to desktop computers through client-server computing environments. In doing so, they are often creating service relationships with individual data users that previously existed only with campus-wide data libraries. The growth of cheap computing has encouraged some data archives to provide new exploratory analytical services from their own servers, as well as the ability to extract variables or units of observation.

One of the most advanced of the new efforts is the Networked Social Science Tools and Resources (NESSTAR) project, a collaboration of the United Kingdom Data Archive, the Norwegian Social Science Data Service, and the Danish Data Archive. It enables users to tabulate and visualize the data online and to obtain the data over the Internet, as well as to access detailed information. Perhaps equally important, NESSTAR can locate multiple data sources across archives and national boundaries, creating what could become a virtual global data archive. Related software developments are underway at the Center for Survey Methods, University of California, Berkeley; the Integrated Library and Survey-data Extraction Service (a project of several Dutch, German, French, and Irish institutes and the ZA); and the Virtual Data Center of Harvard University and the Massachusetts Institute of Technology. Archives’ clientele has shifted from being almost exclusively experts in data analysis to including less trained users, and this kind of service is particularly beneficial to these new users.

Archives also had to respond to a second great technological revolution: the telecommunications revolution of 1990 onwards. In the era of punched cards, transmittal of data sets to users was exceedingly cumbersome and slow. Magnetic tapes relaxed those problems to a degree, but even as late as 1995, the delivery of one copy of the Public Use Microdata Samples from the US Census involved dozens of high-density tapes. The telecommunications revolution saw the virtually complete elimination of tapes as dissemination media, although they remain valuable as archival media. Internet access, coupled with CD-ROM physical media, took their place. It will soon be possible to assume that high-speed telecommunications are available in much of the world. Fiber optic and wireless technologies have ample capacity to transmit the entire contents of the Library of Congress in only a few seconds; social science data will be trivial passengers. Thus, Converse’s vision of 1964 can be realized:

… the day when an investigator at one of the far-flung California campuses can, in an afternoon’s work at a satellite of his local computer facility, learn just what data exist in the total network bearing on the hypothesis he wishes to check out, order by telecommunication either a statistical analysis of those data from an East Coast repository (or the raw data themselves if he has more extended use in mind), and have his output in time for dinner, all at minimal immediate cost (Converse 1964, p. 280).

This was the idea of the virtual archive: a number of data archives that are ‘coordinated’ so as to offer an investigator the ability ‘to know both that they (data) exist and where they exist.’ Archives still require coordination of their finding aids, which in Helen Crossley’s vision of 1958 (Riley 1958) would have involved a ‘common system of indexing’ so that users could find resources across archives. The archives may be nearing that goal with an international project known as the Data Documentation Initiative, an effort to formulate an interarchival standard for describing data.

7. Conclusion

In 1944, President Franklin D. Roosevelt asked Vannevar Bush a set of questions about science after the conclusion of the war. Among those was ‘With particular reference to the war of science against disease, what can be done now to organize a program for continuing, in the future, the work which has been done in medicine and related sciences?’ In the response, Science—The Endless Frontier, Bush observed ‘The striking advances in medicine during the war have been possible only because we had a large backlog of scientific data accumulated through basic research in many scientific fields in the years before the war.’ Perhaps some future scientist will observe that the striking advances in the social and behavioral sciences of the first decades of the twenty-first century were similarly due to the ‘large backlog of scientific data’ preserved by the data archives. These relatively young sciences have yet to accumulate the centuries of observations of most natural sciences, but the archives are prepared to facilitate that accumulation. Progressively sounder sciences may be built upon this accumulation of observations.


Bibliography:

  1. Clubb J M, Austin E W, Geda C L, Traugott M W 1985 Sharing research data in the social sciences. In: Fienberg S E, Martin M E, Straf M L (eds.) Sharing Research Data. National Academy Press, Washington, DC
  2. Converse P E 1964 A network of data archives for the behavioral sciences. Public Opinion Quarterly 28(2): 273–86
  3. Hyman H H 1972 Secondary Analysis of Sample Surveys: Principles, Procedures, and Potentialities. Wiley, New York
  4. International Council for Science 1996 ICSU CODATA Group on Data and Information. General Assembly Resolution
  5. International Social Science Council 1994 General Assembly Social Science Data Management Policy
  6. Riley J W Jr 1958 Proceedings of the thirteenth conference on public opinion research. Public Opinion Quarterly 22(2): 169–216
  7. Rockwell R C 1997 Using electronic social science data in the age of the Internet. In: Dowler L (ed.) Gateways to Knowledge. MIT, Cambridge, MA
  8. Rokkan S 1966 Data Archives for the Social Sciences. Mouton, Paris
  9. Stigler S M 2000 The problematic unity of biometrics. Biometrics 56(3): 653–8