The availability and diversity of databases for international research in the social sciences increased significantly in the latter half of the twentieth century. Social scientists use international databases constructed from data obtained by many methods, ranging from multinational sample surveys to Earth observation images. These databases allow researchers to compare the behavior of individuals and institutions by examining economic, social, spatial, environmental, or political change across many countries. International databases can consist of quantitative or digital observations describing a group of nations, a multinational geographic region, or even the Earth as a whole. Comparability is generally achieved in one of two ways: by using the same data collection instrument, such as a survey questionnaire, in a number of countries, or by combining comparable national statistics or other records into a single data set.

The creation of global or multinational databases and their use in research are related to developments both within and outside the social sciences over the past 50 years. These developments include changes in the theory and focus of social science research; the emergence of international institutions and research programs capable of assuming responsibility for obtaining and maintaining large data collections; and advances in computers and in the technology of data collection, analysis, management, and distribution. The present diversity and wide availability of international research data are a relatively recent phenomenon. In the early twentieth century, quantitative social science analysis frequently focused on the urban neighborhood. Often linking their research to social reform activities, social scientists obtained data by conducting intensive surveys of the housing, working, and family conditions of poor or working-class residents of specific neighborhoods. Because these neighborhoods covered a small physical area, social scientists could map the data by hand, which gave their analysis a spatial component (Kellogg 1914).

By mid-century, quantitative social science had broadened both its theoretical focus and its data collection methods, in part because of improvements in survey techniques and instruments. Sampling methods developed in the 1930s and 1940s allowed social scientists to construct databases on national populations by administering questionnaires to a probability sample of the population. Information obtained from the members of the sample was then used to draw inferences about the population from which the sample was taken. Data obtained from probability samples were not geo-referenced, although the sampling frame was often based on geographic criteria. Sampling from a larger population freed social scientists to examine members of a variety of social groups, regardless of their geographic location. The use of the questionnaire as a data collection instrument led to an analytic emphasis on subjective data, such as individual opinions, perceptions, and self-reported behaviors. It also reinforced the growing interest in the analysis of national or macro-social phenomena.

In the immediate postwar period, there was an increased policy and research emphasis on comparing economic development among countries. Prewar research on national income accounting by Simon Kuznets became the foundation upon which the United Nations created a framework for comparable national income accounting in 1952. These accounts, once standardized and available for many countries, constituted a multinational database for economic research as well as for policy. Shortly after this, Deutsch (1960) attempted to identify comparable noneconomic as well as economic data for international research into what he called the capability and stability of nations. Significantly, the basic unit of analysis for this work, as for the national income accounts, was the nation state, and the multinational data required for analysis contained one observation per variable per state.

The new international institutions formed in the wake of World War II assumed responsibility for maintaining large databases comprising data on each member nation. Although the League of Nations and the International Labor Organization had begun assembling national data into international databases in the 1920s, it was the creation of the United Nations and its associated institutions that provided the greatest stimulus to the development of a wide range of multinational databases. The United Nations itself encouraged member countries to conduct decennial censuses and maintained international databases on a number of socioeconomic conditions. Specialized United Nations agencies, such as the Food and Agriculture Organization, the World Health Organization, and UNESCO, collected data on more specific topics. In addition, regional economic organizations such as the Organization for Economic Co-operation and Development (OECD) published economic and social data on member states and sponsored international meetings to define data elements that were comparable across countries. Similarly, multilateral financial institutions such as the World Bank and the International Monetary Fund created and disseminated multinational economic and social databases. Nongovernmental organizations also began to produce international databases, such as the annual list of military equipment compiled by the International Institute for Strategic Studies in the United Kingdom. These international organizations have continued to update and maintain a wide range of databases for international research and analysis.

A somewhat different approach to creating international databases was taken by Almond and Verba (1963) in their study of political culture. Instead of using countries as the basic unit of analysis, they compared what was taking place within countries. They conducted sample surveys of the population in five nations (West Germany, Italy, Mexico, the United Kingdom, and the United States) to examine political attitudes and democracy. The result was an international database covering five noncontiguous countries that gave social scientists far greater breadth and depth of data on conditions within each nation than could be obtained from the databases maintained by multilateral organizations, which held a single datum per variable per nation. Other international databases of this type include the Level of Living or social indicator surveys coordinated by the OECD in the 1960s and 1970s; the World Fertility Survey and the International Social Survey Programme (ISSP), which used the same questionnaires in a number of countries to obtain comparable international data; and the Luxembourg Income Study (LIS), an international data collection that combines dissimilar national surveys measuring a single phenomenon (income distribution) over time. Like the economic databases that grew out of basic research on national income accounting in the 1930s, these international databases were extensions of data collections originally defined and obtained for theoretical or policy purposes. In most cases, the data originated as national databases and were subsequently combined with other data sets to create an international database. Social scientists increasingly turned to comparative or international research, and to the international databases that support it, to illuminate and extend their understanding of complex processes and socioeconomic relationships within nations.

In the late 1980s, a new type of research, and new types of international databases, began to emerge in some areas of the social sciences. Much of the impetus came from the physical and ecological sciences, where international research programs on global environmental change fostered a growing appreciation of the environmental significance of socioeconomic and behavioral phenomena. Scientists increasingly recognized that anthropogenic forces were responsible for a series of interrelated changes in the Earth’s environment and, in turn, that global environmental change could have a wide range of impacts on human populations. The concept of global-scale research issues, together with the international salience of environmental problems and the broad research agenda developed in international scientific programs, led social scientists to consider developing global-scale rather than multinational databases.

At the same time, new technological developments made it easier to create and use global databases. Advances in computers and software made it possible for individual social scientists to obtain and analyze very large databases. Geographic information system (GIS) software provided a means of spatially integrating diverse types of data; global satellite remote sensing provided Earth observation images from which global-scale databases could be built; and advanced scientific computing provided a means of managing and analyzing the data. Among the earliest global databases were those developed from remote sensing data, such as the so-called ‘City Lights at Night’ data, which are based on images obtained through the Operational Linescan System of the Defense Meteorological Satellite Program (DMSP). This instrument could detect faint visible and near-infrared emissions from the Earth’s surface, specifically, light from human settlements, fires, and gas flares. The global DMSP image of night-time lights gives a striking picture of the density and spatial distribution of human settlements on Earth. However, satellite imagery of night-time lights provides only a rough guide to the distribution of human populations and reveals little about what takes place within settlements. In 1992, a report to the International Social Science Council’s Standing Committee on the Human Dimensions of Global Environmental Change recommended the creation of a gridded population database for use with remote sensing data (Clarke and Rhind 1992). The first Gridded Population of the World (GPW) database, prepared in response to that report, was released in 1995. Although the data were originally obtained from national censuses, the GPW database represents population as a function of space rather than of political units. It can provide either the number of people in each 2.5 arc-minute by 2.5 arc-minute latitude-longitude cell across the Earth’s surface or the density of settlement (persons per square kilometer), and it serves as a prototype for other types of gridded or vectorized socioeconomic databases.
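Moving between the two representations, a count per cell and a density per square kilometer, requires the area of each cell, and that area shrinks toward the poles because meridians converge. The Python sketch below illustrates the arithmetic only; it is not the GPW production method. It assumes a spherical Earth with a mean radius of 6,371 km, and the function names cell_area_km2 and density_per_km2 are hypothetical. The first computes the area of a cell that spans 2.5 arc-minutes in both latitude and longitude from its southern edge. The second divides a cell's population count by that area.

```python
import math

# Assumption: spherical Earth with mean radius 6,371 km. A real gridded
# dataset may use an ellipsoidal model, which changes cell areas slightly.
EARTH_RADIUS_KM = 6371.0


def cell_area_km2(lat_south_deg: float, cell_size_arcmin: float = 2.5) -> float:
    """Area (km^2) of a latitude-longitude cell on a sphere (hypothetical helper).

    The cell spans cell_size_arcmin in both latitude and longitude, and its
    southern edge lies at lat_south_deg.
    """
    size_deg = cell_size_arcmin / 60.0
    lat_north_deg = lat_south_deg + size_deg
    d_lon_rad = math.radians(size_deg)
    # Area of the spherical band between two parallels, scaled to the cell's
    # share of longitude: A = R^2 * dlon * (sin(lat_north) - sin(lat_south))
    return EARTH_RADIUS_KM ** 2 * d_lon_rad * (
        math.sin(math.radians(lat_north_deg)) - math.sin(math.radians(lat_south_deg))
    )


def density_per_km2(count: float, lat_south_deg: float, cell_size_arcmin: float = 2.5) -> float:
    """Convert a cell's population count into persons per square kilometer."""
    return count / cell_area_km2(lat_south_deg, cell_size_arcmin)


if __name__ == "__main__":
    # The same head count implies a higher density where cells are smaller.
    for lat in (0.0, 30.0, 60.0):
        print(f"cell at {lat:4.1f} deg: {cell_area_km2(lat):6.2f} km^2, "
              f"1,000 people -> {density_per_km2(1000, lat):6.1f} per km^2")
```

Under these assumptions a cell at the equator covers roughly 21.5 km², and a cell at 60° latitude covers about half that. The same count therefore implies roughly twice the density at 60° as at the equator. This is one reason a gridded database benefits from offering both counts and densities.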

New computing capabilities not only make it possible to obtain the data that broadening theoretical perspectives in the social sciences require; at times, the databases themselves stimulate changes in those perspectives. Social scientists now have at their disposal a broader range of comparable international data than in the past and far greater flexibility in analyzing, visualizing, and manipulating these data. New developments in data management and analysis, such as data mining, combined with this rich diversity of international databases, constitute a valuable and rapidly expanding component of the research infrastructure of the social sciences.


