The Fractionalization dataset was compiled by Alberto Alesina and associates, and measures the degree of ethnic, linguistic and religious heterogeneity in various countries. The dataset was used in Alesina et al. (2003) to test the effects of fractionalisation on the quality of institutions and economic growth.


The indices are based on population data collected from Encyclopaedia Britannica (2001), the CIA’s World Factbook (2000), Levinson’s Ethnic Groups Worldwide (1998), and Minority Rights Group International’s World Directory of Minorities (1997), together with Mozaffar & Scarrit (1999) for selected African countries. In most cases the primary source is national censuses.

The project provides measures of ethnic, linguistic, and religious fractionalisation that are intended to be more comprehensive than the fractionalisation measures previously used in the economics literature, and the authors compare the new variables with those earlier measures. The goal of the new measure of ethnic fragmentation is a broader classification of groups, taking into account not only language but also racial characteristics (ethnicity) and religion. On this basis, the authors examine the effects of ethnic fragmentation on two general areas: economic growth and the quality of institutions and policy. The indices are computed as one minus the Herfindahl index of group shares. The dataset also contains the underlying data used to construct the indices.
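
As an illustration of the computation described above, one minus the Herfindahl index of group shares is 1 minus the sum of the squared population shares of each group. This can be read as the probability that two individuals drawn at random from the population belong to different groups. The sketch below is a minimal example under stated assumptions, not the authors' own code: the function name `fractionalization` and the example shares are hypothetical, and the shares are assumed to be given as fractions of the total population.

```python
def fractionalization(shares):
    """Return one minus the Herfindahl index of the given group shares.

    `shares` is an iterable of population shares expressed as fractions
    (for example 0.6 for 60 percent). This is an illustrative helper,
    not part of the dataset.
    """
    shares = list(shares)
    if any(s < 0 or s > 1 for s in shares):
        raise ValueError("each share must lie between 0 and 1")
    if sum(shares) > 1 + 1e-9:
        raise ValueError("shares must not sum to more than 1")
    # Herfindahl index: sum of squared group shares.
    herfindahl = sum(s * s for s in shares)
    return 1.0 - herfindahl


# Hypothetical country with three groups holding 60, 30 and 10 percent.
example_shares = [0.6, 0.3, 0.1]
print(round(fractionalization(example_shares), 2))
```

A perfectly homogeneous population (one group with a share of 1) gives an index of 0, and the index rises towards 1 as the population is split into more and smaller groups.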

The dataset covers 215 countries and territories.

The dataset contains data for only one year for each country. The language and religion indices are based on data from 2001. Most of the data used to compute the ethnic fractionalisation index are from the 1990s, but for some countries older data are used (as far back as 1979).

Another freely available dataset containing data on ethnic, religious and linguistic groups is the Ethnic Composition Data, compiled by Tanja Ellingsen. That dataset, used in Ellingsen (2000), relies on similar sources but covers a longer time period. See the sources section for a link to the website.


Available free of charge.

Defining ethnic, linguistic and religious groups is difficult and is often based on subjective judgement. In many cases it may also be difficult to find reliable data on how many people belong to the various cultural groups. The underlying data used to construct the fractionalisation indices are therefore likely to be subject to problems of comparability and measurement error. See Alesina et al. (2003), Fearon (2003) and Posner (2004) for discussions of problems associated with various measures of cultural heterogeneity.

