Stanford HGDP SNP Genotyping Data

Stanford Human Genome Center

Department of Genetics

Department of Biological Sciences

Morrison Institute for Population and Resource Studies

Stanford University

Introduction

A group of scientists at Stanford University has collaborated on a large study to understand genetic diversity in human populations. We analyzed genomic DNA from 1,043 individuals from around the world, determining their genotypes at more than 650,000 SNP loci with the Illumina BeadStation technology. Genomic DNA samples from these fully consenting individuals were collected by the Human Genome Diversity Project (HGDP) in collaboration with the Centre d'Etude du Polymorphisme Humain (CEPH) in Paris. The collection we tested is referred to as the "HGDP-CEPH Human Genome Diversity Cell Line Panel". The samples represent 51 different populations from Africa, Europe, the Middle East, South and Central Asia, East Asia, Oceania and the Americas. For details on the individuals in this collection, see H. Cann et al. Science 296: 261-262 (2002) and its Supplemental Data; Rosenberg et al. Science 298: 2381-2385 (2002); and Rosenberg et al. PLoS Genetics 1: 660-671 (2005).

Pre-publication availability of the data

We have submitted this work for publication and do not yet know when or where our study will appear in press. However, in the spirit of the publicly funded Human Genome Project, we believe that early release of these data may be useful to other researchers, and we hope to encourage additional study. We believe that rapid data release, particularly for studies involving human subjects and valuable samples, serves the scientific community and the study participants better than the standard practice of releasing data after publication. We hope that other researchers will use these data in the same spirit.

We place two stipulations upon the use of these pre-publication data. First, we ask that users observe the HGDP-CEPH rule against using these samples for profit and that, prior to publication, they apply the same rule to the data derived from them. Second, we request that our data not be used in other publications until our initial manuscript is published. We will announce the status of publication on this page as soon as we know it.

Our submitted manuscript describes analysis of the patterns of genetic diversity as ascertained by the 650,000 Illumina-assayed SNPs. We assessed shared ancestry and admixture, relationships between haplotype heterozygosity and geography, and population differences in copy number variation throughout the human genome in the 1,043 individuals. We have also collaborated with Dr. Jonathan Pritchard and his group at the University of Chicago to identify and study regions of selection that can be ascertained from these data, and a second manuscript describing this work is in preparation. Jonathan’s group is preparing a web browser to allow users to examine selection signals in the data; this browser will be available upon publication of this second paper.

People and contact information

This work was a collaboration between researchers in several laboratories at Stanford University. The researchers include:

Department of Genetics, Stanford Human Genome Center, Stanford University School of Medicine
Jun Z. Li
Devin M. Absher
Hua Tang
Audrey M. Southwick
Amanda M. Casto
Gregory S. Barsh
Luigi L. Cavalli-Sforza
Richard M. Myers

Department of Biological Sciences, Stanford University
Sohini Ramachandran
Marcus Feldman

Dr. Jun Li and Dr. Devin Absher led the experimental and much of the analytical work in this study, and are the ones most familiar with these datasets.

Please contact Devin at [email protected] or Jun at [email protected] with queries about the data. Alternatively, contact Rick Myers at [email protected]. Jun Li has just moved to the University of Michigan, where he is a new faculty member in the Department of Human Genetics.

Additional queries about the study can be addressed to any of the other members of the study team, particularly Luca Cavalli-Sforza, Marc Feldman and Rick Myers.
