Integrated Vervet/AGM Research & Resources

Vervet Genome Sequencing Project (NIH)

The Vervet Genome Sequencing Project (VGSP) has been assigned by the NHGRI to The Genome Institute, Washington University School of Medicine. The project is being led by Dr Wes Warren and Dr George Weinstock.
The VGSP will:
- generate a high-quality genome sequence for a reference animal from the VRC
- identify genome-wide SNPs for vervet subspecies
- detect genome-wide rearrangements
- initiate sequence-based transcriptome analysis resources.
The VGSP plan has been developed in coordination with the NCRR integrated vervet genomics application.

As the VGSP progresses, we will provide links to those resources and coordinate with them to provide data access and data visualization.

Vervet Genome Browser (Framework assembly)

Update -   Progress on the vervet reference genome assembly

The vervet reference genome assembly is derived from an adult male monkey of the VRC pedigreed colony, now at Wake Forest. Genome sequencing has utilized a variety of technologies (Roche/454, Illumina, conventional Sanger sequencing) and strategies (paired ends of different insert sizes), and sequencing experiments have benefited from improved protocols and extended read lengths as they have been implemented over the course of the project. DNA from the same animal has been cloned as a commercially available BAC library (CH252).

In April 2012 we generated a Newbler assembly based primarily on Roche/454 long read, Roche/454 8kb paired end, and BAC end sequence data sets. As of October 2012 we have also generated an ALLPATHS assembly (5.0) based on Illumina paired end data sets and the BAC end sequences. In each case we went through a series of iterations of BAC end re-alignments to identify additional scaffold joins to build longer range contiguity of our genomic sequence scaffolds.
We are currently displaying Newbler and ALLPATHS 5.0.1 on the genome browser.

Presently we are merging Newbler and ALLPATHS 5.0.2 and applying additional computational gap closing protocols to develop a high quality assembly that will be submitted to NCBI for public access and Ensembl for annotation. Our objective is to have the assembly finalized and submitted before February 28, 2013, with the reference assembly becoming fully accessible by June 30, 2013.

Characteristics of the two assemblies include:

                        Newbler (April 2012)               ALLPATHS 5.0.2 (December 2012)
Assembly length:        2.89Gb                             2.71Gb
>1Mb Scaffolds:         380 (2.74Gb, 94.8% of assembly)    150 (2.67Gb, 98.5% of assembly)
BAC end concordance:    91.8% (of 164,869 clones)          98.4% (of 161,660 clones)

Figure 1. Graphical representation of ALLPATHS (red) and Newbler (blue) assemblies. Scaffolds > 1Mb are arranged by length with cumulative length plotted.
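
The quantities summarized in the table and in Figure 1 can be recomputed from a list of scaffold lengths. The following is a minimal sketch, not the project's own tooling: the function names and the toy lengths are hypothetical, and the only assumption is that scaffold lengths are available in base pairs.

```python
from itertools import accumulate


def cumulative_lengths(scaffold_lengths):
    """Sort scaffolds longest-first and return the running total of length."""
    ordered = sorted(scaffold_lengths, reverse=True)
    return list(accumulate(ordered))


def fraction_in_large_scaffolds(scaffold_lengths, min_length=1_000_000):
    """Fraction of total assembly length held in scaffolds longer than min_length."""
    total = sum(scaffold_lengths)
    large = sum(n for n in scaffold_lengths if n > min_length)
    return large / total if total else 0.0


# Toy lengths in bp (hypothetical, not real vervet scaffolds).
toy = [400_000, 5_000_000, 100_000, 3_000_000, 1_500_000]
print(cumulative_lengths(toy))
print(round(100 * fraction_in_large_scaffolds(toy), 1))
```

The percentages in the table follow the same arithmetic: 2.74Gb of 2.89Gb is 94.8%, and 2.67Gb of 2.71Gb is 98.5%.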

MHC regional assembly

Separately from whole genome assemblies, we have sequenced and assembled a BAC path spanning the MHC region.

Physical Map Project (Genome Canada and Genome Quebec)

Home Page

Genetic Map (NIH)

Home Page

Microsatellite Genetic Maps and supporting data
Browse the Maps and the Data

Update -   268K SNP Mapping Set and sequence-based genotype data from the VRC

The Vervet Research Colony (VRC) is a pedigreed colony of vervet monkeys (Chlorocebus aethiops sabaeus) that was established from a small founder population of Caribbean vervets and provides a model for genetic studies of multiple traits relevant to human health. To facilitate genetic mapping in this model, we generated whole genome sequencing (WGS) data for 723 colony animals, called variants, and imputed missing variants.

We generated WGS data from 723 VRC monkeys: 16 high-coverage genomes (~40X average) from the pedigree progenitors, 406 medium-coverage genomes (4-6X) from monkeys in "the middle" of the pedigree, and 301 low-coverage genomes (1-2X) from monkeys at the bottom of the pedigree. From this data set we identified genome-wide sequence variation segregating in the pedigree and determined high quality genotypes for all individuals. In doing so, we employed a combination of single-locus genotype callers (GATK/SAMtools) followed by TrioCaller, which refines genotype data from single-site callers and imputes missing genotype data based on haplotype information and trio constraints.

This link contains the genotype information for a 268K SNP Mapping Set in 705 VRC monkeys whose genomes have so far passed QC steps. These SNPs represent Version 1 of an initial mapping panel for genome-wide investigations and are also suitable for initial investigations of targeted genome regions. We will update this mapping panel once the vervet reference assembly with annotation is released.

Disclaimer: the data is not yet final and may contain Mendelian errors.
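
Because the data may contain Mendelian errors, users of the genotype files may want a simple consistency check on parent-offspring trios. The sketch below is not the project's QC pipeline and does not reproduce TrioCaller. It assumes autosomal biallelic SNPs with genotypes coded as the number of alternate alleles (0, 1 or 2), and all function names are hypothetical.

```python
def transmissible(genotype):
    """Alleles (0 = ref, 1 = alt) that a parent with this alt-allele count can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(father, mother, child):
    """True if the child's genotype cannot arise from one allele of each parent."""
    possible = {a + b for a in transmissible(father) for b in transmissible(mother)}
    return child not in possible


def count_mendelian_errors(trios):
    """Count inconsistent (father, mother, child) genotype triples at one site."""
    return sum(is_mendelian_error(f, m, c) for f, m, c in trios)


# Hypothetical genotypes for four trios at a single SNP.
trios = [(0, 0, 1), (0, 2, 1), (1, 1, 2), (2, 2, 1)]
print(count_mendelian_errors(trios))
```

Sites or animals with many such inconsistencies are natural candidates for exclusion or closer inspection before mapping.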

Description of the work flow
Readme file
To download the full pedigree (csv file)
To download the genotyping data (build (gzip file)
To download the genotyping data (build 5.0.1) (gzip file)

Update Dec 2014 - 500K and 148K SNP Mapping Sets and sequence-based genotype data from the VRC

Using the WGS dataset described above, we further developed genetic maps and genome-wide genotype data for vervets from the VRC pedigree. Using the variant discovery and genotyping pipeline shown in the work flow figure, we created two SNP mapping sets and the corresponding genotype data from 722 VRC vervets. These are publicly available at EMBL-EBI and can be queried directly via the EVA at EBI:

Association Mapping SNP Set consisting of 497,163 SNPs on the 29 vervet autosomes. In this set of ~500K SNPs, there were an average of 198 SNPs per Mb of vervet sequence, and the largest gap size between adjacent SNPs was 5Kb.

Linkage Mapping SNP Set consisting of 147,967 markers. In this set of ~148K SNPs, there were an average of 58.2 SNPs per Mb of vervet sequence, and the average gap size between adjacent SNPs was 17.5Kb.
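
The density and gap statistics quoted for the two sets can be recomputed from SNP positions. The following is a minimal sketch under stated assumptions: positions are base-pair coordinates on a single chromosome, and the function names and toy positions are hypothetical rather than taken from the released files.

```python
def snp_density_per_mb(n_snps, covered_bp):
    """Number of SNPs per megabase of sequence."""
    return n_snps / (covered_bp / 1_000_000)


def adjacent_gaps(positions):
    """Distances in bp between neighbouring SNPs after sorting by position."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def mean_spacing_bp(density_per_mb):
    """Mean spacing implied by a density, assuming evenly spread markers."""
    return 1_000_000 / density_per_mb


# Toy positions on one chromosome (hypothetical).
positions = [30_000, 1_000, 9_500, 6_000]
gaps = adjacent_gaps(positions)
print(gaps, max(gaps), sum(gaps) / len(gaps))
```

Under the simplifying assumption of evenly spread markers, `mean_spacing_bp(58.2)` gives about 17.2 kb, in the same range as the 17.5Kb average gap quoted for the Linkage Mapping SNP Set.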

Transcriptome (NIH)

A variety of resources for analysis of the vervet transcriptome are either already available (and searchable from this site), under development, or in planning.
Currently available microarray resources developed from the Vervet Research Colony are based on widely utilized platforms (Affymetrix and Illumina human arrays) and permit analysis of expression variability across tissues (including several brain regions) and between individuals.

Affymetrix expression resource

Illumina expression resource

The Illumina gene expression resource uses the HumanRef-8 v2 chip, which includes 22,184 probes representing 18,189 unique human genes (or 20,424 unique transcripts from the Reference Sequence (RefSeq) database, Release 17). It consists of two sample sets:

1. Brain-Blood dataset comprising blood and 8 brain regions (cerebellar vermis, pulvinar, head of caudate, hippocampus, frontal pole, dorsolateral prefrontal cortex, orbital frontal cortex, and occipital pole) from 12 males. This dataset has been deposited in GEO (GSE15301) and is available here:
Jasinska AJ, Service S, Choi OW, Deyoung J, Grujic O, Kong SY, Jorgensen MJ, Bailey J, Breidenthal S, Fairbanks LA, Woods RP, Jentsch JD, Freimer NB. Identification of Brain Transcriptional Variation Reproduced in Peripheral Blood: an Approach for Mapping Brain Expression Traits.
Hum Mol Genet. 2009 Aug 19. [Epub ahead of print] PubMed PMID: 19692348

2. Biological Replicate dataset consisting of duplicate samples from 18 individuals in the VRC.
The Illumina gene expression resource currently allows searching by gene or probe for expression in different tissues in the Brain-Blood dataset and for expression reproducibility in blood in the Biological Replicate dataset.

This resource was created to compare brain and peripheral blood expression and to account for environmental effects and random expression signal changes due to technical factors.
Among 22,184 Illumina probes, 6,550 showed a Spearman correlation of > 55% between expression levels in peripheral blood and one or more brain regions. A total of 8,025 probes showed a percent of variability attributable to the inter-individual component (PV) > 55%. Such probes, where most of the variability is attributable to the between-monkey component, show more inter-monkey variability than intra-monkey variability and therefore vary less between brain and blood tissue than between monkeys.
In interpreting these results, it should be noted that, in the absence of sequence data, some of the apparent vervet-vervet variation in both the periphery and in the brain may be due to sequence-related variations in probe affinity across animals rather than true gene expression differences. The replicate data serve to identify genes with peripheral expression levels that are sensitive to temporal, environmental or stress-related variation. Characterizing within-vervet variations in gene expression is important in generating a more complete framework within which to interpret between-vervet variation.
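
The two statistics discussed above can be illustrated with a small example. This is a sketch only: the Spearman correlation is computed in the standard way (Pearson correlation of ranks), but the page does not state how PV was estimated, so the one-way variance-components estimate used here for a balanced design is an assumption, and all function names and values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def percent_between(groups):
    """Percent of variance between individuals (assumed balanced one-way design).

    groups holds one list of measurements per monkey, all of the same length k.
    """
    k, n = len(groups[0]), len(groups)
    grand = mean(v for g in groups for v in g)
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    var_between = max((msb - msw) / k, 0.0)
    total = var_between + msw
    return 100 * var_between / total if total else 0.0


# Hypothetical expression values for one probe in blood and one brain region.
blood = [7.1, 8.3, 6.9, 9.0]
brain = [5.2, 6.1, 5.0, 6.8]
print(spearman(blood, brain))
print(percent_between([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]]))
```

A probe whose measurements differ between monkeys but agree within each monkey gives a PV near 100, while a probe whose measurements vary only within monkeys gives a PV near 0.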
To the Illumina gene expression resource
Download Illumina summary statistics (excel spreadsheet)

Update -   Microarray gene expression probes suitable for studies of vervet transcriptome

To facilitate gene expression studies in the vervet using existing microarray reagents, we used the vervet reference assembly and genetic variation data from the Vervet Research Colony (VRC) to characterize probes on the Illumina Human Ref-8 v2 and Affymetrix platforms, which were used previously for gene expression studies in the VRC, as well as on the Illumina HumanHTv12, an updated chip which will be used for future vervet expression studies.
To determine probe specificity, complementarity to the vervet transcripts, and sensitivity unbiased by any intrinsic sequence variants, we aligned sequences of probes originally designed to target known human transcripts to the vervet genomic assembly and characterized the probes with respect to number of hits in the vervet genome, presence of insertion-deletions, and location of known vervet SNPs.
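
The three probe properties described above (number of genomic hits, presence of insertion-deletions, and overlap with known SNPs) can be summarized per probe as in the following sketch. It is illustrative only: the `Hit` record, the function name, and the rule used to flag a probe as high quality (exactly one hit, no indel, no overlapping SNP) are assumptions, not the criteria used to build the released probe files.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a probe to the genome (hypothetical record layout)."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    has_indel: bool


def characterize_probe(hits, snps_by_chrom):
    """Summarize a probe from its alignment hits and known SNP positions."""
    n_hits = len(hits)
    any_indel = any(h.has_indel for h in hits)
    snps_in_probe = sum(
        1
        for h in hits
        for pos in snps_by_chrom.get(h.chrom, ())
        if h.start <= pos <= h.end
    )
    return {
        "n_hits": n_hits,
        "has_indel": any_indel,
        "snps_in_probe": snps_in_probe,
        # Assumed rule: unique, gap-free alignment with no SNP under the probe.
        "high_quality": n_hits == 1 and not any_indel and snps_in_probe == 0,
    }


# Hypothetical probe with a single 50 bp hit and two SNPs outside it.
hits = [Hit("chr1", 100, 149, False)]
snps = {"chr1": [50, 200]}
print(characterize_probe(hits, snps))
```

A probe with several hits may cross-hybridize, and a probe overlapping an indel or SNP may show affinity differences between animals that are unrelated to expression.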

These links contain information that will enable investigators to select high quality probes for evaluation of gene expression in the vervet.

Readme file
Illumina probe info (build
HumanHT probe info (build
Affy probe info (build
Illumina probe info (build 5.0.1)
HumanHT probe info (build 5.0.1)
Affy probe info (build 5.0.1)

Funding agencies

Hosted by the McGill University and Genome Quebec Innovation Centre