Gaston

Gaston is an R package for genetic data analysis: data manipulation (quality control), computation of genetic relationship matrices (GRM), heritability estimation, and association testing.

Gaston on CRAN

Gaston on GitHub

Manual

THIS SITE IS UNDER CONSTRUCTION

Reading and writing data

Data manipulation

Mixed models

Association testing

Data sets

AGT: AGT data set

Description

These data were extracted from the 1000 Genomes data. The data set contains the genotype matrix AGT.gen, the pedigree data frame AGT.fam, and the SNP data frame AGT.bim, covering 503 individuals from European populations and 361 SNPs on chromosome 1, in a ~100 kb segment containing the angiotensinogen gene. A factor AGT.pop gives the population from which each individual is drawn (CEU = Utah residents with Northern and Western European ancestry, FIN = Finnish, GBR = England and Scotland, IBS = Iberian, TSI = Toscani).

Usage

data(AGT)

Format

There are four data objects in the dataset:

AGT.gen

Genotype matrix

AGT.fam

Data frame containing all variables corresponding to a .fam file

AGT.bim

Data frame containing all variables corresponding to a .bim file

AGT.pop

Factor giving the population from which each individual is drawn
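A quick way to see how the four objects fit together is to compare their sizes. The sketch below assumes the gaston package is installed and that the genotype matrix has one row per individual and one column per SNP, which is the layout as.bed.matrix expects; the printed counts are not reproduced here.

```r
library(gaston)
data(AGT)

# Genotype matrix: assumed layout is individuals in rows, SNPs in columns
dim(AGT.gen)

# One row per individual in the .fam-style data frame
nrow(AGT.fam)

# One row per SNP in the .bim-style data frame
nrow(AGT.bim)

# Number of individuals drawn from each population
table(AGT.pop)
```

If the layout assumption holds, the first dimension of AGT.gen matches nrow(AGT.fam) and length(AGT.pop), and the second matches nrow(AGT.bim).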

Source

The data were obtained from the 1000 Genomes project (see https://www.internationalgenome.org/).

References

McVean et al, 2012, An integrated map of genetic variation from 1,092 human genomes, Nature 491, 56-65 doi:10.1038/nature11632

Examples

data(AGT)
x <- as.bed.matrix(AGT.gen, AGT.fam, AGT.bim)
x
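Once the bed.matrix is built, a typical next step is to inspect it and filter SNPs. This is a minimal sketch, assuming gaston's default behaviour of computing per-SNP statistics (such as maf) when the object is created; the 0.05 threshold and the name x_common are illustrative choices, not part of the data set documentation.

```r
library(gaston)
data(AGT)
x <- as.bed.matrix(AGT.gen, AGT.fam, AGT.bim)

# dim() reports the number of individuals, then the number of SNPs
dim(x)

# Per-SNP annotation (and statistics, if computed) live in the snps slot
head(x@snps)

# Keep only SNPs whose minor allele frequency exceeds 5 %
# (assumes the maf column is present in x@snps)
x_common <- select.snps(x, maf > 0.05)
dim(x_common)
```

Filtering on maf only drops SNPs, so x_common keeps every individual in x.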

LCT: LCT data set

Description

These data were extracted from the 1000 Genomes data. The data set contains the genotype matrix LCT.gen, the pedigree data frame LCT.fam, and the SNP data frame LCT.bim, covering 503 individuals from European populations and 607 SNPs on chromosome 2, in a ~300 kb segment containing the lactase gene. A factor LCT.pop gives the population from which each individual is drawn (CEU = Utah residents with Northern and Western European ancestry, FIN = Finnish, GBR = England and Scotland, IBS = Iberian, TSI = Toscani).

Note that the SNP rs4988235 is associated with lactase persistence / lactose intolerance.

Usage

data(LCT)

Format

There are four data objects in the dataset:

LCT.gen

Genotype matrix

LCT.fam

Data frame containing all variables corresponding to a .fam file

LCT.bim

Data frame containing all variables corresponding to a .bim file

LCT.pop

Factor giving the population from which each individual is drawn

Source

The data were obtained from the 1000 Genomes project (see https://www.internationalgenome.org/).

References

McVean et al, 2012, An integrated map of genetic variation from 1,092 human genomes, Nature 491, 56-65 doi:10.1038/nature11632

Examples

data(LCT)
x <- as.bed.matrix(LCT.gen, LCT.fam, LCT.bim)
x
which(x@snps$id == "rs4988235")
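The index returned above can be used to compare rs4988235 across the five populations. The sketch below assumes genotypes in LCT.gen are coded 0/1/2 as allele counts, so that half the mean genotype is the frequency of the counted allele; the names i, g, and freq are illustrative, and no output values are shown because they were not checked here.

```r
library(gaston)
data(LCT)
x <- as.bed.matrix(LCT.gen, LCT.fam, LCT.bim)

# Column index of the SNP of interest
i <- which(x@snps$id == "rs4988235")

# Genotypes at that SNP, one value per individual (assumed 0/1/2 coding)
g <- LCT.gen[, i]

# Frequency of the counted allele within each population
freq <- tapply(g, LCT.pop, function(v) mean(v, na.rm = TRUE) / 2)
freq
```

Working from the raw matrix keeps the example to base R; the same comparison could be made on the bed.matrix after attaching LCT.pop to the individual data.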

Vignette

THIS SITE IS UNDER CONSTRUCTION