Gaston
Gaston is an R package for genetic data analysis: data manipulation (quality control), computation of the genetic relationship matrix (GRM), heritability estimates, and association testing.
Manual
THIS SITE IS UNDER CONSTRUCTION
Reading and writing data
Data manipulation
Mixed models
Association testing
Data sets
AGT: AGT data set
Description
These data have been extracted from the 1000 Genomes data. The data set contains the genotype matrix AGT.gen, the pedigree data frame AGT.fam, and the SNP data frame AGT.bim, corresponding to 503 individuals of European populations and 361 SNPs on chromosome 1, on a ~100 kb segment containing the angiotensinogen gene. There is also a factor AGT.pop, which gives the population from which each individual is drawn (CEU = Utah residents of Northern and Western European ancestry, FIN = Finnish, GBR = England and Scotland, IBS = Iberian, TSI = Toscani).
Usage
data(AGT)
Format
There are four data objects in the dataset:
- AGT.gen: Genotype matrix
- AGT.fam: Data frame containing all variables corresponding to a .fam file
- AGT.bim: Data frame containing all variables corresponding to a .bim file
- AGT.pop: Factor giving the population from which each individual is drawn
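As a quick sanity check of the objects listed above, the following sketch loads the data and verifies that their dimensions agree with one another. It assumes that the gaston package is installed and that the genotype matrix stores individuals in rows and SNPs in columns, which is the layout as.bed.matrix expects.

```r
library(gaston)
data(AGT)

# Dimensions of the genotype matrix
# (assumed layout: individuals in rows, SNPs in columns)
dim(AGT.gen)

# The .fam-like table should have one row per individual,
# the .bim-like table one row per SNP,
# and the population factor one entry per individual
stopifnot(nrow(AGT.fam) == nrow(AGT.gen),
          nrow(AGT.bim) == ncol(AGT.gen),
          length(AGT.pop) == nrow(AGT.gen))

# Number of individuals drawn from each population
table(AGT.pop)
```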
Source
The data were obtained from the 1000 Genomes project (see https://www.internationalgenome.org/).
References
McVean et al, 2012, An integrated map of genetic variation from 1,092 human genomes, Nature 491, 56-65 doi:10.1038/nature11632
Examples
data(AGT)
x <- as.bed.matrix(AGT.gen, AGT.fam, AGT.bim)
x
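Once the bed matrix is built, a typical next step is a simple quality-control filter. The sketch below is only an illustration: the 5% minor allele frequency cutoff is an arbitrary choice, and it assumes gaston's default behaviour of computing per-SNP statistics (including the maf column) when the bed matrix is created.

```r
library(gaston)
data(AGT)
x <- as.bed.matrix(AGT.gen, AGT.fam, AGT.bim)

# Per-SNP summary statistics are stored in the @snps slot
# (assumes the default option that computes them automatically)
head(x@snps[, c("id", "pos", "callrate", "maf")])

# Keep only SNPs with minor allele frequency above an illustrative 5% cutoff
x.common <- select.snps(x, maf > 0.05)
dim(x.common)
```

The individuals are left unchanged by this filter; only the set of SNPs is reduced.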
LCT: LCT data set
Description
These data have been extracted from the 1000 Genomes data. The data set contains the genotype matrix LCT.gen, the pedigree data frame LCT.fam, and the SNP data frame LCT.bim, corresponding to 503 individuals of European populations and 607 SNPs on chromosome 2, on a ~300 kb segment containing the lactase gene. There is also a factor LCT.pop, which gives the population from which each individual is drawn (CEU = Utah residents of Northern and Western European ancestry, FIN = Finnish, GBR = England and Scotland, IBS = Iberian, TSI = Toscani).
Note that the SNP rs4988235 is associated with lactase persistence / lactose intolerance.
Usage
data(LCT)
Format
There are four data objects in the dataset:
- LCT.gen: Genotype matrix
- LCT.fam: Data frame containing all variables corresponding to a .fam file
- LCT.bim: Data frame containing all variables corresponding to a .bim file
- LCT.pop: Factor giving the population from which each individual is drawn
Source
The data were obtained from the 1000 Genomes project (see https://www.internationalgenome.org/).
References
McVean et al, 2012, An integrated map of genetic variation from 1,092 human genomes, Nature 491, 56-65 doi:10.1038/nature11632
Examples
data(LCT)
x <- as.bed.matrix(LCT.gen, LCT.fam, LCT.bim)
x
which(x@snps$id == "rs4988235")
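Building on the lookup of rs4988235, the following sketch compares genotypes at that SNP across the five populations. It assumes that the columns of LCT.gen are in the same order as the rows of LCT.bim, and that genotypes are coded 0/1/2 as counts of one allele (taken here to be A2, following gaston's usual convention); the variable names i, g and freq are introduced only for illustration.

```r
library(gaston)
data(LCT)
x <- as.bed.matrix(LCT.gen, LCT.fam, LCT.bim)

# Index of the lactase persistence SNP
i <- which(x@snps$id == "rs4988235")
x@snps[i, c("chr", "id", "pos", "A1", "A2", "maf")]

# Genotypes at this SNP, taken from the plain genotype matrix
# (assumes column i of LCT.gen corresponds to row i of LCT.bim)
g <- LCT.gen[, i]

# Genotype counts per population
table(LCT.pop, g, useNA = "ifany")

# Frequency of the counted allele in each population
# (mean genotype divided by 2, ignoring missing values)
freq <- tapply(g, LCT.pop, function(v) mean(v, na.rm = TRUE) / 2)
freq
```

Differences in freq between populations give a first impression of how the frequency of this variant varies across Europe, before any formal testing.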