Winbugs code for Hierarchical Bayesian method for multilocus association analysis in general population sample
The code is written by:
Madhuchhanda Bhattacharjee (http://www.rni.helsinki.fi/~mab/)
Reference:
Sillanpää MJ, Bhattacharjee M (2005)
Bayesian association-based fine mapping in small chromosomal segments.
Genetics 169: 427-439.
Data:
Multi-allelic (or bi-allelic) markers, unknown phase, missing data (to some extent),
Continuous or binary phenotype
Small chromosomal segment spanned by a dense set of closely linked markers,
(alternately data from long segment can also be used and accordingly independent prior for marker indicators should be used)
Few model assumptions:
Covariance between consecutive loci accounted using genetic or physical map distances between markers
Dependent variable selection model for markers using joint prior distribution on indicators
Mapping assuming putative genes only at marker points
Winbugs code
model {
# Modelling missing data
for(j in 1 : m) {
for(k in 1:n2[j] ){
p1[ j, k ] <- 1/n2[j] # Uniform allele frequencies assumed for missing data augmentation
}
for(k in n2[j]+1:n2max ){
p1[ j, k ] <- 0 # To complete probability matrix in case number of alleles differ from marker to marker
}
}
for(i in 1 : n1) {
for(j in 1 : m ) {
y[i, 2*j-1 ] ~ dcat( p1[ j, 1:n2[j] ] ) # Model for first allele at locus j
y[i, 2*j ] ~ dcat( p1[ j, 1:n2[j] ] ) # Model for second allele at locus j
# y[i, j ] ~ dcat( p1[ j, 1:n2[j]] ) # Model for single allele data at locus j
}
}
# Model for the indicators
lambda ~ dgamma(1, 0.01) # Smoothing parameter
p[1] <- 1/m # Stringent prior belief on number of markers independently selected in the model
x[1] ~dbern( p[1] ) # Indicator for first marker
for(j in 2 : m ) {
p[ j ] <- 1/m # Stringent prior belief on number of markers independently selected in the model
q[j-1] <- 1/exp( lambda*d[j-1] ) # Smoothing parameter and distances used to generate probability of repeating previous indicator value
w[ j-1 ] ~dbern( q[ j-1 ] ) # LD indicator generated using distance information
x0[ j ] ~dbern( p[ j ] ) # Independent indicator
x[j] <- (1-w[j-1])*x0[j]+w[j-1]*x[j-1] # Mixture of independent indicator and previous indicator, using LD indicator
}
x1 <- sum( x[1:m] ) # Total number of markers selected in the model
for( j in 1 : m+1 ) {
x2[ j ] <- equals( x1, j-1 ) # To estimate distribution of number of markers selected in the model
}
# Modelling phenotype data
for(j in 1 : m ) {
tau[j] ~dgamma( 1, 1) # Precision parameter (i.e. 1/ variance) for random variance model
for( k in 1 : n2[j] ) {
beta[ j, k ] ~ dnorm( 0, tau[j] ) # Allelic coefficients for random variance model
# beta[ j, k ] ~ dnorm( 0, 1) # Allelic coefficients for fixed variance model
}
for( k in n2[j]+1:n2max ) {
beta[ j, k ] <- 0 # To complete coefficient matrix in case number of alleles differ from marker to marker
}
b1[ j ] <- x[ j ]/tau[ j ] # Weighted genetic variance for multiallelic data
# b1[ j ] <- x[j]/(abs(beta[j,1]-beta[j,2])) # Weighted absolute genetic effect for biallelic data
for( k in 1 : n2[j] ) {
b0[ j, k ] <- x[j]*beta[j, k] # Weighted allelic effects
}
}
mu0 ~dnorm( 0, 0.1 ) # Overall mean
# tau0 ~dgamma( 1, 1) # For continuous phenotype: Precision parameter corresponding to residual variance
for(i in 1 : n1) {
for(j in 1 : m ) {
beta1[i,j] <- x[j]*(beta[j,y[i,2*j-1]]+beta[j,y[i,2*j]]) # For two allele data
# beta1[i, j] <- x[j]*beta[ j, y[i, j] ] # For single allele data
}
logit(p2[i]) <- mu0+ sum( beta1[i, 1:m] ) # For binary phenotype, transformation using logit link function
z1[i] ~ dbern( p2[i] ) # Model for binary phenotype data
# p2[i] <- mu0+ sum( beta1[i, 1:m] ) # For continuous phenotype, variable selection model for mean
# z1[i] ~ dnorm( p2[i], tau0 ) # Model for continuous phenotype data
}
}
Input data:
n1: Number of individuals
m: Number of candidate markers
n2: Vector of number of alleles at each marker
n2max: A number larger than the maximum number of alleles at any locus
d: Vector of pair wise distances between loci indexed so that d[1] contains distance between markers 1 and 2.
z1: phenotypic data
y: genotype data