Title: | Finding Associations in Position-Wise Aligned DNA Sequence Dataset |
---|---|
Description: | Can be useful for finding associations among different positions in a position-wise aligned sequence dataset. The approach adopted for finding associations among positions is based on the latent multivariate normal distribution. |
Authors: | Prabina Kumar Meher & A. R. Rao |
Maintainer: | Prabina Kumar Meher <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.1 |
Built: | 2025-02-28 03:30:38 UTC |
Source: | https://github.com/cran/corrDNA |
All six association matrices can be merged into a single matrix to visualize the overall association among positions, as well as among the occurrences of nucleotides at different positions, in a position-wise aligned sequence dataset.
assoc_comb(x, rZiZj, rZiZjR, rZiZjY, rZiRZjR, rZiRZjY, rZiYZjY)
x | A data frame containing a position-wise aligned sequence dataset with A, T, G and C only. |
rZiZj | An object generated by the function assoc_Zi.Zj. |
rZiZjR | An object generated by the function assoc_Zi.ZjR. |
rZiZjY | An object generated by the function assoc_Zi.ZjY. |
rZiRZjR | An object generated by the function assoc_ZiR.ZjR. |
rZiRZjY | An object generated by the function assoc_ZiR.ZjY. |
rZiYZjY | An object generated by the function assoc_ZiY.ZjY. |
All six association matrices must be generated before they are merged into a single matrix.
A numeric matrix of order 3L by 3L for a dataset of sequences that are L nucleotides long.
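To illustrate how six L by L matrices can fill a 3L by 3L matrix, the sketch below tiles placeholder blocks with base R. The block ordering (Z, then ZR, then ZY), the use of transposes for the lower triangle, and the helper name combine_blocks are assumptions made for illustration; the layout that assoc_comb actually uses is not described here.

```r
# Hypothetical helper: tile six L x L blocks into one 3L x 3L matrix.
# The block order (Z, ZR, ZY) is an assumption, not the documented
# layout of assoc_comb.
combine_blocks <- function(zz, zzr, zzy, rr, ry, yy) {
  rbind(
    cbind(zz,     zzr,   zzy),
    cbind(t(zzr), rr,    ry),
    cbind(t(zzy), t(ry), yy)
  )
}

L <- 4  # toy sequence length

# Placeholder blocks filled with a constant so each tile is easy to spot.
blk <- function(v) matrix(v, nrow = L, ncol = L)

combined <- combine_blocks(blk(1), blk(2), blk(3), blk(4), blk(5), blk(6))
dim(combined)
```

Each off-diagonal block appears once as given and once transposed, so the combined matrix is symmetric whenever the three diagonal blocks are.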
Prabina Kumar Meher & A. R. Rao
    library(corrDNA)
    data(don_dat)
    kk <- don_dat[1:300, ]  # use the first 300 sequences

    # Generate the six association matrices in dependency order.
    zizj   <- assoc_Zi.Zj(x = kk)
    zizjr  <- assoc_Zi.ZjR(x = kk, rZiZj = zizj)
    zizjy  <- assoc_Zi.ZjY(x = kk, rZiZj = zizj)
    zirzjr <- assoc_ZiR.ZjR(x = kk, rZiZj = zizj, rZiZjR = zizjr)
    zirzjy <- assoc_ZiR.ZjY(x = kk, rZiZj = zizj, rZiZjR = zizjr, rZiZjY = zizjy)
    ziyzjy <- assoc_ZiY.ZjY(x = kk, rZiZj = zizj, rZiZjY = zizjy)

    # Merge them into a single matrix.
    fin_corr <- assoc_comb(x = kk, rZiZj = zizj, rZiZjR = zizjr, rZiZjY = zizjy,
                           rZiRZjR = zirzjr, rZiRZjY = zirzjy, rZiYZjY = ziyzjy)
    fin_corr
Finds the association between the variables $Z_i$ and $Z_j$ at the $i$-th and $j$-th positions.
In any position-wise aligned sequence dataset, the occurrences of R = (A, G) and Y = (C, T) at each position can be explained by a standard normal variate Z cut at a certain threshold value.
The association between any two positions in the dataset can therefore be obtained as the association between the two standard normal variates at those positions.
However, the two normal variates representing the occurrences of R and Y are independent of each other at a given position.
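The latent-variable idea can be sketched in base R: a binary R/Y indicator is treated as a standard normal variate cut at a threshold, so the threshold follows from the observed proportion through qnorm. The simulated data, the cut at zero, and all variable names are illustrative assumptions; this does not reproduce the estimation procedure used by assoc_Zi.Zj.

```r
set.seed(1)
n   <- 5000
rho <- 0.6  # assumed latent correlation between two positions

# Draw two correlated standard normal variates (latent Z_i and Z_j).
zi <- rnorm(n)
zj <- rho * zi + sqrt(1 - rho^2) * rnorm(n)

# Observe only the purine/pyrimidine class: "R" if the latent value is
# at or below the threshold (0 here), "Y" otherwise.
ri <- ifelse(zi <= 0, "R", "Y")
rj <- ifelse(zj <= 0, "R", "Y")

# Recover each threshold from the observed proportion of R.
thr_i <- qnorm(mean(ri == "R"))
thr_j <- qnorm(mean(rj == "R"))

# The 2 x 2 table of classes is what an estimator of the latent
# correlation would work from.
tab <- table(ri, rj)
c(thr_i = thr_i, thr_j = thr_j)
tab
```

Because the latent variates are positively correlated, the concordant cells (R with R, Y with Y) of the table hold more counts than the discordant ones.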
assoc_Zi.Zj(x)
x | A data frame containing a position-wise aligned sequence dataset with A, T, G and C only. |
The user must supply the sequence dataset in tab-delimited format, not in FASTA format. Each sequence (row) should contain only the standard nucleotides (A, T, G and C), and all sequences should be of the same length.
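As a sketch of preparing such input, the code below reads a small tab-delimited table and checks that it contains only standard nucleotides. It assumes one column per aligned position and no header row, and the helper name check_aligned is hypothetical; adjust the read.table arguments to the actual file.

```r
# Toy tab-delimited input: one sequence per row, one nucleotide per column
# (this column-per-position layout is an assumption).
txt <- "A\tC\tG\tT\nG\tG\tT\tA\nT\tA\tC\tC"

# colClasses = "character" keeps a column of "T" values from being read
# as logical TRUE. With the default fill = FALSE, read.table stops if
# rows have differing numbers of fields, which enforces equal length.
seqs <- read.table(text = txt, sep = "\t", header = FALSE,
                   colClasses = "character", stringsAsFactors = FALSE)

# Hypothetical helper: TRUE only if every cell is a standard nucleotide.
check_aligned <- function(df) {
  vals <- unlist(df, use.names = FALSE)
  !anyNA(vals) && all(vals %in% c("A", "T", "G", "C"))
}

check_aligned(seqs)
dim(seqs)  # number of sequences by number of positions
```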
A numeric matrix of order L by L for a dataset of sequences that are L nucleotides long.
Prabina Kumar Meher & A. R. Rao
    library(corrDNA)
    data(don_dat)
    kk <- don_dat[1:300, ]  # use the first 300 sequences
    zizj <- assoc_Zi.Zj(x = kk)
    zizj
Finds the association between the variable $Z$ at the $i$-th position and the variable $Z^R$ at the $j$-th position.
Here, the standard normal variable $Z$ represents the occurrence of R = (A, G) and Y = (C, T) at each position in the position-wise aligned dataset, whereas the standard normal variable $Z^R$ represents the occurrences of nucleotides A and G at any position, based on some threshold value.
assoc_Zi.ZjR(x, rZiZj)
x | A data frame containing a position-wise aligned sequence dataset with A, T, G and C only. |
rZiZj | An object generated by the function assoc_Zi.Zj. |
The user must supply the input dataset as well as the output generated by the function assoc_Zi.Zj.
A numeric matrix of order L by L for a dataset of sequences that are L nucleotides long.
Convergence may not be reached within a certain number of iterations, in which case no output is produced. In that situation, the user is advised to exclude or include some positions, or alternatively to include or exclude certain sequences. The user should try both options until convergence is reached.
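One way to organise that trial and error is a small wrapper that tries candidate subsets in turn and returns the first result that succeeds. The helper first_converging and the stand-in estimator toy_fit are hypothetical; the sketch assumes that a failed fit either raises an error or returns NULL, which is not confirmed for the package functions.

```r
# Hypothetical helper: apply `fit` to each candidate subset of rows and
# return the first result that is neither an error nor NULL.
first_converging <- function(x, row_sets, fit) {
  for (rows in row_sets) {
    res <- tryCatch(fit(x[rows, , drop = FALSE]), error = function(e) NULL)
    if (!is.null(res)) return(list(rows = rows, result = res))
  }
  NULL  # no candidate subset worked
}

# Stand-in estimator that "fails" on fewer than 5 rows (illustration only).
toy_fit <- function(d) {
  if (nrow(d) < 5) stop("did not converge")
  colMeans(d)
}

dat <- data.frame(a = 1:10, b = 11:20)
out <- first_converging(dat, list(1:3, 1:6, 1:10), toy_fit)
out$rows
```

With the package, fit could be a function such as function(d) assoc_Zi.ZjR(x = d, rZiZj = assoc_Zi.Zj(x = d)); the same pattern applies to column (position) subsets by indexing columns instead of rows.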
Prabina Kumar Meher & A. R. Rao
    library(corrDNA)
    data(don_dat)
    kk <- don_dat[1:300, ]  # use the first 300 sequences
    zizj  <- assoc_Zi.Zj(x = kk)
    zizjr <- assoc_Zi.ZjR(x = kk, rZiZj = zizj)  # needs the Zi.Zj output
    zizjr
Finds the association between the variable $Z$ at the $i$-th position and the variable $Z^Y$ at the $j$-th position.
Here, the standard normal variable $Z$ represents the occurrence of R = (A, G) and Y = (C, T) at each position in the position-wise aligned dataset, whereas the standard normal variable $Z^Y$ represents the occurrences of nucleotides C and T at any position, based on some threshold values.
assoc_Zi.ZjY(x, rZiZj)
x | A data frame containing a position-wise aligned sequence dataset with A, T, G and C only. |
rZiZj | An object generated by the function assoc_Zi.Zj. |
The user must supply the input dataset as well as the output generated by the function assoc_Zi.Zj.
A numeric matrix of order L by L for a dataset of sequences that are L nucleotides long.
Convergence may not be reached within a certain number of iterations, in which case no output is produced. In that situation, the user is advised to exclude or include some positions, or alternatively to include or exclude certain sequences. The user should try both options until convergence is reached.
Prabina Kumar Meher & A. R. Rao
    library(corrDNA)
    data(don_dat)
    kk <- don_dat[1:300, ]  # use the first 300 sequences
    zizj  <- assoc_Zi.Zj(x = kk)
    zizjy <- assoc_Zi.ZjY(x = kk, rZiZj = zizj)  # needs the Zi.Zj output
    zizjy
Finds the association between the variable $Z^R$ at the $i$-th position and the variable $Z^R$ at the $j$-th position.
Here, the standard normal variable $Z^R$ represents the occurrences of nucleotides A and G at any position, based on some threshold value.
assoc_ZiR.ZjR(x, rZiZj, rZiZjR)
x | A data frame containing a position-wise aligned sequence dataset with A, T, G and C only. |
rZiZj | An object generated by the function assoc_Zi.Zj. |
rZiZjR | An object generated by the function assoc_Zi.ZjR. |
The user must supply the input dataset as well as the outputs generated by the functions assoc_Zi.Zj and assoc_Zi.ZjR.
A numeric matrix of order L by L for a dataset of sequences that are L nucleotides long.
Convergence may not be reached within a certain number of iterations, in which case no output is produced. In that situation, the user is advised to exclude or include some positions, or alternatively to include or exclude certain sequences. The user should try both options until convergence is reached.
Prabina Kumar Meher & A. R. Rao
    library(corrDNA)
    data(don_dat)
    kk <- don_dat[1:300, ]  # use the first 300 sequences
    zizj   <- assoc_Zi.Zj(x = kk)
    zizjr  <- assoc_Zi.ZjR(x = kk, rZiZj = zizj)
    zirzjr <- assoc_ZiR.ZjR(x = kk, rZiZj = zizj, rZiZjR = zizjr)
    zirzjr
Finds the association between the variable $Z^R$ at the $i$-th position and the variable $Z^Y$ at the $j$-th position.
Here, the standard normal variable $Z^Y$ represents the occurrences of C and T at each position in the position-wise aligned dataset, and the standard normal variable $Z^R$ represents the occurrences of nucleotides A and G at any position, based on some threshold values.
assoc_ZiR.ZjY(x, rZiZj, rZiZjR, rZiZjY)
x | A data frame containing a position-wise aligned sequence dataset with A, T, G and C only. |
rZiZj | An object generated by the function assoc_Zi.Zj. |
rZiZjR | An object generated by the function assoc_Zi.ZjR. |
rZiZjY | An object generated by the function assoc_Zi.ZjY. |
The user must supply the input dataset as well as the outputs generated by the functions assoc_Zi.Zj, assoc_Zi.ZjR and assoc_Zi.ZjY.
A numeric matrix of order L by L for a dataset of sequences that are L nucleotides long.
Convergence may not be reached within a certain number of iterations, in which case no output is produced. In that situation, the user is advised to exclude or include some positions, or alternatively to include or exclude certain sequences. The user should try both options until convergence is reached.
Prabina Kumar Meher & A. R. Rao
    library(corrDNA)
    data(don_dat)
    kk <- don_dat[1:300, ]  # use the first 300 sequences
    zizj   <- assoc_Zi.Zj(x = kk)
    zizjr  <- assoc_Zi.ZjR(x = kk, rZiZj = zizj)
    zizjy  <- assoc_Zi.ZjY(x = kk, rZiZj = zizj)
    zirzjy <- assoc_ZiR.ZjY(x = kk, rZiZj = zizj, rZiZjR = zizjr, rZiZjY = zizjy)
    zirzjy
Finds the association between the variable $Z^Y$ at the $i$-th position and the variable $Z^Y$ at the $j$-th position.
Here, the standard normal variable $Z^Y$ represents the occurrences of nucleotides C and T at any position, based on some threshold values.
assoc_ZiY.ZjY(x, rZiZj, rZiZjY)
x | A data frame containing a position-wise aligned sequence dataset with A, T, G and C only. |
rZiZj | An object generated by the function assoc_Zi.Zj. |
rZiZjY | An object generated by the function assoc_Zi.ZjY. |
The user must supply the input dataset as well as the outputs generated by the functions assoc_Zi.Zj and assoc_Zi.ZjY.
A numeric matrix of order L by L for a dataset of sequences that are L nucleotides long.
Convergence may not be reached within a certain number of iterations, in which case no output is produced. In that situation, the user is advised to exclude or include some positions, or alternatively to include or exclude certain sequences. The user should try both options until convergence is reached.
Prabina Kumar Meher & A. R. Rao
    library(corrDNA)
    data(don_dat)
    kk <- don_dat[1:300, ]  # use the first 300 sequences
    zizj   <- assoc_Zi.Zj(x = kk)
    zizjy  <- assoc_Zi.ZjY(x = kk, rZiZj = zizj)
    ziyzjy <- assoc_ZiY.ZjY(x = kk, rZiZj = zizj, rZiZjY = zizjy)
    ziyzjy
This dataset comprises 1000 donor splice site sequences, each of length 20: 10 nucleotides at the exon end and 10 at the intron start, excluding the conserved dinucleotide GT at the beginning of the intron. The sequences were taken at random from the true donor splice sites of the HS3D dataset.
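The 20 positions can be given readable labels, as in the base R sketch below. The numbering convention (exon positions -10 to -1, intron positions counted from +1 with the conserved GT at +1 and +2 left out) and the label format are assumptions for illustration; they are not stored in don_dat.

```r
# Assumed numbering: exon positions -10..-1, intron positions counted
# from +1, with the conserved GT at +1 and +2 excluded.
exon   <- -10:-1
intron <- 3:12

# Hypothetical labels such as "E-10" (exon) and "I+3" (intron).
pos_labels <- c(paste0("E", exon), paste0("I+", intron))
length(pos_labels)
pos_labels
```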
data(don_dat)
Pollastro P, Rampone S: HS3D: Homo Sapiens Splice Site Data Set. Nucleic Acids Res. 2003, Molecular Biology Database Collection entry number 36; Annual Database Issue.
data(don_dat)