rag_tutorial_rna_matrix | Schlick Group at NYU

Program Description

To understand and use this program, it helps to know a few simple concepts, each briefly explained below.

ct file:

The ct file contains data about the base pairs in an RNA secondary structure. The following is the structure of TRNA12 generated by mFold, together with its associated ct file:

RNA Secondary Structure

ct File

For the program to run properly, the ct file that you download from the web (Zuker's mFold) or generate yourself must follow exactly the format of the file above. The first line contains the total number of nucleotides in the structure, the energy of the fold, and the name of the file. The columns are as follows:

Column 1: Index i of each nucleotide, from 1 to N (N = total number of nucleotides).
Column 2: Type of nucleotide (A, G, U, or C).
Column 3: Index of the preceding nucleotide (i - 1), running from 0 to N - 1.
Column 4: Index of the following nucleotide (i + 1), running from 2 to N, with a 0 in the last row because the last nucleotide has no successor.
Column 5: Index of the nucleotide that nucleotide i is paired with. A 0 in this column indicates that nucleotide i is unpaired.
Column 6: A repeat of column 1.

The sample ct file displayed above is also available as it actually appears in file form: TRNA12
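To make the column layout concrete, here is a minimal Python sketch of a reader for this format. It is not the program's own parser: the function name `parse_ct` and the eight-nucleotide hairpin `TOY_CT` (including its energy value) are made up for illustration, and only the nucleotide count N is read from the header line.

```python
def parse_ct(text):
    """Read a ct file laid out as described above; return (sequence, pairs).

    pairs lists each base pair once as (i, j) with i < j (1-based indices).
    Assumption: only N is taken from the header; energy and name are ignored.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    n = int(lines[0].split()[0])             # header: N, energy, name
    rows = [ln.split() for ln in lines[1:n + 1]]
    if len(rows) != n:
        raise ValueError(f"expected {n} nucleotide rows, found {len(rows)}")
    seq, partner = [], {}
    for k, cols in enumerate(rows, start=1):
        i, prev, nxt, pair = int(cols[0]), int(cols[2]), int(cols[3]), int(cols[4])
        # Columns 1, 3 and 4 must hold i, i - 1 and i + 1 (0 after the last nucleotide).
        if i != k or prev != k - 1 or nxt != (k + 1 if k < n else 0):
            raise ValueError(f"unexpected index columns in row {k}: {cols}")
        seq.append(cols[1])                  # column 2: A, G, U or C
        partner[i] = pair                    # column 5: 0 means unpaired
    pairs = []
    for i, j in partner.items():
        if j and partner.get(j) != i:        # a pair must be listed from both sides
            raise ValueError(f"nucleotides {i} and {j} are not paired consistently")
        if j > i:
            pairs.append((i, j))
    return "".join(seq), pairs


# Hypothetical toy hairpin (not TRNA12): 1 pairs with 8, 2 pairs with 7.
TOY_CT = """\
8  ENERGY = -1.0  toy_hairpin
1 G 0 2 8 1
2 G 1 3 7 2
3 A 2 4 0 3
4 A 3 5 0 4
5 A 4 6 0 5
6 A 5 7 0 6
7 C 6 8 2 7
8 C 7 0 1 8
"""

seq, pairs = parse_ct(TOY_CT)
print(seq, pairs)  # GGAAAACC [(1, 8), (2, 7)]
```

Checking columns 3 and 4 against the row index is a cheap way to catch files that do not follow the required layout before any graph is built from the pairs.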

Laplacian Matrix:

The Laplacian matrix (L) is a mathematical representation of the connectivity between the vertices of an RNA graph or topology. It is built from two components: a diagonal matrix (D) and an adjacency matrix (A). The diagonal of D gives the number of connections each vertex makes with the other vertices. The adjacency matrix specifies which vertices each vertex is connected to. In a graphical representation of an RNA structure, any labeling of the vertices may be used. For example, the tRNA (NDB: TRNA12) structure shown above has the following tree graph, with the vertices labeled arbitrarily:

The corresponding D and A matrices are as follows:

D

4	0	0	0	0
0	1	0	0	0
0	0	1	0	0
0	0	0	1	0
0	0	0	0	1

A

0	1	1	1	1
1	0	0	0	0
1	0	0	0	0
1	0	0	0	0
1	0	0	0	0

Each row and column in the above matrices corresponds to one of the graph's vertices. Reading along the diagonal of D, vertex 1 is connected to 4 other vertices, vertex 2 is connected to 1 other vertex, and so on. The adjacency matrix specifies these connections explicitly: a 1 in element i,j means that vertices i and j are connected.
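Both matrices can be built directly from a list of the graph's connections. The following Python sketch assumes a simple undirected graph (each pair of vertices joined at most once) with vertices labeled 1 to n; the function name `degree_and_adjacency` is hypothetical. The edge list is read off the adjacency matrix above.

```python
def degree_and_adjacency(n, edges):
    """Build the degree matrix D and adjacency matrix A of a simple undirected graph.

    Vertices are labeled 1..n; edges is a list of (u, v) pairs.
    """
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1   # connections are symmetric, so mark (u, v) and (v, u)
        A[v - 1][u - 1] = 1
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        D[i][i] = sum(A[i])   # number of vertices that vertex i + 1 is connected to
    return D, A


# The example tree: vertex 1 is connected to vertices 2, 3, 4 and 5.
D, A = degree_and_adjacency(5, [(1, 2), (1, 3), (1, 4), (1, 5)])
print([D[i][i] for i in range(5)])  # [4, 1, 1, 1, 1]
```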
The Laplacian matrix is defined from D and A as follows:

L = D - A

For the example above, this gives:

L

4	-1	-1	-1	-1
-1	1	0	0	0
-1	0	1	0	0
-1	0	0	1	0
-1	0	0	0	1
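The subtraction L = D - A is carried out element by element. A minimal Python sketch, with D and A copied from the matrices above (the function name `laplacian` is only illustrative):

```python
# D and A for the example tree, copied from the matrices above.
D = [
    [4, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
A = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]


def laplacian(D, A):
    """Return L = D - A, computed element by element."""
    n = len(D)
    return [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]


L = laplacian(D, A)
for row in L:
    print("\t".join(str(x) for x in row))  # reproduces the L matrix shown above
```

Since D is zero off the diagonal and A is zero on it, the diagonal of L comes entirely from D and the off-diagonal entries entirely from -A.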

The Laplacian matrix is a square matrix. Each row and column in the above matrix corresponds to one vertex of the tree graph.
A value of -1 in matrix element i,j indicates that vertices i and j are connected. For example, reading across row 1 shows that vertex 1 is connected to vertices 2, 3, 4, and 5. Because the matrix is symmetric, reading down column 1 gives the same information.
Zeros indicate no connectivity between the corresponding vertices. For example, vertex 2 is not connected to vertex 4.
The diagonal elements of the Laplacian matrix are positive integers for any connected graph with more than one vertex. Each one represents the number of connections that the particular vertex makes. For example, vertex 1 is connected to 4 other vertices.
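The properties just listed can be turned into a quick sanity check on a Laplacian matrix. The sketch below is a hypothetical helper, not part of the program described here. It assumes a simple graph (no repeated connections), so every off-diagonal entry must be 0 or -1, and it returns the number of connections of each vertex.

```python
def laplacian_degrees(L):
    """Check that L looks like the Laplacian of a simple graph; return vertex degrees.

    Raises ValueError if L is not square, not symmetric, has an off-diagonal
    entry other than 0 or -1, or has a diagonal entry that differs from the
    number of -1 entries in its row.
    """
    n = len(L)
    if any(len(row) != n for row in L):
        raise ValueError("L must be square")
    degrees = []
    for i in range(n):
        for j in range(n):
            if L[i][j] != L[j][i]:
                raise ValueError(f"L is not symmetric at ({i + 1}, {j + 1})")
            if i != j and L[i][j] not in (0, -1):
                raise ValueError(f"entry ({i + 1}, {j + 1}) must be 0 or -1")
        connections = sum(1 for j in range(n) if j != i and L[i][j] == -1)
        if L[i][i] != connections:
            raise ValueError(f"diagonal entry {i + 1} should be {connections}")
        degrees.append(L[i][i])
    return degrees


L = [
    [4, -1, -1, -1, -1],
    [-1, 1, 0, 0, 0],
    [-1, 0, 1, 0, 0],
    [-1, 0, 0, 1, 0],
    [-1, 0, 0, 0, 1],
]
print(laplacian_degrees(L))  # [4, 1, 1, 1, 1]
```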

Laplacian Eigenvalues:

The eigenvalues of a graph are calculated from its Laplacian matrix, and a graph with N vertices has N Laplacian eigenvalues. Taking the eigenvalues in increasing order, the first one is always zero, and the eigenvalue that helps to describe the RNA topology is the second one. The second eigenvalue describes the compactness of a graph: its possible values begin at zero and increase from there, and the more compact a graph is, the higher its second eigenvalue.
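This page does not say how the program computes eigenvalues. As a rough, dependency-free sketch, the classical Jacobi rotation method (a swapped-in technique, not necessarily what the program uses) can compute the Laplacian eigenvalues of the example tree and, for comparison, of a hypothetical five-vertex chain. All function and variable names are illustrative.

```python
import math


def laplacian_from_edges(n, edges):
    """L = D - A for a simple undirected graph with vertices labeled 1..n."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        i, j = u - 1, v - 1
        L[i][j] -= 1.0          # -A: the two vertices are connected
        L[j][i] -= 1.0
        L[i][i] += 1.0          # +D: each endpoint gains one connection
        L[j][j] += 1.0
    return L


def symmetric_eigenvalues(M, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix, ascending, by cyclic Jacobi rotations."""
    a = [[float(x) for x in row] for row in M]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break                      # matrix is (numerically) diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the rotated (p, q) entry is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):     # update columns p and q (A <- A R)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):     # update rows p and q (A <- R^T A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def second_eigenvalue(n, edges):
    """Second-smallest Laplacian eigenvalue (lambda_2) of the graph."""
    return symmetric_eigenvalues(laplacian_from_edges(n, edges))[1]


STAR_EDGES = [(1, 2), (1, 3), (1, 4), (1, 5)]   # the example tree above
PATH_EDGES = [(1, 2), (2, 3), (3, 4), (4, 5)]   # hypothetical, less compact 5-vertex chain

print(round(second_eigenvalue(5, STAR_EDGES), 3))  # 1.0
print(round(second_eigenvalue(5, PATH_EDGES), 3))  # 0.382
```

The star-shaped example tree has Laplacian eigenvalues 0, 1, 1, 1, and 5, so its second eigenvalue is 1, while the more extended chain gives about 0.382, in line with the statement that more compact graphs have larger second eigenvalues. Where NumPy is available, `numpy.linalg.eigvalsh` would be the usual replacement for `symmetric_eigenvalues`.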