Please send suggestions for improving this code and error reports to biomed@psc.edu. Sorry, but we cannot provide consulting support for this code at other sites. AJR will try to provide help with the installation of this code.
(*) PVM can be obtained via anonymous FTP from netlib2.cs.utk.edu. PVM can also be obtained through the World Wide Web (WWW) and electronic mail. WWW information is available from http://www.epm.ornl.gov/pvm/. Electronic mail information can be obtained by sending the message "help" to netlib@ornl.gov.
Although the Msearch code is designed to be portable, we have only thoroughly tested this release on the Cray T3D. Moderate testing has been performed on Cray multiprocessor systems (J90 and C90) and SGI INDY workstations. Future releases of this code will be tested on additional platforms.
References:
See the "FORMAT OF THE DATABASE FILE" section for more information.
(*) Support for the FASTA format is new in this release. We do not normally keep the databases in FASTA format, so this routine has not been thoroughly tested. We would appreciate receiving any bug reports and fixes that you may have for these routines. Please send the reports/fixes to biomed@psc.edu.
The "SEARCH" command is relatively well optimized, and usually scales linearly until the run time approaches the I/O time. Using additional processors will usually speed up this command. On the other hand, using an extremely low cutoff value will increase the time that this command takes to complete. This is because once the search is completed, the listing file is sorted on a single processor. Thus, the lower the cutoff value, the more items there are to sort.
The "ALIGN" command usually takes much longer to complete than the database search. The speed of this command is usually not improved dramatically by using additional processors. As with the search command, selecting a high cutoff value will cause this routine to run faster.
The gap command sets the gap length penalty. For example, a gap of length 4 will be charged (4 times this value) + NEWGAP.
The newgap command sets the gap opening penalty. For example, a gap of length 4 will be charged (4 times GAP) + this value.
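The arithmetic described for the gap and newgap commands can be sketched in Python as follows. This is only an illustration: the function name gap_charge is hypothetical and not part of Msearch, GAP = -8.0 is taken from the sample command file in this document, and NEWGAP = -12.0 is an arbitrary value chosen for the example.

```python
def gap_charge(length, gap, newgap):
    """Total charge for one gap: (length times GAP) + NEWGAP."""
    return length * gap + newgap

# A gap of length 4 with GAP = -8.0 and an illustrative NEWGAP = -12.0.
# (4 * -8.0) + -12.0 gives a total charge of -44.0.
print(gap_charge(4, -8.0, -12.0))
```

With NEWGAP set to zero, the charge depends only on the gap length.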
The scoring command sets the matrix used to compare the sequences.
Note that no DNA matrices are available in this release. DNA matrices will be added in a future version.
This command will let you label your output.
The cutoff command selects the alignment cutoff parameter. Only alignments scoring above this cutoff value will be produced.
The first form of the command indicates that the program should compute its own cutoff value. The computed value is 10% of the score that would be produced if the query sequence were compared with itself. This is the recommended mode.
The second form of the command allows the user to set the cutoff to be a certain percentage of the score that would be produced if the query sequence were compared with itself. For example, if one were dealing with long, multiple domain proteins, one might want to set this value to 5%. To set this value to 5% enter: "cutoff percent=0.05".
The third form of the command is usually not recommended. It sets the cutoff to a specific value. This powerful option should be used only with discretion.
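The relationship between the percentage and the resulting cutoff can be sketched in Python as below. This is a rough illustration under stated assumptions, not the Msearch implementation: it assumes the self-comparison score is simply the sum of each residue's self-match score, the SELF_SCORE values are placeholders rather than entries from any Msearch matrix, and the function names and the query "ACGW" are hypothetical.

```python
# Placeholder self-match scores (assumption: not taken from any Msearch matrix).
SELF_SCORE = {"A": 2.0, "C": 12.0, "G": 5.0, "W": 17.0}

def self_score(query):
    """Score of the query compared with itself, assuming an ungapped identity alignment."""
    return sum(SELF_SCORE[residue] for residue in query)

def cutoff_from_percent(query, percent=0.10):
    """Cutoff as a fraction of the self-comparison score (0.10 mirrors the first form)."""
    return percent * self_score(query)

query = "ACGW"  # hypothetical query sequence
print(self_score(query))                 # self-comparison score
print(cutoff_from_percent(query))        # first form: 10% of the self score
print(cutoff_from_percent(query, 0.05))  # second form: "cutoff percent=0.05"
```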
The number parameter selects the number of subalignments to be retrieved for EACH PAIR OF SEQUENCES. This parameter is particularly useful if one is using a query sequence that has repeats or multiple domains. This parameter is not meaningful when performing a database search.
Because the Msearch program is capable of translating DNA sequences into protein sequences, it is particularly important the user set these parameters properly.
Note that NUCLEIC-NUCLEIC comparisons are not possible in this version solely because appropriate nucleic acid matrices have not been added to the code yet.
This command tells the program which translation frames to use. The default is "000000" for both the library and the query. "000000" means do not translate these sequences.
Of course, if one were comparing a protein to a DNA sequence, translation would be necessary. The DNA sequence will always be the sequence that needs to be translated. Indicate the translation frames to use with a "1". If you are not interested in having a particular frame translated, use a "0":
XXXXXX
||||||__Third reverse complemented frame
|||||
|||||___Second reverse complemented frame
||||
||||____First reverse complemented frame
|||
|||_____Third forward frame
||
||______Second forward frame
|
|_______First forward frame
For example, "111000" means translate the forward frames one, two, and three.
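The six-character frame mask can be decoded with a few lines of Python. The sketch below only restates the diagram above (leftmost character is the first forward frame, rightmost is the third reverse complemented frame); the names FRAME_NAMES and selected_frames are hypothetical and not part of Msearch.

```python
# Frame order from left to right, as shown in the diagram.
FRAME_NAMES = [
    "first forward",
    "second forward",
    "third forward",
    "first reverse complemented",
    "second reverse complemented",
    "third reverse complemented",
]

def selected_frames(mask):
    """Return the names of the frames marked with '1' in a six-character mask."""
    if len(mask) != 6 or set(mask) - {"0", "1"}:
        raise ValueError("mask must be six characters, each 0 or 1")
    return [name for flag, name in zip(mask, FRAME_NAMES) if flag == "1"]

print(selected_frames("111000"))  # the three forward frames
print(selected_frames("000000"))  # empty list: do not translate
```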
This command will perform a database search via the method selected with the SEARCHMETHOD command. This command is very time intensive. Please see the performance considerations section of this document on how to improve the speed of this command.
This command will align sequences from a database search via the method selected with the ALIGNMETHOD command. You must first SEARCH, then ALIGN. Searches can be saved with the "SAVE" command; however, if the databases have been changed since the search was done, you will need to repeat the search.
This command is rarely entered by the user. It is used to indicate that no more sequences in a save file are to be aligned.
This command ends the program.
This sets the searching algorithm.
This sets the aligning algorithm.
This command will tell the program which database you want to search. Upon starting the program, all available databases and names are listed.
This command will have the program write an optional listing file when a database search is performed (via the "SEARCH" command). The listing file contains the sequence identifiers and definitions, sorted from highest to lowest score. Only those pairs that score higher than the cutoff value are reported. Below is a sample listing file:
734.00 PSRSAW x PSRSAW ...phospholipase A2 (EC 3.1.1.4) Western
712.00 PSRSAW x PSRSAE ...phospholipase A2 (EC 3.1.1.4) Eastern
567.00 PSRSAW x PSTV ...phospholipase A2 (EC 3.1.1.4) himehabu
548.00 PSRSAW x PSRSAT ...phospholipase A2 (EC 3.1.1.4) crotoxin
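A listing file of this shape is easy to post-process. The Python sketch below assumes each record occupies one line with the fields score, query identifier, the letter "x", library identifier and definition, separated by white space; that layout is inferred from the sample above rather than from a format specification, and parse_listing_line is a hypothetical helper.

```python
def parse_listing_line(line):
    """Split one listing record into (score, query_id, library_id, definition)."""
    score, query_id, separator, library_id, definition = line.split(None, 4)
    if separator != "x":
        raise ValueError("expected 'x' between the two identifiers")
    return float(score), query_id, library_id, definition

# Two records copied from the sample listing.
sample = """\
734.00 PSRSAW x PSRSAW ...phospholipase A2 (EC 3.1.1.4) Western
567.00 PSRSAW x PSTV ...phospholipase A2 (EC 3.1.1.4) himehabu
"""
records = [parse_listing_line(line) for line in sample.splitlines()]
for score, query_id, library_id, definition in records:
    print(score, library_id, definition)
```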
This command will tell the program what file the query sequences or the profile is in. Query sequences can be in the NBRF-PIR, GenBank, EMBL or Swiss-Prot file formats. The sequence profile must be in the GCG Wisconsin package profile file format.
This command will have the program write an alignment file when alignments are requested (via the "ALIGN" command). Below is a sample alignment produced by the program:
Alignment # 1 between PIR1:PSNJ2M and PSRSAW scored: 268.00
The query sequence (PSRSAW-PSRSAW) is 122 residues long.
...Usr$Temp:[Ropelewski]Psrsaw.Pir1;1 => PSRSAW
The library sequence (PSNJ2M-PSNJ2M) is 118 residues long.
...phospholipase A2 (EC 3.1.1.4) II - Mozambique cobra
1 * * * * * 58
PSRSAW => SLVQFETLIM.KIAGRSGLLW.YSAYGCYCGWGGHGLPQDATDRCCFVHDCCYGKA.T.DCN
:| || :| :::|: | :: ||||||:|| | : | |||| ||| ||| | :|
PSNJ2M => NLYQFKNMIHCTVPSRP..WWHFADYGCYCGRGGKGTAVDDLDRCCQVHDNCYGEAEKLGCW
1 * * * * * 60
59 * * * * 116
PSRSAW => PKTVSYTYSEENGEIIC.GGDDPCGTQICECDKAAAICFRDNIPSYDNKYWLFPPKDCR
| | | :| : | ||:: |:: :|:|| || || | :| : |:
PSNJ2M => PYLTLYKYECSQGKLTCSGGNNKCAAAVCNCDLVAANCF.AGARYIDANYNINLKERCQ
61 * * * * 118
The alignment contains:
121 pairs, 46 matches, 67 mismatches and 8 insertions/deletions.
This important command is used to tell the program where to store the intermediate search results. The intermediate results are stored in a binary format. THIS FILE MUST BE SPECIFIED BEFORE THE "SEARCH" COMMAND IS ISSUED!
Once PVM is installed on your system, you can then start the PVM daemon. There are a variety of ways that you can start the PVM daemon; instructions for starting the PVM daemon on a variety of platforms are listed below. If the method listed below does not work on your system, or if you desire a more secure method of starting PVM, please read the "starting the PVMD" section in the PVM manual.
Keep in mind that on some machines, such as the Cray T3D, users do not explicitly start the PVM daemon.
setenv MPP_NPES 16
Add the line
setenv PVM_ROOT /my/pvm/path/pvm3
to your .cshrc file on every computer that you will be using to run the Msearch program.
setenv NCPUS 4
pvm
pvm> quit
pvm
pvm> halt
Add the line
setenv PVM_ROOT /my/pvm/path/pvm3
to your .cshrc file on every computer that you will be using to run the Msearch program.
hostname [username]
Enter the IP address of every computer that you will be using (one per line), followed by your login name on that computer. For example, if your login name was joeuser and you wanted to run Msearch on three computers named fred, wilma and barney, your .rhosts file might look like:
fred joeuser
wilma joeuser
barney joeuser
Most computer systems require that permissions on the .rhosts file be set so that group and others do not have any permissions for this file (see man rhosts for more information on using .rhosts files).
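The permission requirement on the .rhosts file can be checked with a short Python script. This is a sketch for POSIX systems only; the helper name is_private is hypothetical, and the script merely reports the current state without changing any permissions.

```python
import os
import stat

def is_private(path):
    """True when group and others have no permissions on the file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

rhosts = os.path.expanduser("~/.rhosts")
if os.path.exists(rhosts):
    if is_private(rhosts):
        print(rhosts, "is accessible only by its owner")
    else:
        print(rhosts, "grants permissions to group or others")
```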
pvm
pvm> add wilma barney
You can check to see if all your hosts have been added by using the
conf command. For example:
pvm> conf
3 hosts, 1 data format
HOST DTID ARCH SPEED
slave 40000 SGI5 1000
ctc20 80000 SGI5 1000
ctc21 c0000 SGI5 1000
pvm> quit
pvm
pvm> halt
A quick way to check .rhosts files is to use the remsh command. Below is an example of how to check .rhosts files on computers named fred, wilma and barney:
remsh fred ls
remsh wilma ls
remsh barney ls
Some computers use the rsh command instead of remsh. If the commands listed above do not work, try:
rsh fred ls
rsh wilma ls
rsh barney ls
First, see if you can shut down the PVM daemons normally:
pvm
pvm> halt
Then, if that doesn't work, do the following on each machine that you ran PVM on:
- First, find out your UID by searching the /etc/passwd file for your login name. The UID is found between the second and third colons. For example, joeuser's UID is 2260:
grep -i joeuser /etc/passwd
joeuser:*:2260:708:Joe User:/usr/users/0/joeuser:/bin/csh
- Next, see if you have any pvm daemons on this machine. Find the PID, then kill the process. For example:
ps -elf | grep pvm
1 S joeuser 12872 1 26 34 263734 264 6430471 p068 0:00 pvmd3
kill -9 12872
The PID in the above example is the fourth entry on the line (12872). On some computers you should use ps -aux instead. In the example below, the PID is the second entry on the line:
ps -aux | grep pvm
joeuser 3836 0.0 1.9 404 360 p0 S 0:00 pvmd3
kill -9 3836
- Next, remove the old pvm log and daemon files in /tmp. (You will need your UID for this step, as these files are always called /tmp/pvmd.UID and /tmp/pvml.UID.) For example, to remove joeuser's files (joeuser's UID is 2260):
rm /tmp/pvmd.2260
rm /tmp/pvml.2260
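The UID lookup and the file names used in the cleanup steps above can be sketched in Python as follows. The sketch only prints the names and removes nothing; the function names are hypothetical, and the passwd entry is the joeuser example from the text.

```python
def uid_from_passwd_line(line):
    """Return the UID, the field between the second and third colons."""
    return int(line.split(":")[2])

def pvm_tmp_files(uid):
    """Names of the PVM daemon and log files left in /tmp for this UID."""
    return ["/tmp/pvmd.%d" % uid, "/tmp/pvml.%d" % uid]

entry = "joeuser:*:2260:708:Joe User:/usr/users/0/joeuser:/bin/csh"
uid = uid_from_passwd_line(entry)
print(uid)
print(pvm_tmp_files(uid))
```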
TYPE: PIR1 PROTEIN
DESCRIPTION: PIR1 Pir1 (Annotated sequences)
FILENAME: PIR1 /afs/psc/common/usr/local/biomed/db/nbrf/pir1
is a valid sequence file entry in the DATABASES file. To access this file, the user will simply use the logical name "PIR1". Another valid entry would be:
TYPE: PIR2 PROTEIN
DESCRIPTION: PIR2 Pir2 (Partially annotated sequences)
FILENAME: PIR2 /afs/psc/common/usr/local/biomed/db/nbrf/pir2
To access this file, the user will simply use the logical name "PIR2".
TYPE: PIR DATABASE
DATABASE: PIR PIR1 PIR2
DESCRIPTION: PIR NBRF-PIR database sections 1 and 2
To access this database entry, the user will simply use the logical name "PIR".
TYPE: SP PROTEIN
DESCRIPTION: SP Swiss Protein Data Base
FILENAME: SP /afs/psc/common/usr/local/biomed/db/swiss/swiss
TYPE: PIR1 PROTEIN
DESCRIPTION: PIR1 Pir1 (Annotated sequences)
FILENAME: PIR1 /afs/psc/common/usr/local/biomed/db/nbrf/pir1
TYPE: PIR2 PROTEIN
DESCRIPTION: PIR2 Pir2 (Partially annotated sequences)
FILENAME: PIR2 /afs/psc/common/usr/local/biomed/db/nbrf/pir2
TYPE: PIR3 PROTEIN
DESCRIPTION: PIR3 Pir3 (Unannotated sequences)
FILENAME: PIR3 /afs/psc/common/usr/local/biomed/db/nbrf/pir3
TYPE: GBBCT NUCLEIC
DESCRIPTION: GBBCT Genbank Bacterial
FILENAME: GBBCT /afs/psc/common/usr/local/biomed/db/genbank/gbbct.seq
TYPE: GBEST NUCLEIC
DESCRIPTION: GBEST Genbank Expressed taged sequences
FILENAME: GBEST /afs/psc/common/usr/local/biomed/db/genbank/gbest.seq
TYPE: SWISS DATABASE
DATABASE: SWISS SP
DESCRIPTION: SWISS Swiss-Protein database
TYPE: PIR DATABASE
DATABASE: PIR PIR1 PIR2 PIR3
DESCRIPTION: PIR NBRF-PIR database
TYPE: GENBANK DATABASE
DATABASE: GENBANK GBBCT GBEST
DESCRIPTION: GENBANK GenBank database
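How a logical name maps to sequence files can be illustrated with a small Python sketch. It assumes that each TYPE, DESCRIPTION, FILENAME and DATABASE keyword starts its own line, which is an assumption about the file layout rather than something stated here; parse_databases and resolve are hypothetical helpers, not part of Msearch.

```python
def parse_databases(text):
    """Collect FILENAME and DATABASE entries keyed by logical name."""
    files, groups = {}, {}
    for line in text.splitlines():
        keyword, _, rest = line.partition(":")
        fields = rest.split()
        if keyword == "FILENAME" and len(fields) >= 2:
            files[fields[0]] = fields[1]    # logical name -> file path
        elif keyword == "DATABASE" and len(fields) >= 2:
            groups[fields[0]] = fields[1:]  # database name -> member names
    return files, groups

def resolve(name, files, groups):
    """Expand a logical name into the sequence files it refers to."""
    members = groups.get(name, [name])
    return [files[member] for member in members]

sample = """\
TYPE: PIR1 PROTEIN
DESCRIPTION: PIR1 Pir1 (Annotated sequences)
FILENAME: PIR1 /afs/psc/common/usr/local/biomed/db/nbrf/pir1
TYPE: PIR2 PROTEIN
DESCRIPTION: PIR2 Pir2 (Partially annotated sequences)
FILENAME: PIR2 /afs/psc/common/usr/local/biomed/db/nbrf/pir2
TYPE: PIR DATABASE
DATABASE: PIR PIR1 PIR2
DESCRIPTION: PIR NBRF-PIR database sections 1 and 2
"""
files, groups = parse_databases(sample)
print(resolve("PIR", files, groups))
print(resolve("PIR1", files, groups))
```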
If there is something in this code that causes you installation trouble, please let us know. We cannot test this code on every machine capable of running PVM. Please send your report to biomed@psc.edu.
database=NRL
cutoff=150.0
number=1
gap=-8.0
newgap=-0.0
scoring=pam250
alphabet library=protein,query=protein
searchmethod=maxsegs
alignmethod=maxsegs
listing=file.list
query=snake.query
alignments=snake_max.alignments
save=file.save
title=comparing sequence vs NRL database.
translate library=000000 query=000000
search
quit
5
database=NRL
cutoff=150.0
number=1
gap=-8.0
newgap=-0.0
scoring=pam250
alphabet library=protein,query=protein
searchmethod=maxsegs
alignmethod=maxsegs
listing=file.list
query=snake.query
alignments=snake_max.alignments
save=file.save
title=comparing sequence vs NRL database.
translate library=000000 query=000000
search
quit