Introduction to Perl & Bioperl (I)
Basic Perl Programming
Xuefeng Zhao
L. H. Baker Center, ISU
BCB 444/544X
Objectives
• Get an overview of basic Perl programming
• Write simple Perl scripts to manipulate DNA/RNA sequences
Outline
Why Perl?
Install Perl/Bioperl on Windows
Run a Hello perl script
Data Types, Variables and Built-in Functions
Control Structures
Basic IO
Subroutines & Functions
More String Manipulation
Why Perl?
• Perl: Practical Extraction and Report Language
– Easy to learn
– Good for string manipulation and File IO
– Open source for all OS's
– A lot of Bioinformatics tools available
Install Perl/Bioperl
Ref: Install Note on www.bioperl.org
1. Unix/Linux: Pre-installed
2. Mac: http://www.macperl.org
3. Windows
– Download the current ActivePerl from www.activeperl.com. The Windows installer package file is ActivePerl-5.8.6.811-MSWin32-x86-122208.msi
– Install ActivePerl using the default values by double-clicking the msi file, using c:\perl as the Perl home
– *** To install GD.pm:
ppm> install http://theoryx5.uwinnipeg.ca/ppms/GD.ppd
– To install Bioperl:
ppm> rep add Bioperl http://bioperl.org/DIST
ppm> search bioperl
1. bioperl [1.2.3] bioinformatics tool kits
2. bioperl [1.2.1] non
8. bioperl-1.4 [1.4] BioPerl 1.4 PPM3 Archive
….
ppm> install 8
…….
Successfully installed Bioperl-1.4 version 1.4 in ActivePerl 5.8.6.811
Run a Hello Perl script
Download all Perl scripts for the class from www.bioinformatics.iastate.edu/BBSI/ to your USB disk, and run hello.pl
To run:
perl hello.pl
or
hello.pl
Script: hello.pl
#!/usr/local/bin/perl
#hello perl script
use strict;
print "Hello, ISU-BCBSI\n";
Basic Syntax:
1. The first line (#!) gives the location of the perl interpreter; it is used on Unix/Linux.
2. Free form, case-sensitive
3. Each statement ends with a semicolon (;)
4. A comment starts with a pound sign (#) and runs to the end of the line, so a multi-line comment needs a # on every line.
Data Types, Variables and Built-in Functions
Data Types:
1. Scalar: string and number.
2. Array: list arrays (array) and associative arrays (hash).
Variables:
1. Scalar variables: start with $. Ex. $DNAstr, $numNT
2. Array: start with @. Ex. @nameAA
3. Hash: start with %. Ex. %RestricteEnzym
Some Perl built-in functions:
length, substr, index, rindex, push, pop, sort, keys, open, close, die, exit, print
Scalar variables: number and string
Number: $numVar=EXPRESSION;
$counter=20;
$Tm=37.5;
String variables: $strVar=EXPRESSION;
# using single quotes ('), no variable expansion takes place
$str1='isu-bcbsi';
$strSupport='$$$:NIH-NSF';
# $strSupport: $$$:NIH-NSF
# using double quotes ("), variable expansion takes place;
# special characters need to be escaped
$str2="$str1, ames, iowa";
# $str2: isu-bcbsi, ames, iowa
# using backticks (`), the variable is assigned the output of the command line
$strDateNow=`date /t`;
# $strDateNow: "Mon 06/06/2005" (output of the Windows date command)
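The quoting rules above can be tried out with a short script. This is only a sketch: the variable names $literal, $expanded and $escaped are made up for the illustration and are not part of the class scripts.

```perl
use strict;
use warnings;

my $str1 = 'isu-bcbsi';

# Single quotes: the text is taken literally, so $str1 is not expanded.
my $literal = '$str1, ames, iowa';

# Double quotes: $str1 is replaced by its value.
my $expanded = "$str1, ames, iowa";

# Inside double quotes, put a backslash before $ and @ to keep them literal.
my $escaped = "support: \$\$\$ from NIH\@NSF";

print "$literal\n";
print "$expanded\n";
print "$escaped\n";
```

With use strict in effect, an unescaped @NSF inside double quotes would be reported as an undeclared array, which is one reason to escape it.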
Operators and Functions for Scalar variables
Arithmetic Operators: =, +, -, *, /, %, **
Notation (equivalent forms):
$counter++; $counter = $counter+1; $counter += 1;
$counter--; $counter = $counter-1; $counter -= 1;
Operator and function for strings
$str1="DNA";
$str2="RNA";
Operator/Function                       Example                             Desc
Dot (.)                                 $str3=$str1." and ".$str2;          Concatenation: $str3: DNA and RNA
.=                                      $str1 .= $str2;                     Appending: $str1: DNARNA
length(STRING)                          $len=length("ACGT");                $len: 4
index(STRING,SEARCH)                    $idx=index("ACGTACGT", "C");        $idx: 1
rindex(STRING,SEARCH)                   $ridx=rindex("ACGTACGT", "C");      $ridx: 5
substr(STRING,OFFSET,LEN,REPLACEMENT)   $msubstr=substr("ACGTACGT", 2, 3);  $msubstr: GTA
chop(STRING)                            $str1="DNA"; chop($str1);           Removes the last character. $str1: DN
chomp(STRING)                           chomp($str1);                       Like chop, but removes only a trailing "\n". $str1: DN
Array
@NTlist = ("A","T","G","C");
# assign 4 elements to the array, enclosed in ()
$nt_2 = $NTlist[1];
# the array index starts at 0; the index number is enclosed in []
$last_idx = $#NTlist;
# $last_idx: 3
$num_ele = scalar @NTlist; # $num_ele: 4
Operator/Function        Example                      Desc
push(@arr, element)      push(@NTlist, ("A","T"));    @NTlist: A, T, G, C, A, T
pop(@arr)                $m_nt = pop(@NTlist);        $m_nt: T; @NTlist: A, T, G, C, A
shift(@arr)              shift(@NTlist);              @NTlist: T, G, C, A
unshift(@arr, element)   unshift(@NTlist, "G");       @NTlist: G, T, G, C, A
delete($arr[$idx])       delete $NTlist[1];           @NTlist: G, undef, G, C, A (the slot is emptied, not removed; use splice(@NTlist, 1, 1) to get G, G, C, A)
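The string and array functions in the two tables above can be run together in one short script. This is a sketch only; the names $seq and @nt are illustrative, and splice is used in place of delete so that the element is really removed.

```perl
use strict;
use warnings;

my $seq     = "ACGTACGT";
my $first_c = index($seq, "C");      # position of the first "C", counting from 0
my $last_c  = rindex($seq, "C");     # position of the last "C"
my $piece   = substr($seq, 2, 3);    # 3 characters starting at offset 2

my @nt = ("A", "T", "G", "C");
push(@nt, "A", "T");                 # add two elements to the end
my $popped = pop(@nt);               # remove one element from the end
shift(@nt);                          # remove one element from the front
unshift(@nt, "G");                   # add one element to the front
splice(@nt, 1, 1);                   # remove the element at index 1

print "index=$first_c rindex=$last_c substr=$piece\n";
print "popped=$popped array=@nt\n";
```

Run as written, this should print index=1 rindex=5 substr=GTA, followed by popped=T array=G G C A.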
Hash
# indexed by strings. Braces {} for a key, percent sign % for the entire hash
# assign 4 elements to the hash
%AAlist=(Ala=>"A", Gly=>"G", His=>"H", Phe=>"F");
$aa_gly=$AAlist{Gly};
Operator/function      Example                       Desc
keys(%ARRAY)           @aa_keys=keys(%AAlist);       @aa_keys: Ala, Gly, His, Phe (not ordered)
values(%ARRAY)         @aa_vals=values(%AAlist);     @aa_vals: A, G, H, F (not ordered)
each(%ARRAY)           each(%AAlist)                 Returns pair by pair (key, value) => (Ala, A); used in loops
delete($ARRAY{KEY})    delete($AAlist{Ala});         Remove the pair (Ala, A)
$ARRAY{KEY}=VALUE      $AAlist{Val}="V";             Add one element
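A small script can show how the hash operations above behave in a loop. It is a sketch with made-up names (%aa, @lines, $pairs); the keys are sorted only to give a stable print order, since keys and each return entries in no particular order.

```perl
use strict;
use warnings;

my %aa = (Ala => "A", Gly => "G", His => "H", Phe => "F");
$aa{Val} = "V";          # add one pair
delete $aa{Ala};         # remove one pair

# Walk the keys in sorted order and look up each value.
my @lines;
for my $name (sort keys %aa) {
    push(@lines, "$name => $aa{$name}");
}
print join("\n", @lines), "\n";

# each() returns one (key, value) pair per call until the hash is exhausted.
my $pairs = 0;
while (my ($name, $code) = each %aa) {
    $pairs++;
}
print "pairs: $pairs\n";
```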
Control Structures: IF-ELSIF-ELSE
if (condition) {
statements;….
}
elsif (condition) {
statements;….
}
else {
statements;….
}
#full form: if, elsif and else

if (condition) {
statements;….
}
#use if only

if (condition) {
statements;….
}
elsif (condition) {
statements;….
}
#use if and elsif only; there is no switch statement in Perl, so use an elsif chain to get around it

Example:
if ($m_nt eq "A") { $num_A++; }       # A counter
elsif ($m_nt eq "T") { $num_T++; }    # T counter
elsif ($m_nt eq "G") { $num_G++; }    # G counter
elsif ($m_nt eq "C") { $num_C++; }    # C counter
else { $num_error++; }                # error

Comparison                   Number   String
Greater than                 >        gt
Greater than or equal        >=       ge
Equal                        ==       eq
Less than                    <        lt
Less than or equal           <=       le
Not equal                    !=       ne
Comparison returns -1,0,1    <=>      cmp

Loop Structures: WHILE, DO-WHILE, FOR
-- WHILE Loop
while (condition) {
statements;….
}
-- DO-WHILE Loop: the body runs at least once
do {
statements;….
} while (condition);
-- FOR Loop
for (initial values; test; increment) {
statement;
}
Two control statements for loops:
last    Exit the loop immediately; no further iterations run
next    Skip the rest of the current iteration and start the next one
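The if/elsif chain and the loop controls above can be combined into a nucleotide counter. The following is a sketch, not one of the class scripts: the subroutine name count_nt and the use of "*" as an end marker are assumptions made for the example.

```perl
use strict;
use warnings;

# Count each nucleotide in a DNA string.
# Blanks are skipped with next; a "*" (hypothetical end marker) stops the loop with last.
sub count_nt {
    my ($dna) = @_;
    my %count = (A => 0, T => 0, G => 0, C => 0, error => 0);
    for (my $i = 0; $i < length($dna); $i++) {
        my $nt = substr($dna, $i, 1);
        next if $nt eq " ";
        last if $nt eq "*";
        if    ($nt eq "A") { $count{A}++; }
        elsif ($nt eq "T") { $count{T}++; }
        elsif ($nt eq "G") { $count{G}++; }
        elsif ($nt eq "C") { $count{C}++; }
        else               { $count{error}++; }
    }
    return %count;
}

my %c = count_nt("ATG CAXA*TTT");
print "A=$c{A} T=$c{T} G=$c{G} C=$c{C} error=$c{error}\n";
```

For this input the three T's after the "*" are never counted, so the script should print A=3 T=1 G=1 C=1 error=1.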
Basic IO
Input
Keyboard (STDIN):
$m_input=<STDIN>;
# wait for input until the newline character
File:
open(MYFILEHANDLE, "<mySeqFileName");
@all_lines=<MYFILEHANDLE>;
close(MYFILEHANDLE);
Output
Screen:
print STDOUT "Hello, ISU-BCBSI class!";
print "Hello, ISU-BCBSI class!";
File:
open(MYFILEHANDLE, ">mySeqFileName");
print MYFILEHANDLE $m_line;
close(MYFILEHANDLE);
Exercise: GC_counter.pl
Write a Perl script to count GC content in a DNA sequence
File IO    Desc
<          read
>          write
+>         read and write (the file is truncated first)
>>         append
| CMD      open a pipe to CMD
CMD |      open a pipe from CMD
File Test Operator    Desc
-r                    readable
-x                    executable
-e                    exists
-d                    a directory?
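The open modes and file tests above can be exercised with a small round trip: write, append, test, read back. This sketch departs from the slides in one respect: it uses the three-argument form of open with a lexical filehandle instead of the two-argument form with a bareword handle. The file name is hypothetical, and the file is deleted at the end.

```perl
use strict;
use warnings;

my $file = "mySeqFileName.txt";      # hypothetical file name for the demo

# ">" creates the file or overwrites an existing one.
open(my $out, ">", $file) or die "Cannot write $file: $!";
print $out "ATGCATGC\n";
print $out "GGCCTTAA\n";
close($out);

# ">>" appends to the end of the file.
open($out, ">>", $file) or die "Cannot append to $file: $!";
print $out "TTTTAAAA\n";
close($out);

# Read everything back, one array element per line.
my @all_lines;
if (-e $file && -r $file) {          # the file exists and is readable
    open(my $in, "<", $file) or die "Cannot read $file: $!";
    @all_lines = <$in>;
    close($in);
}
chomp(@all_lines);                   # strip the trailing newlines
print scalar(@all_lines), " lines, first: $all_lines[0]\n";

unlink($file);                       # remove the demo file
```

Checking the return value of open with "or die" reports the reason (in $!) when a file cannot be opened, instead of failing silently later.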
Subroutines & Functions
1. Perl makes no distinction between subroutines and functions; the term subroutine is used throughout.
2. How to call a sub: & is used before the sub name (the & is optional when the call is written with parentheses).
3. Arguments are passed to the subroutine in a special array @_.
#demo calls subroutine
$g_msg="calling from outside";
&get_msg($g_msg);
print $g_msg, "\n";
sub get_msg
{
$m_msg= $_[0];              # Arguments are listed in @_
print $m_msg, "\n";
$g_msg ="hello from sub";   # the global variable is updated here!!!! Scope issue
}
Exercise: Out2File_FASTA.pl
Output the DNA sequence to a file in FASTA format. The FASTA format description is:
http://ngfnblast.gbf.de/docs/fasta.html
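As one possible starting point for the exercise, a subroutine can take its arguments from @_ into private variables and return a formatted record instead of changing a global. This is a sketch under assumptions: the name format_fasta, the default line width of 60, and the sample ID are all made up here, and the record layout (a ">" header line followed by wrapped sequence lines) follows the common FASTA convention rather than the linked description.

```perl
use strict;
use warnings;

# Build a FASTA record: a ">" header line, then the sequence wrapped to $width columns.
sub format_fasta {
    my ($id, $seq, $width) = @_;             # copy the arguments out of @_
    $width = 60 unless defined $width;       # assumed default line width
    my $record = ">$id\n";
    for (my $pos = 0; $pos < length($seq); $pos += $width) {
        $record .= substr($seq, $pos, $width) . "\n";
    }
    return $record;
}

my $fasta = format_fasta("seq1", "ATGCATGCAT", 4);
print $fasta;
```

Because every variable inside the subroutine is declared with my, nothing outside it is modified, which avoids the scope issue shown in the demo above.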
Anti-bugging
# all variables must be declared before being used
use strict;
# give warning messages
use warnings;
# a little more information than use warnings
use diagnostics;
# Limit the scope of variables
my(): lexical scope; the variable is visible only within the block that defines it.
local(): does not create a private variable; it gives a global variable a temporary value, and the old value is restored when the enclosing block exits.
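The difference between my and local is easiest to see when a second subroutine reads the global variable. The following is a sketch with made-up names ($level, show_level, with_local, with_my).

```perl
use strict;
use warnings;

our $level = "global";               # a package (global) variable

sub show_level { return $level; }    # always reads the global $level

sub with_local {
    local $level = "temporary";      # the global gets a temporary value
    return show_level();             # called subs see the temporary value
}                                    # the old value is restored here

sub with_my {
    my $level = "private";           # a new variable, visible in this block only
    return show_level();             # show_level still sees the untouched global
}

print with_local(), "\n";
print with_my(), "\n";
print $level, "\n";
```

The three lines printed should be temporary, global, global: local changes what other subroutines see for the duration of the block, while my does not.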
Debugging
1. Print out values or comment out code
2. Run the Perl debugger
perl -d your_script.pl
db> h            # help
db> n            # next step
db> x $your_var  # examine the value
More String Manipulation: split, match & regular expression
Ref: Ch. 7, by Michael Moorhouse and Paul Barry
Split:
$NTStr = "A:T:G:C";
@NTlist = split(/:/, $NTStr);
# @NTlist: A, T, G, C
Matching:
$NTStr = "ATGCAAAAAAA";
$NTStr =~ /GC/;
# true, because $NTStr contains GC
Substitute:
$NTStr = "ATGCAAAAAAA";
$NTStr =~ s/AAA/A/;
# only the first AAA is replaced; $NTStr: ATGCAAAAA
Translation: character-by-character translation
$NTStr = "ATGCAAAAAAA";
$NTStr =~ tr/ATGC/TACG/;
# $NTStr: TACGTTTTTTT
Summary
Data Types, Variables and Operations
Control Structures
Basic IO
Subroutines & Functions
More String Manipulation
Debug
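As a closing recap of the string operators covered above (split, match, s/// and tr///), here is a short sketch that applies them to a DNA sequence. The subroutine name reverse_complement and the variable names are illustrative, not taken from the class scripts; the /g modifier and the + quantifier go slightly beyond the slide examples.

```perl
use strict;
use warnings;

# Reverse the string, then swap each base for its complement with tr///.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = scalar reverse($dna);
    $rc =~ tr/ATGC/TACG/;
    return $rc;
}

my $seq = "ATGCAAAAAAA";

my @fields = split(/:/, "A:T:G:C");           # split on the colon
my $has_start = ($seq =~ /^ATG/) ? 1 : 0;     # does the sequence begin with ATG?

my $collapsed = $seq;
$collapsed =~ s/A+/A/g;                       # replace every run of A's with one A

print "fields: @fields\n";
print "starts with ATG: $has_start\n";
print "collapsed: $collapsed\n";
print "reverse complement: ", reverse_complement($seq), "\n";
```

Without the /g modifier, s/// replaces only the first match, as in the Substitute example.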