Perl Training Australia - Parsing Techniques for Bioinformatics
Trainer: Damian Conway
Length: 1 day
Target Audience:
Perl programmers in bioinformatics-related fields who are familiar with simple
regular expressions and the use of modules. The techniques presented are not
restricted to the particular applications mentioned, and will be useful to
anyone who needs to process structured bioinformatics data of any kind.
Parsing is the process of detecting and verifying the structure of incoming data
and then processing that data so as to make it available to a program in
convenient ways.
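As a minimal sketch of that definition (not taken from the course materials), the Perl snippet below detects and verifies the structure of a FASTA-style header line with a regex and then exposes the pieces as a hash. The helper name `parse_fasta_header` and the sample header are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one FASTA-style header line such as
# ">seq1 example sequence" into an ID and a description.
sub parse_fasta_header {
    my ($line) = @_;

    # Detect and verify the structure: '>' followed by a non-space ID,
    # then an optional free-text description.
    if ($line =~ /^>(\S+)(?:\s+(.*?))?\s*$/) {
        # Make the data available to the program in a convenient form.
        return { id => $1, description => defined $2 ? $2 : '' };
    }
    return;    # not a header line
}

my $record = parse_fasta_header('>seq1 example sequence');
print "$record->{id}: $record->{description}\n" if $record;
```

Lines that do not match the expected structure yield an undefined value, so the caller can decide whether to skip, warn, or abort.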
This full-day tutorial will introduce beginner and intermediate Perl programmers
to the wide range of parsing mechanisms available in Perl and explain specific
techniques for parsing data in a variety of commonly used formats. Most examples
will be based on typical parsing problems encountered in bioinformatics.
Topics covered include:
- simple parsing with regexes
- linear parsing with state machines
- piece-wise parsing with extractors
- structured parsing with grammars
- processing comma-separated text
- dealing with XML and other tagged formats
- dealing with BLAST output and other heterogeneous structured
formats
- handling queries in synthetic and natural languages
- extracting data structures from structured data
- processing file inclusions
- coping with incomplete, malformed, and ambiguous data
- selecting and using appropriate parsing tools from the CPAN
- integrating parsing and object-oriented programming
- data mining (parsing as a data recognition tool)
- error detection and consistency checking (parsing as a data
validation tool)
- structured I/O (parsing as a data acquisition tool)
- recognition and extraction (parsing as a data search tool)
- hierarchical data processing (parsing as a data transformation
tool)
- task specific languages (parsing as a command specification
tool)
Copyright Perl Training Australia. Contact us at contact@perltraining.com.au