
Normal Conference VS Developer Conference. SHDH Illustrated by Derek Yu
Having attended gatherings for software developers in Silicon Valley and their hackathons, I find that bio events like Sagecon leave much to be desired, not the least of which is the beer.
Hopefully this will be a fun tool for folks not well acquainted with genomics/programming to sandbox and explore in.
The National Center for Biotechnology Information (NCBI) provides a standalone, command-line Basic Local Alignment Search Tool (BLAST) package known as BLAST+ for analyzing and playing with genomic sequence data. Although the legacy web-based BLAST can perform a range of functions, BLAST+ as a command-line tool is much better suited to analyzing large amounts of nucleotide data. To get an idea of what sort of data we’re dealing with, it helps to log into the government’s FTP server:
mokas$ ftp ftp.ncbi.nlm.nih.gov
Connected to ftp.wip.ncbi.nlm.nih.gov.
220-
Warning Notice!
This is a U.S. Government computer system, which may be accessed and used
only for authorized Government business by authorized personnel.
Unauthorized access or use of this computer system may subject violators to
criminal, civil, and/or administrative action.
All information on this computer system may be intercepted, recorded, read,
copied... There is no right of privacy in this system.
Don’t worry about the scary message; this is all public data… well, until the funding stops. Take a look in the blast/db directory for the many pre-formatted databases NCBI provides, e.g. genomic and protein reference sequences, and patent nucleotide sequence databases from the USPTO and the EU/Japan patent agencies. Get yourself the latest BLAST+ from blast/executables/LATEST; I used ncbi-blast-2.2.25+-universal-macosx.tar.gz.
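If you would rather script the download than click through an FTP client, here is a minimal sketch. It assumes the server still uses the directory layout mentioned above and that curl is installed; blast_url and fetch_blast are hypothetical helper names, not part of BLAST+.

```shell
#!/bin/sh
# Hypothetical helpers for fetching a BLAST+ tarball; not part of the NCBI tools.

FTP_HOST="ftp.ncbi.nlm.nih.gov"
FTP_DIR="blast/executables/LATEST"   # path as given in this post; the server layout may have changed

# Build the download URL for a given archive name.
blast_url() {
    printf 'ftp://%s/%s/%s\n' "$FTP_HOST" "$FTP_DIR" "$1"
}

# Download the archive into the current directory, keeping the remote file name.
fetch_blast() {
    curl -O "$(blast_url "$1")"
}

# Print the URL first so you can check it before downloading anything.
blast_url "ncbi-blast-2.2.25+-universal-macosx.tar.gz"
# fetch_blast "ncbi-blast-2.2.25+-universal-macosx.tar.gz"   # uncomment to actually download
```

Printing the URL before fetching is a deliberate choice: the archive name changes with every release, so it is worth eyeballing the address first.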
Installation:
mokas$ tar zxvpf ncbi-blast-2.2.25+-universal-macosx.tar.gz
mokas$ PATH=$PATH:/Users/mokas/Desktop/ncbi-blast-2.2.25+/bin
mokas$ export PATH
mokas$ echo $PATH
...:/Users/mokas/Desktop/ncbi-blast-2.2.25+/bin
mokas$ mkdir ./ncbi-blast-2.2.25+/db
mokas$ blastn -help
USAGE
blastn [-h] [-help] [-import_search_strategy filename]
...
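When adding the BLAST+ bin directory to PATH, it is safest to append it rather than replace the whole variable, so that system tools like mkdir and tar stay reachable. A minimal sketch follows; add_to_path is a hypothetical helper name, and the directory assumes the archive was unpacked on the Desktop as in this post.

```shell
#!/bin/sh
# Append a directory to PATH only if it is not already listed.
# add_to_path is a hypothetical helper name, not something BLAST+ ships.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

BLAST_BIN="$HOME/Desktop/ncbi-blast-2.2.25+/bin"   # adjust to wherever you unpacked the archive
add_to_path "$BLAST_BIN"

# Report whether the shell can now find blastn.
if command -v blastn >/dev/null 2>&1; then
    echo "blastn found at: $(command -v blastn)"
else
    echo "blastn not found; check that $BLAST_BIN exists" >&2
fi
```

To make the change stick across sessions, the same lines could go in your shell's startup file.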
Databases should be loaded directly into the db directory created above with the mkdir command. The last thing that needs to be done is to create a “.ncbirc” text file in your home directory containing the following:
[BLAST]
BLASTDB=/Users/mokas/Desktop/ncbi-blast-2.2.25+/db
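The same file can be written from the shell. This is a minimal sketch, assuming the two-line format shown above is all you need; write_ncbirc is a hypothetical helper name, and it also creates the database directory if it is missing.

```shell
#!/bin/sh
# Write a minimal .ncbirc pointing BLAST+ at a database directory.
# write_ncbirc is a hypothetical helper; the file format is the one shown above.
write_ncbirc() {
    rc_dir=$1      # directory that will hold .ncbirc
    db_dir=$2      # directory that holds the BLAST databases
    mkdir -p "$db_dir"
    cat > "$rc_dir/.ncbirc" <<EOF
[BLAST]
BLASTDB=$db_dir
EOF
}

# Example (commented out so nothing is overwritten by accident):
# write_ncbirc "$HOME" "$HOME/Desktop/ncbi-blast-2.2.25+/db"
```

Note that the helper overwrites any existing .ncbirc in the target directory, so check for one first if you have other NCBI tools configured.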
The .ncbirc file tells the program where the data is kept. At the end of the day, we should get something like this:
mokas$ blastn -query Homo_sapiens.NCBI36.apr.rna.fa -db refseq_rna
BLASTN 2.2.25+
...
Query= ENST00000361359 ncrna:Mt_rRNA chromosome:NCBI36:MT:650:1603:1
gene:ENSG00000198714
Length=954
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value
ref|XR_109154.1| PREDICTED: Homo sapiens hypothetical LOC1005054...    464    5e-128
>ref|XR_109154.1| PREDICTED: Homo sapiens hypothetical LOC100505479 (LOC100505479),
partial miscRNA
Length=266
Score = 464 bits (251), Expect = 5e-128
Identities = 255/257 (99%), Gaps = 0/257 (0%)
Strand=Plus/Minus
Query  334  CACCTGAGTTGTAAAAAACTCCAGTTGACACAAAATAGACTACGAAAGTGGCTTTAACAT  393
            |||||||||||||||||||||||||||| |||||||| ||||||||||||||||||||||
Sbjct  257  CACCTGAGTTGTAAAAAACTCCAGTTGATACAAAATAAACTACGAAAGTGGCTTTAACAT  198
BLAST+ in action.
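The pairwise report above is easy to read but awkward to process further. BLAST+ can also emit tab-separated output with -outfmt 6, one hit per line, with the default columns: query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score. Here is a minimal sketch that keeps only hits at or below an E-value cutoff; filter_hits is a hypothetical helper name, and the sample rows are hand-written for illustration, not copied from a real run.

```shell
#!/bin/sh
# Keep only tabular BLAST hits whose E-value (column 11) is at or below a cutoff,
# printing subject id, percent identity, E-value and bit score.
# filter_hits is a hypothetical helper name.
filter_hits() {
    awk -F '\t' -v max="$1" '($11 + 0) <= (max + 0) { print $2, $3, $11, $12 }'
}

# Illustrative rows in -outfmt 6 column order (hand-written, not real output).
{
    printf 'query1\tsubjectA\t99.22\t257\t2\t0\t334\t590\t257\t1\t5e-128\t464\n'
    printf 'query1\tsubjectB\t80.00\t40\t8\t0\t10\t49\t5\t44\t0.5\t30.1\n'
} | filter_hits 1e-10
# prints: subjectA 99.22 5e-128 464

# With a real database the input would come from something like:
# blastn -query Homo_sapiens.NCBI36.apr.rna.fa -db refseq_rna -outfmt 6 | filter_hits 1e-10
```

Only the first sample row survives the 1e-10 cutoff; the second, with an E-value of 0.5, is the kind of weak hit you would usually want to drop.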
Many thanks are in order to Dr. Tao Tao of NCBI.
A cross-post by Mo from petridishtalk.com
Citations: Standalone BLAST Setup for Unix – BLAST® Help – NCBI Bookshelf