Trinity Usage on Blacklight-PSC





DISCLAIMER: This is UNOFFICIAL documentation developed by the user community to help with running jobs on Blacklight. It may not be completely accurate or updated.

OFFICIAL DOCUMENTATION FOR RUNNING TRINITY ON BLACKLIGHT CAN BE FOUND HERE:
http://www.psc.edu/index.php/user-resources/software/trinity

Trinity Background


Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works like so:
  • Inchworm assembles the RNA-Seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
  • Chrysalis groups the Inchworm contigs into clusters and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.
  • Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes.

The Trinity software package includes all of these tools and can be downloaded here.
Trinity Website: http://trinityrnaseq.sourceforge.net/
Trinity FAQ: http://trinityrnaseq.sourceforge.net/trinity_faq.html
Trinity Forum: http://sourceforge.net/mailarchive/forum.php?forum_name=trinityrnaseq-users

Blacklight Background


The Blacklight resource is hosted by the Pittsburgh Supercomputing Center (www.psc.edu).

Blacklight is an SGI UV 1000 cc-NUMA shared-memory system comprising 256 blades. Each blade holds 2 Intel Xeon X7560 (Nehalem) eight-core processors, for a total of 4096 cores across the whole machine. The sixteen cores on each blade share 128 Gbytes of local memory. Thus, each core has 8 Gbytes of memory and the total capacity of the machine is 32 Tbytes. This 32 Tbytes is divided into two partitions of 16 Tbytes of hardware-enabled shared coherent memory.

This unique architecture allows computational jobs that require a large amount of memory overhead, such as de novo transcriptome/genomic assemblies, to be completed. The very large amount of addressable RAM allows for very high read density assemblies, many of which would be outside the computational scope of many other HPC systems.

A complete description of Blacklight can be found at:
http://www.psc.edu/index.php/resources-for-users/computing-resources/blacklight

Obtaining an account


Blacklight is part of the XSEDE program (https://www.xsede.org/), the successor to the TeraGrid. XSEDE is a National Science Foundation-funded collection of HPC resources, services and expertise that allows users to use national HPC infrastructure resources remotely. Instructions for obtaining a user account can be found here: https://www.xsede.org/web/guest/allocations. The requirement is that you or a member of your group is a current researcher in the United States of America, or has a research partner who is currently working in the United States.

Logging on to Blacklight

There are three options for logging on to Blacklight once you have established an XSEDE user account.

1. GSI-SSHTerm (All Systems): This allows you to access and use all XSEDE resources as well as transfer files to the desired resource
http://sourceforge.net/projects/gsi-sshterm/files/gsi-sshterm/0.91h/gsi-sshterm-0.91h.tar.gz/download?_test=goal

2. PuTTY/WinSCP (Windows) or SSH (Linux): Allows remote usage and file transfer through two separate programs
PuTTY: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html allows remote logins
WinSCP: http://winscp.net/eng/download.php allows file transfer

3. XSEDE website (Web Browser): Allows usage/file transfer remotely through a web browser
https://www.xsede.org/user-portal


The host name to use is: blacklight.psc.teragrid.org
Upon log-in you will be prompted to enter a user name and password. Use the XSEDE username and password given to you when you received your XSEDE account.

login.PNG

Running Jobs on Blacklight


A highly detailed explanation of executing computational jobs on Blacklight and advanced usage can be found here: http://www.psc.edu/index.php/computing-resources/blacklight
This section gives a brief overview of the basics for usage of Blacklight and a step by step how-to guide on using Trinity on Blacklight.

Blacklight OS Structure

Blacklight runs a custom Linux-based OS and uses a PBS/Torque-like system for scheduling and managing jobs. Users who have experience with either should be in familiar territory.
Helpful for Linux related questions: http://www.linuxquestions.org/questions/

Blacklight Queue Structure

There are 2 basic queues on Blacklight, the debug queue and the batch queue.

The debug queue has a limit of 30 minutes of wall time and 16 cores maximum, good for ensuring your command line execution arguments are correct. The debug queue is NOT to be used for production runs.

The batch queue is broken into subqueues based on the number of cores and the wall time requested. You submit jobs to the batch queue and they are automatically slotted into the appropriate subqueue based on the resources requested.

  • Jobs that ask for 256 or fewer cores can ask for a maximum wall-time of 96 hours.
  • Jobs that ask for more than 256 cores, to a maximum of 1440 cores, can ask for a maximum wall-time of 48 hours.

Jobs requesting more than 1440 cores are sent to a separate queue where they receive special handling.
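
The limits above can be written as a small lookup. This is only a sketch of the rules as listed on this page; the function name max_walltime_hours is made up for illustration, and the official PSC documentation remains the authority on current limits.

```shell
#!/bin/sh
# max_walltime_hours CORES
# Prints the largest wall-time (in hours) the batch queue accepts for a
# request of CORES cores, following the limits listed on this page.
# Prints "special" for requests above 1440 cores, which are handled by a
# separate queue.
max_walltime_hours() {
    cores=$1
    if [ "$cores" -le 256 ]; then
        echo 96
    elif [ "$cores" -le 1440 ]; then
        echo 48
    else
        echo special
    fi
}

max_walltime_hours 32
max_walltime_hours 512
```

A 32 core request may therefore ask for up to 96 hours, while a 512 core request is capped at 48 hours.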

What if I need more time?

For assemblies that would take longer than the wall time allowed by the queues, or for any other problems, please contact PSC support at remarks@psc.edu.

If you start a job and then realize that you need more time, you can still send email to remarks@psc.edu and PSC can extend the time of your running job.

Memory Allocation

The amount of memory that is allocated to your job is determined by the number of cores requested. The 16 cores on each blade share 128 Gbytes of RAM. This table shows the amount of RAM you have access to based on the number of cores that you request. Because there are 16 cores on a blade, and blades cannot be shared among jobs, you must request cores in multiples of 16.

Cores    Memory (Gbytes)
16       128
64       512
256      2048
512      4096
1024     8192
1424     11392
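
The relationship between cores and memory can also be worked out in the shell. The sketch below assumes only the figures given above (16 cores and 128 Gbytes per blade); the function names mem_for_cores and cores_for_mem are hypothetical helpers, not PSC tools.

```shell
#!/bin/sh
# mem_for_cores CORES
# Prints the memory (Gbytes) allocated to a request of CORES cores.
# Rejects values that are not a multiple of 16.
mem_for_cores() {
    cores=$1
    if [ $((cores % 16)) -ne 0 ]; then
        echo "ncpus must be a multiple of 16" >&2
        return 1
    fi
    echo $((cores / 16 * 128))
}

# cores_for_mem GB
# Prints the smallest valid core request whose allocation covers GB Gbytes,
# rounding up to whole blades.
cores_for_mem() {
    gb=$1
    blades=$(( (gb + 127) / 128 ))
    echo $((blades * 16))
}

mem_for_cores 32
cores_for_mem 200
```

For example, an assembly expected to need about 200 Gbytes has to request two blades, that is ncpus=32.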

Charges

On Blacklight, Service Unit charges (SUs) are based on the number of cores a job uses. One core-hour is one SU. Because jobs do not share blades, and there are 16 cores on a blade, a one hour job that uses one blade will be charged 16 SUs.
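
A rough estimate of the charge for a job can be computed before submitting. This is a minimal sketch that assumes whole hours and whole blades, as described above; su_charge is a made-up name, and the actual charge depends on how long the job really runs.

```shell
#!/bin/sh
# su_charge CORES HOURS
# Prints the SUs charged for a job using CORES cores for HOURS hours.
# Cores are rounded up to whole blades (16 cores each) because blades are
# not shared between jobs.
su_charge() {
    cores=$1
    hours=$2
    blades=$(( (cores + 15) / 16 ))
    echo $((blades * 16 * hours))
}

su_charge 16 1
su_charge 32 95
```

A 32 core job that runs for the full 95 hours would be charged 3040 SUs under this estimate.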

Job Submission


Jobs are executed on Blacklight using a Portable Batch System (PBS/Torque) scheduler. Users submit jobs to the scheduler, which determines when each job is executed based on a number of factors, including the resources required for the job, the number of jobs the user currently has in the queue, the job's specified wall-time, and how many jobs are currently running. For the quickest turnaround, jobs should request only the resources they need.

You must create a job script and submit it to run a job. A number of things are required. The following template script is an example of running Trinity. Each #COMMENT line provides an explanation of the next line of the script.


#!/bin/csh
#COMMENT ncpus must be a multiple of 16, the formula for total RAM used by number of cpus is ncpus/16*128 = X GB
#PBS -l ncpus=32
#COMMENT The wall-time requested for the job, in this case 95 hours
#PBS -l walltime=95:00:00
#COMMENT combines stdout and stderr in one file
#PBS -j oe
#COMMENT specifies the queue. change this to 'debug' to access the debug queue (limit of ncpus=16 and walltime=00:30:00)
#PBS -q batch
#COMMENT Emails you when the job aborts, begins, or ends
#PBS -m abe -M youremail@youremail.provider

set echo
#COMMENT Needed to load the module command
source /usr/share/modules/init/csh
#COMMENT set stacksize to unlimited
limit stacksize unlimited
#COMMENT move to my $SCRATCH directory, this directory should be where your read files are located
cd $SCRATCH
#COMMENT Load the Trinity module; this example uses trinity/2.0.3
#COMMENT Run 'module avail trinity' on Blacklight command line to find name of latest Trinity module
#COMMENT (unless need to continue a run started with a different version -- don't switch versions in the middle of an assembly!)
module load trinity/2.0.3
#COMMENT Load latest versions of supporting modules required by Trinity
module load bowtie/1.1.1
module load samtools/1.1.0
#COMMENT runs the Trinity command
Trinity --seqType fq --max_memory 100G --left reads.left.fq --right reads.right.fq --SS_lib_type RF --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > trinity_output.log

MAKE SURE TO REDIRECT TRINITY OUTPUT TO A LOG FILE AS SHOWN ABOVE (> trinity_output.log) OR YOUR JOB WILL LIKELY GET KILLED!!!
If the output goes through the batch system it will kill the job if the output exceeds 20 MB (which it usually does with Trinity).

Once you have copied the above template script and made the appropriate changes, you can create a job submission file and submit the job to the queue. Here we use the vi editor to create the job submission script. If you don't know how to use vi, see here:
http://heather.cs.ucdavis.edu/~matloff/UnixAndC/Editors/ViIntro.html

Qsub1.PNG

Open a file on the Blacklight command line by typing: vi MyBlacklightSubmissionScript as shown above.
Now copy and paste the wiki script into the vi file and save it on Blacklight: copy the entire script from the wiki, go to the open vi file, press i for insert mode, right-click to paste, press Esc to leave insert mode, then hold Shift and press the Z key twice to save and quit.

Submission of the job can now be completed by typing qsub MyBlacklightSubmissionScript
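
If you prefer not to paste into vi, the same file can be written from the command line with a here-document. The sketch below is one possible way to do it, not an official PSC method; write_job_script is a hypothetical helper, and the module versions and Trinity options are simply copied from the template on this page.

```shell
#!/bin/sh
# write_job_script FILE NCPUS WALLTIME
# Writes a csh PBS job script based on the template on this page.
# \$SCRATCH is escaped so that it is expanded when the job runs,
# not when the file is written.
write_job_script() {
    file=$1
    ncpus=$2
    walltime=$3
    cat > "$file" <<EOF
#!/bin/csh
#PBS -l ncpus=$ncpus
#PBS -l walltime=$walltime
#PBS -j oe
#PBS -q batch

set echo
source /usr/share/modules/init/csh
limit stacksize unlimited
cd \$SCRATCH
module load trinity/2.0.3
module load bowtie/1.1.1
module load samtools/1.1.0
Trinity --seqType fq --max_memory 100G --left reads.left.fq --right reads.right.fq --SS_lib_type RF --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > trinity_output.log
EOF
}

write_job_script MyBlacklightSubmissionScript 32 95:00:00
# The file can then be submitted with: qsub MyBlacklightSubmissionScript
```

Check the generated file with cat or less before submitting it, and adjust the read file names to match your data.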

To ensure that the job was submitted properly use the command qstat -f <pbsJobnumber>. See below for example output:
Qstat.PNG



Note: Successful submission does not guarantee successful completion. An exit status will be given at the end of the job to designate how the job completed.
A detailed explanation of exit status values can be found at: http://www.clusterresources.com/torquedocs21/2.7jobexitstatus.shtml

During job run-time, the qstat -f <pbsjobnumber> command can be used to check on the status of a job, how much RAM is being used and how close the job is to reaching wall-time. If at any time you would like to cancel a running job, use the qdel <pbsjobnumber> command.
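
The full qstat -f listing is long, so it can help to pull out a single field. The sketch below assumes the listing uses lines of the form "name = value" (for example job_state or resources_used.walltime, as Torque typically prints them); qstat_field is a hypothetical helper and the exact field names on Blacklight should be checked against real output.

```shell
#!/bin/sh
# qstat_field NAME
# Reads "qstat -f" style text on standard input and prints the value of the
# first line of the form "NAME = value". Leading whitespace is ignored.
qstat_field() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            split(line, parts, " = ")
            if (parts[1] == name) {
                print substr(line, length(name) + 4)
                exit
            }
        }'
}

# Intended use on Blacklight (not run here):
#   qstat -f <pbsjobnumber> | qstat_field job_state
#   qstat -f <pbsjobnumber> | qstat_field resources_used.walltime
```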

Loading Files


All files that are needed for execution should be loaded to your $SCRATCH directory. Upon logging in type in
 cd $SCRATCH
 pwd
The directory given will be the path of your $SCRATCH directory. This directory can store and use large files, unlike your home directory.

SCRATCHfinal.PNG





If using GSI-SSHTerm to transfer files, click the following upon logging in:
Tools > SFTP Session
In the Address box type in the full path to your $SCRATCH directory where you want to store the files. (Your screen will have your username rather than 'mbcougar'.)
FileTransfer.PNG

Using Trinity


The batch script above requests 32 CPU (or cores) with 256 GB of RAM for 95 hours. This should be enough to run most small to medium Trinity jobs. If your job is small, you may consider using 16 CPU (or cores) which allocates 128GB of RAM for your job, but be warned that there are only a limited number of 16 core jobs allowed to run on the system, so turnaround may actually be slower than for 32 core jobs. You can check if your 16 core job is held up by other 16 core jobs by running qstat -s <pbsjobnumber>:

user@tg-login1:~> qstat -s 208539
 
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
208539.tg-login1     user     batch_r  myjob         --   --    16    --  00:10 Q   --
   host bl0.psc.teragrid.org has 7 16 core jobs running...limit is 7
 
Only 16 core jobs are limited in this fashion. Other jobs will run based on available cores and the number of jobs ahead of yours in the queue.

If your job is large, consider altering the parameters as necessary to accommodate the data. If you believe that you need more wall-time remember that Butterfly can be run separately from Inchworm and Chrysalis (recommended for large data-sets on Blacklight).

Using Interactive Access

qsub -I -l ncpus=16 -l walltime=00:30:00 -q debug
Interactive access on Blacklight is possible; however, it should only be used for short debugging jobs. The command above will request an interactive session with 16 cores (allocating 128 GB of RAM) and 30 minutes of wall-time. This job uses the debug queue, which has a limit of 16 cores for 30 minutes. Larger jobs must be run with a batch script as demonstrated above.

If your job is killed

If you encounter the following error (or one with slightly different numerical values) that causes the job to stop, you must request more memory and restart the job, since the amount you initially requested was not sufficient:

PBS: Job killed: cpuset memory_pressure 10562 reached/exceeded limit 1
    (numa memused is 134200964 kb)
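
The memused figure in this message is reported in kb. Converting it to Gbytes shows how close the job was to its allocation. The helper below is only an illustration (kb_to_gb is a made-up name) and assumes 1 Gbyte = 1024 * 1024 kb.

```shell
#!/bin/sh
# kb_to_gb KB
# Converts a size in kb to Gbytes, printed with one decimal place.
kb_to_gb() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / (1024 * 1024) }'
}

kb_to_gb 134200964    # prints 128.0
```

In the example error the job had used essentially all 128 Gbytes that come with a 16 core request, so the next attempt would need at least ncpus=32 (256 Gbytes).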

If you have a gigantic job that will exceed the standard queue's limits for wall-time or RAM please email remarks@psc.edu to request help.

Module Command


PSC has installed the module software on Blacklight. You can load Trinity and all its dependencies with the module command and execute it anywhere as if it were contained in your path. To see what versions of Trinity are currently installed, type
module available trinity
ModuleAvail.gif

Choose the version you want, then load it using its specific version number.
module load trinity/version-number

Note: When using interactive access you must load these modules after you have started your Interactive PBS access.

For a look at all programs that can be loaded with module type in:
module avail
Module Manual Page: http://www.psc.edu/index.php/module


TrinMainScreen.PNG

Before running Trinity, set stacksize to unlimited

If you are using bash, type:
ulimit -s unlimited
If you are using csh, type:
limit stacksize unlimited


Move to your $SCRATCH space

Your scratch directory is where all assembly files should be uploaded and where all large outputs should be kept on Blacklight (your $HOME space has a 5 GB quota). To move to your scratch space type in
cd $SCRATCH

If you need the location of this directory to transfer files with either WinSCP or GSI-SSHTerm, type pwd. This will print the directory, which should be:
/brashear/<your Blacklight User Name>
Just remember to back up any data on $SCRATCH either to $HOME (if it is on the order of megabytes) or to the archival system (if it is GBs or larger).

Execute Trinity Specific Commands

The following are examples of Trinity commands that can be used. Reading the full list of options on Trinity's main site is highly recommended for correct usage: http://trinityrnaseq.sourceforge.net/


Note: Where <Variable> are present, you must substitute your specific values. Do not include the '< >' symbols in your command. Be sure you are in your $SCRATCH directory where your input files are located.

Strand Specific Sequencing (preferred library method, typical of the dUTP/UDG sequencing method):
Trinity.pl --seqType fq --kmer_method meryl --left <YourReads1.fq> --right <YourReads2.fq> --output <DirNameForOutput> --SS_lib_type RF --min_contig_length <contigLengthMinCutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > trinity_output.log
Note: other methods of strand specific library generation may require FR orientation; please review the Trinity website for a full explanation.

Non Strand Specific Library:
Trinity.pl --seqType fq --kmer_method meryl --left <YourReads1.fq> --right <YourReads2.fq> --output <DirNameForOutput> --min_contig_length <contigLengthMinCutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > trinity_output.log
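
The two commands above differ only in the --SS_lib_type option. The sketch below builds and prints the command line without running it, so the substitutions can be checked first. trinity_cmd is a hypothetical helper, and the file names and contig length in the example call are placeholders, not recommended values.

```shell
#!/bin/sh
# trinity_cmd LEFT RIGHT OUTDIR MIN_LEN [SS_LIB_TYPE]
# Prints (does not run) a Trinity command line in the style shown on this
# page. If SS_LIB_TYPE (for example RF) is given, --SS_lib_type is added;
# otherwise the non strand specific form is printed.
trinity_cmd() {
    left=$1
    right=$2
    outdir=$3
    min_len=$4
    ss=${5:-}
    cmd="Trinity.pl --seqType fq --kmer_method meryl --left $left --right $right --output $outdir"
    if [ -n "$ss" ]; then
        cmd="$cmd --SS_lib_type $ss"
    fi
    cmd="$cmd --min_contig_length $min_len --CPU 16 --bflyCPU 16 --bflyGCThreads 16"
    echo "$cmd > trinity_output.log"
}

# Placeholder inputs for illustration only.
trinity_cmd reads_1.fq reads_2.fq trinity_out 300 RF
trinity_cmd reads_1.fq reads_2.fq trinity_out 300
```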

Other Options for consideration

--paired_fragment_length <int>  The insert size for paired-end reads. The default is 300.
--jaccard_clip  Requires the bowtie module to be loaded. Only recommended if you are assembling a transcriptome from a gene-dense genome, such as a fungal genome.
  If you have paired-end reads, Trinity uses Bowtie to check that pairing is consistent; this is not recommended for large genomes.
  Ensure that your read names are properly labeled by ending with "/1" and "/2".
--kmer_method <method>  (required) One of meryl, jellyfish, or inchworm. These are the different methods that can be used for k-mer creation with Inchworm.
  More documentation can be found on the Trinity website or the meryl website.
  For large to very large assemblies these parameters can be adjusted for improved performance, at the cost of the amount of RAM used.
--CPU <int>  Number of CPUs. This should be equal to the number of CPUs (cores) requested for the job.
--bflyCPU <int>  Number of CPUs to use for Butterfly. This should be equal to the number of CPUs (cores) requested for the job.
--bflyHeapSpaceInit <string>  The amount of RAM each thread will initially use in the Butterfly job.
  The product of this value and the thread count cannot exceed the amount of RAM allocated for the job.
  An example of an acceptable value is 3G, for 3 GB of initial Java heap space.
--bflyHeapSpaceMax <string>  The amount of heap space Butterfly will attempt to use if the initial amount is insufficient.
  Consider raising this value if a job does not complete and exits with an error.
--no_run_chrysalis  Run only Inchworm. This can be useful when dealing with very large jobs that require a large amount of wall time.
--max_reads_per_graph  The maximum number of reads Chrysalis will anchor for any given graph.
--max_reads_per_loop  The maximum number of reads to read into memory at once for Chrysalis.
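
The Butterfly heap rule (initial heap per thread multiplied by the thread count must not exceed the RAM allocated to the job) is easy to check before submitting. The sketch below assumes values given in whole Gbytes with a trailing G, as in the 3G example; heap_fits is a made-up helper name.

```shell
#!/bin/sh
# heap_fits HEAP THREADS JOB_GB
# Succeeds if HEAP (for example 3G) multiplied by THREADS is no larger than
# JOB_GB, the Gbytes of RAM allocated to the job.
heap_fits() {
    heap_gb=${1%G}      # strip the trailing G: "3G" becomes 3
    threads=$2
    job_gb=$3
    [ $((heap_gb * threads)) -le "$job_gb" ]
}

if heap_fits 3G 16 128; then
    echo "3G x 16 threads fits in 128 GB"
fi
if ! heap_fits 10G 16 128; then
    echo "10G x 16 threads does not fit in 128 GB"
fi
```

With 16 threads on a single blade (128 GB), 3G per thread uses 48 GB and fits, while 10G per thread would need 160 GB and does not.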
 


Advanced options and guide for Trinity use: http://trinityrnaseq.sourceforge.net/advanced_trinity_guide.html