Lund University

Bioinformatics software

This is a list of bioinformatics software available at LUNARC. Please note that this list is not exhaustive. To see whether a specific package is available and which versions are installed, log in and run 'module spider package-name', e.g. 'module spider BCFtools'.

  • Alfred
    BAM Statistics, Feature Counting and Feature Annotation. Alfred is an efficient and versatile command-line application that computes multi-sample quality control metrics in a read-group aware manner. Alfred supports read counting, feature annotation and haplotype-resolved consensus computation using multiple sequence alignments. Alfred is available as a Bioconda package; load Anaconda3/2018.12 before using it.
  • Amber
    Amber is a package of programs for molecular dynamics simulations of proteins and nucleic acids.
  • AmberTools
    AmberTools consists of several independently developed packages that work well by themselves, and with Amber. The suite can also be used to carry out complete molecular dynamics simulations, with either explicit water or generalized Born solvent models.
  • ANNOVAR
    ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others).
  • AutoDock_Vina
    AutoDock Vina is an open-source program for doing molecular docking.
  • BBMap
    The BBMap short-read aligner and other bioinformatics tools.
  • BCFtools
    BCFtools - Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants.
  • bcl2fastq
    The Illumina sequencing instruments generate per-cycle base call (BCL) files at the end of the sequencing run. A majority of analysis applications use per-read FASTQ files as input for analysis. You can use the bcl2fastq2 Conversion Software v2.19 to convert base call (BCL) files from a sequencing run into FASTQ files.
  • beagle-lib
    beagle-lib is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages.
  • BEAST
    BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability.
  • BEDTools
    Bedtools is a fast, flexible toolset for genome arithmetic.
  • Biopython
    Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics.
  • BLAT
    BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.
  • Bowtie
    Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome.
  • Bowtie2
    Bowtie2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie2 supports gapped, local, and paired-end alignment modes.
  • BWA
    Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.
  • bx-python
    The bx-python project is a Python library and associated set of scripts to allow for rapid implementation of genome scale analyses.
  • Cell Ranger
    Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate gene-cell matrices and perform clustering and gene expression analysis.
  • Cell Ranger ATAC
    Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
  • Chimera
    UCSF Chimera is a highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles.
  • chimerascan
    Chimerascan is a software package that detects gene fusions in paired-end RNA sequencing (RNA-Seq) datasets. Recurrent gene fusions (a.k.a. chimeras) are a prevalent class of mutations that can produce functional transcripts that contribute to cancer progression. Recent advances in high-throughput sequencing technologies have enabled reliable gene fusion discovery.
  • CNVkit
    CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
  • cnvkit-bundle
    CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from targeted DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent. This is a bundle to provide dependencies for cnvkit that aren't available in the standard EasyBuild Python.
  • CNVnator
    CNVnator is a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads.
  • Cufflinks
    Transcript assembly, differential expression, and differential regulation for RNA-Seq.
  • cutadapt
    Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
  • deepTools
    deepTools is a suite of Python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq.
  • EMBOSS
    EMBOSS is 'The European Molecular Biology Open Software Suite'. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.
  • EricScript
    EricScript is a computational framework for the discovery of gene fusions in paired end RNA-seq data.
  • FastQC
    A quality control tool for high throughput sequence data.
  • FASTX-Toolkit
    The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
  • FEELnc
    FEELnc (FlExible Extraction of LncRNAs) is an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames.
  • fineRADstructure
    Powerful model-based approach to investigating population structure using genetic data. It offers especially high resolution in inference of recent shared ancestry. The high resolution of this method derives from utilizing haplotype linkage information and from focusing on the most recent coalescence (common ancestry) among the sampled individuals to derive a "co-ancestry matrix" - a summary of nearest neighbor haplotype relationships in the dataset. Further advantages when compared with other model-based methods (e.g. STRUCTURE and ADMIXTURE) include the ability to deal with a very large number of populations, explore relationships between them, and to quantify ancestry sources in each population.
  • FLASH
    FLASH (Fast Length Adjustment of SHort reads) is a very fast and accurate software tool to merge paired-end reads from next-generation sequencing experiments. FLASH is designed to merge pairs of reads when the original DNA fragments are shorter than twice the length of reads. The resulting longer reads can significantly improve genome assemblies. They can also improve transcriptome assembly when FLASH is used to merge RNA-seq data.
  • FreeBayes
    FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
  • FusionCatcher
    FusionCatcher searches for novel/known fusion genes, translocations, and chimeras in RNA-seq data (paired-end reads from Illumina NGS platforms like Solexa/HiSeq/NextSeq/MiSeq) from diseased samples.
  • GATK
    The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
  • GENESIS
    GENESIS (short for GEneral NEural SImulation System) is a general purpose simulation platform that was developed to support the simulation of neural systems ranging from subcellular components and biochemical reactions to complex models of single neurons, simulations of large networks, and systems-level models.
  • GROMACS
    GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
  • HISAT
    HISAT is a fast and sensitive spliced alignment program for mapping RNA-seq reads. It is recommended that HISAT and TopHat2 users switch to HISAT2.
  • HISAT2
    HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome). HISAT2 is a successor to both HISAT and TopHat2.
  • HOMER
    HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis. It is a collection of command line programs for Unix-style operating systems written in Perl and C++. HOMER was primarily written as a de novo motif discovery algorithm and is well suited for finding 8-20 bp motifs in large scale genomics data. HOMER contains many useful tools for analyzing ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and numerous other types of functional genomics sequencing data sets.
  • HTSeq
    Analysing high-throughput sequencing data with Python.
  • HTSlib
    A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix.
  • IGV
    The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
  • IGVTools
    This package contains command line utilities for preprocessing, computing feature count density (coverage), sorting, and indexing data files.
  • IMPUTE2
    IMPUTE2 is a computer program for phasing observed genotypes and imputing missing genotypes.
  • Jellyfish
    Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
  • kallisto
    kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
  • MACS2
    Model Based Analysis for ChIP-Seq data.
  • MAGeCK
    Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout.
  • manta
    Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow.
  • MAVIS
    MAVIS is a Python (version 3 or later) command-line tool for the post-processing of structural variant calls. On Aurora you'll need to load GCC and OpenMPI (module load GCC/7.3.0-2.30 OpenMPI/3.1.1) and Python 3.7.0 (module load Python/3.7.0).
  • MEME
    The MEME Suite allows the biologist to discover novel motifs in collections of unaligned nucleotide or protein sequences, and to perform a wide variety of other motif-based analyses.
    The MEME Suite supports motif-based analysis of DNA, RNA and protein sequences. It provides motif discovery algorithms using both probabilistic (MEME) and discrete models (DREME), which have complementary strengths. It also allows discovery of motifs with arbitrary insertions and deletions (GLAM2). In addition to motif discovery, the MEME Suite provides tools for scanning sequences for matches to motifs (FIMO, MAST and GLAM2Scan), scanning for clusters of motifs (MCAST), comparing motifs to known motifs (Tomtom), finding preferred spacings between motifs (SpaMo), predicting the biological roles of motifs (GOMo), measuring the positional enrichment of sequences for known motifs (CentriMo), and analyzing ChIP-seq and other large datasets (MEME-ChIP).
  • Molden
    Molden is a package for displaying Molecular Density from the Ab Initio packages GAMESS-UK, GAMESS-US and GAUSSIAN and the Semi-Empirical packages Mopac/Ampac.
  • MuTect
    MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.
  • MultiQC
    Aggregate results from bioinformatics analyses across many samples into a single report. MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general-use tool, perfect for summarising the output from numerous bioinformatics tools.
  • NAMD
    NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.
  • ncbi-vdb
    The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
  • NGS
    NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing.
  • Picard
    Picard is a set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
  • Pindel
    Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
  • PLINK
    PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
  • PLUMED
    PLUMED is an open source library for free energy calculations in molecular systems which works together with some of the most popular molecular dynamics engines. Free energy calculations can be performed as a function of many order parameters with a particular focus on biological problems, using state of the art methods such as metadynamics, umbrella sampling and Jarzynski-equation based steered MD. The software, written in C++, can be easily interfaced with both Fortran and C/C++ codes.
  • Protege
    Ontology editor and framework for building intelligent systems.
  • Pysam
    Pysam is a Python module for reading, manipulating and writing genomic data sets.
  • QCTOOL
    QCTOOL is a command-line utility program for basic quality control of GWAS datasets and other genome-wide data. It supports the same file formats used by the WTCCC studies, as well as a binary file format and the Variant Call Format, and is designed to work seamlessly with SNPTEST and related tools.
  • RasMol
    RasMol is a program for molecular graphics visualisation.
  • ROOT
    ROOT is a modular scientific software toolkit. It provides all the functionalities needed to deal with big data processing, statistical analysis, visualisation and storage.
  • RSEM
    RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface, supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels.
  • RSeQC
    RSeQC provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while RNA-seq specific modules evaluate sequencing saturation, mapped reads distribution, coverage uniformity, strand specificity, transcript level RNA integrity etc.
  • RevBayes
    RevBayes provides an interactive environment for statistical computation in phylogenetics. It is primarily intended for modeling, simulation, and Bayesian inference in evolutionary biology, particularly phylogenetics.
  • samblaster
    samblaster: a tool to mark duplicates and extract discordant and split reads from SAM files.
  • Salmon
    Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data.
  • SAMtools
    SAM Tools provide various utilities for manipulating alignments in the SAM/BAM/CRAM format, including sorting, merging, indexing and generating alignments in a per-position format.
  • SeqAn
    SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data.
  • SeqMonk
    A tool to visualise and analyse high throughput mapped sequence data.
  • seqtk
    Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.
  • Snakemake
    The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human readable, Python based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition.
  • snpEff
    SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes).
  • SNPTEST
    Analysis of single SNP association in genome-wide studies.
  • SplAdder
    SplAdder, short for Splicing Adder, is a toolbox for alternative splicing analysis based on RNA-Seq alignment data.
  • SRA-Toolkit
    The Sequence Read Archive (SRA) Toolkit, and the source-code SRA System Development Kit (SDK), will allow you to programmatically access data housed within SRA and convert it from the SRA format.
  • Stacks
    Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
  • STAR
    STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.
  • STAR-Fusion
    STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set.
  • Strelka2
    Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. The germline caller employs an efficient tiered haplotype model to improve accuracy and provide read-backed phasing, adaptively selecting between assembly and a faster alignment-based haplotyping approach at each variant locus. The germline caller also analyzes input sequencing data using a mixture-model indel error estimation method to improve robustness to indel noise.
  • StringTie
    StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.
  • Subread
    High performance read alignment, quantification and mutation discovery.
    The Subread package comprises a suite of software programs for processing next-gen sequencing read data including:
    • Subread: a general-purpose read aligner which can align both genomic DNA-seq and RNA-seq reads. It can also be used to discover genomic mutations including short indels and structural variants.
    • Subjunc: a read aligner developed for aligning RNA-seq reads and for the detection of exon-exon junctions. Gene fusion events can be detected as well.
    • featureCounts: a software program developed for counting reads to genomic features such as genes, exons, promoters and genomic bins.
    • Sublong: a long-read aligner that is designed based on seed-and-vote.
    • exactSNP: a SNP caller that discovers SNPs by testing signals against local background noise.
    These programs are also implemented in the Bioconductor R package Rsubread.
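
As a rough illustration of what counting reads against genomic features involves (the task featureCounts addresses), here is a minimal Python sketch. It is not the featureCounts algorithm or its interface: the Feature and Read types, the count_reads function, the example coordinates, and the rule of skipping reads that overlap more than one feature are all assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Feature(NamedTuple):
    """A genomic feature with 0-based, half-open coordinates [start, end)."""
    name: str
    chrom: str
    start: int
    end: int


class Read(NamedTuple):
    """An aligned read with 0-based, half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int


def count_reads(reads: Iterable[Read], features: Iterable[Feature]) -> Counter:
    """Count reads per feature; a read counts only if it overlaps exactly one feature.

    Reads that overlap no feature, or more than one, are skipped. This is one
    simple way to handle ambiguity and is an assumption of this sketch.
    """
    features = list(features)
    counts = Counter({f.name: 0 for f in features})
    for read in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = [
            f.name
            for f in features
            if f.chrom == read.chrom and read.start < f.end and f.start < read.end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


# Hypothetical toy data: two overlapping genes and four reads.
features = [
    Feature("geneA", "chr1", 100, 200),
    Feature("geneB", "chr1", 150, 300),
]
reads = [
    Read("chr1", 90, 120),   # overlaps geneA only
    Read("chr1", 160, 190),  # overlaps geneA and geneB: ambiguous, skipped
    Read("chr1", 250, 280),  # overlaps geneB only
    Read("chr2", 100, 130),  # different chromosome: overlaps nothing
]
counts = count_reads(reads, features)
print(dict(counts))
```

The linear scan over all features keeps the sketch short; real tools use indexed lookups to handle millions of reads and also account for strand and multi-mapping reads, which this example ignores.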