
Prerequisites

ICoVeR is designed and implemented as a regular R package. This guide was tested with R v3.2.0 and RStudio v0.98.1102. Newer versions should work; please report any problems on our GitHub page. Furthermore, a modern browser is required. We have had the best experience with Chrome, but recent versions of Firefox work as well.

To run the interactive interface, the OpenCPU R package must be installed, together with Biostrings (for preprocessing) and devtools (for installing ICoVeR). To this end, run the following commands in R or RStudio:

# Bioconductor is required for the preprocessing of the fasta files
> source("http://bioconductor.org/biocLite.R")
> biocLite("Biostrings")

# OpenCPU is required to run the interactive binning tool
> install.packages("opencpu")

# devtools is required to install the interactive binning tool
> install.packages("devtools")
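To confirm that the prerequisites are in place before continuing, a small check like the one below can help. It is a sketch, not part of ICoVeR: the helper name missing_packages() is hypothetical, and it only relies on base R's requireNamespace(), which returns FALSE instead of failing when a package is absent. Note that recent Bioconductor releases no longer provide biocLite(); there, Biostrings is installed with BiocManager::install() instead, as shown in the commented lines.

```r
# Hypothetical helper (not part of ICoVeR): report which of the packages
# used in this guide are not installed yet.
missing_packages <- function(pkgs = c("Biostrings", "opencpu", "devtools")) {
  # requireNamespace() returns FALSE (quietly) when a package is missing
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_packages()

# On recent Bioconductor releases, biocLite() is replaced by BiocManager:
# install.packages("BiocManager")
# BiocManager::install("Biostrings")
```

An empty result (character(0)) means all three packages can be loaded.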

To get started with the guide, download our prepared package or clone the GitHub repository. In the remainder of this guide we assume that the package is extracted in your home directory: ~/ICoVeR/.

Quick start

The easiest way to get started with ICoVeR is to install the package with the preloaded CSTR data set. We generated the CSTR data set ourselves; it consists of multiple samples from an anaerobic digester. This data is already in the repository under R.ICoVeR/data. To install ICoVeR, open R.ICoVeR/ICoVeR.Rproj in RStudio and use the “build and reload” button in the Build tab (top right of the screen). Alternatively, use an R session to install the ICoVeR package:

# This assumes the working directory is ~/ICoVeR (or the actual location on
# your machine where ICoVeR was checked out)
library(devtools)
install_local(file.path(getwd(), "R.ICoVeR"))

After installation you must start OpenCPU and launch the browser:

# Start opencpu and launch ICoVeR in the browser
library(opencpu)
opencpu$stop() # OpenCPU starts on a random port; restart it on a fixed one.
opencpu$start(8000)
browseURL("http://localhost:8000/ocpu/library/ICoVeR/www/", browser = getOption("browser"), encodeIfNeeded = FALSE)

Prepare your own data

To demonstrate how you can prepare your own data sets for ICoVeR, we have included the required input files for the data set published by Wrighton et al. To prepare the data files read by ICoVeR, you have to provide the following files:

  • [REQ] A fasta file with sequences for all the contigs (may be gzip compressed)
  • [REQ] A coverage file with coverage levels of contigs for each of the samples
  • [REQ] An essential single-copy gene (ESCG) file with contig-gene pairs
  • [OPT] A clusterings file with binning results from one or more automated binning tools such as MetaBAT. Although optional, it is highly recommended to start with an automated approach to speed up the verification and refinement process.

Preprocessing

In order to load the data into our interactive contig binning system, we need to preprocess the aforementioned files. This is done with the scripts provided in the R.preprocessing directory. In this example we prepare the Wrighton data set for ICoVeR.

The PrepareDataForInteractiveBinning function performs the following steps:

  1. Extract the GC content and length of each contig from the fasta file
  2. Extract tetranucleotide frequencies for each contig from the fasta file
  3. Read optional binning results from automated methods
  4. Combine the extracted information with the sample abundance levels into the files expected by ICoVeR.
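Steps 1 and 2 can be illustrated with a small base-R sketch. This is not the code in FrequenciesSignatures.R or SymmetrizedSignatures.R; the function names are hypothetical, GC content is computed over unambiguous A/C/G/T bases only, and each k-mer is merged with its reverse complement (a common convention that yields 136 canonical tetranucleotides). ICoVeR's own scripts may make different choices.

```r
# Hypothetical sketch, not ICoVeR's implementation.
# GC content: fraction of G/C among the unambiguous bases (N etc. ignored).
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  acgt <- bases[bases %in% c("A", "C", "G", "T")]
  if (length(acgt) == 0) return(NA_real_)
  sum(acgt %in% c("G", "C")) / length(acgt)
}

reverse_complement <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

# All 4^k DNA words of length k.
all_kmers <- function(k) {
  grid <- expand.grid(rep(list(c("A", "C", "G", "T")), k),
                      stringsAsFactors = FALSE)
  apply(grid, 1, paste, collapse = "")
}

# Relative k-mer frequencies (k = 4: tetranucleotides; k = 5: pentanucleotides),
# with each k-mer merged with its reverse complement.
kmer_freq <- function(seq, k = 4) {
  seq <- toupper(seq)
  kmers <- all_kmers(k)
  counts <- setNames(numeric(length(kmers)), kmers)
  n <- nchar(seq) - k + 1
  if (n >= 1) {
    windows <- substring(seq, 1:n, k:nchar(seq))  # overlapping windows
    windows <- windows[windows %in% kmers]        # drop windows containing N etc.
    tab <- table(windows)
    counts[names(tab)] <- as.numeric(tab)
  }
  # canonical form: lexicographically smaller of k-mer and its reverse complement
  canonical <- vapply(kmers, function(x) min(x, reverse_complement(x)), character(1))
  merged <- tapply(counts, canonical, sum)
  setNames(as.numeric(merged) / max(sum(merged), 1), names(merged))
}
```

For example, gc_content("ACGT") gives 0.5, and kmer_freq("ACGTACGT") returns 136 frequencies that sum to 1. Contig length is simply nchar() of the sequence.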

NOTE 1: The ids in the fasta file must match the names in the abundance file. Thus, if a contig in the fasta file starts with the line >contig_123, the abundance file must contain a row whose name column has the value contig_123. Some error checking is done, but do not rely on it; make sure that the fasta file and the abundance file are prepared properly.
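Because the built-in error checking should not be relied upon, you may want to compare the ids yourself before preprocessing. The sketch below is hypothetical and not part of ICoVeR. It assumes the abundance file is a comma-separated file with a header and a column called name, and that a fasta id is the part of the header line before the first whitespace. Biostrings::readDNAStringSet() reads gzip-compressed fasta files directly.

```r
# Hypothetical helpers (not part of ICoVeR's preprocessing scripts).
compare_ids <- function(fasta.ids, abundance.ids) {
  list(missing.in.abundance = setdiff(fasta.ids, abundance.ids),
       missing.in.fasta     = setdiff(abundance.ids, fasta.ids))
}

# ASSUMPTION: CSV abundance file with a header and a "name" column.
check_contig_ids <- function(file.fasta, file.abundance, name.column = "name") {
  seqs <- Biostrings::readDNAStringSet(file.fasta)  # also reads .gz files
  # keep only the id, i.e. the header text before the first whitespace
  fasta.ids <- sub("\\s.*$", "", names(seqs))
  abundance <- read.csv(file.abundance, stringsAsFactors = FALSE)
  compare_ids(fasta.ids, abundance[[name.column]])
}
```

For example, check_contig_ids("data/wrighton_assembly.fasta.gz", "data/wrighton_avg_cov.csv") returns two empty vectors when every contig appears in both files.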

NOTE 2: Although we got good results with tetranucleotide frequencies, it is also possible to generate pentanucleotide frequencies. To do so, you have to change the PrepareDataForInteractiveBinning script.

Start RStudio and type the following commands:

# Set the working directory (change path to your local checkout location)
> setwd("~/ICoVeR")

# Source the required R files for preprocessing.
# Alternatively, have a look at R.preprocessing/preprocessing.R, from which
# the commands below are taken.
> source("R.preprocessing/FrequenciesSignatures.R")
> source("R.preprocessing/SymmetrizedSignatures.R")
> source("R.preprocessing/ExtractESCG.R")
> source("R.preprocessing/PrepareDataForInteractiveBinning.R")
> PrepareDataForInteractiveBinning(
    dataset.name = "wrighton",
    file.fasta = "data/wrighton_assembly.fasta.gz",
    file.abundance = "data/wrighton_avg_cov.csv",
    file.escg = "data/wrighton_escg.csv",
    file.clusterings = "data/wrighton_clusterings.csv",
    dir.result = "R.ICoVeR/data"
  )
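A quick way to verify that preprocessing produced its output is to test for the three .rda files that the installation step expects (wrighton.rda, wrighton.schema.rda and wrighton.escg.rda). The helper below is a hypothetical sketch using only base R; it assumes the same dir.result as in the call above.

```r
# Hypothetical helper (not part of ICoVeR): list the expected output files
# that PrepareDataForInteractiveBinning() did not produce.
missing_outputs <- function(dataset.name, dir.result = "R.ICoVeR/data") {
  expected <- file.path(dir.result,
                        paste0(dataset.name, c(".rda", ".schema.rda", ".escg.rda")))
  expected[!file.exists(expected)]
}

missing_outputs("wrighton")  # character(0) means all three files are present
```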

Installation

# Check that wrighton.rda, wrighton.schema.rda and wrighton.escg.rda exist in
# R.ICoVeR/data. If so, continue by installing the interactive binning application.
> library(devtools)

# NOTE: Before installing you should check the file R.ICoVeR/R/sqlite.R. At the
#       top there is a variable declared named: p.db.dataset. The value for this
#       variable must match the data set you want to analyze (i.e. "wrighton" in
#       this case).
#
#       If you want to bin a different data set, you must change this value
#       **before** installing the R.ICoVeR package.
> install_local(file.path(getwd(), "R.ICoVeR"))
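If you prefer to change p.db.dataset from R rather than in an editor, a sketch like the following could be run before install_local(). It is hypothetical and not part of ICoVeR, and it ASSUMES that sqlite.R contains exactly one plain double-quoted assignment such as p.db.dataset <- "cstr"; inspect the file first, since any other form is not handled.

```r
# Hypothetical helper (not part of ICoVeR). ASSUMES sqlite.R contains a single
# line of the form: p.db.dataset <- "some_name"  (or with = instead of <-).
set_db_dataset <- function(dataset.name, file = "R.ICoVeR/R/sqlite.R") {
  src <- readLines(file)
  pattern <- '^(\\s*p\\.db\\.dataset\\s*(<-|=)\\s*)"[^"]*"'
  hit <- grepl(pattern, src)
  if (sum(hit) != 1) {
    stop("expected exactly one p.db.dataset assignment in ", file)
  }
  # keep everything before the quoted value, swap in the new data set name
  src[hit] <- sub(pattern, paste0('\\1"', dataset.name, '"'), src[hit])
  writeLines(src, file)
  invisible(src[hit])
}
```

For example, set_db_dataset("wrighton") followed by install_local(file.path(getwd(), "R.ICoVeR")).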

Interactive binning

At this point everything is set up to start the interactive binning process. In order to start the application we need to start OpenCPU:

> library(opencpu)
Initiating OpenCPU server...
Using config: ~/.opencpu.conf
OpenCPU started.
[httpuv] http://localhost:2110/ocpu
OpenCPU single-user server ready.

opencpu$stop() # OpenCPU starts on a random port; restart it on a fixed one.
opencpu$start(8000)

If OpenCPU started without errors, the interactive application can now be accessed. Open it in your browser using:

browseURL("http://localhost:8000/ocpu/library/ICoVeR/www/", browser = getOption("browser"), encodeIfNeeded = FALSE)

Or by entering the URL directly in your browser's address bar:

http://localhost:8000/ocpu/library/ICoVeR/www/

NOTE 1: The port number (i.e. 8000) must match the port reported by OpenCPU in your RStudio session, which can differ every time you start the application. To control it, use the commands listed above: first stop OpenCPU, then restart it on a fixed port.
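To keep the browser address in sync with the port OpenCPU runs on, the URL can be built from the port number instead of being typed by hand. The helper name icover_url() is hypothetical and not part of ICoVeR; it only formats the URL used throughout this guide.

```r
# Hypothetical helper (not part of ICoVeR): build the ICoVeR URL for a given
# OpenCPU port, so the address always matches the port the server uses.
icover_url <- function(port = 8000) {
  sprintf("http://localhost:%d/ocpu/library/ICoVeR/www/", as.integer(port))
}

# Usage after opencpu$start(8000):
# browseURL(icover_url(8000))
```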

NOTE 2: The application stores the most important results of your analysis (i.e. clustering results and tags), so an analysis can be split over several sessions. The initial preparations do not have to be repeated each time you continue working on the same data set. To continue a previously stopped session, start R or RStudio, load the OpenCPU library as shown above, and point your browser to the correct URL.