[Bio in Docker] Symposium 2015
Eventbrite: http://www.eventbrite.co.uk/e/bio-in-docker-symposium-2015-tickets-16296680811 Twitter: @bioindocker15
Docker is now establishing itself as the de facto solution for containerization across a wide range of domains. The advantages are attractive, from reproducible research to simplified deployment of complex code. This event will bring together some notable cases to discuss how this new technology can best be used within the bioinformatics space.
Days one and two of the event will feature lightning talks from selected speakers.
Day two will additionally include an afternoon mini-hackday to introduce, demonstrate, and invite participation in using Docker on some interesting and well-scoped problems (you are encouraged to make suggestions for the hackday topics).
Throughout the course of the event we would like to identify where common goals exist in the bioinformatics arena and explore how efforts in containerized solutions could be aligned by establishing a community of Docker users and resources (with a similar function to that of Bioconductor for R). This could include:
- Communal repositories
- Documentation and tutorials
- Forums
Join us for the first event dedicated exclusively to discussions on Docker within bioinformatics!
Group Webpage: http://goo.gl/LuNWKa
Flyer: https://goo.gl/ntC7aK
Getting There: http://wellcomecollection.org/visit-us/getting-here
Latest Conference Agenda: https://goo.gl/LpH07s
Attendee List
Aims (what we want to get out of the meeting)
There is a BioInDocker15 team, which you can request access to: https://github.com/orgs/KHP-Informatics/teams/bioindocker15
Please contribute.
- An understanding of current uses of Docker in the Bio/Informatics world
- Index of Docker/VM Bioinformatics projects
- F1000 Channel, Editorial and Slide Share (and perhaps a short paper documenting the presented work?)
- Bioinformatics Container Specification/API (e.g. BioBoxes)
- Adapt/Adopt CWL for multi-component Docker pipelines
- RFC
- Bioinformatics for the masses? (@davidweisss)
- ...
F1000 Channel Launch: Container Virtualization in Informatics
Please join us for the launch of the new F1000 Channel: Container Virtualization in Informatics. Official name TBC.
About this Channel
Technologies such as Docker are now establishing themselves as a lightweight way to package applications together with their dependencies, solving a range of problems from reproducible research to simplified deployment of complex code. This channel highlights literature in F1000Research on uses of containers, published container images, workflows and microservices.
Mini-Hacks, Demos and Tutorials
I have made a start. Please contribute if you'd like to.
These are suggestions we have put together from our poll of attendees, plus some ideas Steve and I had. We have asked one or more of the speakers to run each session, and we'll try to get an idea of who is interested in participating in which session before the end of the first day.
Hackday Resources
- [x] AWS EC2 VMs (available on request)
- [x] Intel Edison boards
- [x] Laptops BYOD (but the venue kindly asks that they are PAT-tested)
- [x] This git repository (but create others as needed and we'll link to them here)
TBA: List of potential ideas with links to code. Clone, commit and push please.
Hackday Topic 1: orchestration and multi-container workflows
(Paolo Di Tommaso, Nebojsa Tijanic, Brad Chapman, Yannick Wurm, Steven Newhouse, Amos Folarin)
Hackday Topic 2: specifications for bio/informatics containers
(Peter Belmann)
Hackday Topic 3: security and using Docker in multi-user environments
(Aanand Prassad? Other volunteers?)
- #15187 User namespaces in experimental branch
- Docker security
- Hypervisor-agnostic Docker Engine
- ...
Hackday Topic 4: bio/informatics Docker users community requirements
(Thomas Ingraham, Michael Markie)
- Docker F1000 Channel for Docker publications
- If separation from DockerHub is needed (I don't think so myself) Portus
Hackday Tutorial 1: introduction to Docker
(Anand Prassad, Kai Davenport)
Hackday Tutorial 2: advanced Docker concepts
(Matt Bates, Matt Barker, Alfonso Acosta, Kai Davenport)
- Docker: Compose, Machine, Swarm, Overlay Networking
- Flocker by ClusterHQ
- Jetstack
- Kubernetes
- Weave
- New features of Docker v1.9, Swarm 1.0, and overlay networking tutorial
Hackday Tutorial 3: introduction to the NextflowWorkbench, its Docker IDE and its bioinformatics features
(Fabien Campagne)
The tutorial will demonstrate the interactive capabilities of the NextflowWorkbench. It will use Docker, but assumes no prior knowledge of Docker. I suggest downloading the software and images before the start of the tutorial [follow these instructions]. If you are able to complete the installation instructions, you will be able to follow the tutorial and learn how to create or run workflows with Docker.
- Mini-intro to the MPS platform (5': plugins, solutions, models, languages and devkits)
- Creating a workflow with NextflowWorkbench (10': We will build a simple workflow to process a set of reads and simply print the filenames. This minimal example will illustrate the structure of workflows and demonstrate auto-completion, automatic type calculation and error detection)
- Using bioinformatics resources in workflows (15': We will continue and modify the workflow to use Kallisto to estimate counts from each reads file. We are able to do this in 15' by reusing (1) automated installation of GobyWeb resources, (2) a Docker image that provides all the software needed to install these resources, and (3) a frozen Docker image that includes the pre-built Kallisto human transcriptome index).
Hackday Tutorial 4: Hands-on introduction to Nextflow
(Paolo Di Tommaso)
The tutorial will give a quick introduction to the Nextflow workflow framework and programming model. You will learn:
- How to install Nextflow.
- Main abstractions (channels, operators and processes).
- Resuming a pipeline execution.
- Using Docker containers.
- Nextflow configuration file.
- Deployment profiles.
- Sharing a workflow.
Prerequisites:
- Unix-like OS (Linux, Mac OS X).
- Java 7 or 8.
- Docker engine (note versions 1.7.x and 1.8.x are affected by this bug).
- It is suggested to download the following images prior to the start of the tutorial: nextflow/rnatoy and nextflow/examples.
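The prerequisites above can be checked before the tutorial with a small script. The following is a minimal sketch, not part of the tutorial material: the helper names are hypothetical, and it assumes that Java 7 and 8 report their versions as `1.7.x` and `1.8.x` and that the Docker version string contains a dotted version number.

```python
import re

# Docker release series noted in the prerequisites as affected by the bug.
AFFECTED_DOCKER_SERIES = {(1, 7), (1, 8)}

# Java 7 and 8 identify themselves as 1.7.x and 1.8.x.
SUPPORTED_JAVA_SERIES = {(1, 7), (1, 8)}

# Images the tutorial suggests downloading in advance.
TUTORIAL_IMAGES = ["nextflow/rnatoy", "nextflow/examples"]


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def docker_is_affected(version_text):
    """True if the reported Docker version is in the 1.7.x or 1.8.x series."""
    version = parse_version(version_text)
    return version is not None and version[:2] in AFFECTED_DOCKER_SERIES


def java_is_supported(version_text):
    """True if the reported Java version is 7 or 8."""
    version = parse_version(version_text)
    return version is not None and version[:2] in SUPPORTED_JAVA_SERIES


def pull_commands(images=TUTORIAL_IMAGES):
    """Build one `docker pull` argument list per suggested image."""
    return [["docker", "pull", image] for image in images]
```

In practice the version strings would come from `docker --version` and `java -version` (the latter writes to stderr), and each list from `pull_commands()` could be passed to `subprocess.run`.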
Other hackday Suggestions (need fleshing out)
- Intel: Hack using Docker on Intel Edison boards (we have some of these to distribute to anyone who wants to participate). Intel is interested in applying this to stream processing of NGS sequence data(?). @elij to elaborate.
- Installing ubilinux on Intel Edison
...
The problem with Informatics pipelines
I have made a start. Please contribute.
- Reproducible?
- Multi-component
- Each component has different dependencies (e.g. Python 2.7 vs. Python 3.0)
- Each component written in a different language (Python, Java, etc.)
- Each component has multiple options for tweaking the input and output
- Each component requires multiple inputs
- Each component produces multiple outputs
- Disk I/O (pipes)
- Large files, e.g. for NGS, tens to hundreds of GB per input/output
- Pipeline components often require access to large shared data and databases (e.g. NGS reference genomes and annotation files)
- Binaries are often not available, so components must be built from source
- Not easy to set up and reproduce for non-informaticians
- Can be hard for informaticians if code is bad, dependency heavy, undocumented or requires very specific versions of operating systems and software, etc.
- Often suffers from: "But it works on my machine??"
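Several of the points above (conflicting dependencies per component, large shared reference data) are what containers are meant to address. The following is a minimal sketch of running each component in its own image with shared data mounted read-only. The image names, commands and paths are hypothetical placeholders; only the `docker run` options `--rm` and `-v` are real.

```python
# Hypothetical component list: image names and commands are illustrative only.
PIPELINE = [
    {"image": "example/trimmer:python2.7",
     "command": ["trim", "/data/reads.fastq"]},
    {"image": "example/aligner:java8",
     "command": ["align", "/data/trimmed.fastq", "/reference/genome.fa"]},
]


def docker_run_command(image, command, data_dir, reference_dir):
    """Build the argument list for running one component in its own container."""
    return [
        "docker", "run", "--rm",
        # Per-run inputs and outputs live in a read-write bind mount.
        "-v", "{}:/data".format(data_dir),
        # Large shared reference data is mounted read-only so steps cannot alter it.
        "-v", "{}:/reference:ro".format(reference_dir),
        image,
    ] + list(command)


def pipeline_commands(pipeline, data_dir, reference_dir):
    """Return one `docker run` argument list per pipeline component, in order."""
    return [
        docker_run_command(step["image"], step["command"], data_dir, reference_dir)
        for step in pipeline
    ]


if __name__ == "__main__":
    for argv in pipeline_commands(PIPELINE, "/srv/project/run1", "/srv/shared/reference"):
        print(" ".join(argv))
```

Because each step carries its own image, a Python 2.7 tool and a Java tool can sit in the same pipeline without sharing an environment. The sketch only builds the commands; a workflow engine would normally handle scheduling and data hand-off.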
Some random thoughts...
Only trusted users should be allowed 1) access to HPC and 2) control of your Docker daemon.
Who is the malicious hacker who will delete all files, take down your system, or make it all public?
Let's make breaches of security a criminal offence.
In the academic (bio)informatics world, most users are basic or clinician scientists with little computing skill.
We, the sys admins and trusted informaticians, should set up the pipelines and provide users with push-button solutions. We should control what gets run and when, so that all a user has to do is log in and select a pipeline to run on their data.
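One way to picture the push-button idea is an allow-list that administrators maintain, so users can only pick from approved pipelines. This is a minimal sketch under that assumption; the pipeline names, image names and the `launch_command` helper are all hypothetical.

```python
# Hypothetical allow-list maintained by administrators: pipeline name -> pinned image.
APPROVED_PIPELINES = {
    "rnaseq-quantification": "example/rnaseq-quantification:1.0",
    "variant-calling": "example/variant-calling:1.0",
}


def launch_command(pipeline_name, data_dir):
    """Return the `docker run` argument list for an approved pipeline.

    Users choose only the pipeline name and their data directory; the image
    is fixed by the administrators' allow-list.
    """
    if pipeline_name not in APPROVED_PIPELINES:
        raise ValueError("pipeline {!r} is not on the approved list".format(pipeline_name))
    image = APPROVED_PIPELINES[pipeline_name]
    return ["docker", "run", "--rm", "-v", "{}:/data".format(data_dir), image]
```

The point of the design is that users never talk to the Docker daemon directly: a trusted service runs the returned command on their behalf.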
Index of Dockered & Related Projects for Informatics
I have made a start. Please contribute your own repositories and knowledge.
- Docker on GitHub
- BioDocker - BioDocker.github.io
- Common Workflow Language
- bioboxes
- Campagne Laboratory
- Björn Grüning - Dockerfiles for Bio- & Cheminformatics & Galaxy
- rocker-org - "rocker": R in Docker
- CloudBioLinux: configure virtual (or real) machines with tools for biological analyses
- bcbio-nextgen - community developed variant calling and RNA-seq analysis
- 2015-09-04-Building-a-secure-multi-tenant-Docker-based-Platform-as-a-Service-Part-1-Design-Considerations
- 2015-09-18-Building-a-secure-multi-tenant-Docker-based-Platform-as-a-Service-Part-2-Implementation
- Reproducibility in Science - Nextflow meets Docker
- Nextflow
- ...
Docker Related stuff and Other Geeky-type Fun
- Bioinformatics for the masses? (@davidweisss)
- Great visual overview of Docker's inner workings
- Docker Cheat Sheet
- Jess Frazelle (jfrazelle) - Dockerfiles
- Mac OS X Dev Setup (jfrazelle)
- Too Much Fun With Docker
- Autocode - open source code generator for every language and framework.
- Best Practices for Scientific Computing
- A Quick Guide to Organizing Computational Biology Projects
- Ten Simple Rules for Reproducible Computational Research
- Docker Ecosystem
- ...
Participating Organizations
I have made a start. Please contribute.
Individual Contributors
I have made a start. Please contribute.
- Stephen J Newhouse stephen.j.newhouse@gmail.com
- Amos Folarin amosfolarin@gmail.com
- Paolo Di Tommaso paolo.ditommaso@gmail.com
- Elijah Charles elijah.charles@intel.com
- Fabien Campagne fac2003@campagnelab.org
- C. Titus Brown t@idyll.org
- ...
Event Sponsors
Pipeline Pics...
GATK : best-practices