Pjotr's rotating BLOG (older stuff)

1. Building Julia
2. GEMMA: LOCO implementation
3. GEMMA: SNP output
4. GEMMA: speed up testing
5. GEMMA: LOCO implemented in GEMMA
6. GEMMA: LOCO revisited
7. GEMMA: LOCO support (tests)
8. GEMMA: first LOCO support (CLI LOCO interface)
9. GEMMA: add LOCO support (assess reduced SNPs)
10. GEMMA: add LOCO support (assess reduced K)
11. Executing LMM on the REST API
12. Pinning
13. And faster_lmm_d
14. Getting fast_lmm_d to run on CUDA
15. Updating GeneNetwork on GNU Guix
16. Cleaning up fasterlmmd
17. Beacon and Phi
18. fast-lmm-d
19. Beacon: getting Python to run with MKL
20. Beacon: what software to use?
21. First steps on the Beacon Intel Phi supercomputer
22. Progress on GN2
23. Updating the deployment of GN2
24. Getting the latest pylmm-gn2 into GN2
25. Updating Genotype data for the BXD and reproducibility
26. A great summer of code
27. REST: fetching phenotypes (2)
28. Adding a multi-phenotype QTL plot
29. REST: fetching phenotypes
30. Everyone on Guix
31. Getting a server running for testing
32. A git merge nightmare
33. Guix distribution of GN2
34. Preparing GN2 for distribution
- 34.1. Removing large files out of the git repository
- 34.2. Create archive of large genotype and mapping files
35. Many packages to get r-wgcna going
36. Fixing GN2 file paths and a working GN2 server
37. Packaging Genenetwork (cont 5)
38. Packaging pylmm and sambamba
39. Expanding the GeneNetwork effort (the team is growing)
40. Packaging gemma, plink
41. Packaging Genenetwork (cont 4)
42. Packaging Slurm
43. Managing power of the cluster
44. Back in NL and hacking GNU Guix
45. Packaging Genenetwork (cont 3)
46. Packaging Genenetwork (cont 2)
47. Packaging Genenetwork (cont)
48. Started packaging Genenetwork
49. Install Guix
50. Install R/lmmlite
51. Running pylmm on penguin2
52. Installing penguin2
53. Installing an Arvados cluster
54. R/qtl control, genotype and phenotype parsers added
55. Teaching R/qtl to pylmm
56. A new page - starting on pylmm again
- 56.1. IN PROGRESS Make sure the server immediately realizes an error
- 56.2. IN PROGRESS Fix Redis not accepting large objects on Penguin
- 56.3. IN PROGRESS Add data human and testing framework
- 56.4. IN PROGRESS Add covariate handler
- 56.5. IN PROGRESS integrate pylmm CUDA with GN2 using runlmm
- 56.6. IN PROGRESS Make sure SNPs are named (because order may differ)
- 56.7. IN PROGRESS Check difference of treatment human data in GN2/pylmm
- 56.8. IN PROGRESS Test human mapping for GN2
- 56.9. IN PROGRESS Adapt pylmm in GN2 to accept input files
- 56.10. DEFERRED Create python-CUDA package for GNU Guix
- 56.11. DEFERRED Handle plink input file
- 56.12. DEFERRED Get OmicABEL to compile and run the TODO list.
- 56.13. DEFERRED Get the GPU version of OmicABEL to compile and run
- 56.14. DEFERRED Create cache for K
- 56.15. DEFERRED Reduce precision of kinship matrix for easier storage
57. Even older BLOG

Tis document describes Pjotr's journey in introducing a speedy LMM resolver for GWAS for GeneNetwork.org.

1 Building Julia

Saunak's team wants to write CUDA kernels from Julia. This requires a from source build for Julia, as discussed here. If we want to use Julia in GN context I'll have to turn it into a Guix package at some point (or a container). So, I dived in this week and it proved a huge challenge. First of all the existing Julia package in GNU Guix is broken. Tests fail related to Openblas. I tried to build Julia in a container - adding dependencies one by one - and this failed spectularly too. Turns out that Julia is very specific about the compilers and dependencies it uses, which is OK, but downloads all dependencies and patches them (including LLVM, ouch). Worst, the whole build system is based on these defaults. You try to change anything, such as using the system openblas libs, and it gets confused.

After a few days of trial and error I fell back to installing the following tools for Julia 0.6.0 (don't use 0.6.2!) after making sure the shell was empty with

sudo apt-get install gcc-4.7 g++-4.7 libgfortran-4.7-dev

env -i /bin/bash --login --noprofile --norc
export PATH=/usr/bin:/bin
xport LIBRARY_PATH=

Check the shell is empty with

set|grep guix

Now unpack

tar xvzf v0.6.0.tar.gz
cd julia-0.6.0

modifying Make.inc to point at the compiler:

HOSTCC := gcc-4.7
CC := $(CROSS_COMPILE)gcc-4.7
CXX := $(CROSS_COMPILE)g++-4.7

compiling with its own version of cmake

./contrib/download_cmake.sh

and kick off the process

make OPENBLAS_TARGET_ARCH=NEHALEM -j

on problems you may have to do

make OPENBLAS_TARGET_ARCH=NEHALEM OPENBLAS_USE_THREAD=0 -j

restart make several times, including a download and build of cmake

Run the tests with

cd test && make

and then I could run

penguin:~/tmp/julia-0.6.0$ ./julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  x86_64-linux-gnu

julia> Pkg.update()
julia> Pkg.add("CuArrays")
julia> Pkg.build("CUDAnative", "LLVM")
julia> using CuArrays
julia> xs = cu(rand(5, 5))
5×5 CuArray{Float32,2}:
 0.779928   0.47756   0.470545  0.334982  0.145695
 0.379416   0.183123  0.488386  0.33913   0.811427
 0.420695   0.465014  0.666999  0.702734  0.494614
 0.0371683  0.149964  0.596714  0.742407  0.896451
 0.422963   0.682159  0.782778  0.011608  0.618023

julia> ys = cu[1, 2, 3]
3-element CuArray{Int64,1}:
 1
 2
 3

julia> xs_cpu = collect(xs)
5×5 Array{Float32,2}:
 0.779928   0.47756   0.470545  0.334982  0.145695
 0.379416   0.183123  0.488386  0.33913   0.811427
 0.420695   0.465014  0.666999  0.702734  0.494614
 0.0371683  0.149964  0.596714  0.742407  0.896451
 0.422963   0.682159  0.782778  0.011608  0.618023

at least we can use Julia with GPU now. But I am not exactly happy with their deployment system. Now if they just used GNU Guix for development…

Another point of note is that Julia installs packages in $HOME/.julia/v0.6/. So, separation by minor version number.

I reported issues on the GNU Guix tracker.

2 GEMMA: LOCO implementation

LOCO is implemented in GEMMA for the BIMBAM format. Pass in the -loco 1 switch for LOCO of chromosome 1.

To achieve this I had to include a test frame work. I also made a number of other changes, notably added

various comments in source code
tests in test framework inlc. fast-check
NDEBUG compilation support in the Makefile
-debug switch for GEMMA debug output
debug.h which includes enforce functions which work like assert. Unlike assert, enforce also works in release compilation
-nind switch limit the number of individuals used (trim_individuals for testing)
enforcing tests of input files - e.g. are number of individuals correct
checks for memory allocation - we should add more of those
more checks for gsl results - we should add more of those
replaced strtoken with regex as a first case. They should all be replaced. strtoken is not thread safe, for one.
introduced C++ iterators
introduced C++ closure in BimBam LMM for cached processing
more localized initialization of variables - makes for demonstratably more correct code
-ksnps adds snps into setKSnps
-gwasnps adds snps into setGWASnps
both sets are computed by -loco
attempted to make the code easier to read

3 GEMMA: SNP output

With LOCO GEMMA segfaulted in output of snpInfo. I added debug support so I could find the line with

gdb --args ../bin/gemma -g ../example/mouse_hs1940.geno.txt.gz -p ../example/mouse_hs1940.pheno.txt -n 1 -loco 1 -a ../example/mouse_hs1940.anno.txt -k ./output/mouse_hs1940_LOCO1.cXX.txt -snps ../example/snps.txt -lmm -nind 400 -debug -o myout
  Thread 1 "gemma" received signal SIGSEGV, Segmentation fault.
  LMM::WriteFiles (this=this@entry=0x7fffffffcd00) at src/lmm.cpp:178
  178                                       sumStat[t].beta<<"\t"<<sumStat[t].se<<

which it did at t=0 (and sumStat.size = 0). An out of bounds error and C++ just crashes in debug mode. Another case for using D instead!

Turns out sumStat was not being filled because it happens in batches and because LOCO is a subset it never got large enough for a batch (out of 1000 SNPs there are 67 relevant SNPs on CHR1 for LOCO). When the batch size is set to 1 it works.

After fixing that I got the GWAS correct.

4 GEMMA: speed up testing

To make testing faster the obvious thing to do is to reduce the number of SNPs and individuals used in test files. SNPs we already reduce by using -snps. I added a -nind switch to set the maximum number of individuals to consider. If set it will trim indicator_idv to size. indicator_idv contains boolean values [0,1] and is set with phenotypes and covariates (NA). When reading the genotypes, however, all genotypes are considered.

To set the K size PARAM::ni_total and ni_test are used where ni_total is the size of indicator_idv and ni_test is the number of values == 1 therein. So the trick is to trim indicator_idv in phenotypes and covariates (when set).

5 GEMMA: LOCO implemented in GEMMA

Yesterday I got to implement LOCO for K. Now I want to limit the number of SNPs used (globally) in GEMMA which makes for writing faster tests too. For this we use the already provided -snps file. Before setKSnps goes in to K computation we need to get the cross section with setSnps. GEMMA already uses indicator_snp which is built up by GEMMA - so that works automatically for K!

For GWAS, however, we need to filter on the chromosome. And now we can't limit setSnps to one chromosome when loco is set.

Essentially there are two modes of operation, using -snps and -ksnps and LOCO - where LOCO can have -snps to subset input SNPs. snps and ksnps overlap if the user wants that for some reason. If they are complimentary the K does not compute(!) When ksnps is set, arguably it should be taken as a given and we should not cross section with snps. At this point, however, we focus on LOCO and indicator_snp should be used.

How quickly things get complicated!

when snps: default all SNPs considered in the GEMMA run (incl. K) when ksnps: default all SNPS used in K but with LOCO use chr with snps only gwasnps: SNPS used in GWAS - use overlap with snps only

In the case of LOCO ksnps and gwasnps are computed and non-overlapping. I am not adding a -gwasnps option now, but it could be used in the future to override -snps for gwas.

Now we don't want to redo the work of indicator_snp, which means that we have to tell K that it should only use the overlap of ksnps and indicator_snp if ksnps is defined, otherwise indicator_snp should be used (by default). In the case of GWA (LMM) it should use the overlap of gwasnps and indicator_snp when gwasnps is defined. Otherwise indicator_snp should be used (by default).

What are the use cases?

User runs vanilla GEMMA: all SNPs are considered input for GWA and K
User passes in -snps: all these SNPs are considered for GWA and K
User passes in -snps and -ksnps: All these SNPs are used for GWA, Ksnps are used for K
User passes in -loco: SNPs are split by chromosome (GWA incl., K excl.)
User passes in -snps, -gwasnps and -ksnps could mean that also GWA is subset explicitely (nyi)

In all cases indicator_snp is honored.

6 GEMMA: LOCO revisited

For each chromosome, Peter's script computes K using R's tcrossprod (G*Gt). Next it creates a geno.txt and map.txt file containing only the markers on the chromosome. Gemma is invoked as

gemma -g geno.txt -a map.txt -p pheno.txt -c covariates.txt -k kinship.txt -notsnp -lmm 2 -lmin 0.01 -lmax 100

where -notsnp means "minor allele frequency cutoff is not used", -lmm2 gives the likelihood ratio test and lmin/lmax give the min/max value for lambda (default 1e-5/1e+5).

In my implementation we will be able to leave out any subset of SNPs (LOCO is just one version handled by the -loco switch). Gemma already has a switch -snps filename which specifies an input snps file name to only analyze a certain set of snps.

So, I created a test that mimics LOCO1 (LOCO chr 1) by passing in -ksnps containing a list of snps for computing K and -snps containing a list for GWAS. This is the most general form. When you pass in -loco chr it creates the same lists from the chromosome annotations file.

-snps is contained in PARAM.file_snps setSnps (a set<string>). Similarly, -ksnps is contained in PARAM.file_ksnps and setKSnps.

For the Kinship calculation GEMMA passes in a vector<int> indicator_snp which essentially is a selector for the SNPs to include. It also has indicator_idv to skip on individuals for GWAS later.

So, for the computation of K, what we need to is to setKSnps right at the point of entry. We'll keep it independent of indicator_snp. readFile\_anno is currently part of BimBam (we may need to handle it for plink too) and files mapRS2chr. So, to filter on a chromosome with LOCO we'll setKSnps from that.

Did I write I honestly dislike C++? Takes too much time to get things done. But it is useful to understand GEMMA. We have more plans with the tool.

7 GEMMA: LOCO support (tests)

[2017-07-26 Wed] Back to hacking on GEMMA. First I had to do some work on the test system and the package deploys with tests on GNU Guix. Tests have gone onto main line GEMMA, so that is progress.

Meanwhile Zach is adding support for bimbam format to GN2 which means we no longer have to support binary plink formats.

8 GEMMA: first LOCO support (CLI LOCO interface)

After reducing the warning output of gcc, I looked at getting LOCO started for the Kinship matrix. First passing in a -loco command line switch, next getting chromosome info from a bim or anno file which contain both SNP/marker names and chromosome name/number. For now I'll fetch the chromosome from the anno file. The first round of implementation is on bimbam formats only, so I am ignoring Plink and others.

The annotation to use is mapRS2chr (read by ReadFile_anno, selected in PARAM.file_anno). Funny thing, in the K examples the anno file is passed in on the command line, but it is never used! Looking at the code I realise we can generalise SNP selection for K and GWAS. In fact GEMMA already has subselection of SNPs with the -snps switch, passing it into PARAM.file_snps and loading the snps into setSnps (a set<string>), though it is not yet used for computation of K. Or maybe it is because Readfile_geno actually uses this set. Oh no, it is not because the bimbamKin does not actually use Readfile_geno. Doh.

setSnps are used in vc.cpp, io.cpp (both to subset geno files) and param.cpp (to load SNPs). Now, with LOCO, we need to exclude SNPs on chr for K and include SNPs on chr for GWAS, so we need to use that list in two different ways. I think it is more generic to be explicit (so we can actually ignore SNPs from geno files for other purposes). I.e. every SNP should be marked as either include in K, include in GWAS or ignore. A SNP can potentially be included in both K and GWAS so we need two states: string snp, bool in_K, bool in_GWAS. And there may be others later. For LOCO in_K and in_GWAS are exclusive (normally), but I am not going to rule out combinations. Main driver here is to generalize SNP selection, so we can go beyond LOCO (which would just test for chromosome name).

So, how to proceed? A snpSet should be a set of SNPs that have a certain state. So we can introduce SNPSTATE which will return snp.in_K and snp.in_GWAS for a SNP. We should also be able to select a subset of SNPs to run in the first place. So we should return bool snp.in_set. That would allow changing and combining the way setSnps is used in the current code base.

For LOCO implementation, the first step is too read the anno file when -loco is defined and create snpSet giving state to all SNPs. Next read the -snp file and reduce the set which is another filter. This will give us the option of using smaller files for unit testing too(!)

Note that there already exists a SNPINFO class which includes chromosome name, MAF etc. We are not going to use that because I don't like overusing definitions. SNPSTATE is not the same as SNPINFO. Maybe I should call SNPSTATE SNPFILTER instead, so it is more clear. Nah, I like SNPSTATE better as a generalization of the state SNPs are in for the algorithm(s) - naming stuff is the hardest part of programming. It actually leads to a simplification because now we can have a set of K SNPS and a set of GWAS SNPS as used in LOCO. So we end up with set<string> after all and setKsnps and setLMMsnps as two sets and setSnps as the overall SNP set. It is just a little space inefficient, but that hardly matters here (both K and G matrices will be smaller in LOCO which offsets these lists).

9 GEMMA: add LOCO support (assess reduced SNPs)

[2017-07-09 Sun] Started looking at the LLM fit in GEMMA. Of course all input formats are treated differently, though with more shared code this time. Looks like it is easy to fit by chromosome and the easiest option would be to rerun GEMMA for each K-1 - i.e., an outer loop. This implies loading the SNP data file for every run. Not necessarily a big issue in Linux because the file will be cached if it fits RAM. It would be nicer to load the SNP file once though and create an inner loop. I'll look into that.

Today I added a simple unit test framework to GEMMA (the same one we use for sambamba). I am not moving until something like that is in place. https://github.com/xiangzhou/GEMMA/pull/54

Added openblas linking to the make file and it is only fractionally faster than standard blas on my laptop:

Openblas:

real    1m18.514s
user    1m21.040s
sys     0m2.716s

Original:

real    1m23.480s
user    1m22.844s
sys     0m0.504s

this is due to GEMMA using cblas and eigen instead of native dgemm calls for these file formats. It may also be the old BLAS routines are still linked in by lapack. There is room for improvement here because openblas can use multiple cores in parallel.

Then I updated the Guix definition of GEMMA and that identified a bug.

10 GEMMA: add LOCO support (assess reduced K)

I am looking into adding LOCO support to GEMMA. GEMMA is split into parts that compute K, perform eigen decomposition and run a GWAS. For LOCO we have the option of running a script and feed data to GEMMA as was done here by Peter Carbonetto who, as it happens, also started work on GEMMA itself (convenient!).

Starting from main.cpp, gemma.cpp contains the help info and the logic for running commands. E.g. to compute K see gemma.cpp line 1400, or so "Calculating Relatedness Matrix …".

With K a GSL matrix is initialized (ni²) and filled by CalcKin (param.cpp) which invokes PlinkKin, bgenKin or BimbamKin and computes K from the genotype file (io.cpp). Assuming we have the chromosome information at hand we can compute the reduced K right here in io.cpp. bim, anno, bgen and geno files contain chromosome information.

Note the max K dimension is set to 10,000. In fact, there are many hard coded sizes we should add assertions for. K is computed with gsl_blas_dgemm. eigenlib_dgemm is being introduced, but code is duplicated for every input format, so either one can be used and the code is not consistent.

The good news is that the C++ code is readable and we should be able to adapt it. Tomorrow I'll look at partial SNP runs for LOCO.

11 Executing LMM on the REST API

The REST server gn_server has been continuously running without fail for After updating Elixir and Erlang to freshly minted versions in Guix I added executing shell jobs to gn_server.

12 Pinning

The next step is pinning data structures in CPU or GPU RAM (i.e. caching) so we get faster results. For the test dataset KveT is used for multiplication 1x for each SNP, XX_i double that, Xt and Yt 4x. KveT is the largest matrix (1024x1024) and computed once, so it makes sense to pin that first - saving approx. 8GB of RAM copying.

13 And faster_lmm_d

We are faster than pylmm by a mile now.

(insert mail)

14 Getting fast_lmm_d to run on CUDA

A first build of the CPU version of faster_lmm_d on CUDA shows it is multicore:

real    0m19.324s
user    0m37.324s
sys     0m57.640s

This is probably openblas's doing. This version uses 3.5GB of RAM for the 8000 test versus 1GB for pylmm. The CUDA version is much slower and used 360GB virtual memory! We need to look into that.

After removing the small matrix calls (you don't want to multiply 30x30 on a CUDA) 8000 came within acceptable range

real    0m25.160s
user    0m27.272s
sys     0m51.564s

The other thing is that the pylmm CUDA implementation is quite a bit faster (note the numpy CPU version is a lot slower):

penguin2:~/pylmm_gn2$ time ~/python2.7/bin/python2.7 ./bin/runlmm.py --pheno data/test8000.pheno --geno data/test8000.geno run
real    0m13.670s
user    0m40.136s
sys     1m22.524s

We should be able to beat that!

But still something odd is going on. First multiple processes are fired up and second massive RAM is consumed. This happens right at the first cudaMalloc. I had to dive into the python CUDA libraries to check how they initialize.

After some tweaking the CPU version was on par with pylmm thanks to openblas being multicore inside:

real    0m12.264s
user    0m29.940s
sys     0m44.160s

Unfortunately, when running CUDA on all matrices larger than 50MB (which does one CUDA multiplication) the initial version runs at 20s - which includes warming up the TESLA. Just linking in the CUDA libs

real    0m19.880s
user    0m28.784s
sys     0m52.124s

Running CUDA on all matrices larger than 1MB:

real    1m1.786s
user    0m49.096s
sys     1m4.672s

and on 55MB (which runs one CUDA multiplication)

real    0m20.620s
user    0m21.720s
sys     0m43.628s

In other words, CUDA is slow for smaller matrices (not really a surprise). Note also that the parallization of openblas is lost. I am not sure why this is, either all parallel gain is in the large matrix computation, or some other way it has been changed.

After some profiling of largish matrices (thanks Prasun!) we found that half the time is spent in duplicating arrays and the other half in CUDA copying. To fix the first, all duplications happen in the DMatrix object on initialization. This is on purpose because not all code is safe and it is easy to overwrite DMatrix.elements with new values. Duplication makes sure the algorithm is correct (at least where it comes to side effects). It is a type of validation that we need to be able to run with copying, rather than overwriting.

So, now we need to remove .dup at compilation.

We have a compile-time validate switch, but what that actually validates the output between GPU and CPU. So, we'll add a FORCE_DUPLICATE switch to enforce duplication when we want it (mostly for testing). At least it dropped the last test to

real    0m17.607s
user    0m19.612s
sys     0m42.832s

Based on the profiling output all time is spent in cudaMemcpy. So, we know what to focus on.

15 Updating GeneNetwork on GNU Guix

GeneNetwork2 installs on a recent GNU Guix. I had to update a few packages when dependency resolution changed. We now have Gemma and CTL latest, for example.

For deployment I created a Docker image of GeneNetwork2 using the new pack command:

env GUIX_PACKAGE_PATH=../guix-bioinformatics ./pre-inst-env guix pack -f docker -S /opt/gnu=/ genenetwork2 --no-grafts

It is great to see how Guix is moving forward.

16 Cleaning up fasterlmmd

This week I have been working with Prasun to clean up fasterlmmd. Mostly renaming stuff and introducing immutable datatypes which brought out a number of issues. The current code base is single threaded and meant to be easy to read. It is already faster than the old pylmm.

I read again Artem's fast-lmm BLOG and Karl's. Next I went again through the current code bases of pylmm, lmmlite (Karl), Lippert and OmicAble and I remember why we are doing this again: they are all lacking in the engineering department. Karl's implementation is the best one, I think. If he had not used R and C++ I would have gone for that. But at least it is short and clear and we can use it as a reference (it was based on pylmm too).

After cleaning up we will implement a CUDA version using cuBLAS which supposedly can handle matrices larger than GPU RAM now (alternatively decompose) and attach that to the REST API. After that we should be able to get it into GN2.

And then on to the Phi! We are ordering a KNL for the purpose.

17 Beacon and Phi

We are ORNL with the JICS team to get some of our tools working on the Beacon supercomputer. Login in to Beacon and jump to a node

qlogin -i

Next login to one of the Phi's following this.

[pjotrp@beacon036 .ssh]$ micssh beacon036-mic0
cat /proc/version
Linux version 2.6.38.8+mpss3.7 (qb_user@sid-bld02.pdx.intel.com) (gcc version 5.1.1 (GCC) ) #1 SMP Fri Mar 25 06:35:55 EDT 2016

Recently a new gcc compiler was installed which can use the Phi vector instructions using

g++ -O2 -mavx512f -mavx512cd -fopt-info-vec-all

and so gcc can compile cross platform according to this.

The other good news is that LLVM can generate C. I am trying llvm-cbe. There are also other attempts such as true LLVM backends.

18 fast-lmm-d

[2017-03-16 Thu] Over the last months we have put in a lot of work to make fast-lmm-d happen, the single core version is faster than the python version, even without real optimizations. We are now using the profiler to see where we can do even better.

And I wrote relocatable GNU Guix support to deploy packages on Beacon. Video of FOSDEM2017 presentation in the HPC room.

19 Beacon: getting Python to run with MKL

pip3 install --user virtualenv
~/.local/bin/virtualenv ~/virtualenv-python-3.4
cd ~/virtualenv-python-3.4
source bin/activate
pip3 install numpy

but now I find numpy is linked against OpenBLAS instead of MKL! Well, that is useful for benchmarking, but not what I wanted to do.

The Intel numpy build instructions are somewhat enlightening though they do not mention Phi as a target. What is interesting is explicit linking against MKL and that it is even possible to use the GNU compilers. The gcc provided is

gcc (GCC) 4.8.2 Copyright (C) 2013

which should be good enough to bootstrap Python and even Guix and friends.

20 Beacon: what software to use?

[2016-10-20 Thu] Beacon support wrote:

`Sorry about the empty module. We did support OpenCL at a time (when intel MPSS is at 3.3), but AFAIK the intel doesn't officially support OpenCL for MIC beyond MPSS 3.3. On Beacon we currently have MPSS 3.7.'

and suggests to use for Python Intel developer's pyMIC library (https://github.com/01org/pyMIC). In this library Python is bound against Intel's MKL library, see for example the BLAS functions.

Of course, the MKL is fast. But (like CUDA) it is propriety closed source software which makes it a no go beyond the short term for me.

For D there is MIR which I would love to try.

By the looks of it, as stated by Beacon support, Intel stopped support for OpenCL on Phi in favour of its own 'Manycore Platform Software Stack' which is nasty because it rules out all FOSS. In fact, Boost, BLAST and other tools on Beacon all want OpenCL support! And Beacon even has the intel-opencl compiler (which won't work without OpenCL drivers). The advantage of OpenCL is that it is easier for programmers to target multiple platforms. With my software it is not too hard to target different backends explicitely.

My conclusion: all non-Intel solutions on Beacon require significant software compilation and dependency resolution (read deployment) at this point. For example to get OpenCL going is non-trivial and GNU Guix would be a real help. Once bootstrapped I could even use D and LLVM. Also we may need to build Erlang and Elixir - something Guix is great at.

The current LMM implementation is in Python with Numpy, already targetting BLAS on CUDA. To get LMM running on Phi there are 3 routes now:

Python targetting Intel's MKL libs (pyMIC or Numpy-mkl)
Python using OpenCL
D generating LLVM code for Phi

Python+MKL (1) appears to be the easiest, provided I get pyMIC or Numpy-mkl to compile. OpenCL (2) would be great to have to be able to roll out to different platforms - a true FOSS solution. D+LLVM (3) is the experimental one with great scope for performance improvements. Both (2) and (3) would need GNU Guix to succeed, in my opinion.

It looks like the pre-installed Python on Beacon already supports MKL to some degree:

config-3.4m/Makefile:CONFIGURE_LDFLAGS= -Wl,-rpath,/sw/cs300_centos6.6/python/3.4.3/centos6.6_intel16.1.056/lib,-rpath,/global/opt/intel/parallel_studio_xe_2016_update1/mkl/lib/intel64

I also find that Python numpy shared libs are linked against MKL

ldd ./site-packages/numpy/linalg/lapack_lite.cpython-34m.so|grep -e mkl
    libmkl_rt.so => /global/opt/intel/parallel_studio_xe_2016_update3/mkl/lib/intel64/libmkl_rt.so (0x00007f4f4a329000)

So, if these libraries know how to farm out to the Phi's we should be good to go. At least, it is worth exploring the default Python installation first with numpy. In parallel I'll try to install Guix packages using Roel's method so at least I'll be able to generate my own compiler environment (we'll probably want this for Elixir support too). Roel's method essentially has one build Guix packages on a build host and copy and run them in $HOME. This means Guix itself can not be run on Beacon, but the packages can, including a more recent gcc. I'll use Penguin2 to targe my $HOME which is /nics/b/home/pjotrp.

21 First steps on the Beacon Intel Phi supercomputer

The first step was getting access again. I had locked myself out somehow… But that was fixed in about 6 hours. I have ssh access again. One of the first things to check is that Beacon is running a 2012 Linux kernel using Red Hat 4.4.7.

Linux version 2.6.32-573.22.1.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org)
  (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) )
  #1 SMP Wed Mar 23 03:35:39 UTC 2016

For OpenCL Beacon have the intel-opencl/1.2-3.0.67279 compiler. Also PLASMA is provided as the plasma and dplasma modules, a higher level linalg library. There is Magma, another Linalg lib. Boost is also available, but lacks OpenCL support (yet - coming in v1.6.1). Then there is a JVM targetting the Phi. Hmmm. Choice, but not (exactly) what I am looking for.

The first step is to see if I can get a Python to work with OpenCL using the intel-opencl compiler. I found there is a module

module add intel-opencl

which loads and sets paths that do not exist, e.g.

LD_LIBRARY_PATH=/sw/cs300_centos6.6/magma/1.2.0-mic/centos6.6_intel16.1.056/lib:/global/opt/intel/opencl-1.2-3.0.67279/lib64

time for an E-mail to support…

22 Progress on GN2

[2016-10-18 Tue] Many good things happening on GN2. New datasets are being added at a rapid rate. We have a new genotype map for the BXD. The REST API gives access to almost all data. A genome browser has been embedded and the list just goes on… The coming period I am going to focus on running a pilot on a super computer named Beacon.

23 Updating the deployment of GN2

[2016-08-16 Tue] Christian has done a lot of work in getting BD ready for GN2. My job is to get it on staging by updating the Guix packages. The last update was from February. Unfortunately Guix was misbehaving. The GNU servers are giving errors and the latest checkout of the Guix tree is also erroneous. One can live with one or the other but not both! I spent quite a few hours and rolled back to a version from a Month ago to start rebuilding GN2. One of the first problems I encountered was

_imagingft.c:68:32: fatal error: freetype2/fterrors.h: No such file or directory
 #include <freetype2/fterrors.h>
   compilation terminated.
   error: command 'gcc' failed with exit status 1
   phase `build' failed after 16.0 seconds
 builder for `/gnu/store/dmb05a6g2wfj8rx8mx3an1mwqrlznpzw-python2-pil-1.1.6.drv' failed with exit code 1

now python-pil I packaged and it is really, really old. According to this python-pillow is the modern package and, in principle a drop-in replacement. It is also python3 compatible: we have to aim for that. So, my first attempt is simply replacing pil with pillow. I'll probably run into a few issues in GN2.

First error I encountered was related to python-pil/pillow:

  File "/home/pjotr/.guix-profile/lib/python2.7/site-packages/Pillow-3.1.1-py2.7-linux-x86_64.egg/PIL/ImageDraw.py", line 91, in setink
   "Please use keyword arguments instead.")
Exception: setink() has been removed. Please use keyword arguments instead.

This error is coming from piddle, another really old package from the nineties - latest version from 2002. We should really get rid of it, but it is used by the GN1-style plot and (apparently) a plink plot.

Anyway, after a long look, I decided it was easier to fix the old python-pil package so it can find fterrors.h again. Isn't deployment fun?!

Apart from that major changes are R running at 3.3.0, and also I also did critical updates of python-numpy, python-scipy and openblas (see bug in OpenBLAS). I'll also upgrade R/qtl and add a new package for the excellent R/qtl2. There are more upgrades, mostly because thanks to the rolling updates of GNU Guix.q

The short and long of it is that GN2 builds again using a recent checkout of GNU Guix. Mind, I also updated plink-ng in the current build. Some of the parallel tests fail. May need to fix that before we roll into production (and add a test in mechanize).

24 Getting the latest pylmm-gn2 into GN2

The current version of pylmm we are using is my multi-core build from May 2015. That is over a year ago and does not include the CUDA features I added later. The main reason for not updating was a bug in OpenBLAS and that the new code has a different data transport requirement. Now we have the new gn_server running I am ready to bring the CUDA stuff into GN2. First, I tested that the multi-core version gives the same results as the earlier one

bergamo:~/genenetwork/pylmm$ env PYTHONPATH=.:$PYTHONPATH python pylmm_gn2/runlmm.py --geno=../test_gn_pylmm/data/test8000.geno --pheno=../test_gn_pylmm/data/test8000.pheno run

[15%] Calculate Kinship: 1.84636998177
[85%] Doing GWAS: 10.251172781

[ 0.898  0.153  0.324 ...,  0.962  0.883  0.962]
8000 4070.02328033

bergamo:~/genenetwork/pylmm_gn2$ env PYTHONPATH=./pylmm_gn2/:$PYTHONPATH python ./bin/runlmm.py --geno=../test_gn_pylmm/data/test8000.geno --pheno=../test_gn_pylmm/data/test8000.pheno run
[8%] Calculate Kinship: 0.989095926285 sec.
[92%] Doing GWAS: 10.708812952 sec.

[ 0.898  0.153  0.324 ...,  0.962  0.883  0.962]
8000 4070.02312498

so, with Karl's previous lmmlite-pylmm comparison, we should be reasonably confident that pylmm_gn2 is doing the right job. The round-off difference is interesting, and due to one extra SNP (I think I remember one SNP was excluded in the original). Actually,

25 Updating Genotype data for the BXD and reproducibility

The REST API serves genotypes. Apparently the version in GN2 was outdated, for one BXD103 was removed/renamed. To ascertain reproducibility we need to introduce rigorous versioning and we need to be able to serve the different versions.

The genotypes exist as files. So can easily calculate an MD5 checksum and pass that in as metadata. We will also name the files using the partial checksum and publishing data, so BXD.csv becomes, for example, BXD\_20150722\_83771d79.csv. Passing this to the user (in the form of meta data) will allow the to uniquely identify the dataset for the purpose of reproducibility. These values are captured in the downloadable metadata BXD.json (also a file). The YAML version:

---
description: BXD
crosstype: riself
geno: genotypes/BXD/geno.csv
geno_transposed: true
metadata:
  original:
    source: GeneNetwork
    unique_id: 42171462281377824604ec3d83771d79
    date: '20150722'
  geno:
    unique_id: 6a766888cf7a5b5b9376ee165b4518ab
    date: '20150722'
  gmap:
    unique_id: dfbafbe862fb3572d3e847b7b7859540
    date: '20150722'
  genotypes_descr:
    '1': maternal
    '2': paternal
    '3': heterozygous
genotypes:
  B: 1
  D: 2
  H: 3
x_chr: X
na.strings:
- U
gmap: genotypes/BXD/gmap.csv

Phenotypes, meanwhile, are stored in a 'rolling' SQL database. This raises challenges for reproducibility. Some people have the opinion that it does not matter (the latest is the so-called greatest - an opinion also often voiced on the semantic web), i.e., they argue the researcher should archive the base data that was used for the analysis. That is possible, but I get uncomfortable when data changes just like that.

What we want is some way to signal a change to a dataset.

My proposal is to send a HASH value and a date with the phenotypes in the form of meta data. This will tell the researcher that a dataset has changed - if not how (if an archive exists a comparison should be possible). To achieve this we will add columns to the dataset description table and use and external tool to regularly test the datasets for changes. When a change happens and the HASH value has not been updated an alarm needs to be raised.

26 A great summer of code

[2016-08-04 Thu] Time to update this feed. The last two months saw plenty of activity. We have a publication in JOSS. Zach has introduced sessions into GN2. We have a scalable REST server running and we are adding functionality as we go. And Christian is adding the biodalliance genome browser to GN2, now with genotype and QTL tracks.

In the next phase we'll be adding improved LMM support, more tracks and software running in heterogeneous environments.

27 REST: fetching phenotypes (2)

[2016-05-24 Tue] Lots of activity on the #genenetwork IRC channel now the Google Summer of code started. Also distracting, of course, but I think good things come out of it. I am reverse engineering the GN database to provide the REST services.

28 Adding a multi-phenotype QTL plot

[2016-05-23 Mon] Harm has modifed Karl's QTL plot so it can show multiple QTL plots in one figure. I am looking at embedding this form into GN. Basically the user selects an experiment and a gene (presumably as an expression phenotype) and has it looks for all genes that correlate. At this point I am not so interested in the logic of correlations, but I am interested in the plot itself which could plot multiple phenotypes in one GN collection (read bag of phenotypes). I like also the way that it adjusts the chromosome sizes.

Looking at the plot's source code this should be possible.

Interestingly Karl has also a mutliplot in chart 3 but written in clean coffeescript. I am severely tempted to use that one. Harm agreed to look into it and maybe clean up the code. It would be better to make it so that Karl can reuse it.

29 REST: fetching phenotypes

[2016-05-18 Wed] We are adding a new REST service to GN. Main reason is to provide data to people using R, Python, Ruby etc. Introducing the REST server is also an opportunity for splitting functionality out of the main python webserver. One thing we want to avoid is long running jobs in python as it blocks on those. The third reason for a REST interface is to provide WEB 2.0 support to the genome browser and other UI tools. I started writing a maru based REST server.

30 Everyone on Guix

[2016-05-15 Sun] We have a Guix powered staging server now running on http://test-gn2.genenetwork.org/. Others have been installing and we will all be byte-identical and reproducible soon.

Now the main challenge is to get Zach on my version. First step is to update Guix on penguin followed by the guix genenetwork2 install. This pulls in all packages except for ctl.

Dennis updated the r-preprocesscore package - used for something (it is unbelievable what gets pulled in by all these tools) - so that triggers a rebuild of some of the dependencies:

/gnu/store/d6a1hpw92nnljhhqlp2dzvsfffi1hybm-r-wgcna-1.49-425bc170cc0873ddbd414675ac40f6d4d724c7cb.drv
/gnu/store/3l1302hcp4nnp8h033a0mpnaq2mlj23c-r-biocpreprocesscore-1.34.0.drv
/gnu/store/7nabs6k5z6z2r3l3ik7wd14q3knkmicg-genenetwork2-2.0-a8fcff4.drv

you can see r-wgcna pulls in this dependency.

For Zach I set up the new repositories: genenetwork2 and all dependencies (except Danny's CTL) are now on penguin, so we can start synchronizing our efforts using GNU Guix. To use guix add ~/.guix-profile/bin to the PATH. Install system-wide guix packages, e.g.

guix package -i hello

For deployment we use a slightly modified setup, involving two git repositories as described here. Now

cd genenetwork/guix-gn-latest
env GUIX_PACKAGE_PATH=../guix-bioinformatics/ ./pre-inst-env guix package -i genenetwork2 --substitute-urls=http://guix.genenetwork.org:8080

to get the latest update of genenetwork2 dependencies.

Guix suggests the following paths

export PYTHONPATH="/home/user/.guix-profile/lib/python2.7/site-packages"
export R_LIBS_SITE="/home/user/.guix-profile/site-library/"
export GUIX_GTK3_PATH="/home/user/.guix-profile/lib/gtk-3.0"
export GI_TYPELIB_PATH="/home/user/.guix-profile/lib/girepository-1.0"
export XDG_DATA_DIRS="/home/user/.guix-profile/share"
export GIO_EXTRA_MODULES="/home/user/.guix-profile/lib/gio/modules"

I have not added these to ~/.bashrc so as not to confuse the older GN2 setup.

So to use the latest GN staging tree I have checked out

git clone git@github.com:genenetwork/genenetwork2.git gn2
git checkout -b staging
git pull origin staging
git checkout -b dev

Rather than using the secure-server script (which is not secure at all, btw), us the newer ./bin/genenetwork2 script. So

cd ~/genenetwork/gn2
./bin/genenetwork2

try and run that. It may complain it can not find the module yaml, or something. That is because you need guix paths here. Get them with

~/.guix-profile/bin/guix package --search-paths
export PATH="/home/user/.guix-profile/bin:/home/user/.guix-profile/sbin"
export PYTHONPATH="/home/user/.guix-profile/lib/python2.7/site-packages"
export R_LIBS_SITE="/home/user/.guix-profile/site-library/"
export GUIX_GTK3_PATH="/home/user/.guix-profile/lib/gtk-3.0"
export GI_TYPELIB_PATH="/home/user/.guix-profile/lib/girepository-1.0"
export XDG_DATA_DIRS="/home/user/.guix-profile/share"
export GIO_EXTRA_MODULES="/home/user/.guix-profile/lib/gio/modules"

copy and paste these into the terminal! Note you should not copy the path as is, but add /usr/bin and /bin, e.g.,

export PATH="/home/user/.guix-profile/bin:/home/user/.guix-profile/sbin":/usr/bin:/bin

otherwise you lose all commands. Start the server up with

user@penguin:~/genenetwork/gn2$ env TEMPDIR=~/tmp ./bin/genenetwork2 ~/default_settings.py

You can see we set the TEMPDIR now and we use a settings file (take a look inside ~/default_settings.py) that is similar to the old one.

The command started up the new server on port 5003. Your original server is running on port 5002, so you may want to change that setting or ask Lei to make 5003 visible from outside.

See if you can get it running.

Note that I have only added directories and files to your HOMEDIR. In particular ~/.guix-profile/ ~/genenetwork, ~/opt and ~/default_settings.py. Nothing else was changed or overwritten.

Your old system and setup should still work fine - there is no interference.

31 Getting a server running for testing

[2016-04-28 Thu] Testing is crucial. I am putting in a test framework for GN2 - I named it 'Mechanical Rob' :).

A few years back Zach had put in a few Selenium tests and we will continue to use Selenium to generate tests (it can generate code using a browser plugin - I tried that and it is helpful because turning clicks into code is pretty tedious). Unlike the previous edition by Zach, the new tests, will be converted to run without Selenium itself. I want to be able to run the tests on the server without a browser (any time).

Dennis is interested in helping us to get full test coverage. I am not sure how much he will do, but we have a good start. My aim is to cover all functionality in GN2 with these tests.

I am adding dead-link detection in the testing framework. This is how I find that GeneRIF link on the main page no longer works. I replaced it with the wikipedia reference which, funnily, also contains the broken link. Go, go NCBI! There are more dead links, however…

Next we need a server to test against. I got one started using the merged trees on Penguin2 (currently running on http://penguin2:5003/ for those with VPN access.

This server should contain the latest changes by Zach, Danny and myself. WGCNA is broken (it is a driver problem on Penguin2) as well as the CTL functionality (I need to bring in Danny's CTL code). But otherwise the thing should run. I'll fix these two problems. Please notify me, or raise an issue, if you see any unexpected differences between Zach's and my server.

One of the most intrusive changes is that the large files have moved to a different directory and temporary files are now generated in /tmp, rather than in the source tree (that is a no go!). This may explain a missing figure or a failing export function. Again, do notify me if you miss something that ought to be there. One figure that is missing is the bar plot for interval mapping. I'll fix that too.

To get the server up and running (on top of Guix) I expanded the README documentation.

32 A git merge nightmare

[2016-04-21 Thu] Working on three branches (Zach, Danny and myself) we needed to merge our work. This proved less easy than normal, mostly because we were working on essentially different repositories (after I diverged the diet version described below). I tried merging, cherry-picking and hand patching - it all proved too difficult because we had been intrusively removing and renaming on both ends!

Since Zach's tree is still the authoritative one I decided to drop the diet for now and bring in Danny and my changes by file (kind of a rebase). Again, non-trivial as we all three worked on the same files. The 'git am' command came in handily - another GNU Guix trick I learnt. It was the most tedious job. I guess I learnt another lesson here (even though we could merge, better don't diverge git source trees). For a little while we'll use the sumo tree again.

The files that had the majority of conflicts were:

wqflask/base/webqtlConfig.py
wqflask/base/data_set.py
wqflask/wqflask/marker_regression/marker_regression.py
wqflask/wqflask/marker_regression/marker_regression_gn1.py
wqflask/wqflask/marker_regression/gemma_mapping.py

I had to remove the first from .gitignore as it is now supposed to be generic.

There are a few things to fix. R/qtl output is broken (seen that one before) and CTL lacks the binary dependency. Also I need to tick off gemma mapping and Zach's work.

33 Guix distribution of GN2

I spent significant time trying to get 'guix archive' to work, but unfortunately there is a problem with R packages. So I am working around that for Danny.

There are two options: (1) he builds himself and (2) I create multiple archives. The first option is a no go - I think - because there are some problems with the latest guix versions. Multiple archives will work, but I need to recreate the symlink structure. I ought to try Docker now (grumble) as it offers a unified environment, something Danny can use.

Funnily (well not so funny) the guix builds are working again, so Danny is doing that on his machine following these instructions

34 Preparing GN2 for distribution

34.1 Removing large files out of the git repository

In my git repository I have removed the genotype data files (they are now outside the source tree). This trimmed it down from 350Mb to 30MB). We also need to rewrite git history to remove the large files from the git database.

git filter-branch --tree-filter 'rm -rf genotype_files' HEAD

Next I found there was another copy of the directory in git history web/genotypes using

git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | tail -5
git rev-list --objects --all | grep  9fc0c77c360f8a0b1f7bc8a696a99b5dff340607
  web/genotypes/HSNIH.geno.gz

git filter-branch --index-filter 'git rm -rf web/genotypes/HSNIH.geno.gz'

and purge (after editing .git/packed-refs) them (see below). Other dirs that had to be mucked out were web/gemma, web/dbdoc, wqflask/output, web/new_genotypes. For each:

git filter-branch -f --index-filter 'git rm -rf --cached --ignore-unmatch web/ wqflask/output webtests/chromedriver '

After finishing the clean up the packed-refs file (remove foreign references), see if git fsck notices the unreachable objects with

rm -rf .git/refs/original/
git reflog expire --all --expire-unreachable=0
git fsck --full --unreachable

if you have a list then you can do:

#+begin_src sh git repack -A -d git prune #+end_src sh

And while I was at it I also removed webtests/chromedriver.

Now this can only work when we repack the repo and people clone afresh. We'll also need to remove some of the JS dirs.

Anyway, the diet version of the genenetwork2 repository has come down to 25Mb (from 350Mb).

34.2 Create archive of large genotype and mapping files

The large data directory of 3.8Gb of data was archived with

tar cvJf gn2_data.tar.xz gn2_data/

Next the pfff checksum was run to rename the file and upload it to the files repository. Actually I switched to lz4 because compressing these large files took forever with xz.

The small database only contains the files for the BXD. When testing BXD.geno, BXD.cross and BXD.json were the only ones used.

The latest files can be found here.

35 Many packages to get r-wgcna going

[2016-02-29 Mon] I wrote the R-WGCNA package with dependencies and had to write packages for, r-biocpreprocesscore r-wgcna, r-acepack, r-latticeextra, r-formula, r-hmisc, r-doparallel, r-iterators, r-foreach, r-fastcluster, r-dynamictreecut, and r-rcppeigen.

Now GN2 runs completely from GNU Guix, the beast named GN2 has been tamed! There will probably be more hickups and I still need to move the javascript libraries to Guix, but at least we can move forward from here.

36 Fixing GN2 file paths and a working GN2 server

Lots of file paths in GN2 are hard coded. To make the beast work and move genotype files out of the git repo (reducing that from 350Mb to a more manageable 30Mb).

I rejigged the (hard-coded) paths. A hickup was that GN2 generates images and puts them in a folder inside the source repository. Scary! Flask supports 'static' serving from subdirectory with the same name. We don't want that, but we can rewire the path which is what I do in this commit. Now, not only, do we avoid uploading images in the source tree (one problem is that Guix won't allow it because it is immutable) but also we have everything in temporary directory which is easier to monitor.

I spend much of my time this week getting paths and dependencies sorted. One ball-breaker was the piddle version I found on penguin. Despite its version number piddle-1.0.15-py2.7 it turned out to be different from the online version. Good grief, I am happy we are running Guix now!

Anyway, as it stands, GN2 runs fine on Guix which means it will run anywhere. pylmm, r-qtl and interval mapping work out of the box. I still need to work on the plink and gemma wiring as much of it is hard coded. We ought to move that out to the REST server anyway, so that is next.

But at least we have something we can roll out.

37 Packaging Genenetwork (cont 5)

[2016-02-22 Mon] Now that pylmm is packaged the final GN2 dependency is the database. MySQL allows for read-only databases these days, so we can create a nice testing/development database that is embedded in a GNU Guix package. To idea is to do a

guix package -i genenetwork2-database

which will install GN2 with the database and GN2 will fire up with one single command

genenetwork2

against which we can run the test system and even do some integration testing (as long as it does not want to write data).

As we are probably going to split up private (mutable) and public (immutable) data in the future I think this is a brilliant way of deploying public datasets for testing. Should work right out of the box. Multiple mysql servers can even run on one single database, for example.

The first step is to fetch a database from somewhere. We now have http://files.genenetwork.org/raw_database/ for that purpose. Even the small database is still too large (1.5GB), so we may want to trim it down.

Then there are the flat files which we are going to move out of the various directories they live in today. There are some .map and .bed files living in a plink_gemma dir, for example, others are in the source code git repo (on github). I'll consolidate these on http://files.genenetwork.org/.

38 Packaging pylmm and sambamba

[2016-02-20 Sat] The current GN is running an old version of pylmm, git@github.com:genenetwork/pylmm.git commit b8a15885ed3701e079170d6a8bf69bb8d8349f9c (Importing multi-core version of pylmm for GN2). So that got pylmm-multicore packaged in guix-bioinformatics thanks to Nick allowing for a proper FOSS license.

In GN2 pylmm is called in two places, so I needed to fix those:

grep -r lmm.py *
utility/tools.py:            pylmm_command = 'python '+path+'/pylmm_gn2/lmm.py'
wqflask/heatmap/heatmap.py:  command = 'python /home/user/gene/wqflask/wqflask/my_pylmm/pyLMM/lmm.py --key {} --species {}'.format(key,

I added a wrapper for pylmm and adapted Guix and GN2 to call into the wrapper named pylmm_redis (this version still uses Redis) so it says

INFO: lmm try loading module
WARNING: LMM standalone version missing the Genenetwork2 environment

which means it can be run standalone.

[2016-02-21 Sun] Roel started work on packaging sambamba, a fast performing NGS tool. To make it work I wrote a new sambamba makefile for Guix which builds fine in my environment. Over to him again.

39 Expanding the GeneNetwork effort (the team is growing)

[2016-02-15 Mon] The story continues. Good news is that Roel and Dennis are joining this week in the effort. And Harm.

Harm is working on a UI using Karl's D3 materials named the 'Arabidopsis eQTL analysis platform' or AraQTL, the Django source code (an MVC) is here. Dennis started work on GPU related stuff and Roel is packaging general bioinformatics software in GNU Guix.

To make it easier to collaborate we have created an IRC channel on freenode. Go to irc.freenode.net and join with

/join #genenetwork

Join on emacs with

#+begin_src scheme (setq erc-autojoin-channels-alist '(("freenode.net" "#genenetwork"))) (erc :server "irc.freenode.net" :port 6667 :nick "name" :password "pwd") #+end_src scheme

Of course it is blocked on the campus networks so people will need some other route.

40 Packaging gemma, plink

[2016-02-14 Sun] GN uses some external tools for computations. First I fixed and packaged the latest versions of gemma (was 0.94, 01/12/2014, now 0.95-alpha or 0.9.5-2de4bfab3) and plink (was PLINK v1.90b3s 64-bit (17 Jun 2015), now PLINK v1.90p 64-bit (12 Feb 2016)) so they faithfully show up.

41 Packaging Genenetwork (cont 4)

[2016-02-13 Sat] The qtlreaper I packaged a few weeks back. Today was htmlgen's turn (a piece of software dating 1998!). The version that comes with GN is slightly different from the standard version. In the Guix repo the python files are now correctly compiled and I only included the .pyc files we are using. Guix just makes me happy.

So, the good news is that all the GN2 code bootstraps using the new ./bin/genenetwork2 from Guix up to

Can't connect to local MySQL server through socket '/run/mysqld/mysqld.sock'

This error is because I am not yet running mysql. I am getting close, really close.

Lei has opened a http accessible storage for large files. This means large static files can be removed from the git repos and can be split out in GNU Guix deployment. This includes genotype files, source packages and even test databases. GNU Guix checksums these files, so we can make sure they do not change (just like that). I do think we should add the date and source to the file name, so it is clear where it came from. E.g., these filenames are not so descriptive and they are duplicates

d4364afdcce9b6071efb4016d585c288  genotype_files/genotypes/BXD300.geno.2brmv
d4364afdcce9b6071efb4016d585c288  genotype_files/genotypes/BXD300.geno.7636

42 Packaging Slurm

[2016-02-12 Fri] Today I packaged slurm, a resource manager for clusters, and submitted the patch to GNU Guix for inclusion. The first step towards using cluster effectively. It was a bit harder than I expected, mostly because non-free software had to be removed and the build system adapted.

43 Managing power of the cluster

[2016-02-11 Thu] Yesterday I got more of Victor's old boxes and it looks like some are in bad shape and some can be rescued. I have one node screaming behind me now and it has PXE, so we can boot over the network. One thing I want to do is easy provisioning of new images to make this a true throwaway GNU Guix cluster. A description of a simple PXE server can be found here, required is DHCP plus tftp. When it gets serious we should implement that. I also looked into wake-on-lan (WOL) which should also work on these machines. It is good we have these capabilities now because a running node pulls 200W. The powersupply for two nodes idling pulls 30W. I think we also need a master switch for ever 8 nodes (240W sleeping 1600W awake!). I am going to shut down 6 of the 8 existing nodes today. Waking them up is as easy as

wakeonlan hwaddr

To list the hwaddr (where machines.txt contains the output of ifconfig)

cat machines.txt |grep HWa|awk '{print $5}'

With the nodes sleeping they still pull 15W each. So that is 8x15W = 120W or about 2 KWh per day ~ EUR 0.40 p.d. or EUR 12 per month.

When the nodes double that will be EUR 24 per month (sleeping, running is 10x as much! You can see the point of cloud computing).

I still think I need we need a 220V master switch which can be managed remotely. So all cost drops to zero when the cluster is not in use.

Next time I am with the machines I need to configure the BIOS to accept PXE. After that we can roll out the provisioning.

On the hacker space mailing list the following question came up with my answer

Q: Would it be an idea to transform (some of) the nodes into a more generalized tinkering platform - for instance I heard someone mention botnets recently, and it would be a shame to rig a separate server for that if one could just spool a VM to a cluster node. Building a "general network tinkering and test deployment platform" might be a nice project in itself.

A: The cluster is exactly meant for experiments and I am happy to share.

The concept is a throwaway setup. That means you should write your installation in such a way that it can be regenerated every time. This is done through GNU Guix scripts which allow for VMs, containers and PXE installs on the nodes.

To run experiments needs some co-ordination with me and perhaps others. Every software deployment and configuration should be captured in Guix.

The idea is NOT to have ad hoc installation of software on the main node (though you are free to mess within your VM) and also NOT to have stuff running full-time.

Think of it as proof-of-concepts which can run for a while. I'll be happy to facilitate with 8x nodes with 8 cores and soon another 8x8.

44 Back in NL and hacking GNU Guix

[2016-02-08 Mon] Returned a week ago and most time went into FOSDEM submitting a grant proposal. My FOSDEM talk can be found here - with slides and video.

This week ramping up GeneNetwork packaging again (after building and testing); So GN is on latest Guix. Also pushing the munge and slurm packages to Guix as well as updating the documentation including impressive graph (I'll add one for GN when it is complete).

I am purging LFS (because it is github propriety) from the GN2 git repo. I.e.

[master b8bbca6] Removing LFS
delete mode 100644 genotype_files/genotypes/.gitattributes
delete mode 100644 genotype_files/new_genotypes/.gitattributes

which also required a manual

rm .git/hooks/pre-push
rm -rf .git/lfs/

Ugly, but fixed.

The rest of the day was spent on a regression of python2-sqlalchemy (grrr). Turned out python2-sqlalchemy does not work with a more recent version of sqlite, so I introduced an older version again. It is amazing how many tools use sqlite. Anyway, the good news is that the build system is moving incrementally forward.

45 Packaging Genenetwork (cont 3)

[2016-01-23 Sat] The GNU Guix GN2 installer is pretty much finished though I don't have the bandwidth to test the project. Next up is the GN1 installer. I dug up an E-mail with instructions from Lei which points out Python 2.4.3 may be required as well as a bunch of modules that are in a 3rd party tar ball: graphviz-2.22.2, htmlgen, json, numarray-1.5.2, piddle, PIL (in guix), pp-1.5.7, pyx, pyXLWriter, svg. I.e., a few more modules to package. At least all these with source, because even a cursory look shows these modules are old and HTMLgen, for example, has no download location any longer. Since we are not going to write software for GN1, I will just use the tar ball with an installer script. The json, pp and numarray modules may be worth adding to GNU Guix. Anyway, it looks all doable now. Today I packaged python2-numarray as a GNU Guix package.

46 Packaging Genenetwork (cont 2)

[2016-01-22 Fri] The latest GNU Guix installer copies the full webserver code into the store

/gnu/store/bfac8x(...)-genenetwork2-2.0-9e9475053/lib/python2.7/site-packages/genenetwork2-2.0-py2.7.egg/wqflask/

With that we have achieved clean versioning as a first step. Now I need to make the dependencies available for the webserver which is somewhat tedious. But once it is done it should last for a long time. Luckily a lot of the packages already exist in GNU Guix - sometimes it pays to wait in informatics ;). I think most of the dependencies are in the package now. Looks like I need to package another 4-5 myself. Hopefully they don't depend on too many modules themselves (r-wcgna has a long list I still need to do! The new bioconductor export module of GNU Guix will be a great help).

Today I added a GNU Guix package for qtlreaper. But looking into the GN2 source code I don't think it is actually used. Likewise a bunch of other modules.

47 Packaging Genenetwork (cont)

Today I managed to complete the python installer for GN2. This has led to a request for handling genotype data outside the tree and looking at contained packages not written by us. I also reported a bug in GNU Guix related to fetching repositories from git and updated R-qtl to the latest version in GNU Guix. I have started adding and testing dependencies following Artems Docker recipe with included requirements for the Python installation. Be interesting what kind of graph comes out!

48 Started packaging Genenetwork

Both Harm Nijveen and I are working on GeneNetwork installation. First hickup is that the git repo for GN1 is not complete and the tar-ball is 33GB. Meanwhile I am porting GN2 to GNU Guix. The first step was to create a python installer because the .py files need to be compiled to execute in a read-only store. The net effect is a GNU Guix install in

~/.guix-profile/lib/python2.7/site-packages/genenetwork2-2.0a0-py2.7.egg/wqflask/

At least the python stuff looks complete, but the html and other files are still missing.

49 Install Guix

To solve the R/lmmlite install and am opting for a GNU Guix package. We are setting up an Arvados cluster based on GNU Guix and agreed that all installation should be through Guix. Progress is recorded here. I needed to do the base install anyway so Roel can continue later. Today I got to the point I can experiment with R/lmmlite.

50 Install R/lmmlite

Working from Zanzibar the time has come to align my version of pylmm with that of R/lmmlite by Prof. Karl Broman (lmmlite is an R/C++ hybrid implementation of Nick's pylmm). It ought to be interesting to compare output and performance of both tools and see how we can move forward hosting one or more of these tools into GN. Speed and memory usage are of interest when hosting such services.

For R/lmmlite the R libraries need to be installed and pulling over 100Mb for the R devtools and dependencies over narrow band is no joy. I am on narrow band and the 'lite' part obviously does not reflect the implicit dependencies. Karl sent me an update that I should use 'R CMD', but that in turn requires knitr etc. This is stuff I can only do at night, so as not to disrupt other users.

51 Running pylmm on penguin2

To build scipy I had to add

apt-get install gfortran libopenblas-dev liblapack-dev
pip install simplejson mako scikit-cuda

Running the Kinship calculation on the K80 it is 3x faster than the old K20:

env LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64 python ./bin/runlmm.py --pheno ../test_gn_pylmm/data/test8000.pheno --geno ../test_gn_pylmm/data/test8000.geno run

Timing report K20

[23%] Calculate Kinship: 1.86254692078
[77%] Doing GWAS: 6.14982700348

[ 0.898  0.153  0.324 ...,  0.962  0.883  0.962]

Timing report K80

[11%] Calculate Kinship: 0.609519958496 sec.
[89%] Doing GWAS: 4.69884514809 sec.

[ 0.898  0.153  0.324 ...,  0.962  0.883  0.962]

Next to the CUDA stuff I also did some work on the deploy tool. It does some basic file copying now from YAML descriptions, the most important part for deployment. I can use it now for configuring emacs, vim, sshd etc. Simple file editing is the next step.

52 Installing penguin2

The new production GN server (Penguin2) is sweet. 56 2Ghz Intel Xeon cores, 256Gb RAM and an NVIDIA K80m on board. Ready to R&R! At the moment it is not running anything - just a barebones Ubuntu 14.04 with Linux 3.19 kernel. I'll add CUDA (now where did I document that again? Aha, it is in my old BLOG):

First step is to install the CUDA driver and software provided by NVIDIA.

The toolkit (with driver!) is a 1.9Gb download and binary install. Luckily this 1.9GB download link can be used from the server directly. Initially, I installed the local .deb file (note it may pay to take the older Ubuntu version on a Debian system)

wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
dpkg -i cuda*.deb

To see what it installed

dpkg -L cuda-repo-ubuntu1404-7-5-local
cat /etc/apt/sources.list.d/cuda-7-5-local.list
  deb file:///var/cuda-repo-7-5-local /

Interesting, a local repository. Run

apt-get update
apt-get install cuda

Bit of a roundabout way(!?).

By default it installs in /usr/local/cuda-7.5 which is OK on Penguin.

rmmod nouveau

And the CUDA installer blacklisted it with some settings in /etc adding a file modprobe.d/nvidia-installer-disable-nouveau.conf which blacklists the driver. The kernel log says

[4617358.866019] nvidia: module license 'NVIDIA' taints kernel.
[4617358.870873] nvidia: module verification failed: signature and/or  required key missing - tainting kernel
[4617358.879592] [drm] Initialized nvidia-drm 0.0.0 20150116 for 0000:83:00.0 on minor 1
[4617358.879688] [drm] Initialized nvidia-drm 0.0.0 20150116 for 0000:84:00.0 on minor 2
[4617358.879715] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  346.96  Sun Aug 23 22:29:21 PDT 2015
[4617358.881460] nvidia_uvm: Loaded the UVM driver, major device number 247
[4617358.883559] nvidia_uvm: Unregistered the UVM driver

The UVM is used to share memory between the CPU and GPU in CUDA programs. We won't need that for the time being.

To install the software I used screen (the installer is probably not atomic).

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-7.
Samples:  Installed in /home/pjotr, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda/lib64, or, add /usr/local/cuda/lib64 to /etc/ld.so.conf and run ldconfig as root

Logfile is /tmp/cuda_install_31036.log

I copied the log to my Download folder and

/usr/local/cuda/bin/nvcc --version
  nvcc: NVIDIA (R) Cuda compiler driver
  Copyright (c) 2005-2015 NVIDIA Corporation
  Built on Tue_Aug_11_14:27:32_CDT_2015
  Cuda compilation tools, release 7.5, V7.5.17

To test CUDA run one of the examples in /usr/local/cuda/sample. E.g.

root@penguin2:/usr/local/cuda/samples/0_Simple/simplePrintf# ./simplePrintf
GPU Device 0: "Tesla K80" with compute capability 3.7

Device 0: "Tesla K80" with Compute 3.7 capability
printf() is called. Output:

[3, 0]:         Value is:10
[3, 1]:         Value is:10
[3, 2]:         Value is:10

deviceQuery responds with

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "Tesla K80"
  CUDA Driver Version / Runtime Version          7.5 / 7.5
  CUDA Capability Major/Minor version number:    3.7
  Total amount of global memory:                 11520 MBytes (12079136768 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Max Clock rate:                            824 MHz (0.82 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Looking good!

Next we try to get pycuda up and running. I am using the GNU Guix Python, so (after installing Guix as described here):

guix package -i python2-virtualenv python2-numpy glibc-utf8-locales
export PATH=$HOME/.guix-profile/bin:$PATH
mkdir ~/guix-python
virtualenv ~/guix-python
cd guix-python
. bin/activate
export PATH=$HOME/guix-python/bin:$PATH

Now

export CUDA_ROOT=/usr/local/cuda
export PATH=$CUDA_ROOT/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64 # optional
# pip install numpy # optional if no guix install
pip install pycuda

The install did warn that a default install may fail and that it was using unoptimized lapack. Running the following example crashed the machine on one Tesla the first time and segfaulted on another. Great.

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib/x86_64-linux-gnu:/lib/x86_64-linux-gnu
export PYTHON_PATH=$HOME/guix-python/lib/python2.7/site-packages:$HOME/.guix-profile/lib/python2.7/site-packages
~/guix_python/bin/python ~/Download/pycuda-2014.1/examples/hello_gpu.py

So much for hello! Is there another nvidia driver installed that conflicts with the CUDA one?

I went through the ldconf paths and removed the ones that are not required and rebooted. Then found that

ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.1
  lrwxrwxrwx 1 root root 17 Aug 14 23:47 /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.352.39

points to an older install. Hmmmm.

apt-get remove --purge nvidia-352 nvidia-modprobe nvidia-settings libcuda1-352

And reinstalled the nvidia modules v349.96, so

ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.1
  lrwxrwxrwx 1 root root 17 Oct 15 07:32 /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.346.96

is correct. However, when we want to install the CUDA packages it tries to install the more recent driver. WHAT NOW? I decided to use the binary installer (rather than the .deb) of the OLDER 7.0 CUDA version, to align with Penguin1. Note, however, that the newer CUDA libraries are more efficient at floating point storage, so we may want to upgrade when we start running out of CUDA RAM (sometime down the line). After installation:

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-7.0
Samples:  Installed in /home/pjotr

Please make sure that
 -   PATH includes /usr/local/cuda-7.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-7.0/lib64, or, add /usr/local/cuda-7.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-7.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Getting_Started_Guide_For_Linux.pdf in /usr/local/cuda-7.0/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_13279.log

Now

ls -l /usr/lib/x86_64-linux-gnu/libcuda.so.1
lrwxrwxrwx 1 root root 17 Oct 15 08:03 /usr/lib/x86_64-linux-gnu/libcuda.so.1 -> libcuda.so.346.46

is the correct one. The CUDA examples work, but python still balks with

python[17633]: segfault at 100000018 ip 00007f03bb062f47 sp 00007ffea1323320 error 4 in libc-2.19.so[7f03bb041000+1bb000]

which differs from the Guix version which is libc-2.22. Hmmmm. Major HMMM. This does not look good. So, next I tried virtualenv with the Debian/Ubuntu packages. And, oh (banging head on table) I forgot to activate the virtualenv! That means I was mixing Pythons.

Now it works:

(python2.7)pjotr@penguin2:~/pylmm_gn2$ env LD_LIBRARY_PATH=/usr/local/cuda-7.0/lib64 python ~/pylmm_gn2/bin/cuda_properties.py
2 device(s) found.
Device #0: Tesla K80
  Compute Capability: 3.7

So I can work back towards using the GNU Guix version with the later CUDA 7.5.

Since we are aiming for reproducible installation paths it will take a bit of effort to install the server properly using GNU Guix, deploy etc. I also intend to write a new tool that takes care of monitoring running services. This has to happen in conjunction with the other stuff we are doing (not least getting pylmm to work with human data).

Next to programming I always end up working on (software) deployments. Maybe it is because few people are really interested. For reproducible software it is key.

53 Installing an Arvados cluster

[2015-10-14 Wed] For GN we are going to use a cluster for back-end computations. These weeks, to evaluate Arvados, I am building a ~180 core compute cluster. The installation process is documented here. The cluster will be hosted in Groningen. The installation is mostly automated. For development and testing of automation I use a KVM virtual machine as described here.

54 R/qtl control, genotype and phenotype parsers added

[2015-10-02 Fri] Today I added control, genotype and phenotype parsers for R/qtl support. R/qtl can add multiple phenotypes, so we need to add a switch for selecting the phenotype (or are we going to run them all by default?).

Run it with

./bin/runlmm.py rqtl --control data/rqtl/iron.yaml --pheno data/rqtl/iron_pheno.csv --geno data/rqtl/iron_geno.csv

the F2 Iron set is also nice because it has a lot of missing data. See how that comes out. In the process I am adding support for column headers.

55 Teaching R/qtl to pylmm

Because of an increase in human data and more complex crosses, such as the diversity outbred mice, there is a demand for using LMMs from R/qtl. The current R/lme4 library could arguably be used, but it misses the speed of tools like pylmm and FaST-LMM to crunch through the data. So, where we recently introduced R/qtl in GN2, now we are introducing pylmm for R/qtl (I rather enjoy inside-out use cases). With Karl using pylmm also scrutiny of the results will get another impetus - which is important.

The first step is to teach pylmm genotype and phenotype data formats that can be emitted from R/qtl and described here.

56 A new page - starting on pylmm again

[2015-10-01 Thu] The coming months I will be working on pylmm again, so time to restart this BLOG. Since my last BLOG we achieved a number of things: First, the pylmm license has been changed to a true FOSS license. Thanks Nick! Second, we got a new CUDA K80 server which should allow for some LMM heavy lifting. Third, we got Arvados to work and we are setting up a dedicated cluster as a pilot for backend computations. Forth, GNU Guix is flying and reproducible software installations are around the corner. Fifth, we are working on embedding pylmm into R/qtl. And, sixth, we did a Google Summer of Code project with LMMs.

There are still some outstanding TODOs for pylmm from the older BLOG. In a nutshell pylmm/CUDA works, but needs tweaking for the human data in GN2 (mostly issues around file size/transfer) and covariates need to be passed in. This work matches the missing links for R/qtl to use pylmm so it is clear what I should be working on.

But first we need to bring in the test system (currently sitting in a separate github repository) and teach pylmm to read R/qtl formats. I'll do that in parallel.

The outstanding pylmm issues on github matches the list from the previous BLOG reads (in no particular order):