Pjotr's rotating BLOG

Table of Contents

RSS feed

This document describes Pjotr's journey in (1) introducing a speedy LMM solver for GWAS for GeneNetwork.org, (2) adding a REST API for genetics using Racket, and (3) solving the pipeline reproducibility challenge with GNU Guix. Ah, and then there is the bug fixing…

Join us on the #genenetwork IRC channel on http://webchat.freenode.net/.

1 Chasing that elusive sambamba bug (FIXED!)

[2019-11-24 Sun] With an upgrade of the LLVM D-compiler the garbage collector (GC) started bailing out on sambamba (ldc>1.10). Some progress on digging was tracked on github. Locating the bug was fairly easy - triggered by decompressBgzfBlock - and I managed to figure out it had to do with the copying of a thread object that messes up the stack. Fixing it, however, is a different story! It took a while to get a minimal program to reproduce the problem. It looks like

BamReader bam;
auto fn = buildPath(dirName(__FILE__), "data", "ex1_header.bam");
foreach (i; 0 .. 60) {
  bam = new BamReader(fn);
  assert(bam.header.format_version == "1.3");
}

which segfaults most of the time, but not always, with a stack trace

#0  0x00007ffff78b98e2 in invariant._d_invariant(Object) () from /usr/lib/x86_64-linux-gnu/libdruntime-ldc-debug-shared.so.87
#1  0x00007ffff7d4711e in std.parallelism.TaskPool.queueLock() () from /usr/lib/x86_64-linux-gnu/libphobos2-ldc-debug-shared.so.87
#2  0x00007ffff7d479b9 in _D3std11parallelism8TaskPool10deleteItemMFPSQBqQBp12AbstractTaskZb ()
   from /usr/lib/x86_64-linux-gnu/libphobos2-ldc-debug-shared.so.87
#3  0x00007ffff7d4791d in _D3std11parallelism8TaskPool16tryDeleteExecuteMFPSQBwQBv12AbstractTaskZv ()
#4  0x00005555557125c5 in _D3std11parallelism__T4TaskS_D3bio4core4bgzf5block19decompressBgzfBlockFSQBrQBqQBoQBm9BgzfBlockCQCoQCn5utils7memoize__T5CacheTQCcTSQDxQDwQDuQDs21DecompressedBgzfBlockZQBwZQBpTQDzTQDgZQGf10yieldForceMFNcNdNeZQCz (this=0x0)
#6  0x00007ffff78a56fc in object.TypeInfo_Struct.destroy(void*) const ()
#7  0x00007ffff78bc80a in rt_finalizeFromGC () from /usr/lib/x86_64-linux-gnu/libdruntime-ldc-debug-shared.so.87
#8  0x00007ffff7899f6b in _D2gc4impl12conservativeQw3Gcx5sweepMFNbZm ()
#9  0x00007ffff78946d2 in _D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm ()
#10  0x00007ffff78969fc in _D2gc4impl12conservativeQw3Gcx8bigAllocMFNbmKmkxC8TypeInfoZPv ()
#11 0x00007ffff78917c3 in _D2gc4impl12conservativeQw3Gcx5allocMFNbmKmkxC8TypeInfoZPv ()
    task_pool=0x7ffff736f000) at reader.d:130
#29 0x000055555575e445 in _D3bio3std3hts3bam6reader9BamReader6__ctorMFAyaZCQBvQBuQBtQBsQBrQBn (warning: (Internal error: pc 0x55555575e444 in read in psymtab, but not in symtab.)
    at reader.d:135
#30 0x00005555557bff0b in D main () at read_bam_file.d:47

where reader.d:130 adds a BamReader to the task pool. It is clear the GC kicks in and we end up with this mess.

Frame #4 of the trace contains yieldForce, which executes a task in the current thread; the task is coming from a cache:

alias Cache!(BgzfBlock, DecompressedBgzfBlock) BgzfBlockCache;

and Cache is part of BioD. One tricky aspect of sambamba is its intricate design. In our crashing example we only use the simple BamReader, which is defined in std.hts.bam.reader.d. We are using a default taskpool. In reader.d not much happens - it is almost all simple plumbing. std.hts.bam.read.d, meanwhile, represents the BAM read format.

The Bgzf block processing happens in bio.core.bgzf.inputstream. The BamReader uses BgzfInputStream, which has the functions fillNextBlock and setupReadBuffer. The constructor sets up a RoundBuf!BlockAux(n_tasks). When I set n_tasks to a small number it no longer crashes!? The buffer

_task_buf = uninitializedArray!(DecompressionTask[])(n_tasks);

is a critical piece. Even increasing the dimension to n_tasks+2 is enough to remove most segfaults, but not all.

Remember it is defined as

alias Task!(decompressBgzfBlock, BgzfBlock, BgzfBlockCache) DecompressionTask;
DecompressionTask[] _task_buf;

with dimension n_tasks. Meanwhile BlockAux

static struct BlockAux {
  BgzfBlock block;
  ushort skip_start;
  ushort skip_end;

  DecompressionTask* task;
  alias task this;
}

Injecting code

struct DecompressedBgzfBlock {
  ~this() {
    stderr.writeln("destroy DecompressedBgzfBlock ", start_offset, ":",
                   end_offset, " ", decompressed_data.sizeof);
  }

  ulong start_offset;
  ulong end_offset;
  ubyte[] decompressed_data;
}

It is interesting to see that even when not segfaulting the block offsets look corrupted:

destroy DecompressedBgzfBlock 0:0 16
destroy DecompressedBgzfBlock 0:0 16
destroy DecompressedBgzfBlock 0:0 16
destroy DecompressedBgzfBlock 89554:139746509800748 16
destroy DecompressedBgzfBlock 140728898420736:139748327903664 16
destroy DecompressedBgzfBlock 107263:124653 16
destroy DecompressedBgzfBlock 89554:107263 16
destroy DecompressedBgzfBlock 71846:89554 16
destroy DecompressedBgzfBlock 54493:71846 16
destroy DecompressedBgzfBlock 36489:54493 16
destroy DecompressedBgzfBlock 18299:36489 16
destroy DecompressedBgzfBlock 104:18299 16
destroy DecompressedBgzfBlock 0:104 16

and I am particularly suspicious about this piece of code in inputstream.d, where a task gets allocated on the stack and then copied into the round buffer. This is a hack, no doubt about it:

DecompressionTask tmp = void;
tmp = scopedTask!decompressBgzfBlock(b.block, _cache);
auto t = _task_buf.ptr + _offset / _max_block_size;
import core.stdc.string : memcpy;
memcpy(t, &tmp, DecompressionTask.sizeof);
b.task = t;

and probably the reason why decompressBgzfBlock gets corrupted, followed by sending the GC into a tailspin when it kicks in. Artem obviously designed it this way to prevent allocating memory for the task, but I think he went a little too far here! One thing I tried earlier, and have to try again, is getting rid of that copying.

First of all

alias Task!(decompressBgzfBlock, BgzfBlock, BgzfBlockCache) DecompressionTask;

defines DecompressionTask as a Task that runs decompressBgzfBlock with those parameter types and returns a DecompressedBgzfBlock. Remember, it bails out with this block. There is something else that is notable: it is actually the cached version that bails out.

Removing the cache code makes it run more reliably. But not completely. Also, we are getting memory errors now:

destroy DecompressedBgzfBlock 4294967306:0 16
core.exception.InvalidMemoryOperationError@core/exception.d(702): Invalid memory operation

but that leaves no stack trace. So the caching itself is not the problem. In the next phase we are going to address that dodgy memory copying by introducing a task managed by the GC instead of using a scopedTask.

This is all happening in BgzfInputStream in inputstream.d, used by reader.d, which inherits from Stream. BamReader uses that functionality to iterate through the reads using D's popFront() range design. Streams allow reading a variable based on its type, e.g. a BAM read. The BamReader fetches the necessary data from BgzfBlockSupplier with getNextBgzfBlock, which is used as the _bam variable. BamReader.reader itself returns an iterator.

It is interesting to note how the OOP design obfuscates what is going on. It is also clear that I have to fix BgzfInputStream in inputstream.d because it handles the tasks in the roundbuffer.

Part of sambamba's complexity is due to OOP and to having the threadpool running at the lowest level (unpacking bgzf). If I remove the threadpool there, threading will have to happen at a higher level. I.e., sambamba gets all its performance from multithreaded low-level unpacking of data blocks. It is unusual, but it does have the (potential) advantage of leaving higher-level code simple. I note that with sambamba sort, however, Artem injected threads there too, which raises the question of what happens when you add tasks with different timing characteristics to the same pool. It would be interesting to see the effect of using two task pools.

[2019-11-27 Wed] Paying some closer attention to block.d again: BgzfBlock is defined as a struct containing a _buffer declared as public ubyte[] _buffer = void; and is used in (indeed) block.d and inputstream.d only. The use of a struct means that BgzfBlock gets allocated on the stack. Meanwhile _buffer gets pointed into the uncompressed buffer, which in turn is a slice of uncompressed_buf that is also on the stack in decompressBgzfBlock (surprise, surprise) and gets assigned right before returning

block._buffer[0 .. block.input_size] = uncompressed[];

now, the cached version (which I disabled) actually does a copy to the heap

BgzfBlock compressed_bgzf_block = block;
compressed_bgzf_block._buffer = block._buffer.dup;
DecompressedBgzfBlock decompressed_bgzf_block;
with (decompressed_bgzf_block) {
    start_offset = block.start_offset;
    end_offset = block.end_offset;
    decompressed_data = uncompressed[].dup;
}
cache.put(compressed_bgzf_block, decompressed_bgzf_block);

Artem added a comment

/// A buffer is used to reduce number of allocations.
/// Its size is max(cdata_size, input_size)
/// Initially, it contains compressed data, but is rewritten
/// during decompressBgzfBlock -- indeed, who cares about
/// compressed data after it has been uncompressed?

Well, maybe the GC does! Or maybe the result does not fit the same buffer. Hmmm. If you go by the values

destroy DecompressedBgzfBlock 140728898420736:139748327903664 16

you can see the BgzfBlock is compromised by a stack overwrite. Changing the struct to a class fixes the offsets

destroy DecompressedBgzfBlock 104:18299 16
destroy DecompressedBgzfBlock 18299:36489 16
destroy DecompressedBgzfBlock 36489:54493 16

so things start to look normal. But it still segfaults on DecompressedBgzfBlock on GC in yieldForce.


[2019-11-28 Thu] So, the short of it is that when a thread kicks in with decompressBgzfBlock the underpinning data structure is corrupt. The problem has to be with struct BgzfBlock, struct BlockAux and the Roundbuf; BlockAux instances go on the Roundbuf. I was thinking last night that there may be a struct size problem. The number of tasks in the round buffer should track the number of threads. Increasing the size of the round buffer makes a crash take longer.

Well I hit the jackpot! After disabling

_task_buf = uninitializedArray!(DecompressionTask[])(n_tasks);

there are no more segfaults. I should have looked at that more closely. I can only surmise that because the contained objects contain pointers (they do), the GC gets confused when it occasionally finds a leftover word in the uninitialized memory that looks like a valid pointer. Using uninitializedArray on an object that includes a pointer reference is dangerous.

Success!! It must have taken me at least three days of work to find this bug, one of the most elusive bugs I have encountered. Annoyingly, I was so close earlier when I expanded the size of n_tasks! The segfault got triggered by a new implementation of D's garbage collector. Pointer space is tricky and it shows how careful we have to be with non-initialized data.

Now what is the effect of disabling the cache and making more use of garbage collected structures (for each Bgzf block)? User time went down, and CPU usage too, but wall clock time nudged up. Memory use went up by 25% and the garbage collector kicked in twice as often. This shows that Artem's aggressive avoidance of the garbage collector does have an impact, and I'll have to revert some of my changes now. Releasing the current version should be OK, but the performance difference does add to overall time and energy use; even 10% of emissions is worth saving with tools run at this scale.

So, I started reverting changes, and after reverting two items:

- Sambamba
  - [-] speed test
    + [X] revert on class DecompressedBgzfBlock to struct
    + [X] revert on auto buf2 = (block._buffer[0 .. block.input_size]).dup;
    + [ ] revert on DecompressionTask[] _task_buf;
    + [ ] revert on tmp = scopedTask!decompressBgzfBlock(b.block)
    + [ ] revert on Cache

Sambamba is very similar to the last release: the new release is very slightly slower but uses less RAM. I decided not to revert to using the roundbuf, scopedTask and Cache because each of these introduces complexity with no obvious gain. v0.7.1 will be released after I update the release notes and run final tests.

2 It has been almost a year! And a new job.

[2019-07-13 Sat] A lot can happen in a year. After my last BLOG on the GEMMA release I got a job offer from the University of Tennessee and we moved to the USA in March. For the coming years I will be writing software and some grant proposals. The focus is still on GeneNetwork, genetic mapping and genomics. Some things changed, but I am still working on GEMMA, sambamba, the variant graph and all that… One major shift is from Python to Racket (Scheme, a Lisp). I never liked Python that much (coming from Ruby and Perl earlier), but now I had an excuse to ditch Python. Racket rocks in so many ways! Others have written about that, so I am not going to repeat it, but the main thing is that I finally found a language that I can use for another 20 years, after trying many alternatives including Scala, Elixir, Erlang etc. Ironically, Lisp is one of the oldest languages in use today. Go figure. For performance code I have D, and we may replace some of that with Rust. It looks like it is Racket and D, and perhaps Rust, for me. I have come home to those choices after all these years.

I am writing a REST service which needs to maintain some state. The built-in Racket server has continuations - which is rather nice! Racket also has support for Redis, SQLite (nice example) and a simple key-value interface (ini-style) which I can use for sessions.

Ironically, two weeks after writing above I was hit by a car on my bicycle in Memphis. 14 fractures and a ripped AC joint. It took surgery and four months to feel halfway normal again…

3 Speeding up K

GEMMA has been released with many bug fixes and speedups. Now it is time to focus on further speedups of K (and GWA after). There are several routes to try. One possibility is to write our own matrix multiplication routine in D. My hunch is that because computing K is a special case, we can get significant speedups compared to a standard matrix multiplication for larger matrices. Directions we can go:

  • Stop using BLAS and create a multi-core, dot-product based multiplication that makes use of:
    • multi-threaded decompression
    • starting the compute while reading data
    • aligned rows for the dot product (fewer CPU cache misses)
    • computing only half the result (K is symmetric!)
    • heavy use of threading
    • AVX2 optimizations
    • floats instead of doubles (we can do this with hardware checking)
    • chromosome-based computations (see below for LOCO)
  • Use tensor routines to reduce RAM IO

In other words, quite a few possible ways of improving things. It may be that the current BLAS routine is impossible to beat for our data, but there is only one way to find out: by trying.

The first step, you would think, is simple: take the genotype data, pass it into calckinship(g) and get K back. Unfortunately GEMMA intertwines reading the genotype file, the computation of K in steps, and scaling the matrix. All in one, and the code is duplicated for Plink and BIMBAM formats. Great. Now I don't feel like writing any more code in C++ and, fortunately, Prasun has done some of the hard work in faster-lmm-d already. So the first step is to parse the BIMBAM format using an iterator. We'll use this to load the genotype data in RAM (we are not going to assume memory restrictions for now).

That genotype decompression and reading part is done now and I added it to BioD decompress.d. Next I added a tokenizer which is a safe replacement for the strtok gemma uses.

Using BLAS I added a full K computation after reading the genotype file, which is on par (speed-wise) with the current GEMMA implementation. The GEMMA implementation should be more optimal because it splits the matrix computation into smaller blocks and starts while streaming the genotype data. Prasun did an implementation of that in gemmakinship.d which is probably faster than my current version. Even so, I'll skip that method for now, until I am convinced that none of the above dot-product optimizations pan out. Note that only a fraction of the time is used to do the actual matrix multiplication.

2018-10-23T09:53:38.299:api.d:flmmd_compute_bimbam_K:37 GZipbyLine
2018-10-23T09:53:43.911:kinship.d:kinship_full:23 Full kinship matrix used

Reading the file took 6 seconds.

2018-10-23T09:53:43.911:dmatrix.d:slow_matrix_transpose:57 slow_matrix_transpose
2018-10-23T09:53:44.422:blas.d:matrix_mult:48 matrix_mult
2018-10-23T09:53:45.035:kinship.d:kinship_full:33 DONE rows is 1940 cols 1940

Transpose + GxGT took 1 second. The total:

real    0m9.613s
user    0m13.504s
sys     0m1.476s

I also wrote a threaded decompressor and it is slightly slower: gunzip is so fast that spawning threads adds more overhead than it saves.

In this example reading the file dominates the time, but with LOCO we get ~20 K computations (one for each chromosome), so it makes sense to focus on K. For model species, for the foreseeable future, we'll look at thousands of individuals, so it is possible to hold all K matrices in RAM. Which also means we can fill them using chromosome-based dot-products. Another possible optimization.

Next steps are to generate output for K and reproduce GEMMA output (I also need to filter on missing phenotype data). The idea is to compute K and LMM in one step, followed by developing the LOCO version as a first-class citizen. Based on the above metrics we should be able to reduce LOCO K for a dataset of this size from 7 minutes to 30 seconds(!). That is even without any major optimizations.

4 MySQL to MariaDB

The version we are running today:

mysql --version
mysql  Ver 14.12 Distrib 5.0.95, for redhat-linux-gnu (x86_64) using readline 5.1

5 MySQL backups (stage2)

Backup to AWS.

env AWS_ACCESS_KEY_ID=* AWS_SECRET_ACCESS_KEY=* RESTIC_PASSWORD=genenetwork /usr/local/bin/restic --verbose init /mnt/big/mysql_copy_from_lily -r s3:s3.amazonaws.com/backupgn init

env AWS_ACCESS_KEY_ID=* AWS_SECRET_ACCESS_KEY=* RESTIC_PASSWORD=genenetwork /usr/local/bin/restic backup /mnt/big/mysql_copy_from_lily -r s3:s3.amazonaws.com/backupgn

I also added backups for genotypefiles and ES.

6 MySQL backups (stage1)

Before doing any serious work on MySQL I decided to create some backups. Lily is going to be the master for now, so the logical backup is on P1, which has a large drive. Much of the space is taken up by a running MySQL server (which is updated!) and a data file by Lei named 20151028dbsnp containing a liftover of dbSNP142. First I simply compressed the input files, so as not to throw them away.

Because ssh is so old on lily I can't log in nicely from Penguin, but the other way around works. This is temporary; as the mysql user

ssh -i /var/lib/mysql/.ssh/id_rsa pjotr@penguin

The script becomes (as a weekly CRON job for now), run as the mysql user:

rsync -va /var/lib/mysql --rsync-path=/usr/bin/rsync -e "ssh -i /var/lib/mysql/.ssh/id_rsa" pjotr@penguin:/mnt/big/mysql_copy_from_lily/

This means we have a weekly backup for now. I'll improve it with more disk space and MariaDB to have incrementals.

Actually, it looks like only two tables really change

-rw-rw---- 1 pjotr mysql  29G Aug 16 18:59 ProbeSetData.MYD
-rw-rw---- 1 pjotr mysql  36G Aug 16 19:00 ProbeSetData.MYI

which are not that large.

To make incrementals we are opting for [restic](https://restic.readthedocs.io/). It looks modern and has interesting features.

env RESTIC_PASSWORD=genenetwork /usr/local/bin/restic init -r /mnt/big/backup_restic_mysql/

now backups can be generated incrementally with

env RESTIC_PASSWORD=genenetwork /usr/local/bin/restic --verbose backup /mnt/big/mysql_copy_from_lily -r /mnt/big/backup_restic_mysql

To list snapshots (directory format)

env RESTIC_PASSWORD=genenetwork /usr/local/bin/restic -r backup_restic_mysql/ snapshots


To check the repository

env RESTIC_PASSWORD=genenetwork /usr/local/bin/restic -r backup_restic_mysql/ check

Prune removes data no snapshot references any more, saving space

env RESTIC_PASSWORD=genenetwork /usr/local/bin/restic -r backup_restic_mysql/ prune

So, now we can do a daily backup from Lily and have incrementals stored on Penguin too. My cron reads

40 5 * * * env RESTIC_PASSWORD=genenetwork /usr/local/bin/restic --verbose backup /mnt/big/mysql_copy_from_lily -r /mnt/big/backup_restic_mysql|mail -s "MySQL restic backup" pjotr2017@thebird.nl

Restic can push to AWS S3 buckets. That is the next step (planned).

7 Migrating GN1 from EC2

GN1 is costing us $600+ per month on EC2. With our new shiny server we should move it back to Memphis. The main problem is that the base image is

Linux version 2.6.18-398.el5 (mockbuild@builder17.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-55)) #1 SMP Tue Sep 16 20:50:52 EDT 2014

I mean, seriously, ten years old?

Compared to GN2, GN1 should be simpler to deploy. The main problem is that it currently requires Python 2.4. Not sure why that is.

Some system maintenance scripts live here. There is a Docker image by Artem here. And there are older notes referred to in the Artem doc. All of that is about GN2 really. So far, I have not found anything on GN1. In /usr/lib/python2.4/site-packages I find the following modules:


nothing too serious. Worse is that GN1 uses mod_python, which went out of vogue before 2013. The source code is still updated - you see, once things are out there…

This means we'll need to deploy Apache in the VM with mod_python. The installation on Lily pulls in 50+ Apache modules. Argh. If we want to support this… The good news is that most modules come standard with Apache - and I think we can disable a lot of them.

8 Fixing Gunicorn in use

DEBUG:db.webqtlDatabaseFunction:.retrieve_species: retrieve_species result:: mouse
DEBUG:base.data_set:.create_dataset: dataset_type: ProbeSet
[2018-07-16 16:53:51 +0000] [4185] [ERROR] Connection in use: ('', 5003)
[2018-07-16 16:53:51 +0000] [4185] [ERROR] Retrying in 1 second.

9 Updating ldc with latest LLVM

Updating ldc to latest leaves only 5 tests failing! That is rather good.

99% tests passed, 5 tests failed out of 1629

Total Test time (real) = 4135.71 sec

The following tests FAILED:
        387 - std.socket (Failed)
        785 - std.socket-debug (Failed)
        1183 - std.socket-shared (Failed)
        1581 - std.socket-debug-shared (Failed)
        1629 - lit-tests (Failed)

build/Testing/Temporary/LastTest.log: (core.exception.AssertError@std/socket.d(456): Assertion failure
build/Testing/Temporary/LastTest.log:core.exception.RangeError@std/socket.d(778): Range violation


Failing Tests (1):
    LDC :: plugins/addFuncEntryCall/testPlugin.d

After fixing these, ldc 1.10.0 compiles on LLVM 3.8!

10 Fixing sambamba

We were getting this new error

Error: read-modify-write operations are not allowed for shared
variables. Use core.atomic.atomicOp!"+="(s, e) instead.

which was triggered by the normalize function

bool normalize(R)(R range, ElementType!(R) sum = 1)

the normalize function scales the values of a range so that they add up to sum, e.g.,

a = [ 1.0, 3.0 ];
normalize(a);
assert(a == [ 0.25, 0.75 ]);

so, the question is why it needs to be a shared range. I modified it to cast to shared after normalization.

The next one was

BioD/bio/maf/reader.d(53): Error: cannot implicitly convert expression `this._f.byLine(cast(Flag)true, '\x0a')` of type `ByLineImpl!(char, char)` to `ByLine!(char, char)`

also reported on https://github.com/bioconda/bioconda-recipes/pull/8787#issuecomment-389195848

After fixing the compile-time problems the tests failed for view, view -f unpack, subsample and sort(?!). In fact, all the md5sums we test. For view it turned out the order of the output differs. view -f sam returns identity, also when converting back from BAM. It had to be something in the header. The header (apparently) contains the version of sambamba!

ldc2 -wi -I. -IBioD -IundeaD/src -g -O3 -release -enable-inlining -boundscheck=off -of=bin/sambamba bin/sambamba.o utils/ldc_version_info_.o htslib/libhts.a /usr/lib/x86_64-linux-gnu/liblz4.a -L-L/home/travis/dlang/ldc-1.10.0/lib -L-L/usr/lib/x86_64-linux-gnu -L-lrt -L-lpthread -L-lm -L-llz4
bin/sambamba.o: In function `_D5utils3lz426createDecompressionContextFZPv':
/home/travis/build/biod/sambamba/utils/lz4.d:199: undefined reference to `LZ4F_createDecompressionContext'
/home/travis/build/biod/sambamba/utils/lz4.d:200: undefined reference to `LZ4F_isError'

objdump -T /usr/lib/x86_64-linux-gnu/liblz4.so|grep LZ4F
000000000000cd30 g    DF .text  00000000000000f6  Base        LZ4F_flush
000000000000ce30 g    DF .text  0000000000000098  Base        LZ4F_compressEnd
000000000000c520 g    DF .text  00000000000002f5  Base        LZ4F_compressBegin
000000000000c4a0 g    DF .text  000000000000003f  Base        LZ4F_createCompressionContext
000000000000dd60 g    DF .text  00000000000000ee  Base        LZ4F_getFrameInfo
000000000000ced0 g    DF .text  00000000000002eb  Base        LZ4F_compressFrame
000000000000c4e0 g    DF .text  0000000000000033  Base        LZ4F_freeCompressionContext
000000000000c470 g    DF .text  0000000000000029  Base        LZ4F_getErrorName
000000000000c460 g    DF .text  000000000000000a  Base        LZ4F_isError
000000000000c8a0 g    DF .text  00000000000000e3  Base        LZ4F_compressFrameBound
000000000000d1c0 g    DF .text  0000000000000038  Base        LZ4F_createDecompressionContext
000000000000d200 g    DF .text  0000000000000037  Base        LZ4F_freeDecompressionContext
000000000000c990 g    DF .text  0000000000000396  Base        LZ4F_compressUpdate
000000000000d240 g    DF .text  0000000000000b1d  Base        LZ4F_decompress
000000000000c820 g    DF .text  000000000000007d  Base        LZ4F_compressBound

11 Trapping NaNs

When a floating point computation results in a NaN/inf/underflow/overflow GEMMA should stop. The GSL has a generic function gsl_ieee_env_setup which works through setting an environment variable. Not exactly useful, because I want GEMMA to run with just a --check switch.

glibc has feenableexcept as a (GNU extension) function, which did not work immediately. Turned out I needed to include fenv.h before the GSL headers:

#include <fenv.h>
#include "gsl/gsl_matrix.h"

Enabling FP checks returns

Thread 1 "gemma" received signal SIGFPE, Arithmetic exception.
0x00007ffff731983d in ieeeck_ () from /home/wrk/opt/gemma-dev-env/lib/libopenblas.so.0
(gdb) bt
#0  0x00007ffff731983d in ieeeck_ () from /home/wrk/opt/gemma-dev-env/lib/libopenblas.so.0
#1  0x00007ffff6f418dc in dsyevr_ () from /home/wrk/opt/gemma-dev-env/lib/libopenblas.so.0
#2  0x0000000000489181 in lapack_eigen_symmv (A=A@entry=0x731a00, eval=eval@entry=0x731850, evec=evec@entry=0x7317a0,
    flag_largematrix=<optimized out>) at src/lapack.cpp:195
#3  0x00000000004897f8 in EigenDecomp (flag_largematrix=<optimized out>, eval=0x731850, U=U@entry=0x7317a0, G=G@entry=0x731a00)
    at src/lapack.cpp:232
#4  EigenDecomp_Zeroed (G=G@entry=0x731a00, U=U@entry=0x7317a0, eval=eval@entry=0x731850, flag_largematrix=<optimized out>)
    at src/lapack.cpp:248
#5  0x0000000000457b3a in GEMMA::BatchRun (this=this@entry=0x7fffffffcc30, cPar=...) at src/gemma.cpp:2598

Even for the standard dataset!

Turns out it is division by zero and FP underflow in lapack_eigen_symmv.

12 A gemma-dev-env package

I went through the difficulties of updating GNU Guix and writing a package that creates a GEMMA development environment. This was based on the need for a really controlled dependency graph. It went wrong the last time we released GEMMA - witness "gemma got slower" - leading to the discovery that I had linked in a less performant lapack.

GNU Guix was not behaving because I had not updated it in a while, and I discovered at least one bug. Anyway, I have a working build system now and we will work on the code in the coming weeks to fix a number of GEMMA issues and bring out a new release.

13 Reviewing a CONDA package

Main achievement last week was getting GEMMA installed in a controlled fashion and proving performance is still up to scratch.

For JOSS I am reviewing a CONDA package for RNA-seq analysis. The author went to great lengths to make it easy to install with Bioconda, so I thought to have a go. GNU Guix has a conda bootstrap, so time to try that!

guix package -A conda
conda   4.3.16  out     gnu/packages/package-management.scm:704:2

and wants to install


The CONDA package in Guix is a bit older - it turns out CONDA has a ridiculous release rate. Let's try the older CONDA first

guix package -i conda


conda config --add channels defaults
  Warning: 'defaults' already in 'channels' list, moving to the top
conda config --add channels conda-forge
  Warning: 'conda-forge' already in 'channels' list, moving to the top
conda config --add channels bioconda
  Warning: 'bioconda' already in 'channels' list, moving to the top

so, that was all OK. Next

conda install -c serine rnasik

Package plan for installation in environment /gnu/store/h260m1r0bgnyypl7r469lin9gpyrh12m-conda-4.3.16:

The following NEW packages will be INSTALLED:

    asn1crypto:                    0.24.0-py_1          conda-forge
    backports:                     1.0-py36_1           conda-forge
    backports.functools_lru_cache: 1.5-py_1             conda-forge
    bedtools:                      2.25.0-3             bioconda
    bigdatascript:                 v2.0rc10-0           serine
    bwa:                           0.7.17-pl5.22.0_2    bioconda
    bzip2:                         1.0.6-1              conda-forge
    ca-certificates:               2018.4.16-0          conda-forge
    certifi:                       2018.4.16-py36_0     conda-forge
    cffi:                          1.11.5-py36_0        conda-forge
    chardet:                       3.0.4-py36_2         conda-forge
    click:                         6.7-py_1             conda-forge
    colormath:                     3.0.0-py_2           conda-forge
    conda:                         4.5.8-py36_1         conda-forge
    conda-env:                     2.6.0-0              conda-forge
    cryptography:                  2.2.1-py36_0         conda-forge
    curl:                          7.60.0-0             conda-forge
    cycler:                        0.10.0-py_1          conda-forge
    dbus:                          1.11.0-0             conda-forge
    decorator:                     4.3.0-py_0           conda-forge
    expat:                         2.2.5-0              conda-forge
    fastqc:                        0.11.5-pl5.22.0_3    bioconda
    fontconfig:                    2.12.1-4             conda-forge
    freetype:                      2.7-1                conda-forge
    future:                        0.16.0-py_1          conda-forge
    gettext:                        conda-forge
    glib:                          2.53.5-1             conda-forge
    gst-plugins-base:              1.8.0-0              conda-forge
    gstreamer:                     1.8.0-1              conda-forge
    hisat2:                        2.1.0-py36pl5.22.0_0 bioconda
    icu:                           58.2-0               conda-forge
    idna:                          2.7-py36_2           conda-forge
    je-suite:                      2.0.RC-0             bioconda
    jinja2:                        2.10-py_1            conda-forge
    jpeg:                          9b-2                 conda-forge
    krb5:                          1.14.6-0             conda-forge
    libffi:                        3.2.1-3              conda-forge
    libgcc:                        5.2.0-0
    libiconv:                      1.14-4               conda-forge
    libpng:                        1.6.34-0             conda-forge
    libssh2:                       1.8.0-2              conda-forge
    libxcb:                        1.13-0               conda-forge
    libxml2:                       2.9.5-1              conda-forge
    lzstring:                      1.0.3-py36_0         conda-forge
    markdown:                      2.6.11-py_0          conda-forge
    markupsafe:                    1.0-py36_0           conda-forge
    matplotlib:                    2.1.0-py36_0         conda-forge
    mkl:                           2017.0.3-0
    multiqc:                       1.5-py36_0           bioconda
    ncurses:                       5.9-10               conda-forge
    networkx:                      2.0-py36_1           conda-forge
    numpy:                         1.13.1-py36_0
    openjdk:                       8.0.121-1
    openssl:                       1.0.2o-0             conda-forge
    pcre:                          8.41-1               conda-forge
    perl:                           conda-forge
    picard:                        2.18.2-py36_0        bioconda
    pip:                           9.0.3-py36_0         conda-forge
    pycosat:                       0.6.3-py36_0         conda-forge
    pycparser:                     2.18-py_1            conda-forge
    pyopenssl:                     18.0.0-py36_0        conda-forge
    pyparsing:                     2.2.0-py_1           conda-forge
    pyqt:                          5.6.0-py36_5         conda-forge
    pysocks:                       1.6.8-py36_1         conda-forge
    python:                        3.6.3-1              conda-forge
    python-dateutil:               2.7.3-py_0           conda-forge
    pytz:                          2018.5-py_0          conda-forge
    pyyaml:                        3.12-py36_1          conda-forge
    qt:                            5.6.2-3              conda-forge
    readline:                      6.2-0                conda-forge
    requests:                      2.19.1-py36_1        conda-forge
    rnasik:                        1.5.2-0              serine
    ruamel_yaml:                   0.15.35-py36_0       conda-forge
    samtools:                      1.5-1                bioconda
    setuptools:                    40.0.0-py36_0        conda-forge
    simplejson:                    3.8.1-py36_0         bioconda
    sip:                           4.18-py36_1          conda-forge
    six:                           1.11.0-py36_1        conda-forge
    skewer:                        0.2.2-1              bioconda
    spectra:                       0.0.11-py_0          conda-forge
    sqlite:                        3.13.0-1             conda-forge
    star:                          2.5.2b-0             bioconda
    subread:                       1.5.3-0              bioconda
    tk:                            8.5.19-2             conda-forge
    tornado:                       5.1-py36_0           conda-forge
    urllib3:                       1.23-py36_0          conda-forge
    wheel:                         0.31.1-py36_0        conda-forge
    xorg-libxau:                   1.0.8-3              conda-forge
    xorg-libxdmcp:                 1.1.2-3              conda-forge
    xz:                            5.2.3-0              conda-forge
    yaml:                          0.1.7-0              conda-forge
    zlib:                          1.2.11-0             conda-forge

That is a rather long list of packages, including OpenJDK. Conda then bailed out with:

CondaIOError: IO error: Missing write permissions in: /gnu/store/h260m1r0bgnyypl7r469lin9gpyrh12m-conda-4.3.16

 You don't appear to have the necessary permissions to install packages
 into the install area '/gnu/store/h260m1r0bgnyypl7r469lin9gpyrh12m-conda-4.3.16'.
 However you can clone this environment into your home directory and
 then make changes to it.
 This may be done using the command:

 $ conda create -n my_root --clone=/gnu/store/h260m1r0bgnyypl7r469lin9gpyrh12m-conda-4.3.16

OK, I suppose that is an option, though it rather defeats the purpose of a reproducible base repo. But this worked:

conda create -n joss-review-583
conda install -n joss-review-583 -c serine rnasik

The good news is that conda installs everything into a single directory. The bad news: 2.7 GB of downloads…
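Since the whole environment lives under one prefix, it is easy to check how much disk it actually eats (and to remove it later). A small sketch; the path assumes the default per-user env location shown below by conda info --envs:

```shell
# Report the disk footprint of the conda environment created above.
# Assumption: the env was created under the default ~/.conda/envs prefix.
ENV_DIR="$HOME/.conda/envs/joss-review-583"
if [ -d "$ENV_DIR" ]; then
  du -sh "$ENV_DIR"          # total size of the environment directory
else
  echo "environment not found at $ENV_DIR"
fi
```

Deleting the directory (or `conda env remove -n joss-review-583`) cleans it all up in one go.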

conda info --envs
# conda environments:
conda                    /home/wrk/.conda/envs/conda
joss-review-583          /home/wrk/.conda/envs/joss-review-583
root                  *  /gnu/store/h260m1r0bgnyypl7r469lin9gpyrh12m-conda-4.3.16

Activate the environment

source activate joss-review-583

RNAsik --help
00:00:00.000    Bds 2.0rc10 (build 2018-07-05 09:58), by Pablo Cingolani
Usage: RNAsik -fqDir </path/to/your/fastqs> [options]
main options
        -fqDir <string>         : path to fastqs directory, can be nested
        -align <string>         : pick your aligner [star|hisat|bwa]
        -refFiles <string>      : directory with reference files
        -paired <bool>          : paired end data [false], will also set pairIds = "_R1,_R2"
        -all <bool>             : short hand for counts, mdups, exonicRate, qc, cov and multiqc
more options
        -gtfFile <string>       : path to refFile.gtf
        -fastaRef <string>      : path to refFile.fa
        -genomeIdx <string>     : genome index
        -counts <bool>          : do read counts [featureCounts]
        -mdups <bool>           : process bam files, sort and mark dups [picard]
        -qc <bool>              : do bunch of QCs, fastqc, picard QCs and samtools
        -exonicRate <bool>      : do Int(ra|er)genic rates [qualiMap]
        -multiqc <bool>         : do MultiQC report [multiqc]
        -trim <bool>            : do FASTQ trimming [skewer]
        -cov <bool>             : make coverage plots, bigWig files
        -umi <bool>             : deduplicates using UMIs
extra configs
        -samplesSheet <string>  : tab delimited file, each line: old_prefix \t new_prefix
        -outDir <string>        : output directory [sikRun]
        -extn <string>          : specify FASTQ files extension [.fastq.gz]
        -pairIds <string>       : specify read pairs, [none]
        -extraOpts <string>     : add extra options through a file, each line: toolName = options
        -configFile <string>    : specify custome configuration file

I may be critical of Conda, but this works ;)

Next I tried on a different machine, and activating the environment bumped me out of the shell. Hmmm. Among the settings conda's activate applies is:

CONDA_PS1_BACKUP='\[\033[0;35m\]\h:\w\[\033[0m\]$ '

I guess I can replicate what activate does by hand:

penguin2:~$ export JAVA_HOME=$HOME/.conda/envs/joss-review-583
penguin2:~$ export CONDA_PREFIX=$HOME/.conda/envs/joss-review-583
penguin2:~$ export PATH=$HOME/.conda/envs/joss-review-583/bin:$PATH
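Those three exports can be wrapped in a tiny helper so the manual "activation" is repeatable. This is only a sketch of what source activate effectively does for us here; real conda activation does considerably more (PS1 handling, activate.d scripts, etc.), and the function name is mine:

```shell
# Minimal stand-in for `source activate`: point JAVA_HOME, CONDA_PREFIX
# and PATH at the environment prefix. Sketch only, not conda's machinery.
my_activate () {
  prefix="$1"
  export JAVA_HOME="$prefix"
  export CONDA_PREFIX="$prefix"
  export PATH="$prefix/bin:$PATH"
}

# Activate the review environment by hand:
my_activate "$HOME/.conda/envs/joss-review-583"
echo "$CONDA_PREFIX"
```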

conda install -n joss-review-583 -c bioconda qualimap

wget bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData.tar
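With the test data downloaded, the next step is unpacking it and pointing RNAsik at it. The directory names below are assumptions about the tarball layout (adjust them to whatever tar actually extracts); the RNAsik options come from the help output above:

```shell
# Unpack the yeast RNA-seq test data and run the pipeline over it.
# Assumption: the tarball extracts a FASTQ directory; the rawData and
# refFiles paths here are placeholders for the real layout.
TARBALL=IndustrialAntifoamAgentsYeastRNAseqData.tar
if [ -f "$TARBALL" ]; then
  tar xf "$TARBALL"
  RNAsik -fqDir rawData -align star -refFiles refFiles -paired true -all true
else
  echo "test data not downloaded yet: $TARBALL"
fi
```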

14 Updates

It has been a while since I updated the BLOG (see the older BLOGs below). Time to start afresh, because we have interesting developments going on and the time ahead looks particularly exciting for GEMMA and GeneNetwork, with adventures in D, CUDA, Arvados, Jupyter labs, IPFS, blockchain, and the list goes on! I also promised to write a BLOG on our development/deployment setup. Might as well start here. My environments are very complex, but controlled thanks to GNU Guix.

15 Older BLOGS

Work on LOCO, faster-lmm-d, Intel Phi and packaging with Guix has moved here.

16 Even older BLOG

Older BLOG material on pylmm etc. can be found here.

Author: Pjotr Prins

Created: 2019-11-28 Thu 18:00