Pjotr Prins
http://thebird.nl/
Pjotr's rotating BLOG
2020-02-19T13:22:01-06:00

Older rotating BLOG Wed, 30 Sep 2015 00:00:00 +0000
Older BLOG material on pylmm etc. can be found here....

Teaching R/qtl to pylmm Thu, 01 Oct 2015 00:00:00 +0000
Because of an increase in human data and more complex crosses, such as
the diversity outbred mice, there is a demand for using LMMs from
R/qtl. The current R/lme4 library could arguably be used, but it
misses the speed of tools like pylmm and FaST-LMM to crunch through
the data. So, where we recently introduced R/qtl in GN2, now we are
introducing pylmm for R/qtl (I rather enjoy inside-out use
cases). With Karl using pylmm, scrutiny of the results will also get
another impetus - which is important....

A new page - starting on pylmm again Thu, 01 Oct 2015 00:00:00 +0000
The coming months I will be working on pylmm again,
so time to restart this BLOG. Since my last BLOG we achieved a number
of things: First, the pylmm license has been changed to a true FOSS
license. Thanks Nick! Second, we got a new CUDA K80 server which
should allow for some LMM heavy lifting. Third, we got Arvados to work
and we are setting up a dedicated cluster as a pilot for backend
computations. Fourth, GNU Guix is flying and reproducible software
installations are around the corner. Fifth, we are working on
embedding pylmm into R/qtl. And, sixth, we did a Google Summer of Code
project with LMMs....

R/qtl control, genotype and phenotype parsers added Fri, 02 Oct 2015 00:00:00 +0000
Today I added control, genotype and phenotype parsers
for R/qtl support. R/qtl can add multiple phenotypes, so we need to
add a switch for selecting the phenotype (or are we going to run them
all by default?)....

Installing an Arvados cluster Wed, 14 Oct 2015 00:00:00 +0000
For GN we are going to use a cluster for back-end
computations. These weeks, to evaluate Arvados, I am building a ~180
core compute cluster. The installation process is documented here. The
cluster will be hosted in Groningen. The installation is mostly
automated. For development and testing of automation I use a KVM
virtual machine as described here....

Installing penguin2 Thu, 15 Oct 2015 00:00:00 +0000
The new production GN server (Penguin2) is sweet. 56 2GHz Intel Xeon
cores, 256GB RAM and an NVIDIA K80m on board. Ready to R&R! At the
moment it is not running anything - just a barebones Ubuntu 14.04 with
Linux 3.19 kernel. I'll add CUDA (now where did I document that
again? Aha, it is in my old BLOG):...

Running pylmm on penguin2 Fri, 16 Oct 2015 00:00:00 +0000
To build scipy I had to add...

Install R/lmmlite Wed, 09 Dec 2015 00:00:00 +0000
Working from Zanzibar the time has come to align my version of pylmm
with that of R/lmmlite by Prof. Karl Broman (lmmlite is an R/C++ hybrid
implementation of Nick's pylmm). It ought to be interesting to compare
output and performance of both tools and see how we can move forward
hosting one or more of these tools into GN. Speed and memory usage are
of interest when hosting such services....

Install Guix Thu, 10 Dec 2015 00:00:00 +0000
To solve the R/lmmlite install I am opting for a GNU Guix package.
We are setting up an Arvados cluster based on GNU Guix and agreed that
all installation should be through Guix. Progress is recorded here.
I needed to do the base install anyway so Roel can continue later. Today
I got to the point where I can experiment with R/lmmlite....

Started packaging Genenetwork Wed, 20 Jan 2016 00:00:00 +0000
Both Harm Nijveen and I are working on GeneNetwork installation. First
hiccup is that the git repo for GN1 is not complete and the tar-ball
is 33GB. Meanwhile I am porting GN2 to GNU Guix. The first step was to
create a python installer because the .py files need to be compiled to
execute in a read-only store. The net effect is a GNU Guix install in...

Packaging Genenetwork (cont) Thu, 21 Jan 2016 00:00:00 +0000
Today I managed to complete the python installer for GN2. This has led
to a request for handling genotype data outside the tree and looking
at contained packages not written by us. I also reported a bug in GNU
Guix related to fetching repositories from git and updated R-qtl to
the latest version in GNU Guix. I have started adding and testing
dependencies following Artem's Docker recipe with included requirements for
the Python installation. It will be interesting to see what kind of graph comes out!...

Packaging Genenetwork (cont 2) Fri, 22 Jan 2016 00:00:00 +0000
The latest GNU Guix installer copies the full webserver
code into the store...

Packaging Genenetwork (cont 3) Sat, 23 Jan 2016 00:00:00 +0000
The GNU Guix GN2 installer is pretty much finished
though I don't have the bandwidth to test the project. Next up is the
GN1 installer. I dug up an E-mail with instructions from Lei which
points out Python 2.4.3 may be required as well as a bunch of modules
that are in a 3rd party tar ball: graphviz-2.22.2, htmlgen, json,
numarray-1.5.2, piddle, PIL (in guix), pp-1.5.7, pyx, pyXLWriter,
svg. I.e., a few more modules to package. At least all these with
source, because even a cursory look shows these modules are old and
HTMLgen, for example, has no download location any longer. Since we
are not going to write software for GN1, I will just use the tar ball
with an installer script. The json, pp and numarray modules may be
worth adding to GNU Guix. Anyway, it looks all doable now. Today I
packaged python2-numarray as a GNU Guix package....

Back in NL and hacking GNU Guix Mon, 08 Feb 2016 00:00:00 +0000
Returned a week ago and most time went into FOSDEM and
submitting a grant proposal. My FOSDEM talk can be found here - with
slides and video....

Managing power of the cluster Thu, 11 Feb 2016 00:00:00 +0000
Yesterday I got more of Victor's old boxes and it
looks like some are in bad shape and some can be rescued. I have one
node screaming behind me now and it has PXE, so we can boot over the
network. One thing I want to do is easy provisioning of new images to
make this a true throwaway GNU Guix cluster. A description of a simple
PXE server can be found here; it requires DHCP plus tftp. When it gets
serious we should implement that. I also looked into wake-on-lan (WOL)
which should also work on these machines. It is good we have these
capabilities now because a running node pulls 200W. The power supply
for two nodes idling pulls 30W. I think we also need a master switch
for every 8 nodes (240W sleeping, 1600W awake!). I am going to shut down
6 of the 8 existing nodes today. Waking them up is as easy as...

Packaging Slurm Fri, 12 Feb 2016 00:00:00 +0000
Today I packaged slurm, a resource manager for
clusters, and submitted the patch to GNU Guix for inclusion. This is the first step
towards using the cluster effectively. It was a bit harder than I
expected, mostly because non-free software had to be removed and the
build system adapted....

Packaging gemma, plink Sat, 13 Feb 2016 00:00:00 +0000
GN uses some external tools for computations. First I
fixed and packaged the latest versions of gemma (was 0.94, 01/12/2014,
now 0.95-alpha or 0.9.5-2de4bfab3) and plink (was PLINK v1.90b3s
64-bit (17 Jun 2015), now PLINK v1.90p 64-bit (12 Feb 2016)) so they
faithfully show up....

Packaging Genenetwork (cont 4) Sat, 13 Feb 2016 00:00:00 +0000
The qtlreaper I packaged a few weeks back. Today was
htmlgen's turn (a piece of software dating from 1998!). The version that
comes with GN is slightly different from the standard version. In the
Guix repo the python files are now correctly compiled and I only
included the .pyc files we are using. Guix just makes me happy....

Expanding the GeneNetwork effort (the team is growing) Sun, 14 Feb 2016 00:00:00 +0000
The story continues. Good news is that Roel and
Dennis are joining this week in the effort. And Harm....

Packaging pylmm and sambamba Sat, 20 Feb 2016 00:00:00 +0000
The current GN is running an old version of pylmm,
git@github.com:genenetwork/pylmm.git commit
b8a15885ed3701e079170d6a8bf69bb8d8349f9c (Importing multi-core version
of pylmm for GN2). So that got pylmm-multicore packaged in
guix-bioinformatics thanks to Nick allowing for a proper FOSS license....

Packaging Genenetwork (cont 5) Mon, 22 Feb 2016 00:00:00 +0000
Now that pylmm is packaged the final GN2 dependency
is the database. MySQL allows for read-only databases these days, so
we can create a nice testing/development database that is embedded in
a GNU Guix package. The idea is to do a...

Fixing GN2 file paths and a working GN2 server Thu, 25 Feb 2016 00:00:00 +0000
Lots of file paths in GN2 are hard coded. To make the beast work and
move genotype files out of the git repo (reducing that from 350Mb to a
more manageable 30Mb)....

Many packages to get r-wgcna going Mon, 29 Feb 2016 00:00:00 +0000
I wrote the R-WGCNA package with dependencies and had
to write packages for r-biocpreprocesscore, r-wgcna, r-acepack,
r-latticeextra, r-formula, r-hmisc, r-doparallel, r-iterators,
r-foreach, r-fastcluster, r-dynamictreecut, and r-rcppeigen....

Preparing GN2 for distribution Thu, 03 Mar 2016 00:00:00 +0000
...

Guix distribution of GN2 Fri, 04 Mar 2016 00:00:00 +0000
I spent significant time trying to get 'guix archive' to work, but
unfortunately there is a problem with R packages. So I am working
around that for Danny....

A git merge nightmare Thu, 21 Apr 2016 00:00:00 +0000
Working on three branches (Zach, Danny and myself) we
needed to merge our work. This proved less easy than normal, mostly
because we were working on essentially different repositories (after I
diverged the diet version described below). I tried merging,
cherry-picking and hand patching - it all proved too difficult because
we had been intrusively removing and renaming on both ends!...

Getting a server running for testing Thu, 28 Apr 2016 00:00:00 +0000
Testing is crucial. I am putting in a test framework for GN2 - I named
it 'Mechanical Rob' :)....

Everyone on Guix Sun, 15 May 2016 00:00:00 +0000
We have a Guix powered staging server now running on
http://test-gn2.genenetwork.org/. Others have been installing and we
will all be byte-identical and reproducible soon....

REST: fetching phenotypes Wed, 18 May 2016 00:00:00 +0000
We are adding a new REST service to GN. The main reason
is to provide data to people using R, Python, Ruby etc. Introducing
the REST server is also an opportunity for splitting functionality out
of the main python webserver. One thing we want to avoid is long
running jobs in python as it blocks on those. The third reason for
a REST interface is to provide WEB 2.0 support to the genome browser
and other UI tools. I started writing a maru based REST server....

Adding a multi-phenotype QTL plot Mon, 23 May 2016 00:00:00 +0000
Harm has modified Karl's QTL plot so it can show
multiple QTL plots in one figure. I am looking at embedding this form
into GN. Basically the user selects an experiment and a gene
(presumably as an expression phenotype) and has it look for all genes
that correlate. At this point I am not so interested in the logic of
correlations, but I am interested in the plot itself which could plot
multiple phenotypes in one GN collection (read bag of phenotypes). I
also like the way that it adjusts the chromosome sizes....

REST: fetching phenotypes (2) Tue, 24 May 2016 00:00:00 +0000
Lots of activity on the #genenetwork IRC channel now that
the Google Summer of Code has started. Also distracting, of course, but I
think good things come out of it. I am reverse engineering the GN
database to provide the REST services....

A great summer of code Thu, 04 Aug 2016 00:00:00 +0000
Time to update this feed. The last two months saw
plenty of activity. We have a publication in JOSS. Zach has introduced
sessions into GN2. We have a scalable REST server running and we are
adding functionality as we go. And Christian is adding the
biodalliance genome browser to GN2, now with genotype and QTL tracks....

Updating Genotype data for the BXD and reproducibility Fri, 05 Aug 2016 00:00:00 +0000
The REST API serves genotypes. Apparently the version in GN2 was
outdated; for one, BXD103 was removed/renamed. To ensure
reproducibility we need to introduce rigorous versioning and
we need to be able to serve the different versions....

Getting the latest pylmm-gn2 into GN2 Fri, 12 Aug 2016 12:37:00 +0000
The current version of pylmm we are using is my multi-core build from
May 2015. That is over a year ago and does not include the CUDA
features I added later. The main reason for not updating was a bug in
OpenBLAS and that the new code has a different data transport
requirement. Now we have the new gnserver running I am ready to bring
the CUDA stuff into GN2. First, I tested that the multi-core version
gives the same results as the earlier one...

Updating the deployment of GN2 Tue, 16 Aug 2016 15:59:00 +0000
Christian has done a lot of work in getting BD ready for GN2. My job
is to get it on staging by updating the Guix packages. The last
update was from February. Unfortunately Guix was misbehaving. The GNU
servers are giving errors and the latest checkout of the Guix tree is
also erroneous. One can live with one or the other but not both! I spent
quite a few hours and rolled back to a version from a month ago to start
rebuilding GN2. One of the first problems I encountered was...

Progress on GN2 Tue, 18 Oct 2016 15:59:00 +0000
Many good things happening on GN2. New datasets are
being added at a rapid rate. We have a new genotype map for the
BXD. The REST API gives access to almost all data. A genome browser
has been embedded and the list just goes on... The coming period I am
going to focus on running a pilot on a supercomputer named Beacon....

First steps on the Beacon Intel Phi supercomputer Wed, 19 Oct 2016 15:59:00 +0000
The first step was getting access again. I had locked myself out
somehow... But that was fixed in about 6 hours. I have ssh access
again. One of the first things to check is that Beacon is running a
2012 Linux kernel using Red Hat 4.4.7....

Beacon: getting Python to run with MKL Thu, 20 Oct 2016 15:59:00 +0000
pip3 install --user virtualenv
~/.local/bin/virtualenv ~/virtualenv-python-3.4
cd ~/virtualenv-python-3.4
source bin/activate
pip3 install numpy...

Beacon: what software to use? Thu, 20 Oct 2016 15:59:00 +0000
Beacon support wrote:...

fast-lmm-d Thu, 16 Mar 2017 15:59:00 +0000
Over the last months we have put in a lot of work to
make fast-lmm-d happen. The single core version is faster than the
python version, even without real optimizations. We are now using the
profiler to see where we can do even better....

Beacon and Phi Tue, 28 Mar 2017 15:59:00 +0000
We are at ORNL with the JICS team to get some of our tools working on the
Beacon supercomputer. Log in to Beacon and jump to a node...

Executing LMM on the REST API Fri, 05 May 2017 15:59:00 +0000
The REST server gnserver has been continuously running without fail for
After updating Elixir and Erlang to freshly minted versions in Guix I added
executing shell jobs to gnserver....

Pinning Fri, 05 May 2017 15:59:00 +0000
The next step is pinning data structures in CPU or GPU RAM
(i.e. caching) so we get faster results. For the test dataset KveT is
used for multiplication 1x for each SNP, XXi double that, Xt and Yt
4x. KveT is the largest matrix (1024x1024) and computed once, so it
makes sense to pin that first - saving approx. 8GB of RAM copying....

And faster_lmm_d Fri, 05 May 2017 15:59:00 +0000
We are faster than pylmm by a mile now....

Getting fast_lmm_d to run on CUDA Fri, 05 May 2017 15:59:00 +0000
A first build of the CPU version of fasterlmmd on CUDA shows it is
multicore:...

Updating GeneNetwork on GNU Guix Fri, 05 May 2017 15:59:00 +0000
GeneNetwork2 installs on a recent GNU Guix. I had to update a few
packages when dependency resolution changed. We now have Gemma and CTL
latest, for example....

Cleaning up fasterlmmd Fri, 05 May 2017 15:59:00 +0000
This week I have been working with Prasun to clean up
fasterlmmd. Mostly renaming stuff and introducing immutable datatypes
which brought out a number of issues. The current code base is single
threaded and meant to be easy to read. It is already faster than the
old pylmm....

GEMMA: add LOCO support (assess reduced K) Sat, 08 Jul 2017 00:00:00 +0000
I am looking into adding LOCO support to GEMMA. GEMMA is split into
parts that compute K, perform eigen decomposition and run a GWAS. For
LOCO we have the option of running a script and feeding data to GEMMA as
was done here by Peter Carbonetto who, as it happens, also started
work on GEMMA itself (convenient!)....

GEMMA: add LOCO support (assess reduced SNPs) Sun, 09 Jul 2017 00:00:00 +0000
Started looking at the LMM fit in GEMMA. Of course all input formats
are treated differently, though with more shared code this time.
Looks like it is easy to fit by chromosome and the easiest option
would be to rerun GEMMA for each K-1 - i.e., an outer loop. This
implies loading the SNP data file for every run. Not necessarily a big
issue in Linux because the file will be cached if it fits RAM. It
would be nicer to load the SNP file once though and create an inner
loop. I'll look into that....

GEMMA: first LOCO support (CLI LOCO interface) Wed, 12 Jul 2017 00:00:00 +0000
After reducing the warning output of gcc, I looked at getting LOCO
started for the Kinship matrix. First passing in a -loco command line
switch, next getting chromosome info from a bim or anno file which
contains both SNP/marker names and chromosome name/number. For now I'll
fetch the chromosome from the anno file. The first round of
implementation is on bimbam formats only, so I am ignoring Plink and
others....

GEMMA: LOCO support (tests and sets) Wed, 26 Jul 2017 00:00:00 +0000
Back to hacking on GEMMA. First I had to do some work
on the test system and the package deploys with tests on GNU
Guix. Tests have gone onto main line GEMMA, so that is progress....
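
The LOCO entries above talk about computing a kinship matrix K with one chromosome left out, preferably while loading the SNP data only once. What follows is a minimal sketch of that idea in plain Python; it is not GEMMA's code. Per-chromosome partial sums are accumulated in a single pass, and each leave-one-out K is the total minus that chromosome's part. The function names, the (chromosome, row) input layout and the assumption that genotype rows are already centred are hypothetical choices made for illustration.

```python
from collections import defaultdict


def partial_kinship(rows):
    """Unnormalised sum of outer products g * g^T over the given SNP rows.

    Each row holds one (assumed already centred) genotype value per individual.
    """
    n = len(rows[0])
    k = [[0.0] * n for _ in range(n)]
    for g in rows:
        for i in range(n):
            for j in range(n):
                k[i][j] += g[i] * g[j]
    return k


def loco_kinship(snps):
    """Return {chromosome: K computed from all SNPs NOT on that chromosome}.

    snps is a list of (chromosome, genotype_row) pairs, i.e. the SNP data is
    read once and grouped here instead of being reloaded for every run.
    """
    if not snps:
        raise ValueError("no SNPs given")
    by_chr = defaultdict(list)
    for chrom, row in snps:
        by_chr[chrom].append(row)
    if len(by_chr) < 2:
        raise ValueError("LOCO needs at least two chromosomes")

    n = len(snps[0][1])
    parts = {c: partial_kinship(rows) for c, rows in by_chr.items()}
    total = [[sum(p[i][j] for p in parts.values()) for j in range(n)]
             for i in range(n)]

    result = {}
    for c, part in parts.items():
        rest = len(snps) - len(by_chr[c])  # number of SNPs left after dropping c
        result[c] = [[(total[i][j] - part[i][j]) / rest for j in range(n)]
                     for i in range(n)]
    return result


if __name__ == "__main__":
    toy = [("1", [1.0, -1.0]), ("1", [0.5, -0.5]), ("2", [2.0, -2.0])]
    for chrom, k in sorted(loco_kinship(toy).items()):
        print("K without chromosome", chrom, "=", k)
```

The subtraction trick costs one pass over the SNPs plus one matrix subtraction per chromosome, which is the "inner loop" alternative to rerunning the whole computation for every left-out chromosome. Real code would use a linear algebra library rather than nested lists.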