Sunday, February 19, 2017

Open position for an experienced bioinformatician at Jastrzębiec (near Warsaw), Poland

A position for an experienced bioinformatician is available at the Institute of Genetics and Animal Breeding of the Polish Academy of Sciences (IGAB PAS) at Jastrzębiec, near Warsaw, Poland. The position is for two years, with a possibility of extension depending on results.

This is primarily a 2-year project funded by NCN (National Science Centre, Poland): "A cis-plus-trans predictive model of mammalian gene expression". This project was ranked first in Poland in biological sciences in the prestigious POLONEZ2 competition, co-funded by the Horizon 2020 Marie Skłodowska-Curie COFUND programme. The post is also likely to involve collaborations with experimentalists (RNA-seq and microarray data). The post will be based at IGHZ in Jastrzębiec, Poland (an institute of the Polish Academy of Sciences close to Warsaw).

What you have:

1. A PhD degree in bioinformatics, computer science, software engineering, or a related discipline.

2. The knowledge and skills to implement: (a) an R/Python pipeline for machine learning and statistical analysis, and (b) a next-generation sequencing pipeline with a focus on RNA-seq.

3. Good publication track record.

4. Fluent English speaker.

5. Excellent English writing skills.

6. Excellent team player.

7. Expected future publication rate of more than two Q1 papers per year.

What can be offered:

1. Full-time 2-year employment with a possibility of extension.

2. Good salary.

3. Internal grants for preliminary studies.

4. Foreign research visits.

5. Inspiring scientific atmosphere and intellectual environment.

6. Excellent career development possibilities (option for Habilitation, and/or promotion to the level of Associate or Full Professor).

7. The possibility of accommodation at the hotel on the site of the Institute.

Candidates are asked to submit the following documents (combined into a single PDF file):

1. Short motivation letter.

2. CV with a publication list (including impact factors and Q-ranking of the respective journals).

3. Scans of the MSc and PhD degree certificates.

4. Reference letters and/or contact details of previous supervisors.

5. List of relevant experiences in the areas crucial for this post:

(i) statistical programming in R;

(ii) pipeline development in Python;

(iii) RNA-seq NGS analysis;

(iv) machine learning: theory and implementation with R packages;

(v) shell scripting / UNIX / cluster computers.

Please send your application package simultaneously to:





Desired starting date: as soon as possible.


Lukasz Huminiecki (Adj. Prof., PhD)

Bioinformatics Team

Atanas G. Atanasov (Adj. Prof., Dr. habil., PhD)

Head of Molecular Biology Department,

Institute of Genetics and Animal Breeding of the Polish Academy of Sciences, 05-552 Jastrzębiec, Poland



The project extends our published [1] integrative computational analyses of freely available human functional genomics data from ENCODE and FANTOM5. We continue to work with the concept of experimentally-defined promoter architectures [1], but extend our modeling towards a cis-plus-trans strategy (to predict tissue-specificity of expression) and through the use of combinatorial approaches (to fish out strongly-determining promoter architectures). The project does not require any new data other than those precisely described by us in high-impact journals [1,2]. This makes our proposal affordable and highly feasible (the project was designed to be high-reward and low-risk by employing two complementary modeling approaches that together form a failsafe synergistic strategy).

Research objectives/hypothesis.

Within the proximal promoter, there are DNA cassettes that facilitate the binding of transcription factors. These cassettes are cis-regulatory; the transcription factors are trans-regulatory. In the past, cis-regulatory cassettes were identified by computer programs, but these programs have prohibitively high false-positive rates. Now, there are massive, freely available databases of experimental data on the binding of transcription factors to proximal promoters (e.g., human ChIP-seq data from the ENCODE project). We have consistently argued that ENCODE-derived, experimentally-defined promoter architectures must revolutionize how we think about modeling gene expression in humans. However, we previously modeled only cis-regulatory elements in the promoter architecture [1]. While the old model could predict the fraction of expressing tissues, it could not predict the identity of expressing tissues [1]. Now, we want to improve on the previous model by pursuing two cis-plus-trans modeling strategies. That is to say, as inputs, we will use not only (i) the architectures of proximal promoters computed as previously described [1], but also (ii) information on the expression levels (or presence / absence) of the transcription factors themselves in the target tissue.
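To make the idea of combining the two kinds of input concrete, here is a minimal Python sketch of how one feature row for a (gene, tissue) pair might be assembled. It is only an illustration under stated assumptions, not the encoding used in [1] or planned for the project: the function name `cis_plus_trans_features`, the expression threshold, the interaction features, and the example transcription factors and expression values are all hypothetical.

```python
from typing import Dict, List, Set


def cis_plus_trans_features(
    promoter_tfs: Set[str],
    tf_expression: Dict[str, float],
    tf_order: List[str],
    expressed_threshold: float = 1.0,  # assumed cut-off, illustrative only
) -> List[float]:
    """Build one feature row for a (gene, tissue) pair.

    cis part:    1.0 if the TF has a binding site in the gene's promoter.
    trans part:  1.0 if the TF itself is expressed in the target tissue.
    interaction: 1.0 only if the TF both binds the promoter and is expressed.
    """
    cis = [1.0 if tf in promoter_tfs else 0.0 for tf in tf_order]
    trans = [
        1.0 if tf_expression.get(tf, 0.0) >= expressed_threshold else 0.0
        for tf in tf_order
    ]
    interaction = [c * t for c, t in zip(cis, trans)]
    return cis + trans + interaction


# Made-up example: three TFs, one promoter, one tissue.
tf_order = ["CTCF", "SP1", "GATA1"]
promoter = {"SP1", "GATA1"}                          # TFs bound at the promoter
liver = {"CTCF": 12.0, "SP1": 5.5, "GATA1": 0.0}     # TF expression in the tissue
row = cis_plus_trans_features(promoter, liver, tf_order)
print(row)
```

Rows built this way for many (gene, tissue) pairs could be stacked into a data matrix, with the observed expression of the gene in that tissue as the label.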

Research methodology.

This project is timely because of the tsunami of data on gene expression and on protein-DNA interactions, e.g. FANTOM5 [3] and ENCODE [4]. We previously integrated these two databases and used the merged dataset to predict the breadth of expression [1] and to demonstrate that haploid expression shaped biased gene content on the X chromosome [2]. We will integrate ENCODE-derived experimentally-defined promoter architectures with FANTOM5 expression data [1]. Promoter architectures will be represented using sets / multisets (and their fuzzy counterparts, which can model uncertainty about promoter architectures due to the variable quality scores of ENCODE peaks). To extend the previously published work, we will create a mathematical and computational framework for representing / searching experimentally-defined promoter architectures and inferring expression from them in two cis-plus-trans strategies: (1) using support vector machines trained on data matrices integrating cis- and trans-regulatory inputs, and (2) with combinatorial workflows tailored, in the face of data heterogeneity, to fish out promoter architectures that strongly determine expression.
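As a rough illustration of the set / multiset representation mentioned above, the following Python sketch stores a promoter architecture as a multiset of transcription factor names (binding-site counts) and, in a fuzzy variant, as membership degrees between 0 and 1. Everything here is an assumption made for the example: the choice of a Jaccard-style similarity, the function names, the way a membership degree would be derived from an ENCODE peak quality score, and the numbers themselves.

```python
from collections import Counter
from typing import Dict


def multiset_jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity of two multisets (TF name -> number of sites)."""
    union = sum((a | b).values())          # element-wise maximum of counts
    if union == 0:
        return 1.0                         # two empty architectures are identical
    return sum((a & b).values()) / union   # element-wise minimum of counts


def fuzzy_jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Jaccard similarity of two fuzzy sets (TF name -> membership in [0, 1]).

    The membership degree is assumed to come from a normalized peak
    quality score; that mapping is hypothetical.
    """
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 1.0


# Made-up architectures for two promoters.
p1 = Counter({"SP1": 2, "CTCF": 1})
p2 = Counter({"SP1": 1, "GATA1": 1})
print(multiset_jaccard(p1, p2))

f1 = {"SP1": 0.9, "CTCF": 0.4}
f2 = {"SP1": 0.6, "GATA1": 0.8}
print(fuzzy_jaccard(f1, f2))
```

A similarity of this kind is one possible building block for searching architectures; the combinatorial workflows described in the text may well use a different measure.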

Expected impact of planned research on development of science, civilization and society.

We can now identify experimentally-defined promoter architectures, and transcription factor combinations / permutations, which robustly support interesting expression characteristics, not just in one locus, but on the scale of the genome! In addition to basic knowledge, this work will facilitate technological progress in genome annotation and the construction of artificial promoters for gene therapy, transgenic farming or as research tools.


1. Hurst LD, Sachenkova O, Daub C, Forrest AR, Huminiecki L (2014) A simple metric of promoter architecture robustly predicts expression breadth of human genes suggesting that most transcription factors are positive regulators. Genome Biology 15: 413.

2. Hurst LD, Ghanbarian AT, Forrest AR, Huminiecki L (2015) The constrained maximal expression level owing to haploidy shapes gene content on the mammalian X chromosome. PLoS Biology 13: e1002315.

3. Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, et al. (2014) A promoter-level mammalian expression atlas. Nature 507: 462-470.

4. ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74.


Prof. Atanas G. Atanasov (Dr. habil., PhD)
