August 1, 2024
Internship in the PIM group
Antoine’s experience at EBI
Lale’s experience at EBI
Introduction
The intersection of biology and computer science has given rise to the rapidly evolving field of bioinformatics. For aspiring scientists in this field, the opportunity to work alongside leading experts at a research institute is a unique experience. Each year, the French Embassy Internship Program makes this dream a reality for a small group of French students, opening the doors to EMBL-EBI.
April 25, 2024
ggCaller installation
Installing ggCaller from Source
Installing ggCaller with Docker
More information about ggCaller
Prerequisites
Before you begin, make sure your system meets the following requirements:
Operating System: Linux (recommended) or macOS.
Compiler: GCC (GNU Compiler Collection) version 4.8 or higher.
CMake: Version 3.1.0 or later.
Git: Version 2.0 or later.
Python: Version 3.6 or later.
Pip: Python package installer.
Basic familiarity with the command line interface.
Please download the example fasta files from this link.
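As a quick way to confirm the prerequisites above before building, here is a minimal sketch in Python. The command names in `REQUIRED_TOOLS` are assumptions about how each tool is exposed on your `PATH` (for example, `pip` may be installed as `pip3`). The script only checks that the tools can be found and that Python is recent enough; it does not check the GCC, CMake or Git versions.

```python
import shutil
import sys

# Assumed command names for the prerequisites; adjust to match your system.
REQUIRED_TOOLS = ["gcc", "cmake", "git", "pip"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the given Python version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
    if not python_ok():
        print("Python 3.6 or later is required.")
```

Finding a tool on `PATH` says nothing about its version, so it is still worth running each tool's own version command by hand.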
December 12, 2023
Multiple horses for multiple courses
This post is about a talk I gave in February 2020 at RSLondonSouthEast, a local conference for research software engineers. First an overview of the talk, then an update after following my own advice for the past few years.
Multiple horses for multiple courses
Choosing a language
When deciding which programming language to use for a project, a useful principle is ‘horses for courses’, i.e. pick the one that is best suited for the task.
October 30, 2023
Graph-based gene prediction with ggCaller
The bacterial pangenome – quantifying within-species diversity
How do you represent a pangenome?
Our work – ggCaller
How does ggCaller compare to existing tools?
Check out the code and the paper
In this blog, I give a brief overview of bacterial pangenome analysis, and what problems our tool, ggCaller, solves.
The bacterial pangenome – quantifying within-species diversity
A genome is a set of biological instructions, known as ‘genes’, which describe how to make and maintain a living organism.
July 18, 2023
Review: Wellcome Ideathon 2023
The format
Our Project (Student Team)
Our Experience
We attended the Wellcome Data Science Ideathon as semi-finalists in July 2023, which saw the Wellcome Trust invite around 100 researchers across 25 teams to compete to answer some of the biggest public health challenges we face today.
The format
The Ideathon was similar to a Hackathon; groups were tasked with answering specific questions in one of three themes - Infectious Disease, Climate & Health, and Mental Health.
February 27, 2023
EMBL-UNESCO Residency in Infection Biology Research
This application is now closed.
EMBL and UNESCO have recently announced a visitor programme as part of their infection biology scheme, in which we are participating (along with many other EMBL groups; see the link at the end for a full list).
Our group is participating in the scheme, and we’d love to hear from interested candidates who would like to work with us. We have particular expertise in analysing genomic data from bacterial pathogens, developing and using new bioinformatic tools to do so, as well as developing mathematical models for pathogen transmission and evolution.
December 13, 2022
Mutation Spectra in Streptococcus pneumoniae
An Introduction to Mutation Spectra
Mutational Spectra in Pneumococcal Epidemiology
Conclusion
An Introduction to Mutation Spectra
It is intuitive that an organism’s ecological, phenotypic, or epidemiological context exposes it to distinct mutagens, and might thus produce specific signatures and patterns of mutation – that organism’s mutational spectrum.
This idea is well established in oncology. Cancer epidemiology studies have shown that a handful of genes, most prominently the human p53 gene, show patterns of mutation specific to the corresponding cancer types.
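To make the idea of a spectrum concrete, here is a minimal sketch in Python that turns a list of single-base substitutions into proportions of each substitution type. It assumes the common convention of reporting every substitution relative to the pyrimidine (C or T) of the base pair, which gives six classes; the function name and input format are illustrative, not taken from the post.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_spectrum(substitutions):
    """Return the proportion of each substitution class.

    `substitutions` is an iterable of (reference_base, alternative_base)
    pairs. Substitutions with a purine reference (A or G) are rewritten on
    the complementary strand, so that G>A is counted as C>T, and so on.
    """
    counts = Counter()
    for ref, alt in substitutions:
        if ref in "AG":
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        counts[f"{ref}>{alt}"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Toy input, made up for illustration only.
    toy = [("C", "T"), ("G", "A"), ("T", "C"), ("A", "G")]
    print(mutation_spectrum(toy))
```

Two samples exposed to different mutagens would be expected to give different proportions across these classes, which is what makes the spectrum useful as a signature.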
Blogs
Peer Review of the pre-print 'Endonuclease fingerprint indicates a synthetic origin of SARS-CoV-2'
Introduction
Constructing the null distribution
Comparing observations to the null distributions
How do new restriction site locations emerge naturally?
Mutation analysis
Conclusion
This is a peer review of the pre-print “Endonuclease fingerprint indicates a synthetic origin of SARS-CoV-2”. It is highly recommended that you read the pre-print first in order to understand this review.
Introduction
The broad thread of the argument in the pre-print is that a synthetically engineered COVID-19 virus would be created using a process where ‘restriction’ enzymes cut the vaccine genome into roughly equal fragments so that they can be cloned in a bacterial system before being reassembled.
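To illustrate what cutting a genome into fragments at restriction sites means, here is a minimal sketch in Python. It is not the analysis from the pre-print or from this review. It assumes a linear sequence and a single recognition site searched on one strand, and it uses the EcoRI site `GAATTC` (which is cut after the first base) purely as a familiar example on a made-up sequence.

```python
def fragment_lengths(genome, site, cut_offset=0):
    """Return the fragment lengths after cutting at every occurrence of `site`.

    `genome` is treated as a linear sequence, and `cut_offset` is the
    position of the cut within the recognition site.
    """
    cuts = []
    start = genome.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = genome.find(site, start + 1)
    boundaries = [0] + cuts + [len(genome)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    # Toy sequence, made up for illustration only.
    toy_genome = "AAAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
    lengths = fragment_lengths(toy_genome, "GAATTC", cut_offset=1)
    print(lengths)
    print("longest fragment:", max(lengths))
```

Whether the fragments are "roughly equal" can then be judged from the list of lengths, for example by looking at the longest one.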
October 7, 2022
Visualising microbial population structure with mandrake
Paper: https://doi.org/10.1098/rstb.2021.0237
(Joint work with Gerry Tonkin-Hill)
Dimensional reduction and embeddings
Our work – mandrake
Running on genome datasets – clusters at multiple resolutions
Give it a go!
Links
What’s the bacronym?
Dimensional reduction and embeddings
Dimension reduction methods are a popular way to understand large amounts of genetic data: PCA, t-SNE and UMAP have all been used to analyse and visualise large numbers of samples in two dimensions (with the latter being particularly popular with single cell techniques).