Study: Protein-truncating Gene Variants Linked to Severe Obesity, Type 2 Diabetes, and Fatty Liver Disease


Ethical Standards

Our research adheres to ethical guidelines. All studies included in the research received approval from appropriate boards or committees. The UK Biobank holds approval from the North West Multi-centre Research Ethics Committee as a Research Tissue Bank (RTB) and all participants provided informed consent. The MCPS and PGR studies were also approved by their respective regulatory bodies, with participants providing informed consent. The SCOOP cohort obtained approval from relevant ethics committees, with consent obtained from participants or their guardians. The INTERVAL study received approval from the National Research Ethics Service Committee, with informed consent obtained from all participants.

UK Biobank Data Processing and Quality Control

We analyzed whole exome sequencing (WES) data from 454,787 individuals in the UK Biobank using established processing methods. Quality control involved normalization and filtering based on read depth, genotype quality, and allele ratios, and variants that failed these criteria were removed. Variants that passed were annotated using Ensembl Variant Effect Predictor with relevant plugins.

To learn more about the impact of protein-truncating variants in the BSN gene on severe adult-onset obesity, type 2 diabetes, and fatty liver disease, read the full article in Nature Genetics.

Frequently Asked Questions (FAQ)

1. What approvals were obtained for the studies mentioned?

Approval was granted by various ethics committees and regulatory bodies for the studies mentioned in the article. Participants provided informed consent before participating in the research.

2. How was the UK Biobank data processed?

The UK Biobank whole exome sequencing data were processed with established methods. Quality control involved normalization and filtering based on read depth, genotype quality, and allele ratios, and variants that passed were annotated with Ensembl Variant Effect Predictor.

3. What annotations were carried out on the variants?

Ensembl Variant Effect Predictor was used to annotate the variants, with additional plugins used for further analysis. Prioritization of Ensembl transcripts based on specific criteria was also performed.


What is the focus of the research described in the article?
The research primarily focuses on conducting exome-wide gene-burden testing in the UK Biobank.

What analytical tools were used in the study?
BOLT-LMM v2.3.6 was used as the primary analytical tool to conduct the gene-burden test in the research.

How was the replication of findings carried out in the study?
The replication of findings was sought in two independent predominantly non-European exome-sequenced cohorts: the MCPS and the PGR study.

What data was collected for the MCPS and PGR study participants?
Data collected for MCPS participants included phenotypic information like height, weight, waist and hip circumferences, and disease history. PGR study participants provided information on lifestyle habits, medical history, family history of diseases, and more.

How were exome sequencing data processed for the MCPS and PGR study participants?
Exome sequencing data for the participants were generated, processed, and annotated using specific platforms and tools. This included aligning reads to the GRCh38 genome reference, variant calling, and annotation with gnomAD MAFs.

Quality Control Steps in Genetic Sequencing

As part of the quality control process, all MCPS and PGR exomes underwent a secondary evaluation using AstraZeneca’s bioinformatics pipeline. This step aimed to ensure the integrity and accuracy of the sequencing data. Several criteria were considered during this screening process:

  • Exclusion of sequences with a VerifyBamID freemix level exceeding 4%.
  • Matching inferred karyotypic sex with self-reported gender.
  • Ensuring at least 94.5% coverage of the consensus coding sequence with a minimum tenfold read depth.
  • Removing one individual from each pair of genetic duplicates or monozygotic twins with a kinship coefficient greater than 0.45, with kinship coefficients estimated using the kinship function from KING v2.2.3.
  • For the MCPS only, additional exclusion of sequences with an average CCDS read depth at least 2 standard deviations below the mean.

After implementing the aforementioned quality control measures, 139,603 (99.0%) MCPS and 37,727 (99.3%) PGR exomes remained eligible for further analysis.
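The per-sample checks above can be sketched in a few lines of Python. This is a minimal illustration, not AstraZeneca's pipeline: the `Sample` record, its field names, and the rule for which member of a related pair to drop are all assumptions made for the example, and the MCPS-specific read-depth criterion is left out.

```python
from dataclasses import dataclass

# Thresholds taken from the criteria listed above.
MAX_FREEMIX = 0.04         # VerifyBamID freemix must not exceed 4%
MIN_CCDS_COVERAGE = 0.945  # fraction of CCDS bases covered at >= 10x
MAX_KINSHIP = 0.45         # above this, a pair is a duplicate or MZ twin


@dataclass
class Sample:
    # Hypothetical record layout; field names are illustrative only.
    sample_id: str
    freemix: float
    inferred_sex: str
    reported_sex: str
    ccds_cov_10x: float


def passes_sample_qc(s):
    """Contamination, sex-concordance and coverage checks for one sample."""
    return (
        s.freemix <= MAX_FREEMIX
        and s.inferred_sex == s.reported_sex
        and s.ccds_cov_10x >= MIN_CCDS_COVERAGE
    )


def drop_duplicates(samples, kinship_pairs):
    """Remove one member of each pair with kinship above MAX_KINSHIP.

    kinship_pairs: iterable of (id_a, id_b, kinship) tuples, for example
    parsed from KING output. The text does not say which member is dropped;
    this sketch arbitrarily drops the second ID of each pair.
    """
    present = {s.sample_id for s in samples}
    dropped = set()
    for id_a, id_b, kinship in kinship_pairs:
        both_present = id_a in present and id_b in present
        already_resolved = id_a in dropped or id_b in dropped
        if kinship > MAX_KINSHIP and both_present and not already_resolved:
            dropped.add(id_b)
    return [s for s in samples if s.sample_id not in dropped]


def run_qc(samples, kinship_pairs):
    """Per-sample filters first, then duplicate removal among survivors."""
    kept = [s for s in samples if passes_sample_qc(s)]
    return drop_duplicates(kept, kinship_pairs)
```

Running the per-sample filters before duplicate removal means a pair is only resolved when both members would otherwise be retained; whether the original pipeline used that order is not stated.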

Predicting Genetic Ancestry

For the MCPS, the genetic ancestry of participants was predicted using PEDDY v0.4.2. This prediction utilized the 1000 Genomes Project sequences as population references. Individuals with a predicted probability of admixed American ancestry greater than or equal to 0.95 and within 4 standard deviations of the means for the top four principal components were retained. Similarly, in the PGR study, individuals with a predicted probability of South Asian ancestry greater than or equal to 0.95 and within 4 standard deviations of the means for the top four principal components were retained. Following ancestry filtering, 137,059 (97.2%) MCPS and 36,280 (95.5%) PGR exomes were included in the subsequent analyses.
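The two-part ancestry filter (probability of at least 0.95, and within 4 standard deviations on each of the top four principal components) might look like the sketch below. The input layout and key names are hypothetical, and the choice to compute means and standard deviations over samples that already pass the probability threshold is an assumption, since the text does not specify the reference set.

```python
from statistics import mean, stdev

MIN_ANCESTRY_PROB = 0.95  # predicted probability of the target ancestry
MAX_SD = 4.0              # allowed distance from the mean, in SDs
N_PCS = 4                 # top four principal components


def ancestry_filter(records):
    """Return IDs of samples that pass both ancestry criteria.

    records: list of dicts with hypothetical keys 'id', 'prob' and 'pcs'
    (a list holding at least four principal-component values).
    """
    # Assumption: means and SDs are computed over samples that pass the
    # probability threshold; the text does not say which samples are used.
    candidates = [r for r in records if r["prob"] >= MIN_ANCESTRY_PROB]
    if len(candidates) < 2:
        return [r["id"] for r in candidates]

    centres = [mean(r["pcs"][i] for r in candidates) for i in range(N_PCS)]
    spreads = [stdev(r["pcs"][i] for r in candidates) for i in range(N_PCS)]

    kept = []
    for r in candidates:
        within = all(
            abs(r["pcs"][i] - centres[i]) <= MAX_SD * spreads[i]
            for i in range(N_PCS)
        )
        if within:
            kept.append(r["id"])
    return kept
```

The same function would serve both cohorts: the `prob` value would hold the admixed American probability for the MCPS and the South Asian probability for the PGR study.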

Association Analysis of BMI and Weight Traits

The association of BMI and weight quantitative traits with genotype at the identified genes of interest was assessed using a gene-level collapsing analysis framework. Variants were classified as PTVs based on specific annotations by SnpEff, such as exon_loss_variant, frameshift_variant, and stop_gained, among others.
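As a rough illustration of gene-level collapsing, the sketch below flags a variant as a PTV when its SnpEff annotation contains one of the terms named above, then records which samples carry at least one such variant per gene. Only the three terms given in the text are included, so the set is knowingly incomplete, and the flattened `(sample_id, gene, annotation)` input is a hypothetical format chosen for the example.

```python
from collections import defaultdict

# Only the three SnpEff terms named in the text; the full list is not given
# there, so this set is deliberately incomplete.
PTV_TERMS = {"exon_loss_variant", "frameshift_variant", "stop_gained"}


def is_ptv(annotation):
    """True if any '&'-joined SnpEff effect term is in PTV_TERMS."""
    return any(term in PTV_TERMS for term in annotation.split("&"))


def collapse_by_gene(calls):
    """Map each gene to the set of samples carrying at least one PTV.

    calls: iterable of (sample_id, gene, annotation) tuples, a hypothetical
    flattened view of annotated genotype calls.
    """
    carriers = defaultdict(set)
    for sample_id, gene, annotation in calls:
        if is_ptv(annotation):
            carriers[gene].add(sample_id)
    return dict(carriers)
```

The resulting carrier sets give a per-gene binary indicator (carrier or non-carrier) that can then be tested against BMI or weight.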


What is the purpose of quality control in genetic sequencing?

Quality control steps in genetic sequencing aim to ensure the accuracy, reliability, and integrity of the sequencing data by applying stringent criteria for data inclusion and exclusion.

How is genetic ancestry predicted in research studies?

Genetic ancestry prediction in research studies utilizes population reference datasets and statistical algorithms to infer the ancestral origins of study participants based on genetic markers.

What are PTVs, and how are they identified in genetic analyses?

PTVs, or protein-truncating variants, are specific types of genetic mutations that can have functional implications. In genetic analyses, PTVs are identified based on annotations that indicate disruptive changes to protein-coding sequences.

Rare genetic variants in specific genes like SETD1A have been linked to conditions such as schizophrenia and developmental disorders. The SCOOP and INTERVAL studies involved analyzing these genetic variants in a large number of cases and controls to understand their impact on health. High-impact variants in the BSN gene were identified using advanced bioinformatics tools to study their effects on protein function. Interestingly, missense variants were found in most coding regions of the BSN gene in both case and control groups, indicating similar detection rates.

In the INTERVAL study, a potentially disruptive variant in the BSN gene was identified but was deemed unlikely to significantly affect protein expression levels. Further investigations are needed to confirm the impact of this variant on bassoon protein function. Additionally, genetic burden analysis revealed interesting findings related to stop-gain variants in the BSN gene between SCOOP cases and INTERVAL controls, suggesting a possible association with the risk of certain conditions.

Phenome-wide analysis conducted using data from the UK Biobank revealed insights into various binary and quantitative traits. By analyzing a vast array of phenotypes, researchers aimed to uncover associations between genetic variants in newly prioritized genes and diverse health outcomes. Quality control measures were implemented to ensure the accuracy and reliability of the findings, focusing on individuals of European descent to maintain genetic homogeneity in the study population.

The study employed advanced statistical models to evaluate the relationship between genetic variants and phenotypic traits of interest. By leveraging cutting-edge bioinformatics tools and large-scale genetic data from the UK Biobank, researchers aimed to deepen our understanding of how rare genetic variants contribute to human health and disease. Further research is warranted to validate these findings and explore the functional implications of identified genetic variants.


What are the SCOOP and INTERVAL studies?

The SCOOP and INTERVAL studies aimed to investigate the impact of rare genetic variants, particularly in genes like SETD1A and BSN, on health conditions such as schizophrenia, developmental disorders, and obesity. These studies involved analyzing genetic data from a large number of cases and controls to understand the role of specific variants in disease development.

What is the significance of high-impact variants in the BSN gene?

High-impact variants in the BSN gene, such as stop-gain mutations, frameshifts, and splice-disrupting variants, could potentially alter the function of the bassoon protein. Understanding the impact of these variants is crucial in unraveling their role in health and disease.

How were the genetic variants in the BSN gene analyzed?

The genetic variants in the BSN gene were analyzed using bioinformatics tools like VEP and Ensembl. By annotating and studying these variants, researchers could identify potential disruptive variants and assess their impact on protein function and expression levels.

What did the phenome-wide analysis in the UK Biobank reveal?

The phenome-wide analysis in the UK Biobank provided insights into a wide range of phenotypic traits and their associations with genetic variants in newly prioritized genes. By analyzing diverse health outcomes, researchers aimed to uncover potential links between genetic variations and various health conditions.

Association testing for other anthropometric phenotypes and protein expression levels

We tested carriers of high-confidence (HC) PTVs in APBA1 and BSN, as well as carriers of a common BMI-associated variant (rs9843653) at the BSN locus, for association with various anthropometric phenotypes in the UK Biobank and with normalized protein expression levels measured on the Olink platform. The tests were performed using R v3.6.3 with covariates similar to those used in the exome-wide gene-burden tests. Detailed information on the Olink proteomics assay, data processing, and quality control can be found in the study by Sun et al. (Reference 23). For the protein association tests, additional covariates such as age², sex, Olink batch, UK Biobank center, genetic array, number of proteins measured, and the first 20 genetic PCs were incorporated, as suggested by Sun et al. A Bonferroni-corrected threshold for significance was set at P < 3.42 × 10⁻⁵.
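A Bonferroni threshold is the family-wise error rate divided by the number of tests. The text gives the threshold but not the number of tests or the error rate, so the short sketch below works backwards under the assumption of a conventional 0.05 family-wise rate; the function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests


def implied_n_tests(alpha, threshold):
    """Number of tests implied by a reported Bonferroni threshold."""
    return round(alpha / threshold)


# Assumption: alpha = 0.05; the reported threshold is 3.42e-5.
n = implied_n_tests(0.05, 3.42e-5)
print(n, f"{bonferroni_threshold(0.05, n):.2e}")
```

Under that assumption the reported threshold corresponds to roughly 1,462 tests, though rounding in the published figure means nearby counts would give the same value.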

BMI GWAS lookup and downstream analyses

We searched for proximal BMI GWAS signals near the identified genes within a specific genomic range using data from the UK Biobank. Signals within a designated region around each gene were further verified in an independent BMI GWAS study. Colocalization tests were performed using the approximate Bayes factor method to identify regions showing evidence of colocalization. Additionally, gene-level common variant associations were calculated using MAGMA v1.09, incorporating common nonsynonymous variants within each gene and aggregating them into pathway-level associations when applicable.
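For readers unfamiliar with approximate Bayes factor colocalization, the sketch below shows one common formulation: a Wakefield-style log Bayes factor per variant, combined across a region into posterior probabilities for five hypotheses (H0 no association, H1 trait 1 only, H2 trait 2 only, H3 both traits with different causal variants, H4 both traits sharing one causal variant). The text does not name the software or settings used, so the prior probabilities and the prior effect variance here are assumptions borrowed from widely used defaults, not values from the study.

```python
import math


def log_abf(z, v, w):
    """Log approximate Bayes factor in favour of association for one variant.

    z: Z-score; v: variance of the effect estimate (SE squared);
    w: prior variance of the true effect (an assumed value).
    """
    r = w / (v + w)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def logsum(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one region.

    l1, l2: per-variant log ABFs for the two traits, in the same variant
    order. p1, p2, p12 are assumed prior probabilities, not study values.
    """
    lsum = [a + b for a, b in zip(l1, l2)]
    log_h = [
        0.0,
        math.log(p1) + logsum(l1),
        math.log(p2) + logsum(l2),
        math.log(p1) + math.log(p2)
        + logdiff(logsum(l1) + logsum(l2), logsum(lsum)),
        math.log(p12) + logsum(lsum),
    ]
    denom = logsum(log_h)
    return [math.exp(x - denom) for x in log_h]
```

A high posterior for H4 is usually read as evidence that the two signals share a causal variant, while a high posterior for H3 suggests distinct signals in the same region.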

Interaction effect between the PGS and PTV carrier status

The potential interaction between PTV carrier status for BSN and APBA1 and the polygenic score (PGS) was investigated using linear regression. The model adjusted for sex, age, age², and the first 10 PCs. The PGS was constructed using genotype and exome sequencing data from individuals of white European ancestry in the UK Biobank. Summary statistics from Locke et al. were utilized for this analysis, and the construction of the PGS was performed using the ‘lassosum’ package in R v3.6.0.
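An interaction model of this kind adds a product term (PGS × carrier status) to the regression, and the coefficient on that term measures whether the PGS effect differs in carriers. The sketch below fits such a model on simulated data using only the Python standard library. It is an illustration rather than the study's R code: the effect sizes, the 5% carrier frequency, and the omission of the genetic PCs are all simplifications chosen for the example.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design_row(pgs, carrier, sex, age):
    """Columns: intercept, PGS, carrier, PGS x carrier, sex, age, age^2."""
    age_c = (age - 55.0) / 10.0  # centred and scaled for numerical stability
    return [1.0, pgs, carrier, pgs * carrier, sex, age_c, age_c ** 2]


def simulate(n=2000, noise_sd=1.0, seed=0):
    """Toy data with a built-in interaction effect of 0.8 (arbitrary)."""
    rng = random.Random(seed)
    true_beta = [27.0, 1.0, 3.0, 0.8, 0.5, 0.3, -0.1]  # made-up effects
    X, y = [], []
    for _ in range(n):
        pgs = rng.gauss(0.0, 1.0)
        carrier = 1.0 if rng.random() < 0.05 else 0.0  # far rarer in reality
        sex = 1.0 if rng.random() < 0.5 else 0.0
        age = rng.uniform(40.0, 70.0)
        row = design_row(pgs, carrier, sex, age)
        bmi = sum(b * x for b, x in zip(true_beta, row))
        bmi += rng.gauss(0.0, noise_sd)
        X.append(row)
        y.append(bmi)
    return X, y


X, y = simulate()
beta = ols(X, y)
print(f"estimated PGS x carrier interaction: {beta[3]:.2f}")
```

In practice a statistics package would also report a standard error and P value for the interaction term; this sketch stops at the point estimate.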

Cellular work and single-cell analyses

Methods employed for cellular work and single-cell analyses are detailed in the Supplementary Note associated with this study.


What are the key components of the association testing conducted in the study?

The association tests in the study involved carriers of HC PTVs in APBA1 and BSN, as well as carriers of a common BMI-associated variant at the BSN locus, with various anthropometric phenotypes using data from the UK Biobank and normalized protein expression data from the Olink platform.

How was the PGS for the interaction effect analysis constructed?

The PGS was constructed using genotype and exome sequencing data from individuals of white European ancestry in the UK Biobank, with summary statistics from Locke et al. and the ‘lassosum’ package in R v3.6.0. It was then tested for interaction with PTV carrier status for BSN and APBA1.

What methods were used for the downstream analyses in the study?

The downstream analyses included searching for proximal BMI GWAS signals near identified genes, performing colocalization tests, and calculating gene-level common variant associations using MAGMA v1.09, among other methods.

Where can detailed information on cellular work and single-cell analyses be found?

Detailed information on cellular work and single-cell analyses can be found in the Supplementary Note linked with the study.
