The Impact of Structural Variations in the Human Genome on Disease Susceptibility

The human genome is a complex and dynamic blueprint that influences many aspects of our health. Structural variations (SVs), large-scale alterations in the DNA sequence, are among the features with the greatest potential to affect disease susceptibility. Understanding these variations helps scientists develop better diagnostic tools and treatments.

What Are Structural Variations?

Structural variations (SVs) are genomic rearrangements involving segments of DNA typically longer than 50 base pairs. They include deletions, duplications, insertions, inversions, and translocations. Unlike single nucleotide polymorphisms (SNPs), which change only one base, SVs can span entire genes, regulatory regions, or even multiple genes. These large-scale changes often have more profound biological effects because they can alter gene dosage, disrupt coding sequences, or modify long-range regulatory interactions.

The discovery of SVs dates back to early cytogenetic studies using karyotyping, which could detect chromosomal abnormalities such as translocations and large deletions. With the advent of microarray technologies and next-generation sequencing (NGS), researchers now routinely identify thousands of SVs per genome. Many SVs are common in the population and may be benign, but others are strongly associated with disease. The 1000 Genomes Project and the Genome Aggregation Database (gnomAD) have cataloged millions of SVs, revealing their prevalence and diversity across human populations.

Key Differences Between SVs and SNPs

  • Size: SVs are at least 50 bp; SNPs are single base pairs.
  • Impact: SVs change tens to millions of nucleotides at once, potentially disrupting protein structure or gene regulation entirely; SNPs may have subtle or no effect.
  • Detection: accurate SV characterization requires specialized algorithms and often benefits from long-read sequencing; SNPs are readily called from short-read data.
  • Frequency: SVs are less numerous than SNPs but account for a larger fraction of variable base pairs between individual genomes.
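
The size criterion in the list above can be expressed as a small classifier. This is a minimal sketch, assuming a variant is described only by its reference and alternate allele strings; the function name classify_variant and the constant SV_MIN_LENGTH are illustrative, and symbolic alleles or breakpoint notation used by real variant callers are not handled.

```python
SV_MIN_LENGTH = 50  # bp; the size threshold this article uses to define an SV


def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant by the size of the change between two alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP" if ref != alt else "no change"
    # Net number of bases gained or lost relative to the reference allele.
    size = abs(len(alt) - len(ref))
    if size >= SV_MIN_LENGTH:
        return "SV"
    return "small indel or substitution"
```

For example, classify_variant("A", "G") returns "SNP", while a 60-base insertion such as classify_variant("A", "A" + "T" * 60) returns "SV". Balanced events such as inversions do not change allele length, so a length-based rule like this one cannot recognize them.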

Types of Structural Variations

Structural variations fall into several distinct categories, each with unique mechanisms of formation and consequences for genome function.

Deletions

Deletions remove a segment of DNA. When a deletion eliminates an entire gene or critical regulatory element, it can cause loss of function. For example, homozygous deletion of the SMN1 gene causes spinal muscular atrophy. Deletions can also occur in non-coding regions, disrupting enhancers or splicing signals.

Duplications

Duplications create extra copies of a DNA segment. Tandem duplications produce adjacent repeats, while segmental duplications may be interspersed. Increased gene dosage from duplications can lead to overexpression. Duplications of the PMP22 gene cause Charcot-Marie-Tooth disease type 1A, a peripheral neuropathy.

Insertions

Insertions add new DNA sequences, which may originate from other genomic locations (e.g., transposable elements) or be novel sequences. Insertions can disrupt genes if they land in exons or regulatory regions. Large insertions, such as those from LINE-1 retrotransposons, have been linked to hemophilia A and certain cancers.

Inversions

Inversions reverse the orientation of a DNA segment. If breakpoints occur within a gene, inversions can disrupt that gene. More commonly, inversions do not change gene copy number but can affect gene expression by altering chromatin architecture or separating regulatory elements from their target genes. The ~900 kb inversion on chromosome 17q21.31 is a classic example found in the human population.

Translocations

Translocations involve the exchange of DNA between non-homologous chromosomes. Balanced translocations may not cause loss of genetic material but can create fusion genes (e.g., BCR-ABL1 in chronic myeloid leukemia). Unbalanced translocations result in gains or losses of chromosomal segments, often causing developmental disorders.
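
The five categories above can be illustrated on toy sequences. The following is a minimal sketch, assuming chromosomes are plain Python strings and coordinates are 0-based, half-open positions; all function names are illustrative, and the code models only the resulting sequence, not the biological mechanism that produces each rearrangement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def delete(seq, start, end):
    """Remove the segment seq[start:end]."""
    return seq[:start] + seq[end:]


def tandem_duplicate(seq, start, end):
    """Place a second copy of seq[start:end] directly after the original."""
    return seq[:end] + seq[start:end] + seq[end:]


def insert(seq, pos, new_seq):
    """Add new_seq before position pos."""
    return seq[:pos] + new_seq + seq[pos:]


def invert(seq, start, end):
    """Replace seq[start:end] with its reverse complement."""
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def reciprocal_translocation(chr_a, break_a, chr_b, break_b):
    """Swap the segments after each breakpoint between two chromosomes."""
    new_a = chr_a[:break_a] + chr_b[break_b:]
    new_b = chr_b[:break_b] + chr_a[break_a:]
    return new_a, new_b
```

With seq = "AACAGGTT", delete(seq, 2, 5) gives "AAGTT", tandem_duplicate(seq, 2, 5) gives "AACAGCAGGTT", and invert(seq, 2, 5) gives "AACTGGTT". Note that only the deletion and duplication change the total amount of sequence; the inversion and the reciprocal translocation are balanced.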

Detection Methods for Structural Variations

Identifying SVs in the human genome has historically been challenging due to their size and complexity. Modern detection relies on several complementary technologies.

Short-Read Sequencing

Most NGS platforms produce reads of 100–300 base pairs. SVs are inferred using paired-end mapping, read depth, split reads, or de novo assembly. Each method has biases: paired-end mapping detects deletions well but can only capture insertions smaller than the library's insert size and loses sensitivity in repetitive regions; read depth analysis reveals copy number changes but not balanced rearrangements. Despite these limitations, short-read sequencing remains the most widely used approach for population-scale SV discovery.
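
The paired-end mapping signal can be sketched as a rule that compares each read pair with what the sequencing library is expected to produce. This is a simplified illustration, not the logic of any particular SV caller: the ReadPair fields, the default library mean of 400 bp, the standard deviation of 50 bp, and the three-standard-deviation cutoff are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    same_chrom: bool   # both mates map to the same chromosome
    insert_size: int   # distance the pair spans on the reference, in bp
    orientation: str   # "FR" is the expected forward/reverse orientation


def paired_end_signature(pair, mean_insert=400, sd_insert=50, n_sd=3):
    """Return the SV type suggested by one read pair, or None if concordant."""
    if not pair.same_chrom:
        return "translocation"
    if pair.orientation in ("FF", "RR"):
        return "inversion"           # mates point the same way
    if pair.orientation == "RF":
        return "tandem duplication"  # mates point away from each other
    upper = mean_insert + n_sd * sd_insert
    lower = mean_insert - n_sd * sd_insert
    if pair.insert_size > upper:
        return "deletion"   # mates map farther apart than the library allows
    if pair.insert_size < lower:
        return "insertion"  # mates map closer together than expected
    return None
```

A pair spanning 2,000 bp on the reference with the defaults above is reported as a candidate deletion, because the sample fragment was only about 400 bp long. In practice a single discordant pair is weak evidence; callers typically require a cluster of pairs supporting the same breakpoints.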

Long-Read Sequencing

Technologies from Pacific Biosciences (PacBio) and Oxford Nanopore generate reads of 10–100 kb or more. Long reads can span entire SVs, enabling direct detection of breakpoints and complex rearrangements. These platforms have dramatically improved SV identification, especially in repetitive regions that are largely inaccessible to short-read methods. The per-genome cost is still higher than for short-read sequencing, but long-read sequencing is becoming standard for high-resolution genome analysis.
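
Because a long read can contain an entire SV, large insertions and deletions often appear directly in the read's alignment. The sketch below scans a SAM-style CIGAR string for insertion (I) and deletion (D) operations of at least 50 bp. It is a minimal illustration, assuming a single alignment given as a 0-based reference start and a CIGAR string; the function name svs_from_cigar is hypothetical, and SVs represented as split or supplementary alignments are not handled.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance the reference position


def svs_from_cigar(ref_start, cigar, min_size=50):
    """Return (type, reference_position, length) for large I and D operations."""
    calls = []
    pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "D" and length >= min_size:
            calls.append(("DEL", pos, length))
        elif op == "I" and length >= min_size:
            calls.append(("INS", pos, length))
        if op in REF_CONSUMING:
            pos += length
    return calls
```

For an alignment starting at position 1000 with CIGAR "500M120D300M2I200M75I100M", the function reports a 120 bp deletion at 1500 and a 75 bp insertion at 2120, and ignores the 2 bp insertion because it falls below the size threshold.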

Optical Mapping

Optical mapping platforms, such as those from Bionano Genomics, fluorescently label specific sequence motifs on extremely long DNA molecules. The resulting genome-wide maps can detect large insertions, deletions, inversions, and translocations with high confidence. Optical mapping is often used to validate SVs found by sequencing.

Microarrays

Comparative genomic hybridization (CGH) arrays and SNP arrays detect copy number variations (deletions and duplications) at resolutions down to ~10 kb. They are still used clinically for developmental delay and cancer diagnostics but cannot identify balanced rearrangements.
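
Array and read-depth results are commonly summarized as a log2 ratio of test signal to reference signal. The sketch below shows the arithmetic that links such a ratio to copy number, under the simplifying assumptions of a diploid reference and a pure, non-mosaic sample; the function names are illustrative, and real data need normalization and segmentation before a calculation like this applies.

```python
import math


def expected_log2_ratio(copy_number, ploidy=2):
    """Log2 ratio expected for a pure sample with the given copy number."""
    if copy_number == 0:
        return float("-inf")  # a homozygous deletion has no finite log2 ratio
    return math.log2(copy_number / ploidy)


def copy_number_from_log2(log2_ratio, ploidy=2):
    """Nearest integer copy number implied by a measured log2 ratio."""
    return round(ploidy * 2 ** log2_ratio)
```

Under these assumptions a heterozygous deletion (one copy) gives a log2 ratio of -1, and a single-copy duplication (three copies) gives about 0.585, which is why gains produce a smaller shift than losses and are harder to detect.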

Impact on Disease Susceptibility

Structural variations can disrupt gene function or regulation, leading to increased risk for various diseases. The effects depend on the type, size, and genomic context of the SV.

Gene Dosage Effects

Deletions and duplications alter the number of copies of a gene. Haploinsufficiency arises when a single functional copy cannot supply enough gene product for normal function. Conversely, duplication, triplication, or amplification can cause disease through overexpression. For instance, duplication of the APP gene causes early-onset Alzheimer's disease due to increased amyloid beta production.

Disruption of Coding Regions

Inversions or deletions that break within an exon can produce truncated proteins or induce nonsense-mediated decay. This mechanism underlies many Mendelian disorders, such as Duchenne muscular dystrophy from large DMD deletions.
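
Whether a deletion is likely to truncate a protein depends partly on how many coding bases it removes. The following sketch applies a simplified reading-frame check: it assumes the exons are given as 0-based, half-open (start, end) intervals that are entirely coding, and it ignores splice sites, untranslated regions, and transcript isoforms. The function names are illustrative.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def deletion_effect(del_start, del_end, exons):
    """Summarize how a deletion affects a list of coding exons."""
    removed = sum(overlap(del_start, del_end, s, e) for s, e in exons)
    if removed == 0:
        return "no coding sequence removed"
    if removed % 3 == 0:
        # Codons downstream of the deletion keep their original frame.
        return f"in-frame loss of {removed} coding bases"
    # Downstream codons are shifted, which usually creates a premature stop.
    return f"frameshift: {removed} coding bases removed"
```

With exons at (100, 200) and (300, 390), a deletion of 150 to 350 removes 100 coding bases and is reported as a frameshift, whereas a deletion of exactly the second exon removes 90 bases and stays in frame.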

Alteration of Regulatory Elements

SVs can remove or reposition enhancers, silencers, or promoters. Even if the gene coding region is intact, loss of a distant enhancer can abolish expression. For example, deletions upstream of the SOX9 gene cause campomelic dysplasia. Similarly, translocations that move a strong enhancer next to an oncogene (e.g., MYC in Burkitt lymphoma) drive cancer.

Genomic Instability

Some SVs create genomic architecture prone to further rearrangements. In regions with segmental duplications, non-allelic homologous recombination (NAHR) frequently generates recurrent deletions or duplications. These hotspots are associated with genomic disorders such as Smith-Magenis syndrome and Charcot-Marie-Tooth disease type 1A.
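
The outcome of NAHR depends on the relative orientation of the two repeats involved: recombination between directly oriented repeats yields a deletion and a reciprocal duplication, while recombination between inverted repeats yields an inversion. The sketch below encodes that rule. It is a toy model, assuming both repeats lie on the same chromosome and are given as (start, end) tuples; the function name is hypothetical, and the reported interval is only the unique sequence between the repeats, an approximation of the true rearranged segment.

```python
def nahr_outcomes(repeat_a, repeat_b, same_orientation):
    """Predict rearrangements from NAHR between two repeats on one chromosome.

    repeat_a must lie entirely upstream of repeat_b.
    """
    a_start, a_end = repeat_a
    b_start, b_end = repeat_b
    if a_end > b_start:
        raise ValueError("repeat_a must lie entirely upstream of repeat_b")
    # Approximation: report the sequence flanked by the two repeats.
    interval = (a_end, b_start)
    if same_orientation:
        return [("deletion", interval), ("duplication", interval)]
    return [("inversion", interval)]
```

For repeats at (1000, 2000) and (50000, 51000) in the same orientation, the function predicts both a deletion and a duplication of the interval (2000, 50000), which mirrors how a single pair of segmental duplications can underlie reciprocal deletion and duplication syndromes.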

Cancer

Somatic structural variations are hallmarks of cancer genomes. Deletions of tumor suppressor genes (e.g., TP53, RB1, CDKN2A) remove brakes on cell proliferation. Amplifications of oncogenes (e.g., ERBB2 in breast cancer, MYCN in neuroblastoma) drive uncontrolled growth. Chromothripsis, a catastrophic shattering and reassembly of chromosomes, produces hundreds of SVs in a single event and is common in bone cancers. Sequencing cancer genomes now routinely identifies actionable SV-derived fusions, such as EML4-ALK in lung adenocarcinoma.

Neurodevelopmental and Psychiatric Disorders

Recurrent deletions and duplications at the 16p11.2 locus are strongly associated with autism spectrum disorder (ASD). Duplications at this locus have also been linked to schizophrenia, while deletions are associated with intellectual disability and developmental delay. The 22q11.2 deletion causes DiGeorge syndrome, featuring cardiac defects, immune deficiency, and high risk for psychosis. Large-scale studies show that rare, large SVs contribute to risk across autism, schizophrenia, and developmental delay.

Inherited Diseases

Cystic fibrosis can result from large deletions in the CFTR gene, though point mutations are more common. Spinal muscular atrophy is almost always due to homozygous deletion of SMN1. In hemophilia A, inversions involving the F8 gene are a frequent cause. These examples highlight how SVs can be the primary genetic cause of classic Mendelian disorders.

Cardiovascular and Metabolic Diseases

Structural variations also influence complex diseases. The 9p21.3 locus is a well-established risk region for coronary artery disease, although its risk haplotype is defined mainly by common SNPs rather than by a single deletion. Copy number variation in the kringle IV type 2 repeat of LPA affects lipoprotein(a) levels and thereby cardiovascular risk. In obesity, deletions at 16p11.2 again appear, this time as a cause of severe early-onset obesity, demonstrating the pleiotropy of SVs.

Clinical Implications and Therapeutic Targeting

Diagnostic Screening

Clinical genetic testing increasingly includes SV detection. Chromosomal microarray analysis (CMA) is first-tier for developmental delay and congenital anomalies. Whole-exome sequencing (WES) and whole-genome sequencing (WGS) now incorporate SV calling algorithms. For cancer, liquid biopsy assays can detect copy number amplifications and fusions from circulating tumor DNA.

Pharmacogenomics

Copy number variations in drug-metabolizing enzymes affect drug response. For example, duplications of functional CYP2D6 alleles cause ultrarapid metabolism of codeine and many antidepressants, while individuals who carry a whole-gene deletion (the CYP2D6*5 allele) on both chromosomes are poor metabolizers. Understanding SVs in pharmacogenes is crucial for personalized dosing.
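
The relationship between gene copy number and metabolizer status can be sketched as a lookup. This is a deliberately coarse illustration, not a clinical scheme: it assumes every CYP2D6 copy is either fully functional or absent, whereas real phenotype assignment scores each star allele individually, including reduced-function alleles. The function name and category boundaries are assumptions made for the example.

```python
def metabolizer_phenotype(functional_copies):
    """Map a count of functional CYP2D6 gene copies to a coarse phenotype."""
    if functional_copies < 0:
        raise ValueError("copy number cannot be negative")
    if functional_copies == 0:
        return "poor"          # e.g. whole-gene deletion on both chromosomes
    if functional_copies == 1:
        return "intermediate"  # one functional copy, one deleted
    if functional_copies == 2:
        return "normal"
    return "ultrarapid"        # duplication or multiplication of the gene
```

Under this simplification, a person with a deletion on both chromosomes maps to "poor" and a person carrying a duplication alongside a normal allele (three copies) maps to "ultrarapid", matching the two extremes described above.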

Gene Therapy and Genome Editing

Advances in gene editing offer potential corrective strategies for pathogenic SVs. CRISPR-Cas9 can excise duplicated regions, induce deletions to remove toxic gain-of-function mutations, or disrupt fusion genes. In clinical trials, zinc-finger nucleases have been used to disrupt the CCR5 gene in patients' T cells, mimicking the naturally occurring CCR5-Δ32 deletion that confers resistance to HIV. Adeno-associated virus (AAV) vectors can deliver a functional gene copy to compensate for deletions (e.g., SMN1 in SMA), although their limited packaging capacity (roughly 4.7 kb) restricts the size of the transgene.

Future Directions

Research into structural variations is accelerating, driven by new technologies and large-scale genomics initiatives.

Population-Scale Studies

Projects like the 1000 Genomes Project, gnomAD, and the UK Biobank are cataloging SVs across diverse populations. These resources enable GWAS for SVs, reveal selective pressures, and improve variant interpretation. They also highlight the underappreciated role of SVs in common complex traits.

Functional Annotation

Understanding which SVs are pathogenic requires large-scale functional assays. CRISPR screens can introduce defined SVs into cells and measure effects on growth, gene expression, or drug sensitivity. The ENCODE Consortium is expanding its annotation to include structural variant effects on chromatin and transcription.

Integration with Other Omics

Combining SV data with transcriptomics, proteomics, and epigenomics provides a systems-level view. Allele-specific expression analysis can reveal how SVs alter gene regulation. Long-read RNA sequencing (for example, PacBio Iso-Seq) can directly detect SV-induced fusion transcripts.

Clinical Implementation

As sequencing costs drop, whole-genome sequencing will likely replace arrays for clinical SV detection. Long-read sequencing is being evaluated for rapid diagnosis in neonatal intensive care units. Machine learning models trained on large SV databases are expected to improve automated discrimination between pathogenic and benign variants.

Conclusion

Understanding the role of structural variations in disease susceptibility is essential for improving diagnostics, developing new treatments, and advancing precision medicine. SVs contribute to a wide range of conditions, from cancer and neuropsychiatric disorders to inherited Mendelian diseases and cardiometabolic traits. With improved detection technologies, functional studies, and population-scale resources, researchers are poised to translate knowledge of SVs into tangible clinical benefits. Continued research will clarify the relationship between our genome's structure and health outcomes.
