Structural variants (SVs) represent a major class of genomic alterations that involve DNA segments typically larger than 50 base pairs. These rearrangements include deletions, duplications, inversions, translocations, and insertions, and they collectively account for more base-pair differences between individuals than single nucleotide polymorphisms. Understanding the impact of SVs on gene expression and phenotypic traits is crucial for unraveling the genetic basis of human disease, evolutionary adaptation, and normal variation. Advances in sequencing technologies have revealed that SVs are ubiquitous in the genome and often exert profound effects by altering gene dosage, disrupting regulatory elements, or modifying chromatin architecture. This article provides a comprehensive overview of structural variant types, their mechanisms of action on gene expression, and their contributions to phenotypic diversity and disease.

Types and Mechanisms of Structural Variants

Structural variants are classified by the nature and orientation of the rearrangement. Although the traditional categories remain useful, modern genomics recognizes a continuum from simple SVs to complex rearrangements involving multiple breakpoints.

Deletions

Deletions involve the loss of a DNA segment ranging from ~50 bp to several megabases. When a deletion removes a gene's coding region, the result is often a complete loss of function, as seen in alpha-thalassemia where deletions of the alpha-globin genes reduce hemoglobin production. Deletions also frequently affect regulatory regions, such as enhancers or promoters, leading to altered gene expression without directly disrupting the gene itself. Hemizygous deletions can unmask recessive alleles or cause haploinsufficiency, where a single functional copy is insufficient for normal function.

Duplications

Duplications create extra copies of a genomic segment. Tandem duplications place the extra copy adjacent to the original, while interspersed duplications insert the copy elsewhere, on the same or a different chromosome. The primary effect is increased gene dosage, which can drive overexpression. A well-known example is the duplication of the amylase 1 (AMY1) gene, which elevates salivary amylase production and has been linked to high-starch diets in certain populations. Duplications can also generate novel gene fusions at their breakpoints or disrupt regulatory landscapes by changing the copy number of regulatory elements.

Inversions

Inversions reverse the orientation of a DNA segment. Breakpoints can directly interrupt genes, but a major functional impact arises when an inversion separates a gene from its regulatory elements or places it under the control of a different promoter. Inversions can also suppress recombination in heterozygotes, so that the inverted and non-inverted haplotypes evolve largely independently. A classic example is the common inversion polymorphism at chromosome 8p23.1, which has been associated with predisposition to recurrent rearrangements of the region and, in some studies, with altered risk for autoimmune disease, potentially through effects on the nearby copy-number-variable defensin genes.

Translocations

Translocations involve the exchange of DNA segments between non-homologous chromosomes. Balanced translocations (no net loss or gain) often disrupt genes at breakpoints or create fusion genes, such as the BCR-ABL1 fusion in chronic myeloid leukemia. Unbalanced translocations result in partial trisomies or monosomies, leading to severe developmental disorders. Translocations also alter the nuclear positioning of genomic regions, which can affect gene expression through changes in the local chromatin environment.

Complex Structural Variants

Advances in long-read sequencing have uncovered complex SVs involving multiple breakpoints and rearrangements that do not fit simple categories. These include chromothripsis (massive local shattering and reassembly), chromoplexy (interconnected rearrangements across chromosomes), and tandem duplications with inversions. Such complex events are particularly common in cancer genomes and are increasingly recognized in germline disorders.

Impact on Gene Expression

Structural variants influence gene expression through several distinct mechanisms, often acting in combination to produce quantitative or qualitative changes in transcript levels. The effects can be local, acting on genes at or near the variant, or distal, acting on genes at a distance through long-range interactions.

Gene Disruption and Fusion

When a breakpoint lands within a gene, the resulting rearrangement can truncate the coding sequence, delete exons, or fuse two different genes. These events often lead to loss-of-function alleles or neomorphic fusion proteins with altered activity. For example, deletions in the DMD gene cause Duchenne muscular dystrophy by eliminating essential dystrophin domains. Gene fusions like EML4-ALK in lung cancer create constitutively active kinases that drive proliferation.

Alteration of Regulatory Elements

SVs can delete, duplicate, or reposition enhancers, silencers, promoters, and insulators. A deletion removing a tissue-specific enhancer may silence a gene expressed only in that tissue, while duplication of an enhancer can cause ectopic expression. An inversion or translocation can separate a gene from its native regulatory elements or place it under the control of a new one. In some forms of campomelic dysplasia, for example, SOX9 is misregulated by a rearrangement that separates the intact gene from its distal enhancers.

Dosage Effects and Imbalances

Copy number variants (deletions and duplications) directly change the number of functional copies per cell. For dosage-sensitive genes, even a 50% reduction (haploinsufficiency) or a 50% increase (triplosensitivity) can cause disease. The human genome contains many haploinsufficient genes, such as TBX1 in 22q11.2 deletion syndrome, and RERE in neurodevelopmental disorders. Conversely, duplications of dosage-sensitive regions, like the duplication of PMP22 causing Charcot-Marie-Tooth disease type 1A, illustrate how increased copy number produces pathology.
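
The dosage arithmetic in this paragraph can be made concrete with a minimal sketch. It assumes a diploid baseline of two copies and that dosage scales linearly with copy number, which is a simplification; the function name `dosage_change` is hypothetical and chosen only for illustration.

```python
import math


def dosage_change(copy_number: int, baseline: int = 2) -> tuple[float, float]:
    """Return (relative dosage, log2 ratio) versus a baseline copy number.

    Assumes dosage is proportional to copy number (an illustrative simplification).
    """
    if copy_number < 0 or baseline <= 0:
        raise ValueError("copy_number must be >= 0 and baseline must be > 0")
    ratio = copy_number / baseline
    # A homozygous deletion (0 copies) has no finite log2 ratio.
    log2_ratio = math.log2(ratio) if ratio > 0 else float("-inf")
    return ratio, log2_ratio


if __name__ == "__main__":
    examples = [
        ("heterozygous deletion", 1),
        ("normal diploid", 2),
        ("heterozygous duplication", 3),
    ]
    for label, cn in examples:
        ratio, log2_ratio = dosage_change(cn)
        print(f"{label}: {ratio:.0%} of normal dosage, log2 ratio {log2_ratio:+.2f}")
```

Under these assumptions a heterozygous deletion corresponds to a log2 ratio of -1 and a heterozygous duplication to about +0.58, which is why single-copy gains are harder to distinguish from noise in array and read-depth data than single-copy losses.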

Position Effects and Chromatin Architecture

Genomic rearrangements can alter the spatial positioning of DNA within the nucleus. Large deletions or inversions may disrupt topologically associating domains (TADs), causing enhancers to aberrantly interact with genes they normally do not contact. Similarly, translocations can move genes into heterochromatic regions that silence expression. SVs that affect CTCF binding sites or other insulator elements further contribute to misregulation. These position effects explain why some SVs cause phenotypes even when they do not directly disrupt genes.
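
One simple way to screen for potential position effects is to ask whether an SV overlaps an annotated TAD boundary. The sketch below is only an interval-overlap check under stated assumptions: half-open coordinates, boundaries supplied by the user from some external annotation, and hypothetical names (`Interval`, `disrupted_boundaries`) and coordinates. An overlap flags a candidate for follow-up and does not by itself demonstrate misregulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def disrupted_boundaries(sv: Interval, boundaries: list[Interval]) -> list[Interval]:
    """Return the TAD boundaries that overlap the SV on the same chromosome."""
    return [
        b
        for b in boundaries
        if b.chrom == sv.chrom and b.start < sv.end and sv.start < b.end
    ]


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    tad_boundaries = [
        Interval("chr2", 1_000_000, 1_010_000),
        Interval("chr2", 2_500_000, 2_510_000),
        Interval("chr7", 1_000_000, 1_010_000),
    ]
    deletion = Interval("chr2", 900_000, 1_200_000)
    for hit in disrupted_boundaries(deletion, tad_boundaries):
        print(f"SV overlaps boundary {hit.chrom}:{hit.start}-{hit.end}")
```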

Phenotypic Consequences in Human Health and Disease

The impact of structural variants on phenotypic traits ranges from subtle quantitative variation to severe developmental disorders. Their role in complex diseases and adaptive traits is increasingly well documented.

Neurodevelopmental and Psychiatric Disorders

Copy number variants (CNVs) are among the strongest known genetic risk factors for autism spectrum disorder (ASD), schizophrenia, and intellectual disability. Recurrent CNVs at 16p11.2, 22q11.2, and 1q21.1 confer substantial, though variable, risk. For example, deletions at 16p11.2 are associated with ASD, macrocephaly, and obesity, while duplications of the same region are associated with microcephaly and schizophrenia. These reciprocal phenotypes highlight the dosage sensitivity of genes in that interval. Structural variants also contribute to intellectual disability through de novo events, often detected by chromosomal microarray in diagnostic settings.

Cancer Genomics

Somatic structural variants are hallmarks of cancer genomes. Rearrangements create oncogenic fusions, such as TMPRSS2-ERG in prostate cancer or PML-RARA in acute promyelocytic leukemia. Amplifications (high-level duplications) of oncogenes like MYC and EGFR drive tumor growth. Deletions of tumor suppressors, including CDKN2A and PTEN, remove critical barriers to proliferation. Moreover, complex rearrangements like chromothripsis can alter many genes in a single catastrophic event, accelerating cancer evolution. Understanding SV patterns in tumors guides prognosis and targeted therapy selection.

Hematological Disorders

Structural variants underlie many inherited blood disorders. Deletions in the alpha-globin gene cluster cause alpha-thalassemia, with severity increasing with the number of alpha-globin gene copies deleted. Deletions within the beta-globin gene cluster cause some forms of beta-thalassemia and hereditary persistence of fetal hemoglobin. Similarly, inversions disrupting the F8 gene cause severe hemophilia A by splitting the factor VIII coding sequence. Structural variants also play a role in inherited anemias through altered regulation of erythropoiesis-related genes.
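
The link between the number of deleted alpha-globin copies and clinical severity can be written as a simple lookup. This is a minimal sketch assuming the usual complement of four alpha-globin genes (two per chromosome 16) and the conventional clinical categories; the names `ALPHA_GLOBIN_PHENOTYPES` and `alpha_thalassemia_phenotype` are hypothetical, and non-deletional alleles are ignored.

```python
# Conventional categories by number of deleted alpha-globin gene copies (out of four).
ALPHA_GLOBIN_PHENOTYPES = {
    0: "normal",
    1: "silent carrier",
    2: "alpha-thalassemia trait",
    3: "hemoglobin H disease",
    4: "Hb Bart's hydrops fetalis",
}


def alpha_thalassemia_phenotype(deleted_copies: int) -> str:
    """Map a count of deleted alpha-globin genes to its conventional category."""
    if deleted_copies not in ALPHA_GLOBIN_PHENOTYPES:
        raise ValueError("deleted_copies must be an integer from 0 to 4")
    return ALPHA_GLOBIN_PHENOTYPES[deleted_copies]


if __name__ == "__main__":
    for n in range(5):
        print(f"{n} deleted: {alpha_thalassemia_phenotype(n)}")
```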

Adaptive and Population-Level Variation

Not all structural variants are deleterious; some have been selected for their beneficial effects. The duplication of the AMY1 gene mentioned earlier is one of the most striking examples: populations with traditionally high-starch diets, such as Japanese and European-ancestry groups, carry an average of 6-7 copies, while populations with traditionally low-starch diets average fewer. Copy number variation in the CCL3L1 chemokine gene has been reported to influence HIV susceptibility, although the association has not replicated consistently, illustrating how SVs may modulate infectious disease risk. In addition, copy number variation in the kringle IV type 2 repeat of the LPA gene affects lipoprotein(a) levels and cardiovascular risk, with allele frequencies that differ among populations.

Technologies for Detecting and Characterizing Structural Variants

The study of structural variants has been revolutionized by sequencing and array technologies. Each method has strengths and limitations in sensitivity, resolution, and the types of SVs detected.

Chromosomal Microarray (CMA)

CMA, including array comparative genomic hybridization and SNP arrays, remains a first-line diagnostic tool for detecting CNVs larger than ~10-50 kb. It excels at identifying deletions and duplications but cannot detect balanced rearrangements, inversions, or translocations without copy-number change. CMA has been instrumental in defining recurrent CNV syndromes and has a diagnostic yield of 10-20% in neurodevelopmental disorders.

Short-Read Whole-Genome Sequencing (WGS)

Short-read WGS at ~30X coverage can detect SVs down to ~50 bp using paired-end, split-read, and read-depth analysis. However, it struggles with SVs in repetitive regions and with complex rearrangements, because reads are short (typically ~150 bp) and often cannot be placed uniquely or span a breakpoint. Despite these limitations, large-scale projects like the 1000 Genomes Project have cataloged thousands of SVs, revealing their prevalence and population structure.
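
The paired-end signal mentioned above can be illustrated with a toy classifier for a single read pair. This is a sketch under stated assumptions, not the logic of any particular SV caller: it assumes a standard forward-reverse library, a hypothetical mean insert size of 400 bp with a standard deviation of 50 bp, and it approximates fragment length by the distance between mate start positions. Real callers cluster many supporting pairs and combine them with split-read and read-depth evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int      # leftmost mapped coordinate of mate 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # leftmost mapped coordinate of mate 2
    strand2: str


def classify_pair(pair: ReadPair, mean_insert: int = 400, sd: int = 50, n_sd: int = 3) -> str:
    """Return the SV type a single read pair would hint at (illustrative only)."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Order the mates by coordinate so orientation can be read left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"            # both mates on the same strand
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"   # mates point away from each other
    span = right_pos - left_pos       # rough proxy for fragment length
    if span > mean_insert + n_sd * sd:
        return "deletion"             # mates map further apart than expected
    if span < mean_insert - n_sd * sd:
        return "insertion"            # mates map closer together than expected
    return "concordant"


if __name__ == "__main__":
    pair = ReadPair("chr1", 1_000, "+", "chr1", 5_000, "-")
    print(classify_pair(pair))
```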

Long-Read Sequencing

Technologies such as PacBio SMRT and Oxford Nanopore generate reads of 10-100 kb or more, enabling direct detection of SVs in repetitive and complex regions. Long reads can span entire SVs, making them well suited to identifying inversions, tandem repeats, and large insertions. Long-read sequencing has been used to characterize the expanded repeats underlying Huntington disease and other repeat expansion disorders, and has refined the map of structural variant breakpoints in the human genome.

Optical Mapping and Other Methods

Optical mapping produces high-resolution restriction maps of single DNA molecules, allowing detection of large SVs and rearrangements at the genome-wide level. It is particularly useful for identifying balanced translocations and inversions that are invisible to microarrays. Emerging techniques, including linked-read sequencing and Hi-C, further improve SV detection by providing long-range information from short-read data or chromatin conformation contact maps.

Evolutionary Significance of Structural Variants

Structural variants are not only disease-causing; they also fuel evolutionary innovation. By generating new gene copies, rearranging regulatory landscapes, and creating novel genetic material, SVs are a major source of genetic variation upon which natural selection acts.

Gene Duplication and Neofunctionalization

Gene duplication is a primary mechanism for the emergence of new functions. Duplicated genes can acquire new roles through mutation while the original copy retains the ancestral function. The opsin gene family, expanded through gene duplication, underlies color vision. Similarly, segmental duplications in the human lineage gave rise to human-specific genes such as SRGAP2C, which is implicated in cortical development. In humans, recent duplications of the PCSK9 gene have been reported to be associated with altered cholesterol levels, possibly reflecting selective pressure.

Population Structure and Adaptation

Structural variants often show strong population differentiation, which can indicate local adaptation. For example, duplications of the CYP2D6 gene, which encodes an enzyme that metabolizes many drugs, vary in frequency among populations and affect drug efficacy. In Arctic populations, variation in the FADS gene cluster influences fatty acid metabolism, likely as an adaptation to high-fat marine diets. Inversions, by suppressing recombination, can maintain co-adapted gene complexes, such as the 17q21.31 inversion polymorphism associated with fertility and brain size in Europeans.

Mechanisms of SV Formation in Evolution

Structural variants arise through several mutational mechanisms, including nonallelic homologous recombination (NAHR), nonhomologous end joining (NHEJ), and replication-based mechanisms such as fork stalling and template switching (FoSTeS) and microhomology-mediated break-induced replication (MMBIR). Regions rich in segmental duplications and low-copy repeats are hotspots for NAHR-mediated SVs. Over evolutionary time, these mechanisms generate a dynamic landscape of gain, loss, and rearrangement that shapes genomes.

Clinical and Translational Implications

Understanding structural variants has direct clinical benefits. Incorporation of SV analysis into precision medicine is improving diagnosis and treatment stratification. For example, in oncology, detection of NTRK gene fusions guides the use of TRK inhibitors, regardless of tumor type. In prenatal genetics, noninvasive detection of large fetal CNVs from maternal plasma is now possible. As long-read sequencing becomes more affordable, comprehensive SV profiling will become routine in clinical genomics.

Challenges remain in interpreting the clinical significance of rare SVs, especially those that are private to a single individual or family. Large-scale reference databases, such as the Database of Genomic Variants (DGV) and gnomAD SV, help distinguish benign from pathogenic events. Functional assays, including CRISPR engineering of SVs in cellular models, are increasingly used to validate causality.
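
Comparing a candidate SV against a reference catalog is commonly done with a reciprocal-overlap criterion, since SV breakpoints rarely match exactly between callsets. The sketch below assumes half-open coordinates, a 50% reciprocal-overlap threshold (a common convention, used here as an assumption), and a catalog represented as plain tuples. It does not use the actual file formats or APIs of DGV or gnomAD SV, and the function names and coordinates are hypothetical.

```python
def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Return the smaller of the two overlap fractions (0.0 if the intervals are disjoint)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def matching_catalog_entries(query, catalog, min_ro: float = 0.5):
    """Return catalog SVs of the same type and chromosome meeting the overlap threshold.

    Each SV is a tuple: (chrom, start, end, sv_type).
    """
    q_chrom, q_start, q_end, q_type = query
    return [
        entry
        for entry in catalog
        if entry[0] == q_chrom
        and entry[3] == q_type
        and reciprocal_overlap(q_start, q_end, entry[1], entry[2]) >= min_ro
    ]


if __name__ == "__main__":
    # Hypothetical catalog entries, for illustration only.
    catalog = [
        ("chr16", 29_600_000, 30_200_000, "DEL"),
        ("chr16", 29_600_000, 30_200_000, "DUP"),
        ("chr16", 10_000_000, 40_000_000, "DEL"),
    ]
    query = ("chr16", 29_650_000, 30_190_000, "DEL")
    print(matching_catalog_entries(query, catalog))
```

Requiring the overlap to be reciprocal prevents a small query from being matched to a much larger catalog event that merely contains it, as with the third entry in the example.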

Future Directions

The field is moving toward a complete catalog of human structural variants and their functional impact. Key priorities include: (i) sequencing populations of diverse ancestry to capture underrepresented variation; (ii) developing computational methods to integrate SV effects with transcriptomic, epigenomic, and proteomic data; (iii) leveraging pangenome references that represent SVs as alternative paths through a graph rather than as differences from a single linear reference; and (iv) designing therapeutic strategies that target the genomic consequences of SVs, such as antisense oligonucleotides or gene editing.

The discovery that many noncoding SVs influence gene expression through long-range regulation will drive the identification of new disease mechanisms. Furthermore, the role of SVs in somatic mosaicism and aging-associated clonal hematopoiesis is an emerging area with significant implications for cancer risk and age-related diseases. Ultimately, a deep and functional understanding of structural variants promises to unlock new insights into the genotype-phenotype map.
