Chapter 1: Protein Structures and Hierarchy#

1. Introduction#

Welcome to the study of protein architecture. Proteins are not just simple chains; they are complex molecular machines whose function is dictated by their intricate three-dimensional shape. This section introduces the fundamental concept of the protein structure hierarchy: the four distinct levels that build upon one another, from a simple linear sequence to a complex, functional assembly.

By the end of this module, you will be able to identify the four levels of protein structure and understand their critical importance in precision medicine. Your learning journey will begin with core definitions and progress through detailed examples, showing how each structural level offers unique opportunities for therapeutic intervention.


2. Key Concepts and Definitions#

  • Polypeptide Chain: A long, continuous chain of amino acids linked by peptide bonds. Proteins are composed of one or more polypeptide chains, and their sequence forms the primary structure.

  • Primary (1°) Structure: The linear sequence of amino acids in a polypeptide chain. In drug discovery, the precise order is critical, as a single amino acid change (mutation) can dramatically alter a drug’s binding affinity, often leading to drug resistance.

  • Secondary (2°) Structure: Local, repeating 3D shapes, most commonly α-helices and β-sheets, that are stabilized by hydrogen bonds within the protein’s backbone. These structural motifs are frequently involved in protein-protein interactions, making them key targets for drugs designed to block those interactions.

  • Tertiary (3°) Structure: The overall, unique 3D fold of a single polypeptide chain, formed by interactions between the amino acid side chains. This level of structure is what creates the specific grooves and pockets on a protein’s surface that serve as binding sites for drugs.

  • Quaternary (4°) Structure: The arrangement and assembly of multiple polypeptide chains (subunits) to form a single, larger functional protein complex. Drug discovery efforts can target this level by designing molecules that either stabilize the functional complex or disrupt its assembly.

  • IC50 (Half-maximal inhibitory concentration): The concentration of a drug required to inhibit a biological process (like enzyme activity) by 50%. A lower IC50 indicates a more potent drug.

  • UniProt/PDB: Public databases that serve as essential resources in bioinformatics. UniProt provides protein sequence and functional information, while the Protein Data Bank (PDB) contains 3D structural data determined from experiments.
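
The IC50 definition above can be made concrete with a small dose-response sketch. It assumes a simple Hill-type inhibition model, which the text does not specify, and the function name `fraction_inhibited` and both IC50 values are hypothetical, chosen only for illustration.

```python
def fraction_inhibited(concentration, ic50, hill_slope=1.0):
    """Fraction of activity inhibited at a given drug concentration (Hill model)."""
    if concentration < 0 or ic50 <= 0:
        raise ValueError("concentration must be >= 0 and ic50 must be > 0")
    ratio = (concentration / ic50) ** hill_slope
    return ratio / (1.0 + ratio)


# Two hypothetical inhibitors of the same enzyme (IC50 values in nM, invented)
potent_ic50_nm = 10.0
weak_ic50_nm = 1000.0

for dose_nm in (1.0, 10.0, 100.0, 1000.0):
    potent = fraction_inhibited(dose_nm, potent_ic50_nm)
    weak = fraction_inhibited(dose_nm, weak_ic50_nm)
    print(f"{dose_nm:7.1f} nM  potent: {potent:6.1%}  weak: {weak:6.1%}")
```

By construction, each compound reaches 50% inhibition exactly at its own IC50, so the compound with the lower IC50 inhibits more at every dose in the table.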


3. Main Content#

3.1 Amino Acids: Building Blocks of Proteins#

In drug discovery, our primary goal is to find molecules (drugs) that interact with biological targets, which are most often proteins. To understand how a drug binds to a protein, we must first understand the basic building blocks that proteins are built from: the amino acids.

All 20 common amino acids share a common backbone. Each has a central carbon atom, called the alpha-carbon (Cα), which is bonded to four groups (all different except in glycine, whose side chain is a second hydrogen atom):

  1. A hydrogen atom (H)

  2. An amino group (NH₂)

  3. A carboxyl group (COOH)

  4. A variable side chain (also called the R-group)

It’s the R-group that makes each amino acid unique and defines its chemical character, such as its size, charge, and polarity.

Note: At the physiological pH of ~7.4, the amino group is protonated (NH₃⁺) and the carboxyl group is deprotonated (COO⁻). A molecule in this form, carrying both a positive and a negative charge, is called a zwitterion.
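
The note above follows from the Henderson-Hasselbalch relationship. The sketch below estimates how much of each backbone group is charged at pH 7.4. The pKa values are approximate textbook values for free alanine and are an assumption; they are not taken from this chapter.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of an ionizable group that has lost its proton."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PHYSIOLOGICAL_PH = 7.4
# Approximate pKa values for free alanine (assumed for illustration)
PKA_CARBOXYL = 2.3
PKA_AMINO = 9.7

# Carboxyl group: the deprotonated form is COO-
carboxylate_fraction = fraction_deprotonated(PHYSIOLOGICAL_PH, PKA_CARBOXYL)
# Amino group: the protonated form is NH3+
ammonium_fraction = 1.0 - fraction_deprotonated(PHYSIOLOGICAL_PH, PKA_AMINO)

print(f"COO- fraction at pH {PHYSIOLOGICAL_PH}: {carboxylate_fraction:.4f}")
print(f"NH3+ fraction at pH {PHYSIOLOGICAL_PH}: {ammonium_fraction:.4f}")
```

Both fractions come out above 0.99, which is why the zwitterion is the dominant form under these assumptions.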

Amino acids in human proteins are almost exclusively in the L-form.

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from IPython.display import display, HTML
import json
import uuid

# Define the 20 standard amino acids
amino_acids = {
    'ALA': 'A', 'CYS': 'C', 'ASP': 'D', 'GLU': 'E', 'PHE': 'F',
    'GLY': 'G', 'HIS': 'H', 'ILE': 'I', 'LYS': 'K', 'LEU': 'L',
    'MET': 'M', 'ASN': 'N', 'PRO': 'P', 'GLN': 'Q', 'ARG': 'R',
    'SER': 'S', 'THR': 'T', 'VAL': 'V', 'TRP': 'W', 'TYR': 'Y'
}

# --- Generate 2D and 3D structures ---
pdb_structures = {}
svg_structures = {}

for name, code in amino_acids.items():
    # 1. Create the base molecule from the 1-letter code
    peptide = Chem.MolFromSequence(code)
    
    # 2. Generate the 2D SVG depiction.
    # MolDraw2DSVG is used for more robust SVG generation;
    # Draw.MolToSvg(...) can raise AttributeError in some RDKit versions.
    drawer = Draw.MolDraw2DSVG(350, 350)
    drawer.DrawMolecule(peptide)
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText()
    
    svg_structures[name] = svg
    
    # 3. Generate the 3D PDB structure
    peptide_3d = Chem.AddHs(peptide)
    AllChem.EmbedMolecule(peptide_3d, randomSeed=42)
    peptide_3d = Chem.RemoveHs(peptide_3d)
    pdb_structures[name] = Chem.MolToPDBBlock(peptide_3d)

# --- Prepare HTML and JavaScript ---

# Create unique IDs to avoid conflicts when re-running the cell
unique_id = str(uuid.uuid4())[:8]
viewer_3d_id = f"viewer_3d_{unique_id}"
viewer_2d_id = f"viewer_2d_{unique_id}"
buttons_id = f"buttons_{unique_id}"
container_id = f"container_{unique_id}"

# Serialize the structure data to JSON
structures_pdb_json = json.dumps(pdb_structures)
structures_svg_json = json.dumps(svg_structures)

# Generate HTML for the buttons
buttons_html = ''.join([
    f'<button onclick="window.showAA_{unique_id}(\'{name}\')" '
    f'style="margin: 2px; padding: 5px 10px; border-radius: 5px; border: 1px solid #ddd; background-color: #f0f0f0; cursor: pointer;">{name}</button>'
    for name in amino_acids.keys()
])

# Create the final HTML string with side-by-side layout
html_code = f"""
<div>
    <!-- Button container -->
    <div id="{buttons_id}" style="margin: 10px;">
        {buttons_html}
    </div>
    
    <!-- Main container for 2D and 3D viewers -->
    <div id="{container_id}" style="display: flex; flex-direction: row; margin-top: 10px;">
        
        <!-- 2D Viewer -->
        <div id="{viewer_2d_id}" style="width: 380px; height: 380px; position: relative; border: 1px solid #ccc; border-radius: 8px; margin-right: 10px; display: flex; align-items: center; justify-content: center; background-color: #fdfdfd;">
            <!-- SVG will be injected here -->
        </div>
        
        <!-- 3D Viewer -->
        <div id="{viewer_3d_id}" style="width: 380px; height: 380px; position: relative; border: 1px solid #ccc; border-radius: 8px;"></div>
    </div>
</div>

<script>
    (function() {{
        // Parse the structure data from JSON
        const structures_pdb = {structures_pdb_json};
        const structures_svg = {structures_svg_json};
        
        let viewer_3d = null;
        const viewer_2d = document.getElementById('{viewer_2d_id}');
        
        // Check if 3Dmol.js library is already loaded
        let scriptLoaded = typeof $3Dmol !== 'undefined';
        
        function init3DMol() {{
            try {{
                // Initialize the 3D viewer
                viewer_3d = $3Dmol.createViewer(
                    document.getElementById('{viewer_3d_id}'),
                    {{backgroundColor: 'white'}}
                );
                
                // Make the showAA function globally accessible with a unique name
                window.showAA_{unique_id} = function(name) {{
                    if (!viewer_3d || !viewer_2d) {{
                        console.error('Viewers not initialized');
                        return;
                    }}
                    
                    // --- Update 2D Viewer ---
                    viewer_2d.innerHTML = structures_svg[name];
                    
                    // --- Update 3D Viewer ---
                    viewer_3d.clear();
                    viewer_3d.addModel(structures_pdb[name], 'pdb');
                    viewer_3d.setStyle({{}}, {{
                        stick: {{radius: 0.2}},
                        sphere: {{scale: 0.25}}
                    }});
                    viewer_3d.zoomTo();
                    viewer_3d.render();
                }};
                
                // Show the initial structure (ALA)
                window.showAA_{unique_id}('ALA');
                
            }} catch(e) {{
                console.error('Error initializing 3Dmol:', e);
            }}
        }}
        
        // Load the 3Dmol.js library if it's not already present
        if (scriptLoaded) {{
            init3DMol();
        }} else {{
            const script = document.createElement('script');
            script.src = 'https://3Dmol.csb.pitt.edu/build/3Dmol-min.js';
            script.onload = init3DMol;
            script.onerror = function() {{
                console.error('Failed to load 3Dmol.js');
                document.getElementById('{viewer_3d_id}').innerHTML = '<p style="padding: 10px; color: red;">Failed to load 3Dmol.js library.</p>';
            }};
            document.head.appendChild(script);
        }}
    }})();
</script>
"""

# Display the HTML in the notebook
display(HTML(html_code))

3.2 Primary (1°) Structure: The Amino Acid Sequence#

The primary structure is the foundation of a protein’s architecture, determined by the genetic code. It dictates all higher levels of folding, and even a minor change at this level can have profound consequences for drug efficacy.
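
To make "a minor change at this level" concrete, the sketch below applies a single amino acid substitution to a sequence string. The helper `apply_point_mutation` and the mutation `V3A` are hypothetical illustrations; the "original residue, position, new residue" notation is a common convention assumed here.

```python
import re


def apply_point_mutation(sequence, mutation):
    """Apply a substitution such as 'V3A' (original residue, 1-based position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Unrecognized mutation format: {mutation!r}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError(
            f"Expected {original} at position {position}, found {sequence[position - 1]}"
        )
    # Strings are immutable, so build a new sequence around the changed residue
    return sequence[:position - 1] + replacement + sequence[position:]


wild_type = "GAVVLYFPSWM"                        # short example peptide
mutant = apply_point_mutation(wild_type, "V3A")  # hypothetical substitution
print(wild_type)
print(mutant)
```

Checking the original residue before substituting guards against off-by-one errors, which are easy to make when mixing 1-based residue numbering with 0-based string indexing.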

from rdkit import Chem
from rdkit.Chem import AllChem
import py3Dmol

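sequence = input("Enter peptide sequence (e.g., GAVVLYFPSWM): ").strip().upper()

# Build the peptide from one-letter codes and stop early if parsing fails
peptide = Chem.MolFromSequence(sequence)
if peptide is None or peptide.GetNumAtoms() == 0:
    raise ValueError(f"Could not build a peptide from sequence: {sequence!r}")

# Add hydrogens for embedding, generate 3D coordinates, then strip them for display
peptide = Chem.AddHs(peptide)
if AllChem.EmbedMolecule(peptide, randomSeed=42) != 0:
    raise RuntimeError("3D embedding failed; try a shorter sequence")
peptide = Chem.RemoveHs(peptide)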

pdb_block = Chem.MolToPDBBlock(peptide)

viewer = py3Dmol.view(width=800, height=600)
viewer.addModel(pdb_block, 'pdb')
viewer.setStyle({'stick': {'radius': 0.2}, 'sphere': {'scale': 0.25}})
viewer.setBackgroundColor('white')
viewer.zoomTo()
viewer.show()

3.3 Secondary (2°) Structure: Local Folding Motifs#

The polypeptide chain begins to fold into regular, repeating patterns stabilized by hydrogen bonds between backbone atoms. The two most common motifs are α-helices (coils) and β-sheets (pleated strands).

Proteins fold because the folded state is thermodynamically favorable: the chain settles toward its lowest free-energy conformation.

When a protein folds, it minimizes its Gibbs free energy along the free energy landscape. Favorable interactions such as hydrogen bonds, van der Waals forces, ionic interactions, and especially the burial of hydrophobic residues away from water help to lower the Gibbs free energy.
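
One way to see what a lower Gibbs free energy means in practice is a two-state (folded/unfolded) equilibrium. The sketch below assumes that simplified model, which the text does not state, and uses illustrative folding free energies rather than measured values. It converts a folding free energy into the fraction of molecules that are folded at body temperature.

```python
import math

GAS_CONSTANT = 8.314  # J / (mol K)


def fraction_folded(delta_g_fold_kj_per_mol, temperature_k=310.0):
    """Two-state model: fraction folded, where dG_fold = G_folded - G_unfolded."""
    delta_g = delta_g_fold_kj_per_mol * 1000.0  # kJ/mol -> J/mol
    # Equilibrium constant K = [folded] / [unfolded] = exp(-dG / RT)
    equilibrium_constant = math.exp(-delta_g / (GAS_CONSTANT * temperature_k))
    return equilibrium_constant / (1.0 + equilibrium_constant)


# Illustrative folding free energies (not measurements)
for dg in (0.0, -10.0, -20.0, -40.0):
    print(f"dG_fold = {dg:6.1f} kJ/mol -> fraction folded = {fraction_folded(dg):.6f}")
```

In this model a folding free energy of zero gives an even split between folded and unfolded molecules, and each additional favorable interaction that makes the value more negative shifts the population further toward the folded state.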

3.4 Tertiary (3°) Structure: The Global 3D Fold#

The tertiary structure is the final 3D shape of a single polypeptide chain, resulting from interactions between the side chains of its amino acids. This global fold creates the functional domains and, critically, the drug-binding pockets.

The following image shows the HIV-1 Protease enzyme (PDB: 1HSG) bound to the drug Indinavir.

What it shows:

  1. Protein chains A and B displayed as cartoons in light blue and pink

  2. Indinavir ligand shown as ball-and-stick (cyan carbons) in the binding pocket

  3. Binding site residues within 5Å of the ligand highlighted as yellow sticks

  4. Molecular surface around the binding site rendered semi-transparently in orange

import py3Dmol

pdb_id = "1HSG"

# Create the viewer
view = py3Dmol.view(query=f'pdb:{pdb_id}', width=900, height=700, viewergrid=(1,1))

# Initial styles
# Protein - cartoon
view.setStyle({'model': 0, 'chain': 'A'}, 
              {'cartoon': {'color': 'lightblue', 'opacity': 0.8}})
view.setStyle({'model': 0, 'chain': 'B'}, 
              {'cartoon': {'color': 'lightpink', 'opacity': 0.8}})

# Ligand - ball and stick
view.setStyle({'model': 0, 'hetflag': True, 'not': {'resn': 'HOH'}}, 
              {'stick': {'radius': 0.25, 'colorscheme': 'cyanCarbon'},
               'sphere': {'radius': 0.4, 'colorscheme': 'cyanCarbon'}})

# Binding site residues
view.addStyle({'model': 0, 'byres': True, 'expand': 5, 
               'hetflag': True, 'not': {'resn': 'HOH'}},
              {'stick': {'radius': 0.15, 'color': 'yellow'}})

# Surface
view.addSurface(py3Dmol.VDW, 
                {'opacity': 0.25, 'color': 'orange'},
                {'model': 0, 'byres': True, 'expand': 5, 
                 'hetflag': True, 'not': {'resn': 'HOH'}})

view.zoomTo({'hetflag': True})
view.zoom(1.5)

# Display viewer
view.show()
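
The "residues within 5 Å of the ligand" selection used for the binding site can be sketched in plain Python as a distance cutoff. This is an illustration of the idea and not how 3Dmol.js implements it. The function name and all coordinates are invented for the example and are not taken from 1HSG.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return residues with at least one atom within `cutoff` angstroms of any ligand atom."""
    nearby = set()
    for chain, resnum, resname, xyz in protein_atoms:
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_atoms):
            nearby.add((chain, resnum, resname))
    return sorted(nearby)


# Toy coordinates invented for illustration (NOT real 1HSG atoms)
protein_atoms = [
    ("A", 25, "ASP", (1.0, 0.0, 0.0)),
    ("A", 50, "ILE", (3.0, 3.0, 0.0)),
    ("B", 82, "VAL", (12.0, 0.0, 0.0)),
]
ligand_atoms = [(0.0, 0.0, 0.0), (1.5, 1.5, 0.0)]

print(residues_near_ligand(protein_atoms, ligand_atoms))
```

A residue is kept as soon as any one of its atoms falls inside the cutoff, which matches the by-residue idea of highlighting whole side chains that line the pocket.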

3.5 Quaternary (4°) Structure: Subunit Assembly#

Many proteins consist of more than one polypeptide chain (subunit). The quaternary structure describes how these subunits assemble into a functional complex. The interfaces between subunits are important drug targets.

The following structure depicts human hemoglobin (PDB: 1GZX), a classic example of quaternary structure: multiple polypeptide subunits assembled into one functional protein complex.

What it shows:

  1. 2 alpha subunits (chains A and C) in light blue and cyan

  2. 2 beta subunits (chains B and D) in salmon and red

  3. 4 heme groups (containing iron) shown as ball-and-stick in green, one per subunit

import py3Dmol

pdb_id = "1GZX"  # Human hemoglobin

# Create the viewer
view = py3Dmol.view(query=f'pdb:{pdb_id}', width=900, height=700)

# Quaternary structure - 4 subunits in different colors
# Alpha chains (A and C)
view.setStyle({'model': 0, 'chain': 'A'}, 
              {'cartoon': {'color': 'lightblue', 'opacity': 0.9}})
view.setStyle({'model': 0, 'chain': 'C'}, 
              {'cartoon': {'color': 'cyan', 'opacity': 0.9}})

# Beta chains (B and D)
view.setStyle({'model': 0, 'chain': 'B'}, 
              {'cartoon': {'color': 'salmon', 'opacity': 0.9}})
view.setStyle({'model': 0, 'chain': 'D'}, 
              {'cartoon': {'color': 'red', 'opacity': 0.9}})

# Heme groups - ball and stick
view.setStyle({'model': 0, 'resn': 'HEM'}, 
              {'stick': {'radius': 0.3, 'colorscheme': 'greenCarbon'},
               'sphere': {'radius': 0.5, 'colorscheme': 'greenCarbon'}})

view.zoomTo()

# Display viewer
view.show()
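
The per-chain coloring above relies on the chain identifier stored with every atom in a PDB file. As a rough sketch, the code below counts residues per chain by reading the fixed-column ATOM records. The helper names and the tiny four-chain input are hypothetical and are generated only for illustration; they are not real 1GZX records.

```python
from collections import Counter


def make_ca_record(serial, resname, chain, resseq):
    """Build a minimal PDB-style C-alpha ATOM line with dummy coordinates (toy data)."""
    return (f"ATOM  {serial:5d}  CA  {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def residues_per_chain(pdb_text):
    """Count residues per chain using the C-alpha ATOM records."""
    counts = Counter()
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, chain ID in column 22
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            counts[line[21]] += 1
    return dict(counts)


# Toy four-chain assembly (invented residues, not real hemoglobin data)
toy_pdb = "\n".join([
    make_ca_record(1, "VAL", "A", 1),
    make_ca_record(2, "LEU", "A", 2),
    make_ca_record(3, "VAL", "B", 1),
    make_ca_record(4, "HIS", "B", 2),
    make_ca_record(5, "VAL", "C", 1),
    make_ca_record(6, "VAL", "D", 1),
])

print(residues_per_chain(toy_pdb))
```

Counting one C-alpha atom per residue is a simple proxy for chain length; more than one chain in the result indicates a multi-chain file.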


4. Summary and Key Takeaways#

In this section, we’ve explored the four-tiered hierarchy of protein structure, a central principle in biochemistry and drug discovery. We saw how each level builds upon the last, from the fundamental amino acid sequence to the final, functional multi-subunit complex. Understanding this hierarchy is essential for identifying and designing effective drugs.

  • Primary Structure is the Blueprint: The linear sequence of amino acids dictates everything. A single mutation can lead to drug resistance.

  • Secondary Structure Forms Local Motifs: α-helices and β-sheets are common repeating shapes that are often involved in protein interactions and can be mimicked by drugs.

  • Tertiary Structure Creates Binding Pockets: The overall 3D fold of a single chain generates the precise pockets and surfaces that drugs are designed to bind to.

  • Quaternary Structure Governs Complex Assembly: The interaction between multiple subunits creates another layer of regulation that can be targeted by drugs to either stabilize or disrupt a protein complex.

This foundational knowledge of protein structure will be crucial as we move on to the next topic, Protein Dynamics and Flexibility, where we will see that these structures are not static but are constantly in motion.