Tuesday 21 April 2009

Introduction to Genetics and X-linked agammaglobulinaemia (XLA)

This is intended to provide a quick introduction to genetics, and give people an opportunity to ask questions! This was originally posted as a response to a question about X-linked agammaglobulinaemia...

Genetics: The Basics

At the heart of genetics is the need to move from a gene to a protein: you can think of genes as the instructions to build a protein. Each gene codes for a single protein; the gene BTK, which is properly known as the Bruton's tyrosine kinase gene, is translated into the protein BTK, and it is this gene which is faulty in the case of XLA.

Genes are made up from 4 molecules (known as nucleotides) strung together: thymine, adenine, guanine and cytosine, normally abbreviated to T, A, G and C. When in the form of double-stranded DNA, which is shown as the spiral in graphics, the T and A and the G and C fit together like the teeth on a zip. You can imagine nucleotides as similar to four different colours of Lego bricks stacked up on top of each other.
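The zip-like pairing can be sketched in a few lines of Python. This is only an illustration: the function name `complement` and the sample sequence are my own choices, not anything standard.

```python
# Base-pairing rules: T pairs with A, and G pairs with C.
PAIRS = {"T": "A", "A": "T", "G": "C", "C": "G"}


def complement(strand):
    """Return the bases that sit opposite each base of the given strand."""
    return "".join(PAIRS[base] for base in strand)


sample = "GATTACA"  # arbitrary sequence chosen for illustration
print(sample)
print(complement(sample))  # CTAATGT
```

Applying `complement` twice gives back the original strand, which is why either half of the zip is enough to rebuild the other.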

Proteins, on the other hand, are made up from amino acids which are strung together - there are 20 amino acids which make proteins in the human body; again, you can imagine these as being a bit like 20 different colours of Lego bricks stacked on top of each other. Once the amino acids have been strung together, they spring into the correct shape to do their job. If they are in the wrong shape, they cannot carry out their intended function.

In order to get from a gene to a protein, information is first transcribed from DNA to messenger RNA (mRNA). For some unknown evolutionary reason, RNA uses uracil (U) instead of thymine (T), but otherwise, the mRNA transcript is identical to the original DNA.

Once the mRNA has been transcribed, it undergoes an editing process (known as splicing) and then leaves the cell nucleus (where the DNA is stored). Editing is necessary because genes have sections called introns and exons, but only the exons code for the protein, so the introns are stripped out.
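The editing step can be pictured as cutting out the introns and joining the exons end to end. A minimal sketch, assuming we already know where the exons are; the transcript and the exon positions below are made up for illustration, and `splice` is a hypothetical helper name.

```python
def splice(pre_mrna, exons):
    """Join the exon segments in order, discarding everything between them.

    `exons` is a list of (start, end) positions, with `end` not included.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: two exons (upper case) separated by one intron
# (lower case). Real genes are far longer and have many more pieces.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GCAGUG"
exons = [(0, 6), (18, 24)]

print(splice(pre_mrna, exons))  # AUGGCCGCAGUG
```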

The edited form of the gene attaches to protein-making factories called ribosomes in the main part of the cell. The ribosome then translates the genetic information into proteins by joining together the amino acids in the order specified by the gene.

You will have noticed that there are more amino acids than nucleotides, so how does the cell solve this problem? Instead of reading just one letter at a time, the ribosome reads words which are three letters long, and these words are known as codons. Given four letters read three-at-a-time, this gives 64 different codons (words) which are translated into the 20 different amino acids.
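The figure of 64 comes from 4 × 4 × 4, and it is easy to check by listing every possible word. A small sketch (the variable names are my own):

```python
from itertools import product

LETTERS = "UCAG"  # the four RNA nucleotides

# Every possible three-letter word that can be made from four letters.
codons = ["".join(word) for word in product(LETTERS, repeat=3)]

print(len(LETTERS) ** 3)  # 64
print(len(codons))        # 64
print(codons[:4])         # ['UUU', 'UUC', 'UUA', 'UUG']
```

With 64 words and only 20 amino acids, several different codons have to share the same amino acid.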

Of particular interest are the codon AUG, which indicates the start of a gene, and three codons (UAA, UAG and UGA) which indicate the end of a gene - you can think of them as genetic punctuation.
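As a rough sketch of how this punctuation works, the snippet below looks for the first AUG and then reads three letters at a time until it meets one of the three stop codons. The function name and the short mRNA are invented for illustration, and a real ribosome is a good deal more subtle about where it starts.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def open_reading_frame(mrna):
    """Return the codons from the first AUG up to and including the first stop."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is read
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons


# Made-up mRNA with two spare letters at either end.
print(open_reading_frame("GGAUGGCCUAAGG"))  # ['AUG', 'GCC', 'UAA']
```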

It is therefore the codons which make the gene, and small errors in the original DNA, in transcription to RNA and translation to proteins can make large differences to the resulting protein. Why? Consider the following stretch of DNA:

DNA: GATAGCGTTACCAG

This is transcribed to read as

RNA: GAU-AGC-GUU-ACC-(AG)

which translates into

AA: Asp-Ser-Val-Thr

Aspartic acid, serine, valine and threonine are 4 of the amino acid building blocks (AA stands for amino acid).
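The two steps (DNA to RNA, then RNA to amino acids) can be reproduced with a lookup table. This is a sketch using the standard codon table written out in its conventional U, C, A, G order; the function and variable names are my own.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter names ("*" marks a stop codon).
NAMES = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "STOP",
}


def transcribe(dna):
    """Swap T for U, which is all that changes between the DNA and the mRNA."""
    return dna.replace("T", "U")


def translate(rna):
    """Read whole codons from the first letter and look each one up."""
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    return [NAMES[CODON_TABLE[codon]] for codon in codons]


rna = transcribe("GATAGCGTTACCAG")
print(rna)                       # GAUAGCGUUACCAG
print("-".join(translate(rna)))  # Asp-Ser-Val-Thr
```

The last two letters (AG) do not make a full codon, so they are simply left unread here.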

However, let's say that the ribosome started reading at the second letter:

DNA: GATAGCGTTACCAG
RNA: (G)-AUA-GCG-UUA-CCA-(G)
AA: Ile-Ala-Leu-Pro

As you can see, the protein is completely different from the first one, despite the fact that it consists of the same code.

Now let's assume that the ribosome starts reading at the third letter:

DNA: GATAGCGTTACCAG
RNA: (GA)-UAG-CGU-UAC-CAG
AA: STOP-Arg-Tyr-Gln

When translated, UAG is a special codon which means STOP, so in this case the ribosome would read STOP and detach from the gene, stopping translation and leaving you with a very short protein!
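All three readings can be generated from the same stretch of DNA by changing only the starting letter. A sketch under the same assumptions as before (standard codon table, names of my own choosing); note that it lists every codon in the frame, including those after a STOP, even though a real ribosome would have detached by then.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
NAMES = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "STOP",
}


def read_frame(dna, offset):
    """Translate the DNA starting `offset` letters in (0, 1 or 2)."""
    rna = dna.replace("T", "U")[offset:]
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    return "-".join(NAMES[CODON_TABLE[codon]] for codon in codons)


for offset in range(3):
    print(offset + 1, read_frame("GATAGCGTTACCAG", offset))
# 1 Asp-Ser-Val-Thr
# 2 Ile-Ala-Leu-Pro
# 3 STOP-Arg-Tyr-Gln
```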

When a gene mutates (I'm using this word in its proper sense to mean "a change" as opposed to an X-Men sort of mutant), it can result in a number of different changes to the gene; as a result, the protein changes. Because each amino acid has its own characteristics (shape, electrical charge, size), swapping just one amino acid for another can result in a protein which has a completely different, and non-functional, shape.

The type of change illustrated above is called a "frameshift"; if you imagine that the ribosome views the gene through a little window frame you can see that the frame has shifted in all three examples.

Another type of mutation is the deletion; a famous example of deletion is called CCR5-Δ32 (Δ is the Greek letter delta and means "deletion" in genetics) where 32 nucleotides are missing from the CCR5 gene. CCR5 codes for a protein which is involved in response to inflammation.

Oddly, this doesn't cause people any ill-effects (people with the Δ32 variant are perfectly healthy). However, people with the CCR5-Δ32 variation are resistant to HIV, which uses the CCR5 protein as a gateway to infect cells. So some mutations have a beneficial effect.

You might have realised that removing 32 nucleotides is a problem for the reading frame because deleting 32 doesn't delete a whole number of codons. In fact, as a result of this deletion, a stop codon is introduced, resulting in the CCR5 from the Δ32 gene only being about half the length it should be. Protein quality control systems recognise that this is abnormal and just recycle the amino acids, so in this case a combination of a deletion and a frameshift results in none of the protein being used.
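The arithmetic is simple enough to check directly: a deletion only keeps the reading frame intact if its length is a multiple of three. A tiny sketch (the function name is my own):

```python
def shifts_reading_frame(nucleotides_deleted):
    """A deletion shifts the frame unless it removes whole codons only."""
    return nucleotides_deleted % 3 != 0


whole_codons, left_over = divmod(32, 3)
print(whole_codons, left_over)   # 10 2
print(shifts_reading_frame(32))  # True
print(shifts_reading_frame(33))  # False
```

So Δ32 removes ten whole codons plus two extra letters, and it is those two extra letters which throw everything downstream out of step.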

Another type of mutation is known as a Single Nucleotide Polymorphism or SNP (pronounced "snip"). This is where one nucleotide is substituted for another.

Using our example above, this might be as follows:

Original: GATAGCGTTACCAG --> GAU-AGC-GUU-ACC-(AG) --> Asp-Ser-Val-Thr

SNPd: GATGGCGTTACCAG --> GAU-GGC-GUU-ACC-(AG) --> Asp-Gly-Val-Thr

As you can see, changing one nucleotide (the fourth) resulted in glycine (Gly) being substituted for serine (Ser) in the protein product; this may well result in the resulting protein being unable to do its job. This is known as a mis-sense mutation.

The following example has a change, but it doesn't affect the resulting protein because the change codes for the same amino acid (AGC and AGU both code for serine). This is known as a same-sense mutation:

Original: GATAGCGTTACCAG --> GAU-AGC-GUU-ACC-(AG) --> Asp-Ser-Val-Thr

SNPd: GATAGTGTTACCAG --> GAU-AGU-GUU-ACC-(AG) --> Asp-Ser-Val-Thr

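This final example also has a change; in this case it introduces a stop codon, and this is known as a non-sense mutation. (No single change to AGC gives a stop codon, so this example starts from a slightly different sequence in which the second codon is AGA, which codes for arginine.)

Original: GATAGAGTTACCAG --> GAU-AGA-GUU-ACC-(AG) --> Asp-Arg-Val-Thr

SNPd: GATTGAGTTACCAG --> GAU-UGA-GUU-ACC-(AG) --> Asp-STOP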
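The three kinds of single-letter change can be told apart mechanically by comparing the affected codon before and after the change. A rough sketch, assuming the standard codon table; the function name `classify_snp` is hypothetical and the sequences fed to it are illustrative.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid(dna_codon):
    """Look up one DNA codon ("*" means stop)."""
    return CODON_TABLE[dna_codon.replace("T", "U")]


def classify_snp(original, mutated):
    """Compare two equal-length sequences codon by codon from the first letter."""
    for i in range(0, len(original) - 2, 3):
        before, after = original[i:i + 3], mutated[i:i + 3]
        if before == after:
            continue
        if amino_acid(before) == amino_acid(after):
            return "same-sense"
        if amino_acid(after) == "*":
            return "non-sense"
        return "mis-sense"
    return "no change"


# AGC -> GGC swaps serine for glycine.
print(classify_snp("GATAGCGTTACCAG", "GATGGCGTTACCAG"))  # mis-sense
# AGC -> AGT (AGU) still codes for serine.
print(classify_snp("GATAGCGTTACCAG", "GATAGTGTTACCAG"))  # same-sense
# AGA -> TGA (UGA) turns arginine into a stop codon.
print(classify_snp("GATAGAGTTACCAG", "GATTGAGTTACCAG"))  # non-sense
```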

X-Linked Agammaglobulinaemia

In XLA, there is a mutation known as Q15X; this is defined as: "a non-sense mutation in exon 2 of BTK leading to stop codon in the PH domain."

We can start to make sense of this: we know that a non-sense mutation results in a stop codon; we know that the exon is the bit of the gene which codes for the BTK protein, so this now means all we need to know is what the BTK protein does, and what a PH domain is.

BTK is an enzyme (Bruton's tyrosine kinase), and whilst we don't know exactly what it does, it is essential for the maturation of B-cells, and is also involved in the activation of mast-cells (which are activated during inflammation).

BTK-Q15X is cut in the section which binds signalling molecules at the cell membrane (the PH domain), and cannot do its job effectively. In fact, the name "Q15X" gives us a hint as to what's actually happened at the gene code level, and I have put the first 17 codons (51 nucleotides) of BTK below, and demonstrated the change - it's in the third group from the end of BTK-wt (which stands for "wild type", or the type seen most frequently in the population), and you can see in BTK-Q15X that the SNP results in a stop codon (UAA) appearing.

Amino acids also have one letter abbreviations, and the abbreviation Q = Gln = glutamine; X = STOP. Therefore Q15X can be interpreted as "swap of glutamine for a stop codon at amino acid position 15", and if you count the amino acids below you will see that's exactly what's happened:

BTK-wt (164-215)
DNA: ATGGCCGCAGTGATTCTGGAGAGCATCTTTCTGAAGCGATCCCAACAGAAA
RNA: AUG-GCC-GCA-GUG-AUU-CUG-GAG-AGC-AUC-UUU-CUG-AAG-CGA-UCC-CAA-CAG-AAA
AA: START-Ala-Ala-Val-Ile-Leu-Glu-Ser-Ile-Phe-Leu-Lys-Arg-Ser-Gln-Gln-Lys

BTK-Q15X (164-215)
DNA: ATGGCCGCAGTGATTCTGGAGAGCATCTTTCTGAAGCGATCCTAACAGAAA
RNA: AUG-GCC-GCA-GUG-AUU-CUG-GAG-AGC-AUC-UUU-CUG-AAG-CGA-UCC-UAA-CAG-AAA
AA: START-Ala-Ala-Val-Ile-Leu-Glu-Ser-Ile-Phe-Leu-Lys-Arg-Ser-STOP
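If you would rather not count by hand, the name can be unpacked and checked against the two sequences above. A sketch using the standard codon table and one-letter amino acid codes (so the START codon shows up as M, for methionine); `parse_mutation` and `translate` are hypothetical helper names, and the name format handled here is only the simple letter-number-letter form.

```python
import re
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def parse_mutation(name):
    """Split a name like "Q15X" into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    return match.group(1), int(match.group(2)), match.group(3)


def translate(dna):
    """Translate DNA into one-letter codes, halting at the first stop codon."""
    rna = dna.replace("T", "U")
    protein = ""
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein += aa
    return protein


BTK_WT = "ATGGCCGCAGTGATTCTGGAGAGCATCTTTCTGAAGCGATCCCAACAGAAA"
BTK_Q15X = "ATGGCCGCAGTGATTCTGGAGAGCATCTTTCTGAAGCGATCCTAACAGAAA"

original, position, replacement = parse_mutation("Q15X")
wild_type = translate(BTK_WT)
mutant = translate(BTK_Q15X)

print(wild_type)                             # MAAVILESIFLKRSQQK
print(mutant)                                # MAAVILESIFLKRS
print(wild_type[position - 1] == original)   # True: position 15 is Q
print(len(mutant) == position - 1)           # True: translation stops before 15
```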

Now, I'm not sure without reading more literature whether the protein is actually expressed or not, because very often the cell will recognise that a protein which is "too short" isn't complete and will just recycle it, but the effect either way is the same: B-cells cannot mature and thus the person doesn't produce antibodies.

In most places, a SNP like this wouldn't be a problem. We have two copies of most chromosomes, and the other copy can normally "pick up the slack". For example, I mentioned CCR5 earlier. Most people have the genotype CCR5-wt/CCR5-wt (i.e. they have two copies of the wild-type gene), but about 10% of the white, northern-European population have CCR5-wt/CCR5-Δ32.

The homozygous (genes are the same on both chromosomes) wt/wt genotype produces normal levels of CCR5, whereas the heterozygous (different gene versions on each chromosome) wt/Δ32 population produces CCR5, but at lower levels.

About 1% of the white, northern-European population are homozygous for the Δ32 variant (Δ32/Δ32), and they don't express any CCR5. The heterozygotes are resistant to HIV infection; the Δ32 homozygotes are almost unable to be infected by HIV.

The problem with the gene being on the X chromosome is that we men only have one of them; women of course have two and, unless they inherit the BTK-Q15X gene from both parents, are unaffected. However, men only need to inherit a single copy of the Q15X variant to be affected.
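To see how this plays out in a family, here is a sketch which lists the four equally likely combinations of sex chromosomes a couple can pass on. The labels are my own shorthand: "X" is an X chromosome with working BTK, "x" is one carrying the faulty variant, and a "carrier" is a woman with one of each. It assumes each parent passes on either of their two chromosomes with equal chance.

```python
from itertools import product


def describe(child):
    """Classify a child from its pair of sex chromosomes, e.g. "xY"."""
    faulty = child.count("x")  # number of X chromosomes with the faulty gene
    if "Y" in child:
        return "affected son" if faulty == 1 else "unaffected son"
    if faulty == 2:
        return "affected daughter"
    return "carrier daughter" if faulty == 1 else "unaffected daughter"


def children(mother, father):
    """List the outcome for every combination of one chromosome from each parent."""
    return [describe(m + f) for m, f in product(mother, father)]


# Carrier mother, unaffected father.
print(children("Xx", "XY"))
# ['unaffected daughter', 'unaffected son', 'carrier daughter', 'affected son']
```

In this example half of the sons are affected, while none of the daughters are, because each daughter also receives a working copy from her father.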

Q15X is considered to be "classic XLA", which is to say it is the type originally described by Bruton in 1952. The complications of this are those associated with XLA: no B-cells, no antibodies. It is thought that XLA patients may not have allergic disorders as they don't produce any IgE, which mediates allergic reactions.

Most "normal" tests to detect infection are worthless in people with XLA, as they test to see if you have produced antibodies to pathogens and are therefore always negative; people with XLA should therefore ALWAYS have the more expensive tests which directly detect the pathogen. XLA isn't associated with other auto-immune diseases (unlike most PIDs).
