It's easy to get lost in a eukaryotic cell. Proteins need to be in the right place at the right time to carry out their functions, but the cell is a crowded place, and the layout isn't exactly simple. Fortunately, the cell has a fairly sophisticated transportation system: if you need to head out of the cell, take the secretory pathway; if your job is to regulate genes, the nuclear shuttle will take you where you need to go.

The protein Htb2 hanging out exactly where it is supposed to be - the nucleus

But how does a protein get on the right track? It all comes down to binding interactions: proteins destined for the nucleus, for example, have a stretch of amino acids that causes them to stick to the nuclear import shuttle. Just which amino acids generate the right stickiness, however, isn't always clear. Scientists would love to be able to determine, just by looking at a protein sequence, which proteins are destined for the nucleus, or any other place in the cell. Sometimes we know that a protein resides in the nucleus, but we don't know which structural features ensure that the protein ends up in the right place.

This is where protein cryptography comes in. In a recent paper, a group of Japanese researchers at Keio University figured out how to read the encoded signals in a protein sequence that tell a protein to head for the nucleus. They also figured out how the nuclear-import signal can get switched off, in the case of a set of cell division proteins that are only allowed in the nucleus for a limited period.

The nuclear import code is contained within protein sequences like "TPSKKCKYSSGF" and "KRARTTGSRSL". Computational biologists have tried to decrypt this code, but with limited success, because there are many possible sequences that will be read out as a nuclear localization signal. The Japanese group showed that what was really necessary was more data - in particular, the right kind of data. That data did not already exist, so they rolled up their sleeves and went to work.

They took the protein sequence AAAAAKRARTTGRSL and systematically varied each position, substituting the original amino acid with each of the other 19 amino acids in turn. (So, for example, that first A was replaced with L, then I, then V, etc.) Using Green Fluorescent Protein (GFP) as a visible tag, the scientists measured how well each modified sequence acted as a nuclear import signal.
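To get a feel for the size of that experiment, here is a minimal Python sketch that lists every single-position variant of the template sequence. The function name and the alphabet constant are my own, and the paper's actual library design may have differed in its details; this only illustrates the "change one position at a time" idea.

```python
# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_substitutions(seq):
    """Return every sequence that differs from seq at exactly one position."""
    variants = []
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                # Swap in the new amino acid at position i only.
                variants.append(seq[:i] + aa + seq[i + 1:])
    return variants

template = "AAAAAKRARTTGRSL"  # the 15-residue sequence quoted above
variants = single_substitutions(template)
print(len(variants))  # 15 positions x 19 alternatives = 285
```

That is a few hundred sequences to test, which is a lot of bench work but entirely doable.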

This is like breaking a code by trying every possible combination - except that these researchers got lucky. It turns out that each amino acid in the nuclear localization sequence has an additive effect, so that the researchers did not have to try all 20^15 combinations - just 20 x 15. In other words, putting an L instead of an A in position 1 has the same effect no matter what the rest of the sequence looks like - each amino acid contributes independently to the nuclear localization sequence. So you can put together a simple scoring system to decide whether a sequence makes a good nuclear localization signal - if there is a G in position 1, add 1 to your score, an R in position 2 adds 2, etc.
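Here is what such an additive scoring system could look like as a short Python sketch. The two weights are just the illustrative numbers from the paragraph above (G in position 1, R in position 2), not measured values from the paper, and the function name and table layout are my own assumptions.

```python
# Hypothetical weights keyed by (position, amino acid), positions counted from 1.
# These are the made-up example values from the text, not experimental data.
WEIGHTS = {
    (1, "G"): 1.0,
    (2, "R"): 2.0,
}

def nls_score(seq, weights, default=0.0):
    """Add up the independent contribution of each amino acid at each position."""
    return sum(
        weights.get((pos, aa), default)
        for pos, aa in enumerate(seq, start=1)
    )

print(nls_score("GRA", WEIGHTS))  # 1.0 + 2.0 + 0.0 = 3.0

# Why additivity matters: compare the two search spaces.
print(20 ** 15)  # every possible 15-residue sequence
print(20 * 15)   # one entry per amino acid per position
```

With a full table of 20 x 15 measured weights, scoring any candidate sequence is a single pass of lookups and additions.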

Figuring out one protein code wasn't enough for these guys, so they went on and looked for instances of a layered code: nuclear localization signals that get modified by phosphates put on by a kinase protein called Cdk1. Cdk1 likes to pop phosphates on the S (or T) of the protein sequence S (or T)-P-X-R (or K), with X in this case representing any amino acid. This Cdk1 site can modify the nuclear localization signal: when you add a phosphate to the S, the nuclear localization signal is masked, and the protein is sent packing out of the nucleus into the cytoplasm. So, when a protein is needed (say, to regulate a set of genes), the localization signal has no phosphate, and the protein resides in the nucleus. When the job is done, the localization signal is phosphorylated by Cdk1, and the protein gets shuttled out.
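The Cdk1 pattern described above is simple enough to write as a regular expression. The sketch below is my own illustration of scanning a sequence for S/T-P-X-R/K, not the researchers' actual search method; the function name is hypothetical. It uses a lookahead so that overlapping sites are not missed.

```python
import re

# S or T, then P, then any residue, then R or K.
# The lookahead (?=...) lets overlapping matches be reported.
CDK1_SITE = re.compile(r"(?=([ST]P.[RK]))")

def find_cdk1_sites(seq):
    """Return (1-based position, matched motif) for each candidate Cdk1 site."""
    return [(m.start() + 1, m.group(1)) for m in CDK1_SITE.finditer(seq)]

# One of the example sequences quoted earlier in this post.
print(find_cdk1_sites("TPSKKCKYSSGF"))  # [(1, 'TPSK')]
```

A pattern match like this only flags candidates; whether a given site actually gets phosphorylated and masks the import signal is something the experiments have to show.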

The researchers looked for SPXK sites inside of nuclear localization signals, and managed to find a whole slew of proteins that shuttle in and out of the nucleus, in a way that's controlled by Cdk1.

There are several lessons to take away from this paper. 1) Proteins, amazingly, find their way around the dense metropolis of the cell, and this process is almost completely controlled by binding affinities - if you stick to the nuclear import shuttle, you're going to the nucleus; if you don't, you're probably not getting in. (This lesson isn't new, but I find it amazing nonetheless.) 2) Sometimes protein code-breaking requires new data. Instead of building an imperfect code-breaking algorithm, biologists need to get to work and do the experiments needed to build a better one. This is where the Japanese group outshone several other efforts to decode the nuclear localization signal.