This is how it works
QKD works because of the peculiar quantum rules for measurements.
Quantum states work like vectors and we can create superpositions of them. The basis states are associated with observables. Let's suppose we have an observable A and an observable B. Now let's suppose a measurement of A can yield one of two values, a1 and a2 (this is the case for polarisation), and that a measurement of B can yield one of two values, b1 and b2.
The superposition means that we can write the state a1 as a sum of the states b1 and b2, here with equal weights.
Thus a1 = (b1 + b2)/√2, where the factor 1/√2 keeps the state normalised.
Now here's where the weird quantum rules come in. Suppose we prepare a state a1. If we measure the observable A we will find the state a1 with 100% probability. The state will be undisturbed.
However if we measure B then the quantum rules tell us that we get the result b1 with 50% probability and the result b2 with 50% probability. Furthermore, if we obtain the result b1 the system has been projected into the state associated with b1.
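These two measurement rules can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the states are written as two-component real vectors, the A states are taken as the reference axes, and the B states are assumed to be rotated by 45 degrees (as for polarisation). The names S and probability are made up for this example.

```python
import math

# Assumed representation: states as 2-component real vectors.
# A states are the reference axes; B states are rotated by 45 degrees.
S = 1 / math.sqrt(2)
a1, a2 = (1.0, 0.0), (0.0, 1.0)
b1, b2 = (S, S), (S, -S)

def probability(prepared, outcome):
    """Probability of finding `outcome` when the state `prepared` is measured.

    For real vectors this is the squared overlap (dot product) of the two states.
    """
    overlap = sum(p * o for p, o in zip(prepared, outcome))
    return overlap ** 2

# Prepare a1, then look at every possible result of measuring A or B.
for name, outcome in [("a1", a1), ("a2", a2), ("b1", b1), ("b2", b2)]:
    print(f"prepare a1, find {name}: {probability(a1, outcome):.2f}")
```

Measuring A on a1 gives a1 with probability 1, while measuring B gives b1 or b2 with probability 0.5 each, which is exactly the behaviour the coding schemes rely on.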
Now if we think of A and B as two coding schemes where, for example, a1 = 1 and a2 = 0, and b1 = 1 and b2 = 0, then we can see how we can use the quantum rules to give us a QKD system.
For each transmission we randomly choose whether to use the A or B coding scheme and we also randomly choose whether we will transmit a 1 or 0. Thus we end up with a list of transmitted data - an example is given below.
time slot 1 : state a1, bit value 1, coding scheme A
time slot 2 : state b2, bit value 0, coding scheme B
time slot 3 : state b2, bit value 0, coding scheme B
time slot 4 : state b1, bit value 1, coding scheme B
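A list like the one above could be produced as follows. This is a minimal sketch, not a description of any real transmitter: the function name prepare_slots is made up, and Python's ordinary pseudo-random generator stands in for the random choices, which a real system would have to make with a much better source of randomness.

```python
import random

def prepare_slots(n, rng):
    """Pick a random coding scheme and a random bit for each time slot."""
    slots = []
    for _ in range(n):
        scheme = rng.choice("AB")
        bit = rng.randint(0, 1)
        # Bit 1 maps to state a1/b1 and bit 0 to a2/b2, as in the text.
        state = scheme.lower() + ("1" if bit == 1 else "2")
        slots.append((state, bit, scheme))
    return slots

rng = random.Random(0)  # seeded only so that the sketch is repeatable
for i, (state, bit, scheme) in enumerate(prepare_slots(4, rng), start=1):
    print(f"time slot {i} : state {state}, bit value {bit}, coding scheme {scheme}")
```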
Now at the other end the receiver does not know which coding scheme has been chosen for each time slot and so must guess. If the coding scheme is guessed correctly then the laws of quantum mechanics mean that the transmitted bit value will be read correctly with 100% probability.
What happens when the receiver gets it wrong? Let's look at time slot 1 and suppose the receiver guesses that a B coding has been used. The laws of quantum mechanics tell us that the result of this measurement will be b1 or b2 with 50% probability of each. Thus half of the time the incorrect bit value will be read.
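The two cases, right guess and wrong guess, can be imitated with a small simulation. It is an idealised sketch: there is no noise or loss, the function name receive is made up, and a wrong guess is modelled simply as a fair coin flip, which is what the 50% rule amounts to.

```python
import random

def receive(sent_scheme, sent_bit, guessed_scheme, rng):
    """Bit read by the receiver for one time slot (idealised, noiseless)."""
    if guessed_scheme == sent_scheme:
        return sent_bit           # matching scheme: read correctly every time
    return rng.randint(0, 1)      # wrong scheme: each result has 50% probability

rng = random.Random(1)  # seeded only so that the sketch is repeatable
trials = 10_000
# Time slot 1 from the example: state a1 (bit 1, scheme A), receiver guesses B.
wrong = sum(receive("A", 1, "B", rng) != 1 for _ in range(trials))
print(f"fraction of wrong bits: {wrong / trials:.3f}")
```

The printed fraction should come out close to 0.5, while a receiver that guesses A for this slot always reads the bit 1.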
So in order to get consistent data the sender and receiver select all those time slots where they used the same coding scheme and throw away the rest of the data. They can do this by revealing the coding scheme they used for each time slot, but they do not reveal the actual bit that was transmitted or received. The bit values from the slots where the same coding scheme was chosen can be used to form a key, provided there has been no attempt at eavesdropping. But how do we know? Well, the laws of quantum mechanics mean that we can tell.
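The selection step just described can be sketched for the four example time slots. The sender's record is taken from the example list; the receiver's record is hypothetical, invented here only to have something to compare against, and the function name sift is also made up.

```python
# Sender's record from the example list: (bit value, coding scheme) per slot.
sent = [(1, "A"), (0, "B"), (0, "B"), (1, "B")]
# Hypothetical receiver record: (bit read, scheme guessed) per slot.
# Slot 2 was measured in the wrong scheme, so the bit read there is a coin flip.
received = [(1, "A"), (1, "A"), (0, "B"), (1, "B")]

def sift(sent, received):
    """Keep only the slots where both sides used the same coding scheme."""
    kept = [i for i, ((_, s), (_, r)) in enumerate(zip(sent, received)) if s == r]
    sender_key = [sent[i][0] for i in kept]
    receiver_key = [received[i][0] for i in kept]
    return kept, sender_key, receiver_key

kept, sender_key, receiver_key = sift(sent, received)
print("slots kept:  ", [i + 1 for i in kept])   # → [1, 3, 4]
print("sender key:  ", sender_key)              # → [1, 0, 1]
print("receiver key:", receiver_key)            # → [1, 0, 1]
```

Note that only the scheme column is compared in public. The bit columns never leave their owners, and with no eavesdropper the two keys agree.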
If there is an eavesdropper we might have the situation:
transmit A -> eavesdrop B -> receive A
Here the eavesdropper guesses the coding scheme wrongly. The laws of QM tell us that she has a 50% chance each of getting the result b1 or b2, and that the state is projected into b1 or b2 depending on the actual result. The eavesdropper, who has no way of knowing whether her guess was correct, transmits the state she has measured. So the state that arrives at the receiver is now a B state instead of an A state. Let's suppose it was b1.
To the receiver's device, set to measure an A coding, this looks like b1 = (a1 + a2)/√2 by the superposition rule. So the receiver has only a 50% chance of getting the result right, even though he is measuring in the same coding scheme that the transmission used. This leads to an error rate in the data which can be detected.
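The simple intercept strategy outlined here leads to an error rate caused by an eavesdropper of 25% if she measures and re-sends every photon. She guesses the wrong coding scheme half of the time, and in those cases the receiver reads the wrong bit half of the time, so 1/2 x 1/2 = 1/4 of the bits kept by the sender and receiver are wrong.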
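The 25% figure can be checked with a small simulation of the measure and re-send attack. This is an idealised sketch under stated assumptions: there is no noise or loss, the eavesdropper attacks every time slot, a measurement in the wrong coding scheme is modelled as a fair coin flip, and the function names measure and sifted_error_rate are made up for the example.

```python
import random

def measure(state_scheme, state_bit, measure_scheme, rng):
    """Idealised measurement: a matching scheme reads the bit, otherwise a coin flip."""
    if measure_scheme == state_scheme:
        return state_bit
    return rng.randint(0, 1)

def sifted_error_rate(n_slots, rng):
    """Error rate in the kept bits when every slot is intercepted and re-sent."""
    errors = sifted = 0
    for _ in range(n_slots):
        scheme, bit = rng.choice("AB"), rng.randint(0, 1)
        # The eavesdropper measures in a guessed scheme and re-sends what she found.
        eve_scheme = rng.choice("AB")
        eve_bit = measure(scheme, bit, eve_scheme, rng)
        # The receiver measures the re-sent state, which now belongs to eve_scheme.
        rx_scheme = rng.choice("AB")
        rx_bit = measure(eve_scheme, eve_bit, rx_scheme, rng)
        if rx_scheme == scheme:   # only these slots survive the selection step
            sifted += 1
            errors += rx_bit != bit
    return errors / sifted

rate = sifted_error_rate(100_000, random.Random(2))  # seeded for repeatability
print(f"error rate in the kept bits: {rate:.3f}")
```

The printed rate should be close to 0.25. In practice the sender and receiver would estimate it by publicly comparing a sample of their kept bits and then discarding that sample.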