A NOP takes four clock cycles (T-states) because there's no memory bandwidth to fetch anything else until four cycles later: the Z80 spends two cycles fetching the NOP opcode, then decodes and performs it during the two cycles in which it is issuing a DRAM refresh. As soon as the refresh ends, it can fetch the next opcode. That's why it's also four cycles for the other single-byte instructions that neither touch memory again nor need extra internal work: 8-bit register-to-register arithmetic and moves, and a few others.
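To make the arithmetic concrete, here is a rough Python sketch of the opcode-fetch cycle described above. It is a simplified model, not a cycle-exact emulator: the per-T-state descriptions are approximations, the names `M1_CYCLE`, `T_STATES`, `total_t_states` and `microseconds` are made up for illustration, and the 4 MHz clock in the usage note is just an assumed example speed.

```python
# Simplified view of what the bus is doing in each T-state of an
# opcode fetch. Descriptions are approximate, for illustration only.
M1_CYCLE = [
    ("T1", "opcode fetch: program counter driven onto the address bus"),
    ("T2", "opcode fetch: memory supplies the opcode byte"),
    ("T3", "bus busy with DRAM refresh; CPU decodes internally"),
    ("T4", "bus busy with DRAM refresh; CPU finishes the operation"),
]

# T-state counts for a few single-byte instructions that make no
# further memory access (a small illustrative subset, not a full table).
T_STATES = {
    "NOP": 4,
    "LD A,B": 4,
    "ADD A,C": 4,
    "INC D": 4,
}


def total_t_states(program):
    """Sum the T-states for a list of mnemonics found in T_STATES."""
    return sum(T_STATES[mnemonic] for mnemonic in program)


def microseconds(t_states, clock_hz):
    """Convert a T-state count to microseconds at the given clock rate."""
    return t_states * 1_000_000 / clock_hz
```

For instance, `total_t_states(["NOP", "LD A,B", "ADD A,C"])` gives 12 T-states, and at an assumed 4 MHz clock `microseconds(12, 4_000_000)` works out to 3 microseconds. Each instruction costs exactly one pass through the four-step fetch, which is the point of the paragraph above.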

