Pipelining in Computer Architecture

Pipelining is a simple technique to minimize our processor's idle time. Suppose you own a cycle manufacturing company with four workers. Manufacturing one cycle takes 20 minutes and is done in four phases of 5 minutes each: manufacturing the tyres (A), making the seat (B), building the body (C) and assembly (D). Initially, all of your workers work on the first cycle, and you start building the next one only after the first is finished, as shown below.
Figure 1: Instruction-wise interleaved
You realize this approach is poorly optimized because your resources (workers) are not being used to their full potential. Hence, you assign each worker a single task: worker 1 manufactures the tyres, worker 2 makes the seats, and so on. In this model, worker 1 makes the first pair of tyres in 5 minutes and then moves on to the second pair while worker 2 starts on the first seat, as shown in the figure below.
Figure 2: Phase-wise interleaved
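This new approach is pipelining, and it reduces the production time for 3 cycles from 60 minutes to 30 minutes: the first cycle is finished after 20 minutes, and each later cycle is finished 5 minutes after the previous one. The workers are kept busy, and more cycles are produced in the same amount of time.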
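The 60-minute and 30-minute figures can be checked with a small sketch. The function names below are illustrative only, and the sketch assumes every phase takes exactly the same time and no worker ever waits for materials.

```python
def sequential_finish_times(items, phases, phase_time):
    """Finish time of each item when the next item starts only after the previous one is done."""
    per_item = phases * phase_time
    return [per_item * (i + 1) for i in range(items)]


def pipelined_finish_times(items, phases, phase_time):
    """Finish time of each item when each worker owns one phase and hands work to the next."""
    # The first item passes through every phase; each later item
    # finishes one phase_time after the item before it.
    return [(phases + i) * phase_time for i in range(items)]


# 3 cycles, 4 phases, 5 minutes per phase (the factory example)
print(sequential_finish_times(3, 4, 5))  # [20, 40, 60]
print(pipelined_finish_times(3, 4, 5))   # [20, 25, 30]
```

The last entry of each list is the total production time: 60 minutes without pipelining and 30 minutes with it.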

Our computers employ the same technique to improve performance and execute more instructions in a given time. The processor executes an instruction in the following phases:

1. Instruction Fetch: Gets the next instruction to be executed in the program from the memory.

2. Instruction Decode: Evaluates the opcode of the instruction and accordingly picks a suitable decoder line.

3. Operand Address Resolution: Calculates the addresses in memory where the operands are located.

4. Operand Fetch: Picks up the operands from the resolved addresses.

5. Instruction Execute: The instruction is executed by the Arithmetic Logic Unit.

6. Store Result: The computed result is stored in the destination register.

In a pipelined environment, these six operations can be performed in parallel on different instructions, in the same way as the phases in Figure 2. For example, in the first clock cycle, the first instruction, I1, is fetched. In the second cycle, I1 is decoded while, in the same clock cycle, the second instruction, I2, is fetched. In the same way, in the third clock cycle I1 resolves its operands' addresses while I2 is decoded and I3 is fetched, and so on.
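The overlap described above can be sketched as a small schedule generator. This is a minimal illustration of an ideal pipeline, assuming every phase takes one clock cycle and there are no stalls; the stage abbreviations and the function name are this sketch's own and do not come from any particular processor.

```python
# Abbreviations for the six phases listed above (chosen for this sketch):
# Instruction Fetch, Instruction Decode, Address Resolution,
# Operand Fetch, Execute, Store Result.
STAGES = ["IF", "ID", "AR", "OF", "EX", "ST"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {cycle: {instruction: stage}} for an ideal pipeline with no stalls."""
    schedule = {}
    total_cycles = num_instructions + len(stages) - 1
    for cycle in range(1, total_cycles + 1):
        active = {}
        for i in range(1, num_instructions + 1):
            stage_index = cycle - i  # instruction Ii enters the first stage in cycle i
            if 0 <= stage_index < len(stages):
                active[f"I{i}"] = stages[stage_index]
        schedule[cycle] = active
    return schedule


for cycle, active in pipeline_schedule(3).items():
    print(f"cycle {cycle}: {active}")
```

For three instructions, cycle 3 of the output shows I1 in AR, I2 in ID and I3 in IF, which matches the description in the paragraph above.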

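The following are some important metrics for a pipelined architecture. Times are measured in clock cycles, assuming every phase takes one clock cycle and the pipeline never stalls: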

1. Total execution time for a non-pipelined architecture:
number of instructions ✕ phases per instruction

2. Total execution time for a pipelined architecture:
number of instructions + phases per instruction - 1

3. Speedup ratio:
execution time for a non-pipelined architecture / execution time for a pipelined architecture
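
The three metrics above translate directly into code. The sketch below uses hypothetical function names and assumes each phase takes one clock cycle with no stalls, so the results are idealized upper bounds rather than measurements of a real processor.

```python
def non_pipelined_cycles(instructions, phases):
    """Each instruction runs all of its phases before the next one starts."""
    return instructions * phases


def pipelined_cycles(instructions, phases):
    """The first instruction takes `phases` cycles; each later one completes one cycle after the previous."""
    return instructions + phases - 1


def speedup(instructions, phases):
    """Ratio of non-pipelined to pipelined execution time."""
    return non_pipelined_cycles(instructions, phases) / pipelined_cycles(instructions, phases)


# 100 instructions on the six-phase pipeline described above
print(non_pipelined_cycles(100, 6))  # 600
print(pipelined_cycles(100, 6))      # 105
print(round(speedup(100, 6), 2))     # 5.71
```

With 3 instructions and 4 phases the speedup is 12 / 6 = 2, which matches the cycle factory going from 60 minutes to 30 minutes. As the number of instructions grows, the ratio approaches the number of phases, which is the best speedup this ideal model allows.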