20 March, 2013

Performance metrics of Computer Systems - 1

When we think about evaluating the performance of a system, a few things come to mind:

a) The configuration and flow that need to be benchmarked.
b) The number of node/server instances (one or many).
c) The duration of the task.
d) The tools that will be used for analysis and report preparation.
e) Existing benchmark reports for the system, against which the new data can be compared.


Let me share my thoughts on each of these aspects, as well as my views on the performance metrics of a system. Specifically, in this post I want to discuss "arriving at benchmarks when we release Version 1 of the software".

Consider a simple coin toss :) Yes, it's about probability and queuing :D
How are we going to measure the performance of this toss?
a) Maybe the height travelled by the coin?
b) The number of flips it makes? etc.

We use "laws of physics" as a tool to measure the performance metrics of the coin.

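Now consider a system where messages are received, parsed, handed to various components and sent back. This kind of system is called an open system: messages arrive from outside, independently of how many are already inside, and leave once they have been served.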

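There is one more type of system, called a closed system. Think of a CI setup where CI coordinators submit jobs, which in turn are executed on a set of machines (servers); when a job finishes, the coordinator goes back to an idle state until it submits its next job. Because a new job is submitted only when an earlier one completes, the number of jobs in a closed system is fixed by the number of coordinators.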
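To make the closed case more concrete, here is a minimal Python sketch of exact Mean Value Analysis (MVA) for a closed system. It assumes (my assumption, not part of the setup above) that all the CI machines can be lumped into a single FCFS station with exponential service and mean service demand S per job, and that each coordinator "thinks" for an average time Z between jobs. The function name and parameters are hypothetical.

```python
def mva_single_station(service_demand, think_time, population):
    """Exact MVA for one FCFS queueing station plus a think (delay) stage.

    Returns (throughput X, response time R, mean queue length Q) for
    `population` jobs circulating in the closed system.
    """
    queue_length = 0.0  # Q(0): the station is empty when there are no jobs
    throughput = response_time = 0.0
    for n in range(1, population + 1):
        # Arrival theorem: a job arriving at the station sees the mean
        # queue of the same system with one job fewer.
        response_time = service_demand * (1 + queue_length)
        # Interactive response time law: X = N / (R + Z).
        throughput = n / (response_time + think_time)
        # Little's law applied to the station: Q = X * R.
        queue_length = throughput * response_time
    return throughput, response_time, queue_length
```

For example, with S = 1, Z = 0 and three coordinators, `mva_single_station(1.0, 0.0, 3)` returns `(1.0, 3.0, 3.0)`: with no think time the station is already saturated, so extra coordinators only lengthen the queue. In an open system, by contrast, the arrival rate is set from outside and does not drop when the queue grows.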

Our software (Version 1) can be either open or closed. Since we have no previous benchmarks for this software, we have to arrive at a theoretical benchmark against which we can check the practical measurements.

So how do we arrive at theoretical benchmarks?
Using queuing models :-). Yes, it's all maths, buddy :D

For those who have taken a queuing course, none of this will be new. Everyone else, please read on.

Since we are used to event-driven programming, let's frame the discussion in those terms.

What is event-driven programming?
Events happen, and based on each event we execute some task (code). After that, the server typically goes into a wait state to handle the next set of events.

So let's define four events:
a) Customers/messages arrive randomly (we model this as Poisson arrivals).
b) Customers/messages wait in a queue to be processed.
c) They get processed (we model the processing time as exponentially distributed, i.e. exponential service times).
d) They leave the system (the memory allocated for them is returned to the pool).
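These four events map naturally onto a small event-driven simulation. Below is a minimal Python sketch of a single-server FIFO queue (the M/M/1 special case, i.e. N = 1 and unlimited capacity) driven by a heap of timestamped events. The function name, its parameters and the fixed seed are illustrative assumptions.

```python
import heapq
import random
from collections import deque

def simulate_mm1(arrival_rate, service_rate, num_messages, seed=1):
    """Event-driven simulation of an M/M/1 FIFO queue.

    Returns the mean time a message spends in the system (waiting + service).
    """
    rng = random.Random(seed)
    # Event list: (time, kind) tuples in a min-heap, so the earliest event pops first.
    events = [(rng.expovariate(arrival_rate), "arrival")]
    waiting = deque()            # arrival times of messages queued behind the server
    busy = False
    current_arrival_time = 0.0   # arrival time of the message currently in service
    arrived = departed = 0
    total_time_in_system = 0.0

    while departed < num_messages:
        now, kind = heapq.heappop(events)
        if kind == "arrival":
            # a) Poisson arrivals: exponential inter-arrival times with rate lambda.
            arrived += 1
            if arrived < num_messages:
                heapq.heappush(events, (now + rng.expovariate(arrival_rate), "arrival"))
            if busy:
                # b) Server busy: the message waits in the queue.
                waiting.append(now)
            else:
                # c) Server idle: service starts at once (exponential, rate mu).
                busy = True
                current_arrival_time = now
                heapq.heappush(events, (now + rng.expovariate(service_rate), "departure"))
        else:
            # d) Departure: the message leaves; its resources could be freed here.
            departed += 1
            total_time_in_system += now - current_arrival_time
            if waiting:
                # c) The next message in FIFO order starts service.
                current_arrival_time = waiting.popleft()
                heapq.heappush(events, (now + rng.expovariate(service_rate), "departure"))
            else:
                busy = False  # back to the wait state until the next event

    return total_time_in_system / num_messages
```

With λ = 0.5 and μ = 1.0, queuing theory predicts a mean time in system of 1/(μ - λ) = 2, and `simulate_mm1(0.5, 1.0, 100_000)` should come out close to that.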

Of course, there are systems where messages do not arrive like this, for example at fixed time slots or in bursts (Slotted ALOHA only allows transmissions at slot boundaries; another case is an attacker sending a burst of messages to cause buffer overflows in network elements).

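Having defined these events, we can categorize our system (software) as an M/M/N/K system, in Kendall's notation.
The first M says that arrivals are Markovian (memoryless), i.e. a Poisson process with some rate λ; the second M says that service times are exponentially distributed with some rate μ. N is the number of servers handling our messages, and K is the system capacity: the maximum number of messages that can be in the system (waiting plus in service) at once. Arrivals beyond that are rejected.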
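Written out, the two M's correspond to the following assumptions, where T_a is the time between two consecutive arrivals, S is the service time of one message, and ρ is the usual per-server utilization:

```latex
P(T_a > t) = e^{-\lambda t}, \qquad \mathbb{E}[T_a] = \frac{1}{\lambda}
\qquad \text{(first M: Poisson arrivals at rate } \lambda \text{)}

P(S > t) = e^{-\mu t}, \qquad \mathbb{E}[S] = \frac{1}{\mu}
\qquad \text{(second M: exponential service at rate } \mu \text{)}

\rho = \frac{\lambda}{N \mu}
\qquad \text{(with } K = \infty \text{ the queue is stable only if } \rho < 1 \text{)}
```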

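Why do we have two M's?
Because arrivals and service are two separate random processes: one describes when messages come in, the other how long each one takes to process, and each gets its own letter.
A related question: aren't the messages going to be served in the order they arrive?

Not necessarily. Many systems have something called classes, similar to the IntServ/DiffServ model, where customers paying more are grouped into higher classes and get a better experience, while those paying less get a comparatively worse one. (The service order, or queue discipline, is a separate parameter from the two M's; the standard M/M/N/K results are usually stated for first-come, first-served.)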
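One simple way to honor service classes is a non-preemptive priority queue: always serve the highest class first, and keep FIFO order within a class. Here is a minimal Python sketch using `heapq`; the `ClassedQueue` name and the class numbering (0 = highest) are hypothetical.

```python
import heapq
import itertools

class ClassedQueue:
    """Serve messages by class (0 = highest priority), FIFO within a class."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker, so equal classes keep arrival order.
        self._counter = itertools.count()

    def put(self, message, service_class):
        heapq.heappush(self._heap, (service_class, next(self._counter), message))

    def get(self):
        _service_class, _order, message = heapq.heappop(self._heap)
        return message

    def __len__(self):
        return len(self._heap)
```

If "bronze-1" (class 2), "gold-1" (class 0), "silver-1" (class 1) and "gold-2" (class 0) are put in that order, `get()` returns gold-1, gold-2, silver-1, bronze-1. Note that this is non-preemptive: a low-class message already in service is not interrupted.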

Also consider the case where the server is bombarded with more requests than it can handle. What can the server do now?

It can delay clients entering the system through mechanisms such as stateless cookies or client puzzles.
For example, when someone sends an INVITE message (SIP) or a SYN segment (TCP), the server handling it can reply with a challenge, asking the sender to solve a hard puzzle such as a hash computation, which takes the sender a significant amount of time before it can reply. In the meantime, the server clears the messages in its queue and is ready to accept new connections.
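Here is a minimal Python sketch of such a puzzle, in the style of hashcash. The server derives the challenge from a secret key and the client identity with HMAC, so it can recompute the challenge later without storing anything per client; the client must find a nonce whose SHA-256 hash together with the challenge starts with a given number of zero bits. This is a swapped-in illustration, not how SIP or TCP themselves work (standard TCP SYN cookies, for instance, encode connection state in the initial sequence number rather than asking the client to solve anything). `SERVER_SECRET`, the client id format and the difficulty are hypothetical, and a real scheme would also mix in a timestamp so that old solutions expire.

```python
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(32)  # hypothetical key, known only to the server

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte:
            return bits + 8 - byte.bit_length()
        bits += 8
    return bits

def make_challenge(client_id: bytes) -> bytes:
    """Server side: derive the challenge from the secret, so no per-client state is kept."""
    return hmac.new(SERVER_SECRET, client_id, hashlib.sha256).digest()

def _puzzle_hash(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce; expected cost is about 2**difficulty hashes."""
    nonce = 0
    while leading_zero_bits(_puzzle_hash(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce

def verify(client_id: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one HMAC plus one hash, whatever the difficulty."""
    challenge = make_challenge(client_id)
    return leading_zero_bits(_puzzle_hash(challenge, nonce)) >= difficulty
```

Each extra bit of difficulty roughly doubles the client's expected work, while the server's verification cost stays constant. That asymmetry is what lets the server slow down arrivals cheaply.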

So we are deliberately delaying the customer's entry into the system. This happens with IRCTC [Indian Railway reservation system servers :)].

Coming back to our M/M/N/K model: it has a well-known set of formulas from which one can derive various system metrics, such as throughput, delay, the waiting time of a customer (message) in the queue, and the total time a message spends in the system.
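A minimal Python sketch of those formulas: it computes the steady-state probabilities p_n of having n messages in an M/M/N/K system by normalizing the birth-death weights directly, then derives the blocking probability, throughput, L, L_q, W and W_q (the last two via Little's law). The function and key names are my own, and for very large K or λ/μ the powers can overflow floating point.

```python
from math import factorial

def mmnk_metrics(arrival_rate, service_rate, servers, capacity):
    """Steady-state metrics of an M/M/N/K queue (N servers, at most K in system)."""
    if servers < 1 or capacity < servers:
        raise ValueError("need N >= 1 and K >= N")
    r = arrival_rate / service_rate  # offered load in Erlangs
    # Unnormalized birth-death weights; p_n is proportional to these.
    weights = []
    for n in range(capacity + 1):
        if n <= servers:
            weights.append(r**n / factorial(n))
        else:
            weights.append(r**n / (factorial(servers) * servers ** (n - servers)))
    total = sum(weights)
    p = [w / total for w in weights]

    blocking = p[capacity]                      # an arrival finds the system full
    throughput = arrival_rate * (1 - blocking)  # effective (accepted) arrival rate
    mean_in_system = sum(n * pn for n, pn in enumerate(p))                      # L
    mean_in_queue = sum((n - servers) * pn for n, pn in enumerate(p) if n > servers)  # Lq
    return {
        "blocking_probability": blocking,
        "throughput": throughput,
        "utilization_per_server": throughput / (servers * service_rate),
        "mean_in_system": mean_in_system,
        "mean_in_queue": mean_in_queue,
        "mean_time_in_system": mean_in_system / throughput,  # W = L / lambda_eff
        "mean_waiting_time": mean_in_queue / throughput,      # Wq = Lq / lambda_eff
    }
```

Two quick sanity checks: with N = K = 2 and λ = μ = 1 the blocking probability is the Erlang-B value 0.2, and with N = 1 and a large K the results approach the M/M/1 values L = ρ/(1 - ρ) and W = 1/(μ - λ).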

Using these values along with the system parameters, one can arrive at a theoretical benchmark for the system and then compare it with measured data from our Version 1 software.
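As an illustration of that comparison, here is a minimal Python sketch that takes the simplest case, M/M/1, computes the theoretical values and flags any measured metric that deviates from theory by more than a chosen tolerance. The 10% tolerance, the metric names and the measured numbers in the usage lines are all made up for illustration.

```python
def mm1_prediction(arrival_rate, service_rate):
    """Closed-form M/M/1 results (valid only when arrival_rate < service_rate)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: utilization would be >= 1")
    rho = arrival_rate / service_rate
    return {
        "utilization": rho,
        "mean_in_system": rho / (1 - rho),                    # L
        "mean_time_in_system": 1 / (service_rate - arrival_rate),  # W
    }

def compare_with_theory(predicted, measured, tolerance=0.10):
    """Return {metric: relative deviation} for metrics outside the tolerance."""
    flagged = {}
    for name, expected in predicted.items():
        if name in measured:
            deviation = (measured[name] - expected) / expected
            if abs(deviation) > tolerance:
                flagged[name] = deviation
    return flagged

predicted = mm1_prediction(80.0, 100.0)  # e.g. messages per second
measured = {"utilization": 0.79, "mean_time_in_system": 0.071}  # hypothetical measurements
print(compare_with_theory(predicted, measured))
```

Here the measured utilization is within 10% of the predicted 0.8, but the measured mean time in system (0.071 s against a predicted 0.05 s) is flagged. That would be a cue to check whether the model's arrival and service assumptions hold for the real system, or whether there is overhead the model ignores.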
 
Let's stop here for now. I'll post about the other factors gradually.
Hope you got something out of this post.
Please post comments/feedback (if any).
