Contribution 15

== Inter-Process Communication Drives Response Time ==
It is widely recognized that response time performance is a critical factor in the usability of software. Few things are as frustrating, in the context of our busy lives, as waiting excessively for some software system to respond to a stimulus we’ve given it, especially when our interaction with the software involves repeated pairs of stimulus and slow response. We feel as if the software’s poor performance is wasting our time and affecting our productivity.

However, the causes of poor response time performance are less widely appreciated, especially in modern applications. Even recent Application Performance Management literature, for example, still discusses the choice of collection data structure and sorting algorithm, despite the decades of Moore’s Law improvements since the days when such choices might have dominated an application’s performance characteristics.
 
My experience over the past 15 years of architecting multi-tier enterprise applications has repeatedly been that ''the number of inter-process communications'' (IPCs) in response to a stimulus is the primary driver of response time performance.

While there can sometimes be other causes, in isolation or in the aggregate, it is intuitively obvious that the number of inter-process communications will usually dominate. Each inter-process communication contributes some non-negligible latency, and these individual contributions add up in the overall response time, especially when the communications are incurred sequentially.
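
To make the summation concrete: under this simple additive model (an illustration, ignoring local computation), a response involving ''N'' sequential inter-process communications with individual latencies ''L''<sub>1</sub>, …, ''L''<sub>''N''</sub> cannot complete faster than

<math>T_{\mathrm{response}} \ge \sum_{i=1}^{N} L_i,</math>

whereas if the same communications could all be issued in parallel, the bound would shrink to <math>\max_i L_i</math>.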
 
A prime example is the phenomenon of “ripple loading” [Fowler PoEAA p.202] in an application using object-relational mapping. Ripple loading refers to the sequential execution of many database calls to select the data needed for building a graph of objects in the database client process. When the database client process is a middle-tier application server rendering a web-based user interface page, these database calls are likely executed sequentially in a single thread, and their individual latencies contribute to the overall response time. Even if each database call takes only 10ms, a page requiring 1000 calls (which is not uncommon) will exhibit at least a 10-second response time.
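
As a self-contained illustration of that arithmetic (a simulation with invented names, not code from any particular ORM or database driver), the sketch below models each “database call” as a 10 ms delay standing in for one network round trip:

<pre>
import java.util.ArrayList;
import java.util.List;

// Simulates ripple loading: one sequential "database call" per row needed
// to build an object graph in the client process.
public class RippleLoadingDemo {

    // Stand-in for a single-row SELECT over the network: ~10 ms of latency.
    static String fetchRow(int id) throws InterruptedException {
        Thread.sleep(10);
        return "row-" + id;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        List<String> graph = new ArrayList<>();
        for (int id = 0; id < 1000; id++) {   // 1000 sequential IPCs
            graph.add(fetchRow(id));
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Prints roughly 10000 ms: the per-call latencies sum, as described above.
        System.out.println(graph.size() + " rows loaded in " + elapsedMs + " ms");
    }
}
</pre>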
 
Database interaction is but one example of inter-process communication. Others include web service invocation, HTTP requests from a web browser, distributed object invocation, request-reply messaging, and data grid interaction over custom network protocols. The more inter-process communications (of whatever kind) that are involved in an application’s response to a stimulus, the greater the response time will be.
 
Thus application architects need to be mindful of the ''ratio'' of inter-process communications per stimulus in their application architectures. This metric, in my experience, is the prime determinant of response time performance. When I have had to analyze and optimize the performance of specific poorly performing application use cases, I have often found IPC-to-stimulus ratios of thousands to one.
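
One crude way to make this ratio visible (the class below is a hypothetical sketch of my own naming, not a standard API) is to count the IPCs issued while serving a single stimulus: reset a counter when a request arrives, increment it at every data-access call site, and log the total when the response is sent:

<pre>
// Hypothetical sketch: count the IPCs issued while serving one stimulus
// (e.g., one HTTP request), so the IPC-to-stimulus ratio can be logged.
public final class IpcCounter {
    private static final ThreadLocal<Integer> COUNT =
            ThreadLocal.withInitial(() -> 0);

    private IpcCounter() {}

    public static void reset()     { COUNT.set(0); }               // at request start
    public static void increment() { COUNT.set(COUNT.get() + 1); } // at every IPC call site
    public static int  current()   { return COUNT.get(); }         // log at request end
}
</pre>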
 
The strategies for reducing the ratio of inter-process communications per stimulus are relatively few, and relatively obvious or well known. One is to apply the principle of parsimony: optimize the interface between processes so that exactly the right data for the purpose at hand is exchanged with the minimum number of interactions. Another is to parallelize the inter-process communications where possible, so that the overall response time is driven mainly by the longest-latency IPC. A third is to cache the results of previous IPCs, so that future IPCs may be avoided by hitting a local cache instead; the latter two strategies are sketched below.
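
For instance, the parallelizing and caching strategies might be simulated as follows (again with invented names and a 10 ms stand-in for each remote call); with 50 worker threads the first pass costs roughly 1000/50 × 10 ms, and the second pass, served entirely from the cache, incurs no IPCs at all:

<pre>
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simulation of two IPC-reduction strategies: parallelizing independent
// calls, and caching results so repeat calls never leave the process.
public class IpcReductionDemo {

    private static final Map<Integer, String> CACHE = new ConcurrentHashMap<>();

    // Stand-in for one remote call with ~10 ms of latency.
    static String fetchRemote(int id) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "row-" + id;
    }

    // Cache-aside: only go over the wire on a miss.
    static String fetchCached(int id) {
        return CACHE.computeIfAbsent(id, IpcReductionDemo::fetchRemote);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(50);

        // Parallel first pass: response time is driven by batches of
        // concurrent calls (about 1000/50 * 10 ms here), not by the sum.
        long start = System.nanoTime();
        List<Future<String>> futures = new ArrayList<>();
        for (int id = 0; id < 1000; id++) {
            final int i = id;
            futures.add(pool.submit(() -> fetchCached(i)));
        }
        for (Future<String> f : futures) {
            f.get();
        }
        System.out.println("parallel pass: "
                + (System.nanoTime() - start) / 1_000_000 + " ms"); // ~200 ms

        // Cached second pass: zero IPCs, so it is effectively instantaneous.
        start = System.nanoTime();
        for (int id = 0; id < 1000; id++) {
            fetchCached(id);
        }
        System.out.println("cached pass:   "
                + (System.nanoTime() - start) / 1_000_000 + " ms"); // ~0 ms

        pool.shutdown();
    }
}
</pre>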
 
That the number of inter-process communications in response to a stimulus is the dominant factor in response time performance seems intuitively obvious, and has been repeatedly demonstrated in my experience with enterprise application performance analysis. But I find it necessary and beneficial to state it here in axiomatic form, as I continue to encounter application architects who have yet to appreciate this lesson, to the detriment of our profession.
By [[Randy.stafford|Randy Stafford]]


This work is licensed under a Creative Commons Attribution 3.0 License.
