Premature optimization is the root of all evil - not only in the Agile world

October 9, 2008 by Przemysław Bielicki


Picture courtesy of gutter@flickr

I was reading an excellent book by Josh Bloch, "Effective Java, Second Edition", and had just reached the topic of optimization when it happened. It was a funny coincidence, but I took it as a sign that I should write this post.

It doesn't relate to agility in any way, but it does relate to software quality, so it definitely belongs here. It all started very innocently: with a blog post describing the solution to an annoying problem.

In this post I will show how easily you can run into really dangerous and ugly development problems when you start optimizing your software too early. I hope you will like the story.

Start with the simplest possible solutions...

I've been reading a "must-read" book for all Java developers, "Effective Java (2nd Edition)" by Josh Bloch, and I had just reached the "Optimize judiciously" item. At the same time I was doing some Java EE development and ran into a problem with the Struts2 file upload capabilities. I found a solution and posted it to my private blog: http://java2jee.blogspot.com/2008/09/solution-to-struts2-upload-file-err.... "This has nothing to do with optimization", you may think - and I thought the same, but that was a wrong assumption.

After a few days I received a comment on that post from an anonymous user with an "optimized solution". The commenter wanted to optimize this line of Java code:

  if (string.contains("the request was rejected because its size")) {

with this code:

  public static Pattern REJECTED_FILE_SIZE_PATTERN = Pattern.compile(".*reject.*size.*");
  ...
  if (REJECTED_FILE_SIZE_PATTERN.matcher(string).matches()) {

I have always considered myself a seasoned Java developer (hopefully that is still true :) but after receiving this comment I was quite worried. "Why am I not using regular expressions to check strings? Isn't that much faster?", I thought. I even wondered: "Maybe it's time to become a manager? My Java/technical knowledge is deteriorating..."

"But hey! I will not let it go like this" - I thought. I wrote a simple Java program to test the performance of both solutions:

  import java.util.regex.Pattern;

  public class Test {
    public static void main(String[] args) {
      int count = 1000000;
      Pattern p = Pattern.compile(".*reject.*size.*");
      String matching = "the request was rejected because its size (1234) some other text";

      long start = System.currentTimeMillis();
      for (int i = 0; i < count; i++) {
        if (matching.contains("the request was rejected because its size")) {
          // do nothing
        }
      }
      System.out.printf("contains() matching: %dms%n", System.currentTimeMillis() - start);

      start = System.currentTimeMillis();
      for (int i = 0; i < count; i++) {
        if (p.matcher(matching).matches()) {
          // do nothing
        }
      }
      System.out.printf("matches() matching: %dms%n", System.currentTimeMillis() - start);
    }
  }

On my machine the plain contains() solution is 50 to 80 times faster than the regexp matcher. Imagine the disastrous effect this "optimization" could have if it were applied throughout a whole application!
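
A timing loop like the one above is sensitive to JIT warm-up, and empty loop bodies give the JIT room to discard work. The following is a minimal sketch of a slightly more careful measurement, not a rigorous harness: it warms both paths up first, uses System.nanoTime(), and counts the hits so that each loop has an observable result. The class name MatchTiming and its helper methods are invented for this illustration; for serious measurements a dedicated harness such as JMH would be the more reliable choice.

```java
import java.util.regex.Pattern;

class MatchTiming {
    static final String NEEDLE = "the request was rejected because its size";
    static final Pattern PATTERN = Pattern.compile(".*reject.*size.*");

    // Counting hits gives the loop an observable result, which makes it
    // harder for the JIT to discard the loop body.
    static int countContains(String s, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (s.contains(NEEDLE)) {
                hits++;
            }
        }
        return hits;
    }

    static int countMatches(String s, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (PATTERN.matcher(s).matches()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String s = "the request was rejected because its size (1234) some other text";
        int count = 1_000_000;

        // Warm-up runs give the JIT a chance to compile both paths.
        countContains(s, count);
        countMatches(s, count);

        long start = System.nanoTime();
        int containsHits = countContains(s, count);
        long containsNs = System.nanoTime() - start;

        start = System.nanoTime();
        int matchesHits = countMatches(s, count);
        long matchesNs = System.nanoTime() - start;

        System.out.printf("contains(): %d hits in %d ms%n", containsHits, containsNs / 1_000_000);
        System.out.printf("matches():  %d hits in %d ms%n", matchesHits, matchesNs / 1_000_000);
    }
}
```

The absolute numbers will differ between machines and JVM versions, so only the ratio between the two lines is worth comparing.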

When I looked at the implementation of contains(), I saw that it operates directly on the char array that backs the String object. It is fast! It is also the simplest and most obvious method to call in this situation, and it makes the code more readable than the regexp matcher does. I see only advantages.
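
To illustrate why a plain substring check is cheap, here is a simplified sketch of a character-by-character search over two char arrays. It is not the JDK source: contains() delegates to indexOf(), whose real implementation differs in detail. The class and method names are made up for this illustration.

```java
class NaiveSearch {
    // Returns true if needle occurs somewhere in haystack.
    static boolean naiveContains(String haystack, String needle) {
        char[] h = haystack.toCharArray();
        char[] n = needle.toCharArray();
        // Try every start position at which the needle could still fit.
        for (int i = 0; i + n.length <= h.length; i++) {
            int j = 0;
            // Compare characters until a mismatch or the end of the needle.
            while (j < n.length && h[i + j] == n[j]) {
                j++;
            }
            if (j == n.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String s = "the request was rejected because its size (1234) some other text";
        System.out.println(naiveContains(s, "rejected because its size"));
        System.out.println(naiveContains(s, "accepted"));
    }
}
```

There is no pattern compilation, no matcher object and no backtracking here, only array comparisons, which is roughly why the simple call is hard to beat for a fixed string.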

The conclusion is simple: DON'T OPTIMIZE YOUR CODE AND USE THE SIMPLEST POSSIBLE SOLUTIONS - THEY WORK!

... and stay with them

Joshua Bloch cites these guys:

More computing sins are committed in the name of efficiency (not necessarily achieving it) than for any other single reason - including blind stupidity. (William A. Wulf)

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. (Donald E. Knuth)

We follow two rules in the matter of optimization:
   Rule 1. Don't do it.
   Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution.
(M.A. Jackson)

What else can I add? Actually, nothing. I have just shown, with a real example, that each of the quotes above holds true.

To paraphrase Joshua Bloch: don't focus on optimizing your software. If you write good, logically structured code, your software will probably end up optimized on its own. Use well-known, standard libraries and the most basic features that meet your requirements - optimization and quality will follow.

Have you had similar adventures with sub-optimal solutions? Maybe you disagree with me? I would gladly read your opinions.

About the Author: Przemysław graduated from Gdańsk University of Technology in 2004, having specialized in Distributed Information Systems. He previously worked at Lufthansa Systems and Intel Corporation, where he developed complex IT solutions in many Java-related technologies. In professional life he is a real Java expert holding a couple of Sun Java certificates (Programmer, Developer, Web Developer) and, of course, a Certified Scrum Master.

Przemysław is a regular contributor to AgileSoftwareDevelopment.com and the author of the "From Java to Java EE" blog. He now works as a Software Craftsman at an international company that is the leading Global Distribution System (GDS) and the biggest processor of travel bookings in the world.

Comments

Sub-Optimal is often easier

October 9, 2008 by Kevin Rutherford (not verified), 4 years 32 weeks ago
Comment id: 1897

I'm gonna get flamed for this, but here goes...

One example of premature optimisation that happens on many many projects is using a relational database for persistent storage. I strongly believe most applications would be better designed using flat files instead. Sticking in some SQL or some ActiveRecord is easy and comfortable, and often done very early in development. And when challenged, the defence I hear most often is one of performance.

Buck the trend -- make SQL a last resort :)

SQL can be the simplest solution

October 9, 2008 by Artem, 4 years 32 weeks ago
Comment id: 1898

IMHO, nowadays there are so many developers with the basics of SQL hardwired into their brains, that at times, SQL is indeed the simplest option for them. Though it can be mentally difficult to optimize into a flat file solution later :)

I strongly disagree

October 9, 2008 by pbielicki, 4 years 32 weeks ago
Comment id: 1899

If you want to use flat files use HSQL DB or similar (see the performance data). I treat SQL as an interface to access the data - you can even access messages from the messaging brokers (e.g. JMS) using SQL - why not?

If you use flat files you will get stuck in s**t. I have done it more than once; the last time was when I was developing my solution for the Sun Certified Java Developer certification. I struggled with it because I had to write my very own database mechanism - do you think that's an optimal solution? I spent 80% of the development time on the database mechanism instead of on the business logic.

Files can be optimal in specific cases - that's for sure - but it strongly depends on what you are going to deliver. It's not an optimal solution for everything, just like SQL.

Very bad example.

October 9, 2008 by brazzy (not verified), 4 years 32 weeks ago
Comment id: 1900

I really don't think the guy who suggested the regexp solution was trying to optimize for speed, but for flexibility - if the message you're looking for changes in any way, the contains() solution fails, while the regexp solution will probably still work.

Also, the performance test is completely meaningless because it compiles the pattern in every iteration, which is just dumb - the whole point of putting the pattern into a static variable is to compile it only once. I strongly suspect that without that factor, the regexp solution is just as fast as (if not faster than) the contains() solution.

Re: Very bad example.

October 9, 2008 by pbielicki, 4 years 32 weeks ago
Comment id: 1901

If you are so smart why don't you provide any example? Did you notice that the pattern was compiled BEFORE the test? No? - so, it is compiled before the test, not in each iteration.

"if the message you're looking for changes in any way, the contains() solution fails, while the regexp solution will probably still work." - blah blah blah - probably will work, probably will not. I don't care - I am the owner of the code and can change it.

I prefer the "inflexible" solution (which in fact it is not) that is 80 times faster, given that this action will be heavily used. And the regexp solution is not more flexible in any way in this case.
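
The flexibility argument in this exchange can be made concrete with a small sketch. The two checks are the ones from the post; the reworded and unrelated messages are hypothetical strings invented for this example, not real Struts2 output, and the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class FlexibilityDemo {
    static final String EXACT = "the request was rejected because its size";
    static final Pattern LOOSE = Pattern.compile(".*reject.*size.*");

    static boolean exactCheck(String message) {
        return message.contains(EXACT);
    }

    static boolean looseCheck(String message) {
        return LOOSE.matcher(message).matches();
    }

    public static void main(String[] args) {
        String original = "the request was rejected because its size (1234) exceeds the limit";
        // Hypothetical rewording of the same error, invented for this example.
        String reworded = "request rejected: size (1234) exceeds the limit";
        // Hypothetical unrelated message that merely contains both words.
        String unrelated = "rejected: header size field is malformed";

        for (String m : new String[] {original, reworded, unrelated}) {
            System.out.printf("%-70s exact=%b loose=%b%n", m, exactCheck(m), looseCheck(m));
        }
    }
}
```

The exact check accepts only the first message, while the loose pattern accepts all three, including the unrelated one. The regexp survives a rewording, but at the price of possible false positives, so "more flexible" is not automatically "more correct".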

maybe i'm blind or sth....

October 10, 2008 by grapkulec (not verified), 4 years 31 weeks ago
Comment id: 1904

shouldn't the variable "start" be set before each loop? you have it set before the "contains() matching" loop, but not before the regexp loop. i'm pretty sure that this way you made it impossible to get a smaller measurement for the second loop. but maybe i'm blind or sth...

You are absolutely right!

October 10, 2008 by pbielicki, 4 years 31 weeks ago
Comment id: 1905

You are absolutely right! Thanks for finding this typo. BTW. it doesn't change the results :)

hmm it doesn't? i think it

October 10, 2008 by grapkulec (not verified), 4 years 31 weeks ago
Comment id: 1906

hmm it doesn't? i think it should decrease the difference between the results for each loop and regexp wouldn't seem soooo slow :)

I'm bored with answering such

October 10, 2008 by pbielicki, 4 years 31 weeks ago
Comment id: 1908

I'm bored with answering such comments. Before you write something, just copy-paste the code and run it! I posted the code not to discuss it but to show how it works and what the results are. If you don't believe me (you don't have to) just run this Java program - it will not lie to you!!!

Cheers!

PS. Here are results from my machine:

  contains() matching: 156ms
  matches() matching: 8467ms

gee, i'm sorry to comment the

October 13, 2008 by grapkulec (not verified), 4 years 31 weeks ago
Comment id: 1909

gee, i'm sorry to comment the wrong way. never happen again, i promise

Loading huge amount of data in memory at startup

October 13, 2008 by Thomas Eyde (not verified), 4 years 31 weeks ago
Comment id: 1910

A customer I worked with some time ago had the idea that loading static data at startup would be most efficient. This was an ASP.NET application, and the data was loaded as Datasets. I don't remember the actual size in MB, but we are talking about 400 000+ rows.

Their idea was probably something like RAM is fast, disk is slow, data is static, let's load it once and for all. RAM is cheap, anyway.

The problem was that, due to missing abstractions and encapsulation, all code had direct access to these datasets, and looping over them happened all over the place. That included nested and circular loops. The net effect was that querying was extremely slow. If they wanted an in-memory database, they should have bought one.

One page alone required 16 million field lookups. There is no way any web page requires that amount of data.

The quick-fix? I cached the already cached data. Ironic, isn't it?
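
The comment does not say how the second cache was built. One plausible reading, sketched here under that assumption and in Java rather than the original ASP.NET, is to replace repeated scans over the in-memory rows with an index that is built once. The Row type, its fields and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RowIndexDemo {
    // Hypothetical row type standing in for one dataset row.
    static final class Row {
        final int id;
        final String name;

        Row(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Linear scan: every lookup walks the list until it finds the id.
    static Row findByScan(List<Row> rows, int id) {
        for (Row r : rows) {
            if (r.id == id) {
                return r;
            }
        }
        return null;
    }

    // Builds the index once; later lookups are single hash probes.
    static Map<Integer, Row> buildIndex(List<Row> rows) {
        Map<Integer, Row> index = new HashMap<>();
        for (Row r : rows) {
            index.put(r.id, r);
        }
        return index;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 400_000; i++) {
            rows.add(new Row(i, "row-" + i));
        }
        Map<Integer, Row> index = buildIndex(rows);
        System.out.println(findByScan(rows, 399_999).name);
        System.out.println(index.get(399_999).name);
    }
}
```

Both lookups return the same row; the difference only shows up when the lookup sits inside another loop, where the scan multiplies the work and the index does not.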

In my experience, there is no

October 14, 2008 by Anay Kamat (not verified), 4 years 31 weeks ago
Comment id: 1911

In my experience, there is no one perfect solution. As it is said, "There is no silver bullet". Determining which is the best solution depends entirely on situation.

For example, it won't be a good idea to use contains method to determine if the string has a specific pattern.

In the case of data persistence, if you are using a flat file, you will need to make sure to abstract the operations on that data so that you can easily integrate it with a DBMS if required.
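
A minimal sketch of that kind of abstraction might look like the following. The NoteStore interface, the FileNoteStore class and the one-note-per-line file format are assumptions made up for this illustration; the point is only that callers depend on the interface, so a DBMS-backed implementation could replace the flat file later without touching them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class NoteStoreDemo {
    // Hypothetical storage abstraction; callers see only this interface.
    interface NoteStore {
        void add(String note) throws IOException;
        List<String> all() throws IOException;
    }

    // Flat-file implementation: one note per line, appended to the file.
    static final class FileNoteStore implements NoteStore {
        private final Path file;

        FileNoteStore(Path file) {
            this.file = file;
        }

        @Override
        public void add(String note) throws IOException {
            byte[] line = (note + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
            Files.write(file, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        @Override
        public List<String> all() throws IOException {
            if (!Files.exists(file)) {
                return new ArrayList<>();
            }
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("notes", ".txt");
        NoteStore store = new FileNoteStore(tmp);
        store.add("first");
        store.add("second");
        System.out.println(store.all());
        Files.delete(tmp);
    }
}
```

This sketch ignores concurrency and notes that contain line breaks, which are exactly the kind of details that tend to grow into a home-made database mechanism.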

Excellent point Anay

October 14, 2008 by pbielicki, 4 years 31 weeks ago
Comment id: 1912

Excellent point Anay - I wouldn't dare use the contains() method to search for a specific pattern. And there is no one perfect solution to everything. Well, there is a set of perfect or at least optimal solutions that are commonly named "common sense" or "based on experience" :) but that's another story.

Thanks for your comment.

YAGNI!

December 19, 2008 by Anonymous (not verified), 4 years 21 weeks ago
Comment id: 2137

YAGNI!

Optimization

February 9, 2009 by Anonymous (not verified), 4 years 14 weeks ago
Comment id: 2225

Sorry, I think this is a good example of a bad example. It is foolish to ever think that optimizing a simple string compare involves moving from a simple string compare to a complicated regex. A real example of optimization is to move from the complicated regex to a simple string compare. So I am sorry that your example does not match what you are trying to say. I also disagree wholeheartedly with the idea of never worrying about optimization until you run into a problem. That has given us far too many problems to count. If you want to generalize, I think a better engineering practice would be to follow something like the eighty-twenty rule. Basically don't spend an hour trying to optimize code paths that are not hit 80% of the time, but please do spend that hour with code that will be hit a lot.
