Premature optimization is the root of all evil - not only in the Agile world

October 9, 2008 by Przemysław Bielicki

Picture courtesy of gutter@flickr

I was reading an excellent book by Josh Bloch, "Effective Java, Second Edition", and had just reached the subject of optimization when it happened. It was a funny coincidence, but I took it as a sign to write this post.

This post doesn't relate to Agile in any way, but it relates to the quality of software, so it definitely should be published here. And it all started very innocently - with a blog post describing the solution to an annoying problem.

In this post I will show you how easily you can run into really dangerous and ugly development problems by optimizing your software too early. I hope you will like the story.

Start with the simplest possible solutions...

I've been reading a "must-read" book for all Java developers, "Effective Java (2nd Edition)" by Josh Bloch, and I had just reached the "Optimize judiciously" chapter. At the same time I was doing some Java EE development and ran into a problem with the Struts2 file upload capabilities. I found a solution and posted it to my private blog. "This has nothing to do with optimization", you may think - and I thought the same, but that was a wrong assumption.

After a few days I received a comment on that post from an anonymous user with an "optimized solution". The commenter wanted to replace this line of Java code:

  if (string.contains("the request was rejected because its size")) {
with this code:
  public static Pattern REJECTED_FILE_SIZE_PATTERN = Pattern.compile(".*reject.*size.*");
  ...
  if (REJECTED_FILE_SIZE_PATTERN.matcher(string).matches()) {

I have always considered myself a seasoned Java developer (hopefully that is still true :) but after receiving this comment I was quite worried. "Why am I not using regular expressions to check strings? Isn't it much faster?", I thought. I was even thinking: "Maybe it's time to become a manager? - my Java/technical knowledge is deteriorating..."

"But hey! I will not let it go like this" - I thought. I wrote a simple Java program to test the performance of both solutions:

  import java.util.regex.Pattern;

  public class Test {
    public static void main(String[] args) {
      int count = 1000000;
      // Compile the pattern once, before any timing starts.
      Pattern p = Pattern.compile(".*reject.*size.*");
      String matching = "the request was rejected because its size (1234) some other text";

      // Time the plain substring check.
      long start = System.currentTimeMillis();
      for (int i = 0; i < count; i++) {
        if (matching.contains("the request was rejected because its size")) {
          // do nothing
        }
      }
      System.out.printf("contains() matching: %dms%n", System.currentTimeMillis() - start);

      // Reset the timer, then time the regexp check.
      start = System.currentTimeMillis();
      for (int i = 0; i < count; i++) {
        if (p.matcher(matching).matches()) {
          // do nothing
        }
      }
      System.out.printf("matches() matching: %dms%n", System.currentTimeMillis() - start);
    }
  }

On my machine the standard contains() solution is 50 to 80 times faster than the regexp matcher solution. What a disastrous effect this could have if applied across a whole application! I can't even imagine.
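
If you want to repeat the measurement a bit more carefully, here is a minimal sketch of how it could be done. It is only an illustration, not a rigorous benchmark: the class name ContainsVsRegexBenchmark and the helper methods countContains and countMatches are made up for this example, and the warm-up size is an arbitrary assumption. The loops count hits so that each loop body has a visible result, a warm-up pass runs before timing, and System.nanoTime() is used for the intervals. The numbers will differ from machine to machine.

```java
import java.util.regex.Pattern;

public class ContainsVsRegexBenchmark {
    static final String NEEDLE = "the request was rejected because its size";
    // Compiled once, exactly as in the commenter's static-field version.
    static final Pattern PATTERN = Pattern.compile(".*reject.*size.*");

    // Counts hits so that the loop produces a result that is actually used.
    static int countContains(String input, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (input.contains(NEEDLE)) {
                hits++;
            }
        }
        return hits;
    }

    static int countMatches(String input, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (PATTERN.matcher(input).matches()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "the request was rejected because its size (1234) some other text";
        int iterations = 1000000;

        // Warm-up pass (size chosen arbitrarily) so the JIT has a chance to compile both paths.
        countContains(input, 100000);
        countMatches(input, 100000);

        long t0 = System.nanoTime();
        int containsHits = countContains(input, iterations);
        long t1 = System.nanoTime();
        int matchesHits = countMatches(input, iterations);
        long t2 = System.nanoTime();

        System.out.printf("contains(): %d hits in %d ms%n", containsHits, (t1 - t0) / 1000000);
        System.out.printf("matches():  %d hits in %d ms%n", matchesHits, (t2 - t1) / 1000000);
    }
}
```

Both loops should report the same number of hits for the same input; only the elapsed time should differ.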

When I took a look at the contains() method implementation I saw that it operates directly on the char array (i.e. the underlying array that backs the String object). It is fast! And it is the simplest and most obvious method to call in this situation. It even makes the code more readable and easier to grasp than the regexp matcher does. I see only advantages.
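
To give a feeling for why a plain substring check is cheap, here is a simplified sketch of a scan over char arrays. It is not the actual JDK source, just an illustration of the idea; the names NaiveContains and naiveContains are made up for this example.

```java
public class NaiveContains {
    // Returns true if needle occurs somewhere in haystack.
    // Tries every start position and compares the chars one by one.
    static boolean naiveContains(String haystack, String needle) {
        char[] h = haystack.toCharArray();
        char[] n = needle.toCharArray();
        for (int i = 0; i + n.length <= h.length; i++) {
            int j = 0;
            while (j < n.length && h[i + j] == n[j]) {
                j++;
            }
            if (j == n.length) {
                return true; // every char of the needle matched at position i
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String msg = "the request was rejected because its size (1234) some other text";
        System.out.println(naiveContains(msg, "the request was rejected because its size")); // true
        System.out.println(naiveContains(msg, "no such text")); // false
    }
}
```

There is no pattern to interpret and no matcher object to create per call, only index arithmetic and char comparisons.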


... and stay with them

Joshua Bloch cites these guys:

More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity. (William A. Wulf)

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. (Donald E. Knuth)

We follow two rules in the matter of optimization:
   Rule 1. Don't do it.
   Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution.
(M.A. Jackson)

What else can I add? Actually, nothing. I just showed on a real example that each of the quotes above is true.

To rephrase Joshua Bloch: never focus on optimizing your software. If you write good, logically structured code, your software will probably perform well on its own. Use well-known, standard libraries and use the most basic features that meet your requirements - the optimization and quality will follow.

Have you had similar adventures with sub-optimal solutions? Maybe you disagree with me? I would gladly read your opinions.

About the Author: Przemysław graduated from Gdańsk University of Technology in 2004, having specialized in Distributed Information Systems. He previously worked at Lufthansa Systems and Intel Corporation, where he developed complex IT solutions in many Java-related technologies. In professional life he is a real Java expert holding a couple of Sun Java certificates (Programmer, Developer, Web Developer) and, of course, Certified Scrum Master.

Przemysław is a regular contributor and the author of the "From Java to Java EE" blog. He now works as a Software Craftsman in an international company that is the leading Global Distribution System (GDS) and the biggest processor of travel bookings in the world.


Sub-Optimal is often easier

October 9, 2008 by Kevin Rutherford
I'm gonna get flamed for this, but here goes...

One example of premature optimisation that happens on many many projects is using a relational database for persistent storage. I strongly believe most applications would be better designed using flat files instead. Sticking in some SQL or some ActiveRecord is easy and comfortable, and often done very early in development. And when challenged, the defence I hear most often is one of performance.

Buck the trend -- make SQL a last resort :)

SQL can be the simplest solution

October 9, 2008 by Artem
IMHO, nowadays there are so many developers with the basics of SQL hardwired into their brains, that at times, SQL is indeed the simplest option for them. Though it can be mentally difficult to optimize into a flat file solution later :)

I strongly disagree

October 9, 2008 by pbielicki
If you want to use flat files use HSQL DB or similar (see the performance data). I treat SQL as an interface to access the data - you can even access messages from the messaging brokers (e.g. JMS) using SQL - why not?

If you want to use a flat file you will get stuck in s**t. I have done it more than once, most recently when I was developing my solution for the Sun Certified Java Developer certification. And I was struggling with it because I had to write my very own database mechanism - do you think that's an optimal solution? I spent 80% of the development time on the database mechanism instead of developing the business logic.

Files can be optimal in specific cases - that's for sure - but it strongly depends on what you are going to deliver. It's not an optimal solution for everything, just like SQL.

Very bad example.

October 9, 2008 by brazzy
I really don't think the guy who suggested the regexp solution was trying to optimize for speed, but for flexibility - if the message you're looking for changes in any way, the contains() solution fails, while the regexp solution will probably still work.

Also, the performance test is completely meaningless because it compiles the pattern in every iteration, which is just dumb - the whole point of putting the pattern into a static variable is to compile it only once. I strongly suspect that without that factor, the regexp solution is just as fast as (if not faster than) the contains() solution.

Re: Very bad example.

October 9, 2008 by pbielicki
If you are so smart, why don't you provide an example? Did you notice that the pattern was compiled BEFORE the test? No? - well, it is compiled before the test, not in each iteration.

"if the message you're looking for changes in any way, the contains() solution fails, while the regexp solution will probably still work." - blah blah blah - probably it will work, probably it will not. I don't care - I am the owner of the code and can change it.

I prefer the "inflexible" solution (which in fact it is not) that is 80 times faster, given that this action will be heavily used. And the regexp solution is not more flexible in any way in this case.

maybe i'm blind or sth....

October 10, 2008 by grapkulec
shouldn't the variable "start" be set before each loop? you have it set before the "contains() matching" loop, but not before the regexp loop. i'm pretty sure that this way you made it impossible to get a smaller measurement for the second loop. but maybe i'm blind or sth...

You are absolutely right!

October 10, 2008 by pbielicki
You are absolutely right! Thanks for finding this typo. BTW. it doesn't change the results :)

hmm it doesn't? i think it

October 10, 2008 by grapkulec
hmm it doesn't? i think it should decrease the difference between the results for each loop and regexp wouldn't seem soooo slow :)

I'm bored with answering such

October 10, 2008 by pbielicki
I'm bored with answering such comments. Before you write something, just copy-paste the code and run it! I posted the code not to discuss it but to show how it works and what the results are. If you don't believe me (you don't have to), just run this Java program - it will not lie to you!!!


PS. Here are results from my machine:

  contains() matching: 156ms
  matches() matching: 8467ms

gee, i'm sorry to comment the

October 13, 2008 by grapkulec
gee, i'm sorry to comment the wrong way. never happen again, i promise

Loading huge amount of data in memory at startup

October 13, 2008 by Thomas Eyde
A customer I worked with some time ago had this idea that loading static data at startup would be most efficient. This was an ASP.NET application, and the data were loaded as Datasets. I don't remember the actual size in MB, but we are talking about 400 000+ rows.

Their idea was probably something like: RAM is fast, disk is slow, data is static, let's load it once and for all. RAM is cheap, anyway.

Problem was, due to missing abstractions and encapsulation, all code had direct access to these datasets, and looping over them happened all over the place. That included nested and circular loops. The net effect was that querying was extremely slow. If they wanted an in-memory database, they should have bought one.

One page alone required 16 million field lookups. No way any web page requires that amount of data.

The quick-fix? I cached the already cached data. Ironic, isn't it?

In my experience, there is no

October 14, 2008 by Anay Kamat
In my experience, there is no one perfect solution. As it is said, "There is no silver bullet". Determining which solution is best depends entirely on the situation.

For example, it won't be a good idea to use the contains method to determine if the string has a specific pattern.

In the case of data persistence, if you are using a flat file, you will need to make sure to abstract the operations on that data so that you can easily integrate it with a DBMS if required.

Excellent point Anay

October 14, 2008 by pbielicki
Excellent point Anay - I wouldn't dare use the contains() method to search for a specific pattern. And there is no one perfect solution to everything. Well, there is a set of perfect or at least optimal solutions that are commonly named "common sense" or "based on experience" :) but that's another story.

Thanks for your comment.


