I was just reading an excellent book by Josh Bloch, "Effective Java, Second Edition", and had reached the subject of optimization when it happened. It was a funny coincidence, but I took it as a sign that I should write this post.
It has nothing to do with agility, but it does concern software quality, so it definitely belongs here. It all started very innocently: with a blog post describing the solution to an annoying problem.
In this post I will show how easily you can run into really dangerous and ugly development problems by optimizing your software too early. I hope you will like the story.
I've been reading a must-read book for all Java developers, "Effective Java (2nd Edition)" by Josh Bloch, and I had just reached the "Optimize judiciously" item. At the same time I was doing some Java EE development and ran into a problem with Struts2 file upload. I found a solution and posted it to my private blog: http://java2jee.blogspot.com/2008/09/solution-to-struts2-upload-file-err.... "This has nothing to do with optimization", you may think. I thought the same, but that assumption is wrong.
After a few days I received a comment on this post from an anonymous user with an "optimized solution". The author of the comment wanted to optimize this line of Java code:
I have always considered myself a seasoned Java developer (hopefully that is still true :) but after receiving this comment I was quite worried. "Why am I not using regular expressions to check strings? Aren't they much faster?", I thought. I even wondered: "Maybe it's time to become a manager? My Java/technical knowledge is deteriorating..."
"But hey! I will not let it go like this" - I thought. I wrote a simple Java program to test the performance of both solutions:
When I looked at the implementation of the contains() method, I saw that it operates directly on the char array that backs the String object. It is fast! It is also the simplest and most obvious method to call in this situation, and it makes the code more readable than the regexp matcher does. I see only advantages.
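The test program itself is not reproduced here. A minimal sketch of such a comparison might look like the following; the class name, the message text and the pattern are all hypothetical stand-ins, not the code from the original post. The pattern is compiled once, and each measurement takes its own start time.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class ContainsVsRegex {
    // Hypothetical message fragment; the real string from the Struts2 post is not shown here.
    static final String NEEDLE = "file is too large";

    // Compiled once, outside any timed loop. Pattern.quote treats the needle as a literal.
    static final Pattern PATTERN = Pattern.compile(Pattern.quote(NEEDLE));

    static boolean viaContains(String message) {
        return message.contains(NEEDLE);
    }

    static boolean viaRegex(String message) {
        return PATTERN.matcher(message).find();
    }

    // Runs the check repeatedly and returns the elapsed nanoseconds.
    // The start time is read inside this method, so every measurement has its own start.
    static long time(Predicate<String> check, String message, int iterations) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (check.test(message)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits < 0) { // never true; uses `hits` so the loop result is not dead code
            throw new IllegalStateException();
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String message = "Upload rejected: file is too large (hypothetical text)";
        int iterations = 1_000_000;
        // Warm-up pass so the JIT has compiled both paths before timing.
        time(ContainsVsRegex::viaContains, message, iterations);
        time(ContainsVsRegex::viaRegex, message, iterations);
        System.out.println("contains(): " + time(ContainsVsRegex::viaContains, message, iterations) + " ns");
        System.out.println("regexp:     " + time(ContainsVsRegex::viaRegex, message, iterations) + " ns");
    }
}
```

Numbers from a hand-written loop like this are only a rough indication; a dedicated benchmark harness gives more trustworthy results.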
The conclusion is simple: DON'T OPTIMIZE YOUR CODE. USE THE SIMPLEST POSSIBLE SOLUTIONS - THEY WORK!
Joshua Bloch cites these guys:
More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity. (William A. Wulf)
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. (Donald E. Knuth)
We follow two rules in the matter of optimization:
Rule 1. Don't do it.
Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution.
(M.A. Jackson)
What else can I add? Actually, nothing. I have just shown with a real example that each of the quotes above is true.
To paraphrase Joshua Bloch: never focus on optimizing your software. If you write good, logically structured code, your software will probably perform well on its own. Use well-known, standard libraries and the most basic features that meet your requirements - performance and quality will follow.
Have you had similar adventures with sub-optimal solutions? Maybe you disagree with me? I would gladly read your opinions.
Comments
Sub-Optimal is often easier
October 9, 2008 by Kevin Rutherford (not verified), 12 weeks 4 days ago
Comment id: 1897
I'm gonna get flamed for this, but here goes...
One example of premature optimisation that happens on many many projects is using a relational database for persistent storage. I strongly believe most applications would be better designed using flat files instead. Sticking in some SQL or some ActiveRecord is easy and comfortable, and often done very early in development. And when challenged, the defence I hear most often is one of performance.
Buck the trend -- make SQL a last resort :)
SQL can be the simplest solution
October 9, 2008 by Artem, 12 weeks 4 days ago
Comment id: 1898
IMHO, nowadays there are so many developers with the basics of SQL hardwired into their brains, that at times, SQL is indeed the simplest option for them. Though it can be mentally difficult to optimize into a flat file solution later :)
I strongly disagree
October 9, 2008 by pbielicki, 12 weeks 4 days ago
Comment id: 1899
If you want to use flat files use HSQL DB or similar (see the performance data). I treat SQL as an interface to access the data - you can even access messages from the messaging brokers (e.g. JMS) using SQL - why not?
If you want to use a flat file you will get stuck in s**t. I have done it more than once; the last time was when I was developing my solution for the Sun Certified Java Developer certification. I was struggling with it because I had to write my very own database mechanism - do you think that's an optimal solution? I spent 80% of the development time on the database mechanism instead of on the business logic.
Files can be optimal in specific cases - that's for sure - but it strongly depends on what you are going to deliver. It's not an optimal solution for everything, just like SQL.
Very bad example.
October 9, 2008 by brazzy (not verified), 12 weeks 4 days ago
Comment id: 1900
I really don't think the guy who suggested the regexp solution was trying to optimize for speed, but for flexibility - if the message you're looking for changes in any way, the contains() solution fails, while the regexp solution will probably still work.
Also, the performance test is completely meaningless because it compiles the pattern in every iteration, which is just dumb - the whole point of putting the pattern into a static variable is to compile it only once. I strongly suspect that without that factor, the regexp solution is exactly as fast (if not faster) than the contains() solution.
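The two points above (flexibility, and compiling once versus on every call) can be sketched as follows. The class name, the pattern and the sample strings are hypothetical; the sketch does not show which variant the original test used.

```java
import java.util.regex.Pattern;

final class PatternReuse {
    // Hypothetical pattern: tolerates any amount of whitespace between the two words.
    static final String REGEX = "too\\s+large";

    // Compiled a single time when the class is loaded.
    static final Pattern COMPILED = Pattern.compile(REGEX);

    // Recompiles the pattern on every call; this is the cost the comment warns about.
    static boolean compileEveryCall(String s) {
        return Pattern.compile(REGEX).matcher(s).find();
    }

    // Reuses the precompiled pattern; only the matching itself is repeated.
    static boolean compileOnce(String s) {
        return COMPILED.matcher(s).find();
    }

    public static void main(String[] args) {
        String changed = "file is too   large"; // wording drifted: extra spaces
        System.out.println(changed.contains("too large")); // literal check misses it
        System.out.println(compileOnce(changed));          // regexp still finds it
    }
}
```

Both methods return the same answer for any input; they differ only in how often the pattern is compiled.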
Re: Very bad example.
October 9, 2008 by pbielicki, 12 weeks 4 days ago
Comment id: 1901
If you are so smart, why don't you provide an example? Did you notice that the pattern was compiled BEFORE the test? No? Well, it is compiled before the test, not in each iteration.
"if the message you're looking for changes in any way, the contains() solution fails, while the regexp solution will probably still work." - blah blah blah - probably it will work, probably it will not. I don't care - I am the owner of the code and can change it.
I prefer an "inflexible" solution (which it isn't, in fact) that is 80 times faster, given that this action will be heavily used. And the regexp solution is not more flexible in any way in this case.
maybe i'm blind or sth....
October 10, 2008 by grapkulec (not verified), 12 weeks 3 days ago
Comment id: 1904
shouldn't variable "start" be set before each loop? you have it set before the "contains() matching" loop, but not before the regexp loop. i'm pretty sure that this way you made it impossible to get a smaller measure for the second loop. but maybe i'm blind or sth...
You are absolutely right!
October 10, 2008 by pbielicki, 12 weeks 3 days ago
Comment id: 1905
You are absolutely right! Thanks for finding this typo. BTW. it doesn't change the results :)
hmm it doesn't? i think it
October 10, 2008 by grapkulec (not verified), 12 weeks 3 days ago
Comment id: 1906
hmm it doesn't? i think it should decrease the difference between the results for each loop and regexp wouldn't seem soooo slow :)
I'm bored with answering such
October 10, 2008 by pbielicki, 12 weeks 3 days ago
Comment id: 1908
I'm bored with answering such comments. Before you write something, just copy-paste the code and run it! I posted the code not to discuss it but to show how it works and what the results are. If you don't believe me (you don't have to), just run this Java program - it will not cheat you!!!
Cheers!
PS. Here are results from my machine:
gee, i'm sorry to comment the
October 13, 2008 by grapkulec (not verified), 12 weeks 22 hours ago
Comment id: 1909
gee, i'm sorry to comment the wrong way. never happen again, i promise
Loading huge amount of data in memory at startup
October 13, 2008 by Thomas Eyde (not verified), 12 weeks 19 hours ago
Comment id: 1910
A customer I worked with some time ago had this idea that loading static data at startup would be most efficient. This is an ASP.NET application, and the data are loaded as Datasets. I don't remember the actual size in MB, but we are talking about 400 000+ rows.
Their idea was probably something like: RAM is fast, disk is slow, data is static, let's load it once and for all. RAM is cheap, anyway.
Problem was, due to missing abstractions and encapsulation, all code had direct access to these datasets, and looping over them happened all over the place. That included nested and circular loops. The net effect was that querying was extremely slow. If they wanted an in-memory database, they should have bought one.
One page alone required 16 million field lookups. No way any web page requires that amount of data.
The quick-fix? I cached the already cached data. Ironic, isn't it?
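The comment does not say how the second cache was built. One common shape for such a fix is a keyed index built once over the rows already in memory, so that lookups stop scanning the whole set. The sketch below is in Java rather than .NET, and the `Row` type and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RowIndex {
    // Hypothetical row shape; the real dataset columns are not described.
    static final class Row {
        final int customerId;
        final String value;

        Row(int customerId, String value) {
            this.customerId = customerId;
            this.value = value;
        }
    }

    // Scan-style lookup: walks every row on each call, O(n) per query.
    static List<Row> scan(List<Row> rows, int customerId) {
        List<Row> result = new ArrayList<>();
        for (Row row : rows) {
            if (row.customerId == customerId) {
                result.add(row);
            }
        }
        return result;
    }

    // Built once: groups the already-loaded rows by key, so each later
    // lookup is a single hash probe instead of a full scan.
    static Map<Integer, List<Row>> index(List<Row> rows) {
        Map<Integer, List<Row>> byCustomer = new HashMap<>();
        for (Row row : rows) {
            byCustomer.computeIfAbsent(row.customerId, key -> new ArrayList<>()).add(row);
        }
        return byCustomer;
    }
}
```

With 400 000+ rows, replacing a scan inside a nested loop with a lookup in such a map removes most of the repeated work, at the price of holding a second structure in memory.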
In my experience, there is no
October 14, 2008 by Anay Kamat (not verified), 11 weeks 6 days ago
Comment id: 1911
In my experience, there is no one perfect solution. As it is said, "There is no silver bullet". Determining which is the best solution depends entirely on the situation.
For example, it won't be a good idea to use the contains method to determine if the string has a specific pattern.
In case of data persistence, if you are using a flat file, you will need to make sure to abstract the operations on that data so that you can easily integrate it with a DBMS if required.
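A minimal sketch of that kind of abstraction, assuming a hypothetical `NoteStore` interface with a flat-file implementation behind it. All names here are illustrative; callers depend only on the interface, so a DBMS-backed implementation could be swapped in later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

// Storage-neutral contract: nothing here mentions files or SQL.
interface NoteStore {
    void add(String note);

    List<String> all();
}

// Flat-file implementation: one note per line (assumes notes contain no line breaks).
final class FlatFileNoteStore implements NoteStore {
    private final Path file;

    FlatFileNoteStore(Path file) {
        this.file = file;
    }

    @Override
    public void add(String note) {
        try {
            // Appends a single line, creating the file on first use.
            Files.write(file, Collections.singletonList(note), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            // Wrapped so the interface stays free of file-specific checked exceptions.
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public List<String> all() {
        if (!Files.exists(file)) {
            return Collections.emptyList();
        }
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Business logic written against `NoteStore` does not change if the flat file is later replaced by a database-backed class implementing the same two methods.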
Excellent point Anay
October 14, 2008 by pbielicki, 11 weeks 6 days ago
Comment id: 1912
Excellent point Anay - I wouldn't dare use the contains() method to search for a specific pattern. And there is no one perfect solution to everything. Well, there is a set of perfect or at least optimal solutions that are commonly named "common sense" or "based on experience" :) but that's another story.
Thanks for your comment.
YAGNI!
December 19, 2008 by Anonymous (not verified), 2 weeks 3 days ago
Comment id: 2137
YAGNI!