Thursday, May 18, 2006

Say Goodbye to your Relational Database

db4o Has Entered the Building

I have been using db4o a LOT lately, and I don't expect I shall return. I can say without question that it has increased my productivity by at least... 1 million times. Well, probably not that much, but a heck of a lot. It definitely takes a bit of getting used to if you come from a relational database world, but consider never having to map your objects to a relational database. And how about never having to make the database at all! You just worry about your object model and db4o will take care of the persistence (long-term storage).

Check out these examples to see just how easy it is.

Starting the Database Server

// openServer() takes the filename where you want your database stored and a port
ObjectServer server = Db4o.openServer("somefilename.yap", 1234);
// Now you can give access to particular logins
server.grantAccess(USER, PASSWORD);

There are other ways of doing this. You can run db4o in non-server mode, but I don't recommend it, since you'll probably end up changing it down the road if your application is multi-threaded (and what application isn't these days?). You can also run an embedded server by passing port zero [ObjectServer server = Db4o.openServer("somefilename.yap", 0);] and getting connections from the ObjectServer directly [ObjectContainer db = server.openClient();]. That isn't too bad because it'll be faster, but it can't be networked.

When your program is ready to shut down, you'll want to call server.close() to stop the server.

Getting a Connection

ObjectContainer db = Db4o.openClient("localhost", PORT, USER, PASSWORD);
try {
    // with the client, we can do all the fun stuff in the examples below
} finally {
    // always release the connection, even if something above throws
    db.close();
}

Saving and Updating an Object

Doesn't get much easier than this.

// set() stores a new object, or updates it if it is already stored
db.set(contact);


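Querying

Querying is the equivalent of your SQL SELECT. For this example, let's say we saved a couple of objects that looked like this:
Contact contact = new Contact();
contact.setName("Travis");
db.set(contact);
contact = new Contact();
contact.setName("Someone Else"); // placeholder; any other name will do
db.set(contact);
Now to query for the contact named Travis, you do this:
Query q = db.query();
q.constrain(Contact.class);
// descend() takes the field name as a string (assumed here to be "name")
q.descend("name").constrain("Travis");
ObjectSet result = q.execute();
while (result.hasNext()) {
    Contact c = (Contact) result.next();
    System.out.println("Contact: " + c.getName());
}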

The example above is a SODA query, which, as you can see, uses strings and can be built dynamically. db4o has another way of querying called Native Queries. db4objects recommends Native Queries, and for good reason: they don't reference fields by strings, they're checked at compile time, they're object-oriented and, most importantly, they're refactorable.

List<Contact> contacts = db.query(new Predicate<Contact>() {
    public boolean match(Contact contact) {
        return contact.getName().equals("Travis");
    }
});

for (Contact contact : contacts) {
    System.out.println("Contact: " + contact.getName());
}


Deleting an Object

// the object must have been stored or retrieved through this ObjectContainer
db.delete(contact);



Now, not everything is greener on the other side. There are still many little gotchas that may bite you, for instance:
  • No ID support like Hibernate / EJB so you can't let db4o rebind objects based on an id field in your objects
    • This is extremely apparent in a web based application where you have to be sure to reload your objects in every request, so they can be in your new db4o session (assuming session per request pattern). I have figured out ways to work around this though to make it automated and I'll try to share these with you in a later post.
  • Weak management GUI
    • If you're used to working with nice tools like the tools provided by major database vendors, the db4o Object Manager doesn't really compare. But hopefully with time and with more adopters, the Object Manager will get better and we'll see third party tools.
  • No joins
    • This isn't bad if your object model is not complex and you can have all your objects connected to each other in your object graph, but you have to be careful when your data set gets large and your object model is a complex graph with bidirectional relationships because you may end up loading your entire database! Or at least a good chunk of data that you may not always need. And this is a *huge* performance killer.
    • Now the obvious answer would be to break apart your object model, which is easy to do in a relational database because you have IDs and foreign keys. In db4o, if you break apart your model, there is no built-in way to reference your disconnected objects.
    • It is possible to work around this, though, by using keys and IDs as fields in your objects and querying for the related objects when you need them.
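
The workaround in the last bullet can be sketched in plain Java. This is only an illustration under stated assumptions: the Company and Contact classes, the companyId field, and the in-memory map standing in for the database are all hypothetical. With db4o, the lookup in findCompany would be a query on the id field rather than a map access.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: Contact stores the key of its Company instead of a
// direct object reference, so loading a Contact does not drag the Company
// (and everything the Company references) along with it.
class DisconnectedModelSketch {

    static class Company {
        final long id;
        final String name;

        Company(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static class Contact {
        final String name;
        final long companyId; // key field, plays the role of a foreign key

        Contact(String name, long companyId) {
            this.name = name;
            this.companyId = companyId;
        }
    }

    // Stand-in for the database (an assumption of this sketch): companies by id.
    static final Map<Long, Company> companies = new HashMap<>();

    // Resolve the related object only when it is actually needed.
    // Returns null if no company with that id has been stored.
    static Company findCompany(Contact contact) {
        return companies.get(contact.companyId);
    }

    public static void main(String[] args) {
        companies.put(1L, new Company(1L, "Acme"));
        Contact travis = new Contact("Travis", 1L);
        System.out.println(travis.name + " works at " + findCompany(travis).name);
    }
}
```

The trade-off is that you have to assign and maintain the keys yourself, which is exactly the bookkeeping a relational database would otherwise do for you.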


In any case, Carl Rosenberger and the db4objects folks have done a great job with this open source object database and they are definitely the leaders in the space. The simplicity, performance, and productivity gains outweigh any disadvantages by a long shot. I urge you to try it.

Also, I plan on sharing more of my experiences with db4o in the near future, along with code such as how to easily use it in a web application, how to partition, and how to scale.


Installing Java 5 JDK and Tomcat on Ubuntu (using VMware)

This document will walk you through installing Apache Tomcat on Ubuntu Linux using VMware for virtualization. Most of the steps apply even if you're not using VMware.

  1. Download and install VMware on your server
  2. Download and install the VMware client on your workstation (if it is different from your server)
  3. Option 1: Prepackaged Ubuntu VM (Please note, this will NOT work on a headless server)
    1. Download Ubuntu VM from
    2. Open Ubuntu VM (user/pass is ubuntu/ubuntu) in the VMware client
  4. Option 2: New fresh install of Ubuntu (you have to do this if you have a headless server)
    1. Download Ubuntu ISO from
    2. Create a new virtual machine in VMware, choose Linux -> Ubuntu
    3. Mount the cd drive for the virtual machine to your downloaded iso in the virtual machine settings
    4. Start virtual machine, this will ask if you want to install, so perform the full install (this is just a regular Ubuntu install)
  5. Now you should be in Ubuntu Linux
  6. modify /etc/apt/sources.list (ex: run: sudo nano -w /etc/apt/sources.list)
    - Change the first section lines deb breezy main restricted
    to deb breezy main restricted universe multiverse
    You can also add universe multiverse to deb-src and do the same to the breezy-update lines too.
    (breezy will be dapper in 6.X versions)
  7. run: sudo apt-get update
  8. Install JDK
    1. Option 1:
      1. run: sudo apt-get install sun-java5-jdk
    2. Option 2: (if option 1 doesn't work)
      1. download jdk 1.5 from sun, the Self extracting linux version, .bin extension (NOT rpm)
      2. run: sudo apt-get install java-package
      3. run: fakeroot make-jpkg jdk***.bin, then install the .deb package it generates with sudo dpkg -i
      4. run: sudo update-alternatives --config java
        - select the j2sdk1.5-sun option
      5. run: java -version
        - just to make sure it's the new version
  9. add:
    export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-
    if you used Option 1 for installing JDK above or:
    export JAVA_HOME=/usr/lib/j2sdk1.5-sun
    if option 2 was used
    to /home/ubuntu/.bashrc
  10. Open a new console window to continue with the rest of the steps
  11. Download tomcat 5.5 from
  12. extract tomcat (to a directory under your HOME directory is a good idea - /home/ubuntu/java/apache-tomcatXXX)
  13. go to tomcat directory/bin
  14. run: ./
  15. surf to http://yournewserversip:8080/
    1. You should see the Tomcat welcome page
If you want to run Tomcat as a service:

Other tips:
- If you want to use 7-zip which is nice for all compression/decompression tasks, run: "sudo apt-get install p7zip", then to extract anything, you just run "7za x myfile" (.zip or .tar or .tar.gz or .7z, etc)
- If you want an ssh server (sshd) on your new ubuntu box, run: "sudo apt-get install openssh-server". Apparently Ubuntu does not come with an ssh server installed out of the box.
- if you use nano, while you're editing .bashrc above, it's a good idea to add an alias for nano with the -w option like: alias nano='nano -w'
- VMware kicks a**, so be sure to try it (note: I have no affiliation with VMware, I just like good products)

If you have any issues with the steps above or want to add anything, please post your comment below.

Good night and good luck...

Friday, May 05, 2006

Normalize Schnormalize (aka Real-Time Data Warehousing)

Some applications need real-time reporting, and ecommStats Web Analytics is one such app. ecommStats customers want to see what is happening on their website and they want to see it now.

A little background might be in order. In enterprise software, you generally move and transform data from your Transactional Database (the one involved in the day-to-day work, storing every little piece of data) to your Data Warehouse so that you can generate reports faster and more easily, which basically boils down to Business Intelligence. This is generally a heavy process, so it runs once per day, once per week, or whatever fits the business needs, and the gap between runs must be longer than the process itself takes (if the process takes 24 hours to run, it's probably not a good idea to run it every day).

But why wait until the night time (the night time is the right time) or the weekend to move all that data into a separate database when you can do simple preemptive operations on your database to give your users real-time data?

Normalization is great and all, but a hybrid approach can work wonders. The data is still normalized, but we have extra tables and extra columns that act as our data warehouse.

Let's take the Search Phrase report, for instance. It is a time-based report that shows how many people have searched on a particular search phrase or keyword. In a normalized system, you would have your search phrase in one table and the request that came in in another table (this is seriously simplified to make a point):



Now you could sum up all the requests that used the phrase every time someone wants the Search Phrase report, but do you want to make your users wait for this report? There could be thousands of words and thousands of requests for each word, which quickly puts you into the millions of rows! The user might as well go have a coffee while they're waiting. And do you want your poor server to have to work so hard for this little report?

But by adding a single column, we can make the report almost instantaneous:


Now when we're showing that report, we don't even have to look at the SearchRequest table, so what used to go through potentially millions of rows now touches just a few thousand rows (or however many search phrases you are looking at). Obviously this example won't give you the time-based reporting, but an extra time bucket table, built with similar techniques, can get the results you want.

And let's say you also wanted to show the last time someone searched for a particular phrase. You could look in the SearchRequest table, sort by date descending and then grab the first row, but this is time, disk and processor intensive if the request table is large. So try simply adding a date column to the SearchPhrase table like above and update it when the request is made. Don't be afraid to duplicate data when the performance benefits can be substantial.

Now how much harder is this to implement in your code? It's actually very simple in most cases, usually just requiring an extra update. In the example above, when saveSearchRequest(SearchPhrase phrase) is called in the code, it will save the SearchRequest, and do a simple update on the requestCount column:
UPDATE SearchPhrase SET requestCount = requestCount + 1, lastRequestDate = sysdate where id = PHRASE_ID;

This one time update is insignificant when compared to the many times that queries will be run against your tables.
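
To make the write path and the two read paths concrete, here is a minimal in-memory sketch in Java. It is not the ecommStats code: the lists and maps stand in for the SearchRequest and SearchPhrase tables, and the class name, the phrase and requestDate fields, and the addPhrase, countByScanning and countFromColumn helpers are all assumptions made for illustration. Only id, requestCount, lastRequestDate and saveSearchRequest come from the text above.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SearchPhraseSketch {

    // Stand-in for a row of the SearchPhrase table, including the two extra columns.
    static class SearchPhrase {
        final long id;
        final String phrase;
        long requestCount;     // denormalized counter
        Date lastRequestDate;  // denormalized "last searched" timestamp

        SearchPhrase(long id, String phrase) {
            this.id = id;
            this.phrase = phrase;
        }
    }

    // Stand-in for a row of the SearchRequest table.
    static class SearchRequest {
        final long phraseId;
        final Date requestDate;

        SearchRequest(long phraseId, Date requestDate) {
            this.phraseId = phraseId;
            this.requestDate = requestDate;
        }
    }

    final Map<Long, SearchPhrase> phrases = new HashMap<>();
    final List<SearchRequest> requests = new ArrayList<>();

    SearchPhrase addPhrase(long id, String phrase) {
        SearchPhrase p = new SearchPhrase(id, phrase);
        phrases.put(id, p);
        return p;
    }

    // Write path: save the request, then do the one extra update.
    void saveSearchRequest(SearchPhrase phrase) {
        Date now = new Date();
        requests.add(new SearchRequest(phrase.id, now));
        phrase.requestCount++;        // requestCount = requestCount + 1
        phrase.lastRequestDate = now; // lastRequestDate = sysdate
    }

    // Purely normalized read: walk every request row.
    long countByScanning(long phraseId) {
        long count = 0;
        for (SearchRequest r : requests) {
            if (r.phraseId == phraseId) {
                count++;
            }
        }
        return count;
    }

    // Hybrid read: one lookup, independent of how many requests exist.
    long countFromColumn(long phraseId) {
        return phrases.get(phraseId).requestCount;
    }

    public static void main(String[] args) {
        SearchPhraseSketch store = new SearchPhraseSketch();
        SearchPhrase phrase = store.addPhrase(1L, "db4o");
        store.saveSearchRequest(phrase);
        store.saveSearchRequest(phrase);
        System.out.println(store.countByScanning(1L) + " == " + store.countFromColumn(1L));
    }
}
```

Both reads return the same number; the difference is that the scan grows with the number of requests while the column read does not. In a real database the insert and the update should happen in the same transaction so the counter cannot drift from the request rows.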

In conclusion, using these hybrid techniques for database design can reduce wait time significantly and you can report in near real-time, rather than making your users wait days for their data. The new columns and/or tables may take up more space, but space is cheap and performance is not. It truly is a small price to pay when compared to the huge price of doing this repeatedly in a purely normalized database or running a huge, non-real-time bulk process to move the data into a data warehouse.
