Monday, September 18, 2006

Poor Man's Oracle Client

I generally despise having to install Oracle software because it's so big and bloated it's scary. Most downloads won't fit on a typical CD; even their Oracle Express edition is bigger than Microsoft Windows! I don't know what Express means to them, but "huge" isn't the description I would think of.

But there is light at the end of the tunnel: Oracle now has Instant Client, which is only 35MB! Wow! And all you have to do is unzip it. It includes OCI, OCCI, and JDBC drivers, but if you want to make it useful out of the box, you should get the SQL*Plus package for Instant Client (722KB!) and unzip it into the same directory. Put your tnsnames.ora into the directory as well if required. Then run sqlplus.exe as you normally would, and voila, you're done.

  1. Download Oracle Instant Client
  2. Download SQL*Plus for Instant Client (from same download page)
  3. Unzip Instant Client into a directory
    1. eg: c:\oracle_ic
  4. Unzip SQL*Plus into the same directory
  5. Put tnsnames.ora into the same directory (OPTIONAL)
  6. Run: sqlplus username/password@SID
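Since Instant Client also bundles the JDBC driver, the same connect details work from Java. Here's a minimal sketch of the thin-driver URL format (dbhost, 1521, and ORCL are placeholder values; with the ojdbc jar from Instant Client on the classpath you'd hand this URL to DriverManager.getConnection):

```java
public class ThinUrlDemo {
    // Builds an Oracle thin-driver JDBC URL of the form jdbc:oracle:thin:@host:port:SID
    public static String thinUrl(String host, int port, String sid) {
        return "jdbc:oracle:thin:@" + host + ":" + port + ":" + sid;
    }

    public static void main(String[] args) {
        // With the Instant Client's JDBC jar on the classpath, this URL would be
        // passed to DriverManager.getConnection(url, user, password).
        System.out.println(thinUrl("dbhost", 1521, "ORCL")); // prints jdbc:oracle:thin:@dbhost:1521:ORCL
    }
}
```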

Get SQL Explorer for a free JDBC based client if you want to connect with a visual client.

Tuesday, July 04, 2006

Db4oUtil (a.k.a. The Easiest Way to Get Started With Db4o)

Want the quickest and easiest way to get started with db4o? Use the following Db4oUtil class. It's very similar to the HibernateUtil class you may be familiar with if you're a Hibernate user. It uses a thread-local object, so it's thread safe, makes multi-tier usage a no-brainer, and is just plain simple to use.

The simple step-by-step for this tutorial:

  1. Copy the Db4oUtil.java code below into a new class
  2. Use the Db4oUtil methods in your code, samples below

That is it... seriously. You can optionally tweak the fields in Db4oUtil to your liking; for example, you'll want to change the YAPFILENAME field for each application you use this with.

Db4oUtil.java

import java.io.File;

import com.db4o.Db4o;
import com.db4o.ObjectContainer;
import com.db4o.ObjectServer;

public class Db4oUtil {
    // EDIT THESE SETTINGS
    private static final String YAPFILENAME = "test.db4o.yap";
    private static final int PORT = 0;
    /*
     * If you want the server to be networked, change the port number above and
     * uncomment the USER and PASSWORD lines below.
     * Then, in getObjectServer, uncomment the objectServer.grantAccess line.
     */
    //private static final String USER = "username";
    //private static final String PASSWORD = "password";

    private static ObjectServer objectServer;

    private static final ThreadLocal dbThreadLocal = new ThreadLocal();

    public static ObjectContainer getObjectContainer() {
        ObjectContainer oc = (ObjectContainer) dbThreadLocal.get();
        if (oc == null || oc.ext().isClosed()) {
            oc = getObjectServer().openClient();
            dbThreadLocal.set(oc);
        }
        return oc;
    }

    public static void closeObjectContainer() {
        ObjectContainer oc = (ObjectContainer) dbThreadLocal.get();
        dbThreadLocal.set(null);
        if (oc != null) oc.close();
    }

    public synchronized static ObjectServer getObjectServer() {
        if (objectServer == null) {
            objectServer = getObjectServerForFilename(YAPFILENAME, PORT);
            // and give access
            //objectServer.grantAccess(USER, PASSWORD);
        }
        return objectServer;
    }

    public static void shutdown() {
        if (objectServer != null) {
            objectServer.close();
        }
    }

    public static ObjectServer getObjectServerForFilename(String yapfilename, int port) {
        File parentDir = getDbDirectory();
        File dbfile = new File(parentDir, yapfilename);

        // for replication
        Db4o.configure().generateUUIDs(Integer.MAX_VALUE);
        Db4o.configure().generateVersionNumbers(Integer.MAX_VALUE);

        // other options
        Db4o.configure().exceptionsOnNotStorable(true);
        Db4o.configure().objectClass("java.math.BigDecimal").translate(new com.db4o.config.TSerializable());

        // now open the server
        ObjectServer objectServer = Db4o.openServer(dbfile.getPath(), port);

        return objectServer;
    }

    private static File getDbDirectory() {
        // data files are stored in the {user.home}/db4o/data directory
        String dbfile = System.getProperty("user.home") + "/db4o/data";
        File f = new File(dbfile);
        if (!f.exists()) {
            f.mkdirs();
        }
        return f;
    }
}
You can edit the fields at the top to modify your database filename, etc.

Using Db4oUtil

This is a simple code sample of usage:
// Get the ObjectContainer - you can call this repeatedly from different methods without passing the ObjectContainer around
ObjectContainer oc = Db4oUtil.getObjectContainer();

// YOUR CODE HERE - use the ObjectContainer and do all your stuff here

// Close the ObjectContainer when done
Db4oUtil.closeObjectContainer();

// Close the object server when completely exiting the application
Db4oUtil.shutdown();
That's all there is to it!
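Under the hood, the thread safety comes entirely from the ThreadLocal. Stripped of db4o, the same pattern can be sketched in plain Java with a stand-in resource (FakeSession below is invented for illustration; a real ObjectContainer plays its role):

```java
public class ThreadLocalDemo {
    // Stand-in for an ObjectContainer: something openable and closable per thread.
    public static class FakeSession {
        boolean closed = false;
        void close() { closed = true; }
    }

    private static final ThreadLocal<FakeSession> local = new ThreadLocal<FakeSession>();

    // Same shape as Db4oUtil.getObjectContainer(): reuse the thread's session or open a new one.
    public static FakeSession getSession() {
        FakeSession s = local.get();
        if (s == null || s.closed) {
            s = new FakeSession();
            local.set(s);
        }
        return s;
    }

    // Same shape as Db4oUtil.closeObjectContainer().
    public static void closeSession() {
        FakeSession s = local.get();
        local.set(null);
        if (s != null) s.close();
    }

    public static void main(String[] args) {
        FakeSession a = getSession();
        FakeSession b = getSession();          // same thread, so same session
        System.out.println(a == b);            // prints true
        closeSession();
        System.out.println(getSession() == a); // prints false: fresh session after close
    }
}
```

Each thread gets its own session, so separate tiers or request handlers can all call getSession() without passing anything around.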

Thursday, May 18, 2006

Say Goodbye to your Relational Database

db4o Has Entered the Building

I have been using db4o a LOT lately, and I don't expect I shall return. I can say without question that it has increased my productivity by at least... 1 million times. Well, probably not that much, but a heck of a lot. It definitely takes a bit of getting used to if you come from the relational database world, but consider never having to map your objects to a relational database. And how about never having to create the database at all! You just worry about your object model, and db4o takes care of the persistence (long-term storage).

Check out these examples to see just how easy it is.

Starting the Database Server

// openServer() takes the filename where you want your database stored and a port
ObjectServer server = Db4o.openServer("somefilename.yap", 1234);
// Now you can give access to particular logins
server.grantAccess("username","password");
There are other ways of doing this. You can run db4o in non-server mode, but I don't recommend that, since you'll probably just end up changing it down the road if your application is multi-threaded (and what application isn't these days?). You can also use an embedded server by passing port zero [ObjectServer server = Db4o.openServer("somefilename.yap", 0);] and getting connections from the ObjectServer directly [ObjectContainer db = server.openClient();], which isn't a bad option because it's faster, but it can't be networked.

When your program is ready to shut down, you'll want to call server.close().

Getting a Connection

ObjectContainer db = Db4o.openClient("localhost", PORT, USER, PASSWORD);
try {
    // with the client, we can do all the fun stuff in the examples below
    // ALL EXAMPLES ARE ASSUMED TO BE PLUGGED IN HERE
} finally {
    db.close();
}

Saving and Updating an Object

Doesn't get much easier than this.
db.set(myObject);

Query

This is the equivalent of your SQL SELECT. For this example, let's say we saved a couple of objects that look like this:
Contact contact = new Contact();
contact.setName("Travis");
db.set(contact);
contact = new Contact();
contact.setName("Jimbo");
db.set(contact);
Now to query for the contact named Travis, you do this:
Query q = db.query();
q.constrain(Contact.class);
q.descend("name").constrain("Travis");
ObjectSet result = q.execute();
while (result.hasNext()) {
    Contact c = (Contact) result.next();
    System.out.println("Contact: " + c.getName());
}

The example above is a SODA query which, as you can see, uses strings and can be built dynamically. db4o also has another method of querying called Native Queries. db4objects recommends Native Queries, and for good reason: native queries don't reference fields by strings, they're compile-time checked, they're object oriented and, most importantly, they're refactorable.


List<Contact> contacts = db.query(new Predicate<Contact>() {
    public boolean match(Contact contact) {
        return contact.getName().equals("Travis");
    }
});

for (Contact contact : contacts) {
    System.out.println("Contact: " + contact.getName());
}
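Conceptually, a native query just runs a typed match() against each stored candidate (db4o can also optimize many native queries into SODA behind the scenes). As a plain-Java illustration that needs no db4o on the classpath (the Contact class here is a minimal stand-in):

```java
import java.util.ArrayList;
import java.util.List;

public class NativeQueryIdea {
    // Minimal stand-in for a persisted Contact.
    public static class Contact {
        private final String name;
        public Contact(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // What a native query does conceptually: evaluate a typed condition per candidate.
    public static List<Contact> query(List<Contact> stored, String wanted) {
        List<Contact> hits = new ArrayList<Contact>();
        for (Contact c : stored) {
            if (c.getName().equals(wanted)) hits.add(c);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Contact> stored = new ArrayList<Contact>();
        stored.add(new Contact("Travis"));
        stored.add(new Contact("Jimbo"));
        System.out.println(query(stored, "Travis").size()); // prints 1
    }
}
```

Because the condition is ordinary typed Java code, renaming getName() in your IDE refactors the query too, which is exactly what string-based SODA queries can't give you.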

Deleting an Object

db.delete(myObject);

Gotchas

Now, not everything is greener on the other side; there are still many little gotchas that may bite you. For instance:
  • No ID support like Hibernate / EJB, so you can't let db4o rebind objects based on an id field in your objects
    • This is extremely apparent in a web-based application, where you have to be sure to reload your objects in every request so they are in your new db4o session (assuming the session-per-request pattern). I have figured out ways to automate this, though, and I'll try to share them in a later post.
  • Weak management GUI
    • If you're used to working with nice tools like the tools provided by major database vendors, the db4o Object Manager doesn't really compare. But hopefully with time and with more adopters, the Object Manager will get better and we'll see third party tools.
  • No joins
    • This isn't bad if your object model is not complex and you can have all your objects connected to each other in your object graph, but you have to be careful when your data set gets large and your object model is a complex graph with bidirectional relationships, because you may end up loading your entire database! Or at least a good chunk of data that you may not always need. And this is a *huge* performance killer.
    • Now the obvious answer would be to break apart your object model, which is easy to do in a relational database because you have IDs and foreign keys. In db4o, if you break apart your model, there is no built-in way to reference your disconnected objects.
    • It is possible to work around this, though, by using keys and IDs as fields in your objects and querying for the related objects when you need them.
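That workaround can be sketched in plain Java (all class and field names below are invented for illustration): store a plain id field instead of a direct object reference, and query for the target only when you need it.

```java
import java.util.HashMap;
import java.util.Map;

public class ManualIdDemo {
    public static class Company {
        public long id;
        public String name;
        public Company(long id, String name) { this.id = id; this.name = name; }
    }

    public static class Contact {
        public String name;
        public long companyId; // reference by key instead of holding the Company object
        public Contact(String name, long companyId) { this.name = name; this.companyId = companyId; }
    }

    // Stand-in for "query the database for the related object when you need it".
    public static Company findCompany(Map<Long, Company> db, long id) {
        return db.get(id);
    }

    public static void main(String[] args) {
        Map<Long, Company> companies = new HashMap<Long, Company>();
        companies.put(1L, new Company(1L, "Acme"));
        Contact c = new Contact("Travis", 1L);
        // Resolve the relationship lazily, only when the Company is actually needed,
        // so loading a Contact never drags the whole object graph along with it.
        System.out.println(findCompany(companies, c.companyId).name); // prints Acme
    }
}
```

The cost is that you manage id uniqueness yourself; the benefit is that activating one object no longer pulls in everything it's connected to.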

Conclusion

In any case, Carl Rosenberger and the db4objects folks have done a great job with this open source object database, and they are definitely the leaders in the space. The simplicity, performance, and productivity gains outweigh any disadvantages by a long shot. I urge you to try it.

Also, I plan on sharing more of my experiences with db4o in the near future, along with code such as how to easily use it in a web application, how to partition, and how to scale.





Installing Java 5 JDK and Tomcat on Ubuntu (using VMWare)

This document will walk you through installing Apache Tomcat on Ubuntu Linux using VMware for virtualization. Most of the steps apply even if you're not using VMware.

  1. Download and install VMWare on your server
  2. Download and install VMWare client on your workstation (if it is different than your server)
  3. Option 1: Prepackaged Ubuntu VM (Please note, this will NOT work on a headless server)
    1. Download Ubuntu VM from www.vmware.com
    2. Open Ubuntu VM (user/pass is ubuntu/ubuntu) in VMWare client
  4. Option 2: New fresh install of Ubuntu (you have to do this if you have a headless server)
    1. Download Ubuntu ISO from www.ubuntu.com
    2. Create a new virtual machine in VMware; choose Linux -> Ubuntu
    3. Mount the virtual machine's CD drive to your downloaded ISO in the virtual machine settings
    4. Start the virtual machine; it will ask if you want to install, so perform the full install (this is just a regular Ubuntu install)
  5. Now you should be in Ubuntu Linux
  6. Modify /etc/apt/sources.list (e.g. run: sudo nano -w /etc/apt/sources.list)
    - Change the first section's lines from:
      deb http://archive.ubuntu.com/ubuntu breezy main restricted
      to:
      deb http://archive.ubuntu.com/ubuntu breezy main restricted universe multiverse
    - You can also add universe multiverse to the deb-src lines, and do the same to the breezy-updates lines.
    - (breezy will be dapper in 6.x versions)
  7. run: sudo apt-get update
  8. Install JDK
    1. Option 1:
      1. run: sudo apt-get install sun-java5-jdk
    2. Option 2: (if option 1 doesn't work)
      1. download jdk 1.5 from sun, the Self extracting linux version, .bin extension (NOT rpm)
      2. run: sudo apt-get install java-package
      3. run: fakeroot make-jpkg jdk***.bin (this builds a .deb package; install the generated .deb with sudo dpkg -i)
      4. run: sudo update-alternatives --config java
        - select the j2sdk1.5-sun option
      5. run: java -version
        - just to make sure it's the new version
  9. Add the appropriate JAVA_HOME export to /home/ubuntu/.bashrc:
    - If you used Option 1 to install the JDK:
      export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.06
    - If you used Option 2:
      export JAVA_HOME=/usr/lib/j2sdk1.5-sun
  10. Open a new console window to continue with the rest of the steps
  11. Download Tomcat 5.5 from tomcat.apache.org
  12. Extract Tomcat (a directory under your home directory is a good idea, e.g. /home/ubuntu/java/apache-tomcatXXX)
  13. Go to the Tomcat directory's bin subdirectory
  14. run: ./startup.sh
  15. surf to http://yournewserversip:8080/
    1. You should see the Tomcat welcome page
If you want to run Tomcat as a service: http://tomcat.apache.org/tomcat-5.5-doc/setup.html

Other tips:
- If you want to use 7-zip, which is nice for all compression/decompression tasks, run "sudo apt-get install p7zip"; then to extract anything, you just run "7za x myfile" (.zip, .tar, .tar.gz, .7z, etc.)
- If you want an ssh server (sshd) on your new Ubuntu box, run "sudo apt-get install openssh-server". Apparently Ubuntu does not come with an ssh server installed out of the box.
- If you use nano: while you're editing .bashrc above, it's a good idea to add an alias for nano with the -w option, like: alias nano='nano -w'
- VMware kicks a**, so be sure to try it (note: I have no affiliation with VMware, I just like good products)

If you have any issues with the steps above or want to add anything, please post your comment below.

Good night and good luck...

Friday, May 05, 2006

Normalize Schnormalize (aka Real-Time Data Warehousing)

Some applications need real-time reporting, ecommStats Web Analytics is one such app. ecommStats customers want to see what is happening on their website and they want to see it now.

A little background might be in order. In general, with enterprise software, you move and transform data from your transactional database (the one involved in the day-to-day work, storing every little piece of data) to your data warehouse so that you can generate reports faster and more easily; this basically boils down to Business Intelligence. This is generally a heavy process, so it runs once per day, once per week, or whatever fits the business needs, and the gap must be longer than the process itself takes to run (if the process takes 24 hours, it's probably not a good idea to run it every day).

But why wait until the night time (the night time is the right time) or the weekend to move all that data into a separate database when you can do simple preemptive operations on your database to give your users real-time data?

Normalization is great and all, but a hybrid approach can work wonders. The data is still normalized, but we have extra tables and extra columns that act as our data warehouse.

Let's take the Search Phrase report, for instance. It is a time-based report which shows you how many people have searched on a particular search phrase or keyword. In a normalized system, you would have your search phrase in one table and the incoming request in another table (this is seriously simplified to make a point):


SearchPhrase
  id
  phrase

SearchRequest
  id
  phraseId
  date
  user


Now, you could sum up all the requests that used the phrase every time someone wants the Search Phrase report, but do you want to make your users wait for this report? There could be thousands of words and thousands of requests for each word, which quickly puts you into the millions of rows! The user might as well go have a coffee while they're waiting. And do you want your poor server to have to work so hard for this little report?

But by adding a single column, we can make the report almost instantaneous:

SearchPhrase
  id
  phrase
  requestCount
  lastRequestDate



Now, when we're showing that report, we don't even have to look at the SearchRequest table, so what used to require going through potentially millions of rows is now just a few thousand rows (or however many search phrases you are looking at). Obviously this example won't give you the time-based reporting, but similar techniques with an extra time-bucket table can get the results you want.
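The time-bucket idea can be sketched the same way: keep one pre-summed counter per (phrase, day) instead of scanning raw requests. The class and method names below are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class TimeBucketDemo {
    // Key is "phrase|yyyy-mm-dd"; value is the pre-summed count for that day.
    // This map stands in for a SearchPhraseDay time-bucket table.
    private static final Map<String, Integer> dayBuckets = new HashMap<String, Integer>();

    // Called once per search request: bump the day's bucket at write time.
    public static void recordSearch(String phrase, String day) {
        String key = phrase + "|" + day;
        Integer n = dayBuckets.get(key);
        dayBuckets.put(key, n == null ? 1 : n + 1);
    }

    // The report reads one row per (phrase, day) - no scan of individual requests.
    public static int countFor(String phrase, String day) {
        Integer n = dayBuckets.get(phrase + "|" + day);
        return n == null ? 0 : n;
    }

    public static void main(String[] args) {
        recordSearch("shoes", "2006-05-05");
        recordSearch("shoes", "2006-05-05");
        recordSearch("shoes", "2006-05-06");
        System.out.println(countFor("shoes", "2006-05-05")); // prints 2
    }
}
```

Each day's row is written incrementally as requests arrive, so the time-based report is as cheap to read as the total counter.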

And let's say you also wanted to show the last time someone searched for a particular phrase. You could look in the SearchRequest table, sort by date descending, and grab the first row, but this is time, disk, and processor intensive if the request table is large. So try simply adding a date column to the SearchPhrase table, like above, and updating it when the request is made. Don't be afraid to duplicate data when the performance benefits can be substantial.

Now, how much harder is this to implement in your code? It's actually very simple in most cases, usually just requiring an extra update. In the example above, when saveSearchRequest(SearchPhrase phrase) is called, it will save the SearchRequest and do a simple update on the requestCount column:
UPDATE SearchPhrase SET requestCount = requestCount + 1, lastRequestDate = sysdate WHERE id = PHRASE_ID;


This one time update is insignificant when compared to the many times that queries will be run against your tables.
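In application code, that save path is one extra increment. Here's a minimal in-memory sketch (the maps and names below stand in for the two tables and are invented for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DenormalizeDemo {
    // Stand-ins for the two tables.
    public static final Map<String, Integer> requestCount = new HashMap<String, Integer>(); // SearchPhrase.requestCount
    public static final List<String> searchRequests = new ArrayList<String>();              // SearchRequest rows

    // The save path: insert the normalized detail row, then do the one cheap counter update.
    public static void saveSearchRequest(String phrase) {
        searchRequests.add(phrase); // detail row, kept exactly as before
        Integer n = requestCount.get(phrase);
        requestCount.put(phrase, n == null ? 1 : n + 1); // the "requestCount = requestCount + 1" update
    }

    public static void main(String[] args) {
        saveSearchRequest("shoes");
        saveSearchRequest("shoes");
        saveSearchRequest("hats");
        // The report reads the counter directly - no scan of searchRequests.
        System.out.println(requestCount.get("shoes")); // prints 2
    }
}
```

The detail rows are still there for drill-down; the counter just saves the report from ever having to aggregate them.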

In conclusion, using these hybrid techniques in database design can reduce wait time significantly, so you can report in near real-time rather than making your users wait days for their data. The new columns and/or tables may take up more space, but space is cheap and performance is not. It's truly a small price to pay compared to the huge price of running those queries repeatedly in a purely normalized database, or of a huge non-real-time bulk process to move the data into a data warehouse.


Tuesday, February 14, 2006

New rel8r Blog!

I've decided to start a new blog focused on rel8r topics such as tagging, blogging, tag and blog spam, etc.

Check it out here:

http://blog.rel8r.com/

And be sure to subscribe!

Wednesday, January 11, 2006

Mozy has Nailed Remote Backups

Mozy has nailed it. I have been very interested in remote backup solutions for a long time for two reasons:

1. Everyone should do backups
2. Automated remote backups are the only way to go

My interest goes so far that I created a peer-to-peer distributed backup system in university (if you want to read my papers on it, check here).

Today, most people are getting sold on the ol' external hard drive with one-click backup, backup to CD, or some other lame manual process. Manual being the key word here. Nobody is going to remember to back up their stuff every time they change it. Let's say you edit a Word document: are you going to be sure to back it up right after you're done? CD or DVD backups are the worst of the options, because are you going to burn a new CD for a 1KB file? And what if your house burns down? Then you've lost all your CDs and your external hard drive anyway.

Those types of systems are not for regular backups. A backup system must have the following properties:

  1. Automated - setup once, and don't think about it again
  2. Incremental - will only backup modified files
  3. Encrypted - it should be encrypted on the client before being sent for the best security
  4. Remote - this one may be questionable to some, but I think it's essential

The remote property is important due to fire, robbery, etc., but it's also important for laptop users, since they may not be at home.

I'm trying not to stray too far off topic here, but I wanted to provide background as to why I think I have found the new winner for backups. And the winner is: Mozy . It has all of the required properties in a nice easy to use package, but the best part is you get up to 2 GB free! There are other solutions out there, but free is not in their vocabulary.