Java Serialization – Appending Objects to an Existing File

The Problem

You cannot simply append objects to a partially populated file across separate open/close cycles and expect to read them all back later in a single pass.  Consider a series of steps like this:

  1. open file X in append mode
  2. write object(s) // transaction
  3. close file

... time goes by, or the system reboots, etc. ...

  1. open file X in append mode
  2. write object(s) // transaction
  3. close file

I tried this, and on reading the file back I got this error:

java.io.StreamCorruptedException: invalid type code: AC

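This happens because a stream header precedes each group of objects in the file.  The header is written every time an ObjectOutputStream is created, and it begins with the bytes AC ED.  A reader that has already consumed the first header expects an object next, so it treats the AC of the second header as an invalid type code.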
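Here is a minimal sketch that reproduces the error. It is an illustration, not the code from my log writer: the class and method names are made up, and a ByteArrayOutputStream stands in for file X opened in append mode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.StreamCorruptedException;

class AppendHeaderDemo {
    // One "open, write, close" session: a fresh ObjectOutputStream on the same target.
    static void session(ByteArrayOutputStream file, Object obj) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(file); // writes the header AC ED 00 05
        out.writeObject(obj);
        out.close(); // closing a ByteArrayOutputStream has no effect, so later sessions can append
    }

    // Tries to read two objects in one pass; returns the failure message, or null on success.
    static String readTwo(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject(); // the first session reads fine
            in.readObject(); // runs into the second session's header
            return null;
        } catch (StreamCorruptedException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        session(file, "first");
        session(file, "second");
        System.out.println(readTwo(file.toByteArray())); // prints "invalid type code: AC"
    }
}
```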

But Wait – Why do this?

It is a reasonable question.  You probably do not need to do this, right?  Opening the file, writing to it, and closing it over and over again just seems inefficient.  Why not write all the objects at once and be done with it?  Well, there are situations, such as writing to a database log file, where this might happen.  You might consider rolling to a new file for each group of time-separated writes, or just keeping the file open.  But keeping a file open indefinitely, or having many separate tiny files, seems just broken somehow.  And what if the "file" is a socket that you cannot simply open or close at any time?

The key point is that you cannot come back to an existing file or socket, write more objects after the stream was closed, and expect to read everything back later by simply reversing the process with a single ObjectInputStream.

There are techniques to work around this.

Solution #1: Fake Multiple Files in a Single Stream

While looking closely at the Prevayler code, which seems to do the same thing for command journaling, I discovered one possible technique.

Write your “transaction” to a ByteArrayOutputStream, then write the length and contents of this ByteArrayOutputStream to the file through a DataOutputStream.

ByteArrayOutputStream aos = new ByteArrayOutputStream();
ObjectOutputStream os = new ObjectOutputStream(aos);
DataOutputStream dos = new DataOutputStream(new FileOutputStream(outfile, true));
try
{
    os.writeObject(order);
    os.writeObject(new Date());
    os.flush();

    byte[] raw = aos.toByteArray();
    dos.writeInt(raw.length);
    dos.write(raw);
    dos.flush();
}
finally
{
    if (os != null) os.close();
    if (dos != null) dos.close();
}

To read, reverse the process.  Read the length of the byte stream into an int, create a byte array of this length, then read exactly that much data into the array.  Once you have wrapped the array in a ByteArrayInputStream, read the transaction from it with an ObjectInputStream.

DataInputStream dis = new DataInputStream(new FileInputStream(filename)); 
try{
    while ( true )
    {
        int recLen = dis.readInt();
        if ( recLen <= 0 )
            break;
        byte[] raw = new byte[recLen];
        dis.readFully(raw); // read() may return fewer bytes than requested
        ByteArrayInputStream bis = new ByteArrayInputStream(raw);
        ObjectInputStream is = new ObjectInputStream(bis);
        MyClass1 obj1 = (MyClass1) is.readObject();
        MyClass2 obj2 = (MyClass2) is.readObject();
        // use or store obj1 and obj2
        is.close();
    }
}
catch(EOFException e)
{
    System.out.println("Reached EOF");
}
finally
{
    if ( dis != null ) dis.close();
}

You will be able to write a series of separate object sets/transactions and then read the series back later.  This has the effect of writing multiple "virtual" files sequentially into a single file.

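Because the framing is nothing more than a length prefix, and each record is a complete serialization stream with its own header, this uses only the public stream API.  It should keep working regardless of changes to the underlying serialization framework.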
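For a version you can run end to end, here is a compact round trip of the same idea. Treat it as a sketch under assumptions: the class name LengthPrefixedLog and its methods are invented for illustration, each record holds exactly one object, and a temp file stands in for the real log.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class LengthPrefixedLog {
    // Appends one record: a 4-byte length, then a complete serialization stream.
    static void append(File file, Serializable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(obj);
        }
        byte[] raw = buf.toByteArray();
        try (DataOutputStream dos = new DataOutputStream(new FileOutputStream(file, true))) {
            dos.writeInt(raw.length);
            dos.write(raw);
        }
    }

    // Reads every record back in a single pass over the file.
    static List<Object> readAll(File file) throws IOException, ClassNotFoundException {
        List<Object> result = new ArrayList<>();
        try (DataInputStream dis = new DataInputStream(new FileInputStream(file))) {
            while (true) {
                int len;
                try {
                    len = dis.readInt();
                } catch (EOFException e) {
                    break; // clean end of file: no more records
                }
                byte[] raw = new byte[len];
                dis.readFully(raw);
                try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(raw))) {
                    result.add(ois.readObject());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("journal", ".bin");
        f.deleteOnExit();
        append(f, "first");   // first open/write/close session
        append(f, "second");  // a later, separate session
        System.out.println(readAll(f)); // prints [first, second]
    }
}
```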

Solution #2:  Reopen and Skip

Another solution works purely on the reading side.  After reading a transaction, save the position of the underlying FileInputStream (fis below) using:

long pos = fis.getChannel().position();

closing the file, reopening the file, and skipping to this position before reading the next transaction.

This is simple to implement.  The writing side does not have to change at all; it keeps the plain open/writeObject/close semantics.  It works, but at the cost of many OS-level open/close operations, which may limit performance.  The best part of this solution, however, is that it can rescue you if you assumed such files would be readable.  If you have code writing logs and never tested the read side, this technique will let you read that data.
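A rough sketch of reopen-and-skip follows. The names are hypothetical, and it leans on two assumptions: each write session holds exactly one object, and the ObjectInputStream wraps the FileInputStream directly (no BufferedInputStream) so that it does not read past the object it returns. That matches my understanding of how the class behaves, but verify it on your JVM before trusting it with real data.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ReopenAndSkip {
    // The writer is untouched: plain open/writeObject/close in append mode.
    static void append(File file, Serializable obj) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file, true))) {
            oos.writeObject(obj);
        }
    }

    // Reads one object per write session, reopening the file for each one.
    static List<Object> readAll(File file) throws IOException, ClassNotFoundException {
        List<Object> result = new ArrayList<>();
        long pos = 0;
        while (pos < file.length()) {
            try (FileInputStream fis = new FileInputStream(file)) {
                fis.getChannel().position(pos);                     // skip what was already read
                ObjectInputStream ois = new ObjectInputStream(fis); // consumes this session's header
                result.add(ois.readObject());
                pos = fis.getChannel().position();                  // where this session ended
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("sessions", ".bin");
        f.deleteOnExit();
        append(f, "first");
        append(f, "second");
        System.out.println(readAll(f));
    }
}
```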

Solution #3: Override ObjectInputStream

This assumes the implementation of this class will remain unchanged; it was suggested in a forum thread.  Part of the issue is the stream header, and the forum entry argues for bypassing it.  I did not try it because it seems like bad practice, since it involves overriding protected methods.  These are protected for a reason – duh!
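For the curious, here is what a header-bypassing approach can look like. This is a guess at the general idea, not necessarily what the forum thread proposed: it is the commonly circulated variant that works on the writing side, subclassing ObjectOutputStream so that later sessions emit a reset marker instead of a second header. The class names are made up, and the caveat above about depending on the class's behaviour still applies.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: used only when the file already contains a stream header.
class AppendingObjectOutputStream extends ObjectOutputStream {
    AppendingObjectOutputStream(OutputStream out) throws IOException {
        super(out);
    }

    @Override
    protected void writeStreamHeader() throws IOException {
        reset(); // write a reset marker instead of a second stream header
    }
}

class HeaderBypassDemo {
    static void append(File file, Serializable obj) throws IOException {
        boolean fresh = file.length() == 0; // only the very first session writes a real header
        try (FileOutputStream fos = new FileOutputStream(file, true);
             ObjectOutputStream oos = fresh ? new ObjectOutputStream(fos)
                                            : new AppendingObjectOutputStream(fos)) {
            oos.writeObject(obj);
        }
    }

    // A single ObjectInputStream now reads across all sessions.
    static List<Object> readAll(File file) throws IOException, ClassNotFoundException {
        List<Object> result = new ArrayList<>();
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(file))) {
            while (true) {
                try {
                    result.add(ois.readObject());
                } catch (EOFException e) {
                    break; // no more objects
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("bypass", ".bin");
        f.deleteOnExit();
        append(f, "first");
        append(f, "second");
        System.out.println(readAll(f));
    }
}
```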

If I am missing something, or there are other ways to solve this problem, I’d like to hear about them.

Peaceout.
