Optimal Buffer and Destination Byte Array Size for java.io.BufferedInputStream Reads (for a fast disk)

Micro-Benchmark Results on a MacBook Pro with an Intel X25M SSD

When implementing file-reading code with Java IO’s BufferedInputStream, what buffer size should one choose? Should we just not specify it and go with the default? And what destination byte array size is best?

These questions pop up from time to time when I have the opportunity to write such code. And I’ve seen folks ask similar questions on Stack Overflow and The Java Ranch. So, with my trusty new SSD drive, and a bit of spare time this holiday season, I set out to answer those questions.

The methodology for the micro-benchmark results below was simple: graph the maximum speed of a simple algorithm across various BufferedInputStream buffer sizes and destination byte array (payload) sizes. The algorithm was just as simple: add up all the bytes in a file. This is both computationally lightweight and serves as a simple checksum (to ensure the consistency of my algorithm across the various parameters). The code: OptimalBufferSizeSquentialReader.java
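For readers who just want the gist of the byte-summing algorithm, here is a minimal sketch. It is not the linked benchmark source; the class name ByteSumSketch and the method sumBytes are made up for illustration, and treating each byte as unsigned is my assumption.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not the original benchmark class.
public class ByteSumSketch {

    // Adds up every byte in the file, pulling up to payloadSize bytes per
    // read call through a BufferedInputStream whose internal buffer holds
    // buffSize bytes. payloadSize must be greater than zero.
    static long sumBytes(File file, int buffSize, int payloadSize) throws IOException {
        long sum = 0;
        byte[] payload = new byte[payloadSize];
        try (InputStream is = new BufferedInputStream(new FileInputStream(file), buffSize)) {
            int readIn;
            // read(byte[]) returns the number of bytes actually read,
            // or -1 once the end of the file has been reached.
            while ((readIn = is.read(payload)) != -1) {
                for (int i = 0; i < readIn; i++) {
                    sum += payload[i] & 0xFF; // assumption: bytes counted as unsigned (0..255)
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java ByteSumSketch <file>");
            return;
        }
        // 8192 and 1024 are example sizes, not recommendations.
        System.out.println(sumBytes(new File(args[0]), 8192, 1024));
    }
}
```

Because the sum depends only on the file's contents, it should come out the same for every buffer and payload size, which is what makes it usable as a sanity check.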

The target file read in these metrics was a 31MB video clip. To prevent the OS file cache from mucking up the results, there is a unique file for each combination of buffer size and destination byte array size, for a total of 7GB in test data.

Destination Byte Array Size vs. KBps (for 16 different Buffer Sizes + the default)

Detailed description of the graph:

  • x-axis: The destination byte array size used in individual read method calls as defined by payloadSize in this snippet:
        byte[] payload = new byte[payloadSize];
        int readIn = is.read(payload);
  • y-axis: The speed in kilobytes per second for the complete run: setup, file opening and reading, and the algorithm’s computation.
  • series (individual lines): Individual BufferedInputStream buffer sizes, as set with buffSize when the stream is constructed.
        FileInputStream fis = new FileInputStream(file);
        InputStream is = new BufferedInputStream(fis, buffSize);
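To make the y-axis concrete, here is a rough sketch of how one data point on such a graph could be measured. This is my own illustration, not the benchmark's actual timing code: the names ThroughputSketch, kiloBytesPerSecond and timeRead are hypothetical, and I am assuming 1 KB = 1024 bytes and a single timed pass per file.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not the original benchmark class.
public class ThroughputSketch {

    // Holds the last byte sum so the summing work has a visible result.
    static long sink;

    // Converts a byte count and an elapsed time in nanoseconds into
    // kilobytes per second (assumption: 1 KB = 1024 bytes).
    static double kiloBytesPerSecond(long bytes, long elapsedNanos) {
        double seconds = elapsedNanos / 1_000_000_000.0;
        return (bytes / 1024.0) / seconds;
    }

    // Times one complete pass: opening the file, reading it in
    // payloadSize chunks, and summing the bytes. payloadSize must be > 0.
    static double timeRead(File file, int buffSize, int payloadSize) throws IOException {
        long start = System.nanoTime();
        long totalBytes = 0;
        long sum = 0;
        byte[] payload = new byte[payloadSize];
        try (InputStream is = new BufferedInputStream(new FileInputStream(file), buffSize)) {
            int readIn;
            while ((readIn = is.read(payload)) != -1) {
                totalBytes += readIn;
                for (int i = 0; i < readIn; i++) {
                    sum += payload[i];
                }
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = sum;
        return kiloBytesPerSecond(totalBytes, elapsed);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java ThroughputSketch <file>");
            return;
        }
        // Example sizes only; a real run would loop over many combinations.
        System.out.printf("%.0f KBps%n", timeRead(new File(args[0]), 2048, 512));
    }
}
```

A single pass like this is noisy (JIT warm-up, OS caching), so treat any number it prints as a ballpark figure rather than a measurement.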

An interesting graph. My conclusions:

  • One generally cannot go wrong with a destination byte array size of 512 or 1024 bytes, regardless of what the BufferedInputStream’s buffer size has been initialized to.
  • BufferedInputStream’s default buffer size is pretty well tuned, as long as one does not use a destination byte array size smaller than 8 or 16 bytes.
  • There is no point in initializing BufferedInputStream’s buffer size to anything larger than 2KB. In fact, if an application is going to have many concurrent threads running this code (like in a web application), then larger values will only waste memory, limiting overall scalability.
  • The curves for the lower destination byte array sizes seem to follow some sort of sigmoid function.
  • My new SSD is freaky fast. ~130,000KBps is ~125MBps! Yeah, baby!

One thing I don’t get is that the default buffer size is 8KB, but the 8KB series does not match the default series. Humph.

Update! I was curious about two other aspects of the buffer size and destination byte array size: CPU load and the impact of disk speed. To that end, I’ve followed up with a second, similar set of metrics in Optimal Buffer and Destination Byte Array Size for java.io.BufferedInputStream Reads (for a slow disk)