Archive for August, 2006

I *heart* Guy Kawasaki

Wednesday, August 30th, 2006

I might write more about specifics later, but I wanted to get this link out there. I only just came across Guy Kawasaki’s entrepreneurship video series hosted by the Stanford Technology Ventures Program. It’s from 2004, so I’m probably the last software company owner to “discover” it, but I’m glad I did nonetheless.

I’ve long discounted Guy as “that Mac evangelist from way back”, to my detriment. He’s really putting some great content and great ideas out there, regardless of what some may think of how he comes off personally. There are so many aspects of these clips that resonate with my personal experience of launching Snowtide, seeing it fail, and then relaunching it two years ago (and thankfully seeing it soar this time!). That kind of connection-at-a-distance is rare, and really valuable, so I’ll certainly be keeping up with (and catching up on) Guy’s doings from now on.

Memory-mapping Files in Java Causes Problems

Wednesday, August 30th, 2006

Today, we released PDFTextStream v2.0.1 — a minor patch release that contains a workaround for an interesting and unfortunate bug: on Windows, if one accesses a PDF file on disk using PDFTextStream, then closes the PDFTextStream instance (using PDFTextStream.close()), the PDF file will still be locked. It can’t be moved or deleted.

This is actually not a bug in PDFTextStream, but in Java, documented as Sun bug #4724038. In short, a memory-mapped file cannot reliably be “closed”: the `DirectByteBuffer` (or perhaps some native proxy) that holds the OS-level file handle does not release those resources, even when the `FileChannel` that was used to create the `DirectByteBuffer` is closed. Reading the comments on that bug report shows a great deal of frustration, and rightly so: regardless of the technical reasons for the behavior, memory-mapping files isn’t rocket science (or hasn’t been for 20 years or so), and this kind of thing shouldn’t happen.
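To make the behavior concrete, here is a minimal sketch using only the standard `java.nio` API (Java 7 or later). It is not PDFTextStream code, and the class and method names are made up for illustration. It maps a temp file, closes the `FileChannel`, and then reads from the mapping anyway; the `FileChannel.map` documentation states that closing the channel has no effect on the validity of the mapping. On Windows, the delete at the end would be expected to fail while the mapping is alive; on *nix it should go through.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapLockDemo {

    /** Maps the whole file read-only, closes the channel, and returns the mapping. */
    public static MappedByteBuffer mapThenClose(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping outlives the channel: try-with-resources closes the
            // channel on return, but the returned buffer stays valid.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        Files.write(file, new byte[] {1, 2, 3, 4});

        MappedByteBuffer mapped = mapThenClose(file);
        // The channel is closed here, yet the mapped bytes are still readable.
        System.out.println("first byte after close: " + mapped.get(0));

        try {
            Files.delete(file);
            System.out.println("delete succeeded");
        } catch (IOException e) {
            // The outcome described above for Windows while the mapping exists.
            System.out.println("delete failed: " + e);
        }
    }
}
```

There is no public `unmap()` to call on the buffer; the mapping goes away only when the buffer is garbage collected, which is why closing the channel doesn’t help.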

Since we can’t fix the bug, we devised a workaround: if you set the `pdfts.mmap.disable` system property to `Y`, then PDFTextStream won’t memory-map PDF files. Simple enough fix. FYI, there appears to be no performance degradation associated with using PDFTextStream in this mode.
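For reference, a system property can be set either on the command line (`java -Dpdfts.mmap.disable=Y ...`) or programmatically. The sketch below shows the programmatic route; the class and method names are hypothetical, and it assumes the property is set before any PDF is opened, since the post doesn’t say when the property is read.

```java
public class DisableMmap {

    // Property name and value as given in the release notes above.
    static final String MMAP_DISABLE_PROPERTY = "pdfts.mmap.disable";

    /** Sets the JVM-wide property that tells PDFTextStream not to memory-map files. */
    public static void disableMemoryMapping() {
        System.setProperty(MMAP_DISABLE_PROPERTY, "Y");
    }

    public static void main(String[] args) {
        // Assumption: do this early, before the first PDF file is opened.
        disableMemoryMapping();
        System.out.println(MMAP_DISABLE_PROPERTY + "="
                + System.getProperty(MMAP_DISABLE_PROPERTY));
        // ... open and read PDF files with PDFTextStream from here on ...
    }
}
```

The command-line form is probably the safer choice for server deployments, since it takes effect before any application code runs.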

Of course, this is only a problem on Windows, which does not allow files to be moved or deleted while a process has an open file handle. We have a number of customers that deploy on Windows Server (although far fewer than deploy on a variety of *nix), but until last week, they hadn’t reported any problems. Our best guess is that, given the systems we know those customers are running, they are probably using PDFTextStream’s in-memory mode (where PDF data is in memory, and provided to PDFTextStream as a `ByteBuffer`). In that case, no file handles are ever opened, so all is well.
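If your PDF data starts out on disk, one way to end up with an in-memory `ByteBuffer` is to read the whole file into a byte array and wrap it. The sketch below uses only the standard library (Java 7 or later), and the class and method names are made up for illustration. Handing the resulting buffer to PDFTextStream is left out, because the exact constructor isn’t covered in this post. Unlike the in-memory case described above, this does open a file handle briefly, but it is released as soon as the read finishes and nothing is memory-mapped.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

public class InMemoryPdf {

    /** Reads the entire file into a heap ByteBuffer; no mapping is created. */
    public static ByteBuffer loadFully(Path file) throws IOException {
        // readAllBytes opens the file, reads it, and closes it before returning.
        return ByteBuffer.wrap(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("in-memory-demo", ".pdf");
        Files.write(file, new byte[] {'%', 'P', 'D', 'F'});

        ByteBuffer data = loadFully(file);
        System.out.println("bytes loaded: " + data.remaining());

        // The file is no longer held open, so it can be deleted right away.
        Files.delete(file);
        System.out.println("still exists: " + Files.exists(file));
    }
}
```

The trade-off is heap usage: the whole document sits in memory, which is fine for typical PDFs but worth watching with very large files.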

This problem is the topic of a new FAQ entry as well.

PDFTextOnline ‘Save Text to Disk’ Function Now Available

Wednesday, August 23rd, 2006

In my rush to self-flagellate in my last post, I neglected to mention that PDFTextOnline’s ‘Save Text to Disk’ command is now available.  This is really what makes PDFTextOnline worthwhile — being able to get a quality text extract from your PDF documents without spending time copy-and-pasting everywhere.  (Not to mention all of the other advantages that PDFTextOnline gets you, especially Chinese, Japanese, and Korean text extraction capability, which is generally shoddy in ‘regular’ PDF viewers.)

Give a high-quality text extraction tool a whirl.

Welcome to 1995: Web UI is NOT Desktop UI

Wednesday, August 23rd, 2006

PDFTextOnline, our shiny-new AJAX-y PDF text extraction application, is a nifty tool, and we’re getting some decent feedback. However, many people have indicated (not so indirectly) that its user interface sucks. Yeah, OK, our bad.

This is a lesson that was learned about a decade ago, which we didn’t so much recognize as stumble over. Here’s PDFTextOnline’s user interface currently:

pdfto_ui_old.jpg

It doesn’t look too bad, right? Not so shabby for ‘beta’, whatever that means. Of course, using it is a wholly separate matter. The buttons in the toolbar in the upper right corner are entirely opaque as to their meaning: even though the icons use familiar visual metaphors (open folder for the ‘Open File’ action, a disk for the ‘Save As’ action, etc.), they don’t seem to work within this environment. The split between the drawer on the left and the main text area doesn’t work quite right either, apparently regardless of whether you’re a Windows, Mac, or Linux person.

Users are flummoxed, and don’t see what the path is from point A to point B.

Those are just a few of the comments we’ve received so far. The point being, of course, that we didn’t design the UI for the web, as we should have; we designed it to mimic a desktop PDF viewer (except PDFTextOnline’s stock in trade is text). Maybe if we redoubled our efforts, we could roll in a new widget set (perhaps those from Backbase, or something similar), tighten all of the JavaScript that works with the interactive bits to make those parts snappier, and end up with something that felt more desktop-ish.

Of course, that’s a bad idea. This is not a desktop application, it’s a web application. Duh.

We’ll have something better in a few weeks, promise.