September 11th, 2006
We’ve finally recovered from our harrowing experience of being discovered via digg (so horrible to have the problem of too much attention!). PDFTextOnline is back online, and greatly beefed up.
The new interface I promised earlier is now in effect, too. It’s simpler, easier to understand, and adds some new features, such as choosing the font used to display extracted PDF text and choosing which layout mode is used for each extraction. Let us know what you think.
Finally, we’ve brought in Adsense ads. I guess we’ve sold out now, eh? Of course, it’s the smart thing to do given the pretty significant waves of traffic that the digg post continues to send our way from around the web.
Posted in PDFTextOnline, The Business | Comments Off
September 7th, 2006
Well, that was a surprise. PDFTextOnline was linked to on Digg, and made it to the front page (it was at #2 when I saw it).
Of course, you know the drill from here. We built PDFTextOnline and put it out there as a nifty little tool, hoping that some people would find it useful, and that maybe a couple of curious software developers and managers might stumble upon PDFTextStream as a great way to bring the kind of PDF text extraction they see in PDFTextOnline into their organization. We haven’t promoted it, or even linked to it heavily on snowtide.com.
Given all that, we didn’t put PDFTextOnline on a particularly large server — in fact, it was running on a mid-level VPS. Definitely nothing special.
Then we got hit with the digg-effect, and whammo, say goodbye. I haven’t poked at the server logs much yet, but the flood of traffic was heavy and unyielding.
So, I got the hint — PDFTextOnline is genuinely interesting to an audience larger than us.
Now I need to go server-shopping.
My hope is that PDFTextOnline will be back up later tonight, and then moved to a real server next week. Then maybe we can get slashdotted, and do it all over again!
Posted in Geek Commentary, PDFTextOnline | 2 Comments »
August 30th, 2006
I might write more about specifics, but I wanted to get this link out there. I only now came across Guy Kawasaki’s entrepreneurship video series hosted by the Stanford Technology Ventures Program. It’s from 2004, so I’m probably the last software company owner to “discover” it, but I’m glad I did nonetheless.
I’ve long discounted Guy as “that Mac evangelist from way back”, to my detriment. He’s really putting some great content and great ideas out there, regardless of what some may think of how he comes off personally. There are so many aspects of these clips that resonate with my personal experience of launching Snowtide, seeing it fail, and then relaunching it two years ago (and thankfully seeing it soar this time!). That kind of connection-at-a-distance is rare, and really valuable, so I’ll certainly be keeping up with (and catching up on) Guy’s doings from now on.
Posted in The Business | Comments Off
August 30th, 2006
Today, we released PDFTextStream v2.0.1 — a minor patch release that contains a workaround for an interesting and unfortunate bug: on Windows, if one accesses a PDF file on disk using PDFTextStream, then closes the PDFTextStream instance (using PDFTextStream.close()), the PDF file will still be locked. It can’t be moved or deleted.
This is actually not a bug in PDFTextStream, but in Java, documented as Sun bug #4724038. In short, a memory-mapped file cannot reliably be “closed”: the `DirectByteBuffer` (or perhaps some native proxy) that holds the OS-level file handle does not release those resources, even when the `FileChannel` used to create the `DirectByteBuffer` is closed. Reading the comments on that bug report shows a great deal of frustration, and rightly so: regardless of the technical reasons for the behavior, memory-mapping files isn’t rocket science (or hasn’t been for 20 years or so), and this kind of thing shouldn’t happen.
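For the curious, here is a minimal sketch of the underlying Java behavior, using only the JDK and no PDFTextStream code at all. The class and method names (`MmapLockDemo`, `firstByteAfterChannelClose`) are made up for illustration, and it uses the newer `java.nio.file` API for brevity. It maps a file, closes the `FileChannel`, and then reads from the mapping anyway, which shows that the mapping outlives the channel:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of PDFTextStream.
class MmapLockDemo {

    /** Maps the file read-only, closes the channel, then reads the first byte from the mapping. */
    static byte firstByteAfterChannelClose(Path file) throws IOException {
        MappedByteBuffer mapped;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
        // The channel is closed at this point, but the mapping is still valid:
        // it stays in place until the buffer itself is garbage-collected.
        return mapped.get(0);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        Files.write(file, "%PDF-1.4".getBytes(StandardCharsets.US_ASCII));

        System.out.println("first byte: " + (char) firstByteAfterChannelClose(file));

        // On Windows this delete may fail while the mapping is alive;
        // on most *nix systems it is expected to succeed.
        try {
            Files.delete(file);
            System.out.println("deleted");
        } catch (IOException e) {
            System.out.println("could not delete: " + e);
        }
    }
}
```

Whether the delete fails on a given run depends on the OS and on when the garbage collector gets around to the buffer, which is exactly what makes the problem so unpleasant to track down.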
Since we can’t fix the bug, we devised a workaround: if you set the `pdfts.mmap.disable` system property to `Y`, then PDFTextStream won’t memory-map PDF files. Simple enough fix. FYI, there appears to be no performance degradation associated with using PDFTextStream in this mode.
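As a quick sketch, the property can be set from code with the standard `System.setProperty` call. The `DisableMmap` class and its method name are hypothetical, and I'm assuming here that the property should be set before any PDF file is opened, so doing it early in startup is the safe bet:

```java
// Hypothetical helper class; only the property name and value come from PDFTextStream.
class DisableMmap {

    static final String MMAP_DISABLE_PROPERTY = "pdfts.mmap.disable";

    /** Sets the system property that tells PDFTextStream not to memory-map PDF files. */
    static void disableMemoryMapping() {
        System.setProperty(MMAP_DISABLE_PROPERTY, "Y");
    }

    public static void main(String[] args) {
        // Assumption: call this before the first PDF file is opened.
        disableMemoryMapping();
        System.out.println(MMAP_DISABLE_PROPERTY + "=" + System.getProperty(MMAP_DISABLE_PROPERTY));
    }
}
```

Since it is an ordinary Java system property, passing `-Dpdfts.mmap.disable=Y` on the `java` command line should have the same effect without any code changes.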
Of course, this is only a problem on Windows, which does not allow files to be moved or deleted while a process has an open file handle to them. We have a number of customers who deploy on Windows Server (although that number is much smaller than those who deploy on a variety of *nix), but until last week, they hadn’t reported any problems. Our best guess is that, given the systems we know those customers are running, they are probably using PDFTextStream’s in-memory mode (where PDF data is in memory, and provided to PDFTextStream as a `ByteBuffer`). In that case, no file handles are ever opened, so all is well.
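If your PDF data starts out on disk, one way to end up in the same position is to read it fully into memory yourself. This is only a sketch with made-up names (`InMemoryPdf`, `load`); it opens the file briefly, and the handle is released as soon as the read finishes. The step of handing the buffer to PDFTextStream is left as a comment, since the exact constructor isn't covered in this post:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; not part of PDFTextStream.
class InMemoryPdf {

    /** Reads the whole file into a heap ByteBuffer; no mapping or open handle remains afterwards. */
    static ByteBuffer load(Path pdf) throws IOException {
        return ByteBuffer.wrap(Files.readAllBytes(pdf));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("in-memory-demo", ".pdf");
        Files.write(file, "%PDF-1.4".getBytes(StandardCharsets.US_ASCII));

        ByteBuffer data = load(file);
        // Hand `data` to PDFTextStream's in-memory mode here (see its API docs for the exact call).
        System.out.println("bytes in memory: " + data.remaining());

        // Nothing is holding the file any more, so it can be deleted right away.
        Files.delete(file);
    }
}
```

The obvious trade-off is memory: the whole PDF lives on the heap for as long as the buffer is referenced.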
This problem is the topic of a new FAQ entry as well.
Posted in PDFTextStream | 2 Comments »