Archive for the ‘PDFTextStream’ Category

Totally Flattened

Tuesday, June 28th, 2005

**The past 10 days have been just nuts.**

When it rains, it’s buckets.

We got hit last week with serious inquiries from half a dozen very large organizations: a good mix of governmental, corporate, and nonprofit/research. Each of them already had a grasp of what PDFTextStream could mean to them and their projects, especially on the performance and text extraction quality fronts. However, each of them was also looking for some broader extraction functionality: bookmarks, annotations, tagged PDF structures, etc.

This is stuff we were already working on and planning to add into the mix, but these new requests certainly kicked the pace up quite a bit. Some of it was pretty quick and easy to finish up and move into beta phase — that will find its way into released versions very soon.

Other stuff is a little harder though, to put it mildly: OCR of text in images in PDFs, decryption of digitally signed documents, and other higher-order functionality. Again, all stuff we’ve been positioning ourselves to jump on, but when there’s fish to fry, we all start cooking a little faster. (Now’s when you’re supposed to groan at the horrible pun….)

So, we’re definitely busy. Now, who said software slowed down in the summer?

Marketing is Hard and Scary

Sunday, June 19th, 2005

**Marketing is really hard, despite the rumors you’ve heard. The more I get into it, the more I’ve come to respect the skills (if not necessarily the tactics) necessary to deliver a message to prospective customers.**

Up until this point, Snowtide has done virtually no marketing, and we’ve made out very nicely. We now have a mature product that really kicks ass. I’m proud of what PDFTextStream is doing for its users, some of whom simply would not be able to do their jobs if it weren’t for it.

But we’re past the point of working small niches. Scores of development shops, large and small, would have fewer bad days if they had PDFTextStream humming on their servers and in their products. So, the time has come to spread the gospel and make sure they know that.

To that end, we’re starting a new marketing strategy in July. It’s going to start slow as we find our footing (the conventional wisdom is that summertime sees a slowdown in corporate software purchases because of vacationing). It will build through the end of the year. And, it will end with PDFTextStream being the only serious choice for developers in enterprise-class environments.

There’s the tricky part, though: convincing people that our product is better than its competition. The foundation for that has been laid for PDFTextStream; it’s been borne out in customer experiences. The problem is that, without appropriate marketing, the people that are likely to appreciate that fact will never even know about your product. In order to change that, we’ve got to write good ad copy, hire good designers to craft and mold that copy into digestible elements (ad banners, text ads, white papers, editorial placements, etc.), and feed those elements into a cacophony of interruptive marketing noise to be noticed and not ignored.

Technical people and marketing folks have always had their differences; neither side really understands the difficulties inherent in the other’s trade, and that often leads to disrespect. That is ever so slowly changing, in part because of pieces similar to this post, typically written by an in-the-trenches software company founder (like myself, I suppose), who inevitably describes how difficult marketing is. And seriously, it’s really, really hard.

Every step in the progression of tasks I enumerated that leads to a prospective customer seeing, noticing, and acting on a piece of advertising is hard. And personally, I find it very unpleasant, simply because I am, by nature, technical. I know how the bits in software work, and I know those types of things very well. It’s a perfect occupation for someone who is a bit of a control nut. Yes, I am that.

So it makes me very uneasy to engage in an activity (like marketing) where I cannot readily control the outcome. It makes me even more uneasy to engage in an activity (like marketing) where I am less than fully confident in my (and in this case, our) abilities. We are fundamentally technical; we know how the bits work. Even with help, we find the fuzzy, soft, vague world of marketing just a little scary.

That will get better in time, as we fail a little, succeed a little, and do a little more of the latter and a little less of the former each time we try. It would be a high crime to not try, try hard, and try often; we have a great product, it should be seen, and it will be seen.

Worldly Exposure

Thursday, June 16th, 2005

**Every person has a particular set of experiences they search for when choosing an occupation. For me, I’ve always been fascinated with the act and process of discovery. Thankfully, helping to build and maintain PDFTextStream satisfies that fascination in spades in ways that I never anticipated.**

One would assume that working on a piece of software that extracts text from PDF documents would be pretty dry work. And, to a certain extent, it is: supporting all of the intricacies and minutiae associated with a complex file format like PDF is not the most thrilling software development work.

However, what can be exciting about the experience is how it forces me to be exposed to things that I never would have seen otherwise. See, in order to ensure that PDFTextStream works well and continues to do so as it is improved and changed, we have developed a suite of test PDF documents. These documents must be examined one by one, fed into PDFTextStream, and records of the documents’ logical structure and text content saved off into what are called ‘ground truth’ files. Then, whenever a change is made to PDFTextStream, our automated tests compare all of the preexisting ground truth files with what PDFTextStream provides after it has been changed. This process of constantly tracking the impact of changes to PDFTextStream is critical in ensuring that it continues to be robust, providing high-quality output.
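As a rough illustration of the ground truth process described above, here is a minimal Python sketch. It is not PDFTextStream’s actual test harness: the `extract` callable is a hypothetical stand-in for whatever extraction call is under test, and the function names and the `.truth.txt` file naming are assumptions made for the example. It also compares extracted text only, not logical structure.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical stand-in for the extraction call under test:
# it takes a document path and returns the extracted text.
Extractor = Callable[[Path], str]


def truth_path(pdf: Path, truth_dir: Path) -> Path:
    """Location of the saved ground truth file for one document (assumed naming)."""
    return truth_dir / (pdf.name + ".truth.txt")


def record_ground_truth(pdfs: Iterable[Path], extract: Extractor, truth_dir: Path) -> None:
    """Run the extractor over each document and save its output as ground truth."""
    truth_dir.mkdir(parents=True, exist_ok=True)
    for pdf in pdfs:
        # Bytes are written directly so line endings are stored exactly as extracted.
        truth_path(pdf, truth_dir).write_bytes(extract(pdf).encode("utf-8"))


def find_regressions(pdfs: Iterable[Path], extract: Extractor, truth_dir: Path) -> List[str]:
    """Return the names of documents whose current output differs from ground truth."""
    changed = []
    for pdf in pdfs:
        expected = truth_path(pdf, truth_dir).read_bytes().decode("utf-8")
        if extract(pdf) != expected:
            changed.append(pdf.name)
    return changed
```

In this sketch, `record_ground_truth` would be run once after the output for each document has been reviewed by hand, and `find_regressions` would be run after every code change; an empty list means the change did not alter any extracted text.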

The point here, though, is that the process of building up and maintaining our suite of PDF documents (which numbers in the thousands now) exposes us to documents from nearly every corner of human activity. That’s thrilling for me, as I get the chance to read about things that I never would have come across had I not been involved in PDFTextStream. For example, our test suite includes PDF documents like:

  • An issue of the newsletter produced by the National Multiple Sclerosis Society
  • A research paper describing CFS, a Cryptographic File System for Unix that was developed at AT&T
  • Various PDF versions of U.S. patents
  • A maintenance worksheet that describes how to apply and care for a particular type of asphalt emulsion
  • A whitepaper discussing various systems that help in managing spectral data
  • An essay by Seth Godin called Do Less that discusses the need to be selective in one’s entrepreneurial venture
  • An English translation of an al Qaeda training manual seized by the Manchester, UK police in a raid of an al Qaeda cell house
  • An article discussing options for 2D visualization of complex ontologies
  • The 2004 roster for the University of Pittsburgh softball team
  • A PDF version of a PowerPoint presentation about the excruciating financial minutiae of reinsurance
  • An article about how to safely set up and use tower scaffolding
  • A catalog of activities at the 2003 Melbourne Scarf Festival (who knew someone would ever host a lecture called “The Nature of Scarves”?)

As you can see, the list goes on and on and on. The world of human knowledge and experience is functionally infinite, but I love getting glimpses of obscure corners of it and making little personal discoveries. Pretty geeky, I know, but that’s not really surprising, is it?

Isn’t Software Grand?

Friday, December 3rd, 2004

**Reading palms is a better percentage bet than trying to accurately project the effort or time needed to ship a particular software feature.**

So, PDFTextStream v1.3 is late. There, I said it.

Yes, I said (mouth, meet foot) that v1.3 would be ready by the end of November. Such is the nature of youth and optimism, I suppose.

Version 1.3 is coming, and it is close. However, we just couldn’t pull the trigger on a new release without being entirely satisfied that it’s up to par. v1.3 wasn’t there come November 30th, so we didn’t push it. Late is better than both never and on-time-but-buggy.