Thursday, September 10, 2009

Revising the Hack Test (Round Two)

Before we use the Hack Test, I have one more revision. This lengthy suggestion comes from my old college housemate, Joe Paxton.

He’s working on his master’s in something or other at Harvard. (No, most OU graduates do not continue to Harvard. He’s always been the type of guy who ruins the curve for all of us.) Consequently, he uses phrases like “multi-dimensional quality space” and “personalized learning algorithm.” He also makes an excellent point that simultaneously reaffirms and ruins my Hack Test. His words:

I love this idea. I think, though, the algorithm would have to be a learning algorithm rather than something static, because of how difficult it is to articulate exactly what makes for good/bad writing. And it might be a good idea to tailor it to the individual as well, since tastes vary.

I imagine that, with an individually-tailored learning algorithm, you could feed a book into a program that implements that algorithm, and rate it along some scale from good to bad. The program would then analyze the text of the book along multiple dimensions -- the dimensions you suggested would be a good start, but you could constantly add dimensions as you think of them.

After you’ve fed in a couple dozen books, the computer would have some idea of what, on average, you think makes for a good book. This would be encoded in a “multi-dimensional quality space” that combines the good/bad rating for a given book with the dimension scores for that book, and averages these multi-dimensional quality scores over all books.

So now you have your personalized multi-dimensional quality space, which tells you how important you personally take each dimension to be. I imagine that we could construct a database that holds the dimension scores for any arbitrary book. By comparing the dimension scores for that book to your personalized multi-dimensional quality space, you could get a good idea of whether you’d be likely to enjoy the book.

I worry, though, that this method wouldn’t allow you to flag the really great books. The really great books often seem to be great because they break the mold, somehow going beyond most everything you’ve read before. But I do think this kind of personalized learning algorithm approach would enable you to at least filter out the hacks without having to read them. And it might even give you some decent recommendations.

Of course, this personalized learning algorithm approach is currently pretty impractical. But that’s not because the algorithm is particularly complicated. It’s mainly due to the fact that a lot of books have not been digitized yet. And that’s particularly true of fiction books. It seems that Google Books is in the process of changing that. So I could imagine such a system being practical within the next ten years.

And someone will write the program, given enough demand. In fact, I would be surprised if some computer scientist with a passion for literature hasn’t already started writing it.
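Joe’s scheme is simple enough to sketch in code. Here is a rough Python version of it as I understand it: rate a few books, score each along some dimensions, and average the rating-weighted scores into a personal taste profile. The dimension names, scores, and ratings below are invented for illustration; a real system would need far more books and dimensions.

```python
# A minimal sketch of Joe's "personalized multi-dimensional quality space".
# All dimension names and numbers here are made up for illustration.

def learn_quality_space(rated_books):
    """Build a taste profile from books the reader has already rated.

    rated_books: list of (rating, {dimension: score}) pairs, where the
    rating runs from -1.0 (hated it) to +1.0 (loved it) and each
    dimension score runs from 0.0 to 1.0.
    Returns {dimension: weight}: the average of rating * score, so a
    dimension common to books you hated ends up with a negative weight.
    """
    weights = {}
    for rating, dims in rated_books:
        for dim, score in dims.items():
            weights.setdefault(dim, []).append(rating * score)
    return {dim: sum(vals) / len(vals) for dim, vals in weights.items()}

def predict_enjoyment(weights, dims):
    """Dot a new book's dimension scores against the taste profile."""
    return sum(weights.get(d, 0.0) * s for d, s in dims.items())

# A couple dozen books would be better; three keep the example short.
library = [
    (+1.0, {"originality": 0.9, "concision": 0.8, "cliche_rate": 0.1}),
    (-1.0, {"originality": 0.2, "concision": 0.3, "cliche_rate": 0.9}),
    (+0.5, {"originality": 0.7, "concision": 0.6, "cliche_rate": 0.3}),
]
taste = learn_quality_space(library)
new_book = {"originality": 0.8, "concision": 0.7, "cliche_rate": 0.2}
print(round(predict_enjoyment(taste, new_book), 3))  # → 0.423
```

The rating-weighted average is just the simplest reading of “averages these multi-dimensional quality scores over all books”; a serious version would more likely fit the weights with a regression and update them as each new rating comes in, which is what makes it a learning algorithm rather than something static.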


Three thoughts on Joe’s statement: One, he’s absolutely right about this test not identifying great books. The best books do not follow logic; they defy it. However, I expect it will identify writers with hacky tendencies. (Thus, it is a Hack Test and not a Genius Exam.)

Two: I think Joe underrates how difficult it might be to write a computer algorithm for the Hack Test (and not just because my knowledge of HTML code ends with knowing how to underline statements). The sticking point would be the cliché clause. I feel the same way about clichés as Potter Stewart felt about pornography: I may not be able to list all the clichés in the English language, but I know them when I see them. Also, clichés have iterations that even the most inclusive algorithm might miss.

For example, if I write about “the straw that breaks the literary critic’s back,” it’s a cliché. It may not be verbatim, but it still counts.
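That kind of near-miss could at least partly be caught by storing clichés as templates with a wildcard slot rather than as exact strings, so a reworded iteration still trips the detector. A rough Python sketch, with a three-entry toy list standing in for a real cliché inventory:

```python
# Fuzzy cliche matching: each stock phrase is a template where "*"
# marks a slot that may hold a few filler words. The cliche list is a
# tiny stand-in, not a real inventory of English cliches.
import re

CLICHE_TEMPLATES = [
    "the straw that breaks the * back",
    "avoid * like the plague",
    "at the end of the day",
]

def compile_template(template):
    # Each "*" slot matches one to four intervening words.
    parts = [re.escape(p.strip()) for p in template.split("*")]
    wildcard = r"\s+(?:[\w']+\s+){0,3}[\w']+\s+"
    return re.compile(wildcard.join(parts), re.IGNORECASE)

PATTERNS = [compile_template(t) for t in CLICHE_TEMPLATES]

def find_cliches(text):
    """Return the patterns of every cliche template the text trips."""
    return [p.pattern for p in PATTERNS if p.search(text)]

sentence = "It was the straw that breaks the literary critic's back."
print(len(find_cliches(sentence)))  # → 1: the reworded straw still counts
```

Even this only catches rewordings that keep the skeleton of the phrase; a cliché paraphrased from scratch would sail through, which is roughly the “I know it when I see it” problem in code form.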

Three, and most importantly: Joe is correct to say that these algorithms would need to be personalized. What is important to me will matter less to other readers. We can list factors (originality, conciseness of writing), but each reader must decide how to weigh those factors.

My Hack Test is rooted in William Zinsser’s notion of good writing: don’t use three words when two will suffice. But Zinsser’s philosophy does not fit Cervantes, Dickens or Shakespeare. It would be pointless to apply my Hack Test to anything before 1940. Still, it offers one more tool we can use to measure literature.

The next time I post, I’ll be testing the test.

-Jason Lea, JLea@News-Herald.com

P.S. Because Joe mentioned it, an update on the Google case.
