Welcome to AndrewBurt.com

Google Books Settlement — an Open Source Project?

Here's a half-baked thought: In the Google Books Settlement why not require that Google place the unclaimed books into a non-profit, open source project status? That is, anybody — not just Google — could make use of the scans and text.

Now before I get into the meat of this open source project idea, let me clearly define which aspects of the book scanning project I'm talking about: Unclaimed works. Google is scanning a lot of books, and for many of them the owner of the copyright is known. The settlement agreement encourages owners to come forward and claim the books they have rights to. "That book is mine." For those books, the situation with the rights is unambiguous: The owner controls them. End of story. (They and Google may enter into an agreement to sell them, and it is entirely up to the copyright owner, as it should be.)

What I want to focus on are the books where no owner comes forward to claim them. These are essentially "orphan" books. There are probably a vast number of these. Millions. It would be a shame to lose them. I agree with Google that would be a loss to society.[**]

However, under the proposed settlement, Google, and Google alone, stands to profit from these unclaimed works — after breaking the law to get them — and that seems wrong. Google should not get a cash-cow monopoly revenue stream from an illegal action. (I generally hate to see lawbreakers rewarded, though obviously it happens.) More importantly, for the benefit of society, Google should not be the only organization allowed to do this. That sort of monopoly sets up a bad situation for the future. There should be a penalty for breaking the law, and there should be competition.

My suggestion is that they establish open-source, non-profit competition to themselves for these books.

This solves both issues: The monopoly is removed, and the penalty for breaking the law is that their ability to profit from unclaimed works is consequently diminished.

This actually meshes with what Google says it wants.

Google's VP and Chief Legal Officer, David Drummond, says their Google Books digitization project is meant to help the world, and that they don't want to be a monopoly. He writes (emphasis added):

In reality, nothing in this [settlement] agreement precludes[**] any other organisation from pursuing its own digitisation efforts. We wish there were a hundred such services. But despite a number of important projects to date -- and Google has helped fund some of them -- none has been on the same scale simply because no one else has yet chosen to invest the time and resources required. But if there are to be a hundred services in future, we have to start with one.

Well, okay, wonderful, you can help establish those hundred others!

Technically this would be simple. The scanned images, OCR'd text, and metadata (titles, coordinates of the text within the images, etc.) for all unclaimed works would be made available, free, in a standardized format, for anyone to use as they see fit. (For profit or otherwise; the market would decide which uses are worthwhile. For works that are later claimed, I would think it fair if the settlement agreement's profit-sharing provisions then applied between the registry and whoever made use of the work. [But not profit-sharing with Google.])

The one licensing proviso for use would be to periodically check and cease using any work that has become claimed. (Obviously those who make use of the works could negotiate with the now-identified owners to continue using the work. After a few years the number of newly claimed works would probably be quite small so this isn't much of a barrier. A list of book-ids that have been recently claimed would be easy for developers to check in their applications.) This ensures that only works no owner cares about are being used for free.
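To give a feel for how small that periodic check could be, here is a minimal sketch in Python. It assumes, purely for illustration, that the registry publishes recently claimed works as a plain-text list with one book ID per line; the settlement defines no such format, and the function names and IDs below are made up.

```python
from typing import Iterable, List, Set


def parse_claimed_list(text: str) -> Set[str]:
    """Parse a hypothetical claimed-works list: one book ID per line.

    Blank lines and lines starting with '#' are ignored.
    """
    claimed = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            claimed.add(line)
    return claimed


def still_unclaimed(in_use: Iterable[str], claimed: Set[str]) -> List[str]:
    """Return the IDs from in_use that have not been claimed, in order."""
    return [book_id for book_id in in_use if book_id not in claimed]


# Illustrative data only; these IDs are invented.
claimed_feed = """
# recently claimed works
book-0002
book-0005
"""
catalog = ["book-0001", "book-0002", "book-0003"]
usable = still_unclaimed(catalog, parse_claimed_list(claimed_feed))
print(usable)  # ['book-0001', 'book-0003']
```

An application would run something like this on a schedule and stop serving whatever drops out of the list, or contact the newly identified owner about continuing.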

The registry that the settlement creates would be the obvious place to distribute all these files.

This would open up a wealth of uses, probably many clever ones we would never imagine.

This addresses Google's valid point that they're preserving a lot of orphan works that might otherwise be lost. That rightsholders have to take action to protect their rights is annoying (and technically illegal), but in this context I can see the logic: it's not possible to find the rightsholders for all those millions of works, and "opt-in" means the works could be lost. That preservation aspect seems useful. Yes, it does turn copyright law upside down, but the idea that authors should not have to take action to protect their rights is itself a recent change: US copyright law used to require authors to take action to preserve their rights (registration and renewal). This was the case in the US until 1976. Indeed, authors are still required to take an action, registration, to obtain maximum protection. If the law had remained as it was before 1976, with authors having to take steps to keep their work in copyright, nobody would bat an eye at authors having to take steps to assert their rights today in this settlement.

No-author-action-required probably seemed like a good idea in 1976. I doubt anyone at the time imagined computerized scanning of millions of old books, or a Google-like system for searching them, arriving in a mere 30 years. (Even science fiction writers hardly imagined it.) That's fine, but times have changed, and perhaps, to preserve a corpus of millions of old books, it is the lesser evil to require action on the part of authors to claim their works. I would also note that I make this suggestion in the context of the settlement being a seeming done deal. If the judge decides this settlement is to become law, and essentially overturns the requirement to locate copyright owners before using their work, then (in that fait accompli context) it would be better, in my opinion, to broaden the accessibility of unclaimed works rather than leave Google as the only gatekeeper.

Google should not, in any event, be the sole beneficiary of its illegal action. If the judge decides society benefits by access to digital works, then society will benefit more if unclaimed works are freely available to all. (Until they are claimed, at which point the author's right to control the work resumes as it always was.) This will spur the hundreds of other such services that Google itself wishes for.

I realize the settlement may be close to cast in stone at this point, but this would seem like a beneficial revision.

What do you think? Would this be a good thing? (Add your comments -here-.)

Taking this a step further, all the other scanned books — the claimed ones — should also be available to all comers under the same exact terms as the settlement provides to Google. (The same revenue split, the same rights to remove one's book from the system, etc.)

If the settlement agreement goes forward as now written, with Google the only entity expressly permitted to display unclaimed works, then (and this is just musing here after a few glasses of the grape) it would be ironic if some group pirated all the scans of truly orphaned works, the ones nobody claims, and put them up free on BitTorrent. :)

Also, a final suggestion for the settlement folks: It would be nice if rightsholders could be given a free copy of the scans/text/metadata of their own works. This too would increase competition by allowing copyright owners to easily take their own works elsewhere for display. (Why should Google provide this? Again, because they broke the law to obtain these books in the first place. It's a minor penalty to ask them to share with the rightful owners.)

Anyway, enough post-grape musings for the night. :)

What are your thoughts? Crazy idea?


Notes

By way of biographical note, while I was VP of SFWA I chaired the Orphan Works Committee. Making orphaned works available to the public in a fair manner is a wish of mine.

To insulate users from their own potential copyright liability, in addition to making the images/text/metadata available for download, Google should also host all the data in directly usable form. For example, I might write an ebook reader application that displays scanned image pages; the URL I would use inside my app would point to the page on Google's server. Thus I wouldn't have to host the page image myself, unless I wanted to. Likewise the raw text should be retrievable in real time from Google's servers.
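As a rough sketch of what that could look like from the application side, the Python below builds URLs for a page image and for a page's raw text. Everything about the layout is an assumption for illustration: the host `books.example.org`, the path scheme, and both function names are hypothetical, since no such hosted service or URL format exists in the settlement.

```python
from urllib.parse import quote, urlencode

# Hypothetical host and path layout; no real endpoint is implied.
BASE_URL = "https://books.example.org/unclaimed"


def page_image_url(book_id: str, page: int) -> str:
    """Build the URL of one scanned page image under the assumed layout."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{BASE_URL}/{quote(book_id, safe='')}/pages/{page}.png"


def page_text_url(book_id: str, page: int) -> str:
    """Build the URL of the OCR'd text for one page under the assumed layout."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    query = urlencode({"page": page})
    return f"{BASE_URL}/{quote(book_id, safe='')}/text?{query}"


# Illustrative ID only.
print(page_image_url("book-0001", 12))
print(page_text_url("book-0001", 12))
```

A reader app would point its image view at the first URL and fetch the second for search or text display, so it never has to store a copy of the work itself.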

Disallowing derivative works and temporarily escrowing potential owner shares of profits if they later turn up, as per the current proposal, could also be reasonable licensing restrictions.

Note this idea is orthogonal to the whole opt-in / opt-out question. (Requiring authors to actively claim their work, as with registration or renewal, rather than holding their rights passively, is what the Berne Convention calls a "formality" [and forbids].) That is: While it seems likely the settlement will get approved with some form of action required by copyright owners, even if it is not, some works will remain for which no owner can be found even after a diligent search. This proposal addresses any unclaimed / orphaned works, no matter how many or few, or how hard Google does or doesn't try to find the owners. Regardless of how many works are in the "unclaimed" or "orphan" category, it would be nice to prevent Google from having a monopoly over them.

Related:
Axioms in the Future of Publishing
Thoughts on Copyright
Ebook (un)availability case study