How to create a patent infringement with a couple of lines of code

2009-02-23 21:07:00

Introduction

I have been telling people around me that software patents, as they stand right now, are a bad idea. In this blog post, I want to focus on one specific argument in the discussion about software patents. My argument speaks against software patents and is based on the concept of triviality. So first of all, what is trivial? As far as my understanding goes, triviality means that an item, for example a piece of information, is irrelevant, unimportant in terms of semantic meaning, or just plain simple in its structure.

What does that mean from a developer's perspective? Let's take an example: to save a piece of information on a computer, one has to allocate memory for that information. Trivial. To compute the arithmetic average of a list of numbers, one has to compute the sum and divide the sum by the number of elements. Trivial. I hope you get my point: some problems are not really problems.

The process of creating software for a computer is constrained by a couple of circumstances. The laws of mathematics are one constraint, the hardware limitations another constraint. But most importantly, the problem to be solved (the job of programmers) is usually the biggest constraint. In mathematics, writing down the problem often leads directly to the solution. Look at the following problem:

x + 3 = 8

This is the problem description, but it is also the implicit solution, because merely changing the way it is written (through the laws of mathematics) yields:

x = 8 - 3

The problem

What I want to show is a specific software problem which I had to solve a couple of days ago and which can be solved merely by reformulating it. What baffles me is that I don't have the right to use that solution, at least in the US (to my understanding; I am not a lawyer). Here is the problem in simple words:

Description: The program "spam-and-eggs.py" creates files inside a directory; each file is created with a unique filename. The program can create an arbitrary number of files, restricted only by filesystem constraints, which make it crash at a certain point. Many filesystems, even modern ones, have a limit on the number of files that can be placed within a directory.

Problem to be solved: find a way to save the files without hitting the filesystem limits.

This is a relatively clear problem. What about the solution?

Finding the solution

To find a solution, we usually have to look at the constraints we have:

1. place a lot of files on the filesystem
2. don't hit filesystem limits, in our case, 32000 files per directory
3. each file is uniquely named

Obviously, to satisfy constraints 1 and 2, we will have to create more than one directory, at least if the number of files exceeds 32000. From an algorithmic design perspective, it would be good if the program did not need to know whether we hit the 32000 limit. It should be designed so that no one has to do bookkeeping with regard to the total number of files. This is not a strict requirement, but I'll throw it in additionally. Let's reiterate our constraints:

1. create or re-use directories as files are generated
2. automatically keep the number of items within a directory below 32000
3. each file is uniquely named

The sky still seems clouded, but it has changed slightly. Constraints 1 and 3 suggest that we are gonna need to create or re-use directories based on the names that are generated by our program.

So what are the constraints around our filenames? Our program will give us arbitrary names, i.e. arbitrary arrays of byte values. Because filesystems usually do not accept arbitrary byte sequences as filenames, this is a problem. But let's assume for a moment that we have the WONDERFULLY PERFECT FS©; then we can compute the number of possible entries in a directory like so:

max = 256^numbytes

E.g., for one-glyph filenames, we get a total of max == 256 possible filenames; for two glyphs in our filename, we get max == 65536 possible filenames. Clearly, 65536 is already out of bounds with regard to our maximum of 32000.
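
To make the arithmetic concrete, here is a tiny sketch. The names DIRECTORY_LIMIT and possible_names are made up for illustration, and the limit of 32000 is simply the one assumed throughout this post:

```python
# Illustrative sketch; DIRECTORY_LIMIT and possible_names are made-up names.
DIRECTORY_LIMIT = 32000  # per-directory limit assumed in this post


def possible_names(numbytes, symbols=256):
    """Number of distinct names that are exactly `numbytes` glyphs long."""
    return symbols ** numbytes


for numbytes in (1, 2):
    count = possible_names(numbytes)
    verdict = "fits" if count <= DIRECTORY_LIMIT else "exceeds the limit"
    print(numbytes, count, verdict)
```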

Hm. The sky is still clouded. Let me throw in some abstraction. Let me call a directory a container for slots, and a file or subdirectory a slot. We know that the number of free slots is limited. What characterizes a slot? Its name. If we could limit the number of possible names, we would automatically limit the number of slots. Our limit is 32000, which means that by limiting slot names to one single glyph, we could restrict the number of slots to 256; allowing two glyphs would already exceed the limit. This would work, but has 2 problems:

1. we do not actually have the WONDERFULLY PERFECT FS©
2. restricting slots to only 256 entries is suboptimal, as it leaves well over 31000 slots unused

Problem 1 can be bypassed by transliterating an arbitrary array of bytes into something that can be mapped onto our filesystem, which is POSIX compliant. One way to do that is to encode a given name in hexadecimal form. The advantages are clear:

the hexadecimal numeral system is well understood and widespread (http://en.wikipedia.org/wiki/Hexadecimal)
each symbol (glyph) used in the hexadecimal system is acceptable in a POSIX filesystem (http://www.opengroup.org/onlinepubs/009695399/toc.htm)
using filenames with just 16 possible glyphs is probably highly portable
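
As a quick illustration of that transliteration, the following sketch hex-encodes a byte string that would be awkward as a filename. It assumes Python 3 and uses the standard binascii module; the sample value raw_name is made up:

```python
import binascii
import string

# Made-up sample: bytes that are awkward or illegal in POSIX filenames.
raw_name = b"spam/eggs\x00\xff"

hexname = binascii.hexlify(raw_name).decode("ascii")
print(hexname)  # 7370616d2f6567677300ff

# Every glyph is one of the 16 hexadecimal digits ...
assert set(hexname) <= set(string.hexdigits.lower())
# ... and the original bytes can be recovered from the encoded name.
assert binascii.unhexlify(hexname) == raw_name
```

The price is that the encoded name is twice as long as the original.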

Problem 2 can be mitigated by using 3 hexadecimal glyphs: 16^3 = 4096 slots, which is not a perfect match, but a lot better than only 256 slots.
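
The choice of 3 glyphs can be checked with a few lines. This is only a sketch; max_glyphs is a hypothetical helper, and 32000 is the limit assumed in this post:

```python
def max_glyphs(symbols, limit):
    """Largest name length whose number of possible names stays within `limit`."""
    glyphs = 0
    while symbols ** (glyphs + 1) <= limit:
        glyphs += 1
    return glyphs


# 16 hexadecimal symbols: 3 glyphs give 4096 names, 4 glyphs would give 65536.
print(max_glyphs(16, 32000), 16 ** max_glyphs(16, 32000))
# 256 raw byte values: only a single glyph (256 names) stays within the limit.
print(max_glyphs(256, 32000), 256 ** max_glyphs(256, 32000))
```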

Let's re-iterate our constraints and re-formulate:

each unique name is converted to its hexadecimal representation
generate slot names based on hexadecimal values
generate slot names that have a maximum of 3 glyphs

Implementing the solution

The sky is now clearing. We can actually start to write code. I use Python, so here we go:

def generateSlotSequenceFromName(name):
   # hex-encode the name (Python 2 "hex" codec), then cut it into 3-glyph slots
   hexname = name.encode("hex")
   slotsequence = []
   for i in range(0, len(hexname), 3):
      slotsequence.append(hexname[i:i+3])
   return slotsequence

slotsequence = generateSlotSequenceFromName("An example name")

This can be shortened by using list comprehensions:

def generateSlotSequenceFromName(name):
   hexname = name.encode("hex")
   return [hexname[i:i+3] for i in xrange(0, len(hexname), 3)]
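
The "hex" codec and xrange used above exist only in Python 2. For readers on Python 3, a roughly equivalent sketch could use binascii.hexlify instead. The function name generate_slot_sequence, the width parameter and the choice of UTF-8 for text input are assumptions made for this example:

```python
import binascii


def generate_slot_sequence(name, width=3):
    """Split the hex form of `name` into chunks of at most `width` glyphs."""
    if isinstance(name, str):
        name = name.encode("utf-8")  # assumption: text names are encoded as UTF-8
    hexname = binascii.hexlify(name).decode("ascii")
    return [hexname[i:i + width] for i in range(0, len(hexname), width)]


slotsequence = generate_slot_sequence("An example name")
print(slotsequence)
# ['416', 'e20', '657', '861', '6d7', '06c', '652', '06e', '616', 'd65']
```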

The code needed to save a file is as simple as:

import os
path = os.path.join(*slotsequence)  # join() wants the slots as separate arguments
if not os.path.exists(os.path.dirname(path)): os.makedirs(os.path.dirname(path))
open(path, 'wb').write("THIS IS OUR PIECE OF INFORMATION WE WANT TO SAVE")

To open an existing file, one could write:

data = open(os.path.join(*slotsequence), 'rb').read()
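
Put together, saving and loading could look like the following Python 3 sketch. The helper names slot_path, save and load are hypothetical, and a temporary directory stands in for the real storage root:

```python
import binascii
import os
import tempfile


def slot_path(root, name, width=3):
    """Map `name` (bytes) to a nested path below `root`, `width` hex glyphs per level."""
    hexname = binascii.hexlify(name).decode("ascii")
    slots = [hexname[i:i + width] for i in range(0, len(hexname), width)]
    return os.path.join(root, *slots)


def save(root, name, data):
    path = slot_path(root, name)
    # Create the intermediate slot directories if they do not exist yet.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)


def load(root, name):
    with open(slot_path(root, name), "rb") as handle:
        return handle.read()


root = tempfile.mkdtemp()  # stand-in for the real storage directory
save(root, b"An example name", b"THIS IS OUR PIECE OF INFORMATION WE WANT TO SAVE")
print(load(root, b"An example name"))
```

Like the one-liners above, this sketch ignores one corner case: if the slot sequence of one name is a prefix of the slot sequence of another, the file for the shorter name sits exactly where the longer name needs a directory.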

Admittedly, the last lines are a bit thrown in and won't appeal to most programmers' eyes, but hey, it works ;-)

Googling

I drafted the above code over a cup of coffee after getting up in the morning. Before I left home and went to work, I thought I'd google this kind of approach to see if there were any nice alternative solutions or any nifty open-source libraries that do this. This is a habit of mine, as I have often found myself doing things in an awkward way, and using online publications to learn and verify your own work is as good as asking your mentor for feedback (if you have one). I couldn't find much apart from lots of filesystem-related papers and research. This wasn't what I was looking for. After continuing my searches, I suddenly found this. To be sure this wasn't some sort of fake (haha, how could it be?), I looked up the same patent in the public USPTO full-text and image archive and found US patent 7,412,449 (for your convenience, I organized a PDF file for US patent 7,412,449 with the text created by an OCR program).

Patent infringement

As I am not a lawyer, my first reaction upon reading the patent document was "wow, cool, this is the confirmation that my approach must be worth something". After a couple of seconds, though, I realized that this patent document probably also means that the code I just wrote may not be used without permission from the patent holder. But luckily, because I live outside the U.S., the patent in question has no legal effect on me. Thinking like that made me feel slightly better, but knowing that nowadays many patents are cross-published in many countries, I wasn't so sure anymore. Especially since the assignee of the patent is SAP, an internationally active company, and the inventor is located in Germany (where I live), I became worried again. So I tried to search other online databases for publications in other countries, especially in Germany, referring to the same "invention". This search didn't yield any results, and although I tried hard for a moment, I quickly realized that I could not actually be sure of finding a potential patent for the same invention in Germany without the help of someone who knows their way through the patent publication jungle.

Consequences

Because I want to use the above code (which I created) at work, I will now probably need to involve the company legal department. Also, as a software developer, I feel strongly attacked in my right to be creative and solution-oriented. When I drafted the implementation, I was also considering creating a small library and publishing it as open source, probably under a license like the LGPL or the MIT License, as there wasn't any library doing this. Obviously, such a library would have to provide a nice API and all the tidbits of modern programming, not just the 2 lines of code creating a hash slot sequence from a name. But I am not sure I can do this easily anymore, as uploading such a library to a community site could create legal problems for that site if it is located in the U.S., which I obviously want to avoid.

Essentially, I am now forced to stop working on what I am actually supposed to do and either:

find a different approach, bypassing the patent
ignore potential legal problems and continue
switch my job to something non-creative

Conclusion

By reformulating a problem in just 2-3 steps, a problem description can become a self-evident solution description. Trivial. I consider the above stuff trivial, because no new concept is being created and there is no real invention, just the mere re-use of the laws of logic, mathematics and some lesser generic constraints from the computer world.

I am devastated by the fact that a minimal piece of code like

thevar = "An example name"
[thevar.encode("hex")[i:i+3] for i in xrange(0, len(thevar.encode("hex")), 3)]

created during my first cup of coffee to start the day potentially has legal problems attached to it.

I am not convinced that this is something worth patenting. I think it is non-inventive; it is not something I had to put a lot of research into. Maybe it is creative in the sense that every act of writing is creative.

Actually, I have no real conclusion, except that I have been reaffirmed in my position on software patents. I might add that I don't think patents are bad in general, but that in a world where every online transaction is governed by pieces of code like this, potentially every business that runs online runs into patent problems. That is obviously unacceptable, because it hinders the development of a better marketplace.

It is time to rethink how patents apply to software and how they can both help create incentives for new technology and protect investments. Creating publicly available peer review mechanisms for patent applications would be one improvement to the existing system; another could be to allow patent holders to legally specify whether and how they intend to enforce their patent. Many companies apply for patents in order to create a legal portfolio that will only be used in case of legal disputes with other parties. Also, it might be wise to reconsider the time spans involved in the validity periods of patents in the software area.

Remarks

The Obama administration has two interesting points in their technology agenda:
Protect American Intellectual Property at Home: Update and reform our copyright and patent systems to promote civic discourse, innovation, and investment while ensuring that intellectual property owners are fairly treated.
Reform the Patent System: Ensure that our patent laws protect legitimate rights while not stifling innovation and collaboration. Give the Patent and Trademark Office (PTO) the resources to improve patent quality and open up the patent process to citizen review to help foster an environment that encourages innovation. Reduce uncertainty and wasteful litigation that is currently a significant drag on innovation.
There are active projects that provide public peer reviewing of patent applications like http://www.peertopatent.org/.
I actively changed the naming in my code to use words like "hash slots" after reading the patent in question to focus on similarity.
The content divulged in this article (unless otherwise noted) is licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 license. This obviously excludes the USPTO patent application filed under the application number 10/444,509 and the publication number 7,412,449.