The Physics of Bits

I have been both reading and writing about computers for a long time. One thing that I’ve found lacking is a basic explanation of key “obvious” concepts that are vital to understanding many legal and moral problems in current society and politics. These are not intended to be political statements. They are neither Republican nor Democratic, neither libertarian nor fascist. These statements are, as clearly as I am able to make them, fundamental properties of the digital world.

  1. Copying bits is easy, cheap, and fast.
    1. Storage is cheap and getting cheaper.
    2. We are not even close to the physical limits of information density.
    3. Storage is effectively free for some types of information. I expect that more types of information will fit into this category as time goes on.
  2. Moving bits is easy, cheap, and fast.
    1. The cost of moving bits at a given rate is affected by three factors
      1. Distance: The distance that the bits must be moved
      2. Technology: The available means of moving bits over that distance
      3. Politics: The factors in society that promote or discourage adoption of newer/faster technologies
    2. The cost of moving any normal dataset (<4.5 billion bytes) within a building has reached the point of being insignificant.
    3. The cost of distributing a normal dataset to an effectively worldwide audience has dropped by a factor of 10 over the past five years.
    4. The dominant factor in current cost calculations is politics. The technological obstacles to greatly increasing data transfer rates over continental (2000-5000 mile) distances have been adequately dealt with.
  3. Any set of data which can be reduced to a series of numbers can be stored, manipulated, copied and distributed digitally.
    1. Any practical analog format can be converted to digital using well understood techniques.
    2. Once converted, the resulting data is effectively indistinguishable from data which was originally created using digital techniques.
  4. The ease of copying and distributing information digitally is the primary value of digital information as opposed to other systems of information storage.
    1. The low relative cost of making perfect digital copies is their key advantage over all other distribution methods
    2. The ability to move information quickly and reliably over computer networks reduces the cost of distribution and allows information to be delivered to a worldwide audience very quickly at low relative cost
    3. Any measure which is aimed at preventing a given piece of digital information from being copied will invariably increase the cost of copying and distribution to the precise extent that the measure is successful.
    4. There is no practical difference, other than speed of copying, between making a local copy and transmitting information over a network
  5. The fundamental tools for copying, distributing and manipulating digital information are content neutral.
    1. Bits do not care whether they represent “The Story of O” or “The Collected Writings of St. Thomas Aquinas”.
    2. Computer networks do not care what information they carry.
    3. The knowledge encapsulated in digital information is not only in the bits that are distributed. It comes about as the end result of those bits being manipulated by a series of computer programs and then presented to the user.
    4. There are well understood techniques in computer science which allow any type of data which can be represented digitally to be restructured in a way which makes it appear to be any other type of data.
      1. This process can be made completely reversible
      2. This process can be made sufficiently robust as to defeat any particular automated process for determining whether the information being transmitted is actually what it claims to be.
      3. Changing the detection process will, at most, require a minor change to the program used to reformat the information
      4. The history of computer science is filled with attempts to break this property of digital information
      5. None of these attempts have been successful to date.
  6. Computers are the first consumers of digital information. People are the second.
    1. In order for a person to work with information, the computer must understand how that data is structured.
    2. Only after the computer has interpreted the original data can a human being work with that data – and only to the extent that the computer has been programmed to understand and allow for that type of manipulation.
  7. The value in information (defined sets of bits) is related to three major factors
    1. The content of the information: Basically, what the creator of that data was trying to say, and how they were saying it.
    2. The structure of the data: How the information is presented to the computer
      1. The more widely understood the structure of the data is, the more tools will exist for viewing, manipulating, and creating data in that format, and the more flexible and capable those tools will be.
      2. The less widely understood the structure of the data is, the fewer tools will exist for viewing, manipulating and creating data in that format, and the less flexible and capable those tools will be.
    3. Metadata: What we know about the content of the data
      1. All information collections have metadata – things like the filename, the time of creation, the person or group that created that data, the location that the data can be retrieved from, the format of the data and other obvious pieces of information.
      2. When the metadata is available in a way that computers can be taught to recognize easily and automatically, it increases the usability of the data it is related to. This happens for the same reason a card catalog increases the value of a library – by helping you find what you are looking for.
      3. Lack of metadata can sometimes make information useless – just as a library with books in random places on the shelves and no card catalog would be useless. It’s not that the information you are looking for isn’t there, it’s that the information cannot reasonably be found.
  8. Information is not knowledge. Knowledge is not understanding. Understanding is not wisdom.
    1. Information has potential knowledge in the same way that a weight held above the floor has potential energy.
      1. Information can become knowledge if it is studied by a human being
      2. Such study is often the primary value of that information
      3. Such study does not destroy the information studied
    2. Knowledge is the ability to recall facts (information). It can be, and often is, important in itself. It is not the same as understanding.
      1. Knowledge is about being able to recall facts. Understanding is about the ability to use those facts.
      2. Knowing, in theory, how an internal combustion engine works does not imply that someone understands those facts well enough to rebuild a car’s engine. Similar examples exist in every field of human endeavor
      3. Consequently, the growth in the availability of knowledge of a topic does not imply an immediate growth in the availability of understanding of that topic.
    3. Understanding is the ability to use facts. Wisdom is the ability to interpret those facts in a greater context than the immediate task at hand.
      1. Understanding is about using facts within their context. It is about, for example, using facts about the theory of internal combustion engines to fix cars. Wisdom is (in part) about how internal combustion engines, cars, global warming, automobile accidents and city planning are related.
      2. Understanding is difficult to achieve, and slow. Wisdom is even harder and slower to achieve. Growth in understanding will, eventually, result in societal wisdom about a subject – but this is a slow and uncertain process.
    4. Any attempt to distort or hide information about a subject area will result in all later stages of this process coming more slowly, and less certainly.
      1. Suppression or distortion of information makes understanding of an area difficult.
      2. Lack of understanding of an area makes wisdom about that area effectively impossible to achieve.
      3. These trends tend to be followed regardless of the subject in question
      4. This implies that the suppression of information will invariably have effects far outside the area of interest.
      5. This also implies that this will have significant social costs which cannot be determined in advance.
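
To make point 5.4 concrete, here is a minimal Python sketch of restructuring arbitrary bytes so that they look like a different type of data (here, a run of short English words) and then recovering the original exactly. The word list and the function names `disguise` and `recover` are my own illustrative assumptions, not a standard technique or library; any 16 distinct tokens would serve equally well.

```python
# Hypothetical 16-word vocabulary: one word per 4-bit value (0-15).
WORDS = ["the", "cat", "sat", "on", "a", "mat", "and", "saw",
         "one", "dog", "run", "by", "in", "red", "hat", "now"]
INDEX = {word: value for value, word in enumerate(WORDS)}


def disguise(data: bytes) -> str:
    """Rewrite arbitrary bytes as a space-separated run of words."""
    out = []
    for byte in data:
        out.append(WORDS[byte >> 4])    # high four bits
        out.append(WORDS[byte & 0x0F])  # low four bits
    return " ".join(out)


def recover(text: str) -> bytes:
    """Reverse disguise(): turn the words back into the original bytes."""
    nibbles = [INDEX[word] for word in text.split()]
    pairs = zip(nibbles[0::2], nibbles[1::2])
    return bytes((high << 4) | low for high, low in pairs)
```

Because every byte maps to exactly two words and back, the round trip loses nothing (point 5.4.1). Swapping in a different vocabulary is a one-line change, which is the sense in which point 5.4.3 says that a changed detection process requires only a minor change to the reformatting program.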

Notes

In order to keep this post from resembling “War and Peace”, some of the phrases I use have very specific meanings. When I use the phrase “have been dealt with”, I mean that there are currently multiple vendors selling the required technology on the open market. When I use the phrase “well understood techniques” I am simply stating that the process for doing so has been extensively documented in the engineering literature. Normally, both of these phrases also imply that a reasonably diligent web search using Google, Yahoo or another similar search engine will find the information in, at most, a few hours. In addition, when I write “no practical difference” what I am really saying is that any real difference is so effectively papered over by the availability of operating system functions and libraries that any remaining difference might as well not exist.
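
As a rough illustration of what “no practical difference” means in point 4.4, the Python sketch below uses one copying routine for both a local file copy and a copy sent across a socket. The function names are hypothetical, and `socket.socketpair()` stands in for a real network connection so that the example is self-contained; a real transfer would connect to a remote host instead.

```python
import shutil
import socket
import threading


def copy_stream(src, dst):
    """Copy bytes from one file-like object to another."""
    shutil.copyfileobj(src, dst)


def local_copy(src_path, dst_path):
    """Make a local copy: file to file."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        copy_stream(src, dst)


def network_copy(src_path, dst_path):
    """Send a file across a socket and write what arrives at the far end."""
    sender, receiver = socket.socketpair()  # assumption: stand-in for a network link

    def send():
        # Closing the sender signals end-of-data to the receiver.
        with sender, sender.makefile("wb") as wire, open(src_path, "rb") as src:
            copy_stream(src, wire)

    worker = threading.Thread(target=send)
    worker.start()
    with receiver, receiver.makefile("rb") as wire, open(dst_path, "wb") as dst:
        copy_stream(wire, dst)
    worker.join()
```

The same `copy_stream` call does the work in both cases. Whether the other end is a disk or a wire is hidden behind the operating system and library layers, which is all the phrase is meant to claim.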

The statements in section 8 (“Information is not knowledge”) are not meant to be anything other than tautologies. This isn’t meant to be a stand-alone work, but rather a foundation. The idea is to agree on a common vocabulary so that there is a good basis for later discussion.

If you think I made an error, please let me know. I am extremely willing to discuss the possibility that something here might be wrong. If you want me to make a change, however, I strongly suggest that you provide evidence that my statement is in error. This is especially important if you want to dispute the accuracy of statements in sections 4 or 8.
