Tagging Biology
Posted: January 16, 2007
(Disclaimer: I don’t remember if I’ve seen what I’m writing about elsewhere, although there’s a strong chance I have, so I apologize if others have said the same thing before, and I’m failing to acknowledge that.)
A few months back I wrote about Clay Shirky’s article “Ontology is Overrated”. His main point is that sometimes it’s just not appropriate to try to organize information in a hierarchical category structure (i.e. an ontology).
Perhaps biology is one of those cases. Species classification is one of the most famous and well-known ontologies in existence. We all learn about it in school. At the top you have your Kingdoms, then Phyla, Classes, Orders, Families, Genera, and finally Species.
To the best of my knowledge, biology today has moved away from thinking about this sort of thing and has focused far more on goings-on at the cellular level. However, organizing and keeping track of the different species is still critically important, since our understanding of genetics is inextricably tied to the study of the end product (the phenotype – i.e. the animal/plant/whatever).
The problem with this classification of species is everything Shirky mentions in his article: the classification system is artificial, and sometimes species don’t fit nicely into the elaborate tree humanity has constructed. He gives the example of deciding whether “Books” belong under “Art” or “Entertainment” – an artificial question. Books are books – they don’t intrinsically fit under either category. One book may be art, another may be entertainment, another may be a bit of both, and yet another may be neither (a textbook, for example). I don’t have any specific biological examples, unfortunately, but it is certainly reasonable to expect that once you get down to the nitty-gritty details, and are classifying based on subtleties in bone structure, you’re going to run into species that belong in more than one spot in the classification tree, or in none.
In my opinion, a tagging approach would be a much more effective way to organize the different species. The same properties that are used to determine where a species “fits” in the classification tree would be used as tags. For example, some tags might be “warm-blooded”, “invertebrate”, “eukaryotic”, and “egg-laying”. A scientist analyzing a newly discovered species would simply list all the attributes she notices and associate them with the species. A species database could then list all the known species with similar tags, and knowledge of those species could be applied to the new one. This sort of system would highlight, immediately, what species have in common with each other and what they don’t. The task of figuring out where the new species belongs in some convoluted system disappears; the scientist merely documents attributes in a systematic fashion, which is something she’s doing anyway.
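A minimal sketch of the species database described above, assuming each species is stored as a plain set of tag strings. The names `SPECIES_TAGS`, `build_index`, and `compare` are hypothetical, and the tags are toy examples rather than a real taxonomy. An inverted index (tag → species) answers “which species share this attribute?”, and plain set arithmetic answers “what do these two have in common, and what don’t they?”

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Hypothetical toy database: species name -> set of descriptive tags (illustrative only).
SPECIES_TAGS: Dict[str, Set[str]] = {
    "housecat": {"eukaryotic", "vertebrate", "warm-blooded", "fur", "live-birth"},
    "tiger": {"eukaryotic", "vertebrate", "warm-blooded", "fur", "live-birth", "roars"},
    "chicken": {"eukaryotic", "vertebrate", "warm-blooded", "feathers", "egg-laying"},
}


def build_index(db: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the database so each tag maps to the species that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for species, tags in db.items():
        for tag in tags:
            index[tag].add(species)
    return dict(index)


def compare(a: str, b: str, db: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (tags both share, tags only a has, tags only b has)."""
    tags_a, tags_b = db[a], db[b]
    return tags_a & tags_b, tags_a - tags_b, tags_b - tags_a


if __name__ == "__main__":
    index = build_index(SPECIES_TAGS)
    print(sorted(index["warm-blooded"]))  # all three species
    shared, only_cat, only_tiger = compare("housecat", "tiger", SPECIES_TAGS)
    print(sorted(shared), only_cat, only_tiger)
```

Nothing about a species’ position has to be decided up front: similarity falls out of whichever tags happen to overlap.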
Of course, all this is going on anyway – in the scientist’s head. The scientist obviously knows to compare a newly discovered bacterium with other bacteria (as opposed to reptiles). She also knows to compare it with bacteria that share many of its attributes, and not with ones that have less in common. But under a classification system, the scientist is forced to first figure out where in the species tree the new bacterium belongs (which may itself be a matter of debate), and only then compare it with its assigned “relatives”. Why bother with that first step? Just enter what you know about the bacterium as a list of tags, and the software will spit back matches. The most tedious and pointless step in the process has been removed, and real work can proceed.
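The “enter tags, get matches” step can be sketched as a similarity ranking. This assumes Jaccard similarity (shared tags divided by all distinct tags) as the measure of closeness; the post doesn’t name a measure, and `find_matches` and the toy tag sets are hypothetical. The query describes an animal with fur that lays eggs, roughly a platypus.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy database: species name -> set of descriptive tags (illustrative only).
SPECIES_TAGS: Dict[str, Set[str]] = {
    "housecat": {"eukaryotic", "vertebrate", "warm-blooded", "fur", "live-birth"},
    "tiger": {"eukaryotic", "vertebrate", "warm-blooded", "fur", "live-birth", "roars"},
    "chicken": {"eukaryotic", "vertebrate", "warm-blooded", "feathers", "egg-laying"},
    "green iguana": {"eukaryotic", "vertebrate", "cold-blooded", "scales", "egg-laying"},
    "E. coli": {"prokaryotic", "single-celled"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared tags divided by total distinct tags (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_matches(new_tags: Set[str], db: Dict[str, Set[str]], top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank known species by tag overlap with the new one, best first (ties keep db order)."""
    scored = [(name, jaccard(new_tags, tags)) for name, tags in db.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # sort is stable, even with reverse=True
    return scored[:top_n]


if __name__ == "__main__":
    observed = {"eukaryotic", "vertebrate", "warm-blooded", "fur", "egg-laying"}
    for name, score in find_matches(observed, SPECIES_TAGS):
        print(f"{name}: {score:.2f}")
```

With these toy tags the housecat and the chicken tie for first place (4 shared tags out of 6 distinct ones each), which is arguably the honest answer: the new animal shares as much with one as with the other, and no one had to rule on which branch it belongs to.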
You may be concerned that we are losing something by disposing of the highly familiar species tree. We’re not*. The various relationships between species will be preserved in the tags (one way to look at the tree now is to say that the more tags two species have in common, the closer together they are on the tree). And that’s really all you need. The tree adds nothing else except constraints and constructs like “relatives”. With today’s technology, the tree is an inferior way of storing information about species.
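To make “more shared tags means closer on the tree” concrete, here is a sketch that rebuilds a tree-like grouping purely from tags. It swaps in plain average-linkage agglomerative clustering over Jaccard distance (repeatedly merge the two closest groups). That choice is an assumption for illustration and not a substitute for real phylogenetic methods; `build_tree` and the toy tags are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple, Union

# A tree is either a species name (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical toy database (illustrative tags, not real taxonomy data).
SPECIES_TAGS: Dict[str, Set[str]] = {
    "housecat": {"eukaryotic", "vertebrate", "warm-blooded", "fur", "live-birth"},
    "tiger": {"eukaryotic", "vertebrate", "warm-blooded", "fur", "live-birth", "roars"},
    "chicken": {"eukaryotic", "vertebrate", "warm-blooded", "feathers", "egg-laying"},
    "green iguana": {"eukaryotic", "vertebrate", "cold-blooded", "scales", "egg-laying"},
    "E. coli": {"prokaryotic", "single-celled"},
}


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - (shared tags / distinct tags): 0.0 for identical sets, 1.0 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def build_tree(db: Dict[str, Set[str]]) -> Tree:
    """Average-linkage clustering: keep merging the two closest groups until one remains."""
    # Each cluster pairs its subtree with the list of species names it contains.
    clusters: List[Tuple[Tree, List[str]]] = [(name, [name]) for name in db]

    def group_distance(xs: List[str], ys: List[str]) -> float:
        # Mean pairwise Jaccard distance between the members of two groups.
        total = sum(jaccard_distance(db[x], db[y]) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: group_distance(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    print(build_tree(SPECIES_TAGS))
```

With the toy tags, `build_tree` pairs the housecat with the tiger first, then the chicken with the iguana, and leaves E. coli as the outermost branch. The nesting is derived from the tags on demand rather than stored and maintained by hand.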
Last but not least, there is the issue of tag management. The number of subtle differences between species is staggering. Listing all the differences between a housecat and a tiger is quite a job, and those species are relatively close. The complete list of tags in the system might become unwieldy, and tagging is sensitive to error (“warm-blooded” and “warm blooded” may be treated as different tags). So that would be one obstacle. But with auto-complete and similar technologies, it shouldn’t be a big one.
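A minimal sketch of both fixes, assuming a canonical tag form of lowercase words joined by single hyphens (so “Warm blooded”, “warm_blooded”, and “warm-blooded” all collapse into one tag) and a sorted vocabulary searched by prefix for auto-complete. `normalize_tag` and `TagVocabulary` are hypothetical names, not an existing library.

```python
import bisect
import re
from typing import Iterable, List


def normalize_tag(raw: str) -> str:
    """Lowercase, trim, and collapse runs of spaces, underscores, or hyphens into one hyphen."""
    return re.sub(r"[\s_-]+", "-", raw.strip().lower()).strip("-")


class TagVocabulary:
    """Hypothetical controlled vocabulary: canonical tags kept sorted for prefix lookup."""

    def __init__(self, tags: Iterable[str] = ()) -> None:
        self._tags: List[str] = sorted({normalize_tag(t) for t in tags})

    def __len__(self) -> int:
        return len(self._tags)

    def add(self, raw: str) -> str:
        """Insert a tag in canonical form (if it is new) and return that form."""
        tag = normalize_tag(raw)
        i = bisect.bisect_left(self._tags, tag)
        if i == len(self._tags) or self._tags[i] != tag:
            self._tags.insert(i, tag)
        return tag

    def complete(self, prefix: str, limit: int = 5) -> List[str]:
        """Return up to `limit` existing tags that start with the normalized prefix."""
        p = normalize_tag(prefix)
        start = bisect.bisect_left(self._tags, p)  # first tag >= prefix in sorted order
        matches: List[str] = []
        for tag in self._tags[start:]:
            if len(matches) == limit or not tag.startswith(p):
                break
            matches.append(tag)
        return matches


if __name__ == "__main__":
    vocab = TagVocabulary(["warm-blooded", "Warm blooded", "cold_blooded", "egg-laying", "eukaryotic"])
    print(len(vocab))  # 4: the two warm-blooded spellings merged
    print(vocab.complete("e"))  # ['egg-laying', 'eukaryotic']
```

Normalizing on entry catches spelling variants, while auto-complete nudges the scientist toward a tag that already exists instead of coining a near-duplicate.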
* Well, to be fair, terms like “mammals” and “reptiles” come from the names of branches of the tree. We can simply replace each such definition with a set of tags (e.g. “An animal is a reptile if it has all of the following tags: ‘lays eggs’, ‘cold-blooded’, etc.”).
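Defining a group as a required set of tags amounts to a subset test. This sketch assumes the deliberately simplified definitions below (the reptile one follows the footnote’s example; real definitions are messier, and some reptiles don’t lay eggs). `GROUP_DEFINITIONS` and `groups_for` are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical, deliberately simplified definitions: a species belongs to a group
# if it carries every tag the group requires.
GROUP_DEFINITIONS: Dict[str, FrozenSet[str]] = {
    "reptile": frozenset({"lays eggs", "cold-blooded", "scales"}),
    "mammal": frozenset({"warm-blooded", "hair", "produces milk"}),
}


def groups_for(tags: Set[str]) -> Set[str]:
    """Return every group whose required tags are a subset of the species' tags."""
    return {name for name, required in GROUP_DEFINITIONS.items() if required <= tags}


if __name__ == "__main__":
    platypus_like = {"warm-blooded", "hair", "produces milk", "lays eggs"}
    print(groups_for(platypus_like))  # {'mammal'}
```

A platypus-like tag set lands in “mammal” even though it lays eggs, and a species can match several groups or none, which a single position in a tree cannot express.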
p.s. If there are any biology people reading this, I’m curious if I’m being completely off the mark or blindingly obvious and obtuse.