Biozon is a web server at Cornell that “integrates roughly 2 million protein sequences, 42 million DNA or RNA sequences, 32’000 protein structures, 150’000 interactions and more from sources such as GenBank, UniProt, Protein Data Bank (PDB) and BIND.”

The web interface is focussed on searching and displaying different kinds of data. There are three search modes: a quick search, a type-specific field-based search and a sequence search. The quick search looks through all fields and groups the results by data type. The field-based search lets you connect different data types through a multi-step query building process. It’s a bit cumbersome, but it works. Note that most fields are not very fine-grained, e.g. “comments” or “features”.

Queries are POSTed and therefore can’t be bookmarked. If you want to save queries you need to sign up for an account. I couldn’t be bothered…
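
To illustrate why this matters: a query sent with GET lives in the URL itself, so saving the URL saves the query. The sketch below is only an illustration of that idea; the endpoint and the parameter names are made up and have nothing to do with Biozon’s actual interface.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and parameter names, used only for illustration.
BASE_URL = "https://example.org/search"


def bookmarkable_url(params: dict) -> str:
    """Encode the query in the URL itself, as a GET request would."""
    # Sorting the parameters gives the same URL for the same query every time.
    return f"{BASE_URL}?{urlencode(sorted(params.items()))}"


def recover_query(url: str) -> dict:
    """Read the query back out of a saved (bookmarked) URL."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


query = {"type": "protein", "text": "kinase"}
url = bookmarkable_url(query)
print(url)
```

With POST the same parameters travel in the request body, so the address bar shows only the bare endpoint and there is nothing to bookmark.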

In the result view only the internal identifier and the DE (description) line are shown for each match. There are no options for sorting. They claim to have a PageRank-like ranking system, but the ranking procedure seems to be quite slow and is not performed by default. I couldn’t find any description of the ranking algorithm; perhaps there will be something in their upcoming paper. Results are paged, but the paging code seems to be a bit buggy, and you can’t see how many results there are.
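
Since the ranking algorithm isn’t documented, here is a rough sketch of what plain textbook PageRank looks like when applied to a small graph of linked objects. This is the generic algorithm, not Biozon’s method; the node names, the damping factor of 0.85 and the iteration count are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of outgoing links."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: a DNA record links to two proteins, both link to one structure.
graph = {"D1": ["P1", "P2"], "P1": ["S1"], "P2": ["S1"], "S1": []}
ranks = pagerank(graph)
best = max(ranks, key=ranks.get)
```

Even on a toy graph this takes many passes over every link, which may be part of why a ranking like this is expensive to run on each query over millions of objects.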

The protein view is quite simple. It consists more or less of some names, identifiers, links to sources and related objects, a sequence, and a graphical display of features and domains. The graphical display shows information only on mouseover, which is a bit annoying.

The authors of the web site have taken care to ensure that all URLs are simple. Unfortunately they are in mixed case, and the web server is case sensitive, which is bound to lead to some problems. It is also interesting to note that they use our mnemonics rather than our accession numbers throughout the application.
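
One common way to soften the case problem is to look up incoming paths case-insensitively and map them back to the canonical spelling. The sketch below shows the idea only; the paths are invented for the example and are not Biozon’s real URLs.

```python
# Hypothetical mixed-case paths, used only for illustration.
CANONICAL_PATHS = ["/Search/Quick", "/View/Protein"]

# Map the lowercased form of each path to its canonical spelling.
_LOOKUP = {path.lower(): path for path in CANONICAL_PATHS}


def canonicalize(path):
    """Return the canonical spelling of a path, or None if it is unknown."""
    return _LOOKUP.get(path.lower())


print(canonicalize("/search/quick"))
```

A server could then redirect any mis-cased request to the canonical URL instead of answering with a “not found” page.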

Finally, not all data is completely up to date: our data, for example, was last updated on May 5, 2005.