Astronomical Catalogs Query Pages

Why?


UCO/Lick acquired a Sybase license over two years ago specifically for scientific applications. The most obvious application of a database server to astronomical research is in the maintenance of astronomical catalogs. Catalogs can be used to select objects of interest for observing programs, or to analyze the existing data. They are a valuable, standard resource for the astronomy community. Many researchers keep private copies of large astronomical catalogs and then write throwaway, single-use applications using awk/grep, FORTRAN, or C to extract data of interest from them. Much disk space and effort are "wasted" on this process, each person re-inventing the wheel each time a catalog query is required.

Maintaining popular catalogs in a central database server, with an easy front-end for general use, would both reduce the duplication of effort and provide faster, more powerful searching algorithms than the average person has time or inclination to write. Moreover, such a system if designed with sufficient generality could be expanded to include as many catalogs as we have disk space to house, i.e. the initial investment in database server and labour would serve to manage an ever-growing body of data.

The first catalogs were copied from the CADC CDROM, and were acquired and installed in the summer of 1994. A set of tools was written at that time to parse FITS header information supplied with CADC catalogs and automatically generate SQL code and documentation from the header records. There were only two GUI interfaces to the sybase server available to us at that time: the 'wisql' tool, which was rich in features but somewhat cumbersome for the casual user, and the X11 'forms' interface written at UCO/Lick for the administrative database system -- basically a GUI data-entry forms package. These interfaces proved too complex, in practice, for general use; while a few grad students learned to use wisql to access the original star catalogs, most faculty and grads did not feel the benefits of the utility were worth the learning curve. Those who saw a demo of the forms interface felt likewise.

With the advent of the UCO/Lick WWW server and success in some minor Sybase/HTML applications, I became convinced that the Web browser (Mosaic, Netscape, etc) was the correct interface for any data library for general use. Most people were comfortable with the Web browsers and did not feel that they had to "learn yet another tool" to get access to information if it could be offered via the Web. I therefore merged the essential SQL-generating logic from the forms application with new HTML-generating code, and came up with a toolset capable of generating HTML forms for query against any Sybase table. No hand-coding is required to generate the user query pages. This makes the offering of a large catalog library via WWW economically feasible, and highly maintainable. Table nomenclature, datatyping, and indexing changes can be automatically reflected with a single regeneration of the query pages.

The query page allows the user to specify field and record selection without knowning any SQL. The results are returned as a preformatted text block in an HTML page, so that the columns of text and numbers are easily imported into other utilities (plot programs or the user's own code). The more advanced user has the option to write his/her own record selection expression, and there is support for simple statistical functions. There is support for other HTML references in the sybase meta-tables which accompany the basic data: images, MPEGS, or any other legitimate href object can be attached to the base table records. I hope that this basic library will expand over time into a powerful data publishing and research tool for use at UCO/Lick.


How?

The catalogs consist of fixed-length records containing fields of known meaning and datatype. This maps directly to the database concept of a "table". The raw (usually ascii card image) data of the catalog are loaded into a Sybase table, which is then properly protected, keyed, indexed, etc.

A table of metadata called the "data dictionary" is maintained in Sybase, which documents for each table the meaning of each field in the table, with clarifying comments. This dictionary information is generated automatically or hand-typed for each catalog.

A utility is then run which derives information from the Sybase server automatically, given the table name, and generates the query page, browser page, and data dictionary page for the table. Cross-links between these pages are included as the text of the pages is generated. The new html pages are placed in a target directory requested by the user of the utility. The user must then hand-edit the master index page for this web, to add the latest catalog.

If the user wants to add more elaborate features such as hrefs in the sybase data, 2 more metatables are used to store (a) information about correct http (URL) paths for embedded hrefs, and (b) if necessary, individual hrefs themselves. These metatables are automatically read at query time and anchors are generated as part of the results page, if anchors are needed.

Two CGI scripts, a selector and a browser script, are installed on the HTTP server host to execute the end-user's queries. These scripts accept data from the query or browser form respectively, and use it to generate (a) a SQL query that returns some set of data from the server, and (b) an HTML page in which the returned data are embedded. This HTML code, written to stdout, is displayed to the user as the result of his/her query.

All the code, except for some FORTRAN data-reformatting code, is Tcl(X) with the sybtcl extension. The Sybase server is a stock Sybase dataserver v4.9. No user-defined datatypes or other advanced features are used. Primary keying is on a generated field (for guaranteed uniqueness), and indexing is on fields which seem like likely query targets. The httpd in use here is the NCSA version for SunOS.

webmaster@ucolick.org
De Clarke
UCO/Lick Observatory
University of California
Santa Cruz, CA 95064
Tel: +1 408 459 2630
Fax: +1 408 454 9863