This is a description of how we built our virtual herbarium. I'll try to explain why we did things the way we did, how we came to certain decisions, and then describe the code we wrote that drives the virtual herbarium.
Websites consist of static and dynamic pages. Static pages look the same to every visitor. They are simply HTML-formatted documents visitors download and view in their web browser (the browser handles the download and display). Once the page is created it is static until it is again manually edited.
Dynamic pages are very different. Each dynamic page is generated on the web server when it is requested. This is necessary because dynamic pages allow each user to select different things to view. For example, a virtual herbarium might have a keyword search form, where a visitor can enter, say, the common name of a plant hoping that you have a picture of that plant. Since you can't anticipate what search term the visitor will enter, you can't pre-create that page (even if you wanted to have a static page created for every search term that might be used you'd never have enough time to create them all; if you want to do it that way stop reading now and get to work).
Dynamic pages are created by a server-side script. These are simple programs built into the web page (along with the HTML formatting) that run on the server. They take input from the visitor (in our example, the keyword search term) and search a database that contains your specimen records. When the proper records are returned, the script formats the data in HTML and returns it to the visitor. Writing the script is not easy, as you'll need to know how your database information is organized, how to structure the logic of the searches, and how to return the information to the visitor in a form that both looks good and is functional.
Here we describe the options and choices we made in setting up a database-driven virtual herbarium website in hopes that other herbaria directors can benefit from our experience.
In the planning stage you'll need to consider several things, and hope you get it right, because it's difficult to change partway through the process:
For those of you just starting to examine the possibility of your own virtual herbarium, consider the options:
Let's assume now that you are going to create your own website. Those who will buy a turnkey solution need to speak with the vendors and make comparative lists of pricing, functionality (both in the herbarium and online), ongoing costs, ease of branding and customization, etc., and compare them with the free solutions linked above. Other examples and a more complete list (some no longer in development) are listed here.
You need to know just what you want your visitors to first see and do when they enter your website. Do they see a keyword search box, a list of families or genera, or do they see a complex search form listing all searchable fields and how they are associated with each other? We've seen sites designed all these ways. Some have a few example search links to familiarize visitors with what they can do with searches. Some sites seem designed by database programmers, and require a thorough understanding of how to combine AND and OR searches to get just the data needed. Some sites are designed to provide a very specific fit to the data on hand, others are programmed generically to allow additional information to be added and searched later.
Once the initial search is entered, what do you want your visitor to see? All the information you have on each specimen, or just a summary with links to the details? Do you want them to see a thumbnail of any images you have, and if you have multiple images for a specimen, which one do you want them to see? When visitors move to a details or image page, do you want to have links there to get back to the original search, or should they re-enter the search term to view the list again?
How much of your data do you want to give your visitor? All you have, in its best, research-quality form, or do you want your visitors to just see what you have, then contact you for the research-quality information?
There is perhaps no better way to get this information than by looking at online herbaria and making notes on what you liked and didn't like. Your web programmer needs these notes to do the job right. If you don't have these notes, and if your programmer isn't a botanist, you'll likely get a website that appeals primarily to web programmers, not your target audience.
The most important decision to make is how to get your sheet label data over to the web server. In our case, we were already using the University of California Davis Herbarium Management code (http://herbarium.ucdavis.edu/database.html), so we stuck with it. This system was adopted by the UVU Herbarium staff some years back, and all the label data (some 11,000 records) are already in its database. It uses a Microsoft Access front-end/back-end configuration, with the front-end .mdb file holding the (static) data entry and printing forms, and the back-end .mdb files holding all the (dynamic) label and customer data. Having all label data in a single .mdb file suited us, as we can transfer the current version of the data to the server whenever there have been significant changes.
If your data is already in a database, good. That's the first and most important step. If your database is ready for the web, then your herbarium staff can continue using it as they are, and you can go directly to getting your website built. But many herbarium databases are not, as they are set up, ready for web use, and must be moved to a database server. Spreadsheets, and herbarium management programs that store data either in a text file or in a proprietary database, must be moved to a database server or a file-based database like Access or FoxPro.
Entering information into a database is a time-consuming, error-prone process. Most herbarium management applications have a data table with label information. As the data is entered, some of the fields will come from look-up tables (family, genera, species, state, county, etc.) which will have the correct spelling and formatting. Other data is unique to the sheet (collector, location, descriptions, etc.) and must be typed in by hand, meaning that some of it will be corrupted on entry. Few herbaria have the budget to afford double-entry and cross-checking of the label data to ensure accuracy. But careful planning of the way data is entered can make the process of catching errors a little easier. For example, we can do a search on county and plot the geographic locations on a map. It's very easy to see accessions that are located outside the county and check those to see what went wrong (Index kentuckiensis can do this also). A very good programmer can use the coordinates of the county lines to do the same check. Alphabetized lists of all unique data in a particular field (for example, the names of collectors) make it easier to spot spelling and alternative-name errors.
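To make the idea concrete, here is a minimal sketch (not our production code) of such a check, written as a small classic ASP page. The file name checkCollectors.asp is hypothetical, and we assume the Jet/Access connection and the [Herbarium Labels] table and Collector field described later in this document:

<%
' checkCollectors.asp (hypothetical): list every distinct collector name in
' alphabetical order so misspellings and alternate forms stand out.
set conn = Server.CreateObject("ADODB.Connection")
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & Server.MapPath("herbarium.mdb")
set rs = conn.Execute("SELECT DISTINCT Collector FROM [Herbarium Labels] ORDER BY Collector;")
do while not rs.EOF
    if not isnull(rs("Collector")) then
        Response.Write Server.HTMLEncode(rs("Collector")) & "<br>" & vbCrLf
    end if
    rs.MoveNext
loop
rs.Close
conn.Close
%>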
The real purpose of good planning of the entire process, from database entry to website maintenance, is to make data entry and handling as easy and painless for your staff as possible. Tired people make mistakes, and botanists have a lot more fun working with plants than with a computer. Data entry, sheet scanning, and website maintenance must be made as straightforward, obvious, and responsive for your staff as possible. Otherwise the work will be done slowly, and with errors. Your staff must be involved in the process of planning how all tasks should be done. Don't let your programmer make these decisions. He won't be doing the work, and he probably doesn't know what it is you do, exactly. But make sure your programmer is versatile, because maybe you aren't sure either, and he'll occasionally have to throw away a week's worth of great programming because your staff discovers a better way of doing it, or you changed your mind. And just so you know, it's okay to throw out programming if it doesn't work; you don't publish a first draft, and neither do good programmers.
Most virtual herbarium packages have a problem with image presentation. It's a natural result of a herbarium holding not images but sheet-mounted dried specimens; herbarium management software just doesn't need images to keep track of things. Clearly the presentation of a sheet on a website requires a high-quality image, preferably one of sufficient resolution that it can be examined as closely as you might examine the real sheet. It can be done in one of two ways:
A. Make a single very high-resolution image of the entire sheet and serve it (or pieces of it) to the visitor.
B. Make a modest-resolution overview image plus pre-made close-up images of selected parts of the sheet.
Method A requires a huge file (a minimum 2 MByte file that consumes 60 MB of computer memory when decompressed). Method B requires some logic in the script to send the proper image to the client, and the fervent hope the client will not want to see detail in any part of the image you haven’t photographed.
Method A is much preferred, but has the significant drawbacks of sending too big a file to the client, and of there being no client-side image-manipulation software common to all clients. One operating system or browser might be able to scroll around a very large image, another might not. And a 2 MB image file is just too big for modem users.
The solution to using method A is to send only part of the image to the client at a time. Server-side image serving is a major preoccupation of the GIS industry, where 20 to 200 MB files are common. The de facto standard for these files is JPEG2000. This is an image compression format much like the very common JPEG used in all website images, but with a few enhancements. JPEG2000 has a higher compression ratio than JPG, and it has the ability to decompress only a portion of the image. Unfortunately the open/free software community has not grasped the JPEG2000 format yet, and the only programs for handling them are commercial. MrSID is the most popular of these programs, and it comes in two parts: the server, which sends a requested part of a large image to the client, and the client-side viewer to scroll and zoom the image sent. They work together, so when you scroll the image the newly-revealed part of the image is requested from the server automatically and appears in a few moments. The other JPEG2000 programs work similarly.
We chose not to use this method of image presentation for two reasons: cost and quality. MrSID charges for every image compressed, and if you compress enough GBytes of files you need to buy more GBytes from the makers. We didn't look into the licensing of any similar programs. The quality of the delivered images was lower than we wanted primarily for one reason: when a digital image is resized it needs to be sharpened. I've learned this from long experience creating landscape images for my photo website. The JPEG2000 programs just don't do any resizing, presumably because they were developed to deliver large maps and aerial photography for further treatment by the client. We chose to make our own server-side image delivery system. The whole thing amounts to the bit of script shown in the server-side scripting section below. We have the viewer.asp script calculate the crop window and image size, then deliver that information to the sheet.asp script, which loads the large image file into the server memory, crops and resizes the image, then sharpens it for delivery to the client browser. We can adjust the sharpening parameter to suit our taste, and deliver exactly the image we want. And with a reasonably fast server (the standard is a dual-processor XEON server running at around 3 GHz) a 35 MByte image can be delivered in 4 seconds. Not an unreasonable time, and about double the delivery time for a JPEG2000 image.
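As a rough illustration of the viewer.asp side (this is a sketch under our own assumptions, not the actual viewer.asp code; the dimensions and zoom level are made-up example values), the crop window is computed from the magnification and the point the visitor wants to examine, and the result is handed to sheet.asp in its query string:

<%
' Illustrative only: choose a crop window around a point of interest at a
' given magnification, then build the sheet.asp request that the script in
' the server-side scripting section will service.
fullW = 5000 : fullH = 8000          ' dimensions of the full scan, in pixels
dispW = 600  : dispH = 800           ' size of the image to send to the browser
zoom = 4                             ' 4x magnification
centerX = 2500 : centerY = 4000      ' point the visitor clicked
fileName = "sheet234.jpg"

cropW = fullW \ zoom                 ' width of the window cut from the full scan
cropH = (cropW * dispH) \ dispW      ' keep the display aspect ratio
x = centerX - cropW \ 2
y = centerY - cropH \ 2
if x < 0 then x = 0
if y < 0 then y = 0
if x + cropW > fullW then x = fullW - cropW
if y + cropH > fullH then y = fullH - cropH

imgURL = "sheet.asp?f=" & Server.URLEncode(fileName) & "&x=" & x & "&y=" & y & _
         "&w=" & cropW & "&h=" & cropH & "&dw=" & dispW & "&dh=" & dispH
Response.Write "<img src=""" & imgURL & """ width=""" & dispW & """ height=""" & dispH & """>"
%>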
Image sharpening is a tricky thing. You must never sharpen an image twice. Sharpening an image means enhancing the contrast around edges by lightening the light side and darkening the dark side. When overdone, or done twice in a row, sharpening will leave a bright halo around every dark object. The effect is not visible in low-contrast images (such as one sees on most herbarium sheets), but very visible where there is text. You can see this in all our sheet images when you zoom in on the label. We've chosen to sharpen the specimen to what we feel is an optimal level, a level that routinely over-sharpens the text.
Our method of image processing does have one major drawback: it lacks scalability. If only one client is visiting the site at a time there is no problem; that one client has all the server hardware resources available to process the image. But what happens if 5 clients are visiting and they all request image pages at the same time? Some will have to wait until a CPU is free. We programmed our image code in a way that will not scale beyond the two CPUs available in the server. If we need to host more traffic we'll need more CPUs. Most of the CPU time is spent decompressing the entire image into memory. This is a poor situation to be in. The JPEG2000 applications scale better. Decompressing only the needed portion of the image into memory requires fewer CPU and memory resources, so multiple clients should be able to click simultaneously without degrading the server response time. We hope that the delay isn't too noticeable to our visitors, and when it is we hope they'll find the better image quality is worth the wait.
We chose to do large, 42-48 MPixel scans, which result in 0.5 to 3 MB .jpg files, and process the images into presentable magnifications and dimensions on the server.
Making a 44 Mpixel image is no easy trick. There are three methods available:
Large-megapixel SLR camera or medium-format digital back
Large-format scanning back
Flatbed scanner
Flatbed scanners, in our tests, produce very good-looking images. Most scanners have optical systems that compensate for about 0.5 inches of subject depth, have very flat and white illuminating lamps, and can easily produce 280 Mpixel images. But they do have the significant drawback of specimen handling damage. The sheet holding the specimen must be placed against the flatbed scanning surface. Damage is easily confirmed by measuring the amount of specimen detritus left behind on the scan bed. Some attempts have been made to invert the scanner over the sheet, but these have, in our reading of anecdotal reports, been largely unsuccessful in eliminating specimen damage when the scanner is lowered over the sheet. One interesting solution may be in the offing with thin, see-through scanners that are meant to be set atop the sample. However, current scanners (at 8.5 x 11 inches) are too small for full-sheet scans.
There are no SLR cameras (up to 1.5 inch sensor) or medium format backs (up to 2.5 inch sensor) capable of the resolution we wanted. The best we could find were several 22 Mpixel cameras, but even those remarkable cameras introduced aliasing problems we didn't care to solve. Aliasing is the appearance of artifacts when the red, green and blue sensors for a particular pixel are located in different positions on the sensor. When a contrasting line crosses the pixel area, it may be that the blue sensor is on the white side of the line while red and green are on the black side, so the pixel will appear blue instead of grey. Camera manufacturers solve this difficulty by adding an anti-aliasing filter to the surface of the sensor, which has the effect of blurring the image slightly. This blurring is then corrected in-camera with a sharpening step. To get a 44 Mpixel image we'd need to upsize the image, which would result in the sharpening artifacts becoming more visible. These cameras do have the advantage of speed. Since the image is recorded at once, instead of being scanned in, they are much faster in use than either a flatbed or focal-plane scanner.
We chose to use a large-format scanning back. This is used in a camera meant to accommodate 4x5 inch film. We purchased a used monorail bellows camera to hold the Better Light Model 6000E-HS 48 Mpixel scanning back, and an enlarger lens (we didn't use a photographic lens as we had no need of a shutter, and enlarger lenses, like so-called "process lenses", have flat focal planes). This back has a 3.5 inch scanning width that traverses 4 inches of the camera focal plane in about 90 seconds. Since the R, G, and B pixel sensors directly follow each other there are no aliasing problems. The scanning back is controlled by a program on an associated computer, but the files themselves are stored in 16-bit-per-channel format (and are thus 250 MBytes each) on the interface/hard disk that controls the back itself.
The camera is mounted on an enlarger stand above the top opening of a Coloreal eBox light box. This is a metal box with 5000 K full-spectrum fluorescent lights standing vertically in each corner behind diffusers to provide very flat and even lighting on the sheet lying on the bottom of the box. We had to enlarge the top opening a bit to accommodate the 135 mm lens angle of view. At f/16 our calculations show a depth of field of about one inch, enough to accommodate most mounted specimens. Very rarely do we need to bring the focal plane up to keep thicker samples in focus.
The scanner is calibrated in a rudimentary fashion by keeping the RGB values of the B, M, and A zones of a Kodak Q13 grey scale within 2 units of each other. We use the default color profile for our scanner and adjust the overall R, G, and B channels to match grey only. To our eyes, the colors of the reference sheet match the colors displayed on a Pantone Spyder-calibrated monitor. The reference sheet we use has the Kodak Q13 targets and many focus targets. Before a series of scans, the reference sheet is scanned, the color calibration is checked, and the corner and central focus is established (checking focus in the four corners is also a check that the back is completely parallel to the sheet). The Better Light scanner control program includes handy functions to check focus and color calibration. We have seen the exact color of the fluorescent bulbs (and plastic diffusion covers) change slightly over time, so we recheck color calibration every session.
Sheets are fitted into an "L" frame taped to the bottom of the light box for scanning. Rulers are placed in an empty area, and the scan is made. One thing we do need to watch carefully is that the width of the scan exactly matches the width of the sheet. We use the sheet width and image width to calibrate the displayed rulers in viewer.asp. After a day's scanning all the images are processed together. Image processing is as follows:
Scan all images using our accession number as the file name. This is the only data-entry step of the image-acquisition-to-website process.
Move all images from the scanner interface/hard drive to the computer and delete them from the interface.
Rotate all images to the proper orientation and save them as mildly-compressed .jpg files. Note that we do not sharpen or color correct the image during this stage of processing; color correction is done during the scan, and sharpening is done only when the image is being sent to a visitor.
Move all .jpg files to a temporary directory on the web server.
Log into the web server and process the images in admin.asp. This will add the image to the images table (using the filename/accession number to keep images associated with the proper specimen) of the herbarium.mdb database and create the thumbnail and reference images in the proper subfolder (see the sketch after this list).
Archive the .jpg files on the scanner computer. We don't archive the original 16-bit-per-channel scans nor the 8-bit .TIFF files generated from them.
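By way of illustration, here is a much-simplified sketch in the spirit of the admin.asp step above (this is not the actual admin.asp code; the folder names, thumbnail width, and single-column INSERT are illustrative assumptions, and SaveBitmap is the GflAx call we assume for writing the resized file):

<%
' Simplified sketch of registering one scan: derive the accession number from
' the file name, record it in the images table, and write a thumbnail.
fileName = "00284.jpg"                                 ' example file from the temp folder
accession = Left(fileName, InStr(fileName, ".") - 1)

set conn = Server.CreateObject("ADODB.Connection")
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & Server.MapPath("herbarium.mdb")
conn.Execute "INSERT INTO [image] (accession) VALUES ('" & accession & "');"
conn.Close

set GflAx = Server.CreateObject("GflAx.GflAx")
GflAx.LoadBitmap Server.MapPath("temp/" & fileName)
GflAx.Resize 150, (150 * GflAx.height) \ GflAx.width   ' 150-pixel-wide thumbnail
GflAx.SaveFormat = 1                                   ' JPEG
GflAx.SaveBitmap Server.MapPath("thumbs/" & fileName)  ' assumed GflAx save call
set GflAx = nothing
%>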
A document describing our procedures can be found on the downloads tab.
Most virtual herbaria are attempting to incorporate geographic information into their websites. Geographic information has many forms (streets and rivers, political boundaries, underground utilities locations, etc.), but we are most interested in climate data. This is data collected by the US Government (the National Climate Data Center, a branch of NOAA) and provided as shapefiles for a very low price to the public.
Geospatial Information Systems (GIS) comprise a data structure (in the form of a shapefile) and software to manipulate and display the data. Shapefiles were defined by the ESRI company and come in many varieties (point, line, polygon, etc.), but climate data uses the polygon format almost exclusively. The shapefiles hold ranges of temperature, precipitation, dates of first freeze or last thaw, wind, sunlight, etc., organized into shapes, the perimeters of which are described as entries in the shapefile. Here is an example of the data and maps that are found in a shapefile:
Each colored zone represents the annual average precipitation collected over a period of 30 years by the NCDC. The maps generated from GIS data are typically presented in an interactive window on the website, where users can zoom in and pan around the map. This method of presentation has the immediate advantage of making the site more interesting to the tech-savvy user, but has the drawback of requiring installation of viewing software on the client computer, and presenting maps at a somewhat slow pace. The temperature or precipitation values for a point on the map can be read by clicking that point on the map after selecting an 'info' button.
We have chosen a very different approach. Since we are working with collections of species, genera and families, to get useful information from the GIS data we must collect and average the GIS values for many locations. Using the typical GIS web display method, a user would take a great deal of time getting every value off the map. The reason this is so is the unfortunate fact that shapefiles are built for one-way data usage. Shapefiles are very fast when used to draw a map from many polygons, but they are notoriously slow at finding the value of a polygon from a single location. The only way to use the file is to go through every polygon and test to see if the point is in that particular polygon, as sketched below. On a fast computer this typically takes 3 to 9 seconds. Multiply that by the number of specimens in a search and you could be waiting a minute or so for the average and range.
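For the curious, here is the brute-force test we mean, written out as a standard ray-casting point-in-polygon check (our own illustration, not code from the shapefile component); finding the value for one specimen means running something like this against every polygon in the shapefile until one contains the point:

<%
' Returns True when point (px, py) lies inside the polygon whose vertex
' coordinates are in the arrays xs() and ys().
function PointInPolygon(px, py, xs, ys)
    dim i, j, inside
    inside = False
    j = UBound(xs)
    for i = 0 to UBound(xs)
        if (ys(i) > py) <> (ys(j) > py) then
            if px < (xs(j) - xs(i)) * (py - ys(i)) / (ys(j) - ys(i)) + xs(i) then
                inside = not inside
            end if
        end if
        j = i
    next
    PointInPolygon = inside
end function
%>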
So we looked for a rapid method of returning this information. We found that using the maps themselves is the fastest way of determining the values from a dataset. The method is simple: make a map from the GIS data of sufficient resolution that not too many errors are generated when detailed shapes are presented in a somewhat compressed map. We don't think the errors are any worse than the errors introduced by NCDC in interpolating data over areas not directly measured, then placing these values into ranges and representing those areas as (relatively) low-vertex polygons. To generate our data we simply open the image, then measure the color of the pixel representing the location of the specimen in question. We can examine hundreds of locations in a second, making these maps fast enough for web delivery without undue tax on the server hardware.
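Here is a sketch of that lookup (the map bounds, image size, and specimen coordinates are example values; in practice they are stored with each map we generate). A specimen's latitude and longitude map directly to a pixel on the pre-rendered map, and the color of that pixel, matched against the legend, gives the precipitation or temperature range:

<%
' Convert a specimen location to a pixel position on a map image that covers
' known geographic bounds.  All values below are illustrative.
mapW = 2000 : mapH = 1500            ' map image size in pixels
west = -114.05 : east = -109.04      ' assumed bounds of a Utah map
north = 42.0 : south = 37.0
lat = 40.28 : lon = -111.73          ' example specimen location

px = Int((lon - west) / (east - west) * (mapW - 1))
py = Int((north - lat) / (north - south) * (mapH - 1))
' The image component then reads the color at (px, py); hundreds of specimens
' can be looked up this way in well under a second.
%>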
We have a second approach to the problem. We are currently extracting value information from the shapefiles into a separate database. We will in the future use this data to generate monthly temperature and precipitation profiles for every specimen for which we have a location.
GIS analysis can be a very informative process. When supplied with data of sufficiently high quality, a good GIS analyst can produce answers to questions like
This isn't a comprehensive list by any means, but wouldn't you like to know the answers to some of these questions?
For GIS analysis like this to be truly useful, we need to create websites that collect data from many sources and "expose" the data in a way that lets even the lay user see what the answer might be. Here is where we think the future might go:
All natural collections (plants, insects, animals) need to get their data into a database, and to georeference (determine the latitude and longitude, for example) all collected items. This can be a daunting task for collections numbering in the millions. Small collections, like ours, can be leaders in this area, because our problem is so much more tractable. It is possible to have all small collections (25,000 specimens or fewer) completely georeferenced and in a database by 2010. Having the collections imaged is not so important for the sort of analysis described in this section. There is a movement among herbaria managers to have this done by 2020 for even the largest herbarium.
All databased collections must combine the data into a single network. Currently the DIGiR protocol is emerging as the front runner in the US, but other data-sharing protocols exist and are favored on other continents. The DIGiR protocol is a method of XML data sharing using definitions common to the sort of data being shared. For herbaria the definition is DarwinCore v1.5. The DIGiR protocol seems to be most actively developed currently by the GBIF organization. In our estimation, the closest thing we now have to a data-sharing standard for herbaria is the free GBIF package, which can 'put online' a database in an almost turnkey installation. But other protocols may prove to be more efficient in the future, especially for GIS analysis.
There has been almost no development of GIS analysis of population data. The UVU Herbarium has more online GIS analysis than any other we've found, and even our work is rudimentary. The forefront of GIS analysis for natural collections is found in the DIVA-GIS application, but it has no on-line component. DIVA-GIS allows the user to easily make biodiversity maps from any amount of data, to make climate niche predictions (we use the DIVA-GIS program to make our online predictions), and to find climate outliers in a population as a means of identifying errors in the data. As yet, however, there are no applications developed for natural scientists for doing GIS analysis on multiple species. The current method is to use a program like ArcGIS to formulate and program the analysis yourself. Unfortunately, the learning curve to do this is quite steep, and the time required makes even one analysis a difficult project, and impossible to do online. During the last years of our web development funding (2005-2007), our web programmer, Dr. Bruce Wilson, spent most of his effort developing tools for online GIS analysis.
While a small herbarium can share its data, and even provide an online herbarium site without too much difficulty, until there is an easy-to-install GIS analysis web package it is rather difficult to provide online GIS analysis. The development of national and regional websites will bring resources enough to make GIS analysis possible. Currently there are 16 regions in the US (southeast, southwest, long-term ecological, northwest, Texas & Oklahoma, California, great plains, north central, rockies, and the intermountain west) who either have put up regional websites, or who plan to. We think a likely outcome for these sites is to, with the help of GBIF, use data shared over DIGiR to do advanced GIS analysis to expose the data to the sort of questions mentioned above, especially those questions involving multiple species. For example, a website user might be able to 'save' search results on the web server, then compare the results of two different searches in several ways (displaying coincidence maps, calculating overlap figures, perhaps even looking at distributions over time, and determining how much bias exists in the data for the calculation of uncertainties). The availability of as much data as exists is necessary to make the analysis meaningful, and data sharing is as essential to this project as the GIS tools used for the analysis.
This may not be the actual future, but it's probably ours.
Here is a map of how visitors move around the site, starting with the Online Specimen Catalog:
Let's run through the pages we need to run the virtual herbarium, and what functionality each page must include:
We have not examined the cumulative error on the statistical distributions, but can argue that the cumulative errors of "binned" attributes (for example, precipitation given in some rather broad range categories rather than measured or estimated values, or the method used to determine predominant plant communities) and errors in the estimation of layer attributes are probably larger.
Auxiliary pages serve image content. Note that while these are .asp pages, which typically return HTML-formatted text (using the text/html content type), they can also return any other content needed. Here is a list of the auxiliary pages and what they return:
Download our .ASP code here (73 KB)
Our scanning and processing procedures document, v1.2, usernames and passwords removed (80 KB .zip file).
These files are released under the GNU public license. They may be used without royalty or remuneration. If any part of these files, including modifications of the code that you make, are included in any other distributed form, that form must be distributed under this same license and the source code must be made freely available. We are not responsible in any way for the manner in which you use the files or descriptions we gave you, nor for the damage you manage to do with them.
This website was developed by the Utah Valley University Herbarium
under the direction of Dr. Jim Harris
Dr. Bruce Wilson (UVU Department of Chemistry) - database programming & server-side scripting
Vadra Rowley (former College of Science and Health web programmer) - HTML formatting & look and feel
with much useful feedback from
Donna Barnes (former Herbarium Curator) & Dr. Renee Van Buren
The development of the UVU Virtual Herbarium is supported by a grant (DBI-0447301) from the National Science Foundation.
We thank the following for contributions of code to our project:
Mike Shaffer - lat/long to distance/bearing
Jim Deutch of Syracuse, New York - arctan code
Carlos Baptiste and others - XML-generation code
4umi.com - tabbed-display code
Google for their free and easy-to-use Google Earth viewer
Marty Wefald for his township, range, section to lat/long conversion program
Ross Pickard for the ArcViewShapeFileDLL
Pierre E. Gougelet - for the utterly handy GFLAx image component
and DIVA-GIS.org for their avid.exe Ecological Niche Prediction software and especially
The UC Davis Herbarium - for making its Management Software freely available.
Last edited 20 May 2006.
Let’s talk about the database server options. I’ve mentioned one, Access .mdb file-based data. Let’s complete the list, from most rudimentary to the most sophisticated:
If you have a database, you can continue using it as is and adapt your virtual herbarium to use it, or you might consider upgrading your current database to a server-based system. Access can handle upgrading (“upsizing” as Microsoft calls it) very nicely, especially to MSDE, MS SQL Express, or MS SQL.
Make sure the database you have chosen is compatible with the scripting language you want to use. Most scripting languages or database servers will come with “database connectors”, small programs that translate commands from the scripting language code to the binary format used by the database. Check that you don’t get stuck with management software, scripting languages, and database systems that won’t talk to each other.
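In classic ASP, for instance, the "connector" amounts to an ADO provider named in the connection string. A sketch (the file path is hypothetical):

<%
' Opening an Access .mdb through the Jet OLE DB provider from VBScript/ADO.
' A server-based database uses the same ADO objects with a different provider
' and connection string (one naming the SQL Server instance, for example).
set conn = Server.CreateObject("ADODB.Connection")
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
          "Data Source=" & Server.MapPath("db/herbarium.mdb")
' ... run queries with conn.Execute(sql) ...
conn.Close
%>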
By way of explanation, we're using an MS Access front-end/back-end database structure. We need to store some information that is used by the server only (image data, GIS information, and accession comments) that must be kept on the server and not overwritten when the labelsdata.mdb file is uploaded. So we have several database files. Herbarium.mdb is our main database. It holds the images and comments tables. It also links to the labelsdata.mdb and GNIS.mdb tables. We had to use a front-end/back-end structure rather than completely separate database files so we could combine the results from tables of data in each file at the same time. Readers with MS Access can download the database tables for closer examination here.
Our database is organized as follows:
Queries (stored procedures):
AllSearch & AllSearchP (the main search queries)
ImagesCount & CommentsCount (per-accession image and comment counts)
Tally (and many associated t* queries for generating counts)
The herbarium.mdb file resides permanently on the server. It has the images and comments tables, both of which change on a pretty regular basis. The images table contains one record for every image added to the database. The comments table contains one record for each comment made. Herbarium.mdb is linked to tables in labelsdata.mdb, the primary data file in the UC Davis Management System. The table Herbarium Labels has one record for each accession. Family and Genera are tables of all families and genera represented by the collection (plus some not represented). The tblUTM table holds UTM coordinates of some accessions. The labelsdata.mdb file is uploaded to the server from the primary herbarium record-keeping computer whenever there are significant additions or changes.
The GNIS.mdb table is a version of the GNIS database downloadable from the USGS for the United States and territories.
Database systems use Structured Query Language (SQL, sometimes pronounced "sequel") to communicate with the database. All database systems (except flat-file DBs) use it, and except for some small differences in formatting, use identical commands. SQL statements in MS Access look something like these:
SELECT * FROM AllSearch
WHERE accession = '00284';
In this command we are asking the database for all fields from the table (or, in this case, a stored procedure already in the database) with the accession field matching the text value '00284'. Or you can get SQL commands that have complicated search terms:
WHERE (((((Elevation) BETWEEN 5000 AND 6000) AND ((ElevationUnits)='ft.')) OR (((Elevation) BETWEEN 1524 AND 1829) AND ((ElevationUnits)='m.'))) AND ((GeoTertiaryDivision) LIKE '%utah%') AND ((Collector) LIKE '%wilson%'));
Note that in Access all text is entered between single quotes, and percent signs are used as text wildcards. Other DB systems will have slightly different formats. And finally, very complex queries can be constructed that combine the data in several tables in very specific ways. This is the SQL query stored in our Access database as "AllSearchP" (AllSearch is the same except we add the WHERE Private=FALSE condition):
SELECT L.*, G.Genus, F.Family, C.Comments, I.Images, U.Datum, U.UTMZone, U.easting, U.northing
FROM tblUTM AS U
RIGHT JOIN (ImagesCount AS I
RIGHT JOIN (CommentsCount AS C
RIGHT JOIN (Genera AS G
RIGHT JOIN (Family AS F
RIGHT JOIN [Herbarium Labels] AS L
ON F.ID = L.FamilySource)
ON G.GenusNumber = L.GeneraSource)
ON C.ID = L.ID)
ON I.ID = L.ID)
ON U.Label_ID = L.ID;
Here the ImagesCount and CommentsCount are also stored queries, and the joins are how we relate tables to each other. Note that these queries can be written in code instead of having them stored in the database. Here are the two stored queries mentioned above:
SELECT [Herbarium Labels].ID, Count(comments.ID) AS Comments
FROM comments
INNER JOIN [Herbarium Labels] ON comments.accession = [Herbarium Labels].Accession
GROUP BY [Herbarium Labels].ID;
SELECT [Herbarium Labels].ID, Count(image.ID) AS Images
FROM [Herbarium Labels]
INNER JOIN [image] ON [Herbarium Labels].Accession = image.accession
GROUP BY [Herbarium Labels].ID;
The SQL terms RIGHT JOIN, LEFT JOIN, INNER JOIN, GROUP BY, COUNT(), etc, all have very specific meanings, meanings that are difficult to learn and nearly impossible to master. Fortunately, if you have your data in Access (or some other query building program), you can define the relationships between the data tables, build the query using a (fairly-easy-to-understand) graphical method, and test the query; then Access will create the actual SQL statement. Remember what I said about Access having advantages?
The reader should consult the herbarium.mdb file and the search.asp & details.asp files for more examples of the SQL queries we use.
A server-side script is a web page, written in a combination of HTML (which is a formatting language) and a scripting language (which runs like a program every time a page is requested).
Here is a small example of a server-side script written in VBScript, Microsoft’s scripting version of the Visual BASIC computer language:
<%@ EnableSessionState = FALSE %>
<!--#include file="config.asp"-->
<%
set GflAx = Server.CreateObject("GflAx.GflAx")
GflAx.LoadBitmap request.querystring("f")
x = request.querystring("x")           'OriginX
y = request.querystring("y")
w = clng(request.querystring("w"))     'WindowWidth
h = clng(Request.querystring("h"))
dh = request.querystring("dh")         'DisplayHeight
dw = request.querystring("dw")
if (w < GflAx.width) OR (h < GflAx.height) then GflAx.Crop x, y, w, h
GflAx.Resize dw, dh
GflAx.Sharpen 40
GflAx.SaveFormat = 1                   'JPEG
GflAx.SaveJPEGProgressive = TRUE
GflAx.SaveJPEGQuality = 75
Response.ContentType = "image/jpeg"
Response.BinaryWrite GflAx.SendBinary
set GflAx = nothing
%>
Note: the highlight colors of the scripts are the default .ASP colors in SourceEdit, http://www.brixoft.net/default.asp, the best script editor available, and it, too, is free. This script uses the GFLAx image component.
Several things to point out in this script: everything that appears between the script delimiters, "<%" and "%>", is not returned directly to the client; instead the commands are "run" on the server, and change the information returned to the client. This is the code that modifies the image of the sheet and returns it as an image. It is not sending any text to the client. So instead of the browser asking for an image called "sheet.jpg", it asks for something like sheet.asp?f=sheet234.jpg&x=0&y=0&w=1200&h=1600&dw=600&dh=800.
This request includes a "query string", variables that are passed to the script which define the crop parameters and resize dimensions of the image file called "sheet234.jpg". The script executes on the server to crop and resize the image, then sharpens the image before returning it to the client, where it looks exactly as if it were an ordinary image file. The client doesn't know it's a modified file being returned. Probably doesn't care either. But the client sees nicely customized images. This script is used to display the sheet image at various magnifications (f is the full image name, x & y tell where to start the crop box, w & h give the dimensions of the box, and dw & dh give the final size of the image).
There are several predominant scripting languages. You’ll need to choose one and stick with it, as translating one language to another is rather difficult.
We chose Active Server Pages as our scripting language, mostly because that's the one I knew best. Please download and examine our annotated code for a better idea of just how we do our server-side .ASP scripting.
The Do It Yourself option brings with it many decisions that must be made before you start writing code. The primary decision is how to keep track of where the client is in the website. This is called "state". State refers to what the server knows about particular clients. For example, you visit a well-designed site like Amazon.com and if you have ever logged in in the past, you will be greeted by a page that says "Welcome Joe. Here are some book recommendations for you." How did the server know it was you visiting and not someone else? How does the website know what books you've looked at?
State, the keeping track of what a particular visitor has done during a visit, means that the script is aware of the search terms used and what settings the visitor has made, and helps in building navigation links. This is done by storing information as variables in some place that persists between clicks. It can be done in one of four ways:
A. In the early days of the Internet state did not exist. All pages were static files, and all visitors saw exactly the same information. When web designers wanted to customize a page, they needed a method of identifying which client was which. At first they used the client IP address (the number that identifies your particular computer to the internet, as in 128.025.186.205) to keep track, but this method was confounded by corporate firewalls and proxies, which make all clients in the corporation appear to have the same IP address. So they invented "cookies". Cookies are small text files placed on the client hard drive which contain information generated by the web server. For example, the Amazon.com cookie looks like this:
The Amazon.com website generates these keys and values and stores them in their database. When you visit Amazon.com again, your web browser automatically sends the Amazon.com cookie back to the website with every click. The script on the website reads the cookie information, looks up the particulars in the database, and instantly knows who you are, what you like, what books you've looked at, etc. This method could be used to store variables needed by the server to keep track of which pages have been seen, or what the last search terms were, and it could do so without the visitor knowing that it was being done, were it not for two things: some visitors will have cookies turned off, and the back button messes everything up.
B. Scripting engines (VBScript, PHP, etc.) automatically keep track of state. Most server-side-scripted sites hosted on a Windows machine will generate a cookie with data that looks something like ASPSESSIONIDQGGQACCT=KLJHGFEDMNBOPIACEGLDMJFB.
It's a random string of characters that identifies you uniquely. Every time you click a link on that site, that information is sent to the server. The server can then (it doesn't have to, depending on what the programmer wanted) use state to keep track of a lot of internal variables. For example, when the advanced search is used there might be a lot of very specific search information entered (name, altitude, lat/long, collector, etc.) to generate the search results. How does the server know what the search terms were when you click the link to go to page 2? The programmer can create a bunch of server variables, variables that exist only in the server memory, and associated only with the client that returns the matching session string (above), to hold all that information, so when you click the link to "page 2" the script can find all the search information you entered.
Server-maintained state has one big disadvantage: timeout. Since it is not reasonable to keep track of every visitor's state variables over the course of weeks or months, state will time out on a server and delete from memory all the variables it had stored. Most servers use a 20-minute default timeout. If you wait more than 20 minutes between clicks, the server won't remember where you've been or what you've entered unless it has saved your data in a file or database, and no one does that except Amazon.com and a few sites like it. Wait 20 minutes, click on a link, and you'll either get an error page or be sent back to the home page to start over. This is a major disadvantage for a virtual herbarium, where clients can reasonably be expected to spend quite some time studying a particular image or retrieving a specimen from the lab for comparison. It also means that URLs cannot be saved that will retrieve the same information from the database.
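In classic ASP, for example, the server variables of method B are just the Session collection. A sketch with made-up search terms (not our search.asp code):

<%
' Method B: park the pieces of the query in Session variables when the search
' form is submitted...
Session("whereClause") = "((GeoTertiaryDivision) LIKE '%utah%')"   ' example value
Session("orderBy") = "Family, Genus"

' ...and rebuild the query when the visitor later clicks "page 2".  If the
' 20-minute timeout has passed, these variables are gone and the search fails.
sql = "SELECT * FROM AllSearch WHERE " & Session("whereClause") & _
      " ORDER BY " & Session("orderBy") & ";"
%>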
C. Another method is to pass all the variables you need in the query string of each URL (web link). Here is an example URL from a search using the combination-of-terms form:
o=Family,G.Genus,SpecificEpithet&l=(((((Elevation) BETWEEN 6000 AND 10000) AND ((ElevationUnits)='ft.')) OR (((Elevation) BETWEEN 1829 AND 3048) AND ((ElevationUnits)='m.'))) AND ((GeoTertiaryDivision) LIKE '~utah~') AND ((MoreCollectors) LIKE '~harris~'))&t=&cc=y&ce=&cf=y&cg=y&cl=&m=utahsmall
Here we are sending quite a bit of information: p=page number, s=type of search made, o=the ORDER BY clause used in the SQL query, l=the WHERE clause of the SQL query, all the cx variables indicate which columns of data to display on the search results table, and m=which map to display. There are also variables for the page title (t) and for shortcuts to particular information in the database (n) that aren't being used here. The advantages of using this method of passing state: no personalization is available (like client names), so no personal information is exposed in the URL, and a client can click on this link any time and get the same information back. The disadvantage is obvious: you can't type that in. You'd need to email the link to your friend or yourself to use it on a different computer. But you can save the URL to a particular search result as a favorite link.
D. The last method of maintaining state: use hidden variables in a form. This requires that all links be part of a web form (not that big a deal), but sacrifices the ability to save the URL or send it to a friend.
In the end we chose to use primarily the query string to maintain state. We thought the ability to save URLs as favorites, and to come back at any time and get the same page, suited the needs of our clients better than their need for clean URLs. We do, however, use a form to return advanced search requests to the server, and use several short-lived server variables (an array) to keep track of accession positions on the maps.
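To make the query-string approach concrete, here is a simplified sketch of the sort of link-building involved (the real search.asp handles more variables and more bookkeeping):

<%
' Method C: every piece of state arrives in the query string, and every link
' we emit carries it forward.  Variable names follow the list above.
p = Request.QueryString("p")            ' current page number
if Len(p) = 0 then p = 1
p = CLng(p)
o = Request.QueryString("o")            ' ORDER BY clause
l = Request.QueryString("l")            ' WHERE clause
m = Request.QueryString("m")            ' which map to draw

' Build the "next page" link so that clicking it repeats the same search.
nextURL = "search.asp?p=" & (p + 1) & _
          "&o=" & Server.URLEncode(o) & _
          "&l=" & Server.URLEncode(l) & _
          "&m=" & Server.URLEncode(m)
Response.Write "<a href=""" & nextURL & """>next page</a>"
%>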