Clicking the Query (csv) button on the web interface
with the example query in the figure on that page is equivalent to requesting the following URL:
http://galaxy-catalogue.dur.ac.uk:8080/Millennium?action=doQuery&SQL=select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo
This example returns the following result:
#SQL= select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo
#QUERYTIMEOUT 420 sec
#COLUMN 1 name=haloid JDBC_TYPE=-5 JDBC_TYPENAME=bigint
#COLUMN 2 name=snapnum JDBC_TYPE=4 JDBC_TYPENAME=int
#COLUMN 3 name=x JDBC_TYPE=7 JDBC_TYPENAME=real
#COLUMN 4 name=y JDBC_TYPE=7 JDBC_TYPENAME=real
#COLUMN 5 name=z JDBC_TYPE=7 JDBC_TYPENAME=real
#COLUMN 6 name=np JDBC_TYPE=4 JDBC_TYPENAME=int
This implies that users can avoid the web interface altogether: construct the URL themselves
and retrieve it with their favourite web client. One standard way is the command-line tool wget
(e.g. GNU wget).
The following command returns the above result on standard output:
wget "http://galaxy-catalogue.dur.ac.uk:8080/Millennium?action=doQuery&SQL=select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo"
To store the results in a file named "result.csv" use:
wget -O result.csv "http://galaxy-catalogue.dur.ac.uk:8080/Millennium?action=doQuery&SQL=select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo"
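The URL can also be built programmatically for scripted access. The following is a minimal Python standard-library sketch, assuming the server address and query from the examples above; the resulting URL can be passed to wget or fetched directly with urllib.request.urlopen.

```python
from urllib.parse import urlencode

# Server and query taken from the wget examples above.
BASE = "http://galaxy-catalogue.dur.ac.uk:8080/Millennium"
sql = "select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo"

# urlencode percent-escapes characters that are special in a URL
# (spaces become '+', commas become '%2C', a literal '+' in the SQL
# would become '%2B'), which wget otherwise leaves to shell quoting.
url = BASE + "?" + urlencode({"action": "doQuery", "SQL": sql})
print(url)
```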
When a log-in is required, as for the protected site,
the command needs to be changed to something like:
wget --http-user=**** --http-passwd=**** "http://galaxy-catalogue.dur.ac.uk:8080/Millennium?action=doQuery&SQL=select top 10 haloid,snapnum, x,y,z,np from millimil..mmhalo"
The nice feature of wget, that it can be used from within scripts, occasionally causes problems as well.
At times the load on our web server has been very large, and this could be traced to users sending
large numbers of relatively small queries with wget.
The problem is that the web server creates a session for every request that is posed to it, unless it
is recognised as belonging to a previous session. We use the standard way of tracking sessions using
cookies and web browsers will handle this properly.
Used as documented above, wget does not enable this session tracking.
As sessions have a finite lifetime, sending large numbers of queries can then clog up the web server.
As of version 1.10 of wget it is possible to enable session tracking.
The following example command shows how:
wget --http-user=**** --http-passwd=**** \
  --cookies=on --keep-session-cookies --save-cookies=cookie.txt --load-cookies=cookie.txt -O out.csv \
  "http://galaxy-catalogue.dur.ac.uk:8080/MyMillennium?action=doQuery&SQL=select top 10 * from mpagalaxies..delucia2006a"
The new features are the various cookie parameters.
The file name after --save-cookies and --load-cookies should be the same, and may have to be fully
qualified if you run this from within an environment such as IDL. These options write the cookie
information returned by the server to the file and read it back on the next invocation, which
allows the server to keep the requests within a single session. Please also consult your local
wget manual for more information.
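The same session tracking can be reproduced in scripts without wget. The following is a rough Python standard-library sketch of what --save-cookies and --load-cookies do; the file name cookie.txt plays the same role as in the wget example above.

```python
import http.cookiejar
import urllib.request

# Rough equivalent of wget's cookie options, standard library only.
COOKIE_FILE = "cookie.txt"

jar = http.cookiejar.MozillaCookieJar(COOKIE_FILE)
try:
    # Like --load-cookies: reuse the session from a previous run.
    jar.load(ignore_discard=True)
except FileNotFoundError:
    pass  # first run: no cookie file yet

# All requests made through this opener send and receive cookies,
# so the server sees them as belonging to one session.
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar))
# opener.open(query_url) would issue a query within the session.

# Like --keep-session-cookies --save-cookies: persist for the next run.
jar.save(ignore_discard=True)
```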
Please use this pattern when querying using wget.
For other options, see the command line options of wget.
Because we are constructing a URL, we must be aware of URL-encoding issues.
In particular, characters in the SQL that have a special meaning in a URL must be escaped.
An example is the + (plus) sign,
which is interpreted as a space. So a query like
select x+y from millimil..mpahalo
will cause an error, unless encoded as
select x%2By from millimil..mpahalo
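In Python, for example, the standard library's quote function performs this escaping; this is a sketch, and any URL-escaping routine will do the same job.

```python
from urllib.parse import quote

# Escape everything that is special in a URL; '+' becomes %2B so the
# server does not decode it back into a space.
sql = "select x+y from millimil..mpahalo"
print(quote(sql, safe=""))
# select%20x%2By%20from%20millimil..mpahalo
```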
The next few pages show ways of using the wget command from
within environments such as R and