Authors: A.J. Hobday and K. Hartmann, 2006.
This paper is a nice example of a theoretical model backed up by data. The data in question is actually rather difficult to collect: in an attempt to figure out the temperature preferences of a certain beleaguered species of tuna, scientists tag them with little thermometer-widgets that take and archive ambient temperature profiles. After a year with the fish, the devices detach themselves and float to the surface, where they transmit a summary of their data to certain scientific satellites in polar orbit. This temperature data is combined with satellite estimates of ocean surface temperature, estimates that are in turn plugged into an oceanic model that gives a vertical temperature profile based on the surface estimates. The thrust of this paper is that if you can figure out what temperature the fish like, and if you can figure out where that temperature obtains, you can tell fishermen to avoid that area and thus minimize the odds of infelicitous by-catch.
The paper uses a fair bit of level-headed but elementary statistical hocus-pocus. For example, the oceanographic model gives temperature at each point on a 3-D grid. To figure out whether this temperature at this depth represents an attractive environment for these tuna, the authors look at all the tag data and try to figure out what percentage of observations correspond to tuna hanging out at that depth but in a cooler temperature. (It is not clear to me if the tag data can be sorted by location; I don't think so, but if not, then there is nothing location-specific about this statistic: it becomes a function of depth alone. Since the authors claim that the numbers relate to location as well, I may be missing something.) They do this for all the (discretized) depths at a given location, and then assess the overall attraction of the whole water column at that location by taking a weighted sum of these figures, where the weights correspond to percentages of time spent at that depth. (I confess that I don't really understand the data structure, so I'm not sure if this technique makes sense.)
Mathematically, I think their procedure is as follows: the data consist of $K$ sets of three histograms each. Within each set, two of the histograms are average temperature vs. depth and percent time vs. depth; the third is percent time vs. temp. I believe the third histogram is essentially ignored. The data from the first two are ordered pairs of the form $(d_{i},t_{ki})$ and $(d_{i}, p_{ki})$, respectively, where $i=1,\dots,D$ indexes the depth and $k=1,\dots,K$ indexes the observation. Given a temperature $t$ and a depth $d_i$, the authors calculate a statistic
$s(t,d_i) = \frac{1}{K}\sum_{k=1}^K n(k,t,d_i)$,
where $n(k,t,d_i)$ is equal to 0 if $t_{ki} > t$ and equal to 1 otherwise. Note that calculating this statistic involves looking at the $i$th component of each temperature-vs.-depth histogram, and can be done at depth intervals no finer than those reported by the PAT (the tag). Now, given a water column, discretized with depths $d_i$ and corresponding temperatures $t_i$, the authors compute the statistics $s(t_i,d_i)$ and then use these to form
$\hat{s} := \sum_i w_i s(t_i,d_i)$
where the $w_i$ are given by
$w_i := \frac{1}{K}\sum_k p_{ki}$.
Note that $w_i$ represents the average fraction of time spent at depth $d_i$. The statistic $\hat{s}$ is thus supposed to be a measure of how desirable a water column might be: if its value is high, it means that for highly-frequented depths (big $w_i$) the temperature is hot enough that fish tend to seek out cooler spaces (i.e. $s(t_i,d_i)$ is also big).
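To check my reading of the formulas, here is a minimal Python sketch of $s(t,d_i)$, $w_i$ and $\hat{s}$ as I have written them above. This is my reconstruction, not the authors' code. The function names and the toy numbers are made up for illustration, and I am assuming that every tag record reports a value for every depth bin.

```python
# Hypothetical sketch of the statistics as reconstructed in this post.
# tag_temps[k][i] plays the role of t_ki, tag_time_fractions[k][i] of p_ki.

def preference_statistic(tag_temps, t, i):
    """s(t, d_i): fraction of the K tag records with t_ki <= t."""
    K = len(tag_temps)
    # n(k, t, d_i) is 0 if t_ki > t and 1 otherwise
    return sum(1 for record in tag_temps if record[i] <= t) / K


def depth_weight(tag_time_fractions, i):
    """w_i: average over the K records of the time fraction at depth d_i."""
    K = len(tag_time_fractions)
    return sum(record[i] for record in tag_time_fractions) / K


def column_score(tag_temps, tag_time_fractions, column_temps):
    """s-hat: sum over depths of w_i * s(t_i, d_i) for one water column."""
    return sum(
        depth_weight(tag_time_fractions, i)
        * preference_statistic(tag_temps, t_i, i)
        for i, t_i in enumerate(column_temps)
    )


# Toy data (invented): K = 2 tag records, D = 2 depth bins.
tag_temps = [[18.0, 12.0], [20.0, 14.0]]
tag_time_fractions = [[0.75, 0.25], [0.25, 0.75]]
column_temps = [19.0, 15.0]  # model temperatures t_i for one water column

score = column_score(tag_temps, tag_time_fractions, column_temps)
print(score)  # 0.75
```

In the toy case both weights are 0.5; at the shallow bin only one of the two records is at or below 19 degrees, so $s = 0.5$, while at the deep bin both are at or below 15 degrees, so $s = 1$, giving $\hat{s} = 0.75$. Note that nothing in this calculation depends on where a tag was when it recorded its data, which is the location question I raised above.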
The rest of the paper deals with ways to automate the placement of management boundaries. There are statistical tests showing that the automation algorithm does relatively well, though it is not clear to me what "truth" might be.
Questions:
1.) What is the grid point spacing on the oceanographic model?
2.) Do the tags provide any location information?
3.) This "fish predilection" model is a touch ad hoc (as admitted by the authors). Is there a way to improve it? Add predation, use location information, etc.?
4.) How stable are the profiles generated by the PAT data?