Tuesday, 2 May 2017

Data verification - how is it done?

A question posed on the NFBR Facebook group today raised an interesting point: 'Do BWARS, or any other recording scheme or BRC, publish any guidance on verification for their species group?'

I have never tackled this problem directly, but I have given some guidance to novices about self-verification after making an identification. Once you have made your diagnosis, it is important to check some basic data:
  • Does the animal conform to the detailed descriptions given in major monographs such as Stubbs & Falk or Van Veen?
  • Does the species occur in your general vicinity? It may not have been previously recorded from your 10km square but if you live in NW Scotland and the determination you have made is of a species that is confined to southern England, you can be pretty sure your ID is either wrong or the record is highly aberrant. Range is important!
  • Does the date recorded coincide with the core phenology range? If not, there must be doubt, although there are exceptions.
  • Is the species common or rare? Bear in mind that rarity does not mean that you won't find such an animal, but the vast majority of records are of common species, and many rare species have peculiar habitat associations. If the species is rare, make extra checks and seek the view of a specialist.
  • Can a firm identification be made using the medium you have chosen? If you are basing ID on photographs, you need to be aware that a lot of hoverflies can only be identified with certainty using microscopic characters that either don't show in photos or are on a part of the animal that cannot be accessed using conventional live-animal techniques (e.g. the male genitalia).
  • Are there other species with which it might be confused?
  • Does the habitat match the descriptions in the guide book?

So, how do I verify records?

If a record is accompanied by a photograph, I start by asking:
  • Is the animal sufficiently well depicted to make an ID? We see a fair number of photos whose resolution is too low to show key features; many are also taken from angles that miss the critical features, and some are obscured by glare or colour casts that create uncertainty.
  • If it is potentially identifiable, can I determine the genus? If so:
  • Can I take this animal to the correct couplet in the key? If so:
  • Can I arrive at a firm conclusion, either at species level or for a species pair (e.g. Platycheirus scutatus s.l. or s.s.)?
  • If the answer to all of the above is yes, I can either verify the recorder's diagnosis or make my own.

Where records are not accompanied by a photograph (or supported by a specimen), I follow a fairly standard routine:
  • Are the recorder and their abilities known to me? If so, how experienced are they? For experienced recorders who are a known quantity, I probably don't need to do much for common and readily identified species. If not, it helps to see a full list of their records to get a perspective on what they are recording. Each time I see a dataset, I build a picture of the recorder's competence.
  • Do the records coincide with the known distribution and phenology of the relevant species? If they sit at the limits of these ranges, I may want to check further.
  • With a single record, one obviously cannot do much more than ask 'is this record potentially believable?' In many cases one has to extend a significant element of trust. To my mind this is not really verification; it is just a broad-scale quality assurance process and in no way says that all records are correct. We all make mistakes, so all datasets are likely to contain glitches. For the most part, a low level of mistakes has little or no impact on the reliability of the data.
  • Are there tricky species? If so, I may go back and check whether a specimen exists. If there is no specimen and the recorder has relatively little experience, the record may not be reliable. An awful lot of verification is based on trust and a knowledge of individual recorders, and that knowledge only builds up over time.
  • When assessing recorders, I often ask myself how bold they are in their diagnoses: the more caution I see, the greater my assurance that their records are likely to be reliable, if not perfect.
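
For anyone handling records in bulk, the distribution and phenology checks in this routine are the most mechanical part and can be roughed out in code. The Python below is an illustrative sketch only, not an established scheme tool: the species name, reference squares and flight periods are hypothetical placeholders. 'General vicinity' is crudely approximated by the two-letter 100 km square of the Ordnance Survey grid reference rather than the 10 km square itself, because a first record for a 10 km square is not in itself suspicious. A flag means 'look again', never 'reject'.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Reference:
    """Hypothetical reference data for one species (placeholder values only)."""
    regions: set[str]     # 100 km grid squares (e.g. "SU") with accepted records
    core_months: range    # core flight period, e.g. range(5, 9) = May to August
    outer_months: range   # wider window in which records are unusual but possible


@dataclass
class Record:
    species: str
    gridref: str          # e.g. "SU45"; only the two letters are used here
    when: date


def check_record(rec: Record, refs: dict[str, Reference]) -> list[str]:
    """Return reasons why a record deserves a closer look (empty list = no flags)."""
    ref = refs.get(rec.species)
    if ref is None:
        return ["species not in reference data"]
    flags = []
    # Crude 'general vicinity' test: is this 100 km square one with accepted records?
    if rec.gridref[:2].upper() not in ref.regions:
        flags.append("outside known distribution")
    # Phenology: outside the wider window is a strong flag, outside the core a weak one.
    month = rec.when.month
    if month not in ref.outer_months:
        flags.append("outside known flight period")
    elif month not in ref.core_months:
        flags.append("at edge of flight period")
    return flags


if __name__ == "__main__":
    refs = {
        # Hypothetical species confined to southern England, for illustration only.
        "Species A": Reference({"SU", "SZ", "TQ"}, range(5, 9), range(4, 11)),
    }
    # NG is a north-west Scotland square, echoing the example in the checklist above.
    print(check_record(Record("Species A", "NG12", date(2017, 6, 10)), refs))
    # ['outside known distribution']
    print(check_record(Record("Species A", "su45", date(2017, 4, 20)), refs))
    # ['at edge of flight period']
```

Month ranges here cannot wrap past December, so a species with a winter flight period would need an explicit set of months instead.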

So what about verification of iRecord?

I get frustrated with iRecord and much prefer spreadsheets. My reasons concern data that are not supported by specimens or photographs. Analysing a spreadsheet is a relatively quick process, whereas trying to sort out the odd few records entered intermittently on iRecord lacks context.

When I receive a spreadsheet I quickly scan for certain indicators. For example, if I know that a recorder does not take specimens and their lists contain species that cannot be identified from photos or in the field, then the data are suspect. Does the list contain 'Chinnery' mistakes? There are glitches in Chinnery that are giveaways that the data are suspect; I won't give these tricks away because they are so useful to me! Are there records of females that can only be taken to aggregate but are listed as a segregate?

Similarly, do the lists contain habitat indicator species that are clearly out of place? A classic example is an inland dry site whose list includes species such as Platycheirus immarginatus (it does happen, and it shows that the data have been created using the pictures in Stubbs & Falk).
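
Several of these spreadsheet screens (specimen-only species from a recorder who keeps no specimens, females named to segregate, habitat indicators in the wrong place) lend themselves to a quick automated pass before the list is read properly. The sketch below is illustrative only: the lookup tables are hypothetical and would have to be compiled by someone who knows the group, 'Species B' and 'Species C' are placeholders, the habitat coding is an assumption, and the only real-world entry is the example above of Platycheirus immarginatus at an inland dry site. The 'Chinnery' giveaways are deliberately not attempted.

```python
from dataclasses import dataclass

# Hypothetical lookup tables: a scheme would have to compile and maintain its own.
NEEDS_SPECIMEN = {"Species B"}      # not reliably identifiable from photos or in the field
FEMALE_AGG_ONLY = {"Species C"}     # segregate names that females cannot be taken to
OUT_OF_PLACE = {                    # species -> habitat codes where they are implausible
    "Platycheirus immarginatus": {"inland dry"},
}


@dataclass
class Row:
    species: str
    sex: str       # "m", "f" or "" if not recorded
    habitat: str   # assumed coded habitat field, e.g. "inland dry"


def screen_list(rows: list[Row], keeps_specimens: bool) -> list[tuple[int, str]]:
    """Return (row number, reason) pairs for rows worth querying with the recorder."""
    issues = []
    for i, row in enumerate(rows, start=1):
        if not keeps_specimens and row.species in NEEDS_SPECIMEN:
            issues.append((i, "needs a specimen, but recorder keeps none"))
        if row.sex == "f" and row.species in FEMALE_AGG_ONLY:
            issues.append((i, "female named to segregate"))
        if row.habitat in OUT_OF_PLACE.get(row.species, set()):
            issues.append((i, "habitat indicator out of place"))
    return issues


if __name__ == "__main__":
    rows = [
        Row("Species B", "m", "woodland"),
        Row("Species C", "f", "grassland"),
        Row("Platycheirus immarginatus", "", "inland dry"),
    ]
    for row_no, reason in screen_list(rows, keeps_specimens=False):
        print(row_no, reason)
```

Anything flagged goes back to the recorder as a query; a clean run says nothing about whether the rest of the list is right.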

There is no guaranteed way of ensuring correct identifications, and with lists it is a matter of trust. Having spent a lot of time developing data from photographs, I now have a big dataset that has improved the parameters for species phenology. I also know a lot more about the potential problems that people encounter: the list of mistakes is huge, as my previous post on iRecord data has shown.

And there is a lesson for us all

On one occasion I accepted a record that was not supported by a photograph, because it came from the right sort of place and was not terribly difficult to ID. Shortly afterwards, the originator posted a photo of the animal and extolled the virtues of iRecord as a means of confirming the identity of a specimen. The photo and the determination did not match, which made me look a fool! It just goes to show that unless the verifier checks a piece of chitin on a pin, a record can only be said to be probably reliable, not a confirmed ID!


  1. Not sure I understand why you prefer a spreadsheet which is always without photos to iRecord entries that usually (?) do have photos. Or isn't that what you are saying?

  2. Very useful and informative blog Roger. I'm not trying to pressure you to use iRecord, you are of course entitled to do whatever works best for you, but just in case it helps it is possible to filter records by recorder name within iRecord, and also you can download the hoverfly records into a spreadsheet whenever needed.
