Saturday 9 December 2017

Should unverified data be placed on the NBN?

In previous posts I have drawn attention to the problem of 'dodgy' records on the NBN. It is an issue that vexes some and is of little concern to others, but I wonder whether it has really been properly thought out.

In a recent thread on the NBN Facebook page, commenters observed that the NBN seemed to place greater emphasis on data users than on data providers. Whether that is true or not I cannot be sure, but there is an issue if the data providers start to feel that way! I think that part of the problem is that there is a growing dichotomy between the 'professionals' in the biodiversity data industry and the technical specialists, most of whom work on a voluntary basis.

So, let's start by asking a few questions:

What experience do you need to be a LERC data manager?
Well, I guess the highest priority must be an ability to manage data - i.e. understand the nuts and bolts of RECORDER 6 and be able to drive it to produce the reports that clients need. You probably need to be good with people - to get your local recording community to contribute records so that you have something to sell to commercial clients. By implication, it is clear that you cannot be a technical specialist across all taxa - the Eric Philps of the world are very few and far between, and are probably not ideal database managers!

So, what are you aiming for? As much data as possible - although you cannot say it, the commercial imperative is quantity and not quality. Also, you probably want to concentrate on the data that is most commercially useful - bats, badgers, great-crested newts, Section 41/42 species, BAP species.

What experience do you need to run a recording scheme?
Probably the most important requirement is an innate interest in the group of organisms that you are proposing to record. If you are taking over an existing scheme then you probably also need to have the confidence of fellow recorders that you know what you are talking about, or are able (and willing) to seek the input of others more skilled than yourself. Data management skills are desirable but not a pre-requisite (although you will come unstuck in due course if you don't gain skills). Like the LERC manager, you also need to be capable of motivating people, but more importantly you must understand how important it is to give structured feedback to your contributors: schemes that act as a black hole rapidly lose impetus (and there are a good many such schemes). As a scheme organiser you must also be confident enough to challenge incoming records, no matter who they come from, so you MUST develop the necessary taxonomic, ecological and biogeographic skills. If you run a national scheme it also helps to have a reasonable knowledge of the landscape and ecology of the whole of the British Isles (but that can come with time).

What are you intending to produce with the data?
Here I think the biggest divergence probably obtains between the LERC and the Recording Scheme.

Unless the LERC holds data and is seen as a suitable source of information it will not get the income to survive - so financial survival is critical. You will not necessarily be judged on data quality - to many clients the important thing is enough information to satisfy the local planners that you have adequately investigated the environmental parameters of your proposal. For the NBN, an ability to list the vast numbers of records at a broad taxonomic scale is probably pretty important to attract ongoing statutory agency funding. In other words, nobody is starting from the question of data quality - it is headlines that grab the politicians and it is the politicians that control the purse-strings. And, as the vast majority of politicians have no scientific knowledge whatsoever and could not interpret a graph if it hit them in the face, big means best!

For the Recording Scheme, the most obvious and stinging criticism is that the dataset contains anomalies. The second most damaging problem is a failure to get data organised and maps and other outputs transmitted to your supporters (in other words they have a different set of supporters). If the supporters lose faith in the scheme then they will not engage and the scheme will decline. If it is vibrant and producing lots of outputs and creating a community around the interest group, more records will be generated and reliable recorders will be recruited. So, a vibrant recording scheme has got to be there to create a community spirit and to provide motivational feedback.

How many LERC and NBN staff run recording schemes or contribute records out of office hours?
I doubt this one can be quantified. I do know that at least some are occasional contributors to schemes. But in a parallel situation, I was always amazed at the low level of interest in biological recording amongst colleagues at Natural England. True, there were plenty of twitchers but real recorders were at a premium. Precious few former colleagues sit amongst the major or even minor contributors to the HRS (except former CSD and EFU staff who make up a significant part of the key major recorders). It was always a disappointment to me to hear that people felt that they did not want to take their job home with them but were happy to sit in meetings talking about what the recording schemes could be tasked with.

And the moral of the story?

Unless we get away from the volume-rather-than-quality argument, there will be increasing question marks over the reliability of NBN data. That in turn means that Recording Schemes will be called upon for ever more data validation - which is already reaching unviable proportions. So, as a starting point perhaps LERCs and the NBN need to be developing lists of species that they can use as reality checks for datasets. If there are obviously dodgy records then the whole dataset should be disallowed until it has been validated - and if there is nobody available to validate it then simply mark the data as unvalidated and don't have them appear on the maps.
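For anyone who thinks better in code, the reality check suggested above might look something like the rough sketch below. It is only an illustration under my own assumptions: the species names are placeholders, and the Record fields, the status labels and the function names are all hypothetical rather than anything the NBN or RECORDER 6 actually provides.

```python
from dataclasses import dataclass

# Hypothetical reality-check list: species whose records should be questioned
# unless a scheme verifier has already confirmed them. Placeholder names only.
REALITY_CHECK_SPECIES = {"Placeholder rarity A", "Placeholder rarity B"}

# Hypothetical status labels for a whole dataset.
ACCEPTED = "accepted"
HELD = "held for validation"
UNVALIDATED = "unvalidated"


@dataclass
class Record:
    species: str
    grid_ref: str
    verified: bool = False  # True once a scheme verifier has confirmed it


def needs_checking(record: Record) -> bool:
    """A record is suspect if it is on the check list and not yet verified."""
    return record.species in REALITY_CHECK_SPECIES and not record.verified


def triage_dataset(records: list[Record], validator_available: bool) -> str:
    """Decide the fate of a whole dataset, not just its suspect records."""
    if not any(needs_checking(r) for r in records):
        return ACCEPTED
    # Suspect records present: hold the whole dataset if someone can validate
    # it, otherwise mark it as unvalidated.
    return HELD if validator_available else UNVALIDATED


def mappable_records(records: list[Record], status: str) -> list[Record]:
    """Only accepted datasets contribute records to the maps."""
    return list(records) if status == ACCEPTED else []


if __name__ == "__main__":
    dataset = [
        Record("Common species", "TQ1234"),
        Record("Placeholder rarity A", "TQ5678"),
    ]
    status = triage_dataset(dataset, validator_available=False)
    print(status, len(mappable_records(dataset, status)))
```

The point of working at the level of the whole dataset is the one made above: a few obviously dodgy records cast doubt on everything that came with them.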

Unless some attention is paid to data quality, users of the NBN will start to become wary of its content. If the data are unreliable, then so too are the outputs of the science that is based on those data. In the past, the HRS has flagged dodgy records with LERCs. Some have taken action, but others have ignored our advice - so we find it best not to access those datasets because we have to go through the same process time and again. We no longer bother to provide that feedback except where we know LERC managers will act.




Comments:

  1. Given that we in the birding world at the county level have to have robust processes to validate records, data coming in from unverified sources will be suspect. The records needing validation include not only scarce birds requiring a description but also large counts in odd places, and out-of-place or odd-date records.