TY - JOUR
T1 - Data leakage and loss in biodiversity informatics
AU - Peterson, A. Townsend
AU - Asase, Alex
AU - Canhos, Dora Ann Lange
AU - de Souza, Sidnei
AU - Wieczorek, John
N1 - Publisher Copyright:
© Townsend Peterson A et al.
PY - 2018
Y1 - 2018
AB - The field of biodiversity informatics is in a massive, "grow-out" phase of creating and enabling large-scale biodiversity data resources. Because perhaps 90% of existing biodiversity data nonetheless remains unavailable for science and policy applications, the question arises as to how these existing and available data records can be mobilized most efficiently and effectively. This situation led to our analysis of several large-scale biodiversity datasets regarding birds and plants, detecting information gaps and documenting data "leakage" or attrition, in terms of data on taxon, time, and place, in each data record. We documented significant data leakage in each data dimension in each dataset. That is, significant numbers of data records are lacking crucial information in terms of taxon, time, and/or place; information on place was consistently the least complete, such that geographic referencing presently represents the most significant factor in degradation of usability of information from biodiversity information resources. Although the full process of digital capture, quality control, and enrichment is important to developing a complete digital record of existing biodiversity information, payoffs in terms of immediate data usability will be greatest with attention paid to the georeferencing challenge.
KW - Biodiversity data
KW - Digitization
KW - Fitness for use
KW - Geographic referencing
KW - Georeferencing
KW - Informatics
KW - Place
KW - Taxon
KW - Time
KW - Usability
UR - http://www.scopus.com/inward/record.url?scp=85057489577&partnerID=8YFLogxK
U2 - 10.3897/BDJ.6.e26826
DO - 10.3897/BDJ.6.e26826
M3 - Article
AN - SCOPUS:85057489577
SN - 1314-2828
VL - 6
JO - Biodiversity Data Journal
JF - Biodiversity Data Journal
M1 - e26826
ER -