Throughout the project, meats from different farms are presented to tasting panels. The properties of the meat are also determined in the laboratory. Spoolder assures us that “we carry out a thorough chemical and physical analysis of the meat. We want to establish a link with the origin of the meat. For example, we’re looking for isotopes that show whether a pig ate Spanish grass or Polish grass.”
“In general, many different types of data are collected in the different European countries. These include scores from the animal welfare questionnaires that researchers conduct on farms, data from the taste panels, and data from laboratory studies.” All of this data must also be linkable: after all, you want to be able to check whether meat from a German organic pig farm has a different taste and composition than, for example, conventional pork.
And then all data must also comply with the General Data Protection Regulation (GDPR), which means that farmers’ personal data may not be visible. Spoolder: “We try to guarantee as much anonymity as possible. Each farmer receives a number and a country name. But only the investigators of the country in question know which farmer is hiding behind this code.”
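The coding scheme Spoolder describes can be sketched in a few lines. This is only an illustration of the principle, not the project's actual system: the farmer names, code format, and function name below are all hypothetical.

```python
def assign_codes(farmers, country):
    """Give each farmer an anonymous code of the form <COUNTRY>-<number>.

    The mapping from code back to identity (the key table) would stay with
    the national investigators only; shared data carries just the codes.
    """
    key_table = {}  # code -> farmer identity, kept confidential per country
    for i, farmer in enumerate(farmers, start=1):
        code = f"{country}-{i:04d}"
        key_table[code] = farmer
    return key_table

# Hypothetical German farms; only the German team would hold this table.
key = assign_codes(["Mustermann Hof", "Bauer Schmidt"], "DE")
```

Datasets leaving the country would then refer to "DE-0001" rather than a farm name, satisfying the GDPR constraint while keeping records linkable.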
WUR coordinates the project and is also building the data warehouse where all the data will be collected. How do you handle something like that? It is what Wouter Hoenderdaal, database developer at Wageningen Food Safety Research, is working on. Hoenderdaal: “The project is still in the start-up phase, but the process before data collection is at least as important. It is essential that everyone measures the same thing and submits the data in the same way. We therefore send all researchers a specific format in which they can enter their data.”
In the mEATquality project it is therefore important that the data can be linked to each other. Hoenderdaal: “Part of the animal goes to the laboratory, another part of the same animal goes to the tasting panels. So we need an airtight coding system that lets you trace where a meat sample comes from: which animal, which farm, which region and which country.”

Two parts

The data warehouse consists of two parts and a sort of portal. The latter is a file system in which the researchers themselves can upload their raw data. They only receive access rights to their own folder. Hoenderdaal: “All files are also password protected. So user X can only read in his own folder, and even there only his own files.”
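An airtight sample code of the kind Hoenderdaal describes could look something like this. The field layout (country, farm, animal, sample portion) follows his description, but the concrete format is an assumption for illustration only:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SampleCode:
    """Hypothetical traceable sample code: country, farm, animal, portion."""
    country: str   # ISO-style country code, e.g. "DE"
    farm: int      # anonymous farm number (see the GDPR coding above)
    animal: int
    sample: int    # e.g. 1 = laboratory portion, 2 = tasting-panel portion

    def encode(self) -> str:
        return f"{self.country}-F{self.farm:03d}-A{self.animal:04d}-S{self.sample:02d}"

    @staticmethod
    def decode(code: str) -> "SampleCode":
        country, farm, animal, sample = code.split("-")
        return SampleCode(country, int(farm[1:]), int(animal[1:]), int(sample[1:]))

c = SampleCode("DE", 12, 345, 1)
assert SampleCode.decode(c.encode()) == c  # the code round-trips losslessly
```

The point of such a scheme is that the lab portion and the tasting-panel portion of the same animal share everything except the final sample field, so results can always be joined back together.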
The data warehouse itself consists of a development database and a production database. Hoenderdaal: “We build and test on the development database, and when we see that everything is correct, all data is pushed to the production database. Researchers do not have access to the development and production databases, only to the file system. This prevents the database from being polluted with unusable data or, worse, from being partially deleted by an inattentive researcher. We created the database in Postgres, an open-source relational database in which data is stored in a structured way.”
The transfer of data from the file system to the development database is automated. “We write scripts in Python so that researchers’ files automatically end up in the right place in the database. The scripts can’t prevent a faulty file from being uploaded to the file system, but they can recognize it and stop it from entering the database. This way we prevent incorrect data from being loaded. We build everything to be foolproof; after all, not all researchers are equally tech-savvy.”
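A minimal sketch of such a gatekeeping script is shown below. The required column names are invented for illustration; the real mEATquality submission format is not public:

```python
import csv
import io

# Hypothetical required columns for the shared submission template.
REQUIRED = {"sample_code", "country", "measurement", "value"}

def validate_upload(text: str):
    """Check an uploaded CSV before it may enter the database.

    Returns (ok, errors): files with missing columns or empty required
    fields are flagged and held back instead of being loaded.
    """
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        return False, [f"missing columns: {sorted(missing)}"]
    for lineno, row in enumerate(reader, start=2):
        empty = [col for col in REQUIRED if not (row[col] or "").strip()]
        if empty:
            errors.append(f"line {lineno}: empty fields {empty}")
    return not errors, errors

ok, errs = validate_upload(
    "sample_code,country,measurement,value\nDE-0001,DE,pH,5.6\n"
)
# A file missing the 'value' column, or with blank fields, would be rejected.
```

In a real pipeline a script like this would run on every new file in the portal and only pass clean files on to the development database.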
Researchers need to be able to query the production database so they can compare their own data with that of others, yet they cannot access it directly. How do Hoenderdaal and his colleagues solve this? “We hope they will mainly want standard datasets, in which some of the data is combined. We can prepare those for them in a secure folder. If a researcher has a very specific question, we will compile a custom dataset for them.”
What are the pitfalls of this kind of international data sharing? Hoenderdaal: “Language can cause problems. The working language is English, which means errors can creep into the translation from a native language into English. The researchers have now built in a check themselves: they first translate an English text into German, and then back again. If the second English text says the same as the first, they know it’s okay.”
A second pitfall is system-related: a relational database like Postgres is well suited to storing structured data, but less so to unstructured data like PDFs or text snippets. Hoenderdaal: “You can receive structured data about a certain meat sample, for example from the lab, but perhaps also scans. After all, not everything can be captured in structured data. We still need to find a way to tie this unstructured data to the structured data. So there is a lot for us to learn on this project.”
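One common way to bridge that gap, offered here purely as an assumption and not as the project's chosen solution, is to store the file itself on disk and keep only its path and checksum in a relational row linked to the sample code:

```python
import hashlib
import tempfile
from pathlib import Path

def register_scan(sample_code: str, pdf_bytes: bytes, store: Path) -> dict:
    """Save an unstructured file and return a row for a hypothetical
    'attachments' table, linked to structured data via the sample code."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    path = store / f"{sample_code}-{digest[:8]}.pdf"
    path.write_bytes(pdf_bytes)
    return {"sample_code": sample_code, "path": str(path), "sha256": digest}

# Demonstration with a throwaway directory and dummy PDF bytes.
with tempfile.TemporaryDirectory() as tmp:
    row = register_scan("DE-0001", b"%PDF-1.4 dummy scan", Path(tmp))
```

The checksum lets the database detect whether a referenced file has been altered or lost, which matters once scans and lab records must stay in sync for years.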
If it is up to Hans Spoolder, the mEATquality project will form the basis of a large European database on the origin of meat. Spoolder: “A European database of this type already exists for wine. The company Oritain creates a database on beef and lamb. They are interested in our data on chickens and pigs. Meat traceability is important for preventing meat fraud; think of the horsemeat scandal, but also of meat labelled organic when it actually comes from intensive farming.”
Spoolder: “Detecting meat fraud is a secondary step in our project. We don’t have the budget to expand it further, but we could eventually contribute to an international meat database.”