Discussion
laurimyllyvirta Feb 17, 2012 12:12 PM
Duplicate entries in the E-PRTR database

I was trying to work with the downloadable E-PRTR database, but ran into a problem where the same facility has been entered with two or more different Facility IDs, causing double counting of some of the emissions. The online search and the EEA report "Revealing the costs of air pollution from industrial facilities in Europe" do not appear to suffer from this problem. Is there a systematic way to clear up the double entries, or is it possible to obtain a list of all the most recent Facility IDs to filter them from the database? I am working with way too many entries for any manual approach to be viable.

Thank you so much for your help,

Lauri Myllyvirta
Replies (5)
EEA Feb 17, 2012 12:46 PM
Hi Lauri.

Indeed, there was an issue with facilityIDs in the E-PRTR database and this has now been corrected. The updated dataset can be obtained from the EEA data service here:
http://www.eea.europa.eu/[…]/617DD46F-1162-40DF-9B15-0FE7AFB9C5F9

Hope this helps!
laurimyllyvirta Feb 17, 2012 05:07 PM
Thank you for the answer. However, the version you linked to is exactly the version I was using, and I found that the problem very much still exists.
EEA Feb 21, 2012 10:56 AM
Could we ask you to please provide an example of the problem? For your information, it is annual reporting and a facility may be in the database up to 5 times (once for each reporting year).
EEA Feb 21, 2012 01:22 PM
Depending on what you are trying to do, it may be helpful to have a look at "European Pollutant Release and Transfer Register (E-PRTR) - Summary tables" (the fourth file from the top on the linked page).

Otherwise an example may help to clarify if the issue is a mistake in the database. Thank you!
laurimyllyvirta Feb 21, 2012 03:01 PM
Thank you so much for pointing me to the summary tables. I did not expect "summary" to entail facility-level data and therefore missed the CSV file that had all the data compiled. The problem that I had with the big database seems to have been that the same facility has been entered with a different FacilityID and FacilityName in different years. For example, the 2007 FacilityID 46638 corresponds to FacilityID 74740 in 2008 and 2009, and I could not figure out a way to link the different reports to specific years. In addition, FacilityID 67166 in the summary tables also exists in the big database as 124908, with FacilityReportID 234623 filed under it; the latter FacilityID is not found in the summary tables. But the summary table "eprtr_v3.3_summary_flat_pollutants_media_year.csv" seems to have resolved my problems. Thank you!
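
For anyone who still needs the big database, the ID mismatch described above could be handled with a hand-built alias map. The following is only a minimal sketch: the two alias pairs are the examples from this thread, while the field names (`FacilityID`, `ReportingYear`, `Pollutant`, `Quantity`), the helper functions, and all quantities are made up for illustration and are not taken from the E-PRTR schema or data.

```python
# Alias map built by hand from the two examples in this thread:
# duplicate FacilityID -> the FacilityID to keep.
FACILITY_ALIASES = {
    46638: 74740,    # 2007 ID -> ID used in 2008 and 2009
    124908: 67166,   # ID missing from the summary tables -> ID present there
}


def canonical_id(facility_id, aliases=FACILITY_ALIASES):
    """Return the ID to keep for a facility, or the ID itself if no alias is known."""
    return aliases.get(facility_id, facility_id)


def merge_records(records, aliases=FACILITY_ALIASES):
    """Re-key records by canonical FacilityID, keeping one row per
    (facility, year, pollutant) and setting later duplicates aside for review."""
    merged = {}
    duplicates = []
    for rec in records:
        key = (canonical_id(rec["FacilityID"], aliases),
               rec["ReportingYear"],
               rec["Pollutant"])
        if key in merged:
            # Same facility, year and pollutant seen under another ID:
            # do not add it again, just record it.
            duplicates.append(rec)
        else:
            merged[key] = rec["Quantity"]
    return merged, duplicates


# Toy records: the quantities are invented, not real E-PRTR values.
records = [
    {"FacilityID": 46638, "ReportingYear": 2007, "Pollutant": "CO2", "Quantity": 100.0},
    {"FacilityID": 74740, "ReportingYear": 2008, "Pollutant": "CO2", "Quantity": 110.0},
    {"FacilityID": 74740, "ReportingYear": 2009, "Pollutant": "CO2", "Quantity": 120.0},
    {"FacilityID": 67166, "ReportingYear": 2008, "Pollutant": "CO2", "Quantity": 50.0},
    {"FacilityID": 124908, "ReportingYear": 2008, "Pollutant": "CO2", "Quantity": 50.0},
]

merged, duplicates = merge_records(records)
```

Keeping the first row and setting later ones aside, rather than summing them, is a deliberate choice here: summing is exactly what produces the double counting. Whether the first row is the right one to keep would need checking against the actual reports.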
 