Thursday, 27 August 2015

RI Tips - Cleaning up messy data from regulatory agencies

From time to time I come across tools and techniques that prove extremely valuable for getting to grips with the many data sources needed for effective regulatory intelligence work. In this article, I introduce OpenRefine and show how it can help clean up messy data published on regulatory agency websites.

In this example, I have provided a brief video demonstration of how to use OpenRefine to clean up ATC codes in the published list of registered human medicines on the EMA's website. To see the detail more clearly, you may find it helpful to view the video in full-screen mode, or directly on the YouTube site.
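For readers who prefer a script to a point-and-click tool, a similar kind of clean-up can be sketched in Python. This is not the method shown in the video, only a minimal illustration under stated assumptions: the function name `clean_atc_cell` is hypothetical, the separators (commas, semicolons, whitespace) are a guess at how a messy cell might look, and the pattern simply encodes the standard ATC structure of letter, two digits, letter, letter, two digits (for example A10BA02), accepting codes truncated at any of the five levels.

```python
import re

# ATC codes have up to five levels: A, A10, A10B, A10BA, A10BA02.
ATC_PATTERN = re.compile(r"[A-Z](\d{2}([A-Z]([A-Z](\d{2})?)?)?)?")


def clean_atc_cell(raw):
    """Split one messy cell into a de-duplicated list of well-formed ATC codes."""
    if raw is None:
        return []
    # Assumed separators: commas, semicolons and any run of whitespace.
    tokens = re.split(r"[,;\s]+", raw.strip())
    cleaned = []
    for token in tokens:
        code = token.upper()  # normalise case before validating
        # Keep only tokens that are entirely a valid ATC code, once each.
        if ATC_PATTERN.fullmatch(code) and code not in cleaned:
            cleaned.append(code)
    return cleaned


if __name__ == "__main__":
    print(clean_atc_cell(" a10ba02 ; A10BA02, L01XE01"))
```

Anything that does not look like an ATC code, such as a placeholder like "N/A", is dropped rather than corrected, so it is worth reviewing the rejected values separately before relying on the result.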

If you have any comments or questions, please contact me via the e-mail address in the video or via the comments on this blog.


  1. Very useful and well-done explanation on using this tool. I've done a lot of data clean-up with Excel and this certainly is a more straightforward way to address most of the issues one finds.

  2. Thank you for your feedback. This is much appreciated. I hope you find OpenRefine useful.

