Thursday, 27 August 2015

RI Tips - Cleaning up messy data from regulatory agencies

From time to time I come across tools and techniques that I have found extremely valuable in getting to grips with the many data sources needed for effective regulatory intelligence work. In this article, I introduce OpenRefine and show how it can be used to clean up messy data published on regulatory agency websites.

In this example, I have provided a brief video demonstration of how to use OpenRefine to clean up ATC codes in the published list of registered human medicines on the EMA's website. To see the detail more clearly, you may find it helpful to watch the video in full-screen mode or directly on the YouTube site.
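
For readers who would rather script this kind of clean-up than work through it interactively, here is a minimal Python sketch of the same idea. It is not the OpenRefine workflow shown in the video. It assumes, purely for illustration, that the mess consists of stray whitespace, inconsistent letter case, and several codes packed into one cell separated by commas, semicolons, slashes or line breaks. The function name `clean_atc_cell` and the sample values are hypothetical; the pattern follows the standard ATC layout of up to seven characters (for example A, A10, A10B, A10BA, A10BA02).

```python
import re

# ATC codes have five levels: letter, two digits, letter, letter, two digits.
# Shorter codes (higher levels of the hierarchy) are accepted as well.
ATC_PATTERN = re.compile(r"^[A-Z](?:\d{2}(?:[A-Z](?:[A-Z](?:\d{2})?)?)?)?$")


def clean_atc_cell(raw):
    """Split one raw cell into normalised ATC codes and leftovers.

    Returns (valid, rejected): values that match the ATC layout after
    normalisation, and values that still do not look like ATC codes.
    """
    valid, rejected = [], []
    # Assumed separators between multiple codes in one cell.
    for part in re.split(r"[,;/\n]+", raw):
        # Remove all whitespace (including inside the code) and upper-case it.
        code = re.sub(r"\s+", "", part).upper()
        if not code:
            continue
        if ATC_PATTERN.match(code):
            valid.append(code)
        else:
            rejected.append(code)
    return valid, rejected


if __name__ == "__main__":
    # Hypothetical messy cell values, not taken from the EMA list.
    for cell in [" a10ba02 ; L01XE01,", "A10 BA02", "not available"]:
        print(repr(cell), "->", clean_atc_cell(cell))
```

Keeping the rejected values separate, rather than silently dropping them, makes it easy to review by hand whatever the pattern could not interpret.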

If you have any comments or questions, please contact me via the e-mail address in the video or via the comments on this blog.