Sunday, June 19, 2011

Review of the Publish or Perish Bibliometrics Software

Publish or Perish (PoP) is a freeware interface for deriving citation metrics from Google Scholar (GS). I downloaded it, wrapped it up as a Mac application using Wineskin and ran it. The three most useful features of the software are:

1. It automatically computes a bunch of statistics such as the h-index from the GS results.

2. It allows you to merge the multiple entries that GS sometimes generates for a single paper. For example, my 1996 paper in World Development is scattered across multiple entries. This reduces the citation count of the main entry and introduces some bogus publications into the calculation of my h-index. By dragging and dropping the various entries returned, you can merge them into a single entry. The h-index and other statistics are recomputed automatically.

3. It allows you to sort GS results by title, publication, year of publication, and authors. The standard Google Scholar output is only presented in order of the number of citations, so this could be really useful.
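To make the first two features concrete, here is a minimal Python sketch of how an h-index is computed and how split entries can distort it. It is not PoP's own code; the function names, the titles, and the citation counts are all made up for illustration.

```python
from collections import defaultdict


def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def merge_duplicates(entries, canonical):
    """Sum citations of entries that are really the same paper.

    `entries` is a list of (title, citations) pairs; `canonical` maps a
    variant title to the title of the main entry (hypothetical structure).
    """
    totals = defaultdict(int)
    for title, cites in entries:
        totals[canonical.get(title, title)] += cites
    return dict(totals)


# Illustrative data only: one paper split across three entries.
entries = [
    ("Paper A", 10),
    ("Paper A (variant 1)", 3),
    ("Paper A (variant 2)", 3),
    ("Paper B", 3),
    ("Paper C", 2),
]
canonical = {
    "Paper A (variant 1)": "Paper A",
    "Paper A (variant 2)": "Paper A",
}

h_before = h_index(cites for _, cites in entries)
merged = merge_duplicates(entries, canonical)
h_after = h_index(merged.values())
```

In this made-up case the two stray entries count as extra publications and push the unmerged h-index to 3, while the merged list gives 2 and credits the main entry with all 16 citations.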

The most common type of analysis is likely to be one for the impact of individual authors, which is the normal application of the h-index and other statistics. PoP can also analyze the impact of journals by computing their h-index and other metrics, and it can run general literature and citation searches. The Publish or Perish book gives lots of examples of the various types of searches.

Accuracy in computing citation analyses for individual authors will depend on two things:

1. How common the name of the person is.

2. If the name is common, how interdisciplinary the person is.

For someone with a unique two-initial name (e.g. D. I. Stern) the results are fairly accurate. Searching for D. I. Stern captures about 83% of my total citations on Google Scholar. The main discrepancy is again the World Development paper, which is listed under D. Stern. There is no way to merge the results of two different searches. Searching for "D Stern" excluding "D* Stern" (which also excludes "DI Stern") generates more than 1000 hits by itself, and only the first 1000 are counted. Narrowing the search by subject area excludes a lot of my publications because of my interdisciplinarity. Normally I run one GS search for "DI Stern" in all subject areas and then a second search for "D Stern" excluding "DI Stern" in Business and Economics only.* You can do these two searches in PoP, but you can't merge them and get the h-index etc. for the combined search. Of course, you can copy the retrieved data, paste it into a spreadsheet, and do this analysis yourself.
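The spreadsheet step can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`title`, `year`, `cites`) and all the data are hypothetical and do not reflect PoP's actual export format. A record returned by both searches is treated as the same GS entry and kept once, not summed.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def combine_searches(*result_sets):
    """Combine several search results, keeping each paper only once.

    Papers are matched on (lower-cased title, year). If the same paper
    shows up in more than one search, the record with more citations wins.
    """
    combined = {}
    for results in result_sets:
        for rec in results:
            key = (rec["title"].strip().lower(), rec["year"])
            if key not in combined or rec["cites"] > combined[key]["cites"]:
                combined[key] = rec
    return list(combined.values())


# Made-up results standing in for the "DI Stern" and "D Stern" searches.
search_di = [
    {"title": "Paper A", "year": 2004, "cites": 12},
    {"title": "Paper B", "year": 1998, "cites": 5},
    {"title": "Paper C", "year": 2010, "cites": 1},
]
search_d = [
    {"title": "Paper D", "year": 1996, "cites": 9},
    {"title": "paper b ", "year": 1998, "cites": 5},  # same paper as Paper B
    {"title": "Paper E", "year": 2001, "cites": 4},
]

combined = combine_searches(search_di, search_d)
combined_h = h_index(rec["cites"] for rec in combined)
```

With this invented data the two searches give h-indices of 2 and 3 on their own, while the combined, de-duplicated list gives 4, which is why an h-index from a single search can understate the total.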

So for analyses of scholars with common single-initial names (and mine is hardly the most common!) and publications across different disciplinary areas, you will likely not find the software very user-friendly at the moment, though it is still better than using Google Scholar directly. For people like me with a unique combination of two initials it will be fairly accurate in most cases, and using PoP to do a series of searches and then combining them in a spreadsheet should be easier than using GS directly. If the researcher in question has a unique single-initial name like F. Jotzo, J. Pezzey, or T. Kompas, then the software is going to be a lot more useful and accurate.

* The main issue here seems to be that if you exclude any one subject area from the Google Scholar search, articles in Nature are not included in the results. Two of the articles in my h-index are in Nature.
