Using the Web of Science Excel Converter tool
Library collection and scientometric projects are based on the analysis of publication metadata, typically coming from an article index or database. At Iowa State University (ISU), we are working to support more open publishing models, and a critical step in this is to analyze data from publications that have a Corresponding Author1 from ISU.
Web of Science is a subscription-based citation index that includes information on the Corresponding Author (CA) of each publication, but this detail is not provided to the user as a searchable term or in a limiting facet. This means that one must first export the results and run post-processing to filter the list of papers from those with an ISU author anywhere in the string down to only those papers with an ISU Corresponding Author.
Iowa State University publishes around 4,000 articles per year which are indexed in Web of Science. (The database takes a selective approach to content  and does not index the contents of every journal; therefore, the records of some ISU-authored publications are not included in this database.) To get the CA information for ISU-authored records, I must export in batches of 500 (the export limit), then reconstruct the multiple individual files. While not a huge deal, it is an annoyance to run eight separate export commands, save the files, and re-assemble them after the fact.
The Web of Science API offers direct access to the underlying data instead of interacting with the records through the web interface. Since CA is not offered as a facet in the web interface, this work requires access to the exported data anyway; using the API is faster, more convenient, and can offer even richer data than is available through the web.
Unfortunately, I have always found the Web of Science API to be difficult to use and understand. There are varying levels of API access possible, and these must be included in the library’s negotiated contract with Clarivate. Even when access has been agreed upon, I have found the approval procedure to be cumbersome, slow, and opaque. I had to reach out to our sales rep to get my request expedited, and even after I have was granted an API key (a secret string of 40 characters that uniquely identifies me), I still run into errors.
Python libraries such as wos for Web of Science (or pybliometrics for Scopus) attempt to provide a structure to automate the API query process through a script, but I have never been able to get them to work.
I consider myself fairly tech-savvy, but without a Computer Science degree, I quickly get lost at this step. Generous colleagues from around the world have been kind enough to share their Jupyter Notebooks or Python code with me as demonstrations, but these are still uniquely customized and tailored to their specific institutions and use cases, and are not completely generalizable to adapt locally.
It is an enormous jump to go from interacting with the WoS web interface to an environment that requires you to understand GET requests, query strings, and unpacking nested JSON responses.
Note: I find the database Dimensions to have the most user-friendly API, with clear documentation, multiple routes to access, and plenty of worked examples to start from. Unfortunately, Dimensions relies on publishers to provide CA information in the publication metadata, and not all publishers include this in their records.
It turns out Clarivate already has a tool to help facilitate pulling this data, the WOS API Excel Converter.
I had never heard of this tool and only learned of it when working through a separate issue with Clarivate Support. For all its apparent usefulness, I have never seen it referenced anywhere and am surprised that Clarivate does not make it more widely known.
Launched in February 2021, this small tool simplifies the process of accessing the WoS API data and helps the user bridge that gap when going from a familiar point-and-click site to a more sophisticated API request. A bonus is that it exports the data directly to Excel - no more messing with unpacking JSON and extracting the data values you want.
Per the product page, https://github.com/clarivate/wos-excel-converter:
This is a small and easy-to-use desktop application that allows exporting Web of Science API Expanded data in Excel with a configurable and flexible data export structure.
If you visit the above link, you might be confused (as I was) on what you’re actually supposed to run. The real link is a “secret” page, https://github.com/clarivate/wos-excel-converter/releases, which has the actual executables to download and install on your specific system.
Step 0 is to sign in to the Clarivate Developer Portal and apply for an API Key. As I mentioned above, this may take several days to a few weeks to complete.
Once you are approved and assigned, Step 1 is to paste that key in the tool and click Verify. You can see I am authenticated and have already retrieved some of the 1,000,000 records my account is approved for.
It would be nice if the tool remembered the API Key across sessions, but for now, you do need to cut and paste that string every time, so keep it in a place that is easily acccessible.
Next, construct the query using the WoS query parameters. These are the same as in the Advanced Search from the web interface of WoS.
In the example below, I am retrieving records that have an author from Iowa State University (OG=Organization Enhanced) anywhere in the author string, and were published in 2020 (PY=Published Year).
You can also specify which database to search and the language used.
The Validate button helps preview how many records the query will find if actually run. In this case, around 4,000 is what I expect, while a 0 is a good indication that something went wrong.
Notably, the Publisher [PU] tag is not supported in either the API or the Advanced Search. This would be a welcome addition since my work often focuses on ISU’s publishing output with a given publisher. For now, I still need to export everything ISU produced in a given year, then filter on the PU field that appears within the exported data.
Step 3 has some options regarding the JSON output and how the data is split across tabs in the resulting Excel spreadsheet. I leave this as the default, though a motivated user could dive in to understand this more and customize it to their liking.
However, I believe any changes would also be reset when closing and re-launching the tool, so those customizations would need to be re-done each time.
Finally, Step 4 defines the path and filename of the output before executing. Another nice feature is the double-sided slider, which lets you export only a selection of the records returned by your query.
“Start export” actually runs the query and provides an estimate of the time remaining. It handles the looping and segmenting of sections of records, combining them into one single output file for you.
Once the export is complete, the file will be saved and data displayed across multiple tabs, as defined in Step 3 above.
To begin, your institution must have access to both Web of Science and the API Expanded. Access to the WoS website does not necessarily mean you also have access to the API. If your library does, you must then individually apply and sign up to get an API key for input into Step 1 of the converter tool.
This tool is not perfect. The filenames seem to have a bug in assigning the export date (issue has been reported), and the data is split across multiple tabs in the resulting Excel sheet in a convoluted and confusing way. The Publisher facet would be useful to make available in the query language, and making the API Key and any JSON changes sticky would be a nice user experience improvement.
However, I do find this tool to be a welcome improvement, and more people could benefit from knowing about it. Getting direct access to the API through Python or R code gives greater control over what exactly you extract, how it is structured, and what is returned, but for those users (like me) who are ready to move beyond the web interface but still find the script access intimidating, this tool is a useful middle ground.