Skip to main content
SearchLoginLogin or Signup

An Easier Way to Access the Web of Science API

Using the Web of Science Excel Converter tool

Published onDec 09, 2021
An Easier Way to Access the Web of Science API
·

Analyzing bibliographic data

Library collection and scientometric projects are based on the analysis of publication metadata, typically coming from an article index or database. At Iowa State University (ISU), we are working to support more open publishing models, and a critical step in this is to analyze data from publications that have a Corresponding Author1 from ISU.

Web of Science is a subscription-based citation index that includes information on the Corresponding Author (CA) of each publication, but this detail is not provided to the user as a searchable term or in a limiting facet. This means that one must first export the results and run post-processing to filter the list of papers from those with an ISU author anywhere in the string down to only those papers with an ISU Corresponding Author.

Iowa State University publishes around 4,000 articles per year which are indexed in Web of Science. (The database takes a selective approach to content [1] and does not index the contents of every journal; therefore, the records of some ISU-authored publications are not included in this database.) To get the CA information for ISU-authored records, I must export in batches of 500 (the export limit), then reconstruct the multiple individual files. While not a huge deal, it is an annoyance to run eight separate export commands, save the files, and re-assemble them after the fact.

The Web of Science API offers direct access to the underlying data instead of interacting with the records through the web interface. Since CA is not offered as a facet in the web interface, this work requires access to the exported data anyway; using the API is faster, more convenient, and can offer even richer data than is available through the web.

Sounds great, just use the WoS API then

Unfortunately, I have always found the Web of Science API to be difficult to use and understand. There are varying levels of API access possible, and these must be included in the library’s negotiated contract with Clarivate. Even when access has been agreed upon, I have found the approval procedure to be cumbersome, slow, and opaque. I had to reach out to our sales rep to get my request expedited, and even after I have was granted an API key (a secret string of 40 characters that uniquely identifies me), I still run into errors.

Python libraries such as wos for Web of Science (or pybliometrics for Scopus) attempt to provide a structure to automate the API query process through a script, but I have never been able to get them to work.

I consider myself fairly tech-savvy, but without a Computer Science degree, I quickly get lost at this step. Generous colleagues from around the world have been kind enough to share their Jupyter Notebooks or Python code with me as demonstrations, but these are still uniquely customized and tailored to their specific institutions and use cases, and are not completely generalizable to adapt locally.

It is an enormous jump to go from interacting with the WoS web interface to an environment that requires you to understand GET requests, query strings, and unpacking nested JSON responses.

Python code snippet of a function def_author, with for loops and if statements

Example of an advanced Web of Science API parsing script, ©Tran Ha

Note: I find the database Dimensions to have the most user-friendly API, with clear documentation, multiple routes to access, and plenty of worked examples to start from. Unfortunately, Dimensions relies on publishers to provide CA information in the publication metadata, and not all publishers include this in their records.

Pull quote: It is an enormous jump to go from interacting with the WoS web interface to an environment that requires you to understand GET requests, query strings, and unpacking nested JSON responses.

Enter the WOS API Excel Converter tool

It turns out Clarivate already has a tool to help facilitate pulling this data, the WOS API Excel Converter.

I had never heard of this tool and only learned of it when working through a separate issue with Clarivate Support. For all its apparent usefulness, I have never seen it referenced anywhere and am surprised that Clarivate does not make it more widely known.

Launched in February 2021, this small tool simplifies the process of accessing the WoS API data and helps the user bridge that gap when going from a familiar point-and-click site to a more sophisticated API request. A bonus is that it exports the data directly to Excel - no more messing with unpacking JSON and extracting the data values you want.

Per the product page, https://github.com/clarivate/wos-excel-converter:

This is a small and easy-to-use desktop application that allows exporting Web of Science API Expanded data in Excel with a configurable and flexible data export structure.

If you visit the above link, you might be confused (as I was) on what you’re actually supposed to run. The real link is a “secret” page, https://github.com/clarivate/wos-excel-converter/releases, which has the actual executables to download and install on your specific system.

1. Get Access

Step 0 is to sign in to the Clarivate Developer Portal and apply for an API Key. As I mentioned above, this may take several days to a few weeks to complete.

Once you are approved and assigned, Step 1 is to paste that key in the tool and click Verify. You can see I am authenticated and have already retrieved some of the 1,000,000 records my account is approved for.

It would be nice if the tool remembered the API Key across sessions, but for now, you do need to cut and paste that string every time, so keep it in a place that is easily acccessible.

Screenshot of the tool on Step 1: enter the API key and click Verify

Enter the API Key and click Verify

2. Query

Next, construct the query using the WoS query parameters. These are the same as in the Advanced Search from the web interface of WoS.

In the example below, I am retrieving records that have an author from Iowa State University (OG=Organization Enhanced) anywhere in the author string, and were published in 2020 (PY=Published Year).

You can also specify which database to search and the language used.

Screenshot of the tool on Step 2: choose database, language, and construct query boxes

In Step 2 of the tool, specify the database and language, then construct the query

The Validate button helps preview how many records the query will find if actually run. In this case, around 4,000 is what I expect, while a 0 is a good indication that something went wrong.

Notably, the Publisher [PU] tag is not supported in either the API or the Advanced Search. This would be a welcome addition since my work often focuses on ISU’s publishing output with a given publisher. For now, I still need to export everything ISU produced in a given year, then filter on the PU field that appears within the exported data.

3. Customize the output

Step 3 has some options regarding the JSON output and how the data is split across tabs in the resulting Excel spreadsheet. I leave this as the default, though a motivated user could dive in to understand this more and customize it to their liking.

However, I believe any changes would also be reset when closing and re-launching the tool, so those customizations would need to be re-done each time.

Screenshot of the tool on Step 3: configure the nested JSON result

Define how the JSON output gets split across multiple Excel tabs

4. Specify filename and run the export

Finally, Step 4 defines the path and filename of the output before executing. Another nice feature is the double-sided slider, which lets you export only a selection of the records returned by your query.

“Start export” actually runs the query and provides an estimate of the time remaining. It handles the looping and segmenting of sections of records, combining them into one single output file for you.

Screenshot of the tool on Step 4: declare filename and run query. Purple progress bar 25% of the way through, 3min 5sec remaining.

Start Export and let the tool run

Once the export is complete, the file will be saved and data displayed across multiple tabs, as defined in Step 3 above.

Animated gif showing the four tabs of the resulting Excel file: Research Output, Addresses and Authors, Authors without Addresses, Grants

Resulting data in Excel

Limitations

To begin, your institution must have access to both Web of Science and the API Expanded. Access to the WoS website does not necessarily mean you also have access to the API. If your library does, you must then individually apply and sign up to get an API key for input into Step 1 of the converter tool.

This tool is not perfect. The filenames seem to have a bug in assigning the export date (issue has been reported), and the data is split across multiple tabs in the resulting Excel sheet in a convoluted and confusing way. The Publisher facet would be useful to make available in the query language, and making the API Key and any JSON changes sticky would be a nice user experience improvement.

However, I do find this tool to be a welcome improvement, and more people could benefit from knowing about it. Getting direct access to the API through Python or R code gives greater control over what exactly you extract, how it is structured, and what is returned, but for those users (like me) who are ready to move beyond the web interface but still find the script access intimidating, this tool is a useful middle ground.

Comments
0
comment
No comments here
Why not start the discussion?