A collection of Python scripts for the automatic search of biomedical publications, followed by the extraction and processing of information about their authors, using the PubMed database.
The goal of this project is to identify individual actors in the public debate of scientific issues and to determine their position within science. It specifically aims to establish:
- whether an actor can be considered a contributing expert in a given subject area, as demonstrated through published research papers
- whether an actor can be considered a contributing expert who, however, has not published research papers in the given subject area
See: Python Installation und Ausführung (setup and run instructions, in German)
- Add topics (`topics.csv`)
- Add actors in a CSV file
- Run `get_actors_publications.py`
  - Collects all publications for every actor (via PubMed)
  - Filters publications (via `journal_ranking.csv`)
  - Generates an author file for every actor (one row = one publication)
- Review the generated author files
  - Particularly the values `Active` and `Authorship Confidence`
- Adjust the input files if necessary
  - Adjust the search string for the topic/actors
  - In case of automatic authorship rating: improve the `Location` and `Institution` lists
  - On changes: execute `get_actors_publications.py` again
- Run the evaluation script
  - Uses all author files
  - Generates a complete list with all authors and their metrics (one row = one author)
Both scripts ask for the topic (see `topics.csv`) upon launch, and whether publications should be retrieved using the author's name or ORCID.
The INPUT folder contains all actor lists and the journal ranking. The most important file here is `topics.csv`, which serves as the configuration file for the scripts.
File format: csv
First row: column names
Following rows: one row = one topic

- `short` – an acronym for the topic, used upon script launch
- `search string` – used to check whether a research paper was published in the given subject area: a PubMed search is initiated that combines the publication ID with this search string, and a hit means the publication belongs to the topic (see the code, and the sketch after this list, for further details)
- `actors list file` – file name of the list of actors in the given subject area
File format: csv
First row: column names
Following rows: one row = one actor

- `AktID` – unique ID of an actor
- `Name` – precise name of the person
- `Position` (optional)
- `Institution` (optional)
- `Label` (optional, e.g. doctor, expert, researcher)
- `InstitutionList` – list of institutions at which the actor published research papers (for automatic authorship evaluation)
- `LocationList` – list of known cities/countries in which the actor published research papers (for automatic authorship evaluation)
- `PubMedSearch` – search string used to identify the person in PubMed; a simple string or a complex query
`journal_ranking.csv` – an export from the Scimago Journal & Country Rank portal.
- `author_[topic]/` subfolder – contains all author files for the specified topic
- `result_[topic].csv` – contains the final evaluation for all authors of the specified topic
File format: csv
First row: column names
Following rows: one row = one publication

- `Active` – 0 or 1; controls whether the publication is considered in the complete evaluation of a subject area
- `Authorship Confidence` – result of the automatic authorship evaluation (see below)
- `ID (LINK)` – ID of the publication in the PubMed database; it is a hyperlink and can be opened with Ctrl+Click
- `Topic` – acronym of the subject area, set if the publication could be assigned to the subject area (via the subject area search string)
- `Title` – title of the publication
- `Citations` – number of citations in other publications
- `Date` – date of publication
- `Author Position` – first/middle/last
- `Co-Author count` – number of coauthors
Because the data in the PubMed database are inconsistent, the algorithm cannot always guarantee that the person (the actor) under consideration is in fact the author of a found research paper rather than, for instance, a person with similar initials.
To address this issue, the script `get_actors_publications.py` provides an automatic evaluation of authorship. It can be turned on or off upon launch.
Several indicators increase the confidence in the authorship. When an indicator applies, its value is added to the `Authorship Confidence` score:

- `+0.4` – the first name matches
- `+0.3` – an institution matches
- `+0.2` – a location matches
The institution and location data are taken from the list of actors: the more data the list contains, the more meaningful the `Authorship Confidence` value becomes.
The `Authorship Confidence` value has a direct impact on the `Active` value: if `Authorship Confidence` equals zero, `Active` is also set to zero. In this case, the authorship needs to be reviewed manually. If the person under consideration is indeed the author of the publication, add the institution and location to the list of actors; this yields a better `Authorship Confidence` value in the next run of the script.
- Conception: Prof. Dr. Markus Lehmkuhl (KIT & FU Berlin), Dr. Evgeniya Boklage (FU Berlin)
- Implementation: Yannick Milhahn (TU Berlin & FU Berlin)
Distributed under GPLv3 License. See LICENSE for more information.