
A guide to identifying author gender for bibliometric analyses

In 2018, I wrote a post for The Bibliomagician blog on identifying authors’ genders based on name analyses, prompted by a lively discussion on the LIS-Bibliometrics listserv. I’m reposting it here under a CC-BY license.

Recently on the LIS-Bibliometrics listserv, Ruth Harrison (Imperial College London) posed a question on behalf of a patron who was interested in identifying authors’ genders based upon the names listed on ~2,000 journal articles–too large a corpus for manual analysis. The community weighed in with many good suggestions for approaching a large-scale gender analysis of author names. We thought it would be helpful to share what Ruth learned (with permission from the original posters).

Here are some recommendations from LIS-Bibliometrics listserv members on the best places to find author names, APIs and software you can use to analyze gender, consultants you can hire the analysis out to, and previous approaches to analysis from other gender bibliometrics researchers.

Where to find author names lists

Web of Science was most often recommended as a good source for downloading full author names for publication lists. Programmatic access is also available via the Web of Science API, which can be licensed (libraries are usually the purchasers of Web of Science access for institutions, so contact your library to ask whether API access is included in your institution’s contract).

The new citation index Dimensions also makes authors’ full names available for download (though only for up to 50 papers at a time in the free version of the app) and via the Dimensions API, which is freely available to those doing scientometrics research.

On the other hand, listserv members pointed out that Scopus makes only authors’ first initials available, both in metadata downloads for publication lists and via the Scopus API. It is therefore unsuitable for finding author names on its own.

APIs and software

Automated gender analysis requires a bit of programming knowledge (or at least a willingness to learn). In particular, calling APIs and parsing publication metadata are two essential programming skills.

Gender API is a recommended service that allows you to look up the likely gender (and degree of confidence) for a particular name or list of names. For example, you could query the name “Diana” and learn that the name is classified as ‘female’, with a 93% accuracy rate based on a sample of 523 names. The providers offer clients for interacting with the API in PHP, Python, and several other programming languages.
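When consuming responses like this programmatically, it is worth thresholding on the reported confidence rather than accepting every classification. Here is a minimal Python sketch; the `gender`, `accuracy`, and `samples` fields mirror the “Diana” example above, but check the Gender API documentation for the exact response schema, and treat the threshold values as arbitrary starting points:

```python
def interpret_gender(result, min_accuracy=80, min_samples=50):
    """Accept the service's gender label only when its reported
    accuracy and sample size clear our (arbitrary) thresholds;
    otherwise fall back to 'unknown'."""
    confident = (result.get("accuracy", 0) >= min_accuracy
                 and result.get("samples", 0) >= min_samples)
    return result.get("gender", "unknown") if confident else "unknown"

# Illustrative payload mirroring the post's "Diana" example
diana = {"name": "Diana", "gender": "female", "accuracy": 93, "samples": 523}
print(interpret_gender(diana))  # female
```

Recording low-confidence names as “unknown” rather than discarding them also lets you report how much of your corpus resisted classification, which reviewers of gender bibliometrics studies increasingly expect.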

Namsor is another recommended API for looking up gender based on names, and it has the added feature of looking up ethnicity, as well. The free API allows for a limited number of monthly calls; you can also pay for API access to increase your API call limit.

GenderChecker is a recommended name list that can be downloaded for less than $200 USD, then used for analysis. As one listserv poster explained, “It’s not 100 percent accurate, but works for most American/European first names, especially if you have a large dataset. Be very careful with Chinese/Japanese/Korean names; most of the time they should be neutral unless you further checked.”

Genderize is yet another API; it was not recommended by listserv members, but it appears in several recent studies and reports. The Genderize database reportedly contains 216,286 distinct names across 79 countries and 89 languages. It is free to use but rate-limited to 1,000 requests per day.


Finally, the recommended Python package SexMachine allows you to look up the gender of around 40,000 names. For each name you query, you will get back one of the following categories: andy (androgynous), male, female, mostly_male, or mostly_female. For example, the query “Paul” returns “male”, whereas “Stacy” returns “mostly_female”.
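To give a sense of that interface, here is a toy stand-in in Python. The lookup table below is hypothetical (the real package ships a data file of roughly 40,000 names); only the five response categories come from the package itself, and the “unknown” fallback is my addition:

```python
# Toy lookup table standing in for SexMachine's bundled name data
NAME_GENDERS = {
    "paul": "male",
    "stacy": "mostly_female",
    "alex": "andy",  # andy = androgynous
}

def get_gender(name):
    """Return one of: andy, male, female, mostly_male, mostly_female,
    or 'unknown' for names outside the table."""
    return NAME_GENDERS.get(name.strip().lower(), "unknown")

print(get_gender("Paul"))   # male
print(get_gender("Stacy"))  # mostly_female
```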

Other gender researchers’ approaches

Listserv members also suggested that Ruth and her patron look to existing author gender analysis studies to find methods to borrow. Two in particular–a 2013 commentary from Nature, and a more recent Elsevier report–were the most mentioned:

The Nature study’s supplementary files include a thorough discussion of how to parse Web of Science names data for a variety of countries of origin.

One listserv respondent pointed out that the Elsevier report’s methodology (“Scopus Author Profiles were combined with gender-name data from social media, applied onomastics, and Wikipedia”) implies they didn’t have an easier way to identify author gender. More details on the study’s methods can be found in a report appendix. Particularly useful is its discussion of the various name-gender APIs’ suitability for multi-country analysis.


For those who want to hire out the work, Science-Metrix, Elsevier Analytical Services, and Digital Science Consultancy are all businesses that offer a variety of bibliometrics analysis services, which may include gender analysis. Contact the consultancies themselves for more information.


We would be remiss if we did not point out the challenges that face anyone seeking to do a study that determines a person’s gender, based on name alone. First and foremost, there is the question of ethics: does this kind of study rob authors of their right to be identified as a particular gender that might not match the expected gender for someone with their name?

Related to that issue is the problem of assuming a gender binary. Studies in this area tend to identify authors as “Male”, “Female”, “Unisex” (that is, a name used by both men and women), or “Unknown”. How can researchers accurately identify the gender of someone who identifies as genderqueer or agender, for example? It doesn’t seem possible using a simple names analysis, meaning that these kinds of studies should be approached and described with that caveat in mind.

Then there are technical issues related to the dearth of useful author metadata and regional name-gender data. “What about cases where the author info only includes initials?” one listserv respondent wrote. Other respondents pointed out that many name-gender analysis tools are biased towards Western names, making it difficult to do accurate analysis on authors from other areas of the world.
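One practical mitigation for the initials problem is to screen names before sending them to any gender service, so that initials-only records are counted as “unknown” rather than misclassified. A rough Python sketch follows; the “Surname, Given” format is an assumption about your metadata export, so adjust the parsing for your data source:

```python
import re

def is_initials_only(author):
    """Heuristically flag 'Surname, J.'-style records whose given
    name is just initials and so can't be gender-classified."""
    # Take the given-name portion of a "Surname, Given" string
    given = author.split(",", 1)[1].strip() if "," in author else author.strip()
    first = given.split()[0] if given.split() else ""
    # Initials look like "J", "J.", or "J.K."; single letters count too
    return bool(re.fullmatch(r"(?:[A-Z]\.?)+", first)) or len(first) <= 1

print(is_initials_only("Smith, J."))    # True
print(is_initials_only("Smith, Jane"))  # False
```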

Do you have suggestions for other ways to analyze gender based upon author names (or other freely available information online)? If so, please leave them in a comment below!

The Idealis’s next chapter

Cross-posted from

In August, I stepped down as a Founding Editor of The Idealis to focus on other projects. Nicky Agate is now The Idealis’s Editor in Chief.

The Idealis started out of community conversations around LIS scholarship and open access, and I’m proud of what we’ve accomplished so far: over 290 recommendations for freely available scholcomm research; more than 44,000 views and 400 subscribers; and most importantly a stellar team of 38 editors who have dedicated their time and expertise to finding the very best scholcomm research and sharing it with the community.

I’m very grateful to The Idealis’s volunteers, especially Nicky, for taking The Idealis forward. I look forward to seeing what The Idealis has in store, and will remain a faithful reader of the site for years to come. Thank you!

New article: “Scholarly Communication Librarians’ Relationship with Research Impact Indicators: An Analysis of a National Survey of Academic Librarians in the United States”

The Journal of Librarianship and Scholarly Communication just published “Scholarly Communication Librarians’ Relationship with Research Impact Indicators: An Analysis of a National Survey of Academic Librarians in the United States”.

This is the final publication related to a topic I’ve been working on since 2013 (!), when I first realized that although academic librarians were interested in research metrics, no one had yet studied how they actually use these kinds of indicators in their day-to-day jobs and in support of their own careers.

Along the way, I’ve been privileged to work with Sarah Sutton and Rachel Miles (and for a short period, Michael Levine-Clark) on a series of publications and presentations.

We ultimately learned that:

  • Your seniority/years of experience has no effect upon how familiar you are likely to be with various research metrics
  • Librarians and LIS educators alike are more familiar with traditional research impact metrics like the JIF than they are with altmetrics
  • Altmetrics are least likely to be used for collection development, though this is a use case I’ve been promoting for a long time
  • The more scholcomm-related duties you have in your job, the more you’ll use metrics of all kinds
  • Altmetric is the most popular altmetrics database used by librarians 😎

Sarah and Rachel plan to carry this path of research forward, expanding the scope of the study to include librarians worldwide, and also possibly looking at library promotion and tenure documents’ discussion of metrics. I wish them the very best and want to once again express my gratitude towards them as collaborators: Ladies, I hope to work with you both again in the future!

New article published, “Building and Sustaining a Grassroots Library Organization: A Three Year Retrospective of Library Pipeline”

Last month, an article I co-authored with Josh Finnell on the challenges of organizing librarians at the grassroots was published in International Information & Library Review.

We librarians love to bemoan the state of our professional organizations. (Who doesn’t?) But as board chair of Library Pipeline–a fledgling professional association for librarians–and a volunteer for both Pipeline’s Green Open Access Working Group and the Innovation in Libraries Awesome Foundation chapter, I have to say, running a professional organization is often tough and thankless work.

Luckily, it’s also rewarding work. Through Pipeline, I’ve gotten to know our profession’s best and brightest (including my co-author Josh), contributed personally to ‘opening up’ the LIS literature to all readers, and helped others vet and fund some amazing library-based projects from around the world.

The article that Josh and I wrote explains the brief history of Library Pipeline to date and where we’re headed next–while also pointing out some challenges that exist for others who might want to launch a grassroots library professional organization of their own. You can read it on the IILR website or check out the preprint on Figshare.

In case you’re wondering, Pipeline has been mostly quiet for the latter half of 2017 as the board worked to create our bylaws and revise our mission statement, so we’re better positioned to expand our work in 2018. To learn more about Library Pipeline and to become a volunteer, visit our website.

Finnell, J., and S. Konkiel. “Building and Sustaining a Grassroots Library Organization: A Three Year Retrospective of Library Pipeline.” Version 2, Figshare, 2 Jan. 2018, doi:10.6084/m9.figshare.5727084.v2.


Prototyping an article recommendation “bot” powered by Altmetric data

[Avatar: a cute robot. “Hello, world! I’m @SociologyBot!” Image CC-BY-SA clipartkid / Wikimedia]

I’ve recently launched a fun new project, @SociologyBot. It’s a Twitter account that recommends recently discussed research in the field of (you guessed it) sociology.

I’ve been wanting to explore the “altmetrics as a filter” idea for a long time. Being able to find not only disciplinary research but also the conversations surrounding research appeals to me, and I bet that other researchers would like access to that kind of information, too.

So, now I’m experimenting with a prototype “bot”, @SociologyBot. What sets @SociologyBot apart from other research recommendation bots on Twitter are a few things:

  • It’s a social sciences bot (which are surprisingly rare!)
  • It tweets out new and old research alike (not just the “recently published” stuff)
  • It surfaces both research and the conversations surrounding research
  • It’s not actually a bot (yet)!

I’m prototyping @SociologyBot right now, meaning it’s powered using a mix of manual and automated means. (Hence the scare quotes I keep putting around “bot”.) That’s because I want to understand if people actually care about this kind of a bot before I put a lot of time and energy into coding it! I guess you could call @SociologyBot a “minimum viable product”.

Here’s how @SociologyBot currently runs:

  1. I set up a search in Altmetric Explorer to find articles from the top ten sociology journals (as identified by SCImago Journal Rank) that have been mentioned in the last day. I use the journal shortlist as a basis not because I particularly care for finding only research published in the “top” journals, but because it makes the list of articles much more manageable.
  2. Explorer sends me a daily email summary of said articles.
  3. Based on the shortlist provided in the summary email from Explorer, I schedule new daily tweets using TweetDeck that include both the article with the highest Altmetric Attention Score (AAS) and a link to the Altmetric details page, where discussions of the articles can be found.
  4. Using TweetDeck’s scheduling as the automation, @SociologyBot then tweets out one scheduled article daily, at 8 am Mountain time.

Here’s how I plan to build @SociologyBot so that it’s fully automated:

  1. I write a script to query the Altmetric API every 24 hours to find sociology articles that have been mentioned online in the past day.
  2. The script takes the article with the most mentions and checks whether it’s already been tweeted about in the past month, as a safeguard against the same popular articles being constantly recommended.
  3. If it hasn’t, the script then composes a tweet that links to the article and its Altmetric detail page. If it has, the script will then check for the article with the next highest AAS that has not been recently tweeted, and will compose a tweet for that one instead.
  4. The script then posts the article and its Altmetric details page immediately to the @SociologyBot Twitter account.
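The core of steps 2–3 above, picking the top article that hasn’t been tweeted recently, could be sketched like this in Python. The field names (`doi`, `aas`) and the 30-day window are my assumptions, and the actual Altmetric API call and Twitter posting are omitted:

```python
from datetime import datetime, timedelta

def pick_article(articles, tweet_log, now=None, window_days=30):
    """From candidate articles (dicts with a 'doi' and an 'aas',
    the Altmetric Attention Score), return the highest-scoring one
    not already tweeted within the look-back window, else None.
    tweet_log maps DOI -> datetime of the last tweet about it."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=window_days)
    # Walk articles from highest to lowest score, skipping recent repeats
    for article in sorted(articles, key=lambda a: a["aas"], reverse=True):
        last_tweeted = tweet_log.get(article["doi"])
        if last_tweeted is None or last_tweeted < cutoff:
            return article
    return None
```

Keeping the selection logic separate from the API-fetching and tweet-posting code also makes it easy to unit-test the dedupe behavior before wiring the bot up to Twitter.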

Whether or not @SociologyBot gets a lot of followers, and whether or not those followers actually click on the Altmetric details page links, will determine whether @SociologyBot is a success (and thus whether I should bother coding it to be a proper bot!).

So: if you’re interested in sociology research and want to see this little guy come to life, please give @SociologyBot a follow!

Mellon Grant to Support Values-Based Metrics for the Humanities and Social Sciences


I’m excited to announce that the HuMetricsHSS research team–which I was a part of at the 2016 TriangleSCI conference–has received the support of the Andrew W. Mellon Foundation to continue our work of encouraging the discovery and use of “humane” research evaluation metrics for the humanities and social sciences.

HSS scholars are increasingly frustrated by the prevalence of the use of evaluation metrics (borrowed from the sciences) that do not accurately capture the impacts of their work. Our grand vision is to develop better metrics, ones that are rooted in the values that are important to scholars. This grant-funded research is a start.

From the press release:

“We are reverse-engineering the way metrics have operated in higher education,” said Christopher P. Long, Dean of the College of Arts & Letters at Michigan State University and one of the Principal Investigators (PIs) of the Mellon-funded project. “We begin not with what can be measured technologically, but by listening to scholars themselves as they identify the practices of scholarship that enrich their work and connect it to a broader public.”

We’ll be sharing updates on the HuMetricsHSS project from our website and on Twitter, so please follow along!

Much gratitude to the Mellon Foundation for supporting HuMetricsHSS.

Library Pipeline launches “Innovation in Libraries” micro-grant

I’m super excited to announce that the Innovation in Libraries grant is now accepting applications.

A core group of Library Pipeliners has been working hard for months to recruit rank-and-file librarians worldwide, many of whom are funding this grant out of their own pockets (!). Each month through August 2017, our Awesome Foundation chapter will award a $1000 USD grant to prototype library-based innovations (both technical and non-technical in nature) that are inclusive, daring, and diverse.

I am so proud of this grassroots effort to support risk-taking in librarianship. This is a great step towards building community through organizing, and I’m really excited to be a part of it.

Special recognition goes to Josh Finnell (Los Alamos National Lab), Robin Champieux (OHSU), and Bonnie Tijerina (Data & Society/ER&L), all of whom were crucial to getting this project off the ground.

For more information on the grant, please do visit the grant webpage or apply via the Awesome Foundation.

Altmetrics and the reform of the promotion & tenure system

For the past few weeks, I’ve been working with a colleague at Altmetric to develop a guide for using altmetrics in one’s promotion and tenure dossier. (Keep an eye out for the resulting blog post and handout on–I think they’re going to be good!)

Altmetrics and P&T is a topic that’s come up a lot recently, and predictably the responses are usually one of the following:

  1. Do you seriously want to give people tenure based on their number of Twitter followers?!!?! ::rageface::
  2. Hmm, that’s a pretty interesting idea! If applied correctly (i.e. in tandem with expert peer review and traditional metrics like citation counts, etc), I could see how altmetrics could improve the evaluation process for P&T.

You can probably guess how I lean.

With that in mind, I wanted to think aloud about an editorial I recently read in Inside Higher Ed (a bit late to the game–the essay was written in 2014). It’s a great summary of many of the issues that plague P&T here in the States, and in particular the bits about “legitimacy markers” make a great argument in favor of recognizing altmetrics in P&T evaluation and preparation guidelines.

Below, I’ve excerpted the parts [to which I want to respond] (and the bits I want to emphasize), but please visit Inside Higher Ed and read the piece in its entirety; it’s worth your time.

The assumption that we know a scholar’s work is excellent if it has been recognized by a very narrow set of legitimacy markers adds bias to the process and works against recognition of newer form of scholarship.


Typically candidates for tenure and promotion submit a personal narrative describing their research, a description of the circulation, acceptance rate and impact factors of the journals or press where they published, a count and list of their citations, and material on external grants.  This model of demonstration of impact favors certain disciplines over others, disciplinary as opposed to interdisciplinary work, and scholarship whose main purpose is to add to academic knowledge. [Emphasis mine.]

In my view, the problem is not that using citation counts and journal impact factors is “a” way to document the quantity and quality of one’s scholarship. The problem is that it has been normalized as the only way. All other efforts to document scholarship and contributions — whether they be for interdisciplinary work, work using critical race theory or feminist theory, qualitative analysis, digital media or policy analysis are then suspect, marginalized, and less than.

Using the prestige of academic book presses, citation counts and federal research awards to judge the quality of scholarship whose purpose is to directly engage with communities and public problems misses the point. Interdisciplinary and engaged work on health equity should be measured by its ability to affect how doctors act and think. [One might argue that altmetrics like citations in public policy documents and clinical care guidelines are a good proxy for this.] Research on affirmative action in college admissions should begin to shape admissions policies. [Perhaps such evidence could be sourced from press releases and mainstream media coverage of said changes in admissions policies.] One may find key theoretical and research pieces in these areas published in top tier journals and cited in the Web of Science, but they should also find them in policy reports cited at NIH [again, citations in policy docs useful here], or used by a local hospital board to reform doctor training [mining training handbooks and relevant websites could help locate such evidence]. We should not be afraid to look for impact of scholarship there, or give that evidence credibility.

Work that is addressing contemporary social problems deserves to be evaluated by criteria better suited to its purposes and not relegated to the back seat behind basic or traditional scholarship.

Altmetrics technologies aren’t yet advanced enough to do most of the things I’ve suggested above (in particular, to mine news coverage or the larger Web for mentions of the effects of research, rather than links to research articles themselves). But the field is very young, and I expect we’ll get there soon enough. And in the meantime, we’ve got some pretty decent proxies for true impact already in the main altmetrics services (i.e. policy citations in Altmetric Explorer, clinical citations in PlumX, dependency PageRank for useful software projects in Depsy/Impactstory).

In the shorter term, we need academics to advocate for the inclusion of altmetrics in promotion & tenure evaluation and preparation guidelines.

Most researchers don’t know that this data is available, so they tend not to use it in preparing their dossiers. Fair enough.

What concerns me are the researchers who are aware of altmetrics, but who are hesitant to include it in their dossiers for fear that their colleagues a) won’t know what to do with the data, or b) won’t take them seriously if they include it. After all, there’s a lot of misinformation out there about what altmetrics are meant to do, and if you’ve got a reviewer that’s misinformed or that has a bone to pick re: altmetrics, that could potentially affect your career.

Then there are the tenure committees, often made up of reviewers from all disciplines and at all (post-tenure) stages of their careers. If they’re presented with altmetrics as evidence in a P&T dossier but a) they’re biased against altmetrics, and/or b) their university’s review guidelines don’t confirm that altmetrics–in the service of providing evidence for specific claims to impact–are a respectable form of evidence, then the tenure applicant is met with confusion or skepticism (at best) or outright hostility (at worst).

(Before you think I’m being melodramatic re: “outright hostility”–you should see some of the anti-altmetrics diatribes out there. As in many other aspects of life, some people aren’t content with the “you do it your way, I’ll do it my way” thing–they are pissed that you dare to challenge the status quo and will attack those who suggest differently.)

Anyone reading this post who’s got a modicum of influence at their university (i.e. tenure status and/or voting rights on the faculty council) should go petition their vice provost of faculty affairs to update the university-wide P&T review and preparation guidelines to include altmetrics. Or, at the very least, focus on changing departmental/college P&T guidelines.

Once you’ve done so, we’re that much closer to reforming the P&T process to respect the good work that’s being done by all academics, not just those who meet a very traditional set of criteria.

“The Use of Altmetrics in Promotion and Tenure” published in Educause Review

An article I co-authored along with Cassidy Sugimoto (Indiana University) and Sierra Williams (LSE Impact Blog) was recently published in the Educause Review.

From the intro: “Promotion and tenure decisions in the United States often rely on various scientometric indicators (e.g., citation counts and journal impact factors) as a proxy for research quality and impact. Now a new class of metrics — altmetrics — can help faculty provide impact evidence that citation-based metrics might miss: for example, the influence of research on public policy or culture, the introduction of lifesaving health interventions, and contributions to innovation and commercialization. But to do that, college and university faculty and administrators alike must take more nuanced, responsible, and informed approaches to using metrics for promotion and tenure decisions.”

Read the full article on the Educause Review website.

Reddit AMA – May 10th!

Cross-posted from the Digital Science blog on 25th April 2016



Join us for a Reddit Ask Me Anything with Stacy Konkiel (@skonkiel), Outreach & Engagement Manager at Altmetric, at 6pm GMT/1pm EDT on the 10th May.

The Reddit Ask Me Anything forum is a great way to engage and interact with subject experts in a direct and honest Q&A, asking those burning questions you’ve always wanted to get their perspective on! Mark Hahnel, the founder of Figshare, Euan Adie, the founder of Altmetric and John Hammersley, co-founder of Overleaf, have also all participated in this popular discussion forum.

Following their lead, on Tuesday 10th May at 6pm UK time / 1pm EDT, Stacy Konkiel, Altmetric’s Outreach & Engagement Manager, will be taking part in an AMA on the AskScience subreddit.


Stacy plans to talk about whether the metrics and indicators we like to rely upon in science (impact factor, altmetrics, citation counts, etc.) to understand “broader impact” and “intellectual merit” actually measure what we purport they measure.

She’s not sure they do! Instead, she thinks that right now we’re just using rough proxies to understand influence and attention, and that we’re in danger of abusing the metrics that are supposed to save us all–altmetrics–just as science has done with the journal impact factor.

Stacy will talk about improving measures of research impact, but is also open to taking other relevant questions.

If you wish to participate in the Ask Me Anything, you will need to register with Reddit. There will also be some live tweeting from @altmetric and @digitalsci, and questions on the #AskStacyAltmetric hashtag, so keep your eyes peeled!