I’m Stacy. I help scientometrics researchers find and understand the data they need to study how research is created, communicated, funded, and commercialized in society.
In 2018, I wrote a post for The Bibliomagician blog on identifying authors’ genders through name analysis, prompted by a lively discussion on the LIS-Bibliometrics listserv. I’m reposting the blog post here under a CC-BY license.
Recently on the LIS-Bibliometrics listserv, Ruth Harrison (Imperial College London) posed a question on behalf of a patron who was interested in identifying authors’ genders based upon names listed on ~2,000 journal articles–too large a corpus for manual analysis. The community weighed in with many good suggestions for ways to approach a large scale gender analysis for author names. We thought it would be helpful to others to share what Ruth learned (with permission from the original posters).
Here are some recommendations from LIS-Bibliometrics listserv members on the best places to find author names, APIs and software you can use to analyze gender, consultants you can hire the analysis out to, and previous approaches to analysis from other gender bibliometrics researchers.
Where to find author names lists
Web of Science was most recommended as being a good way to download full author names for publication lists. Programmatic access via the Web of Science API is usually available for licensing (libraries are usually the purchasers of Web of Science access for institutions, so you should contact your library to inquire as to whether API access is included in your institution’s contract).
New citation index Dimensions also makes authors’ full names available for download (though only for up to 50 papers at once in the free version of the app) and via the Dimensions API, which is freely available for those doing scientometrics research.
On the other hand, listserv members pointed out that Scopus only makes authors’ first initials available both in metadata downloads for publication lists and via the Scopus API. Therefore, it is unsuitable to use in isolation for finding author names.
APIs and software
Automated gender analysis requires a bit of programming knowledge (or at least a willingness to learn). In particular, calling APIs and parsing publication metadata are two essential programming skills.
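As a rough illustration of the metadata-parsing side, here is a minimal Python sketch that pulls given names out of an author field and flags entries that only carry initials. It assumes authors are exported as "Last, First Middle" strings separated by semicolons; that layout, and the function names `first_name` and `first_names`, are assumptions made for this example, so check them against your actual export.

```python
import re


def first_name(author):
    """Return the given name from a 'Last, First Middle' string.

    Returns None when there is no given name or only initials are present.
    """
    if "," not in author:
        return None
    given = author.split(",", 1)[1].strip()
    if not given:
        return None
    first = given.split()[0]
    # Treat "J", "J.", "JM", and "J.M." as initials rather than names.
    # Caveat: a short given name written in all caps (e.g. "AL") is also
    # treated as initials by this rule.
    if re.fullmatch(r"(?:[A-Z]\.?){1,3}", first):
        return None
    return first


def first_names(author_field, sep=";"):
    """Split an author field (assumed separator: ';') into given names."""
    return [first_name(a.strip()) for a in author_field.split(sep) if a.strip()]


print(first_names("Larivière, Vincent; Ni, Chaoqun; Smith, J. M.; Lee, JH"))
```

Entries that come back as `None` cannot be looked up by name at all, so it is worth counting them before choosing a data source.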
Gender API is a recommended service that allows you to look up the likely gender (and degree of confidence) for a particular name or list of names. For example, you could query the name “Diana” and learn that the name is classified as ‘female’, with a 93% accuracy rate based on a sample of 523 names. The providers offer clients for interacting with the API in PHP, Python, and several other programming languages.
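Because a lookup like this comes back with a confidence figure, it helps to decide up front how confident is confident enough. The sketch below is one hedged way to do that in Python. The dictionary keys (`gender`, `accuracy`, `samples`) are assumptions modelled on the "Diana" example above rather than the service's documented response format, and the thresholds are arbitrary placeholders.

```python
def classify(result, min_accuracy=90, min_samples=100):
    """Accept a name-gender lookup only when it clears both thresholds.

    `result` is assumed to be a dict with 'gender', 'accuracy' (percent),
    and 'samples' keys; these field names are hypothetical.
    """
    gender = result.get("gender")
    if gender not in ("male", "female"):
        return "unknown"
    if result.get("accuracy", 0) < min_accuracy:
        return "unknown"
    if result.get("samples", 0) < min_samples:
        return "unknown"
    return gender


# Hand-built stand-in for a lookup response, using the numbers quoted above.
diana = {"name": "Diana", "gender": "female", "accuracy": 93, "samples": 523}
print(classify(diana))  # female
```

Raising either threshold moves more names into "unknown", which trades coverage for fewer misclassifications.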
Namsor is another recommended API for looking up gender based on names, and it has the added feature of looking up ethnicity, as well. The free API allows for a limited number of monthly calls; you can also pay for API access to increase your API call limit.
GenderChecker is a recommended name list that can be downloaded for less than $200 USD, then analyzed. As one listserv poster explained, “It’s not 100 percent accurate, but works for most American/European first names, especially if you have a large dataset. Be very careful with Chinese/Japanese/Korean names; most of the time they should be neutral unless you further checked.”
Genderize.io is yet another API that was not recommended by listserv members, but appears in several recent studies and reports. The Genderize database reportedly contains 216,286 distinct names across 79 countries and 89 languages. It is free to use but rate-limited to 1000 requests per day.
Finally, the recommended Python package SexMachine allows you to look up the gender for around 40,000 names. For each name you query, you will get a response for one of the following categories: andy (androgynous), male, female, mostly_male, or mostly_female. For example, the query “Paul” would return “male”, whereas the name “Stacy” would return “mostly_female”.
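Whichever lookup you use, you will probably end up tallying categories across a whole author list. Here is a minimal sketch of that step, assuming the five category labels listed above. The `lookup` argument is any function that maps a name to a label; the `TOY_LOOKUP` dictionary is a hypothetical stand-in for a real name list or package, and `tally` is a name invented for this example.

```python
from collections import Counter

# Labels taken from the categories described above.
KNOWN = {"male", "female", "mostly_male", "mostly_female", "andy"}


def tally(names, lookup, fold_mostly=False):
    """Count gender categories for a list of given names.

    names: given names, with None for authors who only have initials.
    lookup: function mapping a name to a label (or None if not found).
    fold_mostly: if True, count 'mostly_male' as 'male', and so on.
    """
    counts = Counter()
    for name in names:
        label = lookup(name) if name else None
        if label not in KNOWN:
            label = "unknown"
        elif label == "andy":
            label = "unisex"
        elif fold_mostly and label.startswith("mostly_"):
            label = label[len("mostly_"):]
        counts[label] += 1
    return counts


# Hypothetical two-name lookup table, for illustration only.
TOY_LOOKUP = {"Paul": "male", "Stacy": "mostly_female"}

print(tally(["Paul", "Stacy", None, "Xyz"], TOY_LOOKUP.get))
```

Keeping "unknown" and "unisex" as their own buckets, rather than dropping them, makes it easier to report how much of the corpus the analysis could not classify.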
Other gender researchers’ approaches
Listserv members also suggested that Ruth and her patron look to existing author gender analysis studies to find methods to borrow. Two in particular–a 2013 commentary from Nature, and a more recent Elsevier report–were the most mentioned:
- Larivière, V., Ni, C., Gingras, Y., Cronin, B., & Sugimoto, C. R. (2013). Bibliometrics: Global gender disparities in science. Nature, 504(7479), 211.
- Elsevier. (2017). Gender in the global research landscape.
The Nature study’s supplementary files include a thorough discussion of how to parse Web of Science names data for a variety of countries of origin.
One listserv respondent pointed out that “The Elsevier report’s methodology implies they didn’t have an easier way to [identify author gender] (‘Scopus Author Profiles were combined with gender-name data from social media, applied onomastics, and Wikipedia’).” More details on the study’s methods can be found in a report appendix. Particularly useful is a discussion of the various name-gender APIs’ suitability for multi-country analysis.
For those who want to hire out the work, Science-Metrix, Elsevier Analytical Services, and Digital Science Consultancy are all businesses that offer a variety of bibliometrics analysis services, which may include gender analysis. Contact the consultancies themselves for more information.
We would be remiss if we did not point out the challenges that face anyone seeking to do a study that determines a person’s gender, based on name alone. First and foremost, there is the question of ethics: does this kind of study rob authors of their right to be identified as a particular gender that might not match the expected gender for someone with their name?
Related to that issue is the problem of the assumption of a gender binary. All studies in this area tend to identify authors as “Male”, “Female”, “Unisex” (as in, a name that is suitable for both men and women), and “Unknown”. How can researchers more accurately identify the gender of someone who identifies as genderqueer or agender, for example? It doesn’t seem possible to do so using a simple names analysis, meaning that these kinds of studies should be approached and described with that caveat in mind.
Then there are technical issues related to the dearth of useful author metadata and regional name-gender data. “What about cases where the author info only includes initials?” one listserv respondent wrote. Other respondents pointed out that many name-gender analysis tools are biased towards Western names, making it difficult to do accurate analysis on authors from other areas of the world.
Do you have suggestions for other ways to analyze gender based upon author names (or other freely available information online)? If so, please leave them in a comment below!
Not too long ago, I used to live in New Mexico. For those thinking of visiting, here are my many recommendations for local curiosities and delicious food.
- Careful with your alcohol consumption – the elevation can make one drink feel like three!
- If you’re visiting in the fall/winter, bring your jacket – people are often surprised to learn that central and northern New Mexico can get snow and cold, due to the elevation
- In Albuquerque: careful when out and about, both in the daytime and at night – ABQ has a lot of crime, including violent crime and muggings
- In Albuquerque: Careful with your car – break-ins are common
To the east of Albuquerque, on Route 66
If you’re traveling through New Mexico along Route 66, I recommend stopping at Tinkertown Museum just outside of Albuquerque. It’s a fun, weird little collection.
While you’re in the Cedar Crest area, you should hike the eastern side of the Sandia Mountains. One trail (I can’t remember the name) crosses the ridge to the western side of the range and ends in a lookout over Albuquerque, and the hike is gorgeous. I’d highly recommend this over the La Luz trail on the western side of the Sandias–LL is popular but can be dangerous (as in, people have died) for those unaccustomed to hiking in NM (nearly all NM hiking is SUPER hardcore – even trails marked “beginner” or “moderate” can be challenging).
The tram up the Sandias on the western side (part of the La Luz trail) is popular, and can be reached via car instead of hiking.
I should warn you that when you get into ABQ via the eastern side of Route 66, it’s really, really depressing. The further west you go, the more livable it gets.
Consider visiting during the annual Balloon Fiesta. It’s truly magical.
Some fun and delicious Route 66 stops & sights along the way through ABQ include:
- El Patio (just off of Route 66/Central on Harvard) for classic New Mexican cuisine. It’s much more laid back than the popular (and touristy) El Pinto, and the food is amazing. They often have Spanish guitarists playing on the patio in the evenings.
- A lot of the kitschy old hotel signs have been converted to public art – fun for a nighttime drive
- Forget the “Route 66 diner” tourist traps in town, and get to Frontier Diner at Cornell & Central
- For vegetarian food, Namaste (off of Yale) is a better bet than Annapurna’s (IMHO), and Vinaigrette near Old Town is really delicious
- Duggans Coffee for the best iced Americano and egg breakfast in town
- Anodyne in downtown is the chillest bar with a great beer selection
- 516 Arts is one of the best places for contemporary art in NM
- Old Town Albuquerque is cool for learning about New Mexican history, but if you have the time I’d recommend visiting Santa Fe (an hour north) to see a better example of Spanish colonial history
- In general, Albuquerque has a great craft beer scene. I personally loved Marble Brewing (try Marble Red) and La Cumbre (esp. Elevated IPA). Nexus Brewing wins for best food. Bow & Arrow Brewing is your bet for the nicest ambiance for an afternoon hang.
Bernalillo (just north of Albuquerque) & Jemez Springs
I used to live in a bedroom community called Placitas, which is nestled just north of the Sandia Mountain range, so I spent a lot of time in Bernalillo.
- The Coronado Historic Site in Bernalillo (just north of ABQ) is a great little museum for learning about the conquistadors, plus some of the ancient indigenous residents of the area. You can also walk down to the banks of the Rio Grande from the site.
- Kaktus Brewing is a funky place to spend the afternoon hanging out and playing with chickens (with great beer and kombucha to boot)
- The Range Cafe in Bernalillo is a local landmark, with delicious food
- On the drive into Placitas, there’s a trailhead just east of the shopping center. There are a bunch of trails back there that edge up against the Sandia foothills. The beauty is unparalleled – it’s hard to believe I once spent every morning hiking there with my dogs.
- If you have time, I HIGHLY recommend continuing north on Route 550 and visiting Jemez Springs. There are both natural hot springs within hiking distance of the highway, plus more developed hot springs in town.
Santa Fe
- Meow Wolf alone is worth the trip. Full stop.
- The Tea House on Canyon Road has good vegetarian and gluten-free food, and great service.
- Ten Thousand Waves Japanese-style spa – you can do drop-ins during the day. Be sure to try the on-site restaurant – it’s a tad spendy but totally worth the price.
- The entire town is a monument to Spanish colonialism, and the history museum is very worth checking out – especially if you’re from a region of the US that thinks the Pilgrims founded America.
- Learn about the Pueblo Revolt before you go (here’s a great book on the topic, and here’s the Drunk History version), then think about Popé as you walk around town
- Skip the Georgia O’Keeffe Museum and head to the IAIA Museum of Contemporary Native Arts (MoCNA)
- Christmas Eve in Santa Fe is beautiful. All the galleries on Canyon Road open up their doors, and there’s a big street party with piñon logs burning and biscochitos and cider. Around 11 pm, the Basilica church bells ring and all the Catholics scurry into Mass, at a church that houses La Conquistadora, a Madonna (the first!) that was schlepped to the New World by the conquistadors (!!!)
Westward on Route 66
- Petroglyph National Monument to the west of Albuquerque is a great hike with amazing artifacts – as is El Malpais National Monument
- I’ve heard great things about Acoma pueblo, if you’re interested in visiting a reservation (I’ve never been)
- Pro-tip: there’s not a lot to do in Grants or Gallup
Cross-posted from TheIdealis.org
In August, I stepped down as a Founding Editor of The Idealis to focus on other projects. Nicky Agate is now The Idealis’s Editor in Chief.
The Idealis started out of community conversations around LIS scholarship and open access, and I’m proud of what we’ve accomplished so far: over 290 recommendations for freely available scholcomm research; more than 44,000 views and 400 subscribers; and most importantly a stellar team of 38 editors who have dedicated their time and expertise to finding the very best scholcomm research and sharing it with the community.
I’m very grateful to The Idealis’s volunteers, especially Nicky, for taking The Idealis forward. I look forward to seeing what The Idealis has in store, and will remain a faithful reader of the site for years to come. Thank you!
The Journal of Librarianship and Scholarly Communication just published “Scholarly Communication Librarians’ Relationship with Research Impact Indicators: An Analysis of a National Survey of Academic Librarians in the United States“.
This is the final publication related to a topic I’ve been working on since 2013 (!), when I first realized that although academic librarians were interested in research metrics, no one had yet studied the reality of how they were using these kinds of indicators in their day-to-day jobs and in support of their own careers.
Along the way, I’ve been privileged to work with Sarah Sutton and Rachel Miles (and for a short period, Michael Levine-Clark) on a series of publications and presentations that include:
- “Is What’s “Trending” What’s Worth Purchasing? Insights from a National Study of Collection Development Librarians” in The Serials Librarian (which we also presented upon at NASIG 2016 in Albuquerque)
- “Awareness of Altmetrics among LIS Scholars and Faculty” in Journal of Education for Library and Information Science (which we compared to librarians at ER&L 2016 in Austin, TX)
- “What’s used to gauge when engaging?: Determining academic librarian roles in research assessment reporting services“, presented at the 2016 Bibliometrics and Research Assessment Symposium in Bethesda, MD.
- “Scholarly Communication Librarians’ Relationship with Research Impact Metrics,” a panel presentation at ‘Finding Meaning in Metrics’ at ALA Annual 2016 in Orlando, FL
- “Use of Altmetrics in US-based academic libraries,” a presentation at the Second Altmetrics Conference in Amsterdam (summarized on the Altmetrics Conference blog by Ian Mulvaney)
- “Myth vs. reality: Altmetrics and librarians,” a presentation at the Altmetrics15 workshop in Amsterdam
We ultimately learned that:
- Your seniority/years of experience has no effect upon how familiar you are likely to be with various research metrics
- Librarians and LIS educators alike are more familiar with traditional research impact metrics like the JIF than they are with altmetrics
- Altmetrics are least likely to be used for collection development, though this is a use case I’ve been promoting for a long time
- The more scholcomm-related duties you have in your job, the more you’ll use metrics of all kinds
- Altmetric is the most popular altmetrics database used by librarians 😎
Sarah and Rachel plan to carry this path of research forward, expanding the scope of the study to include librarians worldwide, and also possibly looking at library promotion and tenure documents’ discussion of metrics. I wish them the very best and want to once again express my gratitude towards them as collaborators: Ladies, I hope to work with you both again in the future!
Last month, an article I co-authored with Josh Finnell on the challenges of organizing librarians at the grassroots was published in International Information & Library Review.
We librarians love to bemoan the state of our professional organizations. (Who doesn’t?) But as board chair of Library Pipeline–a fledgling professional association for librarians–and volunteer for both Pipeline’s Green Open Access Working Group and the Innovation in Libraries Awesome Foundation chapter, I have to say, running a professional organization is often tough and thankless work.
Luckily, it’s also rewarding work. Through Pipeline, I’ve gotten to know our profession’s best and brightest (including my co-author Josh), contributed personally to ‘opening up’ the LIS literature to all readers, and helped others vet and fund some amazing library-based projects from around the world.
The article that Josh and I wrote explains the brief history of Library Pipeline to date and where we’re headed next–while also pointing out some challenges that exist for others who might want to launch a grassroots library professional organization of their own. You can read it on the IILR website or check out the preprint on Figshare.
In case you’re wondering, Pipeline has been mostly quiet for the latter half of 2017 as the board worked to create our bylaws and revise our mission statement, so we’re better positioned to expand our work in 2018. To learn more about Library Pipeline and to become a volunteer, visit our website.
Finnell, J. and S. Konkiel. Building and Sustaining a Grassroots Library Organization: A Three Year Retrospective of Library Pipeline. 2, figshare, 2 Jan. 2018, doi:10.6084/m9.figshare.5727084.v2.
In 2017, I:
- Earned my yellow belt in krav maga
- Drove 1,321 miles cross-country* to make a new home in Minneapolis, MN
- Learned how to drive a manual car
- Started taking computer programming a bit more seriously
- Visited Poland
- “Came out” as a socialist and joined DSA (I’m now a monthly sustaining member and believe you should be, too, if you’re a progressive of any flavor–DSA does amazing work nationwide)
- Told anyone who would listen about Sarah Schulman’s Conflict Is Not Abuse: Overstating Harm, Community Responsibility, and the Duty of Repair. Go read it, now.
I also accomplished a lot professionally, but that’s a post for another time.
I began using Qbserve to track my computer-based time around June of this year. In looking back at the past six months’ worth of data, I’m a bit disturbed to learn that I spent:
- 2 days chatting in Slack
- 18 hours watching Netflix
- 8.5 hours watching Amazon Prime
- 26 hours answering emails (both personal and work-related)
- 21.25 hours in Omnifocus (task management software for complete nerds)
- …and at least 20 hours faffing about on various social media sites
In 2018, I hope to maintain or lower most of these metrics (which would mean I’d be cutting that time spent roughly in half), in favor of getting to know my neighbors and deepening my personal relationships, both at work and at home.
To that end, I’m aiming to leave Twitter for the year (though I may pop back in occasionally for work-related postings). (Here’s a bit of background on why I’m making that decision.) It’s a bit nerve-wracking–Twitter is pretty important to me professionally–but I’m guessing that it will pay off to spend my time and energy elsewhere.
I’m also looking to simplify my life in other ways, which will mean fewer new projects (and ending some existing projects–more on that in the months to come). Saying “no” has historically been difficult for me when I get excited about an idea. In 2018, I want to do less, but better.
I hope to update this blog more regularly, in lieu of offering updates via social media. If you’re reading this, chances are I want to hear updates from you, so you should stop what you’re doing and email me right now to say hello (firstname.lastname@example.org for work colleagues, email@example.com for friends and family).
Here’s to a grounded, intentional 2018!
* If you’re curious, this road trip consisted of overnight stops in Clayton, NM; Dodge City, KS; Kansas City, KS; and Bloomington, IN. This was my fourth cross-country road trip. America is a great big beautiful place–especially the middle part.
I’ve recently launched a fun new project, @SociologyBot. It’s a Twitter account that recommends recently discussed research in the field of (you guessed it) sociology.
I’ve been wanting to explore the “altmetrics as a filter” idea for a long time. Being able to find not only disciplinary research but also the conversations surrounding research appeals to me, and I bet that other researchers would like access to that kind of information, too.
So, now I’m experimenting with a prototype “bot”, @SociologyBot. A few things set @SociologyBot apart from other research recommendation bots on Twitter:
- It’s a social sciences bot (which are surprisingly rare!)
- It tweets out new and old research alike (not just the “recently published” stuff)
- It surfaces both research and the conversations surrounding research
- It’s not actually a bot (yet)!
I’m prototyping @SociologyBot right now, meaning it’s powered using a mix of manual and automated means. (Hence the scare quotes I keep putting around “bot”.) That’s because I want to understand if people actually care about this kind of a bot before I put a lot of time and energy into coding it! I guess you could call @SociologyBot a “minimum viable product”.
Here’s how @SociologyBot currently runs:
- I set up a search in Altmetric Explorer to find articles from the top ten sociology journals (as identified by SCImago Journal Rank) that have been mentioned in the last day. I use the journal shortlist as a basis not because I particularly care about finding only research published in the “top” journals, but because it makes the list of articles much more manageable.
- Explorer sends me a daily email summary of said articles.
- Based on the shortlist provided in the summary email from Explorer, I schedule new daily tweets using TweetDeck that include both the article with the highest Altmetric Attention Score (AAS) and a link to the Altmetric details page, where discussions of the articles can be found.
- Using TweetDeck as automation, @SociologyBot then tweets out one scheduled article daily, at 8 am Mountain time.
Here’s how I plan to build @SociologyBot so that it’s fully automated:
- I write a script to query the Altmetric API every 24 hours to find sociology articles that have been mentioned online in the past day.
- The script takes the article with the highest AAS and checks whether it’s already been tweeted about in the past month, as a safeguard against the same popular articles being constantly recommended.
- If it hasn’t, the script then composes a tweet that links to the article and its Altmetric detail page. If it has, the script will then check for the article with the next highest AAS that has not been recently tweeted, and will compose a tweet for that one instead.
- The script then posts the article and its Altmetric details page immediately to the @SociologyBot Twitter account.
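The selection step in that plan can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it leaves out the Altmetric API query and the posting to Twitter entirely, and the article fields (`doi`, `score`, `title`, `url`, `details_url`), the function names, and the 30-day window are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta


def pick_article(candidates, tweeted, now, window=timedelta(days=30)):
    """Return the highest-scoring article not tweeted within `window`.

    candidates: list of dicts with (hypothetical) 'doi' and 'score' keys.
    tweeted: dict mapping a DOI to the datetime it was last tweeted.
    Returns None if every candidate was tweeted recently.
    """
    for article in sorted(candidates, key=lambda a: a["score"], reverse=True):
        last = tweeted.get(article["doi"])
        if last is None or now - last > window:
            return article
    return None


def compose_tweet(article):
    """Build tweet text linking to the article and its details page."""
    return f'{article["title"]} {article["url"]} | Discussion: {article["details_url"]}'


# Made-up candidates and tweet history, for illustration only.
candidates = [
    {"doi": "10.0000/example-a", "score": 50},
    {"doi": "10.0000/example-b", "score": 30},
    {"doi": "10.0000/example-c", "score": 10},
]
tweeted = {
    "10.0000/example-a": datetime(2018, 1, 20),  # tweeted 11 days ago
    "10.0000/example-b": datetime(2017, 12, 1),  # tweeted 61 days ago
}
choice = pick_article(candidates, tweeted, now=datetime(2018, 1, 31))
print(choice["doi"])  # 10.0000/example-b
```

Passing `now` in as an argument, rather than calling the clock inside the function, keeps the logic easy to test with fixed dates.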
Whether or not @SociologyBot gets a lot of followers, and whether or not those followers actually click on the Altmetric Details Page links, will determine whether @SociologyBot is a success (and thus whether I should bother coding it to be a proper bot!)
So: if you’re interested in sociology research and want to see this little guy come to life, please give @SociologyBot a follow!
CC-BY Nicky Agate / Medium
I’m excited to announce that the HuMetricsHSS research team–which I was a part of at the 2016 TriangleSCI conference–has received the support of the Andrew W. Mellon Foundation to continue our work of encouraging the discovery and use of “humane” research evaluation metrics for the humanities and social sciences.
HSS scholars are increasingly frustrated by the prevalence of the use of evaluation metrics (borrowed from the sciences) that do not accurately capture the impacts of their work. Our grand vision is to develop better metrics, ones that are rooted in the values that are important to scholars. This grant-funded research is a start.
From the press release:
“We are reverse-engineering the way metrics have operated in higher education,” said Christopher P. Long, Dean of the College of Arts & Letters at Michigan State University and one of the Principal Investigators (PIs) of the Mellon-funded project. “We begin not with what can be measured technologically, but by listening to scholars themselves as they identify the practices of scholarship that enrich their work and connect it to a broader public.”
Much gratitude to the Mellon Foundation for supporting HuMetricsHSS.
I just used a service called Cardigan to delete the 10k+ tweets I’ve published since 2007, when I first joined Twitter.
I don’t know about you, but I’ve changed a lot since I was 24 years old.
It didn’t make sense to me to keep ten years’ worth of miscellany–silly jokes, uninformed hot takes, occasional sharp insights, and so on–up on the Internet, gathering dust, making advertising money for Twitter. I don’t want to support a company that even with a $10.8 billion valuation somehow can’t get it right and stop banning innocent users rather than the Nazis who are harassing them.
I don’t enjoy Twitter anymore. Over the years, Twitter has gone from a great place (to stay in touch with friends and former colleagues worldwide, to find interesting research and industry news, to meet new people) to one that seriously bums me out every time I log on (every day brings a new outrage, smart people sniping at each other, Mean Librarian Twitter, and unintelligible memes). It’s become superficial on a lot of levels. It’s often used as a tool to demean and call out rather than enrich and uplift.
All that said, I’m not going to delete my account outright. Twitter is still somewhat important professionally, so I’ll continue using it to share the occasional piece of research or to livetweet interesting conferences.
But I’d rather let my writing and research speak for itself, in longform. And for my personal and professional relationships to deepen, offline.
I’ll be slowly unfollowing accounts that aren’t directly relevant to my interests or my work at Altmetric (sorry!) and hopefully logging on a lot less. I’ll also aim to delete my tweets and favorites every so often, to keep things fresh.
If you need me, email me at firstname.lastname@example.org (personal) or email@example.com (work).
With love and gratitude to my friends and followers for ten years of shitposting and networking…