Stanford University: Digitization projects at Stanford Hopkins Marine Station library give researchers insight into the history of Monterey Bay

The Stanford Hopkins Marine Station doesn’t have a time machine – but it does have a librarian.

The Miller Library at Hopkins Marine Station uses new approaches to unlock old observational data and make it openly available online.

Amanda Whitmire, the head librarian and bibliographer at the Harold A. Miller Marine Biology Library at Hopkins, glimpses into the Pacific’s past through her work digitizing decades of research about the physical and biological environment around the station, which overlooks Monterey Bay.

“That’s the kind of information that you can’t ever replace. You can’t go back in time and make an observation over again,” Whitmire said. “It’s really special that researchers who want to study things that occur over long timescales have access to those kinds of observations through our collections.”

For these historical observations to help researchers now and in the future, scanning them into a computer isn’t enough. Whitmire and her collaborators are designing ways to extract data from the digitized research and make them accessible to researchers far and wide.

From paper to pixels
Whitmire began her career as a scientist, earning her PhD in oceanography from Oregon State University. While searching for alternatives to a traditional faculty career, she was drawn to (and hired for) a position as a data management specialist in the OSU library.

Seeing a way to combine her library experience with her love of oceanography, Whitmire applied for the Hopkins librarian job in 2015. On a tour of the station during her interview, she noticed shelves stuffed with notebooks and binders from Stanford’s partnership with the California Cooperative Oceanic Fisheries Investigations (CalCOFI). These became her first digitization project at Hopkins.

Every week from 1951 to 1974, Hopkins researchers visited stations across Monterey Bay to collect oceanographic data for CalCOFI. The notebooks that caught Whitmire’s attention contained decades of observations about water temperature, salinity, and oxygen, along with counts of phytoplankton and zooplankton, which form the base of the oceanic food web.

Turning to crowdsourcing to extract the data, she uploaded the handwritten pages and asked volunteers to transcribe them into a digital format. The scanned pages are in the Stanford Digital Repository.

Next, Whitmire turned to the collection of 746 undergraduate research papers written at Hopkins from 1963 to 2011. With funding from Stanford and external gifts, Whitmire worked with Hopkins library specialist Melissa Tabbarah to scan them all. Then, they contacted every student author who was still alive to ask for permission to make the papers open-access online.

“You get these emails back from people saying, ‘Oh, my time at Hopkins was the highlight of my entire undergraduate experience. I had so much fun, such a great place,’” Whitmire said. “It was an overwhelmingly positive experience to reach out to all these former students.”

Several of the scanned papers are also accessible in the Stanford Digital Repository, with more coming soon.

From digitized to database
With the experience Whitmire gained from the CalCOFI notebooks and the undergraduate research papers, she began digitizing other collections and moving her first projects into the data extraction phase. She doesn’t have time to read 23 years of weekly ocean measurements or 746 papers, so she deploys advanced tools that can “read” them for her.

Whitmire’s primary collaborator on Stanford’s main campus is Nicole Coleman, the digital research architect for Stanford Libraries. Coleman has connected Whitmire to resources for mining data, including artificial intelligence.

AI tools are becoming common in digitization work – a standard mobile phone can use optical character recognition to grab text off a printed page. But the trick is getting the data out of that text.

For example, imagine a researcher in 2022 studying a crab species in the intertidal zone. The Hopkins student papers might contain decades-old observations of that crab, which the researcher could use to track shifts in the crab’s population or range. But just having the student papers digitized wouldn’t help. Using natural language processing – an AI tool that can analyze human language – to look for species names mentioned in the student papers, Whitmire could quickly find papers that might contain relevant data.
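
As a rough illustration of the species-name search described above, the sketch below builds an index from species names to the papers that mention them. It is not the library's actual pipeline: it swaps in plain dictionary matching with regular expressions in place of a full natural language processing model, and the function name, the two-species list, the paper IDs, and the sample sentences are all made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical list of species to search for; a real project would likely
# draw on a taxonomic authority rather than a hand-typed list.
SPECIES = ["Pachygrapsus crassipes", "Hemigrapsus nudus"]


def find_species_mentions(papers, species_names):
    """Map each species name to the sorted IDs of papers that mention it."""
    index = defaultdict(set)
    for paper_id, text in papers.items():
        # Collapse whitespace so names split across OCR line breaks still match.
        normalized = re.sub(r"\s+", " ", text).lower()
        for name in species_names:
            pattern = r"\b" + re.escape(name.lower()) + r"\b"
            if re.search(pattern, normalized):
                index[name].add(paper_id)
    return {name: sorted(ids) for name, ids in index.items()}


# Made-up stand-ins for OCR text from digitized student papers.
papers = {
    "paper_1975_a": "Counts of Pachygrapsus\ncrassipes were made at low tide.",
    "paper_1988_b": "We measured water temperature at three sites.",
    "paper_1992_c": "Both Hemigrapsus nudus and Pachygrapsus crassipes were observed.",
}

mentions = find_species_mentions(papers, SPECIES)
for name, paper_ids in mentions.items():
    print(name, "->", paper_ids)
```

Exact-name matching of this kind would miss abbreviations such as "P. crassipes", common names, and OCR misspellings, which is part of why language-aware tools are attractive for this work.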

Whitmire and Coleman are also exploring tools for recognizing handwritten text, which could unlock all sorts of material at Hopkins.

“Nicole views AI as helping us build ‘power tools’ for librarians,” Whitmire said. “It’s something we’re using to help us, not a replacement for librarians.”

From Stanford to the world
Whitmire hopes the materials she digitizes and the data she extracts will be useful to researchers not only at Stanford but also around the world. She hopes her work will help librarians in other places, too. If Whitmire and her collaborators can develop systems for making historical research and data easy to access, they can share their methods.

“I am so lucky to have the resources available to me that I do at Stanford,” she said. “I feel an obligation to be doing everything I can to leverage what we have here for the greater good.”

She’s also working to connect Hopkins’ library collections with global open databases, such as the Global Biodiversity Information Facility.

“To the extent that we can evolve toward a situation where our information isn’t living in isolation and is in fact connected out to these other databases, that’s the vision that I’m working toward,” she said.

Whitmire also assists current Hopkins researchers in structuring their data and metadata so that future librarians and researchers will be able to access it easily.

But she expects most of the rest of her career to focus on making data from the past useful for the present.

“There are all these different ways of approaching biodiversity data to answer different questions researchers might be interested in,” she said. “There are so many possibilities I can’t even imagine.”