Designing a GenAI search tool for NASA scientists

THE OUTCOME

Prototyped a generative AI-powered search tool, the Future Works Lab, which pulls and summarizes the future works sections of several papers in the user’s specified domain.

TOOLS

Figma

THE CHALLENGE

Design an AI-powered solution to help NASA scientists discover new areas of research.

ROLE

AI Product Design Prototyper

 

Prototype: Future Works Lab

The prototype of Future Works Lab

Relevance 

Arguably, the most difficult part of the scientific process is finding an interesting research question to explore. To find one, researchers spend weeks (sometimes longer) scouring the existing literature in and outside of their domain of expertise. Usually, most of this literature is irrelevant or not useful. This can lead to a frustrating and disempowering experience for researchers, which is unfortunate since this part of the scientific process can be quite exciting.

Existing research databases don’t allow researchers to easily find research opportunities or gaps. While they’ll return a list of likely relevant papers based on a researcher’s search terms, a user still needs to click through each search result individually with no promise that it will offer ideas for future work. The process is still manual and tedious. 

With our search tool, Future Works Lab, we provide researchers with a quick and easy way to identify research gaps and opportunities for new research in one or more domains. Our search tool will decrease the time it takes researchers to understand current research and discover ideas for future work.

How it works

I prototyped the experience in Figma to help visualize the solution. We are still working on a fully functioning alpha version of the tool. To use it, users first type a query into the search bar. The query can be any topic whose research gaps they want to explore, such as “quasar” or “exoplanets”, in any domain available on arXiv: computer science, economics, electrical engineering and systems science, mathematics, physics, quantitative biology, quantitative finance, and statistics.

Figure 1: This is the home page for our search tool. Here, users can type their query into the search bar.

The search tool will then return 5 relevant papers in the specified domain, each with actionable ideas for future works. Each result’s preview will show the paper’s title, author(s), and a 30-word summary of the future works. The user will also have the option to read more about how to improve upon the research.

Figure 2: Once users select “Search”, they are presented with 5 relevant papers with robust future works that are summarized in 30 words in the preview.

If a user selects “Read more on how to improve this research”, they’ll be taken to an expanded view that includes the paper’s title, author(s), abstract, full summary of the future works, and an option to download a PDF of the full paper.

Figure 3: If users select “Read more on how to improve this research”, they are taken to an expanded view that has the paper’s title, authors, full abstract, summarized future works, and PDF download option.
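The preview and expanded views described above share one underlying record per paper. The following is a minimal sketch of that record, not the tool's actual schema: the class name `PaperResult`, its field names, and the helper `top_results` are all hypothetical. The preview here simply truncates a longer summary to 30 words as a stand-in; the real tool would generate the short summary rather than cut it off.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PaperResult:
    """One search result. Field names are illustrative assumptions."""
    title: str
    authors: List[str]
    abstract: str
    future_work_summary: str
    pdf_url: str

    def preview(self, max_words: int = 30) -> str:
        """Return at most max_words words of the summary for the results list."""
        words = self.future_work_summary.split()
        if len(words) <= max_words:
            return " ".join(words)
        # Mark the cut so the reader knows more text exists in the expanded view.
        return " ".join(words[:max_words]) + "..."


def top_results(results: List[PaperResult], limit: int = 5) -> List[PaperResult]:
    """Keep the first `limit` results; assumes the list is already ranked."""
    return results[:limit]
```

The results page would render `preview()` for each of the five records, while the expanded view would show the full `abstract`, the full `future_work_summary`, and a link to `pdf_url`.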

On the backend, we find and summarize the future works that are usually outlined in the discussion or conclusion sections of research papers, then present those summaries to the user. This speeds up the process of finding research gaps related to a user’s query. For the technical details of our approach, please read the Deployability section.

Effectiveness

To decide on the best approach for our task, we conducted user interviews with astrophysicists. We wanted to understand researchers’ current process for discovering research gaps and their perspectives on the “future works” section of published papers. We asked the following questions:

  • How do you stay up-to-date on literature in your field? What is easy and hard about it?

  • How often do you find yourself looking for information at the intersection of your field and another field? What is easy and hard about doing this?

  • Do you ever need to know what’s “trending” in other fields that may intersect with yours?

  • How do you or your team find inspiration for the next thing to research?

  • How actionable are the plans in the “future work” or “future research” section of a research paper?

    • Is it something that people actually look at?

    • As someone who writes papers, how seriously do you take writing that section?

    • Do most scientific papers have this section?

    • How often have you received inspiration from this section to explore something new?

Our interviews yielded many helpful insights, but here are a few highlights that informed our approach:

  • Groundbreaking papers are rare; most papers grow out of researchers trying to reproduce other work or making small method changes to existing work.

  • Future works sections are usually about small changes in methods that could make a difference in a future paper or address caveats.

  • Not all papers have a future works section. Sometimes future works are mentioned in the conclusion, discussion, or limitations sections instead.

Because we conducted user research at the beginning of our design process, we were able to design a prototype that directly addresses researchers’ needs. As designed, our tool:

  • Identifies the most relevant papers based on the robustness of their future works

  • Summarizes the future works sections of these relevant papers, thus quickly providing researchers with ideas for improving upon existing work

  • Looks for future works throughout a paper, including the conclusion, discussion, or limitations sections in order to provide the most accurate future works summary to the user

  • Decreases the time it takes for a researcher to find gaps, because all the relevant future works in their domain of interest are clearly and succinctly summarized so they can decide whether a topic is worth pursuing

According to our preliminary user research, the standard practice for finding gaps and opportunities in research takes weeks, possibly longer, because researchers have to manually sift through, read, and understand thousands of words of content before finding ideas that are relevant and interesting to them. With Future Works Lab, we anticipate cutting this time in half because we directly provide researchers with relevant gaps in existing research that they can explore. Standard academic search tools are not built to identify gaps in research; rather, they return hundreds of papers related to a query, many of which are not relevant to what a user is actually looking for. Because we are still building and testing an alpha version of the search tool, we haven’t tested it with real users yet; that is our next step.

Deployability

We are still working on a fully functioning alpha version of this tool. Our vision is for the tool to be delivered as a SaaS application that works like any other web tool: a user simply visits our website (or the website can be hosted within the NASA domain) to use the tool.

Under the hood, we use arXiv’s public API to search for papers that are relevant to the user’s search term (e.g. “quasar”) and that also mention “future work”. We’re also making use of an open-source API wrapper called arxiv that is available as a Python module.
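A minimal sketch of that search step is shown below, assuming a recent release of the `arxiv` package (the one that provides `arxiv.Client`). The helper names `build_query` and `search_papers` and the returned dictionary keys are hypothetical, and the exact query string our alpha uses may differ. The `all:` prefix asks arXiv to match against all of a paper's metadata fields.

```python
from typing import Dict, List


def build_query(term: str, phrase: str = "future work") -> str:
    """Combine the user's term with a required phrase in arXiv query syntax."""
    return f'all:"{term}" AND all:"{phrase}"'


def search_papers(term: str, max_results: int = 5) -> List[Dict[str, object]]:
    """Query arXiv and return basic metadata for each hit (needs network access)."""
    # Third-party dependency (pip install arxiv), imported lazily so the
    # query builder above can be used and tested without it.
    import arxiv

    search = arxiv.Search(
        query=build_query(term),
        max_results=max_results,
        sort_by=arxiv.SortCriterion.Relevance,
    )
    client = arxiv.Client()
    return [
        {
            "title": result.title,
            "authors": [author.name for author in result.authors],
            "abstract": result.summary,
            "pdf_url": result.pdf_url,
        }
        for result in client.results(search)
    ]
```

For example, `build_query("quasar")` produces `all:"quasar" AND all:"future work"`, and `search_papers("quasar")` would return up to five metadata records sorted by relevance.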

We plan to expand this search to use phrases similar to “future work”, such as “going forward”, “in the future”, and “next steps”. Since arXiv’s API doesn’t actually allow us to search through the body of a paper, we had to design another way to do this. To extract the future work portions of each paper, we plan to use the arxiv Python module to download the PDF version of each paper that a search returns and store it in an Amazon S3 bucket. There, we’ll use an open-source document analysis AI tool from Hugging Face to extract the sections of the paper that describe future works.
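To make the phrase-based idea concrete, here is a simple keyword heuristic that pulls out sentences mentioning any of those phrases from a paper's plain text. This is a deliberately simpler stand-in for the document analysis model we plan to use, not that model itself. The function names are hypothetical, and the sketch assumes the paper text has already been extracted to a string.

```python
import re
from typing import List, Sequence

# Phrases taken from the plan above; the list is expected to grow.
FUTURE_WORK_PHRASES = ("future work", "going forward", "in the future", "next steps")


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_future_work(
    text: str, phrases: Sequence[str] = FUTURE_WORK_PHRASES
) -> List[str]:
    """Return the sentences that mention any of the given phrases (case-insensitive)."""
    lowered = [p.lower() for p in phrases]
    return [
        sentence
        for sentence in split_sentences(text)
        if any(phrase in sentence.lower() for phrase in lowered)
    ]
```

A heuristic like this will miss future work that is phrased differently and will split badly on abbreviations such as "e.g.", which is part of why a learned section extractor is the intended approach.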

Once the future work text is extracted, we’ll summarize it with bart-large-cnn, an open-source summarization model built by Facebook and published on Hugging Face. That summary is what we’ll show the user in our tool’s UI.
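The summarization step could look roughly like the sketch below, which loads `facebook/bart-large-cnn` through the `pipeline` function in Hugging Face's `transformers` library. The function names and the length limits (`max_length=130`, `min_length=30`) are illustrative assumptions, not tuned values from our alpha. The summarizer is passed in as an argument so the wrapper can be exercised without downloading the model.

```python
from typing import Callable, Dict, List

# A summarizer takes text plus keyword options and returns a list of dicts.
Summarizer = Callable[..., List[Dict[str, str]]]


def load_summarizer() -> Summarizer:
    """Load facebook/bart-large-cnn via the transformers pipeline API."""
    # Third-party dependency; the model weights are downloaded on first use.
    from transformers import pipeline

    return pipeline("summarization", model="facebook/bart-large-cnn")


def summarize_future_work(text: str, summarizer: Summarizer) -> str:
    """Summarize extracted future-work text and return the summary string."""
    output = summarizer(
        text,
        max_length=130,   # upper bound on summary length, in tokens (assumed)
        min_length=30,    # lower bound on summary length, in tokens (assumed)
        do_sample=False,  # deterministic decoding
        truncation=True,  # cut inputs that exceed the model's input limit
    )
    return output[0]["summary_text"].strip()
```

In use, `summarize_future_work(extracted_text, load_summarizer())` would return the text shown in the expanded view.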

Assuming the tool is deployed within NASA’s own network, the only additional cost is Amazon S3 storage. S3 is inexpensive: the cost depends directly on the number of papers stored, which in turn depends on the number of research areas the tool should support (e.g. stellar astrophysics, astrobiology, exoplanets). Per the official S3 pricing page, S3 Standard storage costs roughly $0.023 per GB per month, and requests are billed at a fraction of a cent per thousand. For practical purposes, the total cost would be less than $10 per month.
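A back-of-the-envelope version of that estimate is sketched below. Every number in it is an assumption to be replaced with real values: the default rates approximate S3 Standard list prices (about $0.023 per GB per month for storage and $0.005 per 1,000 write requests) and should be checked against the current pricing page, and the paper count, average file size, and request volume are made up for illustration.

```python
def monthly_s3_cost(
    num_papers: int,
    avg_paper_mb: float,
    monthly_requests: int,
    storage_usd_per_gb_month: float = 0.023,  # assumed rate; verify before use
    usd_per_1000_requests: float = 0.005,     # assumed rate; verify before use
) -> float:
    """Estimate monthly S3 cost in USD from storage volume and request count."""
    storage_gb = num_papers * avg_paper_mb / 1024
    storage_cost = storage_gb * storage_usd_per_gb_month
    request_cost = monthly_requests / 1000 * usd_per_1000_requests
    return round(storage_cost + request_cost, 2)
```

Under these assumptions, storing 100,000 papers of about 2 MB each with 100,000 requests per month, `monthly_s3_cost(100_000, 2.0, 100_000)`, comes to about $4.99 per month, which is consistent with the under-$10 estimate.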