AI and searching

While ChatGPT has gotten all the press, there are some “AI” (I don't actually like this term[1]) large language model and semantic-analysis tools out there that I think can help with doing searches and finding literature.

In a theoretical search scenario, I think I'd start with Perplexity.ai (https://perplexity.ai; no registration required), an “answer engine.” Like ChatGPT, it gives you a short answer to questions, but, unlike ChatGPT, it does actual internet searches (or at least searches a reasonably up-to-date internet index) and cites its sources. You can even ask it for peer-reviewed sources. This is a lot like using a good Wikipedia entry – get an overview, some interesting details, and some references to follow up on. It is, like most internet-based things, going to be biased towards whatever the majority of the sources say, so I could see it spewing some pseudoscience or conspiracy stuff out, but it does look like the programmers gave it some filters on what it uses for sources. As they say, “Perplexity AI is an answer engine that delivers accurate answers to complex questions using large language models. Ask is powered by large language models and search engines. Accuracy is limited by search results and AI capabilities. May generate offensive or dangerous content. Perplexity is not liable for content generated. Do not enter personal information.”

Then, I'd take those sources and plug them into a semantic/citation network search like SemanticScholar.org (https://semanticscholar.org; no registration required), ConnectedPapers.com (https://connectedpapers.com; no registration required), and/or ResearchRabbit.ai (https://researchrabbit.ai; registration required). These look at the citation networks, author networks, and/or semantic relationship networks of scholarly works and display them in different ways to show you (what might be) related works. Most of these are based on SemanticScholar's database (as the most open and freely available scholarly source out there), so they mostly come up with similar results, but each has additional features that expand on the base. SemanticScholar's “Highly Influential” citations attempt to determine the works most closely based on the original work. ConnectedPapers looks at second-order citations (cited by citing articles or references, etc.) to identify what might be foundational works or review articles, and has nice network maps to explore. ResearchRabbit can look at groups of papers to find common citations and authors, and you can view results in network maps and timelines. If you register, all of these offer alerts, too, based on your searches.
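For the tinkerers: Semantic Scholar's database is also exposed through a free public Graph API, which is one reason so many of these tools build on it. Here's a minimal sketch of pulling a paper's citing articles with Python's standard library – the function names are my own, and the `fields` I request are just examples of what the API can return:

```python
import json
import urllib.parse
import urllib.request

S2_API = "https://api.semanticscholar.org/graph/v1"

def citations_url(paper_id: str, fields: str = "title,year,isInfluential") -> str:
    """Build the Graph API URL for a paper's citations.

    paper_id can be a Semantic Scholar ID or a prefixed external ID,
    e.g. "DOI:10.18653/v1/N18-3011" or "arXiv:1706.03762".
    """
    query = urllib.parse.urlencode({"fields": fields})
    return f"{S2_API}/paper/{urllib.parse.quote(paper_id)}/citations?{query}"

def fetch_citations(paper_id: str) -> list[dict]:
    """Fetch citing papers (network call; add error handling in real use)."""
    with urllib.request.urlopen(citations_url(paper_id), timeout=30) as resp:
        return json.load(resp)["data"]
```

Something like `fetch_citations("arXiv:1706.03762")` would return a list of citing papers, each flagged with whether Semantic Scholar considers the citation “influential.”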

Once I had a core set of works, I'd go back to the tried-and-true library databases, especially ones with subject headings/controlled vocabulary. A controlled vocabulary establishes a single word or phrase for each concept within that particular discipline (MeSH for medicine, ERIC Thesaurus for education, etc.). Every work entered into the database is tagged with these “controlled” terms, so you can be confident that all the articles in Medline/PubMed about heart attacks are tagged with “myocardial infarction.” (There are some experiments with using semantic algorithms to tag database entries, but to the best of my knowledge all or most of the traditional sources still use humans as quality control.) By looking up some of the articles I found through the other sources, I could find the relevant subject headings and use those to search out more results.
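The same MeSH trick works programmatically, too: PubMed's free E-utilities interface lets you restrict a search to a MeSH heading with the `[MeSH Terms]` qualifier. A minimal sketch (again, the helper names are mine):

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def mesh_search_url(mesh_term: str, retmax: int = 20) -> str:
    """Build an ESearch URL limited to articles tagged with a MeSH heading."""
    params = urllib.parse.urlencode({
        "db": "pubmed",
        "term": f'"{mesh_term}"[MeSH Terms]',
        "retmax": retmax,
        "retmode": "json",
    })
    return f"{EUTILS}/esearch.fcgi?{params}"

def search_pubmed(mesh_term: str) -> list[str]:
    """Return PMIDs for tagged articles (network call; handle errors in real use)."""
    with urllib.request.urlopen(mesh_search_url(mesh_term), timeout=30) as resp:
        return json.load(resp)["esearchresult"]["idlist"]
```

So `search_pubmed("myocardial infarction")` would retrieve articles indexers tagged with that heading, regardless of whether the abstract says “heart attack.”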

Elicit.org (https://elicit.org; registration required) is another GPT-based tool that bills itself as a “research assistant.” It's a little more complicated, but has some very interesting features. It pulls out quotes from the results that it determines are relevant to your question or topic. You can ask it for particular types of research (qualitative, etc.) or have it highlight aspects of the research, like study size. There are additional “tasks” besides the general search feature – one of which is to find search terms related to a topic! It's still very much in the experimental stage, but also very intriguing.

So...with all of these new tools, am I worried about being replaced by a librarianbot? No, I'm not.

Which means I'm excited about these new tools, as long as they are producing useful results. I'm less excited about tools that produce misinformation, like ChatGPT's made-up citations[2] or the AI-powered voice synthesizer that everyone except the promoters predicted would be used for faking celebrity statements.[3]

So go out and enjoy the AI (again, see footnote 1). No one is stuffing this genie back into the bottle, so we need to learn how to live with it. And it can make some things better.

[1] I don't like the term AI/Artificial Intelligence because we all grew up with science fiction and AI that was actually AI – artificial beings with what we could easily recognize as human-like consciousness and intelligence. (Putting aside for the moment the problem that we often don't recognize or want to recognize the intelligence of other human beings – often the point of those science fiction stories.)

[2]

[3] https://gizmodo.com/ai-joe-rogan-4chan-deepfake-elevenlabs-1850050482

Update (2023-04-23): Aaron Tay has also been experimenting with using LLMs for search, specifically Perplexity, and he keyed in on the ability to restrict the sources used as well. https://medium.com/a-academic-librarians-thoughts-on-open-access/using-large-language-models-like-gpt-to-do-q-a-over-papers-ii-using-perplexity-ai-15684629f02b