AI and searching

While ChatGPT has gotten all the press, there are some “AI” (I don't actually like this term[1]) large language model and semantic-analysis tools out there that I think can help with doing searches and finding literature.

In a theoretical search scenario, I think I'd start with Perplexity.ai (https://perplexity.ai; no registration required), an “answer engine.” Like ChatGPT, it gives you a short answer to questions, but, unlike ChatGPT, it does actual internet searches (or at least searches a reasonably up-to-date internet index) and cites its sources. You can even ask it for peer-reviewed sources. This is a lot like using a good Wikipedia entry – get an overview, some interesting details, and some references to follow up on. It is, like most internet-based things, going to be biased towards whatever the majority of the sources say, so I could see it spewing some pseudoscience or conspiracy stuff out, but it does look like the programmers gave it some filters on what it uses for sources. As they say, “Perplexity AI is an answer engine that delivers accurate answers to complex questions using large language models. Ask is powered by large language models and search engines. Accuracy is limited by search results and AI capabilities. May generate offensive or dangerous content. Perplexity is not liable for content generated. Do not enter personal information.”

Then, I'd take those sources and plug them into a semantic/citation network search like SemanticScholar.org (https://semanticscholar.org; no registration required), ConnectedPapers.com (https://connectedpapers.com; no registration required), and/or ResearchRabbit.ai (https://researchrabbit.ai; registration required). These look at the citation networks, author networks, and/or semantic relationship networks of scholarly works and display them in different ways to show you (what might be) related works. Most of these are based on SemanticScholar's database (as the most open and freely available scholarly source out there), so they mostly come up with similar results, but each has additional features that expand on the base. SemanticScholar's “Highly Influential” citations attempt to determine the works most closely based on the original work. ConnectedPapers looks at second-order citations (cited by citing articles or references, etc.) to identify what might be foundational works or review articles, and has nice network maps to explore. ResearchRabbit can look at groups of papers to find common citations and authors, and you can view results in network maps and timelines. If you register, all of these offer alerts, too, based on your searches.
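For the tinkerers: Semantic Scholar's database is also exposed through a free public Graph API, which is one reason so many of these tools build on it. Here's a minimal sketch of pulling a paper's citing articles with Python's standard library – the function names are my own, and the `fields` I request are just examples of what the API can return:

```python
import json
import urllib.parse
import urllib.request

S2_API = "https://api.semanticscholar.org/graph/v1"

def citations_url(paper_id: str, fields: str = "title,year,isInfluential") -> str:
    """Build the Graph API URL for a paper's citations.

    paper_id can be a Semantic Scholar ID or a prefixed external ID,
    e.g. "DOI:10.18653/v1/N18-3011" or "arXiv:1706.03762".
    """
    query = urllib.parse.urlencode({"fields": fields})
    return f"{S2_API}/paper/{urllib.parse.quote(paper_id)}/citations?{query}"

def fetch_citations(paper_id: str) -> list[dict]:
    """Fetch citing papers (network call; add error handling in real use)."""
    with urllib.request.urlopen(citations_url(paper_id), timeout=30) as resp:
        return json.load(resp)["data"]
```

Something like `fetch_citations("arXiv:1706.03762")` would return a list of citing papers, each flagged with whether Semantic Scholar considers the citation “influential.”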

Once I had a core set of works, I'd go back to the tried-and-true library databases, especially ones with subject headings/controlled vocabulary. A controlled vocabulary establishes a single word or phrase for each concept within that particular discipline (MeSH for medicine, ERIC Thesaurus for education, etc.). Every work entered into the database is tagged with these “controlled” terms, so you can be confident that all the articles in Medline/PubMed about heart attacks are tagged with “myocardial infarction.” (There are some experiments with using semantic algorithms to tag database entries, but to the best of my knowledge all or most of the traditional sources still use humans as quality control.) By looking up some of the articles I found through the other sources, I could find the relevant subject headings and use those to search out more results.
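The same MeSH trick works programmatically, too: PubMed's free E-utilities interface lets you restrict a search to a MeSH heading with the `[MeSH Terms]` qualifier. A minimal sketch (again, the helper names are mine):

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def mesh_search_url(mesh_term: str, retmax: int = 20) -> str:
    """Build an ESearch URL limited to articles tagged with a MeSH heading."""
    params = urllib.parse.urlencode({
        "db": "pubmed",
        "term": f'"{mesh_term}"[MeSH Terms]',
        "retmax": retmax,
        "retmode": "json",
    })
    return f"{EUTILS}/esearch.fcgi?{params}"

def search_pubmed(mesh_term: str) -> list[str]:
    """Return PMIDs for tagged articles (network call; handle errors in real use)."""
    with urllib.request.urlopen(mesh_search_url(mesh_term), timeout=30) as resp:
        return json.load(resp)["esearchresult"]["idlist"]
```

So `search_pubmed("myocardial infarction")` would retrieve articles indexers tagged with that heading, regardless of whether the abstract says “heart attack.”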

Elicit.org (https://elicit.org; registration required) is another GPT-based tool that bills itself as a “research assistant.” It's a little more complicated, but has some very interesting features. It pulls out quotes from the results that it determines are relevant to your question or topic. You can ask it for particular types of research (qualitative, etc.) or have it highlight aspects of the research, like study size. There are additional “tasks” besides the general search feature – one of which is to find search terms related to a topic! It's still very much in the experimental stage, but also very intriguing.

So...with all of these new tools, am I worried about being replaced by a librarianbot? No, I'm not.

Which means I'm excited about these new tools, as long as they are producing useful results. I'm less excited about tools that produce misinformation, like ChatGPT's made-up citations[2] or the AI-powered voice synthesizer that everyone except the promoters predicted would be used for faking celebrity statements.[3]

So go out and enjoy the AI (again, see footnote 1). No one is stuffing this genie back into the bottle, so we need to learn how to live with it. And it can make some things better.

[1] I don't like the term AI/Artificial Intelligence because we all grew up with science fiction and AI that was actually AI – artificial beings with what we could easily recognize as human-like consciousness and intelligence. (Putting aside for the moment the problem that we often don't recognize or want to recognize the intelligence of other human beings – often the point of those science fiction stories.)

[2]

[3] https://gizmodo.com/ai-joe-rogan-4chan-deepfake-elevenlabs-1850050482

Update (2023-04-23): Aaron Tay has also been experimenting with using LLMs for search, specifically Perplexity, and he keyed in on the ability to restrict the sources used as well. https://medium.com/a-academic-librarians-thoughts-on-open-access/using-large-language-models-like-gpt-to-do-q-a-over-papers-ii-using-perplexity-ai-15684629f02b