3+1 Innovative tools that can help you create a Literature review — Research Matrix of papers — Scholarcy, Elicit.org, Lateral.io and Notion/Obsidian/Roam Research…
Years ago, when I was a final-year undergraduate working on my thesis, my supervisor at the time introduced me to this popular way of summarizing related papers.
You would create a table listing one relevant paper per row, with columns representing characteristics of the papers that you want to compare. So, for example, you could have as columns:
- Year of publication
- Main findings etc.
- Population/Intervention/Outcome (from PICO model)
- Independent variables used
- Definition of dependent variables, etc.
- Other aspects of methodology or theory
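As a minimal sketch, such a matrix is just a set of rows (one per paper) with named columns. The snippet below builds one in Python and serializes it to CSV; the column names and both example rows are entirely made up for illustration, so adapt them to your own field (e.g. the PICO components).

```python
import csv
import io

# Illustrative column names only; adapt them to your field.
COLUMNS = [
    "Citation", "Year", "Main findings",
    "Independent variables", "Dependent variables", "Methodology notes",
]

# Two entirely made-up rows, just to show the shape of the matrix.
rows = [
    {"Citation": "Smith et al.", "Year": "2019",
     "Main findings": "X correlates with Y",
     "Independent variables": "Social media use",
     "Dependent variables": "Self-reported wellbeing",
     "Methodology notes": "Cross-sectional survey"},
    {"Citation": "Lee & Tan", "Year": "2021",
     "Main findings": "No effect of X on Y",
     "Independent variables": "Social media abstinence",
     "Dependent variables": "Wellbeing",
     "Methodology notes": "RCT, n = 120"},
]

def matrix_to_csv(matrix, columns=COLUMNS):
    """Serialize the research matrix to CSV text (one paper per row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(matrix)
    return buf.getvalue()

print(matrix_to_csv(rows))
```

From here the table can be opened in any spreadsheet, which is essentially what the manual workflow described in this post amounts to.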
This is what some call a literature review research matrix of papers, or sometimes just a research matrix or literature matrix. This research matrix, or table of studies, can be very useful for summarizing the papers of interest and how they are similar to or differ from each other. It can help you see at a glance what has been studied (e.g. what variables have been used and what the results were) and gives you clues about what gaps in the literature might exist.
Unfortunately, most reference managers do not seem designed to support such work.
For example, Zotero does not have very good support for adding custom fields. A PhD student writing in the Zotero forums in 2021 notes:
The #1 reason scholars cite for *not* using a reference manager is the inability to add custom fields to support their own idiosyncratic workflows. I was surprised, to be frank. Not tech-aversion or resistance to change, but simply lack of customization.
I understand that underlying data structure changes are beyond challenging. But tagging and the ‘extra’ column are not sufficient workarounds to support the delightfully diverse spectrum of scholarly workflows I’ve seen. This is a HUGE opportunity for Zotero to lead, because as of my last tech-comparison, no other reference manager does this either.
While some reference managers are a bit more flexible in terms of the custom fields that can be added (e.g. EndNote supports more custom fields than Zotero), and with a ton of fiddling you could improve things a little, none are particularly good at customizing the display, nor do they assist in extracting such information.
I suspect there are systematic review and evidence synthesis tools designed for such work, particularly data extraction, but I am unfamiliar with them.
You could of course complete all your searching first, importing the references into your reference manager of choice as you go along. Once you are all done, you could export the papers to CSV and edit the table from there. But this presumes you will never add another paper after that point.
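If you do take the CSV route, a small script can at least merge later exports into your existing matrix without overwriting hand-entered columns. This is only a sketch: the column names below loosely imitate a Zotero CSV export and should be checked against your own export, and matching on title is a simplification (keying on DOI would be more robust).

```python
import csv
import io

# A later export from your reference manager (column names loosely imitate
# Zotero's CSV export; check the headers of your own export file).
new_export = """Title,Publication Year,Author
Paper A,2019,Smith
Paper C,2022,Garcia
"""

# Your existing matrix, with a hand-filled custom column ("Main findings").
existing_matrix = [
    {"Title": "Paper A", "Publication Year": "2019", "Author": "Smith",
     "Main findings": "X correlates with Y"},
    {"Title": "Paper B", "Publication Year": "2020", "Author": "Chen",
     "Main findings": "Null result"},
]

def merge_export(matrix, export_csv):
    """Append papers from a new export that are not yet in the matrix,
    keyed on Title, leaving hand-entered columns untouched."""
    known = {row["Title"] for row in matrix}
    for rec in csv.DictReader(io.StringIO(export_csv)):
        if rec["Title"] not in known:
            rec["Main findings"] = ""  # custom column, to fill in by hand
            matrix.append(rec)
    return matrix

merged = merge_export(existing_matrix, new_export)
# Only "Paper C" is appended; "Paper A" is already present.
```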
In reality, however, the benefit of the research matrix is maximised if you build it up as you go along, sharpening your sense of the area. In fact, many use it to help with research proposals and the like, which you obviously write at the start of a project, not once the whole project is done.
I don’t believe there is a perfect solution yet, but below are some new tools that attempt to solve different aspects of the problem.
- Scholarcy — automatically summarize papers to extract predefined columns of information
- Elicit.org, automatically create Research Matrix of papers with GPT-3 technology
- Lateral.io, add pdfs of papers and quickly create Research Matrix of papers
1. Scholarcy, automatically summarize papers to extract predefined columns of information
This Medium blog has mentioned Scholarcy many times and it has changed rapidly over the years.
But no matter how it changes, Scholarcy’s core functionality remains the same. You would upload PDFs and it would summarize the paper by extracting information from the full text.
Among the things it extracts are:
- Key Concepts (linked to Wikipedia)
- Tables (exported as CSV)
- Funding information
- Ethics section
- References (downloadable in BibTeX format)
Scholarcy has a ton of cool features, from exporting results into the latest knowledge management tools like Obsidian and Roam to create knowledge graphs, to turning your thesis into a PowerPoint presentation.
It can even look at the references in the full text you uploaded and analyze what type of citation each one is.
This is what Scholarcy calls “comparative analysis” where it uses machine learning to classify such references into categories like
- Builds on previous work — method related
- Differs from previous work — method related
- Confirmation of previous work — results related
- Counterpoints previous work — results related
But of course, for the purposes of this post, the main relevant point is that it lets you extract all this information and export it to CSV.
See video below.
2. Elicit.org, automatically create Research Matrix of papers with GPT-3 technology
Elicit is one of the first tools out there that leverages GPT-3 language models to aid academic discovery. Currently they have fine-tuned it with metadata from Semantic Scholar (mostly abstracts).
How it works is that you first do a normal keyword search, and it finds papers that it thinks help with your query.
More specifically, it uses your query terms to match and retrieve 1,000 titles and abstracts from Semantic Scholar, then ranks those 1,000 papers using a two-step process (the GPT-3 Babbage search endpoint followed by a fine-tuned T5 model) to narrow them down to the top eight papers. Starring papers will retrieve further candidates via the forward and backward citations of those papers, again using the Semantic Scholar API.
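The retrieve-then-rerank pattern described above can be sketched as follows. Both scoring functions here are crude keyword-overlap stand-ins, not the GPT-3 Babbage search endpoint or the fine-tuned T5 re-ranker Elicit actually uses; only the two-stage shape of the pipeline is the point.

```python
# Toy two-stage "retrieve then re-rank" pipeline. Both scorers are crude
# stand-ins for the real models mentioned in the text.

def cheap_score(query, abstract):
    """Stage-1 scorer: fast but rough (word overlap with the query)."""
    q = set(query.lower().split())
    return len(q & set(abstract.lower().split())) / max(len(q), 1)

def bigrams(text):
    words = text.lower().split()
    return set(zip(words, words[1:]))

def expensive_score(query, abstract):
    """Stage-2 scorer: assumed slower but sharper; here it simply adds
    credit for matching two-word phrases on top of the stage-1 score."""
    return cheap_score(query, abstract) + len(bigrams(query) & bigrams(abstract))

def rank(query, abstracts, first_k=1000, final_k=8):
    """Rank everything cheaply, keep first_k, re-rank those, keep final_k."""
    stage1 = sorted(abstracts, key=lambda a: cheap_score(query, a),
                    reverse=True)[:first_k]
    return sorted(stage1, key=lambda a: expensive_score(query, a),
                  reverse=True)[:final_k]

# Made-up mini-corpus for illustration.
abstracts = [
    "estimating the infection fatality rate of covid in europe",
    "covid vaccine uptake among university students",
    "influenza fatality trends over a decade",
]
top = rank("infection fatality rate of covid", abstracts, first_k=3, final_k=1)
```

The design point is that the cheap scorer keeps the expensive one affordable: the costly model only ever sees the shortlist, not the whole corpus.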
Because it uses “semantic search”, the resulting papers need not match every term in your query. It is hard to say whether Elicit’s search ranking algorithm is clearly better than the gold standard of Google Scholar, but I don’t think it is worse (unlike many “semantic search” systems I have seen).
What differentiates it from other search tools is that Elicit tries to find the answers to your query in the abstract and pulls them out via GPT-3.
They generate what the paper’s abstract implies about your query using a GPT-3 Davinci model fine-tuned on roughly 2,000 examples of questions, abstracts, and takeaways.
In the example below, I ask what the infection fatality rate (IFR) of COVID-19 is, and it shows the papers it thinks are relevant, extracts from each paper the main finding that it thinks answers the question, and displays everything in a table.
This isn’t 100% accurate of course, and you can “Star” the ones that are correct and ask it to try to find similar findings/papers.
Two columns is a far cry from a research matrix, which needs more columns. Fortunately, you can add columns for predefined fields.
If you have done evidence synthesis (including systematic reviews), you can see that the predefined columns seem designed to support it. Indeed, this is currently one of Elicit’s major goals.
Some of the columns use metadata retrieved from Semantic Scholar (e.g. journal, citations, influential citations); the type of study (whether RCT, etc.) is generated via an SVM classifier; while other columns have values extracted using the GPT-3 Davinci Instruct model or a fine-tuned Curie model. You can find the prompts used here.
But of course, in a real research matrix you will want to add custom columns, and you can indeed do this in Elicit. Yes, you can add CUSTOM columns and ask Elicit to fill them in!
Prompt-wise, GPT-3 is given the following instruction:
Try to answer, but say "... not mentioned in the paper" if you really don't know how to answer.
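To make the pattern concrete, here is a rough guess at how such an extraction prompt could be assembled from that instruction. The exact template Elicit uses is not shown here, so the layout below (and the example abstract) is an assumption for illustration only.

```python
# The hedging instruction quoted above, combined with an abstract and the
# custom column phrased as a question. The exact template Elicit uses is
# not public here; this layout is a guess for illustration.
HEDGE = ('Try to answer, but say "... not mentioned in the paper" '
         "if you really don't know how to answer.")

def build_prompt(abstract, column_question):
    """Assemble a single extraction prompt for one paper and one column."""
    return (f"{HEDGE}\n\n"
            f"Abstract: {abstract}\n\n"
            f"Question: {column_question}\n"
            f"Answer:")

prompt = build_prompt(
    "We surveyed 500 nurses about burnout during the pandemic.",
    "Where was the study done?",
)
```

Because the example abstract never names a country, a model following this instruction should say the answer is not mentioned rather than invent one.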
In the example below, I added a column: “Where was the study done?”
The results aren’t perfect, of course. For example, if the country isn’t specifically indicated in the abstract, it won’t find anything.
This is in fact a good thing, since it doesn’t give made-up answers, which language models sometimes do.
I’ve tested Elicit quite a bit and it is substantially better than most academic Q&A systems I have seen.
It could sometimes do better; for example, it may be able to tell you the study population is Americans, but it might not reason that this means the country the study was done in is likely to be the US.
It will be interesting to see if this changes once full-text is included.
Update, April 2022: since I wrote this, Elicit has started to use full text to extract values in the columns. Currently, 85% of such extractions come from the title/abstract and only 15% come from full text. This may change.
This tool is at a very early stage of development, so things may well have changed by the time you read this, but the video below should give you a taste; or read my full early review.
3. Lateral.io, add pdfs of papers and quickly create Research Matrix of papers
Elicit.org uses GPT-3 to “read” abstracts, tries to answer the questions you put to it, and displays the results in a table. But what about supporting extraction of information from the full text? You could of course use Scholarcy, but that only extracts predefined things.
Is there a tool that can import full text (like Scholarcy) and allow you to extract arbitrary information (like Elicit)?
How it works is that you create a project and import the full text of relevant papers into it; these papers will then be indexed.
For each project, you create columns called “concepts”.
You start off by searching keywords with “Super Search”, which is simply a straightforward search across the full text of all the papers you have imported. You can assign text that matches your keyword to each concept/column.
As you add segments of text to the concepts, Lateral.io’s machine learning comes into play and you will start seeing “similar snippets”, which may use different keywords.
You can then export this to Excel, Word, etc.
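That loop (search imported full texts per concept, collect matching snippets into columns, export) can be sketched as below. The paper texts and patterns are invented, and the regex matching is a crude stand-in for Lateral.io’s search and its machine-learned similar-snippet suggestions.

```python
import re

# Made-up paper texts, for illustration only.
papers = {
    "Paper A": "We recruited 120 adults. The study was done in Norway.",
    "Paper B": "Participants were 80 students. Data were collected in Japan.",
}

# Each "concept" becomes a column; a regex stands in for a search query here.
concepts = {
    "Sample size": r"\b\d+ (?:adults|students)\b",
    "Country": r"\b(?:Norway|Japan)\b",
}

def build_matrix(papers, concepts):
    """One row per paper; each concept column holds the first matching snippet."""
    rows = []
    for title, text in papers.items():
        row = {"Paper": title}
        for concept, pattern in concepts.items():
            m = re.search(pattern, text)
            row[concept] = m.group(0) if m else ""
        rows.append(row)
    return rows

matrix = build_matrix(papers, concepts)
# Each row can then be written out to CSV/Excel, much like Lateral.io's export.
```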
4. Notion/Obsidian/Roam Research, knowledge management tools you can adapt into a research matrix
Since 2020, there has been an explosion of interest in note-taking tools that support bidirectional linking.
Combined with renewed interest in the Zettelkasten method, tools like Roam Research, Obsidian, Logseq, and the slightly earlier Notion came into prominence, becoming the latest tools popular with the “productivity porn” crowd (earlier darlings were tools like wikis and Evernote).
Among this group, there is a thriving community of PhD students and early-career researchers who swear by such tools. You can find them on forums, YouTube, even TikTok, sharing their workflows.
Scholarcy already supports some of these tools: you can export data in Markdown, which is accepted by many of them, such as Obsidian.
While this is nice, it only gets you the graph/networked view, which isn’t quite the literature review matrix.
Some Obsidian gurus have created complicated workflows using Zotero and half a dozen Zotero and Obsidian plugins, not only importing results from Zotero but combining them with annotations made on PDFs via Zotero extensions such as Zotfile, Better BibTeX, and Mdnotes (which may not be necessary with Zotero 6.0), together with Obsidian plugins such as Citations.
Notion, with its ability to insert tables, has also naturally become a tool many use to track their research.
There are tons of such research templates out there.
Here, for example, is a Notion template for doing so. More tricky is incorporating a reference manager like Zotero into a Notion autosync workflow. Plugins do exist, but it’s unclear how good they are (one-way sync from Zotero to Notion only).
It seems a complete solution for creating research matrix tables doesn’t exist right now. The perfect workflow would, in my opinion, need the following pieces:
- Good ability to automatically extract concepts and data from the full text of papers — Elicit’s use of GPT-3 seems ideal here (though it currently works on abstracts only)
- Machine learning is never perfect, so humans should be able to supplement the standard extraction algorithms with basic keyword searches across the full text of papers and add their own snippets — as Lateral.io allows
- Ideally, it should integrate nicely with a) reference managers, e.g. Zotero, AND b) a good knowledge management tool, e.g. Notion or Obsidian
While I understand that for many, part of the “fun” is doing geeky things and “hacking” together a complicated workflow, such setups are unlikely to be stable, so #3 is very important.
This is, of course, a huge ask.