Utilora

Free Citation Extractor — DOI, arXiv, PMID, ISBN, URL

Pull DOIs, arXiv IDs, PMIDs, ISBNs, and URLs out of any prose, paper, or references list. Runs in your browser.

runs locally — nothing leaves your browser

What is the Citation Extractor?

Citation Extractor scans arbitrary text for the persistent identifiers researchers actually use — DOIs (10.xxxx/…), arXiv preprint IDs, PubMed IDs, ISBNs, and plain URLs — and returns a deduped, clickable list. It is the kind of thing you reach for after pasting in a messy references section, an email full of links, or the text layer from a PDF, when you just want every citable identifier on one screen.

When to use this

  • Building a reading list from a survey paper's references section
  • Auditing a draft to make sure every cited paper has a resolvable identifier
  • Pulling DOIs out of an email thread or Slack export for batch lookup
  • Cleaning up identifiers copied from a PDF where line breaks broke the formatting

How it works

Each identifier type has a regular expression tuned to its canonical format. DOIs match the 10.xxxx/… prefix. arXiv IDs match the modern YYMM.NNNNN format with optional version suffix. PMIDs require the PMID: prefix to avoid catching random digits. ISBNs match the 10- or 13-digit format with optional separators. URLs match http(s) schemes. Matches are normalised, deduped, and linked to the canonical resolver for each type.
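The per-type matching described above can be sketched roughly as follows. This is a minimal illustration in Python, not the tool's actual code: the `PATTERNS` table, the `normalise` and `extract` helpers, and the exact expressions are all assumptions made for the example, and the real patterns may be stricter or looser.

```python
import re

# Hypothetical patterns, one per identifier type. The tool's real
# expressions are not published on this page.
PATTERNS = {
    "doi": re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)'),
    "arxiv": re.compile(r"\b(\d{4}\.\d{4,5}(?:v\d+)?)\b"),
    "pmid": re.compile(r"\bPMID:\s*(\d{1,8})\b"),
    "isbn": re.compile(r"\b((?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx])\b"),
    "url": re.compile(r'(https?://[^\s<>"]+)'),
}


def normalise(kind: str, raw: str) -> str:
    """Tidy a raw match (assumed rules, for illustration only)."""
    if kind in ("doi", "url"):
        # Drop sentence punctuation that the greedy match swallowed.
        return raw.rstrip(".,;:)")
    if kind == "isbn":
        # Remove hyphen/space separators and upper-case a trailing X.
        return re.sub(r"[- ]", "", raw).upper()
    return raw


def extract(text: str) -> dict[str, list[str]]:
    """Return every normalised match, grouped by identifier type."""
    return {
        kind: [normalise(kind, m.group(1)) for m in pattern.finditer(text)]
        for kind, pattern in PATTERNS.items()
    }


sample = (
    "See doi:10.1000/182. Also arXiv:2301.12345v2, PMID: 12345678; "
    "ISBN 978-0-13-110362-7 and https://example.org/paper."
)
results = extract(sample)
```

Requiring the `PMID:` prefix is what keeps the PMID pattern from firing on arbitrary digit runs; the other patterns in this sketch rely on their own distinctive shape instead.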

Example use cases

Literature review

Drop the references section of a survey paper to get every DOI and arXiv ID as one clean list.

Reference audit

Paste your own draft to confirm every paper you mention has a citable identifier.

Batch lookup prep

Generate a clean list of DOIs to feed into Crossref, Unpaywall, or your reference manager's import.

How to use

  1. Paste any text
     Drop a paper, abstract, reference list, email, or PDF copy-paste.

  2. Pick which identifiers to extract
     Toggle DOI, arXiv, PMID, ISBN, and URL on or off.

  3. Copy the cleaned list
     Export as plain identifiers or as Markdown links.

Why use this tool?

  • Finds DOIs, arXiv IDs, PMIDs, ISBNs, and URLs in a single pass
  • Dedupes results so each identifier only appears once
  • Generates direct links to doi.org, arXiv, PubMed, and WorldCat
  • Exports as a plain list or a Markdown-formatted bibliography skeleton
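
Linking and export could look something like the sketch below. The resolver URL templates, the `resolver_url`, `to_plain`, and `to_markdown` helpers, and the sample data are assumptions for illustration; the tool's own link formats and export layout may differ.

```python
from urllib.parse import quote

# Assumed resolver URL templates for each identifier type.
RESOLVERS = {
    "doi": "https://doi.org/{id}",
    "arxiv": "https://arxiv.org/abs/{id}",
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/{id}/",
    "isbn": "https://www.worldcat.org/isbn/{id}",
}


def resolver_url(kind: str, identifier: str) -> str:
    """Build a link for one identifier; plain URLs link to themselves."""
    if kind == "url":
        return identifier
    # Percent-encode unusual characters but keep the DOI slash intact.
    return RESOLVERS[kind].format(id=quote(identifier, safe="/"))


def to_plain(found: dict[str, list[str]]) -> str:
    """One identifier per line, in the order the types were given."""
    return "\n".join(i for ids in found.values() for i in ids)


def to_markdown(found: dict[str, list[str]]) -> str:
    """One Markdown list item per identifier, linked to its resolver."""
    return "\n".join(
        f"- [{i}]({resolver_url(kind, i)})"
        for kind, ids in found.items()
        for i in ids
    )


found = {
    "doi": ["10.1000/182"],
    "arxiv": ["2301.12345v2"],
    "pmid": ["12345678"],
    "isbn": ["9780131103627"],
    "url": ["https://example.org/paper"],
}
markdown = to_markdown(found)
plain = to_plain(found)
```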

Frequently asked questions

Does this resolve the citations to full metadata?
No. It only extracts identifiers — fetching titles or authors would require external API calls, and this tool stays fully client-side.
How does it dedupe?
Identifiers are compared case-insensitively and deduped per type, so the same DOI appearing five times in your text shows up once.
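A rough sketch of that behaviour, assuming matches are already grouped by type: the `dedupe` helper below is hypothetical, and keeping the first spelling seen is a choice made for this example rather than something the tool documents.

```python
def dedupe(found: dict[str, list[str]]) -> dict[str, list[str]]:
    """Drop case-insensitive repeats within each identifier type."""
    result: dict[str, list[str]] = {}
    for kind, identifiers in found.items():
        seen: set[str] = set()
        unique: list[str] = []
        for identifier in identifiers:
            key = identifier.casefold()  # comparison key only
            if key not in seen:
                seen.add(key)
                unique.append(identifier)  # keep the first spelling seen
        result[kind] = unique
    return result


deduped = dedupe({
    "doi": ["10.1000/ABC", "10.1000/abc", "10.1000/xyz", "10.1000/ABC"],
    "pmid": ["12345678", "12345678"],
})
```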
Why didn't it find my arXiv ID?
It expects the modern YYMM.NNNNN format. The older subject-class IDs (e.g., math.GT/0601001) aren't matched by the current pattern.
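To see why, compare a modern-format pattern with the older ID shape. Both expressions below are assumptions for illustration; in particular, `LEGACY` is a hypothetical pattern showing what matching the older IDs might involve, not something the tool currently does.

```python
import re

# Assumed shape of a modern-format pattern: YYMM.NNNNN plus optional version.
MODERN = re.compile(r"\b\d{4}\.\d{4,5}(?:v\d+)?\b")

# Hypothetical pattern for older IDs: archive name, optional two-letter
# subject class, slash, seven digits, optional version.
LEGACY = re.compile(r"\b[a-z][a-z-]*(?:\.[A-Z]{2})?/\d{7}(?:v\d+)?\b")

modern_hit = MODERN.search("arXiv:2301.12345v2")
modern_on_legacy = MODERN.search("math.GT/0601001")  # no dot after four digits
legacy_hit = LEGACY.search("math.GT/0601001")
```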
Is my text sent anywhere?
No. All matching happens in your browser.
