GET a document URL and return data as JSON

Example: GET a PDF with curl:

{ "filename": "string", "filehash": "string", "content_type": "string", "file_size": 0, "metadata": { "message": "string", "title": "string", "author": "string", "pages": "string", "date": 0, "full_date": "string", "affiliations": [ "string" ], "journal": "string", "abbreviated_journal": "string", "volume": "string", "page": "string", "cited_by": 0, "identifiers": { "doi": "string", "arxiv": "string", "pmid": "string", "pmcid": "string", "url": "string", "isbn": "string", "doc_id": "string" }, "abstract": "string", "keywords": [ "string" ], "references": {}, "source": "string", "emails": [ "string" ], "type": "string", "references_ris": "string", "links": [ "string" ], "author_conclusions": [ "string" ], "funding": [ { "award-group": [ { "funding-source": "string", "award-id": [ "string" ] } ], "funding-statement": "string" } ], "table_captions": [ { "id": "string", "caption": "string" } ], "figure_captions": [ { "id": "string", "caption": "string" } ], "tables_url": "string", "figure_urls": {}, "table_urls": {}, "word_count": "string", "is_oa": true, "oa_status": "string" }, "page_indexes": { "start_page": 0, "top_statements": [ 0 ], "summary": [ 0 ], "claims": [ 0 ], "facts": [ 0 ], "contexts": [ 0 ], "page_boundaries": [ 0 ] }, "sections": {}, "structured_content": [ { "heading": "string", "content": [ "string" ] } ], "participants": [ { "participant": "string", "number": 0, "context": "string" } ], "statistics": [ {} ], "populations": [ { "population": "string", "number": 0, "context": "string" } ], "prevalence": [ { "type": "string", "value": "string", "morbidity": "string", "context": "string" } ], "study_features": {}, "keywords": [ {} ], "keyword_relevance": {}, "species": [ "string" ], "summary": [ "string" ], "structured_summary": {}, "reference_links": [ { "id": "string", "alt_id": "string", "link_id": "string", "entry": "string", "crossref": "string", "scholar_url": "string", "arxiv_url": "string", "pubmed_url": "string", "url": "string", "oa_query": "string", "libkey": "string", "scite": "string", "lookup": "string" } ], "facts": [ "string" ], "claims": [ "string" ], "findings": [ "string" ], "equations": [ "string" ], "processes": [ {} ], "key_statements": [ "string" ], "top_statements": [ "string" ], "headline": "string", "contexts": [ {} ], "aggregated_contexts": {}, "abbreviations": {}, "unstructured_content": {}, "markdown_content": {} }

Request parameters

Query parameters

url

string <url>

Optional

URL of remote PDF file, Word document or Web page

text

string

Optional

The text to be structured and summarised

text_type

enum<string>

Optional

Text format so we can process it optimally

Enum values:

txt, csv, ris, bibtex, nbib

Default:

txt

start_page

integer

Optional

Start page

Default:

(not specified)

end_page

integer

Optional

End page

external_metadata

boolean

Optional

Fetch remote metadata (CrossRef, arXiv)?

Default:

false

parse_references

boolean

Optional

Parse references into BibTeX?

Default:

true

reference_style

enum<string>

Optional

Select a reference style

Enum values:

acs, ama, anystyle, apa, chicago, ensemble, experimental, harvard, ieee, mhra, mla, nature, vancouver

Default:

ensemble

reference_format

enum<string>

Optional

Select a reference list output format

Enum values:

text, bibtex

Default:

text

generate_summary

boolean

Optional

Summarize the article?

Default:

true

summary_engine

enum<string>

Optional

Summarization engine

Enum values:

v1, v2, v3, v4

Default:

(not specified)

replace_pronouns

boolean

Optional

Substitute personal pronouns in summaries?

Default:

false

strip_dialogue

boolean

Optional

Remove dialogue and quotes from summary?

Default:

false

summary_size

integer

Optional

Number of words in summary

Default:

400

summary_percent

number

Optional

Summary size as % of original

Default:

(not specified)

structured_summary

boolean

Optional

Structure the summary with section headings?

Default:

false

keyword_method

enum<string>

Optional

Select a keyword extraction method

Enum values:

sgrank, sgrank+np, sgrank+acr, textrank, np, regex

Default:

sgrank+acr

keyword_sample

enum<string>

Optional

Extract keywords from full article or a sample

Enum values:

article, sample

Default:

sample

keyword_limit

integer

Optional

Set limit on number of keywords returned

Default:

(not specified)

abbreviation_method

enum<string>

Optional

Select an abbreviation extraction method

Enum values:

schwartz, statistical, ensemble

Default:

schwartz

wiki_links

enum<string>

Optional

Create links to Wikipedia?

Enum values:

fast, precise, broad, broader

extract_facts

boolean

Optional

Extract ReVerb-style relations?

Default:

true

extract_claims

boolean

Optional

Extract author claims?

Default:

true

key_points

integer

Optional

Number of key points to retrieve

Default:

(not specified)

focus_terms

string

Optional

Semicolon-separated list of terms used to focus highlights

citation_contexts

boolean

Optional

Extract contexts surrounding citations?

Default:

false

inline_citation_links

boolean

Optional

Create inline HTML links to reference entries?

Default:

false

extract_pico

boolean

Optional

Extract population, intervention, control, outcome data?

Default:

true

extract_tables

boolean

Optional

Extract tables as CSV/Excel?

Default:

false

extract_figures

boolean

Optional

Extract figures as .png files?

Default:

false

require_captions

boolean

Optional

Only extract figures and tables if there are accompanying captions?

Default:

true

extract_sections

boolean

Optional

Extract section headers and paragraphs?

Default:

true

include_bodytext

boolean

Optional

If extracting sections, include the main body text content for each section?

Default:

true

unstructured_content

boolean

Optional

Include a raw, unprocessed text dump of the file?

Default:

false

index_statements

boolean

Optional

Index key statements and citation contexts?

Default:

false

include_markdown

boolean

Optional

Include a markdown representation of the content?

Default:

false

extract_snippets

boolean

Optional

Extract just snippets of each section?

Default:

true

engine

enum<string>

Optional

Text extraction engine

Enum values:

v1, v2

Default:

(not specified)

image_engine

enum<string>

Optional

Image extraction engine

Enum values:

v1, v2, v1+v2

Default:

(not specified)
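
As a rough illustration of how the query parameters above combine into a GET request URL, here is a minimal Python sketch. The base URL and the helper name `build_request_url` are hypothetical (the source does not state the endpoint address), and sending booleans as lowercase `true`/`false` is an assumption rather than documented behaviour.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the source does not state the real URL.
BASE_URL = "https://api.example.com/extract"


def build_request_url(doc_url, focus_terms=(), **options):
    """Return a GET URL with the documented query parameters encoded."""
    params = {"url": doc_url}
    if focus_terms:
        # focus_terms is documented as a semicolon-separated list.
        params["focus_terms"] = ";".join(focus_terms)
    for name, value in options.items():
        if value is None:
            continue  # omit unset optional parameters
        # Assumption: booleans are sent as lowercase "true"/"false".
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE_URL}?{urlencode(params)}"


request_url = build_request_url(
    "https://example.org/paper.pdf",
    focus_terms=["sleep", "memory"],
    summary_size=200,
    extract_tables=True,
)
print(request_url)
```

Because every query parameter is optional, the helper only sends what the caller sets and leaves the rest to the server-side defaults listed above.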

Header parameters

Authorization

string

Optional

Your authorization token

Default:

Bearer
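
The `Bearer` default suggests the usual `Authorization: Bearer <token>` form. The sketch below builds such a request object with Python's standard library without sending it; the URL is a hypothetical placeholder, and the assumption that the token follows the `Bearer ` prefix is not confirmed by the source.

```python
import urllib.request


def make_request(request_url, token):
    """Build (but do not send) a GET request carrying the Authorization header."""
    return urllib.request.Request(
        request_url,
        headers={
            # Assumption: the token is appended after the "Bearer " prefix.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Hypothetical endpoint and placeholder token, for illustration only.
req = make_request(
    "https://api.example.com/extract?url=https%3A%2F%2Fexample.org%2Fpaper.pdf",
    "YOUR_TOKEN",
)
print(req.get_method(), req.get_header("Authorization"))
```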

Response

200 Success

application/json

Body

filename

string

Optional

Filename or URL slug of source document

filehash

string

Optional

Content hash for downstream deduplication

content_type

string

Optional

Source document MIME type

file_size

integer

Optional

Source document file size (bytes)

metadata

object (Document metadata)

Optional

message

string

Optional

Document processing error or status message

title

string

Optional

Document title

author

string

Optional

Document author

pages

string

Optional

Number of pages in the document

date

integer

Optional

Document publication year

full_date

string

Optional

Document publication full date string

affiliations

array[string]

Optional

List of author affiliations

journal

string

Optional

Journal title (from CrossRef)

abbreviated_journal

string

Optional

Abbreviated journal title (from CrossRef)

volume

string

Optional

Journal volume (from CrossRef)

page

string

Optional

Journal page (from CrossRef)

cited_by

integer

Optional

Citation count (from CrossRef)

identifiers

object (Canonical document identifiers)

Optional

abstract

string

Optional

Document abstract

keywords

array[string]

Optional

List of author-supplied keywords

references

object

Optional

Bibliography or footnotes

source

string

Optional

Source of extracted metadata, e.g. CrossRef, OpenAlex, bioRxiv, document

emails

array[string]

Optional

List of author emails

type

string

Optional

Document type: journal-article, book-chapter, book, preprint, case-study, review-article, meta-analysis, report, web-article

references_ris

string

Optional

Bibliography parsed as RIS data

links

array[string]

Optional

List of URLs mentioned in document

author_conclusions

array[string]

Optional

Author-provided conclusions or takeaways

funding

array[object (Funding statements) {2}]

Optional

table_captions

array[object (Captions) {2}]

Optional

figure_captions

array[object (Captions) {2}]

Optional

tables_url

string

Optional

Download link to parsed tables as an Excel workbook

figure_urls

object

Optional

List of download links to figures as PNG or WEBP image files

table_urls

object

Optional

List of download links to tables as PNG or WEBP image files

word_count

string

Optional

Document word count as a min-max range (the max includes appendices)

is_oa

boolean

Optional

If true, document is open access as determined by Unpaywall

oa_status

string

Optional

Document open-access status (from Unpaywall)
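
The `funding` entry of `metadata` is nested two levels deep in the example response (`award-group` inside each funding statement, `award-id` inside each group). A minimal Python sketch of flattening it into (source, award ID) pairs might look like this; the helper name `collect_awards` and the sample values are illustrative only.

```python
def collect_awards(metadata):
    """Return (funding-source, award-id) pairs from metadata["funding"]."""
    pairs = []
    # Every field is optional, so fall back to empty lists at each level.
    for statement in metadata.get("funding") or []:
        for group in statement.get("award-group") or []:
            source = group.get("funding-source", "")
            for award_id in group.get("award-id") or []:
                pairs.append((source, award_id))
    return pairs


# Made-up sample shaped like the example response body.
sample_metadata = {
    "funding": [
        {
            "award-group": [
                {"funding-source": "Example Fund", "award-id": ["A-1", "A-2"]}
            ],
            "funding-statement": "Supported by Example Fund.",
        }
    ]
}
awards = collect_awards(sample_metadata)
print(awards)
```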

page_indexes

object (Page indexes)

Optional

start_page

integer

Optional

First numbered page of document

top_statements

array[integer]

Optional

Page location of each key statement

summary

array[integer]

Optional

Page location of each summary item

claims

array[integer]

Optional

Page location of each claim

facts

array[integer]

Optional

Page location of each fact

contexts

array[integer]

Optional

Page location of each citation context

page_boundaries

array[integer]

Optional

Document offset of each page boundary
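
One plausible way to use `page_indexes` is sketched below in Python. It rests on two assumptions the source does not confirm: that each array in `page_indexes` is index-aligned with the top-level array of the same name, and that `page_boundaries` holds ascending offsets, each marking where the next page begins. Both helper names are hypothetical.

```python
from bisect import bisect_right


def statements_with_pages(response):
    """Pair each top statement with its page (assumes index-aligned arrays)."""
    statements = response.get("top_statements") or []
    pages = (response.get("page_indexes") or {}).get("top_statements") or []
    return list(zip(statements, pages))


def page_for_offset(offset, page_boundaries, start_page=1):
    """Map a document offset to a page number.

    Assumption: page_boundaries is sorted ascending and each entry is the
    offset at which the following page starts.
    """
    return start_page + bisect_right(page_boundaries, offset)


# Made-up sample data for illustration.
sample = {
    "top_statements": ["First finding.", "Second finding."],
    "page_indexes": {"top_statements": [2, 5], "page_boundaries": [1000, 2500]},
}
paired = statements_with_pages(sample)
boundaries = sample["page_indexes"]["page_boundaries"]
print(paired, page_for_offset(1200, boundaries))
```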

sections

object

Optional

Section snippets and structural checks

structured_content

array[object (Structured content) {2}]

Optional

heading

string

Optional

Section heading

content

array[string]

Optional

Section content
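
Since `structured_content` is a list of `heading` plus `content` paragraphs, it can be flattened back into readable text. The following Python sketch shows one way to do so; the function name and the `(untitled)` placeholder for a missing heading are illustrative choices, not part of the API.

```python
def render_sections(structured_content):
    """Join headings and paragraphs into plain text, one blank line per section."""
    blocks = []
    for section in structured_content or []:
        # Both keys are optional, so substitute harmless defaults.
        heading = section.get("heading") or "(untitled)"
        paragraphs = section.get("content") or []
        blocks.append("\n".join([heading, *paragraphs]))
    return "\n\n".join(blocks)


# Made-up sample shaped like the example response body.
rendered = render_sections(
    [
        {"heading": "Intro", "content": ["A.", "B."]},
        {"content": ["C."]},
    ]
)
print(rendered)
```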

participants

array[object (Study participants) {3}]

Optional

participant

string

Optional

Participant description/type

number

integer

Optional

Participant count

context

string

Optional

Context surrounding participant mention

statistics

array[object]

Optional

Structured data describing the statistical analyses and tests reported

populations

array[object (Study populations) {3}]

Optional

population

string

Optional

Population description/type

number

integer

Optional

Population count

context

string

Optional

Context surrounding population mention
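
`participants` and `populations` share the same three-field shape, so one helper can group the reported counts by label. The Python sketch below collects counts rather than summing them, since the source does not say whether repeated mentions refer to distinct groups; the helper name `counts_by_label` is hypothetical.

```python
from collections import defaultdict


def counts_by_label(items, label_key):
    """Group the reported 'number' values under each participant/population label."""
    groups = defaultdict(list)
    for item in items or []:
        label = item.get(label_key)
        number = item.get("number")
        # Skip entries missing either optional field.
        if label is not None and isinstance(number, int):
            groups[label].append(number)
    return dict(groups)


# Made-up sample data for illustration.
sample_populations = [
    {"population": "adults", "number": 120, "context": "120 adults were enrolled"},
    {"population": "adults", "number": 60, "context": "60 adults completed follow-up"},
    {"population": "children"},
]
population_counts = counts_by_label(sample_populations, "population")
print(population_counts)
```

The same call with `label_key="participant"` works for the `participants` array.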

prevalence

array[object (Reported prevalence and incidence) {4}]

Optional

type

string

Optional

Prevalence or incidence

value

string

Optional

Prevalence or incidence value

morbidity

string

Optional

Prevalence population

context

string

Optional

Context surrounding prevalence mention

study_features

object

Optional

List of study characteristics, randomization, phase, etc.

keywords

array[object]

Optional

List of extracted keywords

keyword_relevance

object

Optional

Map of extracted keywords and their relevance scores
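
Because `keyword_relevance` maps keywords to scores, ranking keywords is a one-line sort. This Python sketch assumes the scores are numeric and that a higher score means more relevant, neither of which the source states explicitly; `top_keywords` is a hypothetical helper name.

```python
def top_keywords(keyword_relevance, limit=5):
    """Return up to `limit` keywords, highest relevance score first."""
    # Assumption: values are numbers and larger means more relevant.
    ranked = sorted(
        (keyword_relevance or {}).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return [keyword for keyword, _ in ranked[:limit]]


# Made-up scores for illustration.
best = top_keywords({"sleep": 0.2, "memory": 0.9, "cortex": 0.5}, limit=2)
print(best)
```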

species

array[string]

Optional

List of species names identified in the document

summary

array[string]

Optional

AI-generated summary of document

structured_summary

object

Optional

AI-generated section-by-section summary

reference_links

array[object (Links to cited sources) {13}]

Optional

id

string

Optional

Reference identifier - may be a number or author/year

alt_id

string

Optional

Alternative reference identifier - if id is author/year then this will be a number, and vice versa

link_id

string

Optional

Global linking id - author_abbreviated_title_year

entry

string

Optional

Plain text reference string

crossref

string

Optional

CrossRef link resolver

scholar_url

string

Optional

Google Scholar link resolver

arxiv_url

string

Optional

arXiv link resolver

pubmed_url

string

Optional

PubMed link resolver

url

string

Optional

Direct URL to article if given

oa_query

string

Optional

Unpaywall link resolver

libkey

string

Optional

LibKey link resolver

scite

string

Optional

Scite report link resolver

lookup

string

Optional

General link resolver lookup fields
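
Each `reference_links` entry can carry several resolver links, any of which may be absent. A client will usually want just one, and the Python sketch below picks the first non-empty link in a preference order. The order shown is an arbitrary illustration, not something the API prescribes.

```python
# Illustrative preference order; adjust to taste.
PREFERRED_KEYS = ("url", "crossref", "pubmed_url", "arxiv_url", "oa_query", "scholar_url")


def best_link(reference):
    """Return the first non-empty link for a reference entry, or None."""
    for key in PREFERRED_KEYS:
        link = reference.get(key)
        if link:
            return link
    return None


# Made-up entry shaped like the example response body.
sample_reference = {
    "id": "1",
    "entry": "Doe J. Example title. 2020.",
    "crossref": "https://example.org/crossref-lookup",
    "scholar_url": "https://example.org/scholar-lookup",
}
chosen = best_link(sample_reference)
print(chosen)
```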

facts

array[string]

Optional

List of extracted SVO triples

claims

array[string]

Optional

List of claims made in the document

findings

array[string]

Optional

List of quantitative findings identified

equations

array[string]

Optional

List of LaTeX equations

processes

array[object]

Optional

List of process steps identified

key_statements

array[string]

Optional

Important statements identified

top_statements

array[string]

Optional

Top n key statements identified

headline

string

Optional

Overall key takeaway identified

contexts

array[object]

Optional

List of citation contexts

aggregated_contexts

object

Optional

List of citation contexts

abbreviations

object

Optional

Abbreviation-term mappings identified
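
One use for the `abbreviations` mapping is to spell out each abbreviation the first time it appears in a summary. The Python sketch below assumes the object's keys are abbreviations and its values are the full terms; the source only calls it an abbreviation-term mapping, so the direction is a guess.

```python
import re


def expand_abbreviations(text, abbreviations):
    """Replace the first occurrence of each abbreviation with 'term (ABBR)'."""
    if not abbreviations:
        return text
    # Longest keys first so overlapping abbreviations match greedily.
    keys = sorted(abbreviations, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    seen = set()

    def substitute(match):
        abbr = match.group(1)
        if abbr in seen:
            return abbr  # later occurrences stay abbreviated
        seen.add(abbr)
        return f"{abbreviations[abbr]} ({abbr})"

    return pattern.sub(substitute, text)


# Made-up mapping, assuming keys are abbreviations and values are full terms.
expanded = expand_abbreviations(
    "RCT results. The RCT ran.",
    {"RCT": "randomized controlled trial"},
)
print(expanded)
```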

unstructured_content

object

Optional

Raw text from document

markdown_content

object

Optional

Markdown representation of document
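
Every field in the response body is marked optional, so client code should not assume that any key is present. A small helper for reading nested values defensively could look like the following Python sketch; `dig` is a hypothetical name, not part of any client library.

```python
def dig(data, *path, default=None):
    """Walk nested dicts along `path`, returning `default` if any key is missing."""
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return default if current is None else current


# Made-up response fragment for illustration.
sample_response = {"metadata": {"identifiers": {"doi": "10.1000/example"}}}
doi = dig(sample_response, "metadata", "identifiers", "doi")
title = dig(sample_response, "metadata", "title", default="")
print(doi, repr(title))
```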