Scholarcy

GET a document URL and return data as JSON

Development environment
http://dev-cn.your-api-server.com
GET
/metadata/extract
Example: GET a PDF with curl:
Request example (Shell)
curl --location --request GET 'http://dev-cn.your-api-server.com/metadata/extract'
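A document is normally identified through the `url` query parameter documented on this page. The following is a minimal Python sketch of how such a request URL could be assembled. The helper name `build_extract_url`, the placeholder document address, and the lowercase encoding of booleans are assumptions for illustration, not part of the API reference.

```python
from urllib.parse import urlencode

BASE_URL = "http://dev-cn.your-api-server.com"  # development server named on this page


def build_extract_url(document_url: str, **options) -> str:
    """Return the full GET URL for /metadata/extract with encoded query parameters."""
    params = {"url": document_url, **options}
    # Assumption: booleans are sent as lowercase "true"/"false" strings.
    encoded = {
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in params.items()
    }
    return f"{BASE_URL}/metadata/extract?{urlencode(encoded)}"


if __name__ == "__main__":
    # https://example.org/paper.pdf is a placeholder, not a real document.
    print(build_extract_url("https://example.org/paper.pdf",
                            generate_summary=True, summary_size=200))
```

The resulting string can be passed to curl or to any HTTP client in place of the bare endpoint address.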
Response example
{
    "filename": "string",
    "filehash": "string",
    "content_type": "string",
    "file_size": 0,
    "metadata": {
        "message": "string",
        "title": "string",
        "author": "string",
        "pages": "string",
        "date": 0,
        "full_date": "string",
        "affiliations": [
            "string"
        ],
        "journal": "string",
        "abbreviated_journal": "string",
        "volume": "string",
        "page": "string",
        "cited_by": 0,
        "identifiers": {
            "doi": "string",
            "arxiv": "string",
            "pmid": "string",
            "pmcid": "string",
            "url": "string",
            "isbn": "string",
            "doc_id": "string"
        },
        "abstract": "string",
        "keywords": [
            "string"
        ],
        "references": {},
        "source": "string",
        "emails": [
            "string"
        ],
        "type": "string",
        "references_ris": "string",
        "links": [
            "string"
        ],
        "author_conclusions": [
            "string"
        ],
        "funding": [
            {
                "award-group": [
                    {
                        "funding-source": "string",
                        "award-id": [
                            "string"
                        ]
                    }
                ],
                "funding-statement": "string"
            }
        ],
        "table_captions": [
            {
                "id": "string",
                "caption": "string"
            }
        ],
        "figure_captions": [
            {
                "id": "string",
                "caption": "string"
            }
        ],
        "tables_url": "string",
        "figure_urls": {},
        "table_urls": {},
        "word_count": "string",
        "is_oa": true,
        "oa_status": "string"
    },
    "page_indexes": {
        "start_page": 0,
        "top_statements": [
            0
        ],
        "summary": [
            0
        ],
        "claims": [
            0
        ],
        "facts": [
            0
        ],
        "contexts": [
            0
        ],
        "page_boundaries": [
            0
        ]
    },
    "sections": {},
    "structured_content": [
        {
            "heading": "string",
            "content": [
                "string"
            ]
        }
    ],
    "participants": [
        {
            "participant": "string",
            "number": 0,
            "context": "string"
        }
    ],
    "statistics": [
        {}
    ],
    "populations": [
        {
            "population": "string",
            "number": 0,
            "context": "string"
        }
    ],
    "prevalence": [
        {
            "type": "string",
            "value": "string",
            "morbidity": "string",
            "context": "string"
        }
    ],
    "study_features": {},
    "keywords": [
        {}
    ],
    "keyword_relevance": {},
    "species": [
        "string"
    ],
    "summary": [
        "string"
    ],
    "structured_summary": {},
    "reference_links": [
        {
            "id": "string",
            "alt_id": "string",
            "link_id": "string",
            "entry": "string",
            "crossref": "string",
            "scholar_url": "string",
            "arxiv_url": "string",
            "pubmed_url": "string",
            "url": "string",
            "oa_query": "string",
            "libkey": "string",
            "scite": "string",
            "lookup": "string"
        }
    ],
    "facts": [
        "string"
    ],
    "claims": [
        "string"
    ],
    "findings": [
        "string"
    ],
    "equations": [
        "string"
    ],
    "processes": [
        {}
    ],
    "key_statements": [
        "string"
    ],
    "top_statements": [
        "string"
    ],
    "headline": "string",
    "contexts": [
        {}
    ],
    "aggregated_contexts": {},
    "abbreviations": {},
    "unstructured_content": {},
    "markdown_content": {}
}
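Every field in the response is documented as optional, so client code should tolerate missing keys. The sketch below shows one way to read a few fields defensively in Python. The function name `describe` and the sample values are hypothetical; only the key names come from the example above.

```python
def describe(response: dict) -> dict:
    """Pull a few commonly used fields out of a /metadata/extract response."""
    metadata = response.get("metadata") or {}
    identifiers = metadata.get("identifiers") or {}
    sections = response.get("structured_content") or []
    return {
        "title": metadata.get("title"),
        "year": metadata.get("date"),
        "doi": identifiers.get("doi"),
        "headings": [section.get("heading") for section in sections],
        "summary": " ".join(response.get("summary") or []),
    }


# Hypothetical response fragment, shaped like the example above.
sample = {
    "metadata": {
        "title": "An example paper",
        "date": 2021,
        "identifiers": {"doi": "10.1000/example"},
    },
    "structured_content": [{"heading": "Introduction", "content": ["First paragraph."]}],
    "summary": ["Point one.", "Point two."],
}

if __name__ == "__main__":
    print(describe(sample))
    print(describe({}))  # missing fields fall back to None or empty values
```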

Request parameters

Query parameters
url
string <url>
Optional
URL of remote PDF file, Word document or Web page
text
string
Optional
The text to be structured and summarised
text_type
enum<string>
Optional
Format of the supplied text, so it can be processed optimally
Allowed values:
txt, csv, ris, bibtex, nbib
Default:
txt
start_page
integer
Optional
Start page
Default:
1
end_page
integer
Optional
End page
external_metadata
boolean
Optional
Fetch remote metadata (CrossRef, arXiv)?
Default:
false
parse_references
boolean
Optional
Parse references into BibTeX?
Default:
true
reference_style
enum<string>
Optional
Select a reference style
Allowed values:
acs, ama, anystyle, apa, chicago, ensemble, experimental, harvard, ieee, mhra, mla, nature, vancouver
Default:
ensemble
reference_format
enum<string>
Optional
Select a reference list output format
Allowed values:
text, bibtex
Default:
text
generate_summary
boolean
Optional
Summarize the article?
Default:
true
summary_engine
enum<string>
Optional
Summarization engine
Allowed values:
v1, v2, v3, v4
Default:
v1
replace_pronouns
boolean
Optional
Substitute personal pronouns in summaries?
Default:
false
strip_dialogue
boolean
Optional
Remove dialogue and quotes from summary?
Default:
false
summary_size
integer
Optional
Number of words in summary
Default:
400
summary_percent
number
Optional
Summary size as a percentage of the original
Default:
0
structured_summary
boolean
Optional
Structure the summary with section headings?
Default:
false
keyword_method
enum<string>
Optional
Select a keyword extraction method
Allowed values:
sgrank, sgrank+np, sgrank+acr, textrank, np, regex
Default:
sgrank+acr
keyword_sample
enum<string>
Optional
Extract keywords from full article or a sample
Allowed values:
article, sample
Default:
sample
keyword_limit
integer
Optional
Set limit on number of keywords returned
Default:
25
abbreviation_method
enum<string>
Optional
Select an abbreviation extraction method
Allowed values:
schwartz, statistical, ensemble
Default:
schwartz
wiki_links
enum<string>
Optional
Create links to Wikipedia?
Allowed values:
fast, precise, broad, broader
extract_facts
boolean
Optional
Extract ReVerb-style relations?
Default:
true
extract_claims
boolean
Optional
Extract author claims?
Default:
true
key_points
integer
Optional
Number of key points to retrieve
Default:
5
focus_terms
string
Optional
Semicolon-separated list of terms used to focus highlights
citation_contexts
boolean
Optional
Extract contexts surrounding citations?
Default:
false
inline_citation_links
boolean
Optional
Create inline HTML links to reference entries?
Default:
false
extract_pico
boolean
Optional
Extract population, intervention, control, outcome data?
Default:
true
extract_tables
boolean
Optional
Extract tables as CSV/Excel?
Default:
false
extract_figures
boolean
Optional
Extract figures as .png files?
Default:
false
require_captions
boolean
Optional
Only extract figures and tables if there are accompanying captions?
Default:
true
extract_sections
boolean
Optional
Extract section headers and paragraphs?
Default:
true
include_bodytext
boolean
Optional
If extracting sections, include the main body text content for each section?
Default:
true
unstructured_content
boolean
Optional
Include a raw, unprocessed text dump of the file?
Default:
false
index_statements
boolean
Optional
Index key statements and citation contexts?
Default:
false
include_markdown
boolean
Optional
Include a markdown representation of the content?
Default:
false
extract_snippets
boolean
Optional
Extract just snippets of each section?
Default:
true
engine
enum<string>
Optional
Text extraction engine
Allowed values:
v1, v2
Default:
v1
image_engine
enum<string>
Optional
Image extraction engine
Allowed values:
v1, v2, v1+v2
Default:
v1
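Several query parameters accept only a fixed set of values, and `focus_terms` expects a semicolon-separated string. A client can check these before sending a request. The sketch below is one possible approach in Python. The names `ALLOWED_VALUES`, `check_params` and `join_focus_terms` are hypothetical, and the API's own behaviour on invalid values is not described on this page.

```python
# Enumerated values as listed in the query parameter reference above.
ALLOWED_VALUES = {
    "text_type": {"txt", "csv", "ris", "bibtex", "nbib"},
    "reference_style": {
        "acs", "ama", "anystyle", "apa", "chicago", "ensemble", "experimental",
        "harvard", "ieee", "mhra", "mla", "nature", "vancouver",
    },
    "reference_format": {"text", "bibtex"},
    "summary_engine": {"v1", "v2", "v3", "v4"},
    "keyword_method": {"sgrank", "sgrank+np", "sgrank+acr", "textrank", "np", "regex"},
    "keyword_sample": {"article", "sample"},
    "abbreviation_method": {"schwartz", "statistical", "ensemble"},
    "wiki_links": {"fast", "precise", "broad", "broader"},
    "engine": {"v1", "v2"},
    "image_engine": {"v1", "v2", "v1+v2"},
}


def check_params(params: dict) -> dict:
    """Raise ValueError if an enumerated parameter holds an unlisted value."""
    for name, value in params.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return params


def join_focus_terms(terms: list) -> str:
    """Build the semicolon-separated string expected by focus_terms."""
    return ";".join(term.strip() for term in terms if term.strip())


if __name__ == "__main__":
    params = check_params({
        "summary_engine": "v2",
        "keyword_method": "textrank",
        "focus_terms": join_focus_terms(["gene therapy", "CRISPR"]),
    })
    print(params)
```

Parameters without an enumerated set, such as `summary_size`, pass through unchecked in this sketch.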
Header parameters
Authorization
string
Optional
Your authorization token
Default:
Bearer
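The header default of `Bearer` suggests the usual bearer-token scheme, in which the header value is the word `Bearer` followed by the token. That reading is an assumption, as this page does not spell out the format. Under that assumption, a request object could be prepared in Python as follows. The helper name `authorized_request` and the token value are placeholders.

```python
from urllib.request import Request

ENDPOINT = "http://dev-cn.your-api-server.com/metadata/extract"


def authorized_request(url: str, token: str) -> Request:
    """Build (but do not send) a GET request carrying an Authorization header."""
    # Assumption: the token is sent as "Bearer <token>".
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


if __name__ == "__main__":
    request = authorized_request(ENDPOINT, "YOUR_TOKEN")  # placeholder token
    print(request.get_method(), request.full_url)
    print(request.get_header("Authorization"))
```

Passing the returned object to `urllib.request.urlopen` would send the request.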

Responses

200 Success
application/json
Body
filename
string
Optional
Filename or URL slug of source document
filehash
string
Optional
Content hash for downstream deduplication
content_type
string
Optional
Source document MIME type
file_size
integer
Optional
Source document file size (bytes)
metadata
object (Document metadata)
Optional
    message
    string
    Optional
    Document processing error or status message
    title
    string
    Optional
    Document title
    author
    string
    Optional
    Document author
    pages
    string
    Optional
    Number of pages in the document
    date
    integer
    Optional
    Document publication year
    full_date
    string
    Optional
    Document publication full date string
    affiliations
    array[string]
    Optional
    List of author affiliations
    journal
    string
    Optional
    Journal title (from CrossRef)
    abbreviated_journal
    string
    Optional
    Abbreviated journal title (from CrossRef)
    volume
    string
    Optional
    Journal volume (from CrossRef)
    page
    string
    Optional
    Journal page (from CrossRef)
    cited_by
    integer
    Optional
    Citation count (from CrossRef)
    identifiers
    object (Canonical document identifiers)
    Optional
    abstract
    string
    Optional
    Document abstract
    keywords
    array[string]
    Optional
    List of author-supplied keywords
    references
    object
    Optional
    Bibliography or footnotes
    source
    string
    Optional
    Source of extracted metadata, e.g. CrossRef, OpenAlex, bioRxiv, document
    emails
    array[string]
    Optional
    List of author emails
    type
    string
    Optional
    Document type: journal-article, book-chapter, book, preprint, case-study, review-article, meta-analysis, report, web-article
    references_ris
    string
    Optional
    Bibliography parsed as RIS data
    links
    array[string]
    Optional
    List of URLs mentioned in document
    author_conclusions
    array[string]
    Optional
    Author-provided conclusions or takeaways
    funding
    array[object (Funding statements) {2}]
    Optional
    table_captions
    array[object (Captions) {2}]
    Optional
    figure_captions
    array[object (Captions) {2}]
    Optional
    tables_url
    string
    Optional
    Download link to parsed tables as an Excel workbook
    figure_urls
    object
    Optional
    List of download links to figures as PNG or WEBP image files
    table_urls
    object
    Optional
    List of download links to tables as PNG or WEBP image files
    word_count
    string
    Optional
    Document word count as a min-max range (max includes appendices)
    is_oa
    boolean
    Optional
    If true, document is open access as determined by Unpaywall
    oa_status
    string
    Optional
    Document open-access status (from Unpaywall)
page_indexes
object (Page indexes)
Optional
    start_page
    integer
    Optional
    First numbered page of document
    top_statements
    array[integer]
    Optional
    Page location of each key statement
    summary
    array[integer]
    Optional
    Page location of each summary item
    claims
    array[integer]
    Optional
    Page location of each claim
    facts
    array[integer]
    Optional
    Page location of each fact
    contexts
    array[integer]
    Optional
    Page location of each citation context
    page_boundaries
    array[integer]
    Optional
    Document offset of each page boundary
sections
object
Optional
Section snippets and structural checks
structured_content
array[object (Structured content) {2}]
Optional
    heading
    string
    Optional
    Section heading
    content
    array[string]
    Optional
    Section content
participants
array[object (Study participants) {3}]
Optional
    participant
    string
    Optional
    Participant description/type
    number
    integer
    Optional
    Participant count
    context
    string
    Optional
    Context surrounding participant mention
statistics
array[object]
Optional
Structured data describing the statistical analyses and tests reported
populations
array[object (Study populations) {3}]
Optional
    population
    string
    Optional
    Population description/type
    number
    integer
    Optional
    Population count
    context
    string
    Optional
    Context surrounding population mention
prevalence
array[object (Reported prevalence and incidence) {4}]
Optional
    type
    string
    Optional
    Prevalence or incidence
    value
    string
    Optional
    Prevalence or incidence value
    morbidity
    string
    Optional
    Prevalence population
    context
    string
    Optional
    Context surrounding prevalence mention
study_features
object
Optional
List of study characteristics: randomization, phase, etc.
keywords
array[object]
Optional
List of extracted keywords
keyword_relevance
object
Optional
Map of extracted keywords and their relevance scores
species
array[string]
Optional
List of species names identified in the document
summary
array[string]
Optional
AI-generated summary of document
structured_summary
object
Optional
AI-generated section-by-section summary
reference_links
array[object (Links to cited sources) {13}]
Optional
    id
    string
    Optional
    Reference identifier; may be a number or author/year
    alt_id
    string
    Optional
    Alternative reference identifier; if id is author/year then this will be a number, and vice versa
    link_id
    string
    Optional
    Global linking id in the form author_abbreviated_title_year
    entry
    string
    Optional
    Plain text reference string
    crossref
    string
    Optional
    CrossRef link resolver
    scholar_url
    string
    Optional
    Google Scholar link resolver
    arxiv_url
    string
    Optional
    arXiv link resolver
    pubmed_url
    string
    Optional
    PubMed link resolver
    url
    string
    Optional
    Direct URL to article if given
    oa_query
    string
    Optional
    Unpaywall link resolver
    libkey
    string
    Optional
    LibKey link resolver
    scite
    string
    Optional
    Scite report link resolver
    lookup
    string
    Optional
    General link resolver lookup fields
facts
array[string]
Optional
List of extracted SVO triples
claims
array[string]
Optional
List of claims made in the document
findings
array[string]
Optional
List of quantitative findings identified
equations
array[string]
Optional
List of LaTeX equations
processes
array[object]
Optional
List of process steps identified
key_statements
array[string]
Optional
Important statements identified
top_statements
array[string]
Optional
Top n key statements identified
headline
string
Optional
Overall key takeaway identified
contexts
array[object]
Optional
List of citation contexts
aggregated_contexts
object
Optional
List of citation contexts
abbreviations
object
Optional
Abbreviation-term mappings identified
unstructured_content
object
Optional
Raw text from document
markdown_content
object
Optional
Markdown representation of document
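The `page_indexes` object gives a page location for each key statement, summary item, claim, fact and citation context. Assuming that each of those arrays lines up position by position with the top-level array of the same name (this page implies it but does not state it), statements can be paired with their pages as in the Python sketch below. The function name `statements_with_pages` and the sample values are hypothetical.

```python
def statements_with_pages(response: dict, field: str = "top_statements") -> list:
    """Pair each item of a top-level array with its entry in page_indexes."""
    items = response.get(field) or []
    pages = (response.get("page_indexes") or {}).get(field) or []
    # Assumption: items[i] is located on pages[i]; zip stops at the shorter list.
    return list(zip(items, pages))


# Hypothetical response fragment for illustration only.
sample = {
    "top_statements": ["Statement A", "Statement B"],
    "claims": ["Claim X"],
    "page_indexes": {"top_statements": [1, 4], "claims": [2]},
}

if __name__ == "__main__":
    for text, page in statements_with_pages(sample):
        print(f"page {page}: {text}")
    print(statements_with_pages(sample, "claims"))
```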
Last modified: 2024-05-09 05:45:46