A novel computer vision based method for PDF academic literature structure understanding
Yu, Fengchang; Lu, Wei
Permalink
https://hdl.handle.net/2142/103374
Description
- Title
- A novel computer vision based method for PDF academic literature structure understanding
- Author(s)
- Yu, Fengchang
- Lu, Wei
- Issue Date
- 2019-03-15
- Keyword(s)
- Academic literature
- Computer vision
- Structure understanding
- Abstract
- The PDF format plays a crucial role in electronic academic publishing, but because of its complicated technical rules, PDF files cannot be read directly by machines, which has caused considerable inconvenience for research on academic literature. This poster proposes a computer vision-based method for understanding the structure of PDF documents. The method maps visual objects and text objects in PDF academic papers and obtains the geometric and text attributes of content objects, supplemented by a heuristic algorithm. The algorithm classifies each content object by type to obtain the physical and logical structure of the PDF document. The method overcomes shortcomings of other PDF analysis methods, which require extensive manual feature construction or large-scale corpus training and have difficulty identifying formulas and tables, and it successfully performs structure understanding and full-text extraction on ACM's collections.
- Publisher
- iSchools
- Series/Report Name or Number
- iConference 2019 Proceedings
- Type of Resource
- text
- Language
- eng
- Permalink
- http://hdl.handle.net/2142/103374
- DOI
- https://doi.org/10.21900/iconf.2019.103374
- Copyright and License Information
- Copyright 2019 Fengchang Yu and Wei Lu