Files in this item

Files: Yu_Lu_Poster.doc (501kB)
Description: (no description provided)
Format: Microsoft Word (application/msword)

Description

Title: A novel computer vision based method for PDF academic literature structure understanding
Author(s): Yu, Fengchang; Lu, Wei
Subject(s): PDF; Academic literature; Computer vision; Structure understanding
Abstract: The PDF format plays a crucial role in electronic academic publishing, but because of its complicated technical rules, PDF files cannot be read directly by machines, which hinders research that works with academic literature. This poster proposes a computer vision-based method for understanding the structure of PDF documents. The method maps between visual objects and text objects in PDF academic papers and obtains the geometric and text attributes of each content object. A supplementary heuristic algorithm then classifies each content object by type to obtain the physical structure and logical structure of the PDF document. The method overcomes shortcomings of other PDF analysis methods, which require extensive manual feature construction or large-scale corpus training and have difficulty identifying formulas and tables, and it successfully performs structure understanding and full-text extraction on ACM's collections.
Issue Date: 2019-03-15
Publisher: iSchools
Series/Report: iConference 2019 Proceedings
Genre: Conference Poster
Type: Text
Language: English
URI: http://hdl.handle.net/2142/103374
DOI: https://doi.org/10.21900/iconf.2019.103374
Rights Information: Copyright 2019 Fengchang Yu and Wei Lu
Date Available in IDEALS: 2019-03-22
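
Illustrative sketch (not part of the repository record): the abstract describes content objects that carry geometric and text attributes and are then typed by a heuristic algorithm. The Python below is only a minimal sketch of that general idea, not the authors' method. The `ContentObject` fields, the `classify` and `reading_order` helpers, the type labels, and every threshold are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContentObject:
    # Bounding box of a detected visual region (assumed: page points,
    # origin at the top-left corner of the page).
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""          # text mapped onto the region, empty if none
    font_size: float = 0.0  # dominant font size of the mapped text


def classify(obj: ContentObject, body_font_size: float = 10.0) -> str:
    """Assign a coarse type with hand-written rules (assumed thresholds)."""
    text = obj.text.strip()
    if not text:
        # A visual region with no mapped text is treated as a figure.
        return "figure"
    if text.lower().startswith(("figure", "fig.", "table")):
        return "caption"
    if obj.font_size >= 1.6 * body_font_size:
        return "title"
    if obj.font_size >= 1.15 * body_font_size and len(text.split()) <= 12:
        return "heading"
    # A high share of math-like symbols suggests a formula.
    math_chars = sum(ch in "=+^_<>{}\\" for ch in text)
    if math_chars / len(text) > 0.15:
        return "formula"
    return "paragraph"


def reading_order(objs: list[ContentObject]) -> list[ContentObject]:
    """Order objects top-to-bottom, then left-to-right (single column only)."""
    return sorted(objs, key=lambda o: (o.y0, o.x0))
```

In this sketch, `reading_order` stands in for the physical structure (where objects sit on the page) and `classify` for the logical structure (what role each object plays). A real system would need column detection and far more robust rules.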

