
CCKS 2019 Evaluation Task 5: Public Company Announcement Information Extraction, 3rd Place
PDF is a standard format for distributing electronic documents, and academic and other institutions use it widely to release announcements. Extracting structured data from unstructured PDF documents, however, remains a significant challenge in the field of knowledge graphs. This project uses Adobe's Acrobat DC SDK to transform PDF files into structured data.
Our approach uses Acrobat's format conversion to turn each PDF into a semi-structured intermediate XML file, from which information can be extracted more completely and accurately than from the raw PDF. The method placed third in the CCKS 2019 public company announcement evaluation. Converting PDF files to XML and then extracting tables and text segments gives our solution a competitive edge over existing open-source PDF parsing methods.
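
The extraction step on the intermediate XML can be illustrated with a minimal sketch. It is not the project's actual code: it assumes the converted file uses tagged-PDF style element names (`Table`, `TR`, `TH`, `TD`, `P`), and the sample document, the `Document` root element, and the helper names `cell_text` and `extract_blocks` are hypothetical. Real Acrobat output may use a different root element and deeper nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a converted announcement; the element names are
# assumptions modeled on tagged-PDF structure types, not verified output.
SAMPLE_XML = """<Document>
  <P>Announcement on share pledge</P>
  <Table>
    <TR><TH>Shareholder</TH><TH>Shares pledged</TH></TR>
    <TR><TD>Company A</TD><TD>1000</TD></TR>
  </Table>
  <P>The pledge takes effect on the registration date.</P>
</Document>"""


def cell_text(element):
    """Join all text inside an element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def extract_blocks(xml_string):
    """Return (paragraphs, tables) found in the XML string.

    Each table is a list of rows; each row is a list of cell strings.
    Paragraphs nested inside a table are left out of the paragraph list.
    Nested tables are not handled specially in this sketch.
    """
    root = ET.fromstring(xml_string)

    tables = []
    paragraphs_in_tables = set()
    for table in root.iter("Table"):
        rows = []
        for row in table.iter("TR"):
            rows.append([cell_text(c) for c in row if c.tag in ("TH", "TD")])
        tables.append(rows)
        # Remember paragraphs that belong to table cells.
        paragraphs_in_tables.update(id(p) for p in table.iter("P"))

    paragraphs = [
        cell_text(p) for p in root.iter("P") if id(p) not in paragraphs_in_tables
    ]
    return paragraphs, tables


if __name__ == "__main__":
    paragraphs, tables = extract_blocks(SAMPLE_XML)
    print(paragraphs)
    print(tables)
```

Keeping tables as lists of rows preserves the header-to-value alignment that is typically lost when a PDF is flattened to plain text, which is the main reason to extract from the XML rather than from raw text.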

Flask is a lightweight and popular web framework for Python, known for its simplicity and flexibility. It is widely used to build web applications, providing a minimalistic approach to web development with features like routing, templates, and support for extensions.