Intelligent information extraction from scholarly document databases
DOI:
https://doi.org/10.37380/jisib.v10i2.584
Keywords:
Market Intelligence, Business Intelligence, Competitive Intelligence, Information Systems, Geo-Economics
Abstract
Extracting knowledge from large document databases has long been a challenge.
Most researchers conduct a literature review and manage their document databases with tools that
provide little more than a bibliography; when it comes to retrieving information (a list of concepts
and ideas), these tools severely lack functionality. Researchers need to extract specific information
from their scholarly document databases according to a predefined breakdown structure. These
databases usually contain a few hundred documents, information requirements differ from one
research project to the next, and algorithmic techniques are not always the answer. Because most
retrieval and information extraction algorithms require manual training, supervision, and tuning, it
could be quicker and more efficient to perform the extraction by hand and to devote the time and
effort to defining an effective semantic search list, which is the key to obtaining the desired results.
Defining a robust relative importance index is the final step: it yields a list of concepts ranked by
importance that helps both to measure trends and to find a quick path to the most appropriate
paper in each case.
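As a rough illustration of the workflow the abstract describes, the sketch below matches a semantic search list against a small document set and ranks the concepts by an importance index. The abstract does not give the index formula, so the definition used here (the share of documents that mention a concept) is an assumption, as are the names `SEARCH_LIST`, `count_concept_hits`, and `rank_concepts` and the sample concepts and terms.

```python
import re

# Hypothetical semantic search list: concept -> search terms (illustrative only).
SEARCH_LIST = {
    "risk identification": ["risk identification", "identify risks"],
    "decision making": ["decision making", "decision-making"],
}


def count_concept_hits(text, terms):
    """Count case-insensitive occurrences of any of the terms in the text."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(term.lower()), lowered)) for term in terms)


def rank_concepts(documents, search_list):
    """Return (concept, index, best_document) tuples, highest index first.

    ASSUMPTION: the index is the fraction of documents that mention the
    concept at least once. The paper's own definition may differ.
    """
    n_docs = len(documents)
    ranking = []
    for concept, terms in search_list.items():
        hits = {name: count_concept_hits(text, terms)
                for name, text in documents.items()}
        docs_with_hits = sum(1 for count in hits.values() if count > 0)
        index = docs_with_hits / n_docs if n_docs else 0.0
        # The document with the most matches is a quick path to a relevant paper.
        best_document = max(hits, key=hits.get) if docs_with_hits else None
        ranking.append((concept, index, best_document))
    ranking.sort(key=lambda row: row[1], reverse=True)
    return ranking


if __name__ == "__main__":
    sample_documents = {
        "a.txt": "Risk identification matters. We identify risks early.",
        "b.txt": "Decision-making and risk identification.",
        "c.txt": "Nothing relevant.",
    }
    for concept, index, best in rank_concepts(sample_documents, SEARCH_LIST):
        print(f"{concept}: index={index:.2f}, best match={best}")
```

With a few hundred documents, plain substring matching of this kind runs quickly and needs no training, which fits the abstract's argument that effort is better spent on defining the search list than on tuning an extraction algorithm.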
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).