Mathematical Document Retrieval System based on Signature Hashing

Sourish Dhar, Sudipta Roy

Abstract


Scientific documents and magazines involve large number of mathematical expressions and formulas along with text. The continuous growth of such documents necessitates the requirement of developing specialized tools and techniques, which could handle and analyse mathematical expressions and formulas. Mathematical expressions and formulae are highly structured and quite different from traditional text. Due to which conventional text retrieval system performs poorly in retrieving scientific documents based on mathematical expression formulated as a query. Mathematical information retrieval is concerned with finding information in documents that include mathematics. To address the challenges posed by mathematical formulae as compared to text, this paper aims to construct a math aware search engine, which can retrieve relevant scientific documents based on a mathematical query. A novel signature based hashing scheme to index raw mathematical web documents is proposed in this paper, which can also take mathematical notational equivalences into account. The proposed system demonstrates better precision and stability of the ranked results when compared with other related state-of-the-art math aware search engines.


Full Text:

PDF


DOI: https://doi.org/10.11591/APTIKOM.J.CSIT.135

Refbacks

  • There are currently no refbacks.


Copyright (c) 2019 APTIKOM Journal on Computer Science and Information Technologies

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2722-323X, e-ISSN: 2722-3221

CSIT Stats

 

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.