WANG Ru, SONG Han-tao, LU Yu-chang. Research of Extracting Data from HTML Web Pages Automatically[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2003, 12(S1): 104-108.

Research of Extracting Data from HTML Web Pages Automatically

  • To make use of the data available on the Internet, the data must first be extracted from web pages. An HTT tree model for representing HTML pages is presented. Based on the HTT model, a wrapper generation algorithm, AGW, is proposed. AGW uses a compare-and-correct technique that exploits the native characteristics of the HTT tree structure to generate the wrapper. AGW not only generates the wrapper automatically, but also makes it easy to rebuild the data schema and reduces computational complexity.
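
The abstract does not define the HTT model or the steps of AGW, so the following is only a minimal sketch of the general idea of building a wrapper by comparing tag trees; it is not the paper's algorithm. It assumes two well-formed pages produced from the same template with identical tag structure, and it treats text leaves whose values differ between the pages as data fields. The names `Node`, `TreeBuilder`, `parse`, `data_paths`, and `extract` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class Node:
    """A tree node: either a tag with children or a '#text' leaf with a value."""

    def __init__(self, tag, text=None):
        self.tag = tag
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree. Assumes well-formed HTML without void tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only text between tags
            self.stack[-1].children.append(Node("#text", text))


def parse(html):
    """Parse an HTML string into a Node tree rooted at '#root'."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def data_paths(a, b, path=""):
    """Compare two same-shaped trees; return paths of text leaves that differ."""
    if a.tag != b.tag or len(a.children) != len(b.children):
        raise ValueError(f"structure mismatch at {path or '/'}")
    if a.tag == "#text":
        return [path] if a.text != b.text else []
    found = []
    for i, (child_a, child_b) in enumerate(zip(a.children, b.children)):
        found += data_paths(child_a, child_b, f"{path}/{child_a.tag}[{i}]")
    return found


def extract(root, path):
    """Follow a path such as '/div[0]/p[1]/#text[0]' and return the leaf text."""
    node = root
    for step in path.strip("/").split("/"):
        index = int(step[step.index("[") + 1:-1])
        node = node.children[index]
    return node.text
```

In this sketch, the list of paths returned by `data_paths` plays the role of the wrapper: applying `extract` with each path to a third page from the same template returns that page's data values. Pages whose structure differs, for example lists with a varying number of items, raise an error here; handling them would require a correction step of the kind the abstract mentions, which this sketch does not attempt.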