Research on Rule-Based Extraction of Domain Name WHOIS Information

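Corresponding author:

Xue Pengfei, male, born in 1989, Ph.D., associate professor. Research interests: cyberspace surveying and mapping, heterogeneous information networks. E-mail: xuepengfei@nudt.edu.cn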

Chinese Library Classification (CLC) number:

TP393

Fund project:

Supported by the National Key Research and Development Program of China (2021YFB3100500)


Rule-based WHOIS information extraction technology
    Abstract:

    WHOIS data records domain name ownership, including domain creation details and registrant information, and is an important data source for linking cyberspace entities to social-space entities. However, the WHOIS protocol standardizes only the transport mechanism: the registration records returned by queries still differ widely in format and content and follow many different schemas, which makes accurate extraction at scale difficult. To address this problem, we design and implement a batch WHOIS data acquisition method and a rule-based WHOIS information parser (rule-based parser), which improve both the efficiency of WHOIS data acquisition and the accuracy of its analysis. By carefully designing the fields to be parsed and building a comprehensive rule base, the parser overcomes the low parsing efficiency caused by the variety of WHOIS schemas. Compared with traditional parsers, the rule-based parser handles WHOIS data from more top-level domains and parses WHOIS records with a higher success rate and lower time overhead. It can provide technical and data support for cyberspace surveying and mapping, cyber pollution control, and related areas.
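The approach summarized above can be illustrated with a minimal sketch, assuming Python and a deliberately tiny rule base. It is not the authors' parser: the rule table `RULES`, the helpers `query_whois`, `parse_whois` and `batch_query`, and the sample records are all hypothetical. It shows the two points the abstract makes: the transport is uniform (a query line sent over TCP port 43 and read until the server closes the connection, as in RFC 3912), while the response text is not, so each normalized field is matched against several label variants in turn.

```python
import re
import socket
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule base: each normalized field lists regex patterns for the
# label variants that different registries use. Patterns are tried in order
# and the first non-empty match wins. [ \t]* is used instead of \s* so that a
# match never spills onto the following line.
RULES = {
    "domain_name": [r"^[ \t]*Domain Name:[ \t]*(.+)$", r"^[ \t]*domain:[ \t]*(.+)$"],
    "registrar": [r"^[ \t]*Registrar:[ \t]*(.+)$", r"^[ \t]*Sponsoring Registrar:[ \t]*(.+)$"],
    "creation_date": [
        r"^[ \t]*Creation Date:[ \t]*(.+)$",
        r"^[ \t]*Registration Time:[ \t]*(.+)$",
        r"^[ \t]*created:[ \t]*(.+)$",
    ],
    "registrant": [r"^[ \t]*Registrant Organization:[ \t]*(.+)$", r"^[ \t]*Registrant:[ \t]*(.+)$"],
}
COMPILED = {
    field: [re.compile(p, re.MULTILINE | re.IGNORECASE) for p in patterns]
    for field, patterns in RULES.items()
}


def parse_whois(raw: str) -> dict:
    """Map one raw WHOIS response onto the normalized fields in RULES."""
    record = {}
    for field, patterns in COMPILED.items():
        record[field] = None  # stays None if no rule matches
        for pattern in patterns:
            match = pattern.search(raw)
            # strip() also removes a trailing "\r" from CRLF-terminated lines
            if match and match.group(1).strip():
                record[field] = match.group(1).strip()
                break
    return record


def query_whois(domain: str, server: str, timeout: float = 10.0) -> str:
    """Send one query over TCP port 43 (RFC 3912) and read until the server closes."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        # the idna codec turns internationalized names into their ASCII form
        sock.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def batch_query(domains, server: str, max_workers: int = 8) -> dict:
    """Query many domains concurrently and parse each response."""
    domains = list(domains)  # materialize so the names can be zipped with results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        raws = pool.map(lambda d: query_whois(d, server), domains)
        return {d: parse_whois(r) for d, r in zip(domains, raws)}


# Two made-up responses in different layouts (illustrative, not real registry output).
SAMPLE_GTLD = (
    "Domain Name: EXAMPLE.COM\r\n"
    "Registrar: Example Registrar, Inc.\r\n"
    "Creation Date: 2001-01-01T00:00:00Z\r\n"
    "Registrant Organization: Example Org\r\n"
)
SAMPLE_CCTLD = (
    "Domain Name: example.cn\n"
    "Registrant: Example Org\n"
    "Sponsoring Registrar: Example Registrar\n"
    "Registration Time: 2001-01-01 00:00:00\n"
)

if __name__ == "__main__":
    for sample in (SAMPLE_GTLD, SAMPLE_CCTLD):
        print(parse_whois(sample))
```

Because each field tries its patterns in order and keeps the first match, supporting another registry format in this sketch mostly means appending label variants to the rule table rather than writing a new parser. The `server` argument must be the WHOIS server responsible for the domain's TLD, and real servers rate-limit clients, so `max_workers` is only a placeholder. Records that spread one field over several lines (block-style output) would need additional per-registry rules that the sketch does not attempt.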

History
  • Received: 2022-05-09
  • Last revised: 2022-06-04
  • Published online: 2023-05-04