
A language detector in 45 lines of Python

  •   amazing994 · 251 days ago · 746 views
    class NGram(object):
        def __init__(self, text, n=3):
            self.length = None
            self.n = n
            self.table = {}
            self.parse_text(text)
            self.calculate_length()

        def parse_text(self, text):
            chars = ' ' * self.n  # initial sequence of spaces with length n

            # Collapse runs of whitespace, then slide an n-character window over the text.
            for letter in (" ".join(text.split()) + " "):
                chars = chars[1:] + letter  # append letter to sequence of length n
                self.table[chars] = self.table.get(chars, 0) + 1  # increment count

        def calculate_length(self):
            """ Treat the N-Gram table as a vector and return its scalar magnitude
            to be used for performing a vector-based search.
            """
            self.length = sum([x * x for x in self.table.values()]) ** 0.5
            return self.length

        def __sub__(self, other):
            """ Find the difference between two NGram objects by finding the cosine
            of the angle between the two vector representations of the table of
            N-Grams. Return a float value between 0 and 1 where 0 indicates that
            the two NGrams are exactly the same.
            """
            if not isinstance(other, NGram):
                raise TypeError("Can't compare NGram with non-NGram object.")

            if self.n != other.n:
                raise TypeError("Can't compare NGram objects of different size.")

            # Dot product over the N-Grams that both tables share.
            total = 0
            for k in self.table:
                total += self.table[k] * other.table.get(k, 0)

            return 1.0 - float(total) / (self.length * other.length)

        def find_match(self, languages):
            """ Out of a list of NGrams that represent individual languages, return
            the best match.
            """
            # The smallest distance wins; the distance function must be passed as key=.
            return min(languages, key=lambda n: self - n)
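The post stops at the class itself, so here is a minimal sketch of how the same idea might be used end to end. It is written as standalone functions rather than as the author's class, and the names `ngram_table`, `cosine_distance`, `SAMPLES`, `PROFILES`, and `detect` are hypothetical, as are the three toy sentences. Real language profiles would need far larger training texts.

```python
from collections import Counter
from math import sqrt


def ngram_table(text, n=3):
    """Count every n-character window, padding the start with n spaces."""
    normalized = " ".join(text.split()) + " "  # collapse whitespace, add trailing space
    window = " " * n
    table = Counter()
    for letter in normalized:
        window = window[1:] + letter  # slide the window by one character
        table[window] += 1
    return table


def cosine_distance(a, b):
    """Return 1 - cos(angle) between two count tables; 0.0 means same direction."""
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical, deliberately tiny training samples (assumption, not from the post).
SAMPLES = {
    "english": "the dog and the cat play in the garden and the dog sleeps",
    "german": "der hund und die katze spielen im garten und der hund bellt laut",
    "french": "le chien et le chat jouent dans le jardin et le chien dort",
}
PROFILES = {name: ngram_table(text) for name, text in SAMPLES.items()}


def detect(text):
    """Return the name of the profile with the smallest cosine distance to text."""
    query = ngram_table(text)
    return min(PROFILES, key=lambda name: cosine_distance(query, PROFILES[name]))


if __name__ == "__main__":
    for phrase in ("the dog sleeps", "der hund", "le chien dort"):
        print(phrase, "->", detect(phrase))
```

With the class from the post, the equivalent call would be along the lines of `NGram(text).find_match(list_of_language_ngrams)`, which returns the closest `NGram` object rather than a name, so the caller has to keep its own mapping from objects to language labels.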


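For readers who prefer a formula, the value described in the `__sub__` docstring can be written as follows. The notation is an assumption for illustration: $a_g$ and $b_g$ stand for the counts of N-Gram $g$ in the two tables, with a count of 0 when $g$ is absent.

```latex
d(a, b) \;=\; 1 - \frac{\sum_{g} a_g\, b_g}{\sqrt{\sum_{g} a_g^{2}}\;\sqrt{\sum_{g} b_g^{2}}}
```

Because all counts are non-negative, the fraction lies between 0 and 1, and so does $d$. As a small worked case, for tables $\{ab: 1,\ bc: 1\}$ and $\{ab: 1\}$ the dot product is 1, the magnitudes are $\sqrt{2}$ and 1, and $d = 1 - 1/\sqrt{2} \approx 0.293$.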
For more code, message QQ 1132032275.