Solr/Lucene: a filter to convert "word-numbers" to digits

I'm using Solr as a search frontend to a large corpus of music artist / track information.

Is there a filter or other way to convert "word-numbers" like "five" to their equivalent number ("5") at index time in Lucene / Solr?

As an example, searching for "Ben Folds Five" should return "Ben Folds 5" as a result.

There is the PatternReplaceFilterFactory but doing that all in a regex seems like overkill.

#0

Here's the code that works (I used it in the past):

import java.util.*;

class ConvertWordToNumber {

    public static String WithSeparator(long number) {
        if (number < 0) {
            return "-" + WithSeparator(-number);
        }
        if (number / 1000L > 0) {
            return WithSeparator(number / 1000L) + ","
                    + String.format("%1$03d", number % 1000L);
        } else {
            return String.format("%1$d", number);
        }
    }

    private static String[] numerals = { "zero", "one", "two",
            "three", "four", "five", "six", "seven", "eight", "nine", "ten",
            "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen", "twenty", "thirty", "forty",
            "fifty", "sixty", "seventy", "eighty", "ninety", "hundred" };

    private static long[] values = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
            13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100 };

    private static ArrayList<String> list = new ArrayList<String>(
            Arrays.asList(numerals));

    public static long parseNumerals(String text) throws Exception {
        long value = 0;
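        // Split on runs of whitespace so doubled spaces (e.g. left behind by
        // replaced commas or hyphens) do not produce empty, unknown tokens.
        String[] words = text.trim().replaceAll(" and ", " ").split("\\s+");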
        for (String word : words) {
            if (!list.contains(word)) {
                throw new Exception("Unknown token : " + word);
            }

            long subval = getValueOf(word);
            if (subval == 100) {
                if (value == 0)
                    value = 100;
                else
                    value *= 100;
            } else
                value += subval;
        }

        return value;
    }

    private static long getValueOf(String word) {
        return values[list.indexOf(word)];
    }

    private static String[] words = { "trillion", "billion", "million", "thousand" };
    private static long[] digits = { 1000000000000L, 1000000000L, 1000000L, 1000L };

    public static long parse(String text) throws Exception {
        text = text.toLowerCase().replaceAll("[\\-,]", " ").replaceAll(" and "," ");
        long totalValue = 0;
        boolean processed = false;
        for (int n = 0; n < words.length; n++) {
            int index = text.indexOf(words[n]);
            if (index >= 0) {
                String text1 = text.substring(0, index).trim();
                String text2 = text.substring(index + words[n].length()).trim();

                if (text1.equals(""))
                    text1 = "one";

                if (text2.equals(""))
                    text2 = "zero";

                totalValue = parseNumerals(text1) * digits[n] + parse(text2);
                processed = true;
                break;
            }
        }

        if (processed)
            return totalValue;
        else
            return parseNumerals(text);
    }


    public static void main(String[] args) throws Exception {
        Scanner in = new Scanner(System.in);
        System.out.print("Number in words : ");
        String numberWordsText = in.nextLine();
        System.out.println("Value : " + 
                ConvertWordToNumber.WithSeparator(
                ConvertWordToNumber.parse(numberWordsText)));
    }
}

Taken from here.

You can use it to build your own Solr filter.
Here's a decent post about that:

http://robotlibrarian.billdueber.com/building-a-solr-text-filter-for-normalizing-data/
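
To make the idea concrete, here is a minimal sketch of the per-token normalization such a filter would perform. It deliberately uses only the standard library: the class name `WordNumberNormalizer`, the method `normalize`, and the lookup table are hypothetical names made up for illustration, not part of Solr or Lucene. The sketch handles only single-word numerals ("zero" through "nineteen"). In a real filter the same lookup would presumably run inside a Lucene `TokenFilter`'s `incrementToken()`, and multi-word numbers would need something like the parser above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class WordNumberNormalizer {

    // Hypothetical lookup table: single-word numerals only (an assumption
    // made for this sketch; extend it or plug in a full parser as needed).
    private static final Map<String, String> WORD_TO_DIGITS = new HashMap<>();
    static {
        String[] words = { "zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten", "eleven", "twelve",
                "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
                "eighteen", "nineteen" };
        for (int i = 0; i < words.length; i++) {
            WORD_TO_DIGITS.put(words[i], Integer.toString(i));
        }
    }

    // Lowercases the input, splits it on whitespace, and replaces every
    // token found in the lookup table with its digit form. Unknown tokens
    // pass through unchanged.
    public static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(WORD_TO_DIGITS.getOrDefault(token, token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both spellings normalize to the same token stream.
        System.out.println(normalize("Ben Folds Five")); // ben folds 5
        System.out.println(normalize("Ben Folds 5"));    // ben folds 5
    }
}
```

If the same normalization is applied at both index time and query time, "Ben Folds Five" and "Ben Folds 5" end up as identical tokens and match each other.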

Please contribute it to the Solr community when it's done. You can write your own wiki page.

To start, just follow a link similar to this one:
http://wiki.apache.org/solr/SolrWordToNumberConverter
