Creating Language Helper Functions with SQL Server

Author: chinaitlab (anonymous)  Editor: IT168  2005-11-07 00:00

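[IT168 Server Academy] In today's globalized environment, many tasks that used to be simple have become difficult, because different languages follow many different grammatical rules. You can generalize these tasks by breaking a particular language down into a set of grammar rules, a set of exceptions to those rules, and a base vocabulary. Some programming languages, such as Perl and Java, have public-domain modules that perform linguistic transformations on text.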
　　
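Here is a somewhat simpler example. Suppose we want to convert a number into its spelled-out form (needed, for example, when filling in checks and legal contracts). This trick has been around since the early days of Oracle and is generally used as follows: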
　　
    select to_char(to_date(12345,'J'),'Jsp') from dual;

    Twelve Thousand Three Hundred Forty-Five
　　
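The TO_DATE function uses the Julian date format to convert the number into a date. TO_CHAR then takes that date and formats it back into a string holding the spelled-out version of the Julian day number. This trick has some limitations, however.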
　　
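First, the latest valid date in Oracle falls in the year 9999, so the largest Julian value you can use is 5373484, while the smallest is 1 (4712 BC). Second, because there is no year "zero", the text "zero" cannot be produced without an extra DECODE or CASE expression. The third major limitation is that the trick ignores your NLS settings: whichever language you are using, the number is always spelled out in American English. Even simple operations, such as spelling out the day of the month, have the same problem. For example, try to generate the Spanish phrase "Cinco de Mayo":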
　　
    alter session set nls_language = 'SPANISH';
    select to_char(to_date('0505','MMDD'),'Ddspth Month') from dual;

    Fifth Mayo
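
The upper bound of 5373484 quoted for the Julian trick can be cross-checked outside the database. The following is a minimal Python sketch, not Oracle code. It assumes the standard offset of 1721425 between Python's proleptic Gregorian day ordinal and the Julian Day Number, and the helper name julian_day_number is made up for this illustration.

```python
from datetime import date

# Assumption: offset between Python's proleptic Gregorian ordinal
# (0001-01-01 has ordinal 1) and the Julian Day Number of the same day.
JDN_OFFSET = 1721425


def julian_day_number(d):
    """Return the Julian Day Number for a calendar date (hypothetical helper)."""
    return d.toordinal() + JDN_OFFSET


# The last day of the year 9999 is the largest date Oracle accepts.
print(julian_day_number(date(9999, 12, 31)))  # 5373484
```

The result matches the limit given above, so any number larger than this cannot be pushed through TO_DATE with the 'J' format.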
　　
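The grammar involved in generating numbers for most languages is actually quite simple. Most of the work lies in collecting all the different grammar rules and building up enough of them to produce the correct patterns. (For now I will sidestep the issues of number and gender agreement.)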
　　
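First, I create two tables: the first holds the base words and exceptions, and the second holds some simple template patterns used to generate the text. If a number is present in the first table, my language function returns that text. For every other number, I try to match it against a series of patterns and apply a template to generate the correct text.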
　　
    create table numwords
    (
        lang    varchar2(2),
        num     integer,
        word    varchar2(30),
        constraint numwords_pk primary key (lang,num)
    );

    create table numrules
    (
        lang    varchar2(2),
        seq     integer,
        p1      integer,
        p2      integer,
        temp0   varchar2(30),
        temp    varchar2(30),
        constraint numrules_pk primary key (lang,seq)
    );
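
The table definition does not say what p1 and p2 mean. Going by how the genword package in this article uses them, a rule appears to apply when the number has at most p1 + p2 digits, and 10 raised to p2 splits the number into an upper and a lower part. The short Python sketch below illustrates that reading only; the function name split_by_rule is hypothetical and does not appear in the original code.

```python
def split_by_rule(n, p1, p2):
    """Split n into (upper, lower) parts if it fits the rule, else return None.

    Assumption: a rule matches when n has at most p1 + p2 digits, and the
    lower part is the remainder of n divided by 10 ** p2.
    """
    if len(str(n)) > p1 + p2:
        return None
    lower = n % (10 ** p2)
    return n - lower, lower


print(split_by_rule(42, 1, 1))        # (40, 2)
print(split_by_rule(123456, 3, 3))    # (123000, 456)
print(split_by_rule(1234567, 3, 3))   # None
```

Under this reading, a rule with p1 = 3 and p2 = 3 covers the thousands: up to three digits before the split and three after it.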
　　
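Below is the code needed to generate the spelled-out version of a number. Here I stick to cardinals (such as 1, 2, and 3); in fact, these functions could also generate ordinals (1st, 2nd, 3rd) and plural forms by listing more exceptions and patterns for each language.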
　　
    REM -- package that spells out numbers using numwords and numrules
    create or replace package genword
    as
        function get_word(n number) return varchar2;
        function cardinal(n number) return varchar2;
    end genword;
    /
    show errors;

    create or replace package body genword
    as
        -- look up a base word or exception for the session language
        function get_word(n number) return varchar2
        is
            l_word numwords.word%type;
        begin
            select word into l_word from numwords
             where lang = sys_context('userenv','lang') and num = n;
            return l_word;
        exception
            when no_data_found then
                return null;
        end;
        --
        function cardinal(n number) return varchar2
        is
            p number;       -- power
            t varchar2(30); -- template
            v number;       -- lower portion
            l_word      numwords.word%type;
        begin
            -- negative numbers: prefix the word stored under -1
            if n < 0 then
                l_word := get_word(-1);
                if l_word is null then
                    return null;
                end if;
                return l_word||' '||cardinal(-n);
            end if;
            -- an exact match in the word table wins
            l_word := get_word(n);
            if l_word is not null then
                return l_word;
            end if;
            -- otherwise try each rule for the language in sequence
            for row in
            (
                select * from numrules
                 where lang = sys_context('userenv','lang')
                 order by seq
            )
            loop
                if length(n) <= row.p1 + row.p2 then
                    p := power(10,row.p2);
                    v := mod(n,p);
                    if row.seq = 0 then
                        -- rule 0 only applies to numbers below 20
                        if n < 20 then
                            return replace(row.temp0,'~2',cardinal(v));
                        end if;
                    else
                        if v = 0 then
                            return replace(row.temp0,'~1',cardinal(n/p));
                        else
                            return replace(replace(nvl(row.temp,'~1 ~2'),
                                '~1',cardinal(n-v)),
                                '~2',cardinal(v));
                        end if;
                    end if;
                end if;
            end loop;
            return 'NUMBER TOO LARGE';
        end cardinal;
    end genword;
    /
    show errors;
　　
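Finally, here is some data I collected for English and German. I also copied the data from American English to British English and used the terms "thousand million" and "million million" in place of "billion" and "trillion" (the American usage); outside the United States these two terms are often a source of confusion. The data is sufficient to generate the spelled-out version of every integer from -999,999,999,999 to 999,999,999,999, including zero.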
　　
    REM -- American English
    insert into numwords values ('US',-1,'negative');
    insert into numwords values ('US',0,'zero');
    insert into numwords values ('US',1,'one');
    insert into numwords values ('US',2,'two');
    insert into numwords values ('US',3,'three');
    insert into numwords values ('US',4,'four');
    insert into numwords values ('US',5,'five');
    insert into numwords values ('US',6,'six');
    insert into numwords values ('US',7,'seven');
    insert into numwords values ('US',8,'eight');
    insert into numwords values ('US',9,'nine');
    insert into numwords values ('US',10,'ten');
    insert into numwords values ('US',11,'eleven');
    insert into numwords values ('US',12,'twelve');
    insert into numwords values ('US',13,'thirteen');
    insert into numwords values ('US',15,'fifteen');
    insert into numwords values ('US',18,'eighteen');
    insert into numwords values ('US',20,'twenty');
    insert into numwords values ('US',30,'thirty');
    insert into numwords values ('US',40,'forty');
    insert into numwords values ('US',50,'fifty');
    insert into numwords values ('US',80,'eighty');
    insert into numwords select 'GB',num,word from numwords where lang = 'US';

    insert into numrules values ('US',0,1,1,'~2teen',null);
    insert into numrules values ('US',1,1,1,'~1ty','~1-~2');
    insert into numrules values ('US',2,1,2,'~1 hundred',null);
    insert into numrules values ('US',3,3,3,'~1 thousand',null);
    insert into numrules values ('US',4,3,6,'~1 million',null);
    insert into numrules select 'GB',seq,p1,p2,temp0,temp
        from numrules where lang = 'US';
    insert into numrules values ('US',5,3,9,'~1 billion',null);
    insert into numrules values ('GB',5,3,9,'~1 thousand million',null);
    insert into numrules values ('US',6,3,12,'~1 trillion',null);
    insert into numrules values ('GB',6,3,12,'~1 million million',null);
　　
    REM -- German
    insert into numwords values ('D',-1,'negativ');
    insert into numwords values ('D',0,'null');
    insert into numwords values ('D',1,'eins');
    insert into numwords values ('D',2,'zwei');
    insert into numwords values ('D',3,'drei');
    insert into numwords values ('D',4,'vier');
    insert into numwords values ('D',5,unistr('f\00FCnf'));
    insert into numwords values ('D',6,'sechs');
    insert into numwords values ('D',7,'sieben');
    insert into numwords values ('D',8,'acht');
    insert into numwords values ('D',9,'neun');
    insert into numwords values ('D',10,'zehn');
    insert into numwords values ('D',11,'elf');
    insert into numwords values ('D',12,unistr('zw\00F6lf'));
    insert into numwords values ('D',13,'dreizehn');
    insert into numwords values ('D',16,'sechzehn');
    insert into numwords values ('D',17,'siebzehn');
    insert into numwords values ('D',20,'zwanzig');
    insert into numwords values ('D',21,'einundzwanzig');
    insert into numwords values ('D',30,unistr('drei\00DFig'));
    insert into numwords values ('D',31,unistr('einunddrei\00DFig'));
    insert into numwords values ('D',41,'einundvierzig');
    insert into numwords values ('D',51,unistr('einundf\00FCnfzig'));
    insert into numwords values ('D',60,'sechzig');
    insert into numwords values ('D',70,'siebzig');
    insert into numwords values ('D',100,'hundert');
    insert into numwords values ('D',1000,'tausend');
    insert into numwords values ('D',1e6,'eine Million');
    insert into numwords values ('D',1e9,'eine Milliarde');
    insert into numwords values ('D',1e12,'eine Billion');

    insert into numrules values ('D',0,1,1,'~2zehn',null);
    insert into numrules values ('D',1,1,1,'~1zig','~2und~1');
    insert into numrules values ('D',2,1,2,'~1hundert','~1~2');
    insert into numrules values ('D',3,3,3,'~1tausend','~1 und ~2');
    insert into numrules values ('D',4,3,6,'~1 Millionen',null);
    insert into numrules values ('D',5,3,9,'~1 Milliarden',null);
    insert into numrules values ('D',6,3,12,'~1 Billionen',null);
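
The templates use ~1 for the spelled-out upper part of a number and ~2 for the lower part, which lets German put the units before the tens. The Python sketch below mimics only the nested replace calls from the package; apply_template is a hypothetical name, and the words passed in are taken from the rows above.

```python
def apply_template(template, upper_word, lower_word):
    """Substitute ~1 and then ~2, in the same order as the package's nested replace."""
    return template.replace("~1", upper_word).replace("~2", lower_word)


# German rule 1 template: units, then "und", then tens.
print(apply_template("~2und~1", "zwanzig", "zwei"))  # zweiundzwanzig
# What the rule alone would give for 21, using the base word "eins".
print(apply_template("~2und~1", "zwanzig", "eins"))  # einsundzwanzig
# American English rule 1 template for comparison.
print(apply_template("~1-~2", "twenty", "two"))      # twenty-two
```

The second call shows the form the rule by itself would produce for 21, which may be why the data lists 21, 31, 41, and 51 as explicit entries in numwords.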
　　
Below are some simple SQL statements that use the functions and data given above. You can also set the language to 'GERMAN' or 'ENGLISH' to test the other two data sets:
　　
    SQL> alter session set nls_language = 'AMERICAN';
    SQL> select genword.cardinal(123456789) from dual;

    one hundred twenty-three million four hundred fifty-six thousand seven hundred
    eighty-nine
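
To trace how the rules combine without a database at hand, the logic of genword.cardinal can be mirrored in a few lines of Python. This is a sketch under assumptions, not part of the original package: the dictionary and list are hand-copied from the American English rows, the session-language lookup is left out, and the names NUMWORDS and NUMRULES are illustrative only.

```python
# Base words and exceptions, copied from the 'US' rows of numwords.
NUMWORDS = {
    -1: "negative", 0: "zero", 1: "one", 2: "two", 3: "three", 4: "four",
    5: "five", 6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
    11: "eleven", 12: "twelve", 13: "thirteen", 15: "fifteen",
    18: "eighteen", 20: "twenty", 30: "thirty", 40: "forty",
    50: "fifty", 80: "eighty",
}

# (seq, p1, p2, temp0, temp), copied from the 'US' rows of numrules.
NUMRULES = [
    (0, 1, 1, "~2teen", None),
    (1, 1, 1, "~1ty", "~1-~2"),
    (2, 1, 2, "~1 hundred", None),
    (3, 3, 3, "~1 thousand", None),
    (4, 3, 6, "~1 million", None),
    (5, 3, 9, "~1 billion", None),
    (6, 3, 12, "~1 trillion", None),
]


def cardinal(n):
    """Spell out an integer by following the same steps as genword.cardinal."""
    if n < 0:
        return NUMWORDS[-1] + " " + cardinal(-n)
    if n in NUMWORDS:                      # exact word or exception
        return NUMWORDS[n]
    for seq, p1, p2, temp0, temp in NUMRULES:
        if len(str(n)) <= p1 + p2:         # rule covers this many digits
            p = 10 ** p2
            v = n % p                      # lower portion
            if seq == 0:
                if n < 20:                 # rule 0 handles the teens only
                    return temp0.replace("~2", cardinal(v))
            elif v == 0:
                return temp0.replace("~1", cardinal(n // p))
            else:
                template = temp or "~1 ~2"
                return (template.replace("~1", cardinal(n - v))
                                .replace("~2", cardinal(v)))
    return "NUMBER TOO LARGE"


print(cardinal(14))         # fourteen
print(cardinal(-42))        # negative forty-two
print(cardinal(123456789))
```

The last call prints the same text as the SQL*Plus output above, on a single line.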
