Guava CharMatcher: A Character Matcher
CharMatcher provides a wide range of methods for working with Java char values.
An Easter egg in the source code
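Near the top of the CharMatcher class source there is an ASCII-art drawing of the Pokémon Charmander, just like the cover image of this article. At first I had no idea why a Charmander would be there. Then I saw its English name, Charmander, and it clicked: it is a pun on CharMatcher.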
1. Class declaration
The com.google.common.base.CharMatcher class is declared as follows:
```java
@GwtCompatible(emulated = true)
public abstract class CharMatcher extends Object implements Predicate<Character>
```
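matches(char) is the only abstract method, so you can also write a custom matcher by subclassing CharMatcher directly. The sketch below is minimal and assumes Guava 27.0.1-jre (the version documented here) or later on the classpath. VowelMatcherDemo and VOWELS are hypothetical names used only for this example.

```java
import com.google.common.base.CharMatcher;

// Hypothetical example: a hand-written matcher for lowercase ASCII vowels.
class VowelMatcherDemo {
    static final CharMatcher VOWELS = new CharMatcher() {
        // matches(char) is the only abstract method a subclass must implement.
        @Override
        public boolean matches(char c) {
            return "aeiou".indexOf(c) >= 0;
        }

        // Give the matcher a readable description.
        @Override
        public String toString() {
            return "VowelMatcherDemo.VOWELS";
        }
    };

    public static void main(String[] args) {
        System.out.println(VOWELS.removeFrom("guava"));   // gv
        System.out.println(VOWELS.countIn("charmander")); // 3
    }
}
```

In this particular case, CharMatcher.anyOf("aeiou") does the same job with less code. Subclassing is mainly worth it when a rule cannot be expressed with the built-in factory methods.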
2. Class methods
Official documentation: https://google.github.io/guava/releases/27.0.1-jre/api/docs/com/google/common/base/CharMatcher.html
Modifier and type | Method and description |
---|---|
CharMatcher | and(CharMatcher other) Returns a matcher that matches any character matched by both this matcher and other. |
static CharMatcher | any() Matches any character. |
static CharMatcher | anyOf(CharSequence sequence) Returns a matcher that matches any character present in sequence. |
static CharMatcher | ascii() Matches any ASCII character. |
static CharMatcher | breakingWhitespace() Matches any breaking whitespace character. Non-breaking whitespace such as "\u00a0" is excluded. |
String | collapseFrom(CharSequence sequence, char replacement) Collapse: replaces each run of consecutive matching characters in sequence with a single replacement. |
int | countIn(CharSequence sequence) Returns the number of matching characters in sequence. |
static CharMatcher | forPredicate(Predicate<? super Character> predicate) Creates a CharMatcher from a Predicate. It matches every character for which the predicate's apply method returns true. |
int | indexIn(CharSequence sequence) Returns the index of the first matching character in sequence, or -1 if there is none. |
int | indexIn(CharSequence sequence, int start) Returns the index of the first matching character in sequence at or after index start, or -1 if there is none. |
static CharMatcher | inRange(char startInclusive, char endInclusive) Matches any character between startInclusive and endInclusive, both inclusive. |
static CharMatcher | is(char match) Matches only the single character match. |
static CharMatcher | isNot(char match) Matches any character except match. |
static CharMatcher | javaIsoControl() Matches ISO control characters, as defined by Character.isISOControl(char). |
int | lastIndexIn(CharSequence sequence) Returns the index of the last matching character in sequence, or -1 if there is none. |
abstract boolean | matches(char c) Determines a true or false value for the given character. |
boolean | matchesAllOf(CharSequence sequence) Returns true if every character in sequence matches. |
boolean | matchesAnyOf(CharSequence sequence) Returns true if at least one character in sequence matches. |
boolean | matchesNoneOf(CharSequence sequence) Returns true if no character in sequence matches. |
CharMatcher | negate() Returns a matcher that matches exactly the characters this matcher does not match. |
static CharMatcher | none() Matches no characters; the opposite of any(). |
static CharMatcher | noneOf(CharSequence sequence) Matches any character not present in sequence. |
CharMatcher | or(CharMatcher other) Returns a matcher that matches any character matched by this matcher or by other. |
CharMatcher | precomputed() Returns a matcher equivalent to this one that can be faster to query. Precomputation itself takes time, so it is only worthwhile when the matcher will be queried thousands of times. |
String | removeFrom(CharSequence sequence) Returns sequence with every matching character removed. |
String | replaceFrom(CharSequence sequence, char replacement) Returns sequence with every matching character replaced by replacement. |
String | replaceFrom(CharSequence sequence, CharSequence replacement) Returns sequence with every matching character replaced by the replacement sequence. |
String | retainFrom(CharSequence sequence) Returns only the matching characters of sequence, removing all others. |
String | toString() Returns a string representation of this CharMatcher, such as CharMatcher.or(WHITESPACE, JAVA_DIGIT). |
String | trimAndCollapseFrom(CharSequence sequence, char replacement) First trims sequence by removing matching characters from both ends. It then collapses the result, replacing each run of consecutive matching characters with a single replacement. |
String | trimFrom(CharSequence sequence) Removes matching characters from both ends of sequence. |
String | trimLeadingFrom(CharSequence sequence) Removes matching characters from the start of sequence. |
String | trimTrailingFrom(CharSequence sequence) Removes matching characters from the end of sequence. |
static CharMatcher | whitespace() Matches any whitespace character according to the latest Unicode standard. |
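The sketch below shows how these methods compose. It builds an ASCII-letter matcher from inRange and or, then exercises negate, replaceFrom, matchesAllOf and lastIndexIn. It assumes Guava 27.0.1-jre or later on the classpath. The class name and sample string are made up for the example.

```java
import com.google.common.base.CharMatcher;

// Hypothetical demo combining several of the methods listed above.
class CharMatcherMethodsDemo {
    // ASCII letters: two inRange() matchers joined with or().
    // precomputed() is shown only to illustrate the API; it pays off only
    // when a matcher is queried many thousands of times.
    static final CharMatcher ASCII_LETTER =
            CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z')).precomputed();

    static final String SAMPLE = "Guava-27.0.1-jre";

    public static void main(String[] args) {
        System.out.println(ASCII_LETTER.countIn(SAMPLE));                   // 8
        System.out.println(ASCII_LETTER.negate().removeFrom(SAMPLE));       // Guavajre
        System.out.println(CharMatcher.is('-').replaceFrom(SAMPLE, " / ")); // Guava / 27.0.1 / jre
        System.out.println(ASCII_LETTER.matchesAllOf("Guava"));             // true
        System.out.println(CharMatcher.is('.').lastIndexIn(SAMPLE));        // 10
    }
}
```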
Deprecated methods
Modifier and type | Method and description |
---|---|
boolean | apply(Character character) Deprecated. Provided only to satisfy the Predicate interface; use matches(char) instead. |
static CharMatcher | digit() Deprecated. Many digits are supplementary characters; see the class documentation. |
static CharMatcher | invisible() Deprecated. Most invisible characters are supplementary characters; see the class documentation. |
static CharMatcher | javaDigit() Deprecated. Many digits are supplementary characters; see the class documentation. |
static CharMatcher | javaLetter() Deprecated. Most letters are supplementary characters; see the class documentation. |
static CharMatcher | javaLetterOrDigit() Deprecated. Most letters and digits are supplementary characters; see the class documentation. |
static CharMatcher | javaLowerCase() Deprecated. Some lowercase characters are supplementary characters; see the class documentation. |
static CharMatcher | javaUpperCase() Deprecated. Some uppercase characters are supplementary characters; see the class documentation. |
static CharMatcher | singleWidth() Deprecated. Many such characters are supplementary characters; see the class documentation. |
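The deprecation notes share one root cause: CharMatcher examines one char, meaning one UTF-16 code unit, at a time. A character outside the Basic Multilingual Plane is stored as two surrogate chars, and neither half is a digit or a letter on its own. The demonstration below assumes Guava on the classpath. The sample character U+1D7D8 (MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO) is our own choice.

```java
import com.google.common.base.CharMatcher;

class SupplementaryDigitDemo {
    // "1" followed by U+1D7D8 MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO,
    // which a Java String stores as the surrogate pair D835 DFD8.
    static final String INPUT = "1\uD835\uDFD8";

    // CharMatcher sees one char at a time; neither surrogate half is a digit.
    static int digitsPerChar() {
        return CharMatcher.forPredicate(Character::isDigit).countIn(INPUT);
    }

    // Iterating by code point with Character.isDigit(int) sees both digits.
    static long digitsPerCodePoint() {
        return INPUT.codePoints().filter(Character::isDigit).count();
    }

    public static void main(String[] args) {
        System.out.println(INPUT.length());       // 3 (chars, not characters)
        System.out.println(digitsPerChar());      // 1
        System.out.println(digitsPerCodePoint()); // 2
    }
}
```

If supplementary characters matter for your data, one workaround is to iterate over String.codePoints() and use the int overloads of the Character methods.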
Built-in CharMatcher matchers
The CharMatcher abstract class ships with many built-in implementations, enough to cover most character-matching needs.
Method | Description | Deprecated | Replacement |
---|---|---|---|
CharMatcher.any() | Matches any character | - | - |
CharMatcher.ascii() | Matches ASCII characters | - | - |
CharMatcher.breakingWhitespace() | Matches all breaking whitespace characters | - | - |
CharMatcher.digit() | Matches BMP digits according to Unicode (not just ASCII digits) | Yes | CharMatcher.inRange('0', '9') for ASCII digits only, or CharMatcher.forPredicate(Character::isDigit) |
CharMatcher.invisible() | Matches invisible characters | Yes | - |
CharMatcher.javaDigit() | Matches Unicode digits | Yes | CharMatcher.forPredicate(Character::isDigit) |
CharMatcher.javaIsoControl() | Matches ISO control characters | - | - |
CharMatcher.javaLetter() | Matches letters (including Chinese characters) | Yes | CharMatcher.forPredicate(Character::isLetter) |
CharMatcher.javaLetterOrDigit() | Matches letters (including Chinese characters) or digits | Yes | CharMatcher.forPredicate(Character::isLetterOrDigit) |
CharMatcher.javaLowerCase() | Matches all lowercase characters | Yes | CharMatcher.forPredicate(Character::isLowerCase) |
CharMatcher.javaUpperCase() | Matches all uppercase characters | Yes | CharMatcher.forPredicate(Character::isUpperCase) |
CharMatcher.none() | Matches no characters | - | - |
CharMatcher.singleWidth() | Matches single-width (not double-width) characters | Yes | - |
CharMatcher.whitespace() | Matches all whitespace characters | - | - |
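The two possible replacements for the deprecated digit matchers are not interchangeable. For ASCII-only digits, Guava's Javadoc for digit() points to inRange('0', '9'). forPredicate(Character::isDigit) also accepts other Unicode decimal digits. The comparison below assumes Guava on the classpath; the class name and sample strings are hypothetical.

```java
import com.google.common.base.CharMatcher;

class DigitReplacementDemo {
    // ASCII digits 0-9 only.
    static final CharMatcher ASCII_DIGIT = CharMatcher.inRange('0', '9');
    // Every char for which Character.isDigit(char) is true, including non-ASCII digits.
    static final CharMatcher UNICODE_DIGIT = CharMatcher.forPredicate(Character::isDigit);

    // '7' followed by U+0663 ARABIC-INDIC DIGIT THREE.
    static final String SAMPLE = "7\u0663";

    public static void main(String[] args) {
        System.out.println(ASCII_DIGIT.countIn(SAMPLE));       // 1
        System.out.println(UNICODE_DIGIT.countIn(SAMPLE));     // 2
        System.out.println(ASCII_DIGIT.retainFrom("v27.0.1")); // 2701
    }
}
```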
Below are the character tables that several of these matchers use in the source code.
(1) Digit
Determines whether a character is a BMP digit according to Unicode: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{digit}
```java
private static final String ZEROES =
    "0\u0660\u06f0\u07c0\u0966\u09e6\u0a66\u0ae6\u0b66\u0be6\u0c66\u0ce6\u0d66\u0de6"
        + "\u0e50\u0ed0\u0f20\u1040\u1090\u17e0\u1810\u1946\u19d0\u1a80\u1a90\u1b50\u1bb0"
        + "\u1c40\u1c50\ua620\ua8d0\ua900\ua9d0\ua9f0\uaa50\uabf0\uff10";
```
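ZEROES stores only the zero of each Unicode digit block. Every block of decimal digits is ten consecutive code points, so each zero z stands for the range z to z + 9. The plain-Java sketch below illustrates that lookup with a simple linear scan. It is a hypothetical helper for illustration, not Guava's actual implementation. As far as we can tell, Guava builds the same ranges and searches them with a binary search.

```java
// Hypothetical helper: a linear-scan version of the ZEROES lookup.
class DigitTableSketch {
    // Copied from the ZEROES constant above: the zero of each BMP digit block.
    static final String ZEROES =
            "0\u0660\u06f0\u07c0\u0966\u09e6\u0a66\u0ae6\u0b66\u0be6\u0c66\u0ce6\u0d66\u0de6"
            + "\u0e50\u0ed0\u0f20\u1040\u1090\u17e0\u1810\u1946\u19d0\u1a80\u1a90\u1b50\u1bb0"
            + "\u1c40\u1c50\ua620\ua8d0\ua900\ua9d0\ua9f0\uaa50\uabf0\uff10";

    static boolean isBmpDigit(char c) {
        for (int i = 0; i < ZEROES.length(); i++) {
            char zero = ZEROES.charAt(i);
            // Each block is ten consecutive chars: zero .. zero + 9.
            if (c >= zero && c <= zero + 9) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isBmpDigit('5'));      // true
        System.out.println(isBmpDigit('\u0663')); // true (ARABIC-INDIC DIGIT THREE)
        System.out.println(isBmpDigit('\uff19')); // true (FULLWIDTH DIGIT NINE)
        System.out.println(isBmpDigit('a'));      // false
    }
}
```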
(2) breakingWhitespace
```java
static final CharMatcher INSTANCE = new BreakingWhitespace();

@Override
public boolean matches(char c) {
  switch (c) {
    // '\013' is the vertical tab
    case '\t': case '\n': case '\013': case '\f': case '\r': case ' ':
    case '\u0085': case '\u1680': case '\u2028': case '\u2029': case '\u205f': case '\u3000':
      return true;
    case '\u2007': // FIGURE SPACE is treated as non-breaking
      return false;
    default:
      return c >= '\u2000' && c <= '\u200a';
  }
}
```
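The switch above explains how breakingWhitespace() differs from whitespace(). U+00A0 (NO-BREAK SPACE) falls through to the default branch and U+2007 (FIGURE SPACE) is explicitly excluded, yet both count as whitespace. The check below assumes Guava on the classpath; the class and method names are hypothetical.

```java
import com.google.common.base.CharMatcher;

class BreakingWhitespaceDemo {
    static boolean isWhitespace(char c) {
        return CharMatcher.whitespace().matches(c);
    }

    static boolean isBreaking(char c) {
        return CharMatcher.breakingWhitespace().matches(c);
    }

    // Replace every breaking whitespace char with '|' to make the break points visible.
    static String markBreaks(String s) {
        return CharMatcher.breakingWhitespace().replaceFrom(s, '|');
    }

    public static void main(String[] args) {
        // U+00A0 NO-BREAK SPACE and U+2007 FIGURE SPACE: whitespace, but not breaking.
        System.out.println(isWhitespace('\u00a0') + " " + isBreaking('\u00a0')); // true false
        System.out.println(isWhitespace('\u2007') + " " + isBreaking('\u2007')); // true false
        // The no-break space between "10" and "km" is left alone (shown here as '_').
        System.out.println(markBreaks("10\u00a0km to go").replace('\u00a0', '_')); // 10_km|to|go
    }
}
```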
(3) whitespace
Determines whether a character is whitespace according to the latest Unicode standard, as listed at: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{whitespace}
This differs from the definition used by other Java APIs. For a comparison of the different definitions of whitespace, see: https://goo.gl/Y6SLW
```java
static final String TABLE =
    "\u2002\u3000\r\u0085\u200A\u2005\u2000\u3000"
        + "\u2029\u000B\u3000\u2008\u2003\u205F\u3000\u1680"
        + "\u0009\u0020\u2006\u2001\u202F\u00A0\u000C\u2009"
        + "\u3000\u2004\u3000\u3000\u2028\n\u2007\u3000";
```
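The excerpt shows only the lookup table. As far as we can tell from the Guava source, whitespace() is a small perfect hash over this 32-slot table. Each of the 25 distinct whitespace characters sits in the slot chosen by multiplying the char by a constant and keeping the top 5 bits of the 32-bit product. The remaining slots are padded with U+3000, which is itself whitespace. The plain-Java sketch below reproduces that lookup. The constant 1682554634 comes from our reading of the Guava source, not from this article, so treat it as an assumption. It does place every character we checked into its own slot.

```java
// Hypothetical sketch of the multiplicative-hash lookup behind whitespace().
class WhitespaceTableSketch {
    // Same 32-slot table as above; spare slots are padded with U+3000.
    static final String TABLE =
            "\u2002\u3000\r\u0085\u200A\u2005\u2000\u3000"
            + "\u2029\u000B\u3000\u2008\u2003\u205F\u3000\u1680"
            + "\u0009\u0020\u2006\u2001\u202F\u00A0\u000C\u2009"
            + "\u3000\u2004\u3000\u3000\u2028\n\u2007\u3000";
    // Assumed hash multiplier (taken from our reading of Guava's source).
    static final int MULTIPLIER = 1682554634;
    // 32 slots: numberOfLeadingZeros(31) == 27, so ">>> 27" keeps the top 5 bits.
    static final int SHIFT = Integer.numberOfLeadingZeros(TABLE.length() - 1);

    static boolean isWhitespace(char c) {
        // int multiplication wraps modulo 2^32; the shift yields a slot index 0..31.
        return TABLE.charAt((MULTIPLIER * c) >>> SHIFT) == c;
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace(' '));      // true
        System.out.println(isWhitespace('\t'));     // true
        System.out.println(isWhitespace('\u3000')); // true (IDEOGRAPHIC SPACE)
        System.out.println(isWhitespace('a'));      // false
    }
}
```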
(4) Invisible
https://unicode.org/cldr/utility/list-unicodeset.jsp
```java
private static final String RANGE_STARTS =
    "\u0000\u007f\u00ad\u0600\u061c\u06dd\u070f\u08e2\u1680\u180e\u2000\u2028\u205f\u2066"
        + "\u3000\ud800\ufeff\ufff9";
private static final String RANGE_ENDS = // inclusive ends
    "\u0020\u00a0\u00ad\u0605\u061c\u06dd\u070f\u08e2\u1680\u180e\u200f\u202f\u2064\u206f"
        + "\u3000\uf8ff\ufeff\ufffb";
```
(5) SingleWidth
```java
static final SingleWidth INSTANCE = new SingleWidth();

private SingleWidth() {
  super(
      "CharMatcher.singleWidth()",
      "\u0000\u05be\u05d0\u05f3\u0600\u0750\u0e00\u1e00\u2100\ufb50\ufe70\uff61".toCharArray(),
      "\u04f9\u05be\u05ea\u05f4\u06ff\u077f\u0e7f\u20af\u213a\ufdff\ufeff\uffdc".toCharArray());
}
```
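Both invisible() and singleWidth() are described by a sorted array of range starts plus a parallel array of inclusive range ends. The sketch below looks up a character in such tables with Arrays.binarySearch, using the exact tables from the two excerpts above. It is our own reconstruction for illustration, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of a sorted-range lookup over the invisible() and singleWidth() tables.
class RangeTableSketch {
    static final char[] INVISIBLE_STARTS =
            ("\u0000\u007f\u00ad\u0600\u061c\u06dd\u070f\u08e2\u1680\u180e\u2000\u2028\u205f\u2066"
                    + "\u3000\ud800\ufeff\ufff9").toCharArray();
    static final char[] INVISIBLE_ENDS = // inclusive ends
            ("\u0020\u00a0\u00ad\u0605\u061c\u06dd\u070f\u08e2\u1680\u180e\u200f\u202f\u2064\u206f"
                    + "\u3000\uf8ff\ufeff\ufffb").toCharArray();
    static final char[] SINGLE_WIDTH_STARTS =
            "\u0000\u05be\u05d0\u05f3\u0600\u0750\u0e00\u1e00\u2100\ufb50\ufe70\uff61".toCharArray();
    static final char[] SINGLE_WIDTH_ENDS = // inclusive ends
            "\u04f9\u05be\u05ea\u05f4\u06ff\u077f\u0e7f\u20af\u213a\ufdff\ufeff\uffdc".toCharArray();

    // True if c lies in some [starts[i], ends[i]]; starts must be sorted ascending.
    static boolean inRanges(char[] starts, char[] ends, char c) {
        int index = Arrays.binarySearch(starts, c);
        if (index >= 0) {
            return true; // c is exactly the start of a range
        }
        // binarySearch returns -(insertionPoint) - 1, so ~index is the insertion point;
        // the only range that can contain c is the one starting just before it.
        index = ~index - 1;
        return index >= 0 && c <= ends[index];
    }

    static boolean isInvisible(char c) {
        return inRanges(INVISIBLE_STARTS, INVISIBLE_ENDS, c);
    }

    static boolean isSingleWidth(char c) {
        return inRanges(SINGLE_WIDTH_STARTS, SINGLE_WIDTH_ENDS, c);
    }

    public static void main(String[] args) {
        System.out.println(isInvisible('\u200b'));   // true  (ZERO WIDTH SPACE)
        System.out.println(isInvisible('A'));        // false
        System.out.println(isSingleWidth('\uff71')); // true  (HALFWIDTH KATAKANA LETTER A)
        System.out.println(isSingleWidth('\uff21')); // false (FULLWIDTH LATIN CAPITAL LETTER A)
    }
}
```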
3. Introducing CharMatcher
In earlier versions of Guava, the StringUtil class grew out of control and accumulated many string-handling methods: allAscii, collapse, collapseControlChars, collapseWhitespace, indexOfChars, lastIndexNotOf, numSharedChars, removeChars, removeCrLf, replaceChars, retainAllChars, strip, stripAndCollapse, and stripNonDigits.
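All of these methods boil down to two conceptual questions:
What counts as a matching character?
What should be done with the matching characters?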
CharMatcher was created to clean up this mess.
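Intuitively, you can think of a CharMatcher instance as representing a particular class of characters, such as digits or whitespace. Concretely, a CharMatcher is just a boolean predicate on characters, and CharMatcher does indeed implement Predicate<Character>. But needs like "all whitespace characters" or "all lowercase letters" are so common that Guava provides a dedicated API for them.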
The bigger benefit of CharMatcher is the set of methods it provides for operating on the matching characters: trimming, collapsing, removing, retaining, and so on.
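The two questions map directly onto the API. A static factory such as inRange or whitespace answers "which characters match?". An instance method such as retainFrom or trimAndCollapseFrom answers "what should happen to them?". The sketch below assumes Guava on the classpath; the input string is made up.

```java
import com.google.common.base.CharMatcher;

class CharMatcherIdeaDemo {
    // Question 1: which characters match?
    static final CharMatcher DIGITS = CharMatcher.inRange('0', '9');
    static final CharMatcher WHITESPACE = CharMatcher.whitespace();

    static final String INPUT = "  Order #42:\tship 3 items  ";

    public static void main(String[] args) {
        // Question 2: what should be done with the matching characters?
        System.out.println(DIGITS.retainFrom(INPUT));                    // 423
        System.out.println(WHITESPACE.countIn(INPUT));                   // 8
        System.out.println(WHITESPACE.trimAndCollapseFrom(INPUT, ' '));  // Order #42: ship 3 items
    }
}
```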
4. Test class
```java
package com.example.guava.string_utilities;

import com.google.common.base.CharMatcher;
import junit.framework.TestCase;

public class CharMatcherTest extends TestCase {

    /** collapseFrom: replaces each run of matching characters with one replacement character */
    public void testCollapseFrom() {
        String input = " Ting Feng ";
        String result = CharMatcher.breakingWhitespace().collapseFrom(input, '*');
        System.out.println(result); // *Ting*Feng*
        result = CharMatcher.is(' ').collapseFrom(input, '-');
        System.out.println(result); // -Ting-Feng-
    }

    /** trimAndCollapseFrom: trims matching characters from both ends, then performs collapseFrom */
    public void testTrimAndCollapseFrom() {
        String input = " Ting Feng ";
        String result = CharMatcher.breakingWhitespace().trimAndCollapseFrom(input, '-');
        System.out.println(result); // Ting-Feng
        result = CharMatcher.is(' ').trimAndCollapseFrom(input, '-');
        System.out.println(result); // Ting-Feng
    }

    /** trimFrom trims both ends, trimLeadingFrom only the start, trimTrailingFrom only the end */
    public void testTrim() {
        System.out.println(CharMatcher.breakingWhitespace().trimFrom(" Ting Feng "));         // "Ting Feng"
        System.out.println(CharMatcher.breakingWhitespace().trimLeadingFrom(" Ting Feng "));  // "Ting Feng "
        System.out.println(CharMatcher.breakingWhitespace().trimTrailingFrom(" Ting Feng ")); // " Ting Feng"
    }

    /** retainFrom: keeps only the matching characters */
    public void testRetainFrom() {
        System.out.println(CharMatcher.breakingWhitespace().retainFrom(" Hi 123 Ting 456 Feng ")); // "      " (6 spaces)
    }

    /** removeFrom: removes every matching character */
    public void testRemoveFrom() {
        System.out.println(CharMatcher.breakingWhitespace().removeFrom(" Hi 123 Ting 456 Feng ")); // Hi123Ting456Feng
    }

    /** countIn: counts the matching characters in a string */
    public void testCountIn() {
        System.out.println(CharMatcher.is('a').countIn("TingFeng Sharing the Google Guava Used")); // 3
        String input = "H*el.lo,}12";
        CharMatcher matcher = CharMatcher.forPredicate(Character::isLetterOrDigit);
        System.out.println(matcher.retainFrom(input)); // Hello12
        System.out.println(matcher.countIn(input));    // 7
        matcher = CharMatcher.inRange('a', 'l');
        System.out.println(matcher.countIn(input));    // 3
    }

    /** indexIn: index of the first matching character; lastIndexIn: index of the last one */
    public void testIndexIn_lastIndexIn() {
        String input = "**el.lo,}12";
        CharMatcher matcher = CharMatcher.forPredicate(Character::isLetterOrDigit);
        System.out.println(matcher.indexIn(input));     // 2
        System.out.println(matcher.indexIn(input, 4));  // 5
        System.out.println(matcher.lastIndexIn(input)); // 10
    }

    /** is matches only the given character; isNot matches every other character */
    public void testIs_isNot() {
        String input = "a, c, z, 1, 2";
        System.out.println(CharMatcher.is(',').retainFrom(input));    // ,,,,
        System.out.println(CharMatcher.is(',').removeFrom(input));    // a c z 1 2
        System.out.println(CharMatcher.isNot(',').retainFrom(input)); // a c z 1 2
        System.out.println(CharMatcher.isNot(',').removeFrom(input)); // ,,,,
    }

    /** javaIsoControl: matches ISO control characters (here tab, newline and backspace) */
    public void testJavaIsoControl() {
        String input = "ab\tcd\nef\bg";
        CharMatcher matcher = CharMatcher.javaIsoControl();
        System.out.println(matcher.removeFrom(input)); // abcdefg
    }

    /** and(): a character must satisfy both matchers */
    public void testDoubleMatcher() {
        CharMatcher matcher0 = CharMatcher.forPredicate(Character::isLetterOrDigit);
        CharMatcher matcher1 = CharMatcher.forPredicate(Character::isLowerCase);
        String result = matcher0.and(matcher1).retainFrom("H*el.lo,}12");
        System.out.println(result); // ello
    }

    /** matchesAllOf: all chars match; matchesAnyOf: at least one matches; matchesNoneOf: none match */
    public void test_matchesAllOf_matchesAnyOf_matchesNoneOf() {
        String input = "**e,l.lo,}12";
        CharMatcher matcher = CharMatcher.is(',');
        System.out.println(matcher.matchesAllOf(input));  // false
        System.out.println(matcher.matchesAnyOf(input));  // true
        matcher = CharMatcher.is('?');
        System.out.println(matcher.matchesNoneOf(input)); // true
    }

    /** any matches every character; anyOf matches any character in the given sequence */
    public void testAny() {
        String input = "H*el.lo,}12";
        CharMatcher matcher = CharMatcher.any();
        System.out.println(matcher.retainFrom(input)); // H*el.lo,}12
        matcher = CharMatcher.anyOf("Hel");
        System.out.println(matcher.retainFrom(input)); // Hell
        System.out.println(matcher.removeFrom(input)); // *.o,}12
    }

    /** ascii: matches ASCII characters */
    public void testAscii() {
        String input = "あH*el.lo,}12";
        CharMatcher matcher = CharMatcher.ascii();
        System.out.println(matcher.retainFrom(input)); // H*el.lo,}12
        System.out.println(matcher.removeFrom(input)); // あ
    }

    /** negate: returns the inverse of this CharMatcher */
    public void testNegate() {
        String input = "あH*el.lo,}12";
        CharMatcher matcher = CharMatcher.ascii().negate();
        System.out.println(matcher.retainFrom(input)); // あ
        System.out.println(matcher.removeFrom(input)); // H*el.lo,}12
    }

    /** none matches nothing (opposite of any()); noneOf matches chars not in the sequence (opposite of anyOf()) */
    public void testNone_noneOf() {
        String input = "H*el.lo,}12";
        CharMatcher matcher = CharMatcher.none();
        System.out.println(matcher.retainFrom(input));          // "" (empty string)
        System.out.println(matcher.retainFrom(input).length()); // 0
        matcher = CharMatcher.noneOf("Hel");
        System.out.println(matcher.retainFrom(input)); // *.o,}12
        System.out.println(matcher.removeFrom(input)); // Hell
    }

    /** forPredicate: builds a matcher from a Predicate */
    public void testForPredicate() {
        // Equivalent anonymous-class form:
        // CharMatcher charMatcher = CharMatcher.forPredicate(new Predicate<Character>() {
        //     @Override
        //     public boolean apply(@Nullable Character input) {
        //         return Character.isLetterOrDigit(input);
        //     }
        // });

        // Lambda form
        CharMatcher charMatcher = CharMatcher.forPredicate(input -> Character.isLetterOrDigit(input));
        String input = "H*el.lo,}12";
        System.out.println(charMatcher.retainFrom(input)); // Hello12
    }
}
```
5. Summary
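CharMatcher offers a great many methods. Some are not easy to understand and rarely come up in everyday development. CharMatcher also has no regular-expression-based matching, so in day-to-day work it is best used together with other utility libraries such as Apache Commons.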
Do not reproduce without permission: 程序喵 » Google Guava Quick Start: [String Processing] CharMatcher Character Matcher