2

I have a problem that I can't seem to find an answer here for, so I'm asking it.

I have a string and a set of delimiters. I want to build an array of the substrings between those delimiters (words, numbers, etc.). However, when two delimiters are next to one another, the split method returns an empty string between them.

I tested this with longer runs of consecutive delimiters and found that n consecutive delimiters produce n-1 empty strings in the result array. For example, with both "," and " " as delimiters and the sentence "This is a very nice day, isn't it", the result array looks like:

{... , "day", "", "isn't" ...}

I want to get those extra empty strings out and I can't figure out how to do that. A sample regex for the delimiters that I have is:

"[\\s,.-\\'\\[\\]\\(\\)]"

Also can you explain why there are extra empty strings in the result array?

P.S. I read some of the similar posts, which mentioned the second parameter of split (the limit). I tried negative, zero, and positive values, and none of them gave the result I'm looking for. (One of the questions had an answer saying that -1 as the limit might solve the problem, but it didn't.)

2
  • So what's your code then? Commented May 4, 2015 at 20:34
  • This isn't tied to specific code. I think the problem is with the parameters, or with the way String uses Pattern and Matcher to create the array of results. Commented May 4, 2015 at 20:36

4 Answers

1

You can use this regex for splitting:

[\\s,.'\\[\\]()-]+
  • Keep an unescaped hyphen at the first or last position in the character class; otherwise it is treated as a range operator, as in A-Z or 0-9.
  • You must use the + quantifier to match one or more consecutive delimiters as a single separator.
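
To make the effect of the quantifier concrete, here is a minimal sketch that splits the sentence from the question with and without the trailing +. The class and method names are made up for this illustration. Note that because ' is in the delimiter set, "isn't" is split into "isn" and "t".

```java
import java.util.Arrays;

public class SplitQuantifierDemo {

    // Each single delimiter character is its own separator, so the
    // comma and the following space produce an empty string between them.
    static String[] splitSingle(String s) {
        return s.split("[\\s,.'\\[\\]()-]");
    }

    // A run of one or more delimiter characters counts as one separator.
    static String[] splitRuns(String s) {
        return s.split("[\\s,.'\\[\\]()-]+");
    }

    public static void main(String[] args) {
        String s = "This is a very nice day, isn't it";
        // prints [This, is, a, very, nice, day, , isn, t, it]
        System.out.println(Arrays.toString(splitSingle(s)));
        // prints [This, is, a, very, nice, day, isn, t, it]
        System.out.println(Arrays.toString(splitRuns(s)));
    }
}
```

This also answers the "why": without the quantifier, the regex matches the comma and the space separately, and the zero-length text between those two matches is returned as an element.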

4 Comments

Why does the hyphen have to be first or last?
Added more details about that too.
Yes, but since it's the delimiter for a range, wouldn't it be better to escape the hyphen?
Yes, you can also use [\\s,.\\-'\\[\\]()]+, but [\\s,.'\\[\\]()-]+ is much cleaner.
1

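I think your problem is just the regex itself. You should use a greedy quantifier (the hyphen is also escaped here, because an unescaped - between . and ' is read as a range operator):

"[\\s,.\\-'\\[\\]\\(\\)]+"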

See http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#sum

X+ ... X, one or more times


1

Your regular expression describes just one single character. If you want it to match multiple separators at once, use a quantifier:

String s = "This is a very nice day, isn't it";
String[] tokens = s.split("[\\s,.\\-\\[\\]()']+");

(Note the '+' at the end of the expression)
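
One caveat worth sketching: the quantifier collapses runs of delimiters, but a delimiter at the very start of the input still yields an empty first element, and the limit argument only controls trailing empty strings. The class name, helper method, and sample input below are made up for this illustration.

```java
import java.util.Arrays;

public class SplitEdgeCases {

    // Same delimiter set as in the answer above, with the + quantifier.
    static final String DELIMS = "[\\s,.\\-\\[\\]()']+";

    // Thin wrapper so both limit values can be compared side by side.
    static String[] split(String s, int limit) {
        return s.split(DELIMS, limit);
    }

    public static void main(String[] args) {
        String s = ", hello, world, ";
        // limit 0 (the default): trailing empty strings are dropped,
        // but the leading one is kept. Prints [, hello, world]
        System.out.println(Arrays.toString(split(s, 0)));
        // limit -1: trailing empty strings are kept too. Prints [, hello, world, ]
        System.out.println(Arrays.toString(split(s, -1)));
    }
}
```

So if the input may begin with a delimiter, the first element still has to be checked or filtered separately.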

1 Comment

I had no idea that I could use the + as in regular regexp's, thanks
0

If you want to get rid of empty strings, you can use the Guava project Splitter class.

on method:

Returns a splitter that uses the given fixed string as a separator.

Example (ignoring empty strings):

System.out.println(
    Splitter.on(',')
        .trimResults()
        .omitEmptyStrings()
        .split("foo,bar,,   qux")
);

Output:

[foo, bar, qux]

onPattern method:

Returns a splitter that considers any subsequence matching a given pattern (regular expression) to be a separator.

Example (ignoring empty strings):

System.out.println(
    Splitter.onPattern("([,.|])")
        .trimResults()
        .omitEmptyStrings()
        .split("foo|bar,,  qux.hi")
);

Output:

[foo, bar, qux, hi]

For more details, consult Splitter documentation.
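
If adding Guava is not an option, roughly the same behaviour can be approximated with the JDK alone (Java 8 or later). This is only a sketch of the idea, not Guava's implementation; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class JdkSplitter {

    // Same separator pattern as the onPattern example above.
    private static final Pattern SEPARATOR = Pattern.compile("[,.|]");

    // Split on the pattern, trim each piece, and drop the empty ones.
    static List<String> splitOmitEmpty(String input) {
        return SEPARATOR.splitAsStream(input)
                .map(String::trim)
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [foo, bar, qux, hi]
        System.out.println(splitOmitEmpty("foo|bar,,  qux.hi"));
    }
}
```

Filtering after the split also removes a leading empty string, which the + quantifier alone does not.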

1 Comment

@Mackiavelli Have you tried to use Splitter class? Here is the documentation.
