How to Parse a String in Java
This blog will explain how to parse a string in Java. Let’s start!
Parse a String in Java
In Java, there exist three main approaches to parse a string:
- Parse string by using Java split() method
- Parse string by using Java Scanner class
- Parse string by using StringUtils class
We will now discuss each of the above-mentioned approaches in detail.
Method 1: Parse String by Using Java split() Method
In Java, there is a split() method of the String class that splits the given string and returns the array of substrings. It keeps the given string unchanged. Also, the split() method is case-sensitive.
The split() method has two variations.
- split(regular-expression/delimiter)
- split(regular-expression/delimiter, limit)
Have a look at the given examples to know more about the usage of the split() method.
Example 1: Parsing String in Java Using split(regular-expression/delimiter) variant
In this example, we will use the split(regular-expression/delimiter) variant of the split() method. A regular expression is passed as the argument. If the expression matches somewhere in the string, the string is divided around each match; otherwise, the returned array contains a single element, the whole string.
Here, we have a string named stg:
With the split() method, you can use different delimiters, such as a letter that occurs in the string or a special character. The string below will be split on white space:
Lastly, to print the parsed string, use a for loop:
As you can see, the split() method has successfully parsed the given string based on the occurrence of the white spaces:
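The listing for this example is not reproduced here. A minimal sketch of the same idea follows, assuming a hypothetical sample value for stg and the regex "\\s+" (one or more whitespace characters) as the delimiter; the class and method names are illustrative only.

```java
import java.util.Arrays;

class SplitOnWhitespace {
    // Splits the input around runs of whitespace. The input string is not modified.
    static String[] parse(String input) {
        return input.split("\\s+");
    }

    public static void main(String[] args) {
        String stg = "Parsing a string in Java"; // hypothetical sample value
        for (String token : parse(stg)) {
            System.out.println(token);
        }
        // When the regex matches nothing, the array holds the whole string.
        System.out.println(Arrays.toString("NoSpacesHere".split("\\s+")));
    }
}
```

Each word is printed on its own line, and the last statement shows that a string with no match comes back as a one-element array.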
Example 2: Parsing String in Java Using split(regular-expression/delimiter, limit) variant
This variant works almost the same as the one above. Here, we pass a limit along with the delimiter; the limit caps the number of substrings returned, and the last substring holds the rest of the string.
For instance, we have a string named stg:
We will split the whole stg string with white space as the delimiter and 3 as the limit, then use a for loop to print the result:
The specified string will split as the space occurs, and it will return three strings according to the added limit:
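As a sketch of the limit variant, assuming a hypothetical value for stg (the original listing is not shown), the following splits on a single space with a limit of 3. The class and method names are made up for illustration.

```java
class SplitWithLimit {
    // Splits on a single space, returning at most `limit` substrings when limit > 0.
    static String[] parse(String input, int limit) {
        return input.split(" ", limit);
    }

    public static void main(String[] args) {
        String stg = "one two three four five"; // hypothetical sample value
        for (String part : parse(stg, 3)) {
            System.out.println(part);
        }
        // prints:
        // one
        // two
        // three four five
    }
}
```

The third element keeps the unsplit remainder, which is why it still contains spaces.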
Method 2: Parse String by Using Java Scanner Class
The Java Scanner class breaks its input into tokens using a delimiter pattern, which is a regular expression. The pattern is set with the useDelimiter() method; by default it matches white space.
First, we have a string stng that needs to be parsed:
Create an object of the Scanner class and pass the string stng as a parameter:
The delimiter pattern is set using the useDelimiter() method of the Scanner class. Here, we will pass the colon “:” as a delimiter pattern in the useDelimiter() method:
This method splits the string when it finds a colon. To obtain all of the tokens in the string, use the hasNext() method in a while loop and print the result:
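The steps above can be sketched as follows. The value of stng is a hypothetical stand-in for the original string, and the helper collects the tokens into a list so they can be inspected as well as printed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokens {
    // Reads every token from the input using the given delimiter pattern (a regex).
    static List<String> parse(String input, String delimiter) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String stng = "red:green:blue"; // hypothetical sample value
        for (String token : parse(stng, ":")) {
            System.out.println(token);
        }
    }
}
```

Because useDelimiter(String) treats its argument as a regular expression, a delimiter such as "." or "|" would need escaping.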
Method 3: Parse String by Using StringUtils Class
To parse a string using the StringUtils class, which is part of the Apache Commons Lang library, we first create a Maven project rather than a plain Java project and then add the library as a dependency.
Here, we specify the Maven dependency for the StringUtils library in the project's pom.xml file:
Then, create a Java file, and use the StringUtils class to parse the string stng:
We will use the substringsBetween() method of the StringUtils class, passing the stng string with “:” and “!” as the opening and closing delimiters. The result contains every substring found between these two delimiters:
To print the parsed strings, use a for loop:
The output will display the substring between “:” and “!” delimiters:
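Neither the dependency nor the original listing is reproduced here. To illustrate what extracting text between two delimiters means without pulling in the library, the sketch below approximates the behavior with java.util.regex from the JDK. It is not the Commons Lang implementation: the helper name is borrowed for readability, the sample string is hypothetical, and it returns an empty list when nothing matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenDelimiters {
    // JDK-only approximation: collects every substring found between `open` and `close`.
    static List<String> substringsBetween(String input, String open, String close) {
        Pattern pattern = Pattern.compile(
                Pattern.quote(open) + "(.*?)" + Pattern.quote(close), Pattern.DOTALL);
        Matcher matcher = pattern.matcher(input);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String stng = "Hello :World! welcome to :Java! parsing"; // hypothetical sample value
        for (String part : substringsBetween(stng, ":", "!")) {
            System.out.println(part);
        }
        // prints:
        // World
        // Java
    }
}
```

Pattern.quote() keeps the delimiters literal, so characters that are special in a regex can be used as delimiters too.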
We have provided the information related to parsing a string in Java.
To parse a string in Java, you can use the String split() method, the Scanner class, or the StringUtils class. Each of these uses delimiters to split the string according to a specified condition. The split() method is the most widely used because it accepts a delimiter or regex and an optional limit. This blog explained the methods to parse a string in Java with examples.
Parse a String in Java
This tutorial explains how to parse a string in Java using various methods. Parsing is the process of taking a string and processing it to extract information.
Use the split Method to Parse a String in Java
The split() method of the String class splits the source string, leaving the original unmodified, and returns an array of substrings. This method has two variants.
The split(String regex) method takes a regular expression of type string as an argument and splits the string around the regular expression’s matches. If the regular expression fails to match any part of the original string, it returns an array with one element: the source string.
The split(String regex, int limit) method works the same way but also takes a limit, which caps how many strings are returned. If the limit is negative, the returned array can contain as many substrings as possible. When the limit is 0, the array contains all substrings, excluding trailing empty strings.
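The effect of the limit is easiest to see on input that ends with delimiters. The sketch below uses a made-up input, "a,b,,,", and compares a limit of 0, a negative limit, and a positive limit; the class name is illustrative.

```java
import java.util.Arrays;

class SplitLimitSemantics {
    // Thin wrapper so the three limit cases can be compared side by side.
    static String[] split(String input, String regex, int limit) {
        return input.split(regex, limit);
    }

    public static void main(String[] args) {
        String input = "a,b,,,"; // hypothetical input with trailing delimiters
        // limit = 0: trailing empty strings are dropped
        System.out.println(Arrays.toString(split(input, ",", 0)));  // [a, b]
        // limit < 0: every substring is kept, including trailing empty ones
        System.out.println(Arrays.toString(split(input, ",", -1))); // [a, b, , , ]
        // limit > 0: at most `limit` elements, the last one holds the remainder
        System.out.println(Arrays.toString(split(input, ",", 2)));  // [a, b,,,]
    }
}
```

A negative limit is the usual choice when empty trailing fields carry meaning, for example in delimited records.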
Use Scanner to Parse a String in Java
Scanner is generally used to parse primitive types and strings using a regular expression. It breaks the input into tokens using a delimiter pattern, which by default matches whitespace.
We create a scanner with a specified string object. The useDelimiter() method of the Scanner class is used to set the delimiter pattern. We can either pass a Pattern object or string as a pattern. To get all the tokens of the string, we loop through the tokens using the hasNext() method and print the output.
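Since Scanner can also read primitive types, a short sketch of that use may help. It assumes a hypothetical comma-separated input, passes the delimiter as a Pattern object, and sums the integers it reads; the class and method names are invented for this example.

```java
import java.util.Locale;
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerInts {
    // Sums the leading run of integer tokens; stops at the first token that is not an int.
    static int sum(String input) {
        int total = 0;
        try (Scanner scanner = new Scanner(input)) {
            scanner.useLocale(Locale.ROOT); // keep number parsing independent of the default locale
            scanner.useDelimiter(Pattern.compile("\\s*,\\s*"));
            while (scanner.hasNextInt()) {
                total += scanner.nextInt();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("10, 20, 30")); // 60
    }
}
```

The delimiter pattern absorbs the spaces around each comma, so the tokens arrive already trimmed.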
Use StringUtils to Parse a String in Java
Apache Commons StringUtils class provides tools that facilitate easy working with Strings. The maven dependency to add this library is given below.
We use the substringBetween(String str, String open, String close) method of the StringUtils class to parse a given string. This method extracts a substring nested between two strings.
Rupam Saini is an Android developer who also sometimes works as a web developer. He likes to read books and write about various things.
Parsing Strings in Java
If a string that is not an integer value is passed to the parseInt() method, a java.lang.NumberFormatException is thrown, indicating that the string is not an integer value.
A NumberFormatException is also thrown if the passed string contains a space.
parseInt() can handle negative numbers. For this, the string must begin with the “-” character.
parseInt() cannot parse a string whose numeric value falls outside the range of the int type (-2147483648 .. 2147483647).
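These rules can be checked with a small sketch. The tryParse helper below is a hypothetical wrapper, not part of the JDK; it turns the NumberFormatException into an empty OptionalInt so that each case can be printed without a stack trace.

```java
import java.util.OptionalInt;

class ParseIntDemo {
    // Returns the parsed value, or an empty OptionalInt when parseInt() rejects the text.
    static OptionalInt tryParse(String text) {
        try {
            return OptionalInt.of(Integer.parseInt(text));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("42"));         // OptionalInt[42]
        System.out.println(tryParse("-17"));        // OptionalInt[-17]
        System.out.println(tryParse("abc"));        // OptionalInt.empty (not a number)
        System.out.println(tryParse("4 2"));        // OptionalInt.empty (contains a space)
        System.out.println(tryParse("2147483648")); // OptionalInt.empty (exceeds int range)
    }
}
```

If surrounding whitespace is expected in the input, calling trim() on the string before parsing avoids the exception for that case.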
What are the different methods to parse strings in Java?
For parsing player commands, I’ve most often used the split method to split a string by delimiters and then just figure out the rest with a series of ifs or switches. What are some different ways of parsing strings in Java?
I really like regular expressions. As long as the command strings are fairly simple, you can write a few regexes that do what could otherwise take a few pages of manual parsing code.
I would suggest you check out http://www.regular-expressions.info for a good intro to regexes, as well as specific examples for Java.
I assume you’re trying to make the command interface as forgiving as possible. If this is the case, I suggest you use an algorithm similar to this:
- Read in the string
- Split the string into tokens
- Use a dictionary to convert synonyms to a common form
- For example, convert «hit», «punch», «strike», and «kick» all to «hit»
- Perform actions on an unordered, inclusive base
- Unordered — «punch the monkey in the face» is the same thing as «the face in the monkey punch»
- Inclusive: If the command is supposed to be «punch the monkey in the face» and they supply «punch monkey», you should check how many commands this matches. If only one command matches, do this action. It might even be a good idea to have command priorities, so that even if there were several matches, it would perform the top action.
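The steps above can be sketched roughly as follows. Everything named here is an assumption made for illustration: the synonym table, the list of filler words, and the rule that a known command matches when it contains every meaningful word the player typed. Command priorities are left out. The sketch uses Map.of and Set.of, so it assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class CommandMatcher {
    // Hypothetical synonym table: each key is rewritten to its common form.
    private static final Map<String, String> SYNONYMS =
            Map.of("punch", "hit", "strike", "hit", "kick", "hit");
    // Hypothetical filler words that carry no meaning for matching.
    private static final Set<String> FILLER = Set.of("the", "in", "a");

    // Tokenizes on whitespace, drops filler words, maps synonyms; word order is discarded.
    static Set<String> normalize(String input) {
        Set<String> words = new HashSet<>();
        for (String token : input.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty() || FILLER.contains(token)) {
                continue;
            }
            words.add(SYNONYMS.getOrDefault(token, token));
        }
        return words;
    }

    // Returns every known command that contains all of the words the player supplied.
    static List<String> match(String input, List<String> knownCommands) {
        Set<String> wanted = normalize(input);
        List<String> matches = new ArrayList<>();
        for (String command : knownCommands) {
            if (!wanted.isEmpty() && normalize(command).containsAll(wanted)) {
                matches.add(command);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> known = List.of("punch the monkey in the face", "open the door");
        System.out.println(match("kick monkey", known));
        System.out.println(match("the face in the monkey punch", known));
    }
}
```

Both calls print the single command about the monkey; a caller would run the action only when the returned list has exactly one entry.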
Parsing manually is a lot of fun... at the beginning :)
In practice if commands aren’t very sophisticated you can treat them the same way as those used in command line interpreters. There’s a list of libraries that you can use: http://java-source.net/open-source/command-line. I think you can start with apache commons CLI or args4j (uses annotations). They are well documented and really simple in use. They handle parsing automatically and the only thing you need to do is to read particular fields in an object.
If you have more sophisticated commands, then maybe creating a formal grammar would be a better idea. There is a very good library with graphical editor, debugger and interpreter for grammars. It’s called ANTLR (and the editor ANTLRWorks) and it’s free:) There are also some example grammars and tutorials.
I would look at Java migrations of Zork, and lean towards a simple Natural Language Processor (driven either by tokenizing or regex) such as the following (from this link):
Anything which gives a programmer a reason to look at Zork again is good in my book, just watch out for Grues.
Sun itself recommends staying away from StringTokenizer and using the String.split method instead.
You’ll also want to look at the Pattern class.
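As a rough illustration of the Pattern class applied to command parsing, the sketch below assumes a hypothetical command shape: one verb followed by an optional target. The regex, class name, and return convention (an empty array when the line does not match) are choices made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCommand {
    // Hypothetical grammar: a verb, then an optional target, e.g. "take brass lamp".
    private static final Pattern COMMAND =
            Pattern.compile("^\\s*(\\w+)(?:\\s+(\\S.*?))?\\s*$");

    // Returns {verb, target}; target is "" when absent. Returns an empty array on no match.
    static String[] parse(String line) {
        Matcher matcher = COMMAND.matcher(line);
        if (!matcher.matches()) {
            return new String[0];
        }
        String target = matcher.group(2) == null ? "" : matcher.group(2);
        return new String[] {matcher.group(1), target};
    }

    public static void main(String[] args) {
        String[] parsed = parse("take brass lamp");
        System.out.println(parsed[0] + " -> " + parsed[1]); // take -> brass lamp
    }
}
```

Compiling the Pattern once and reusing it avoids recompiling the regex for every command.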
Another vote for ANTLR/ANTLRWorks. If you create two versions of the file, one with the Java code for actually executing the commands, and one without (with just the grammar), then you have an executable specification of the language, which is great for testing, a boon for documentation, and a big timesaver if you ever decide to port it.
If this is to parse command lines I would suggest using Commons Cli.
The Apache Commons CLI library provides an API for processing command line interfaces.
Try JavaCC a parser generator for Java.
It has a lot of features for interpreting languages, and it’s well supported on Eclipse.
@CodingTheWheel Here's your code, cleaned up a bit, run through Eclipse's formatter (Ctrl+Shift+F), and inserted back here 🙂
Including the four spaces in front of each line.
A simple string tokenizer on spaces should work, but there are really many ways you could do this.
Here is an example using a tokenizer:
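The answer's own listing is not reproduced here. A minimal sketch of tokenizing a command on spaces with java.util.StringTokenizer, using a hypothetical command string, might look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Collects the whitespace-separated tokens of a command line.
    static List<String> tokens(String command) {
        StringTokenizer tokenizer = new StringTokenizer(command); // default delimiters are whitespace
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> parts = tokens("go  north quickly"); // hypothetical command
        String action = parts.get(0);                      // first token is treated as the action
        List<String> arguments = parts.subList(1, parts.size());
        System.out.println(action + " " + arguments);      // go [north, quickly]
    }
}
```

StringTokenizer skips runs of delimiters, so the double space in the sample does not produce an empty token.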
Then the tokens can be used as the arguments. This all assumes no spaces are used in the arguments, so you might want to roll your own simple parsing mechanism (like finding the first whitespace and using the text before it as the action, or using a regular expression if you don’t mind the speed hit). Just abstract it out so it can be used anywhere.
When the separator for the command is always the same String or char (like «;»), I recommend you use the StringTokenizer class:
But when the separator varies or is complex, I recommend using regular expressions, which can be used through the String class itself via the split method, available since 1.4. It uses the Pattern class from the java.util.regex package.
If the language is dead simple like just
then splitting by hand works well.
If it’s more complex, you should really look into a tool like ANTLR or JavaCC.
I’ve got a tutorial on ANTLR (v2) at http://javadude.com/articles/antlrtut which will give you an idea of how it works.
JCommander seems quite good, although I have yet to test it.
If your text contains delimiters, you can use the split method.
If the text contains irregular strings, meaning it mixes different formats, you should use regular expressions.
The split method splits a string into an array of substrings around matches of the given regular expression. It has two forms: split(String regex) and split(String regex, int limit). split(String regex) is implemented by calling split(String regex, int limit) with a limit of 0. So what do limit > 0 and limit < 0 mean?
According to the JDK documentation: when limit > 0, the array has at most limit elements. The pattern is applied at most limit - 1 times, and the last element holds the remainder of the string.
limit < 0 places no limit on the length of the array.
limit = 0 also places no limit on the length, but trailing empty strings are discarded. StringTokenizer is a legacy class retained for compatibility reasons, so prefer the split method of the String class.