Friday, December 11, 2009

Reading UTF-8 streams

If you have a byte array with Unicode characters encoded with UTF-8 you can create a String from it with the following constructors:

String(byte[] bytes, int off, int len, String enc) 
String(byte[] bytes, String enc)
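
As a minimal sketch of these two constructors in use: the class name Utf8BytesDemo and the helper methods decodeAll and decodeRange are made up for illustration, and the sample bytes are a hand-written UTF-8 encoding of "héllo". The checked UnsupportedEncodingException is wrapped only to keep the helpers easy to call.

```java
import java.io.UnsupportedEncodingException;

public class Utf8BytesDemo {
    // Hypothetical helper: decodes the whole array as UTF-8.
    static String decodeAll(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported: " + e.getMessage());
        }
    }

    // Hypothetical helper: decodes only len bytes starting at offset off.
    static String decodeRange(byte[] bytes, int off, int len) {
        try {
            return new String(bytes, off, len, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // 'h', then U+00E9 encoded as the two bytes 0xC3 0xA9, then "llo"
        byte[] data = { 0x68, (byte) 0xC3, (byte) 0xA9, 0x6C, 0x6C, 0x6F };
        System.out.println(decodeAll(data).length());      // 5 chars from 6 bytes
        System.out.println(decodeRange(data, 1, 2).length()); // 1 char from 2 bytes
    }
}
```

Note that off and len count bytes, not characters, so a range must not cut a multi-byte sequence in half.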

This is useful when reading from local files. But what can you use if you are reading a UTF-8 stream?

You can use the InputStreamReader class with the following constructor:

InputStreamReader(InputStream is, String enc)

After you have the InputStreamReader instance you can create a char array buffer and use the following method:

public int read(char[] cbuf, int off, int len)

For example:

InputStreamReader in = new InputStreamReader(
        inputConnection.openInputStream(), "UTF-8");
char[] buff = new char[1024];
// read() returns the number of chars read, or -1 at end of stream
int len = in.read(buff, 0, buff.length);

while (len > 0) {
    // use buff characters, like
    // String s = new String(buff, 0, len)

    len = in.read(buff, 0, buff.length);
}

But I have used only and did not have a problem with it.

This also applies to the Java Standard and Enterprise editions.
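
For Java SE, the same read loop could look roughly like the sketch below. It is only an illustration: the class name Utf8StreamDemo and the methods readAll and decode are invented here, a ByteArrayInputStream stands in for a real network or file stream, and the 4-char buffer is deliberately tiny so that the loop runs more than once.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class Utf8StreamDemo {
    // Hypothetical helper: reads the whole stream as UTF-8 text.
    static String readAll(InputStream stream) throws IOException {
        Reader in = new InputStreamReader(stream, "UTF-8");
        StringBuilder sb = new StringBuilder();
        char[] buff = new char[4]; // tiny on purpose; use e.g. 1024 in practice
        try {
            int len;
            // read() returns -1 once the end of the stream is reached
            while ((len = in.read(buff, 0, buff.length)) != -1) {
                sb.append(buff, 0, len);
            }
        } finally {
            in.close(); // also closes the underlying stream
        }
        return sb.toString();
    }

    // Hypothetical convenience wrapper around readAll for in-memory bytes.
    static String decode(byte[] utf8Bytes) {
        try {
            return readAll(new ByteArrayInputStream(utf8Bytes));
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected I/O error: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes for "Gr\u00fc\u00dfe": the two non-ASCII letters take two bytes each
        byte[] data = { 0x47, 0x72, (byte) 0xC3, (byte) 0xBC,
                        (byte) 0xC3, (byte) 0x9F, 0x65 };
        System.out.println(decode(data).length()); // 5 chars from 7 bytes
    }
}
```

The loop appends only the first len chars of the buffer on each pass, because a read may fill less than the whole buffer.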

