public class TextFile extends Object
Modifier and Type | Method and Description |
---|---|
void | delete() |
boolean | exists() |
String | fastTail(int numChars): Uses the platform default encoding. |
String | fastTail(int numChars, Charset cs): Efficiently reads the last N characters (or fewer, if the whole file is shorter than that). |
String | head(int numChars): Reads the first N characters or until we hit EOF. |
Stream<String> | lines(): Reads all lines from the file as a Stream. |
String | read(): Reads the entire contents and returns it. |
String | readTrim() |
String | toString() |
void | write(String text): Overwrites the file with the given string. |
@NonNull public final File file
public TextFile(@NonNull File file)
public boolean exists()
public void delete() throws IOException
IOException
public String read() throws IOException
IOException
@NonNull public Stream<String> lines() throws IOException
Reads all lines from the file as a Stream. Bytes from the file are decoded into characters using the UTF-8 charset. If timely disposal of file system resources is required, the try-with-resources construct should be used to ensure that BaseStream.close() is invoked after the stream operations are completed. Returns the lines from the file as a Stream.
IOException - if an I/O error occurs opening the file
public void write(String text) throws IOException
IOException
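The try-with-resources advice for lines() and the overwrite behavior of write(String) can be illustrated with plain java.nio calls. This is only a sketch that runs without this class on the classpath: LinesSketch, overwrite and readLines are hypothetical names, and writing in UTF-8 is an assumption made here so that the round trip matches the UTF-8 decoding described for lines().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in; not the TextFile implementation.
class LinesSketch {
    // Replaces the whole file content. Files.write truncates an existing file by default.
    // Assumption: UTF-8 is used for writing.
    static void overwrite(Path path, String text) throws IOException {
        Files.write(path, text.getBytes(StandardCharsets.UTF_8));
    }

    // Streams the lines and closes the underlying file handle as soon as
    // the terminal operation has finished.
    static List<String> readLines(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.collect(Collectors.toList());
        }
    }
}
```

Collecting inside the try block matters: the stream is lazy, so any terminal operation has to run before close() is invoked.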
@NonNull public String head(int numChars) throws IOException
IOException
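Reading "the first N characters or until we hit EOF" amounts to a bounded read loop. The following is a minimal sketch, not this method's actual code: HeadSketch is a hypothetical name, and UTF-8 is an assumed charset because the entry above does not state which encoding head(int) uses.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in; not the TextFile implementation.
class HeadSketch {
    static String head(Path path, int numChars) throws IOException {
        char[] buf = new char[numChars];
        int read = 0;
        // Assumption: the file is UTF-8. This reader throws on malformed input.
        try (Reader r = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            // A single read() call may return fewer chars than requested,
            // so keep reading until the buffer is full or EOF (-1) is reached.
            while (read < numChars) {
                int n = r.read(buf, read, numChars - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
        }
        return new String(buf, 0, read);
    }
}
```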
@NonNull public String fastTail(int numChars, Charset cs) throws IOException
This method first tries to read just the tail section of the file to get the necessary chars. To handle variable-length multi-byte encodings (such as UTF-8), we read a larger chunk than strictly necessary.
Some multi-byte encodings, such as Shift-JIS (http://en.wikipedia.org/wiki/Shift_JIS), don't allow the first byte and the second byte of a single char to be unambiguously identified, so we may decode incorrectly if we start reading in the middle of a multi-byte character. All the CJK multi-byte encodings that I know of are self-correcting: because they are ASCII-compatible, any ASCII character or control character brings the decoding back in sync, so in the worst case we just have some garbage at the beginning that needs to be discarded. To accommodate this, we read an additional 1024 bytes.
Other encodings, such as UTF-8, are better in that the character boundary is unambiguous, so there can be at most one garbage char. To deal with UTF-16 and UTF-32, we read at a 4-byte boundary (all the constants and multipliers are multiples of 4).
Note that it is possible to construct a contrived input that fools this algorithm, and in this method we are willing to live with a small possibility of that to avoid reading the whole text. In practice, such an input is very unlikely.
So all in all, this algorithm should work decently, and it works quite efficiently on a large text.
IOException
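The tail-reading strategy described above can be sketched with RandomAccessFile. This is an illustration under stated assumptions rather than the method's actual code: TailSketch is a hypothetical name, and the budget of 4 bytes per character is an assumption inferred from the remark that all constants and multipliers are multiples of 4. The 1024-byte margin comes from the description.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.file.Path;

// Hypothetical stand-in; not the TextFile implementation.
class TailSketch {
    static String tail(Path path, int numChars, Charset cs) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            long len = raf.length();
            // Assumed budget: 4 bytes per requested char, plus a 1024-byte margin
            // so the decoder can get back in sync if we start mid-character.
            // Both constants are multiples of 4, so for fixed-width UTF-16 or
            // UTF-32 files the start offset stays on a code-unit boundary.
            long pos = Math.max(0L, len - (numChars * 4L + 1024L));
            raf.seek(pos);

            byte[] buf = new byte[(int) (len - pos)];
            raf.readFully(buf);

            // Charset.decode replaces malformed input instead of throwing, so any
            // garbage at the start of the chunk decodes to replacement characters.
            String decoded = cs.decode(ByteBuffer.wrap(buf)).toString();

            // Keep only the last numChars chars; leading garbage is discarded here.
            return decoded.substring(Math.max(0, decoded.length() - numChars));
        }
    }
}
```

Only the final chunk is read and decoded, so the cost depends on numChars rather than on the file size.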
@NonNull public String fastTail(int numChars) throws IOException
IOException
public String readTrim() throws IOException
IOException
Copyright © 2004–2022. All rights reserved.