Class JxnRealArrayTextFileDataSource
- java.lang.Object
  - JxnRealArrayTextFileDataSource
public class JxnRealArrayTextFileDataSource extends Object
Parses column-wise organized text files (including CSV exports from Excel) to read data into JxnRealArrayAlgebra instances.
The text file columns can be defined in two different ways:
- fixed width columns (with given number of characters) or
- character (delimiter) separated columns, identified by column number or column label
Column numbers start with 1; column labels (if available) are retrieved from the headlines of the text file.
The column values can be decimal or date time values (timestamps).
Usage in JXN:
    ! construct a data source instance:
    src = @JxnRealArrayTextFileDataSource( filename )
    ! or
    @JxnRealArrayTextFileDataSource( filename, skipHeadLines, delim1, delim2 )
    ! or
    @JxnRealArrayTextFileDataSource( filename ).setDelimiters( delim1, delim2 ).setDateTimeFormat( fmt )
    ! call get methods on src to retrieve JxnRealArrayAlgebra instances:
    xyz = src.get( "xyz" )          ! read decimal values from column labeled "xyz"
    t = src.get(-2)                 ! < 0 => read date time values from 2nd column
    ta = src.get( 21, 10, true )    ! read date time values from defined fixed width column
    yt = src.get( 11, -21, fmt )    ! read values with given format (decimal or date time depending on fmt)
see get methods for details
Procedure to read unknown files (adapt to unknown data formats)
- First try to use default settings (standard parameters). Indicate columns containing date time values (timestamps) by using the appropriate get method (parameters iCol < 0 or isDateTime == true or fmt instanceof java.text.DateFormat):
    src = @JxnRealArrayTextFileDataSource "filename"
    t = src[-1]
    !   ^ shortcut for src.get( 1, true ): read 1st column as date time value
    y = src[2]
    !   ^ shortcut for src.get(2): read 2nd column as decimal value
    format src.readLines(n)
    !   ^ inspect the first n lines to verify proper reading or to find suitable parameters
- Use setSkipHeadLines(n) to explicitly ignore n headlines if they contain numbers that would otherwise be parsed as numerical data values.
- Adapt the delimiters with setDelimiters( delim1, delim2 ) or (if appropriate) use fixed width columns, e.g. get( iStart, iWidthOrEnd ).
- Explicitly specify the date time format using setDateTimeFormat(pattern) or e.g. get( iCol, @SimpleDateFormat(pattern) ).
- See Also:
- JXN Tutorial
-
-
Constructor Summary
Constructors
Constructor
    Description
JxnRealArrayTextFileDataSource(String filename)
    Constructs a text file data source object for the given filename.
JxnRealArrayTextFileDataSource(String filename, int skipHeadLines, String delim1, String delim2)
    Constructs a text file data source object for the given filename.
-
Method Summary
Modifier and Type    Method
    Description
int findLabel(String colLabel)
    Returns the column number for colLabel.
JxnRealArrayAlgebra get(int iCol)
    Retrieves JxnRealArrayAlgebra instance from the given character delimited column.
JxnRealArrayAlgebra get(int iCol, boolean isDateTime)
    Retrieves JxnRealArrayAlgebra instance from the given character delimited column.
JxnRealArrayAlgebra get(int iStart, int iWidthOrEnd)
    Retrieves JxnRealArrayAlgebra instance from fixed width columns.
JxnRealArrayAlgebra get(int iStart, int iWidthOrEnd, boolean isDateTime)
    Retrieves JxnRealArrayAlgebra instance from fixed width columns, see get(int, int).
JxnRealArrayAlgebra get(int iStart, int iWidthOrEnd, Format fmt)
    Retrieves JxnRealArrayAlgebra instance from fixed width columns, see get(int, int).
JxnRealArrayAlgebra get(int iCol, Format fmt)
    Retrieves JxnRealArrayAlgebra instance from the given character delimited column.
JxnRealArrayAlgebra get(String colLabel)
    Retrieves JxnRealArrayAlgebra instance from column identified by colLabel.
JxnRealArrayAlgebra get(String colLabel, boolean isDateTime)
    Retrieves JxnRealArrayAlgebra instance from a column identified by colLabel.
JxnRealArrayAlgebra get(String colLabel, Format fmt)
    Retrieves JxnRealArrayAlgebra instance from column identified by colLabel.
String getHeader()
String getHeader(int i)
String[] getHeaders()
String getLabel(int iLine, int iCol)
    Retrieves the label of the iCol-th character delimited column.
String getLabel(int iLine, int iStart, int iWidthOrEnd)
    Retrieves the label from fixed width columns.
String[] readLines(int n)
    Reads the first n lines and returns them as a String array.
JxnRealArrayTextFileDataSource setDateTimeFormat(String defaultDateTimeFormatPattern)
JxnRealArrayTextFileDataSource setDateTimeFormat(DateFormat defaultDateTimeFormat)
    Sets the default format for parsing date time values.
JxnRealArrayTextFileDataSource setDecimalFormat(String defaultDecimalFormatPattern)
JxnRealArrayTextFileDataSource setDecimalFormat(DecimalFormat defaultDecimalFormat)
    Sets default format for parsing decimal values.
JxnRealArrayTextFileDataSource setDelimiters(String delim1, String delim2)
    Changes the delimiters to the given values ("" ignores the delimiter, null does not change the delimiter).
JxnRealArrayTextFileDataSource setLabelOffset(int offset)
    Sets corrective offset for labels not in sync with data columns.
JxnRealArrayTextFileDataSource setSkipHeadLines(int skipHeadLines)
    Sets the number of headlines to be skipped.
String toString()
    Returns a string representation of the object: filename, number of skipped headlines, delimiters.
-
-
-
Constructor Detail
-
JxnRealArrayTextFileDataSource
public JxnRealArrayTextFileDataSource(String filename)
Constructs a text file data source object for the given filename.
- Parameters:
filename - see also KmgFormelInterpreter.getPath(java.lang.String)
-
JxnRealArrayTextFileDataSource
public JxnRealArrayTextFileDataSource(String filename, int skipHeadLines, String delim1, String delim2)
Constructs a text file data source object for the given filename.
- Parameters:
skipHeadLines - headlines are skipped and used to retrieve neither values nor labels
delim1 - consecutive delimiters of this type count separately, default value: ";\t" ('\t' represents the tab character)
delim2 - delimiters of this type adjacent to another delimiter are ignored ⇒ consecutive delimiters count as one single delimiter, default value: " "
- See Also:
setSkipHeadLines(int), setDelimiters(java.lang.String, java.lang.String)
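The difference between the two delimiter classes can be illustrated with a small Java sketch. It is not the library's implementation: the class DelimiterSketch and its split method are hypothetical, and the handling of delimiters at the very start or end of a line is an assumption that the description above does not cover.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the documented delim1/delim2 rules; not the library's code. */
class DelimiterSketch {

    /** Splits a line: every delim1 character separates fields, runs of delim2 collapse. */
    static List<String> split(String line, String delim1, String delim2) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (delim1.indexOf(c) < 0 && delim2.indexOf(c) < 0) {
                current.append(c);          // ordinary character: part of the current field
                i++;
                continue;
            }
            // Consume the whole run of adjacent delimiters and count its delim1 characters.
            int delim1Count = 0;
            while (i < line.length()) {
                char d = line.charAt(i);
                if (delim1.indexOf(d) >= 0) {
                    delim1Count++;
                } else if (delim2.indexOf(d) < 0) {
                    break;                  // first non-delimiter ends the run
                }
                i++;
            }
            fields.add(current.toString());
            current.setLength(0);
            // Each delim1 beyond the first yields an empty field; a pure delim2 run counts once.
            for (int k = 1; k < delim1Count; k++) {
                fields.add("");
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("1;;3", ";\t", " "));     // two ';' in a row: empty field between them
        System.out.println(split("1   2  3", ";\t", " ")); // blanks collapse into a single separator
        System.out.println(split("1; 2", ";\t", " "));     // the blank next to ';' is ignored
    }
}
```

Under these assumptions an Excel CSV export with empty cells keeps its column positions, while blank-aligned files with varying numbers of spaces still yield one field per value.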
-
-
Method Detail
-
toString
public String toString()
Returns a string representation of the object: filename, number of skipped headlines, delimiters.
-
setSkipHeadLines
public JxnRealArrayTextFileDataSource setSkipHeadLines(int skipHeadLines)
Sets the number of headlines to be skipped.
Note: skipped headlines are not used to determine column labels.
- Returns:
- this
-
setDelimiters
public JxnRealArrayTextFileDataSource setDelimiters(String delim1, String delim2)
Changes the delimiters to the given values ("" ignores the delimiter, null does not change the delimiter).
- Parameters:
delim1 - consecutive delimiters of this type count separately
delim2 - delimiters of this type adjacent to another delimiter are ignored ⇒ consecutive delimiters count as one single delimiter
- Returns:
- this
- See Also:
JxnRealArrayTextFileDataSource(String, int, String, String)
-
setDateTimeFormat
public JxnRealArrayTextFileDataSource setDateTimeFormat(DateFormat defaultDateTimeFormat)
Sets the default format for parsing date time values. Applies the date and time patterns defined in java.text.SimpleDateFormat, e.g. setDateTimeFormat( @SimpleDateFormat( "MMM dd, yyyy HH:mm" ) ). If the date time format is not explicitly set or fails to match the data, KmgDateTimeConverter automatically tries to find a matching pattern.
Notes:
- The data file may contain explicit time zone information. The date time strings 2016-12-31T23:59:59Z and 20170101005959+0100, for example, represent the same point in time as UTC or GMT+01 respectively and result in the same internal representation in a JxnRealArrayAlgebra instance. Data formats like this allow precise handling of daylight saving time changes.
- If the data file does not contain explicit time zone information (which implies daylight saving time (DST) definitions), the date time information in the data file by default is considered to be the standard time of the local time zone. If that time zone observes daylight saving time, the standard time skips one hour in spring and has a duplicate hour in autumn. The resulting ambiguity can be resolved by appending the letter 'a' to a date time string of the first of the duplicated hours:
    2016-10-30 01:59:59
    2016-10-30 02:00:00a
     :
     :
    2016-10-30 02:59:59a    ! last second of daylight saving (summer) time
    2016-10-30 02:00:00     ! clock set back one hour at start of winter time
     :
     :
    2016-10-30 02:59:59
    2016-10-30 03:00:00
  However, real life data files often come without any distinction of the duplicate hours except their placement in the file.
- If the date time values in the data file are continuous UTC time values or values of another given time zone with or without daylight saving time, use:
    sdf = @SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" )    ! adjust the pattern to your given data file format
    sdf.setTimeZone( TimeZone.getTimeZone( "UTC" ) )    ! "UTC" or e.g. "GMT+0100" disregarding DST change
    setDateTimeFormat( sdf )
- As a timestamp may contain delimiters (e.g. blanks), in case of a date time value the get methods (except the fixed width columns methods) parse the complete line from the start of the timestamp to the end of the line without regarding further delimiters. The time format pattern (default or explicitly defined) determines the number of characters actually used.
- Returns:
- this
- See Also:
KmgDateTimeConverter, TimeZone
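The time zone notes above can be checked with plain java.text.SimpleDateFormat. The following Java sketch only uses the standard library; the class TimeZoneSketch and its helper parseMillis are hypothetical names, and the millisecond values are those of java.util.Date, which is not necessarily the internal representation used by JxnRealArrayAlgebra.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper illustrating the time zone notes with standard library classes only. */
class TimeZoneSketch {

    /** Parses text with the given pattern; zoneId may be null to keep the JVM default zone. */
    static long parseMillis(String pattern, String zoneId, String text) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        if (zoneId != null) {
            sdf.setTimeZone(TimeZone.getTimeZone(zoneId));
        }
        return sdf.parse(text).getTime();   // milliseconds since 1970-01-01T00:00:00Z
    }

    public static void main(String[] args) throws ParseException {
        // Explicit zone information in the text: both strings denote the same instant.
        long utc = parseMillis("yyyy-MM-dd'T'HH:mm:ssX", null, "2016-12-31T23:59:59Z");
        long plusOne = parseMillis("yyyyMMddHHmmssZ", null, "20170101005959+0100");
        System.out.println(utc == plusOne);

        // No zone information in the text: fixing the zone to UTC avoids any DST ambiguity.
        long fixed = parseMillis("yyyy-MM-dd HH:mm:ss", "UTC", "2016-10-30 02:00:00");
        System.out.println(fixed);
    }
}
```

Setting the zone on the format object, rather than relying on the default zone of the machine, makes the result independent of where the file is read.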
-
setDateTimeFormat
public JxnRealArrayTextFileDataSource setDateTimeFormat(String defaultDateTimeFormatPattern)
-
setDecimalFormat
public JxnRealArrayTextFileDataSource setDecimalFormat(DecimalFormat defaultDecimalFormat)
Sets default format for parsing decimal values. If the file contains decimal values using a decimal separator other than '.', try setDecimalFormat( @DecimalFormat() ) to apply the current java.util.Locale setting or use setDecimalFormat( @DecimalFormat( Pattern ) ) applying the decimal format patterns defined in java.text.DecimalFormat.
- Returns:
- this
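As an illustration of locale dependent decimal separators, the following Java sketch parses values written with a decimal comma using only java.text.DecimalFormat. The class DecimalFormatSketch and the choice of Locale.GERMANY are assumptions made for the example; they are not part of this API.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

/** Hypothetical example: parsing decimal values that use ',' as decimal separator. */
class DecimalFormatSketch {

    /** Parses text with the separators of the given locale. */
    static double parse(String text, Locale locale) throws ParseException {
        // In the pattern, '.' and ',' are placeholders; the symbols object supplies
        // the characters actually expected in the text.
        DecimalFormat df = new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(locale));
        return df.parse(text).doubleValue();
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parse("-1234,5", Locale.GERMANY));  // decimal comma
        System.out.println(parse("1.234,5", Locale.GERMANY));  // '.' as grouping separator
    }
}
```

Passing an explicit locale keeps the result reproducible, whereas a DecimalFormat created without arguments follows the default locale of the machine.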
-
setDecimalFormat
public JxnRealArrayTextFileDataSource setDecimalFormat(String defaultDecimalFormatPattern)
-
setLabelOffset
public JxnRealArrayTextFileDataSource setLabelOffset(int offset)
Sets corrective offset for labels not in sync with data columns.
- Parameters:
offset - = j - i if the i-th label refers to the j-th data column.
- Returns:
- this
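The offset arithmetic can be made concrete with a small Java sketch. It assumes a headline "Temp;Pressure" above data lines that carry an additional leading timestamp column, so the 1st label refers to the 2nd data column and offset = 2 - 1 = 1. The class LabelOffsetSketch, the method dataColumn and the return value 0 for an unknown label are hypothetical and not taken from the library.

```java
/** Hypothetical illustration of offset = j - i; not the library's label lookup. */
class LabelOffsetSketch {

    /** Returns the 1-based data column for a label, or 0 if the label is unknown (assumed convention). */
    static int dataColumn(String[] labels, String label, int offset) {
        for (int i = 0; i < labels.length; i++) {
            if (labels[i].equals(label)) {
                return (i + 1) + offset;    // i + 1 is the 1-based label number
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String[] labels = { "Temp", "Pressure" };
        System.out.println(dataColumn(labels, "Temp", 1));      // 1st label, 2nd data column
        System.out.println(dataColumn(labels, "Pressure", 1));  // 2nd label, 3rd data column
    }
}
```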
-
readLines
public String[] readLines(int n)
Reads the first n lines and returns them as a String array. In JXN use format src.readLines(n) to inspect the first lines to verify proper reading or find suitable parameters.
- Parameters:
n - number of lines to read and return as array
    > 0 formats nonprintable characters using KmgStaticUtilities.format(String)
    < 0 returns the lines as read
-
get
public JxnRealArrayAlgebra get(int iCol)
Retrieves JxnRealArrayAlgebra instance from the given character delimited column.
- Parameters:
iCol - if < 0 the column is interpreted as date time value. get(-3) is a shortcut of get( 3, true ).
-
get
public JxnRealArrayAlgebra get(int iCol, boolean isDateTime)
Retrieves JxnRealArrayAlgebra instance from the given character delimited column.
- Parameters:
isDateTime - if true, the data item is interpreted as a date time value (timestamp)
-
get
public JxnRealArrayAlgebra get(int iCol, Format fmt)
Retrieves JxnRealArrayAlgebra instance from the given character delimited column.
The text file data is parsed using the given fmt.
- Parameters:
fmt - DecimalFormat or DateFormat e.g. SimpleDateFormat("MMM dd, yyyy HH:mm")
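How a single java.text.Format argument can cover both decimal and date time columns may be sketched in Java as follows. This is an illustration only: the class FormatDispatchSketch and the method parseValue are hypothetical, and representing a timestamp as milliseconds since 1970 is an assumption, since the internal representation in JxnRealArrayAlgebra is not described here.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.Format;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical sketch: one parse routine for decimal and date time formats. */
class FormatDispatchSketch {

    /** Parses text with fmt; dates become epoch milliseconds (assumed representation). */
    static double parseValue(String text, Format fmt) throws ParseException {
        Object parsed = fmt.parseObject(text);   // Date for a DateFormat, Number for a NumberFormat
        if (parsed instanceof Date) {
            return ((Date) parsed).getTime();
        }
        return ((Number) parsed).doubleValue();
    }

    public static void main(String[] args) throws ParseException {
        SimpleDateFormat dateFmt = new SimpleDateFormat("MMM dd, yyyy HH:mm", Locale.ENGLISH);
        dateFmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        DecimalFormat decFmt = new DecimalFormat("0.0#", DecimalFormatSymbols.getInstance(Locale.ROOT));

        System.out.println(parseValue("Dec 31, 2016 23:59", dateFmt));
        System.out.println(parseValue("20.5", decFmt));
    }
}
```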
-
get
public JxnRealArrayAlgebra get(int iStart, int iWidthOrEnd)
Retrieves JxnRealArrayAlgebra instance from fixed width columns.
- Parameters:
iStart - position of the first character of the data item to be retrieved (Note: character positions in a line start with 1)
iWidthOrEnd - determines the number of characters (column width) of the data item
    > 0 reads iWidthOrEnd characters starting from iStart
    < 0 reads characters from iStart up to, but excluding, position -iWidthOrEnd
    = 0 reads characters from iStart to the end of the line
Example: A file containing
    <Item1><Item2><Item3>
    1234567890123456789012
    -1234.5-1234.5-1234.5
    :
can be read with
    item1 = src.get( 1, 7 )
    item2 = src.get( 8, 7 )
    item3 = src.get( 15, 7 )
or
    item1 = src.get( 1, -8 )
    item2 = src.get( 8, -15 )
    item3 = src.get( 15, -22 )
or in both cases instead of the last line
    item3 = src.get( 15, 0 )
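The three cases of iWidthOrEnd map onto a substring operation, as the following Java sketch shows. The class FixedWidthSketch and the method field are hypothetical; clamping to the line length for short lines is an assumption and may differ from what the library does.

```java
/** Hypothetical illustration of the iStart / iWidthOrEnd convention; not the library's code. */
class FixedWidthSketch {

    /** Extracts the raw field text; positions are 1-based as in the description above. */
    static String field(String line, int iStart, int iWidthOrEnd) {
        int begin = iStart - 1;                 // convert to a 0-based index
        int end;
        if (iWidthOrEnd > 0) {
            end = begin + iWidthOrEnd;          // width given
        } else if (iWidthOrEnd < 0) {
            end = -iWidthOrEnd - 1;             // 1-based end position, excluded
        } else {
            end = line.length();                // to the end of the line
        }
        end = Math.min(end, line.length());     // assumption: short lines are truncated, not an error
        begin = Math.min(begin, end);
        return line.substring(begin, end);
    }

    public static void main(String[] args) {
        String line = "-1234.5+2345.6-3456.7";
        System.out.println(field(line, 1, 7));    // width form
        System.out.println(field(line, 8, -15));  // end position form
        System.out.println(field(line, 15, 0));   // rest of the line
    }
}
```

The end position form is convenient when the column boundaries are read off a ruler line, because the end of one column is the start of the next.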
-
get
public JxnRealArrayAlgebra get(int iStart, int iWidthOrEnd, boolean isDateTime)
Retrieves JxnRealArrayAlgebra instance from fixed width columns, see get(int, int).
- Parameters:
isDateTime - if true, the data item is interpreted as a date time value (timestamp)
-
get
public JxnRealArrayAlgebra get(int iStart, int iWidthOrEnd, Format fmt)
Retrieves JxnRealArrayAlgebra instance from fixed width columns, see get(int, int).
The text file data is parsed using the given fmt.
- Parameters:
fmt - DecimalFormat or DateFormat e.g. SimpleDateFormat("MMM dd, yyyy HH:mm")
-
get
public JxnRealArrayAlgebra get(String colLabel)
Retrieves JxnRealArrayAlgebra instance from column identified by colLabel.
Note: For column labels to work properly, the headline containing the labels must have the same structure (character separated or fixed width columns) as the data lines. Use setLabelOffset(int) to adjust possible discrepancies between headline and data columns.
-
get
public JxnRealArrayAlgebra get(String colLabel, boolean isDateTime)
Retrieves JxnRealArrayAlgebra instance from a column identified by colLabel.
- Parameters:
isDateTime - if true, the data item is interpreted as a date time value (timestamp)
-
get
public JxnRealArrayAlgebra get(String colLabel, Format fmt)
Retrieves JxnRealArrayAlgebra instance from column identified by colLabel.
The text file data is parsed using the given fmt.
- Parameters:
fmt - DecimalFormat or DateFormat e.g. SimpleDateFormat("MMM dd, yyyy HH:mm")
-
getLabel
public String getLabel(int iLine, int iCol)
Retrieves the label of the iCol-th character delimited column.
- Parameters:
iLine - line to check for labels
-
getLabel
public String getLabel(int iLine, int iStart, int iWidthOrEnd)
Retrieves the label from fixed width columns.
- Parameters:
iLine - line to check for labels
iStart - see get(int, int)
iWidthOrEnd - see get(int, int)
-
findLabel
public int findLabel(String colLabel)
Returns the column number for colLabel.
-
getHeader
public String getHeader()
-
getHeader
public String getHeader(int i)
-
getHeaders
public String[] getHeaders()
-
-