Class JxnRealArrayTextFileDataSource


  • public class JxnRealArrayTextFileDataSource
    extends Object
    Parses column-wise organized text files (including CSV exports from Excel) to read data into JxnRealArrayAlgebra instances.

    The text file columns can be defined in two different ways:
    - fixed width columns (with a given number of characters) or
    - character (delimiter) separated columns, identified by column number or column label.
      Column numbers start with 1; column labels (if available) are retrieved from the headlines of the text file.

    The column values can be decimal values or date time values (timestamps).

    Usage in JXN:
    
        ! construct a data source instance:
        src = @JxnRealArrayTextFileDataSource( filename )
        ! or  @JxnRealArrayTextFileDataSource( filename, skipHeadLines, delim1, delim2 )
        ! or  @JxnRealArrayTextFileDataSource( filename ).setDelimiters( delim1, delim2 ).setDateTimeFormat( fmt )
    
        ! call get methods on src to retrieve JxnRealArrayAlgebra instances:
        xyz = src.get( "xyz" )  ! read decimal values from column labeled "xyz"
        t = src.get(-2)         ! < 0 => read date time values from 2nd column
        ta = src.get( 21, 10, true )  ! read date time values from defined fixed width column
        yt = src.get( 11, -21, fmt )  ! read values with given format (decimal or date time depending on fmt)
     
    See the get methods for details.

    Procedure for reading files of unknown format:
    1. First try the default settings (standard parameters). Mark columns containing date time values (timestamps) by using the appropriate get method (parameter iCol < 0, isDateTime == true, or fmt instanceof java.text.DateFormat):
      
       src = @JxnRealArrayTextFileDataSource "filename"
       t = src[-1]
       !   ^ shortcut for src.get( 1, true ): read 1st column as date time value
       y = src[2]
       !   ^ shortcut for src.get(2): read 2nd column as decimal value
       format src.readLines(n)
       ! ^ inspect the first n lines to verify proper reading or to find suitable parameters
       
    2. Use setSkipHeadLines(n) to explicitly ignore n headlines if they contain numbers that would otherwise be parsed as numerical data values.
    3. Adapt the delimiters with setDelimiters( delim1, delim2 ) or, if appropriate, use fixed width columns, e.g. get( iStart, iWidthOrEnd ).
    4. Explicitly specify the date time format using setDateTimeFormat(pattern) or e.g. get( iCol, @SimpleDateFormat(pattern) ).
    See Also:
    JXN Tutorial
    • Constructor Detail

      • JxnRealArrayTextFileDataSource

        public JxnRealArrayTextFileDataSource(String filename,
                                              int skipHeadLines,
                                              String delim1,
                                              String delim2)
        Constructs a text file data source object for the given filename.
        Parameters:
        filename - name of the text file to be read
        skipHeadLines - number of headlines to skip; skipped headlines are used neither for values nor for labels
        delim1 - delimiters of this type count separately, even if consecutive; default value: ";\t" ('\t' represents the tab character)
        delim2 - delimiters of this type are ignored when adjacent to another delimiter, so consecutive delimiters count as one single delimiter; default value: " " (blank)
        See Also:
        setSkipHeadLines(int), setDelimiters(java.lang.String, java.lang.String)
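The delim1/delim2 rules above can be illustrated with a small Java sketch. It is a hypothetical re-implementation written for this page, not the library's code; the class and method names (DelimiterSplitSketch, split) are invented, and the handling of blanks at the start or end of a line is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the delim1/delim2 rules; not the library's code.
class DelimiterSplitSketch {

    private static boolean isAny(char c, String delims) {
        return delims.indexOf(c) >= 0;
    }

    /** Splits one data line into fields according to the delim1/delim2 rules. */
    static List<String> split(String line, String delim1, String delim2) {
        int start = 0;
        int end = line.length();
        // Assumption: delim2 characters at the start or end of a line are ignored.
        while (start < end && isAny(line.charAt(start), delim2)) start++;
        while (end > start && isAny(line.charAt(end - 1), delim2)) end--;

        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = start;
        while (i < end) {
            char c = line.charAt(i);
            if (!isAny(c, delim1) && !isAny(c, delim2)) {
                field.append(c);
                i++;
                continue;
            }
            // Consume a run of consecutive delimiters and count its delim1 characters.
            int delim1Count = 0;
            while (i < end && (isAny(line.charAt(i), delim1) || isAny(line.charAt(i), delim2))) {
                if (isAny(line.charAt(i), delim1)) delim1Count++;
                i++;
            }
            // Each delim1 separates fields on its own; a pure run of delim2 counts only once.
            int separators = Math.max(delim1Count, 1);
            for (int k = 0; k < separators; k++) {
                fields.add(field.toString());
                field.setLength(0);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Default delimiters from the constructor documentation: delim1 = ";\t", delim2 = " "
        System.out.println(split("1.5 ; 2.5;;3.5", ";\t", " "));
    }
}
```

With the default delimiters, the line "1.5 ; 2.5;;3.5" yields four fields: 1.5, 2.5, an empty field (from the two consecutive semicolons), and 3.5; the blanks around the first semicolon are ignored.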
    • Method Detail

      • toString

        public String toString()
        Returns a string representation of the object: filename, number of skipped headlines, delimiters.
        Overrides:
        toString in class Object
      • setSkipHeadLines

        public JxnRealArrayTextFileDataSource setSkipHeadLines(int skipHeadLines)
        Sets the number of headlines to be skipped.
        Note: skipped headlines are not used to determine column labels.
        Returns:
        this
      • setDateTimeFormat

        public JxnRealArrayTextFileDataSource setDateTimeFormat(DateFormat defaultDateTimeFormat)
        Sets the default format for parsing date time values, using the date and time patterns defined in java.text.SimpleDateFormat, e.g. setDateTimeFormat( @SimpleDateFormat( "MMM dd, yyyy HH:mm" ) ). If the date time format is not set explicitly or fails to match the data, KmgDateTimeConverter automatically tries to find a matching pattern.

        Notes:
        - The data file may contain explicit time zone information. For example, the date time strings 2016-12-31T23:59:59Z and 20170101005959+0100 represent the same point in time, expressed in UTC and in GMT+01 respectively, and result in the same internal representation in a JxnRealArrayAlgebra instance. Data formats like this allow precise handling of daylight saving time changes.

        - If the data file does not contain explicit time zone information (and thus no daylight saving time (DST) information), the date time values are by default interpreted as the wall clock time of the local time zone. If that time zone observes daylight saving time, the wall clock skips one hour in spring and repeats one hour in autumn. The resulting ambiguity can be resolved by appending the letter 'a' to each date time string within the first of the two duplicated hours:
         2016-10-30 01:59:59
         2016-10-30 02:00:00a
              :        :
         2016-10-30 02:59:59a   ! last second of daylight saving (summer) time
         2016-10-30 02:00:00    ! clock set back one hour at start of winter time
              :        :
         2016-10-30 02:59:59
         2016-10-30 03:00:00
        However, real-life data files often do not distinguish the duplicated hours except by their position in the file.

        - If the date time values in the data file are continuous UTC values or values of another given time zone, with or without daylight saving time, use:
         sdf = @SimpleDateFormat( "yyyy-MM-dd HH:mm:ss" )  ! adjust the pattern to the format of your data file
         sdf.setTimeZone( TimeZone.getTimeZone( "UTC" ) )  ! "UTC", a fixed offset such as "GMT+0100" (no DST changes), or a region ID such as "Europe/Berlin" (with DST)
         src.setDateTimeFormat( sdf )
        - Because a timestamp may contain delimiters (e.g. blanks), the get methods (except the fixed width column methods) parse a date time value from the start of the timestamp to the end of the line, disregarding further delimiters. The time format pattern (default or explicitly defined) determines how many characters are actually used.
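The notes on explicit time zones, fixed-zone formats, and pattern-limited parsing can be checked with plain java.text.SimpleDateFormat. The sketch below only demonstrates the underlying JDK behaviour; the class and method names (TimestampParsingSketch, epochMillis, consumedLength) are invented for illustration and are not part of the library.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrates the setDateTimeFormat notes with plain java.text.SimpleDateFormat.
class TimestampParsingSketch {

    /** Parses text with the given pattern; zoneId only matters if the text carries no zone. */
    static long epochMillis(String text, String pattern, String zoneId) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        Date date = format.parse(text, new ParsePosition(0));
        if (date == null) {
            throw new IllegalArgumentException("cannot parse: " + text);
        }
        return date.getTime();
    }

    /** Number of characters the pattern consumes when parsing line from index start. */
    static int consumedLength(String line, int start, String pattern) {
        ParsePosition pos = new ParsePosition(start);
        new SimpleDateFormat(pattern).parse(line, pos);
        return pos.getIndex() - start;
    }

    public static void main(String[] args) {
        // The same instant, written in UTC and with a +01:00 offset.
        long utc = epochMillis("2016-12-31T23:59:59Z", "yyyy-MM-dd'T'HH:mm:ssX", "UTC");
        long plusOne = epochMillis("20170101005959+0100", "yyyyMMddHHmmssZ", "UTC");
        System.out.println(utc == plusOne);
        // A timestamp containing a blank: the pattern, not the delimiters, limits its length.
        System.out.println(consumedLength("2016-10-30 01:59:59 21.5 40", 0, "yyyy-MM-dd HH:mm:ss"));
    }
}
```

Both example strings from the first note map to the same epoch value, and a pattern without zone letters combined with setTimeZone("UTC") or "GMT+0100" reproduces that value from zone-less text.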
        Returns:
        this
        See Also:
        KmgDateTimeConverter, TimeZone
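The 'a' suffix convention for the repeated autumn hour can be sketched with java.time, whose ZoneRules report two valid offsets inside an overlap and none inside a gap. This is a hypothetical resolver written for illustration (DstSuffixSketch, resolve, and validOffsetCount are invented names); it is not how KmgDateTimeConverter is implemented, and it assumes the "yyyy-MM-dd HH:mm:ss" layout of the example listing.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical resolver for the 'a' suffix convention, built on java.time; not the library's code.
class DstSuffixSketch {

    private static final DateTimeFormatter LOCAL = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    /** 1 for a normal local time, 2 inside the repeated autumn hour, 0 inside the skipped spring hour. */
    static int validOffsetCount(LocalDateTime local, ZoneId zone) {
        return zone.getRules().getValidOffsets(local).size();
    }

    /**
     * Resolves a local timestamp without zone information. Inside the repeated hour,
     * a trailing 'a' selects the earlier (summer time) offset; otherwise the later
     * (winter time) offset is used, as in the example listing of the notes.
     */
    static ZonedDateTime resolve(String text, ZoneId zone) {
        boolean firstOccurrence = text.endsWith("a");
        String core = firstOccurrence ? text.substring(0, text.length() - 1) : text;
        ZonedDateTime zoned = ZonedDateTime.of(LocalDateTime.parse(core, LOCAL), zone);
        return firstOccurrence ? zoned.withEarlierOffsetAtOverlap() : zoned.withLaterOffsetAtOverlap();
    }

    public static void main(String[] args) {
        ZoneId berlin = ZoneId.of("Europe/Berlin");
        System.out.println(resolve("2016-10-30 02:30:00a", berlin));
        System.out.println(resolve("2016-10-30 02:30:00", berlin));
        System.out.println(validOffsetCount(LocalDateTime.of(2016, 10, 30, 2, 30), berlin));
    }
}
```

For Europe/Berlin, 2016-10-30 02:30:00a resolves to +02:00 (summer time) and the unmarked 02:30:00 to +01:00, one hour later on the timeline.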
      • setDecimalFormat

        public JxnRealArrayTextFileDataSource setDecimalFormat(DecimalFormat defaultDecimalFormat)
        Sets the default format for parsing decimal values. If the file contains decimal values with a decimal separator other than '.', try
        setDecimalFormat( @DecimalFormat() ) to apply the current java.util.Locale setting, or use setDecimalFormat( @DecimalFormat( pattern ) ) to apply the decimal format patterns defined in java.text.DecimalFormat.
        Returns:
        this
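The effect of locale-dependent separators can be seen directly with java.text.DecimalFormat. The sketch passes an explicit Locale, whereas @DecimalFormat() in JXN uses the current default locale; the class name DecimalFormatSketch and the pattern "#,##0.###" are illustrative choices, not library defaults.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

// Shows locale-dependent decimal separators with java.text.DecimalFormat.
class DecimalFormatSketch {

    /** Parses text using the decimal and grouping separators of the given locale. */
    static double parse(String text, Locale locale) {
        DecimalFormat format = new DecimalFormat("#,##0.###", DecimalFormatSymbols.getInstance(locale));
        try {
            return format.parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a decimal value: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("-1234,5", Locale.GERMANY));   // ',' is the German decimal separator
        System.out.println(parse("1.234,5", Locale.GERMANY));   // '.' is the German grouping separator
        System.out.println(parse("-1234.5", Locale.US));
    }
}
```

The pattern only controls formatting; when parsing, DecimalFormat accepts any number of fraction digits, so the separators defined by the symbols are what matters for reading data files.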
      • setLabelOffset

        public JxnRealArrayTextFileDataSource setLabelOffset(int offset)
        Sets a corrective offset for labels that are not aligned with the data columns.
        Parameters:
        offset - the value j - i, if the i-th label refers to the j-th data column
        Returns:
        this
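The offset rule j = i + offset can be illustrated with a hypothetical label lookup. The names LabelOffsetSketch and dataColumn, and the convention of returning 0 for a missing label, are assumptions made for this sketch and do not describe findLabel's actual contract.

```java
import java.util.List;

// Hypothetical lookup combining column labels with the offset j - i; not the library's code.
class LabelOffsetSketch {

    /** Returns the 1-based data column for label, or 0 if the headline does not contain it. */
    static int dataColumn(List<String> labels, String label, int offset) {
        int i = labels.indexOf(label) + 1;     // 1-based position of the label (0 if missing)
        return i == 0 ? 0 : i + offset;        // j = i + offset
    }

    public static void main(String[] args) {
        // Headline "time;temp", but every data line starts with an extra row-number column,
        // so the i-th label describes data column i + 1, i.e. offset = 1.
        List<String> labels = List.of("time", "temp");
        System.out.println(dataColumn(labels, "temp", 1));
    }
}
```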
      • readLines

        public String[] readLines(int n)
        Reads the first n lines and returns them as a String array. In JXN, use format src.readLines(n) to inspect the first lines, to verify proper reading, or to find suitable parameters.
        Parameters:
        n - number of lines to read and return as an array
            > 0 formats nonprintable characters using KmgStaticUtilities.format(String)
            < 0 returns the lines as read
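A rough Java equivalent of this inspection step is sketched below. It assumes that a negative n reads |n| lines, and its escaping (tab shown as \t, other control characters as a hexadecimal escape) is only a stand-in: the actual output of KmgStaticUtilities.format(String) may differ. The names ReadLinesSketch, readLines, and visible are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the readLines(n) behaviour; the escaping is an assumption.
class ReadLinesSketch {

    /** Reads up to |n| lines (assumption); n > 0 makes control characters visible. */
    static List<String> readLines(Reader source, int n) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while (lines.size() < Math.abs(n) && (line = in.readLine()) != null) {
                lines.add(n > 0 ? visible(line) : line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Stand-in for KmgStaticUtilities.format(String): tab becomes "\t", other control chars a hex escape. */
    static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\t') {
                sb.append("\\t");
            } else if (Character.isISOControl(c)) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "time\ttemp\n2016-10-30 01:59:59\t21.5\n";
        readLines(new StringReader(data), 2).forEach(System.out::println);
    }
}
```

Making tabs visible is what reveals whether a file is tab or blank separated, which decides the choice between delim1 and delim2.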
      • get

        public JxnRealArrayAlgebra get(int iCol)
        Retrieves a JxnRealArrayAlgebra instance from the given character delimited column.
        Parameters:
        iCol - column number (starting with 1); if < 0, column -iCol is interpreted as date time values. get(-3) is a shortcut for get( 3, true ).
      • get

        public JxnRealArrayAlgebra get(int iCol,
                                       boolean isDateTime)
        Retrieves a JxnRealArrayAlgebra instance from the given character delimited column.
        Parameters:
        iCol - column number (starting with 1)
        isDateTime - if true, the data item is interpreted as a date time value (timestamp)
      • get

        public JxnRealArrayAlgebra get(int iStart,
                                       int iWidthOrEnd)
        Retrieves a JxnRealArrayAlgebra instance from fixed width columns.
        Parameters:
        iStart - position of the first character of the data item to be retrieved (the first character of a line has position 1)
        iWidthOrEnd - determines the number of characters (column width) of the data item:
            > 0 reads iWidthOrEnd characters starting at iStart
            < 0 reads characters from iStart up to, but excluding, position -iWidthOrEnd
            = 0 reads characters from iStart to the end of the line

        Example: A file containing
            <Item1><Item2><Item3>
            1234567890123456789012
            -1234.5-1234.5-1234.5 
               : 
        can be read with
            item1 = src.get( 1, 7 )
            item2 = src.get( 8, 7 )
            item3 = src.get( 15, 7 ) 
        or
            item1 = src.get( 1, -8 )
            item2 = src.get( 8, -15 )
            item3 = src.get( 15, -22 ) 
        or, in both cases, replacing the last line with
            item3 = src.get( 15, 0 ) 
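The position arithmetic of get(int, int) can be mirrored in a short Java sketch that cuts items out of a line. It is a hypothetical re-implementation for illustration only (FixedWidthSketch and item are invented names); tolerating short lines and trimming blanks are assumptions, not documented library behaviour.

```java
// Hypothetical re-implementation of the get(iStart, iWidthOrEnd) column rules; not the library's code.
class FixedWidthSketch {

    /** Cuts one item out of a line; positions are 1-based as in get(int, int). */
    static String item(String line, int iStart, int iWidthOrEnd) {
        int begin = iStart - 1;                      // 1-based position -> 0-based index
        int end;
        if (iWidthOrEnd > 0) {
            end = begin + iWidthOrEnd;               // iWidthOrEnd characters
        } else if (iWidthOrEnd < 0) {
            end = -iWidthOrEnd - 1;                  // up to, but excluding, position -iWidthOrEnd
        } else {
            end = line.length();                     // to the end of the line
        }
        end = Math.min(end, line.length());          // assumption: short lines are tolerated
        if (begin < 0 || begin >= end) {
            return "";
        }
        return line.substring(begin, end).trim();    // assumption: surrounding blanks are dropped
    }

    public static void main(String[] args) {
        String line = "-1234.5-1234.5-1234.5 ";
        // Equivalent calls from the example select the same items.
        System.out.println(item(line, 1, 7) + " " + item(line, 8, -15) + " " + item(line, 15, 0));
    }
}
```

Converting the 1-based, end-excluding position -iWidthOrEnd to a 0-based exclusive index gives -iWidthOrEnd - 1, which is why get( 1, 7 ) and get( 1, -8 ) select the same seven characters.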
      • get

        public JxnRealArrayAlgebra get(int iStart,
                                       int iWidthOrEnd,
                                       boolean isDateTime)
        Retrieves a JxnRealArrayAlgebra instance from fixed width columns; see get(int, int).
        Parameters:
        isDateTime - if true, the data item is interpreted as a date time value (timestamp)
      • get

        public JxnRealArrayAlgebra get(String colLabel)
        Retrieves a JxnRealArrayAlgebra instance from the column identified by colLabel.
        Note: For column labels to work properly, the headline containing the labels must have the same structure (character separated or fixed width columns) as the data lines. Use setLabelOffset(int) to adjust possible discrepancies between headline and data columns.
      • get

        public JxnRealArrayAlgebra get(String colLabel,
                                       boolean isDateTime)
        Retrieves a JxnRealArrayAlgebra instance from the column identified by colLabel.
        Parameters:
        isDateTime - if true, the data item is interpreted as a date time value (timestamp)
      • getLabel

        public String getLabel(int iLine,
                               int iCol)
        Retrieves the label of the iCol-th character delimited column.
        Parameters:
        iLine - line to check for labels
      • getLabel

        public String getLabel(int iLine,
                               int iStart,
                               int iWidthOrEnd)
        Retrieves the label from fixed width columns.
        Parameters:
        iLine - line to check for labels
        iStart - see get(int, int)
        iWidthOrEnd - see get(int, int)
      • findLabel

        public int findLabel(String colLabel)
        Returns the column number for colLabel.
      • getHeader

        public String getHeader()
      • getHeader

        public String getHeader(int i)
      • getHeaders

        public String[] getHeaders()