LFIT
interactive linear least-squares fitter

Version 1.8 (1996 July 17)

LFIT is a general interactive tool for performing simple linear least-squares fits to X and Y data, with or without uncertainties. Below is a brief description of how to run LFIT. It is assumed that the user is at least conversant in the general theory and practice of least-squares fitting (e.g., as described by Bevington or in Numerical Recipes).

Index:

Input Data Format
Running LFIT
Entering Data
Fitting Options
Line Fits
Polynomial Fits
Fit Results
Refining your Fit
Clean Up
LFIT Heritage

Input Data Format

LFIT wants data in the form of ASCII tables organized into columns with no blank lines. A typical file of (x,y,sigma-y) data pairs might look like

   2.00  3.45  0.01  14.5
   3.00  4.53  0.02  16.7
   4.00  5.39  0.01 -18.6
   ...

where the columns are separated by one or more spaces, and the data values are valid floating-point numbers. TABS or commas (,) are not recognized by LFIT as "separator" characters.

Data files must contain at least 2 columns (for a minimal unweighted least-squares fit) and at most 20 columns. Data files may contain no more than 2048 lines of data, and (x,y,sigma-y) data sets must be contiguous (i.e., you cannot combine columns of numbers).

Data files may contain headers at the very top of the file. As part of the data entry dialog described below, you can specify how many lines at the top of a file to skip before reading in the data. However, there may be no header lines anywhere else in the file.
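The format rules above can be checked before running LFIT. The following is a minimal Python sketch, not LFIT's own parser: the function name read_table and the constants MAX_COLS and MAX_ROWS are hypothetical, with the limits taken from this section.

```python
MAX_COLS = 20    # column limit quoted in this section
MAX_ROWS = 2048  # data-line limit quoted in this section

def read_table(lines, n_header=0):
    """Parse space-separated numeric columns, skipping n_header leading lines.

    Returns a list of columns (each a list of floats).
    """
    rows = []
    for lineno, text in enumerate(lines[n_header:], start=n_header + 1):
        if not text.strip():
            raise ValueError(f"line {lineno}: blank lines are not allowed")
        if "\t" in text or "," in text:
            raise ValueError(f"line {lineno}: only spaces may separate columns")
        # float() raises ValueError on anything that is not a valid number
        rows.append([float(field) for field in text.split()])
    if not rows:
        raise ValueError("no data lines found")
    if len(rows) > MAX_ROWS:
        raise ValueError(f"more than {MAX_ROWS} data lines")
    ncols = len(rows[0])
    if not 2 <= ncols <= MAX_COLS:
        raise ValueError(f"expected 2 to {MAX_COLS} columns, found {ncols}")
    if any(len(row) != ncols for row in rows):
        raise ValueError("every data line must have the same number of columns")
    # Transpose rows into columns
    return [list(col) for col in zip(*rows)]
```

For a file on disk, something like read_table(open("fene.dat").read().splitlines(), n_header=2) would apply the same checks.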


Running LFIT

If LFIT is installed on your machine, the executable should be at

   /usr/local/pkg/bin/lfit

If so, then to run LFIT, go to the directory with your data files and type:

   lfit

You will see the welcome message with the latest release number and date:

    LFIT - Interactive Linear Least Squares
    (Version 1.8 - 1996 July 17)

and receive the first prompt:

    LickMongo Terminal Type <1:14> [11] ?

The default LickMongo terminal type is an X11 workstation window. You can accept the default by just hitting the Enter key (any item in []'s is the default response). If you are on another device (like a PC running em4010), you will need to type in a different device code here. See the LickMongo Manual for the valid device codes.

If you are running X-windows, hitting Enter will pop up an X11 graphics window. You may resize this window using the mouse at this time.


Entering Data

The next prompt asks for the name of the ASCII file with the data to be examined:

    Input File Name ?

Input data files are organized like SM or LickMongo files: ASCII format with the data organized into columns of numbers separated by spaces or TABs. If there are multiple TABs, LFIT's parser can sometimes get confused, so it is recommended that you strip out TAB characters before running LFIT (or avoid TABs to begin with).

Files must have a MINIMUM of 2 data columns, one each for X and Y. Often a third column will have the uncertainty (SigmaY) of the Y data. You cannot have blank entries in a column, so if your data arrays have blanks, either edit them out before running LFIT, or make a third SigmaY column in which the valid data have SigmaY=1.0, and the "blank" lines have SigmaY=0.0. You may have up to 10 columns in a file, with the data (X,Y,SigmaY) in any order.

There is no default filename, so you have to type in the full filename at the prompt:

    Input File Name ? fene.dat

and hit the Enter key. If the file you want is not in the current working directory, you also have to include the full directory path.

Note that if you want out now, typing "abort" followed by Enter at this prompt will quit LFIT and return you to the Unix prompt.

The next question asks whether there are any non-data or "header" lines at the top of the file that need to be skipped before getting to the data lines:

    Does the File contain header lines to skip <y|n> [n] ?

LFIT can only handle files with one set of header lines at the very top. Files which contain multiple data sets separated vertically by headers or line breaks must be broken into separate files before running LFIT.

If the file can be read, the header lines (if any) will be skipped and the first line of the file printed on the screen. For example:

    First Line of the File:
       1      9.40   1.0   1      5881.8999   -0.0041

You will then be asked for the columns containing the X, Y data:

    X Data Column Number <1:6> [1] ? 2
    Y Data Column Number <1:6> [2] ? 5

and then you will be asked for the column with the Y error (SigmaY):

    Y Error Column (0=unweighted) <0:6> [6] ?

If you are doing an unweighted fit, entering 0 will set all data weights to 1.0, otherwise it will use the contents of the data column indicated as SigmaY, and weight the data points by the inverse-square of the errors (i.e., points with smaller sigmas have greater statistical weight).
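The weighting rule can be written out in a few lines of Python. This is a sketch only; the function name weights_from_sigma is hypothetical. Giving zero weight to points with SigmaY=0.0 is an assumption, inferred from the advice above to flag "blank" lines with SigmaY=0.0.

```python
def weights_from_sigma(sigma_y, n_points=None):
    """Return the statistical weight of each data point.

    sigma_y=None stands for an unweighted fit (error column 0):
    every one of the n_points points gets weight 1.0.
    """
    if sigma_y is None:
        return [1.0] * n_points
    # Inverse-square weighting: a smaller sigma means a larger weight.
    # Assumption: sigma <= 0 marks a point to be ignored (weight 0).
    return [1.0 / (s * s) if s > 0.0 else 0.0 for s in sigma_y]
```

With this rule, a point with sigma 0.01 carries four times the weight of a point with sigma 0.02.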


Fitting Options

Once the data are read into internal working arrays, you will be presented with the fitting options menu:

     ** FITTING OPTIONS: 
      1  --  Line Fit
      2  --  Polynomial Fit
      3  --  Legendre Polynomials
      0  --  QUIT
     -------------------------------

There are presently two possibilities:

Fitting a line through the data (Option 1)
Fitting a polynomial (up to order 9) through the data (Option 2)

At present, the Legendre Polynomial fitter is broken, but I keep the option present as a constant reminder that I need to get in and fix it someday...


Line Fits

To fit a line through the data, enter 1 at the prompt above. You will be presented with 4 line fitting options:

     ** Line Fitting Options:
      1  --  Linear (Y vs X)
      2  --  Linear/Log (Log Y vs X)
      3  --  Power Law (Log Y vs Log X)
      4  --  Exponential (Ln Y vs X)
      0  --  QUIT
     ----------------------
    Linear Fit Option <0:4> [1] ?

The four options are as follows:

Linear Fit (Option 1)

This will fit the data by a straight line of the form:

   Y(X) = A + B*X

where B is the slope and A is the Y-intercept.
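For reference, the textbook closed-form solution for a weighted straight-line fit (as given in standard treatments such as Bevington) can be sketched in Python as follows. This is an illustration, not LFIT's actual source code; the function name fit_line_weighted is hypothetical.

```python
import math

def fit_line_weighted(x, y, w=None):
    """Fit Y = A + B*X by weighted linear least squares.

    w holds the weights (1/sigma**2); None means unit weights.
    Returns (A, B, sigma_A, sigma_B).
    """
    if w is None:
        w = [1.0] * len(x)
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s * sxx - sx * sx          # determinant of the normal equations
    a = (sxx * sy - sx * sxy) / delta  # intercept
    b = (s * sxy - sx * sy) / delta    # slope
    # Formal uncertainties; these are valid as written when w = 1/sigma**2.
    # For unit weights they must still be scaled by the residual scatter.
    return a, b, math.sqrt(sxx / delta), math.sqrt(s / delta)
```

For example, fit_line_weighted([1, 2, 3, 4], [3, 5, 7, 9]) recovers A = 1 and B = 2.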

Linear/Log (Semi-Log) Fit (Option 2)

This will instruct LFIT to compute the Base-10 logarithm of the Y data values (appropriately treating any Sigma-Y's if a weighted fit), and then do a semi-log fit of the form:

   Log Y(X) = A + B*X

This is the linearized form of the semi-log relation:

   Y(X) = a * 10^(B*X)

where

   a = 10^A

is the coefficient of the exponential.
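The transformation for the semi-log fit can be sketched in Python as below. The names to_log10 and coefficient_from_offset are hypothetical. The sigma conversion uses first-order error propagation, sigma(Log Y) = sigma(Y) / (Y * ln 10); that this is what LFIT does internally is an assumption, since the text only says the Sigma-Y values are treated appropriately.

```python
import math

def to_log10(y, sigma_y=None):
    """Return (Log Y, sigma of Log Y) for a semi-log fit; Y must be > 0."""
    log_y = [math.log10(v) for v in y]
    if sigma_y is None:
        return log_y, None
    # First-order propagation: d(log10 Y) = dY / (Y * ln 10)
    sigma_log = [s / (v * math.log(10.0)) for v, s in zip(y, sigma_y)]
    return log_y, sigma_log

def coefficient_from_offset(a_offset):
    """Convert the fitted offset A into the coefficient a = 10^A."""
    return 10.0 ** a_offset
```

A data point Y = 100 +/- 10 becomes Log Y = 2 +/- 0.0434 under this rule.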

Power Law (Log-Log) Fit (Option 3)

This will instruct LFIT to compute the Base-10 logarithms of both the X and Y data values, and then perform a power-law fit of the form:

   Log Y(X) = A + B*Log X

where B is now the power-law slope, and A is just the offset. This is the linearized form of the power-law relation:

   Y(X) = a*X^B

where

   a = 10^A

is the coefficient of the power-law.
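As an illustration of the log-log transformation, here is a minimal unweighted Python sketch. The function name fit_power_law is hypothetical, and this is not LFIT's own code. Both X and Y must be strictly positive for the logarithms to exist.

```python
import math

def fit_power_law(x, y):
    """Fit Y = a * X**B by an unweighted line fit to (Log X, Log Y).

    Returns (a, B). All x and y values must be > 0.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx                    # power-law slope B
    offset = mean_y - slope * mean_x     # offset A
    return 10.0 ** offset, slope         # a = 10^A
```

Data generated from Y = 3*X^2, for instance, come back with a = 3 and B = 2 to within rounding.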

Exponential Fit (Option 4)

This will instruct LFIT to fit an exponential to the data by computing the natural (base-e) logarithm of the Y data values (again appropriately treating the Sigma-Y values if a weighted fit), and fitting a line of the form:

   Ln Y(X) = A + B*X

This is the linearized form of the exponential

   Y(X) = a * exp(B*X)

where

   a = exp(A)

is the exponential coefficient. This fitting form would be appropriate if analyzing cooling data for a CCD Dewar or radioactive decay data. In this case, the slope, B, is the inverse of the e-folding parameter.
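Two small conversions follow directly from the formulas above, sketched here in Python with hypothetical function names. Since 10^(B*X) = exp(B*ln(10)*X), a slope from the Linear/Log fit (Option 2) can be converted to the equivalent exponential slope, and the e-folding scale is the inverse of the slope.

```python
import math

def exp_slope_from_log10_slope(b10):
    """Convert a slope fitted to Log Y (base 10) into the slope for Ln Y."""
    return b10 * math.log(10.0)

def e_folding_scale(b):
    """Return the distance in X over which Y changes by a factor of e.

    The magnitude is returned, so a decay (B < 0) and a growth (B > 0)
    with the same |B| give the same scale.
    """
    return 1.0 / abs(b)
```

For a decay with B = -0.5 per unit X, the e-folding scale is 2 units of X.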


Polynomial Fits

To fit a polynomial to the data of the form:

   Y(X) = A(0) + A(1)*X + A(2)*X² + A(3)*X³ + ...

select Option 2 at the FITTING OPTION prompt. You will then be asked for the order of the fit:

    Polynomial Order <1:9> [1] ?

Order=1 is the same as doing a line fit. Order=2 is a quadratic, Order=3 is a cubic, and so forth. The maximum fit order is 9; however, you can change the fit order interactively later to refine your fit after looking at the residuals.


Fit Results

After selecting the fitting function, LFIT will compute the best fit using linear least squares. Note that even if fitting a polynomial function, the fit is still "linear" in the sense that the function is linear in the fit coefficients (in the more formal language of numerical analysis, the fit is computed by solving a system of "linear equations of condition" in which the unknowns are the fit coefficients, not the data themselves which are known). When the best fit has been computed, the results are first printed on the terminal screen.
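To make the "linear in the coefficients" point concrete, the Python sketch below builds the normal equations for a polynomial and solves them as an ordinary linear system. It is an illustration under stated assumptions, not LFIT's implementation: the names polyfit_normal_equations and solve_linear are hypothetical, and the solver is plain Gauss-Jordan elimination with partial pivoting.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * a = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def polyfit_normal_equations(x, y, order, w=None):
    """Return coefficients [A(0), A(1), ...] of the least-squares polynomial."""
    m = order + 1
    if w is None:
        w = [1.0] * len(x)
    # The unknowns A(k) enter linearly: sum_k alpha[j][k] * A(k) = beta[j]
    alpha = [[sum(wi * xi ** (j + k) for xi, wi in zip(x, w))
              for k in range(m)] for j in range(m)]
    beta = [sum(wi * yi * xi ** j for xi, yi, wi in zip(x, y, w))
            for j in range(m)]
    return solve_linear(alpha, beta)
```

Note that the powers of X appear only in the known sums alpha and beta; the system being solved is linear no matter how high the polynomial order. Normal equations become poorly conditioned at high order, so this direct approach is best kept to low orders and modest X ranges.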

For example, here I have fit a 3rd order polynomial to a set of wavelength calibration lamp data, where Y is the laboratory wavelength, and X is the central pixel of the lines. In this case I have done an unweighted fit, with the goal of finding the dispersion formula for this spectrograph, giving the wavelength associated with each pixel, X.

    ** Polynomial Fit:

    Y(x) =    5863.30     +/-   0.417506    
          +(   1.98894     +/-   6.031907E-03)*X
          +(  5.663378E-05 +/-   2.405902E-05)*X^2
          -(  4.335774E-08 +/-   2.647130E-08)*X^3

        Reduced Chi-Squared:   0.229757    
     Unbiased Mean Variance:   0.229757    
    Unbiased Mean Deviation:   0.479330

The fit coefficients are given with their formal uncertainties, written in an equation form. In this example, the starting wavelength found is 5863.3+/-0.42 Angstroms, the linear dispersion is 1.989+/-0.006 Ang/pixel, plus 2nd and 3rd order dispersion terms.
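Once the coefficients are known, the dispersion formula can be evaluated at any pixel. The Python sketch below uses the coefficient values printed in the example output above; the names COEFFS and wavelength are hypothetical.

```python
# Coefficients A(0)..A(3) copied from the example fit output
COEFFS = [5863.30, 1.98894, 5.663378e-05, -4.335774e-08]

def wavelength(pixel, coeffs=COEFFS):
    """Evaluate A(0) + A(1)*X + A(2)*X^2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * pixel + c
    return result
```

Here wavelength(0) returns the starting wavelength, 5863.30 Angstroms, and wavelength(100) evaluates to about 6062.72 Angstroms.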

The three last parameters give some idea of the quality of the fit:

Reduced Chi-Squared: This is the standard Chi-Squared divided by the number of degrees of freedom (N-M), where N is the number of data points and M is the number of fit coefficients (here 4). In this example, since we are doing an unweighted fit, Chi-Squared has no real significance, and is just formally identical to the unbiased mean variance of the data from the fit. Chi-Squared is most meaningful for a weighted fit, since the uncertainty in Y appears in the denominator.
Unbiased Mean Variance: This is the mean squared deviation of the Y data from the best-fit curve. It is "unbiased" in the sense that the deviations are weighted by the uncertainties on the data. It is formally equivalent to the computation of the variance of a population of data points with uncertainties, except that instead of computing the variance with respect to the mean of the data, it is done with respect to the best-fit curve.
Unbiased Mean Deviation: This is the square-root of the variance, or the "sigma", of the Y data points relative to the best-fit curve, and has the same units as the Y data points. Thus in this case, where we are fitting the wavelength of calibration lines (Y) as a function of pixel number (X), with wavelengths given in units of Angstroms, the unbiased mean deviation says that the "sigma" of the fit is 0.479 Angstroms; in this case not a very good fit given that the spectrum has a linear dispersion coefficient of (1.989 +/- 0.006)A/pixel (see the linear coefficient above).
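The three quantities can be sketched in Python as follows. The function name fit_statistics is hypothetical. For the weighted case, the variance is normalized by the mean weight, following the weighted-variance form in Bevington; that LFIT uses exactly this normalization is an assumption. For an unweighted fit all weights are 1.0, and the variance equals the reduced Chi-Squared, as in the example output.

```python
import math

def fit_statistics(y, y_fit, n_coeffs, sigma_y=None):
    """Return (reduced chi-squared, unbiased mean variance, mean deviation).

    y_fit holds the best-fit curve evaluated at each data point, and
    n_coeffs is M, the number of fit coefficients.
    """
    n = len(y)
    dof = n - n_coeffs  # degrees of freedom, N - M
    w = [1.0] * n if sigma_y is None else [1.0 / (s * s) for s in sigma_y]
    chi2 = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_fit))
    reduced_chi2 = chi2 / dof
    # Assumed normalization: divide by the mean weight (1.0 if unweighted)
    variance = reduced_chi2 / (sum(w) / n)
    return reduced_chi2, variance, math.sqrt(variance)
```

As a consistency check against the example output, the square root of the variance 0.229757 is 0.479330, the quoted unbiased mean deviation.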


Refining your Fit

After printing the results, LFIT also displays the best fit graphically, plotting the data, any error bars (if doing a weighted fit), and the best-fit line. After the fit is displayed, the graphics cursor is activated and the program enters the Interactive Fitting Mode. When the mouse (or graphics pointer if on a graphics terminal of some kind) is on the plot, you have the following cursor key commands. Note that on a workstation, if your mouse pointer is in the text window, these commands will not work.

   LFIT Cursor Commands:
     Key        Function
   ---------------------------------------------
      P    Refresh the current plot
      R    Toggle between Data and Residuals
      X    Change plotting limits
      H    Hardcopy (PostScript) of current plot
      O    Change polynomial order and re-fit
      S    Sigma-reject data points and re-fit
      E    Edit data points with cursor & re-fit
      F    Compute the best fit
      L    Change line fitting function
      G    Get other data from current data file
      N    Open a new data file
      A    Change axes of linear fitting
      >    Dump results to an ascii file
      Q    Quit interactive fitting
      !    OOPS! Restore Original Data
      ?    Print this help list
    <ESC>  ABORT, exiting the program
   ---------------------------------------------

Particularly useful commands for examining and refining your fit are:

R - Toggle between Data and Fit Residual Plots: As a general rule, you should always examine the fit residuals (the plot of DATA-FIT), to see if in fact you are fitting the data well compared to the uncertainties, or if you need to introduce higher order terms or eliminate outliers. Sometimes the most interesting things come out of residual plots...
O - Change polynomial order of the fit: If you need to increase the polynomial fit order, or want to see the results of reducing the order (a good thing to try if the formal uncertainty on the highest-order coefficient is larger than or of order the value of that coefficient), use the O command.
E - Edit Data Points: This command lets you interactively remove outliers from a plot, or get erroneously entered data out of your way and refit. It has a separate menu of cursor commands for data editing. You can undo any edits with the ! key.
L - Change Line Fitting Function: Lets you choose among the four Line Fitting functions (Linear, Semi-Log, Log-Log, and Exponential) should you decide you want to try something else.
S - Automatically Sigma-Reject Points and re-fit: This lets you reject points from the fit using a sigma threshold. Often useful for weeding out potentially bad data points from a fit using a bit more robust criterion than "looks bad" (e.g., the "E" command above). Use with caution... You can undo rejections with the ! key.
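The general idea behind sigma-rejection can be sketched in Python as below, using an unweighted line fit. This is a generic iterative scheme, not necessarily the algorithm behind the S command; the names fit_line and sigma_reject and the parameters threshold and max_iter are hypothetical.

```python
import math

def fit_line(x, y):
    """Unweighted least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sigma_reject(x, y, threshold=3.0, max_iter=10):
    """Refit repeatedly, dropping points beyond threshold * sigma.

    Returns (intercept, slope, indices of the points that were kept).
    """
    keep = list(range(len(x)))
    for _ in range(max_iter):
        a, b = fit_line([x[i] for i in keep], [y[i] for i in keep])
        residuals = [y[i] - (a + b * x[i]) for i in keep]  # DATA - FIT
        # Deviation about the fit, with N - 2 degrees of freedom
        sigma = math.sqrt(sum(r * r for r in residuals) / (len(keep) - 2))
        survivors = [i for i, r in zip(keep, residuals)
                     if abs(r) <= threshold * sigma]
        if len(survivors) == len(keep) or len(survivors) < 3:
            break  # nothing rejected, or too few points left to refit
        keep = survivors
    a, b = fit_line([x[i] for i in keep], [y[i] for i in keep])
    return a, b, keep
```

One reason for caution: a single outlier inflates sigma itself, so in a small data set even a gross outlier may sit below a 3-sigma threshold and never be rejected.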

If you are satisfied with the fit, typing "Q" will exit interactive mode and return you to the LFIT menus. Note that there are a number of commands that let you toggle between viewing the best fit and the residuals, edit (remove or restore) outlying data, auto-reject points based on their deviation from the fit in units of the computed unbiased mean deviation (see above), change the plotting scale, change the fitting function (e.g., the polynomial order), make a hardcopy, dump the results to an ASCII file, and so forth. Experiment with these to find the best way to use them with your data (and individual tastes in fitting).


Clean Up

Once you quit interactive mode (Q), you are given the following options menu:

     ** Clean-Up Options:
       1  --  Change Fit Params
       2  --  New Data/Same File
       3  --  Open a New File
       0  --  QUIT
     ----------------------------
    Option <0:3> [0] ?

The options are as follows:

Option 0 quits the program and returns you to the Unix system prompt.

Option 1 allows you to change the fitting function parameters, similar to what can be done in interactive mode. In this case the data will be kept the same and you will be shown the "FITTING OPTIONS" menu again.

Option 2 lets you read in different data from the currently open data file. In this case you will be shown the first line of the file again and asked for the X, Y, (and SigmaY) data columns.

Option 3 will close the current data file and allow you to open another data file for analysis. Selecting this option will return you to the "Input File Name" prompt you got at the beginning.


LFIT Heritage

Caltech undergraduate alumni who took the infamous Ph3 any time after about 1984 will notice a number of affinities between "ffit" and LFIT, but in fact LFIT predates "ffit" by about 4 years. The first version of LFIT was LFIT.FOR written on a Dec10 at Caltech in October 1979, when I was taking Ph3. This was before any IBM PC computers showed up in the undergrad labs (after 1983). The original fitting algorithms were based on the routines in Bevington (1st edition), as described in Don Skelton's Primer on Data Analysis. LFIT.FOR was ported to a VAX11/780 computer in the Caltech Astronomy department in 1982, where it first broke free of the Tek4010 and used an early version of Tim Pearson's PGPLOT package on a combination of VT100/Retrographics and Grinnell image displays. After moving to UCSC/Lick Observatory in 1983 for graduate school, LFIT was further modified to use Mongo (later LickMongo) for the graphics, especially on VT100/Retro and GraphOn Tek401x emulators. It was ported to SunOS and X-Windows at Ohio State in 1991. Many of the routines from Bevington were replaced by modifications of code from the first edition of Numerical Recipes by then. Finally, it was ported to RedHat Linux in 2000 (which required little modification at that point), where it is now maintained as part of the OSUSpec package of routines.

Astute users will notice that LFIT does not have a terrifically sophisticated user interface. This reflects the origins of the program in the pre-windows days where we were working on Tektronix 4014 graphics terminals (the real green-tube versions, not the later emulators). Maybe someday it will become more GUI-like, but that won't be anytime soon...

Return to: [Rick Pogge's Software Page]

Updated: 2001 September 22 [rwp]