The programs in the RAMBIN suite are described one by one. The programs are grouped into two categories.
average
Usage:
average [-d] [input-file] [output-file]
Calculates the average of a set of numbers.
average
calculates the average of a set of numbers; the
-d
option gives detailed information including the standard
deviation and the accuracy.
See also
count
.
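A minimal Python sketch of the kind of statistics the -d option might report. The function name summarize is hypothetical, the sample (n - 1) standard deviation is an assumption, and "accuracy" is interpreted here as the standard error of the mean, which the text above does not confirm.

```python
import math

def summarize(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance with an n - 1 denominator (assumption, not confirmed by the source).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    std_dev = math.sqrt(variance)
    # "Accuracy" is read here as the standard error of the mean (assumption).
    std_err = std_dev / math.sqrt(n)
    return mean, std_dev, std_err
```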
bootstrap
, bootstrap_lines
Usage:
bootstrap [input-file] [output-file] [output-data-size] [seed]
Bootstraps a set of values or lines.
bootstrap
uses a statistical technique known as
"bootstrapping" (duh!), a Stanford-based invention, to randomly
generate arbitrary sets of data from existing sets. bootstrap
operates on a file containing a set of floating-point values and
generates an equal number of values (default) by randomly selecting
from the initial set of values. bootstrap_lines
operates on
any ASCII file and randomly selects an equal number of lines (default)
to output from the initial set of lines.
The number of values/lines output can be changed by specifying a
value for the optional argument output-data-size. This can
be used to generate smaller or larger subsets from the initial set.
The default seed used for the random number generator is the value
returned by time()
. The seed can also be specified with the
optional seed argument for reproducible results.
The reason to bootstrap is to assess the statistical validity of the data. When one has a powerful hammer, everything looks like a nail. Bootstrapping is a powerful technique, and should be used wisely.
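The resampling described above can be sketched in Python as follows. This is an illustration, not the tool's implementation: the function name and parameters are hypothetical, sampling with replacement is assumed (as is usual for bootstrapping), and Python's generator will not reproduce the sequence the C program produces for the same seed.

```python
import random

def bootstrap(values, output_size=None, seed=None):
    """Resample `values` with replacement; works for numbers or text lines."""
    # seed=None lets Python pick a seed; passing a seed gives repeatable output.
    rng = random.Random(seed)
    size = len(values) if output_size is None else output_size
    return [rng.choice(values) for _ in range(size)]
```

Calling bootstrap(values, seed=42) twice returns the same resample, which mirrors the role of the optional seed argument.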
ccoef
, lsqr
Usage:
ccoef|lsqr [input-file] [output-file]
Calculates the correlation coefficient of pairs of numbers.
ccoef
calculates the correlation coefficient of pairs of
numbers specified in two-column format. When invoked as lsqr,
detailed information, including the least-squares fitting slope and
its standard deviation, is output.
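For reference, a short Python sketch of the two main quantities, assuming the Pearson correlation coefficient and an ordinary least-squares fit of the second column on the first. The function name is hypothetical, and the standard deviation of the slope is left out.

```python
import math

def correlation_and_slope(pairs):
    """Return (Pearson r, least-squares slope) for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope
```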
clog
, log10
, loge
Usage:
clog [value | input-file] [output-file]
Calculates the logarithm.
clog
calculates the log of a value or a set of
values. When invoked as loge
, log base e is calculated. When
invoked as log10
, log base 10 is calculated.
compare_numbers, gt, gt1, gte, gte1, lt, lt1, lte, lte1, eq, eq1
Usage:
compare_numbers [value | input-file] [compare-value] [output-file] [comparison-operator]
Compare pairs of numbers and return a truth value.
compare_numbers
compares pairs of numbers and returns a
truth value. The usage of this program is not standard, but it
is done in the interest of being able to freely pipe values to the
program. Normally, compare_numbers
should be invoked as one
of the comparison operators:
gt - true if greater than
gt1 - true if greater than one
gte - true if greater than or equal to
gte1 - true if greater than or equal to one
lt - true if less than
lt1 - true if less than one
lte - true if less than or equal to
lte1 - true if less than or equal to one
eq - true if equal
eq1 - true if equal to one
As the number of arguments for the operator depends on the specific
operation (for example, gt1
requires only one argument
whereas gt
requires two arguments), the program needs only one
argument for it to produce a result (truth value). However, in the
cases where the operator requires two arguments, and only one is
specified, the second argument is assumed to be one. Also, when
comparison-operator is explicitly specified, it takes
precedence over the argv[0]
variable which is used to
determine the comparison operator (when one is not specified).
Examples:
gt 1 2 foo gte
will invoke gt
to compare the numbers 1 and 2, and output the
result to the file foo
, but since gte
is given as
the comparison operator, it will override the argv[0]
specification and will see if 1 is greater than or equal to 2.
lt1
will take input numbers from stdin, check whether each is less than one, and output the number of times the truth value of that comparison was 1 (true).
eq1 foo 2 bar
will take the input numbers from the file foo,
compare them to
one, and output the number of times the truth value of that comparison
was 1 (true) to the file bar
. In this particular case, the
third argument, 2, is ignored but does need to be specified for the
output to occur in the file bar
.
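The operator selection rules above can be sketched in Python. This is a hypothetical model of the behaviour described, not the program's code: invoked_as stands in for argv[0], and the names compare and OPERATORS are made up for the illustration.

```python
import operator

OPERATORS = {
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
    "eq": operator.eq,
}

def compare(invoked_as, value, compare_value=1.0, comparison_operator=None):
    """Return 1 (true) or 0 (false) for a single comparison."""
    # An explicit comparison operator takes precedence over the invocation name.
    name = comparison_operator or invoked_as
    # The "...1" variants always compare against one and ignore compare_value.
    if name.endswith("1"):
        name, compare_value = name[:-1], 1.0
    return 1 if OPERATORS[name](value, compare_value) else 0
```

With this model, compare("gt", 1, 2, "gte") evaluates 1 >= 2, and compare("eq1", 1, 2) ignores the 2, matching the examples above.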
compound
Usage:
compound [-a] value percent [iterations] [output-file]
Compounds a value given a certain percent.
compound
demonstrates the "magic" of compounding by
calculating the result after compounding value at
percent. The -a
option will add
value to the compounded value for each iteration. The
default number of iterations is given by
DEFAULT_NUMBER_OF_ITERATIONS
in the source and can be
overridden on the command line with iterations.
Features: if an output file is specified, then for the program to produce the correct result, iterations must also be explicitly specified.
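A small Python sketch of the calculation, under stated assumptions: growth is applied once per iteration, and with the -a behaviour the original value is added after the growth step (the order is a guess). The iteration count is passed explicitly because the value of DEFAULT_NUMBER_OF_ITERATIONS is not given here.

```python
def compound(value, percent, iterations, add_each_iteration=False):
    """Compound `value` at `percent` per iteration."""
    total = value
    for _ in range(iterations):
        total *= 1 + percent / 100.0
        if add_each_iteration:
            # Models -a: add the original value again on every iteration (assumed order).
            total += value
    return total
```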
count
Usage:
count [input-file] [output-file]
Calculates the sum of a set of numbers.
count
calculates the sum of a set of numbers (in the
first column, if many columns are available).
Features: The program tries to be clever by ignoring values that it thinks are not numbers. This may work for the most part.
See also
average
.
downcase_filename, upcase_filename
Usage:
downcase_filename filename
Changes the case of a filename.
downcase_filename
changes the case of a filename to lower
case (and its counterpart, upcase_filename
does the
opposite).
find_cliques
Usage:
find_cliques [input-file] [output-file]
Finds all the cliques in a graph.
find_cliques
reads the size of the graph and the graph
itself, specified as a matrix of 1s and 0s (each line corresponds to a
vertex number), and outputs all the maximal completely connected
sub-graphs in the graph. An example graph of three vertices would look
like:
3
111
110
101
The program uses the Bron and Kerbosch algorithm to do clique finding.
The reference for this program is: Bron C, Kerbosch J. Algorithm 457: Finding all cliques of an undirected graph. Communications of the ACM, 16: 575-577, 1973.
Features: the graph must be undirected. That is, the matrix must be symmetric and the diagonal entries must be 1.
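For readers unfamiliar with the algorithm, here is a sketch of the basic Bron and Kerbosch recursion (without pivoting) in Python. It is not taken from the program's source: the function name, the use of 0-based vertex numbers, and the list-of-strings input are all assumptions made for the illustration.

```python
def find_cliques(matrix):
    """Return all maximal cliques of a symmetric 0/1 adjacency matrix."""
    n = len(matrix)
    # Neighbour sets, ignoring the diagonal.
    neighbours = {
        v: {u for u in range(n) if u != v and matrix[v][u] == "1"}
        for v in range(n)
    }
    cliques = []

    def expand(current, candidates, excluded):
        # A clique is maximal when nothing can extend it and nothing was skipped.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & neighbours[v], excluded & neighbours[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return sorted(cliques)
```

For a four-vertex graph with edges 0-1, 0-2, 1-2 and 2-3, the maximal cliques are the triangle and the single remaining edge.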
find_duplicate_words
Usage:
find_duplicate_words [input-file] [output-file]
Finds occurrences of duplicate words in a document.
find_duplicate_words
is a simple program to find the
duplicate words in a document. It basically reads in every word in a
document (separated by WHITESPACE
, as defined in the source
file), stores the last word, and checks to see if the current
word is the same as the last one.
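The last-word comparison described above fits in a few lines of Python. This sketch splits on any whitespace, which may differ from the WHITESPACE definition in the source file, and the function name is hypothetical.

```python
def find_duplicate_words(text):
    """Return each word that immediately repeats the word before it."""
    duplicates = []
    last_word = None
    for word in text.split():
        if word == last_word:
            duplicates.append(word)
        last_word = word
    return duplicates
```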
histogram
Usage:
histogram input-file start-value increment-value stop-value [output-file]
Makes a histogram from a set of numbers.
histogram
uses the input data to create a histogram from
start-value to stop-value. The size of each bin is
determined by increment-value, and the number of bins is
determined by the difference between start-value and
stop-value divided by increment-value.
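The binning rule can be sketched in Python as follows. The half-open bins (a value equal to stop-value is not counted) and the silent skipping of out-of-range values are assumptions, since the handling of edge cases is not described; the function name is hypothetical.

```python
def histogram(values, start, increment, stop):
    """Count values into bins of width `increment` covering [start, stop)."""
    n_bins = int(round((stop - start) / increment))
    counts = [0] * n_bins
    for v in values:
        if start <= v < stop:
            # Clamp the index to guard against floating-point rounding at the top edge.
            index = min(int((v - start) / increment), n_bins - 1)
            counts[index] += 1
    return counts
```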
ic
Usage:
ic [input-file] [output-file]
Gives the information content for a set of probabilities.
ic
outputs the information content for a set of
probabilities using the formula P * log(P). The units are decimal
digits ("dits") since the log(P) is calculated using log base 10.
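A Python sketch of the per-probability term P * log10(P) quoted above. Whether the program sums the terms or negates them (Shannon entropy is the negated sum) is not stated, so this example returns the individual terms only; the function name and the treatment of P = 0 as contributing zero are assumptions.

```python
import math

def information_terms(probabilities):
    """Return P * log10(P) for each probability, in dits."""
    # A zero probability contributes nothing (limit of p * log(p) as p -> 0).
    return [p * math.log10(p) if p > 0 else 0.0 for p in probabilities]
```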
max
, min
Usage:
max|min input-file [output-file]
Find the maximum or minimum value of a set of numbers.
max
finds the maximum value of a set of
numbers. min
finds the minimum value.
Features: Can't handle numbers less than MIN_VALUE or greater than MAX_VALUE, as defined in ramp/src/tools/maxmin.c.
mypaste
Usage:
mypaste input-file1 input-file2 [paste-string] [output-file]
Concatenates the lines in two files sequentially.
mypaste
concatenates the lines in two files sequentially. If
paste-string is specified, then it is used as a conjunction
between the lines.
Features: The number of lines output is always the same as the number of lines in input-file1. If input-file1 has more lines than input-file2, the lines in input-file1 without a corresponding line in input-file2 are output as is. If input-file1 has fewer lines than input-file2, the extra lines in input-file2 are not output.
This routine is better than the paste
commonly found in Unix
systems in that it allows you to specify an arbitrary paste string.
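The line-pairing behaviour can be modelled in Python like this. The function works on lists of lines rather than files, its name is hypothetical, and the empty default paste string is an assumption because the program's default is not stated.

```python
def mypaste(lines1, lines2, paste_string=""):
    """Join line i of lines1 with line i of lines2; output length follows lines1."""
    output = []
    for i, line in enumerate(lines1):
        if i < len(lines2):
            output.append(line + paste_string + lines2[i])
        else:
            # No partner line in the second input: emit the line unchanged.
            output.append(line)
    return output
```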
mysplit
Usage:
mysplit input-file number-of-lines [output-file-prefix]
Splits a file into chunks with a specified number of lines.
mysplit
splits an input file into N files, where
N is the number of lines in the input file divided by the
value specified for number-of-lines. If
output-file-prefix is not specified, then the value for
input-file is used in its place.
This routine is better than the split
commonly found in
Unix systems in that it outputs files using a numeric index suffix
(0..
N).
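A Python sketch of the chunking and numbering scheme. It returns a mapping from output name to lines instead of writing files, and the "prefix.index" naming (including the dot separator) is purely an assumption; only the numeric suffix starting at 0 comes from the description above.

```python
def mysplit(lines, number_of_lines, prefix):
    """Group `lines` into chunks of `number_of_lines`, keyed by a numbered name."""
    chunks = {}
    for start in range(0, len(lines), number_of_lines):
        index = start // number_of_lines
        # Hypothetical naming scheme: prefix, a dot, then the chunk index.
        chunks[f"{prefix}.{index}"] = lines[start:start + number_of_lines]
    return chunks
```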
normalise
Usage:
normalise [-lt] [input-file] [output-file]
Normalises a set of numbers.
normalise
divides a set of numbers by the largest value (-l
) or the total (-t
) of the numbers.
Features: Can't handle sets larger than MAX_VALUES
in
ramp/src/tools/normalise.c
.
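The two modes can be sketched in Python as below. The function name and the by_total flag are hypothetical stand-ins for the -l and -t options, and the sketch assumes the divisor is non-zero.

```python
def normalise(values, by_total=False):
    """Divide every value by the largest value, or by the total if by_total is set."""
    divisor = sum(values) if by_total else max(values)
    return [v / divisor for v in values]
```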
random
Usage:
random [seed] [output-file]
Generates a random number.
random
generates a random number using the
random()
function. The default seed is the value returned by
time()
. The seed can also be specified with the optional
seed argument.
rotate_text
Usage:
rotate_text [input-file] [output-file]
Rotates a block of ASCII text.
rotate_text
rotates a block of ASCII text by 90
degrees. While the input does not have to be in the form of an MxN matrix
and can contain free flow of text, the output is printed as an NxM
matrix (i.e., with spaces).
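One way to picture the rotation is the Python sketch below. The direction of rotation is not stated above, so clockwise is assumed; ragged lines are padded with spaces so that the output forms a full matrix. The function name is hypothetical.

```python
def rotate_text(lines):
    """Rotate a block of text 90 degrees clockwise, padding short lines with spaces."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    # Each output row is one input column, read from the last line up to the first.
    return ["".join(row[col] for row in reversed(padded)) for col in range(width)]
```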
sizeof
Usage:
sizeof [output-file]
Outputs sizes for various types in C.
sizeof
outputs the sizes (in bytes) for various types in
C (as reported by the sizeof
operator). This is useful for
cross-platform development.
text2html
Usage:
text2html [input-file] [output-file]
Convert a text file into an HTML file.
text2html
takes a simple text file (formatted in
different paragraphs) and converts it into an HTML file (essentially
adding <p> and </p> tags whenever two consecutive new
lines are encountered). The program also prompts for a title and
header string, and appends the file .signature
in the current
directory (if it exists) to the output.
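The paragraph-wrapping step alone can be sketched in Python as follows. This is not how the program works internally (it is built with lex and yacc); the function name is hypothetical, the title, header and .signature handling are omitted, and escaping of special characters is an addition the description does not mention.

```python
import html

def text_to_html_paragraphs(text):
    """Wrap blocks separated by a blank line in <p>...</p> tags."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # html.escape is an extra safety step, not something the source describes.
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
```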
In this preliminary version of the program, there is also an
attempt to use characters normally used in ASCII text for various
style changes. For example, when the program encounters a word of the
form /foo/
it will give you a set of italics options to use
to convert the word into an appropriate style.
The nice thing about this program is that it provides a way to
learn how lex
and yacc
work in terms of writing and
parsing formal language grammars (i.e., it touches upon topics in
compiler theory, deterministic pushdown automata, etc.).
Features: program is still a bit buggy (it does what I want it to
do, so there's little incentive to fix it) and needs lex
and
yacc
installed.