tva Common Conventions
This document defines the naming and behavior conventions for parameters shared across tva subcommands to ensure a consistent user experience.
Header Handling
Headers are the column name rows in data files. Different commands have different header processing requirements, but parameter naming should remain consistent.
Quick Selection:
- Need column names for field references? Use `--header` (standard TSV) or `--header-hash1` (TSV with comments).
- Just skip header lines? Use `--header-lines N` (first N lines) or `--header-hash` (comment lines only).
Header Detection Modes (mutually exclusive):
- Modes that provide column names (`header_args_with_columns()`):
  - `--header` / `-H`: FirstLine mode
    - Takes the first line as column names.
    - Simplest mode for standard TSV files.
    - `lines` is empty, `column_names_line` is the first line.
  - `--header-hash1`: HashLines1 mode
    - Takes consecutive `#` lines plus the next line as header.
    - Graceful degradation: if no `#` lines exist, uses the first line as column names (behaves like `--header`).
    - `lines` contains only `#` lines (empty if no `#` lines); the column names line is stored separately.
  - Commands using these modes: `append`, `bin`, `blank`, `fill`, `filter`, `join`, `longer`, `nl`, `reverse`, `select`, `stats`, `uniq`, `wider`.
- Modes that don’t provide column names (`header_args()`):
  - `--header-lines N`: LinesN mode
    - Takes up to N lines as header (fewer if the file is shorter).
    - Does not extract column names.
    - `lines` contains up to N lines, `column_names_line` is None.
  - `--header-hash`: HashLines mode
    - Takes all consecutive `#` lines as header (metadata only).
    - No column names line is extracted.
    - `lines` contains `#` lines, `column_names_line` is None.
  - Commands using these modes: `check`, `slice`, `sort`.
Library Implementation:
- Use `TsvReader::read_header_mode(mode)` to read headers.
- Returns `HeaderInfo { lines, column_names_line }` where:
  - `lines`: the header lines read from input, excluding the column names line
  - `column_names_line`: the line containing column names (None if the mode doesn’t provide column names)
- Mode behavior:
  - `FirstLine`: `lines` is empty, `column_names_line` is the first line
  - `LinesN(n)`: `lines` contains up to n lines read, `column_names_line` is None
  - `HashLines`: `lines` contains all consecutive `#` lines, `column_names_line` is None
  - `HashLines1`: `lines` contains only `#` lines (empty if no `#` lines), `column_names_line` is the column names line
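The mode behavior above can be sketched as a small, self-contained Rust function. This is an illustration only, not tva's actual code: the free function `read_header_mode`, the `HeaderMode` enum, and the field types of `HeaderInfo` are assumptions modeled on the names in this section, and the input is a plain iterator of lines rather than a `TsvReader`.

```rust
/// Hypothetical mirror of the four header modes described above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HeaderMode {
    FirstLine,
    LinesN(usize),
    HashLines,
    HashLines1,
}

/// Same shape as the `HeaderInfo` described above; the field types are assumed.
#[derive(Debug, Default, PartialEq)]
pub struct HeaderInfo {
    pub lines: Vec<String>,
    pub column_names_line: Option<String>,
}

/// Consumes header lines from `input` according to `mode`.
/// Whatever remains in the iterator afterwards is data.
pub fn read_header_mode<I>(input: &mut std::iter::Peekable<I>, mode: HeaderMode) -> HeaderInfo
where
    I: Iterator<Item = String>,
{
    let mut info = HeaderInfo::default();
    match mode {
        // The first line is the column names line; `lines` stays empty.
        HeaderMode::FirstLine => info.column_names_line = input.next(),
        // Up to `n` lines are kept as opaque header lines.
        HeaderMode::LinesN(n) => info.lines = input.by_ref().take(n).collect(),
        HeaderMode::HashLines | HeaderMode::HashLines1 => {
            // Collect the leading run of `#` lines without consuming the line after it.
            while let Some(line) = input.next_if(|l| l.starts_with('#')) {
                info.lines.push(line);
            }
            // HashLines1 also takes the next line as column names. With no `#`
            // lines this is the first line, so it degrades to FirstLine behavior.
            if mode == HeaderMode::HashLines1 {
                info.column_names_line = input.next();
            }
        }
    }
    info
}
```

Using a peekable iterator lets the `#`-line scan stop at the first non-comment line without losing it, so the same iterator can be handed on to the data-processing stage.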
Special Commands:
- `split`: Uses `--header-in-out` (input has header, output writes header; the default) or `--header-in-only` (input has header, output does not write header). `--header` is an alias for `--header-in-out`.
- `keep-header`: Uses `--lines N` / `-n` to specify the number of header lines (default: 1).
- `sample`: Uses a simple `--header` / `-H` flag (treats the first line as header).
- `transpose`: Does not support header modes (processes all lines as data).
Multi-file Header Behavior:
- When using multiple input files with header mode enabled, the header from the first file is read and written to output.
- Headers from subsequent files are skipped.
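As a rough illustration of this rule, the sketch below concatenates several inputs while keeping only the first file's header. It assumes FirstLine mode (exactly one header line per file) and represents each file as an in-memory list of lines; the function name `concat_with_single_header` is hypothetical and not part of tva.

```rust
/// Concatenates several inputs, keeping the header of the first input only.
/// Assumes one header line per file; each inner `Vec` holds one file's lines.
pub fn concat_with_single_header(files: &[Vec<String>]) -> Vec<String> {
    let mut out = Vec::new();
    for (index, lines) in files.iter().enumerate() {
        let mut iter = lines.iter();
        // The first line of every file is its header.
        if let Some(header) = iter.next() {
            // Only the first file's header reaches the output.
            if index == 0 {
                out.push(header.clone());
            }
        }
        // Everything after the header is data and is always copied.
        out.extend(iter.cloned());
    }
    out
}
```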
Input/Output Conventions
Parameter Naming
| Type | Parameter Name | Description |
|---|---|---|
| Single file input | infile | Positional argument |
| Multiple file input | infiles | Positional argument, supports multiple |
| Output file | --outfile / -o | Optional, defaults to stdout |
Special Values
- `stdin` or `-`: Read from standard input
- `stdout`: Output to standard output (used with `--outfile`)
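One way these special values could be resolved is sketched below, using only the Rust standard library. The helper names `is_stdin`, `open_input`, and `open_output` are hypothetical, and the sketch assumes an absent `--outfile` is represented as `None`.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};

/// Returns true for the two spellings that mean standard input.
pub fn is_stdin(name: &str) -> bool {
    name == "stdin" || name == "-"
}

/// Opens `name` for reading, treating `stdin` and `-` as standard input.
pub fn open_input(name: &str) -> io::Result<Box<dyn BufRead>> {
    if is_stdin(name) {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(name)?)))
    }
}

/// Opens the output target. `None` (no `--outfile` given) and the literal
/// `stdout` both resolve to standard output.
pub fn open_output(name: Option<&str>) -> io::Result<Box<dyn Write>> {
    match name {
        None | Some("stdout") => Ok(Box::new(BufWriter::new(io::stdout()))),
        Some(path) => Ok(Box::new(BufWriter::new(File::create(path)?))),
    }
}
```

Returning boxed trait objects keeps the rest of a command independent of whether it reads from a file or a pipe.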
Field Selection Syntax
Commands that support field selection (e.g., select, filter, sort) use a unified field syntax.
- 1-based Indexing
  - Fields are numbered starting from 1 (following the Unix `cut`/`awk` convention).
  - Example: `1,3,5` selects the 1st, 3rd, and 5th columns.
- Field Names
  - Requires the `--header` flag (or a command-specific header option).
  - Names are case-sensitive.
  - Example: `date,user_id` selects columns named “date” and “user_id”.
- Ranges
  - Numeric Ranges: `start-end`. Example: `2-4` selects columns 2, 3, and 4.
  - Name Ranges: `start_col-end_col`. Selects all columns from `start_col` to `end_col` inclusive, based on their order in the header.
  - Reverse Ranges: `5-3` is automatically treated as `3-5`.
- Wildcards
  - `*` matches any sequence of characters in a field name.
  - Example: `user_*` selects `user_id`, `user_name`, etc.
  - Example: `*_time` selects `start_time`, `end_time`.
- Escaping
  - Special characters in field names (like space, comma, colon, dash, star) must be escaped with `\`.
  - Example: `Order\ ID` selects the column “Order ID”.
  - Example: `run\:id` selects “run:id”.
- Exclusion
  - Negative selection is typically handled via a separate flag (e.g., `--exclude` in `select`), but uses the same field syntax.
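The rules above can be sketched as a small resolver that turns a field list into 1-based column indices. This is a simplified illustration, not tva's parser: the names `resolve_fields`, `resolve_one`, and `wildcard_match` are hypothetical, backslash escaping is not handled, and indices are not checked against the actual column count.

```rust
/// Matches `name` against `pattern`, where `*` matches any run of characters
/// (including an empty one). Matching is case-sensitive.
fn wildcard_match(pattern: &str, name: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == name;
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if name.len() < first.len() + last.len() || !name.starts_with(first) || !name.ends_with(last) {
        return false;
    }
    // Match the literal pieces between the stars from left to right.
    let mut rest = &name[first.len()..name.len() - last.len()];
    for &part in &parts[1..parts.len() - 1] {
        match rest.find(part) {
            Some(pos) => rest = &rest[pos + part.len()..],
            None => return false,
        }
    }
    true
}

/// Resolves one endpoint (a 1-based number or a header name) to a 1-based index.
fn resolve_one(token: &str, header: &[&str]) -> Option<usize> {
    match token.parse::<usize>() {
        Ok(n) if n >= 1 => Some(n),
        Ok(_) => None, // 0 is not a valid field number
        Err(_) => header.iter().position(|h| *h == token).map(|i| i + 1),
    }
}

/// Expands a comma-separated field list into 1-based column indices.
/// Returns `None` if a number is 0 or a name is not found in `header`.
pub fn resolve_fields(spec: &str, header: &[&str]) -> Option<Vec<usize>> {
    let mut out = Vec::new();
    for item in spec.split(',') {
        if item.contains('*') {
            // Wildcard: keep every header name that matches, in header order.
            for (i, name) in header.iter().enumerate() {
                if wildcard_match(item, name) {
                    out.push(i + 1);
                }
            }
        } else if let Some((a, b)) = item.split_once('-') {
            // Range: a reverse range such as `5-3` is normalized to `3-5`.
            let (a, b) = (resolve_one(a, header)?, resolve_one(b, header)?);
            out.extend(a.min(b)..=a.max(b));
        } else {
            out.push(resolve_one(item, header)?);
        }
    }
    Some(out)
}
```

With the header `date, user_id, user_name, start_time, end_time`, the spec `user_id-start_time` resolves to columns 2 through 4 and `*_time` to columns 4 and 5. Because this sketch splits on every `-`, a name that contains a dash would need the escaping rule, which is left out here.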
Numeric Parameter Conventions
| Parameter | Description | Example |
|---|---|---|
| `--lines N` / `-n` | Specify line count | `--lines 100` |
| `--fields N` / `-f` | Specify fields | `--fields 1,2,3` |
| `--delimiter` | Field delimiter | `--delimiter ','` |
Random and Sampling
| Parameter | Description |
|---|---|
| `--seed N` | Specify random seed for reproducibility |
| `--static-seed` | Use fixed default seed |
Boolean Flags
Boolean flags use `--flag` to enable, without a value:
- `--header` not `--header true`
- `--append` / `-a` not `--append true`
Expr Syntax
The expr command supports a rich expression language for data transformation.
- Column references: `@1`, `@2` (1-based) or `@name` (when headers provided)
- Whole row reference: `@0` (original row data)
- Variables: `@var_name` (bound by `as`, persists across rows)
- Global variables: `@__index`, `@__file`, `@__row` (built-in)
- Arithmetic: `+`, `-`, `*`, `/`, `%`, `**`
- Comparison: `==`, `!=`, `<`, `<=`, `>`, `>=`
- String comparison: `eq`, `ne`, `lt`, `le`, `gt`, `ge`
- Logical: `and`, `or`, `not`
- String concatenation: `++`
- Functions: `trim()`, `upper()`, `lower()`, `len()`, `abs()`, `round()`, `min()`, `max()`, `if()`, `default()`, `substr()`, `replace()`, `split()`, `join()`, `range()`, `map()`, `filter()`, `reduce()`
- Pipe operator: `|` for chaining functions (e.g., `@name | trim() | upper()`)
- Underscore placeholder: `_` for piped values in multi-argument functions (e.g., `@name | substr(_, 0, 3)`)
- Lambda expressions: `x => x + 1` or `(x, y) => x + y`
- List literals: `[1, 2, 3]` or `[@a, @b, @c]`
- Variable binding: `as` for intermediate results (e.g., `@price * @qty as @total; @total * 0.9`)
- Method call syntax: `@name.upper()`, `@num.abs()`
Full expr syntax documentation is available here.
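For readers more familiar with Rust than with the expr language, three of the examples above can be read roughly as follows. These are analogies under stated assumptions, not how tva evaluates expressions: in particular, mapping `trim()` and `upper()` onto Rust's `trim` and `to_uppercase` is assumed, and the function names are hypothetical.

```rust
/// Rust reading of `@name | trim() | upper()`:
/// each pipe stage receives the result of the previous one.
pub fn trim_then_upper(name: &str) -> String {
    name.trim().to_uppercase()
}

/// Rust reading of `@price * @qty as @total; @total * 0.9`.
pub fn discounted_total(price: f64, qty: f64) -> f64 {
    // `as @total` binds the intermediate result to a name.
    let total = price * qty;
    // The expression after `;` can then refer to that name.
    total * 0.9
}

/// Rust reading of applying the lambda `x => x + 1` to each element
/// of a list such as `[1, 2, 3]`.
pub fn increment_all(values: &[i64]) -> Vec<i64> {
    values.iter().map(|x| x + 1).collect()
}
```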
Error Handling
All commands follow the same error output format:
`tva <command>: <error message>`
Serious errors return non-zero exit codes.
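A minimal sketch of a helper that produces this format is shown below. The names `format_error` and `report_failure` are hypothetical, and writing to stderr with exit code 1 is an assumption; this section only states that the code is non-zero.

```rust
use std::fmt::Display;
use std::process::ExitCode;

/// Formats an error in the shared `tva <command>: <error message>` shape.
pub fn format_error(command: &str, error: impl Display) -> String {
    format!("tva {}: {}", command, error)
}

/// Prints the formatted error and returns a non-zero exit code.
/// Using stderr and the value 1 are assumptions of this sketch.
pub fn report_failure(command: &str, error: impl Display) -> ExitCode {
    eprintln!("{}", format_error(command, error));
    ExitCode::from(1)
}
```

Accepting any `Display` value means the same helper works for plain strings and for error types such as `std::io::Error`.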