longer
Reshapes a table from wide to long format by gathering multiple columns into key-value pairs. This command is useful for “tidying” data where some column names are actually values of a variable.
Behavior:
- Converts wide-format data to long format by melting specified columns.
- ID columns (those not specified in
--cols) are preserved and repeated for each melted row. - The first line is always treated as a header.
- When multiple files are provided, the first file’s header determines the schema.
- Subsequent files must have the same column structure; their headers are skipped.
- Output is produced in row-major order (all melted rows for each input row are output together).
Input:
- Reads from one or more TSV files or standard input.
- Files ending in
.gzare transparently decompressed. - The first line is ALWAYS treated as a header.
- When multiple files are provided, the first file’s header determines the schema (columns to reshape). Subsequent files must have the same column structure; their headers are skipped.
Output:
- By default, output is written to standard output.
- Use
--outfile/-oto write to a file instead. - Output columns: ID columns + name column(s) + value column.
Column selection:
--cols/-c: Specifies which columns to reshape (melt).- Columns can be specified by 1-based indices, ranges (e.g.,
3-5), or names (with wildcards likeQ*). - All columns not specified in
--colsbecome ID columns and are preserved.
Names transformation:
--names-to: The name(s) of the new column(s) that will contain the original column headers. Multiple names can be specified when using--names-sepor--names-pattern.--values-to: The name of the new column that will contain the data values (default: “value”).--names-prefix: A string to remove from the start of each variable name.--names-sep: A separator to split column names into multiple columns.--names-pattern: A regex with capture groups to extract parts of column names into separate columns.
Field syntax:
- Field lists support 1-based indices, ranges (
1-3,5-7), header names, name ranges (run-user_time), and wildcards (*_time). - Run
tva --help-fieldsfor a full description shared across tva commands.
Missing values:
--values-drop-na: If set, rows where the value is empty will be omitted from the output.- Note: Whitespace-only values are not considered empty and will not be dropped.
Examples:
-
Reshape columns 3, 4, and 5 into default “name” and “value” columns
tva longer data.tsv --cols 3-5 -
Reshape columns starting with “wk”, specifying new column names
tva longer data.tsv --cols "wk*" --names-to week --values-to rank -
Reshape all columns except the first two
tva longer data.tsv --cols 3- -
Process multiple files and save to output
tva longer data1.tsv data2.tsv --cols 2-5 --outfile result.tsv -
Split column names into multiple columns using separator
tva longer data.tsv --cols 2-5 --names-sep "_" --names-to type num -
Extract parts of column names using regex pattern
tva longer data.tsv --cols 2-3 --names-pattern "new_?(.*)_(.*)" --names-to diag gender -
Remove prefix from column names before using as values
tva longer data.tsv --cols 2-4 --names-prefix "Q" --names-to question -
Drop rows with empty values
tva longer data.tsv --cols 2-5 --values-drop-na