Loading Data
HiFive data is handled using the FiveCData and HiCData classes.
Loading 5C Data
HiFive can load 5C data from one of two source file types.
BAM Files
5C data in BAM files should always come in pairs, one file for each end of the paired-end reads. HiFive can load any number of BAM file pairs, for example when multiple sequencing lanes have been run for a single replicate. These files do not need to be indexed or sorted. All names of the sequences that the reads were mapped against should exactly match the primer names in the BED file used to construct the Fragment object.
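As an illustration of the pairing requirement, the sketch below groups BAM file names into one [end1, end2] pair per sequencing lane. It is not part of HiFive: the function name pair_bam_files and the '<lane>_1.bam' / '<lane>_2.bam' naming scheme are assumptions made only for this example, so adjust the pattern to match your own files.

```python
import re
from collections import defaultdict


def pair_bam_files(filenames):
    """Group BAM file names into [end1, end2] pairs, one pair per lane.

    Assumes a hypothetical naming scheme: '<lane>_1.bam' and '<lane>_2.bam'.
    """
    lanes = defaultdict(dict)
    for name in filenames:
        match = re.fullmatch(r"(.+)_([12])\.bam", name)
        if match is None:
            raise ValueError(f"unrecognized BAM file name: {name}")
        lane, end = match.groups()
        lanes[lane][end] = name

    pairs = []
    for lane in sorted(lanes):
        ends = lanes[lane]
        # Every lane must provide a file for both read ends.
        if set(ends) != {"1", "2"}:
            raise ValueError(f"lane {lane} is missing one read end")
        pairs.append([ends["1"], ends["2"]])
    return pairs


pairs = pair_bam_files(
    ["laneB_2.bam", "laneA_1.bam", "laneB_1.bam", "laneA_2.bam"]
)
```

Checking that every lane has both ends before loading makes a missing file show up as an explicit error rather than as an unexpectedly small dataset.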
Counts Files
Counts files are tabular text files in which each line contains a pair of primer names and the number of times that pairing was observed.
5c_for_primer1 5c_rev_primer2 10
5c_for_primer1 5c_rev_primer4 3
5c_for_primer3 5c_rev_primer4 18
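To make the layout concrete, here is a minimal sketch of reading such a file in plain Python. It is not HiFive's own loader: the function name parse_counts is hypothetical, and summing the counts of repeated primer pairs is an assumption made for this example.

```python
from collections import Counter


def parse_counts(lines):
    """Return a Counter mapping (primer1, primer2) to the observed count."""
    counts = Counter()
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 fields")
        primer1, primer2, count = fields
        # Assumption: repeated pairs are summed rather than overwritten.
        counts[(primer1, primer2)] += int(count)
    return counts


example = """\
5c_for_primer1 5c_rev_primer2 10
5c_for_primer1 5c_rev_primer4 3
5c_for_primer3 5c_rev_primer4 18
"""
counts = parse_counts(example.splitlines())
```

A quick pass like this is a convenient way to confirm that every primer name in a counts file also appears in the BED file before loading the data.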
Loading HiC Data
HiFive can load HiC data from three different types of source files.
BAM Files
HiC data in BAM files should always come in pairs, one file for each end of the paired-end reads. HiFive can load any number of BAM file pairs, for example when multiple sequencing lanes have been run for a single replicate. These files do not need to be indexed or sorted. For faster loading, especially with very large numbers of reads, it helps to filter out single-mapped reads beforehand, which reduces the number of reads that HiFive needs to traverse when reading the BAM files.
RAW Files
RAW files are tabular text files in which each line describes one mapped read pair, giving the chromosome, coordinate, and strand for each read end. HiFive can load any number of RAW files into a single HiC Data object.
chr1 30002023 + chr3 4020235 -
chr5 9326220 - chr1 3576222 +
chr8 1295363 + chr6 11040321 +
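The six-column layout can be checked with a short parser such as the sketch below. It is an illustration only, not HiFive code: the ReadEnd type and the parse_raw_line function are hypothetical names introduced for this example.

```python
from typing import NamedTuple


class ReadEnd(NamedTuple):
    chrom: str
    coord: int
    strand: str


def parse_raw_line(line):
    """Split one RAW line into a pair of ReadEnd records."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}")
    chrom1, coord1, strand1, chrom2, coord2, strand2 = fields
    for strand in (strand1, strand2):
        if strand not in ("+", "-"):
            raise ValueError(f"invalid strand: {strand!r}")
    return (
        ReadEnd(chrom1, int(coord1), strand1),
        ReadEnd(chrom2, int(coord2), strand2),
    )


end1, end2 = parse_raw_line("chr1 30002023 + chr3 4020235 -")
```

Validating the field count and strand symbols up front catches truncated or misformatted lines before a long load is started.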
MAT Files
MAT files use a tabular text format previously defined for HiCPipe. Each line consists of a pair of fend indices and the number of observed occurrences of that pairing. These indices must match those associated with the Fend object used when loading the data. When using this format, it is therefore wise to also create the Fend object from a HiCPipe-style fend file to ensure accurate fend-count association.
fend1 fend2 count
1 4 10
1 10 5
1 13 1
Note
In order to maintain compatibility with HiCPipe, both tabular fend files and MAT files are 1-indexed, rather than using the 0-based indexing found everywhere else in HiFive.
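If you inspect a MAT file outside of HiFive and want to compare its fend indices with 0-based indices, the shift can be sketched as follows. This is a standalone illustration, not HiFive's loader: the function name parse_mat is hypothetical, and treating a line that starts with "fend1" as a header is an assumption based on the example above.

```python
def parse_mat(lines):
    """Return (fend1, fend2, count) tuples with 0-based fend indices."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row (assumed to start with 'fend1').
        if not fields or fields[0] == "fend1":
            continue
        fend1, fend2, count = (int(field) for field in fields)
        if fend1 < 1 or fend2 < 1:
            raise ValueError("MAT fend indices are 1-based and must be >= 1")
        # Shift from the 1-based HiCPipe convention to 0-based indices.
        rows.append((fend1 - 1, fend2 - 1, count))
    return rows


rows = parse_mat(["fend1 fend2 count", "1 4 10", "1 10 5", "1 13 1"])
```

Rejecting an index of 0 is a cheap guard against accidentally feeding in a file that was already converted to 0-based indices.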