# R: The ibb package for count data

Recent technology platforms in proteomics and genomics produce count data for quantitative analysis. In proteomics, the number of MS/MS events observed for a protein in the mass spectrometer has been shown to correlate strongly with the protein’s abundance in a complex mixture. In genomics, next-generation sequencing technologies use read count as a reliable measure of the abundance of the target transcript. This R package contains two functions for statistical analysis of count data.

- The beta-binomial test (bb.test) can be used for significance analysis of independent samples (two or more groups).
- The inverted beta-binomial test (ibb.test) can be used for paired sample testing (e.g. pre-treatment and post-treatment data).

User manual | Example data and R scripts

## Installation

For latest R >= 3.5.0

Windows (both 32 bit & 64 bit) | MacOS (64 bit) | Linux 64 bit

For R <= 3.4.4

Windows (both 32 bit & 64 bit) | MacOS (both 32 bit & 64 bit) | Linux 32 bit | Linux 64 bit

For R >= 3.0.0

Windows (both 32 bit & 64 bit) | MacOS (64 bit) | Linux 32 bit | Linux 64 bit

On Windows, go to the R menu “Packages” → “Install package(s) from local zip files” and select the downloaded zip file.

On Linux and MacOS, run the shell command `R CMD INSTALL <the downloaded file>`.
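
The same installation can also be done from within an R session. The following is a minimal sketch using `install.packages()` with `repos = NULL`, which installs from a local file instead of a repository. The helper name `install_local_ibb` and the path in the usage comment are hypothetical placeholders, not the actual download names.

```r
# Hypothetical helper: install a downloaded package archive from a local path.
install_local_ibb <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # repos = NULL tells install.packages() to treat `path` as a local file
  # rather than as a package name to look up in a repository.
  install.packages(path, repos = NULL)
  invisible(path)
}

# Usage (the path is a placeholder for whatever file was downloaded):
# install_local_ibb("path/to/the-downloaded-file")
```

Checking that the file exists first gives a clearer error message than the one produced when the path is mistyped.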

All rights reserved by the author. This software package is provided for research purposes in a non-commercial environment. Please do not redistribute.

Contact: Thang V Pham <t.pham@vumc.nl>

## The beta-binomial test

### Description

Performs the beta-binomial test for count data.

### Usage

    bb.test(x, tx, group, alternative = c("two.sided", "less", "greater"), n.threads = 1)

### Arguments

| Argument | Description |
| --- | --- |
| `x` | A vector or matrix of counts. When `x` is a matrix, the test is performed row by row. |
| `tx` | A vector or matrix of the total sample counts. When `tx` is a matrix, the number of rows must be equal to the number of rows of `x`. |
| `group` | A vector of group indicators. |
| `alternative` | A character string specifying the alternative hypothesis: `"two.sided"` (default), `"greater"` or `"less"`. |
| `n.threads` | The number of threads to be used. |
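
The shape requirements on the arguments can be checked before calling the test. The sketch below is one possible pre-flight check, not part of the package: the helpers `as_row_matrix` and `check_count_inputs` are hypothetical, and the rule that `group` has one label per sample (column) is an assumption inferred from the examples in this document.

```r
# Hypothetical helper: treat a plain vector as a single row of counts.
as_row_matrix <- function(v) {
  if (is.null(dim(v))) matrix(v, nrow = 1) else as.matrix(v)
}

# Hypothetical helper: verify that x, tx and group have compatible shapes.
check_count_inputs <- function(x, tx, group) {
  x  <- as_row_matrix(x)
  tx <- as_row_matrix(tx)
  stopifnot(
    ncol(x) == length(group),              # assumption: one label per sample
    ncol(tx) == ncol(x),                   # one total count per sample
    nrow(tx) == 1 || nrow(tx) == nrow(x)   # matrix tx must match rows of x
  )
  invisible(TRUE)
}

# Spectral count data from the bb.test example
x <- c(1, 5, 1, 10, 9, 11, 2, 8)
tx <- c(19609, 19053, 19235, 19374, 18868, 19018, 18844, 19271)
group <- c(rep("cancer", 3), rep("normal", 5))
check_count_inputs(x, tx, group)
```

The check stops with an error when a condition fails, so a mismatch is caught before any computation starts.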

### Details

When *n.threads* is 0, the maximal number of CPU cores is used. When *n.threads* is -1, one CPU core less than the maximum is used, and so on.
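
The convention for non-positive values can be written out explicitly. The helper `resolve_threads` below is hypothetical and only mirrors the rule described above; the lower bound of one thread is an assumption, since the behavior for very negative values is not documented here.

```r
# Hypothetical helper mirroring the documented n.threads convention.
# max.cores defaults to the core count reported by the parallel package.
resolve_threads <- function(n.threads, max.cores = parallel::detectCores()) {
  if (n.threads > 0) {
    return(n.threads)  # positive: use exactly this many threads
  }
  # 0 -> all cores, -1 -> all cores but one, and so on.
  # Assumption: never drop below one thread.
  max(1, max.cores + n.threads)
}

resolve_threads(0, max.cores = 8)   # 8
resolve_threads(-1, max.cores = 8)  # 7
resolve_threads(2, max.cores = 8)   # 2
```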

### Value

A list with a single component is returned.

| Component | Description |
| --- | --- |
| `p.value` | The p-value of the test. |

### Author

Thang V. Pham <t.pham@vumc.nl>

### Reference

Pham TV, Piersma SR, Warmoes M, Jimenez CR (2010) On the beta binomial model for analysis of spectral count data in label-free tandem mass spectrometry-based proteomics. Bioinformatics, 26(3):363-369.

### Examples

    library(ibb)

    # example proteomics spectral count data
    x <- c(1, 5, 1, 10, 9, 11, 2, 8)
    tx <- c(19609, 19053, 19235, 19374, 18868, 19018, 18844, 19271)
    group <- c(rep("cancer", 3), rep("normal", 5))
    bb.test(x, tx, group)

    ######################
    # comparing 3 groups: columns c(1, 2, 3), c(4, 5, 6), and c(7, 8) of a data file
    d <- read.delim("example-3groups.txt", header = TRUE)

    # compare 3 groups, using all available CPU cores
    out <- bb.test(d[, 1:8], colSums(d[, 1:8]),
                   c(rep("a", 3), rep("b", 3), rep("c", 2)),
                   n.threads = 0)

    # write result to file
    write.table(cbind(d, out$p.value), file = "example-3groups-out.txt",
                sep = "\t", row.names = FALSE)

## The inverted beta-binomial test

### Description

Performs the inverted beta-binomial test for paired count data.

### Usage

    ibb.test(x, tx, group, alternative = c("two.sided", "less", "greater"), n.threads = 1)

### Arguments

| Argument | Description |
| --- | --- |
| `x` | A vector or matrix of counts. When `x` is a matrix, the test is performed row by row. |
| `tx` | A vector or matrix of the total sample counts. When `tx` is a matrix, the number of rows must be equal to the number of rows of `x`. |
| `group` | A vector of group indicators. There should be two groups of equal size. The samples are matched by the order of appearance in each group. |
| `alternative` | A character string specifying the alternative hypothesis: `"two.sided"` (default), `"greater"` or `"less"`. |
| `n.threads` | The number of threads to be used. |

### Details

This test is designed for paired count data, for example data acquired before and after treatment.
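
The pairing rule for `group` (samples matched by order of appearance within each group) can be illustrated with a short sketch. The helper `pair_samples` is hypothetical and reflects one reading of that rule; it does not show how the package forms pairs internally.

```r
# Hypothetical helper: list which sample positions are paired with each other.
pair_samples <- function(group) {
  labels <- unique(group)
  stopifnot(length(labels) == 2)              # exactly two groups
  first  <- which(group == labels[1])
  second <- which(group == labels[2])
  stopifnot(length(first) == length(second))  # groups of equal size
  # The k-th sample of the first group is matched with the k-th of the second.
  data.frame(first = first, second = second)
}

# Grouping from the RNA-seq example: samples 1-3 pair with samples 4-6.
pair_samples(c(rep("cancer", 3), rep("normal", 3)))

# Interleaved layout: sample 1 pairs with 2, and sample 3 with 4.
pair_samples(c("pre", "post", "pre", "post"))
```

Under this reading, the order of samples within each group matters, so columns should be arranged so that matched samples occupy the same position in their respective groups.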

### Value

A list of values is returned.

| Component | Description |
| --- | --- |
| `p.value` | The p-value of the test. |
| `fc` | An estimation of the common fold change. |
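
For intuition about what a common fold change summarizes, the sketch below computes naive per-pair ratios of normalized counts from the RNA-seq example data in this document. This is not the estimator used by `ibb.test`; the variable names are illustrative, and the direction of the ratio (second group over first) is an assumption.

```r
# RNA-seq read count data from the ibb.test example
x <- c(33, 32, 86, 51, 52, 149)
tx <- c(7742608, 15581382, 20933491, 7126839, 13842297, 14760103)
group <- c(rep("cancer", 3), rep("normal", 3))

# Count as a fraction of the sample total
prop <- x / tx
first  <- prop[group == "cancer"]
second <- prop[group == "normal"]

# Assumption: ratio taken as second group over first group; ibb.test may
# report the opposite direction.
naive_fc <- second / first
naive_fc
```

Each element is the fold change of one matched pair. A common fold change is a single value summarizing all pairs, which the package estimates from its model instead of from these raw ratios.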

### Author

Thang V. Pham <t.pham@vumc.nl>

### Reference

Pham TV, Jimenez CR (2012) An accurate paired sample test for count data. Bioinformatics, 28(18):i596-i602.

### Examples

    library(ibb)

    # example RNA-seq read count data
    x <- c(33, 32, 86, 51, 52, 149)
    tx <- c(7742608, 15581382, 20933491, 7126839, 13842297, 14760103)
    group <- c(rep("cancer", 3), rep("normal", 3))
    ibb.test(x, tx, group)

    ######################
    # columns c(1, 2, 3) are respectively paired with columns c(4, 5, 6)
    d <- read.delim("example-paired.txt", header = TRUE)

    # perform a paired test for all rows, using all but one CPU core
    out <- ibb.test(d[, 1:6], colSums(d[, 1:6]),
                    c(rep("pre", 3), rep("post", 3)),
                    n.threads = -1)

    # write result to file
    write.table(cbind(d, out$fc, out$p.value), file = "example-paired-out.txt",
                sep = "\t", row.names = FALSE)

2012 – Thang Pham