Title: | Le-Huynh Truc-Ly's R Code and Templates |
---|---|
Description: | Miscellaneous R functions (for graphics, data import, data transformation, and general utilities) and templates (for exploratory analysis, Bayesian modeling, and crafting scientific manuscripts). |
Authors: | Truc-Ly Le-Huynh [aut, cre, cph] |
Maintainer: | Truc-Ly Le-Huynh <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1.9000 |
Built: | 2025-03-10 05:52:45 UTC |
Source: | https://github.com/le-huynh/lehuynh |
Save a plot using ggplot2::ggsave().
Plot size follows instructions of Elsevier journals.
ggsave_elsevier(
  filename,
  plot,
  width = c("one_column", "one_half_column", "full_page"),
  height,
  ...
)
filename | A character string. File name to create on disk. |
plot | Plot to save, ggplot or other grid object. |
width | Plot width. See Details for more information. |
height | Plot height in "mm". |
... | Passed to |
Elsevier's instructions on artwork sizing:
Image width:
single column: 90 mm (255 pt)
1.5 column: 140 mm (397 pt)
double column (full width): 190 mm (539 pt)
Image height: maximum 240 mm.
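The sizes listed above can be checked with a short conversion sketch. The mapping from the `width` choices to millimetres is inferred from the list and is an assumption about how the package stores it; `mm_to_pt()` and `check_height_mm()` are hypothetical helpers, not part of the package.

```r
# Elsevier artwork widths in millimetres, as listed above.
# Assumption: the `width` choices map to these values in this order.
elsevier_width_mm <- c(one_column = 90, one_half_column = 140, full_page = 190)

# Convert millimetres to points (1 inch = 25.4 mm = 72 pt).
mm_to_pt <- function(mm) mm / 25.4 * 72

round(mm_to_pt(elsevier_width_mm))
# one_column: 255, one_half_column: 397, full_page: 539

# Hypothetical guard for the 240 mm maximum image height.
check_height_mm <- function(height, max_height = 240) {
  if (!is.numeric(height) || length(height) != 1 ||
      height <= 0 || height > max_height) {
    stop("height must be a single number in (0, ", max_height, "] mm")
  }
  invisible(height)
}

check_height_mm(120)
```

The rounded point values match the figures quoted in the list, which is a quick sanity check that the millimetre values are the ones to pass on as plot dimensions.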
An image file containing the saved plot.
library(ggplot2)

fig <- ggplot(mtcars, aes(y = mpg, x = disp)) +
  geom_point(aes(colour = factor(cyl)))

## For demo, a temp. file path is created with the file extension .png
png_file <- tempfile(fileext = ".png")

ggsave_elsevier(png_file, plot = fig, width = "full_page", height = 120)
This function imports multiple data files of the same format from a specified directory, optionally filtered by a pattern in the filenames.
import_data(path, file_format, pattern = NULL, ...)
path | A character string specifying the directory path where the data files are located. |
file_format | A character string specifying the type of files to import, using a glob pattern (e.g., "*.csv" for CSV files). |
pattern | An optional character string specifying a regex pattern to filter filenames. Default is NULL. |
... | Additional arguments passed to the |
A named list where each element is the imported data from a file, with names corresponding to the filenames without the path and file extension.
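The behaviour described above (glob match on the file type, optional regex filter on the file name, list named by file name without path and extension) can be sketched in base R. This is a hypothetical stand-in limited to CSV files; `import_csv_sketch()` and the demo file names are invented for illustration and do not reflect the package's implementation.

```r
# Hypothetical base-R stand-in for the described behaviour (CSV only).
import_csv_sketch <- function(path, file_format = "*.csv", pattern = NULL, ...) {
  # glob2rx() converts a glob such as "*.csv" into an anchored regex
  files <- list.files(path, pattern = utils::glob2rx(file_format),
                      full.names = TRUE)

  # Optionally keep only files whose base name matches the regex `pattern`
  if (!is.null(pattern)) {
    files <- files[grepl(pattern, basename(files))]
  }

  data_list <- lapply(files, utils::read.csv, ...)

  # Name each element after its file, dropping directory and extension
  names(data_list) <- tools::file_path_sans_ext(basename(files))
  data_list
}

# Demo in a dedicated temporary folder
demo_dir <- file.path(tempdir(), "import_sketch")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(head(cars), file.path(demo_dir, "test_cars.csv"), row.names = FALSE)
write.csv(head(iris), file.path(demo_dir, "file_iris.csv"), row.names = FALSE)

all_files  <- import_csv_sketch(demo_dir)
test_files <- import_csv_sketch(demo_dir, pattern = "test")
names(test_files)
```

Using a dedicated folder rather than `tempdir()` itself keeps unrelated temporary files out of the result.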
## For demo, temp. file paths are created with the file extension .csv
csv_file1 <- tempfile(pattern = "test", fileext = ".csv")
csv_file2 <- tempfile(pattern = "file", fileext = ".csv")
csv_file3 <- tempfile(pattern = "test", fileext = ".csv")

## Create CSV files to import
write.csv(head(cars), csv_file1)
write.csv(head(mtcars), csv_file2)
write.csv(head(iris), csv_file3)

## Import all CSV files in the directory
data_list <- import_data(path = tempdir(), file_format = "*.csv")

## Import all CSV files with names containing "test"
data_list <- import_data(path = tempdir(), file_format = "*.csv", pattern = "test")
This function imports an Excel file with multiple sheets and returns a named list of imported sheets.
import_excel(file_path)
file_path | A character string specifying the path to the Excel file. |
A named list where each element is the imported sheet from the Excel file, with names corresponding to the sheet names.
## For demo, a temp. file path is created with the file extension .xlsx
excel_file <- tempfile(fileext = ".xlsx")

## Create an Excel file with multiple sheets to import
writexl::write_xlsx(list(cars = head(cars), mtcars = head(mtcars)), excel_file)

import_excel(file_path = excel_file)
Le-Huynh's ggplot2 theme: white background, black axis, black text
lehuynh_theme(base_size = 11, base_family = "", ...)
base_size | Base font size |
base_family | Base font family |
... | Passed to |
An object as returned by ggplot2::theme()
ggplot2::theme(), ggplot2::theme_bw()
library(ggplot2)

fig <- ggplot(mtcars, aes(y = mpg, x = disp)) +
  geom_point(aes(colour = factor(cyl)))

fig

fig + lehuynh_theme()
Normalize / Standardize / Scale the data to the fixed range from 0 to 1. The minimum value of data gets transformed into 0. The maximum value gets transformed into 1. Other values get transformed into decimals between 0 and 1.
MinMaxScaling(x, y = x)
x | A numeric vector to be scaled. |
y | An optional numeric vector used to determine the scaling range. If not provided, the scaling range is determined by the values in |
Min-max scaling is a normalization technique that transforms the values in a vector to a standardized range. The scaling is performed using the formula:
A numeric vector of the same length as x, with values scaled to the range from 0 to 1.
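As a worked illustration, the standard min-max formula is scaled = (x - min(y)) / (max(y) - min(y)). The sketch below assumes that this is the formula the function applies; `min_max_sketch()` is a hypothetical name, not the package function.

```r
# Hypothetical re-implementation of standard min-max scaling.
# `y` supplies the reference range and defaults to `x` itself.
min_max_sketch <- function(x, y = x) {
  rng <- range(y, na.rm = TRUE)          # c(min(y), max(y))
  (x - rng[1]) / (rng[2] - rng[1])
}

dat1 <- seq(from = 5, to = 30, length.out = 6)   # 5, 10, 15, 20, 25, 30
scaled1 <- min_max_sketch(dat1)                  # 0, 0.2, 0.4, 0.6, 0.8, 1

dat2 <- c(7, 13, 22)
scaled2 <- min_max_sketch(x = dat2, y = dat1)    # 0.08, 0.32, 0.68
```

With this formula, values of x that fall outside the range of y map outside the interval from 0 to 1, so a separate reference vector is only safe when it covers x.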
dat1 <- seq(from = 5, to = 30, length.out = 6)
MinMaxScaling(dat1)

dat2 <- c(7, 13, 22)
MinMaxScaling(x = dat2, y = dat1)
This function sets up a new project within an active R project for reproducible purposes.
new_project()
The project includes:
README.md: the top level description of content in the project
Makefile
different folders to hold all data, code, results of data analysis, and documents related to the project
templates: manuscript.Rmd, code.R, etc.
A project containing folders and files for reproducible purposes.
The function should be executed within an active project.
Recommended workflow:
Create a GitHub repository for the new project. For "Initialize this repository with a README", choose NO.
Create a new RStudio Project via git clone.
Use the function new_project() to generate folders and file templates.
Reproducible Research Tutorial Series by Pat Schloss.
if (interactive()) {
  new_project()
}
Filter a dataset based on a specified column and group value, generate n-grams from a specified text column, then remove standard and user-defined stopwords from the n-grams.
ngrams_filter(
  data,
  group_column,
  group_name,
  text_column,
  ngrams,
  user_defined_stopwords = NULL
)
data | A data frame containing the dataset to be processed. |
group_column | A character string specifying the name of the column used to filter the data. |
group_name | A character string specifying the value within the group column to filter the data by. |
text_column | A character string specifying the name of the column containing text data to be tokenized into n-grams. |
ngrams | An integer specifying the number of words in the n-grams to be generated. |
user_defined_stopwords | A character vector of additional stopwords to be removed from the n-grams. Default is NULL. |
A data frame with the filtered data and generated n-grams, excluding the specified stopwords.
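To make the n-gram step concrete, here is a minimal base-R sketch of bigram generation followed by stopword removal. It is a simplification under stated assumptions: the text is treated as one running string, a bigram is dropped if either word is a stopword, and the output columns are named word1 and word2. `bigrams_sketch()` is a hypothetical helper and does not reproduce the package's tokenizer or its standard stopword list.

```r
# Hypothetical bigram sketch: tokenize, pair adjacent words, drop stopwords.
bigrams_sketch <- function(text, stopwords = character()) {
  # Collapse all lines into one string, lower-case it, and split on anything
  # that is not a letter or an apostrophe
  words <- unlist(strsplit(tolower(paste(text, collapse = " ")),
                           "[^[:alpha:]']+"))
  words <- words[nzchar(words)]

  if (length(words) < 2) {
    return(data.frame(word1 = character(), word2 = character()))
  }

  # Pair each word with its successor
  pairs <- data.frame(word1 = head(words, -1),
                      word2 = tail(words, -1),
                      stringsAsFactors = FALSE)

  # Remove bigrams in which either word is a stopword
  keep <- !(pairs$word1 %in% stopwords | pairs$word2 %in% stopwords)
  pairs[keep, , drop = FALSE]
}

demo_bigrams <- bigrams_sketch("The quick brown fox", stopwords = "the")
demo_bigrams
```

Here the four words give three adjacent pairs, and the pair starting with the stopword is dropped, leaving "quick brown" and "brown fox".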
library(janeaustenr)

austen_books() %>%
  ngrams_filter(group_column = "book",
                group_name = "Pride & Prejudice",
                text_column = "text",
                ngrams = 2)

austen_books() %>%
  ngrams_filter(group_column = "book",
                group_name = "Pride & Prejudice",
                text_column = "text",
                ngrams = 2,
                user_defined_stopwords = c("chapter", 1:50))
This function processes the input data to create an igraph object and then generates an interactive network plot based on the specified plot type. The plot can show the entire network, the largest component with a single color, or the largest component with different colors based on community detection. Node size and edge width are scaled based on node degree and edge weight, respectively.
plot_networkD3(
  data,
  col1 = "word1",
  col2 = "word2",
  plot_type = c("whole_network", "biggest_component_one_color",
                "biggest_component_community_color"),
  threshold,
  node_size = 20,
  edges_width = 10,
  opacity = 1,
  font_size = 15,
  ...
)
data | A data frame containing the edge list for the network. |
col1, col2 | The names of the two columns containing the symbolic edge list. Defaults are "word1" and "word2", respectively. |
plot_type | A character string specifying the type of plot to generate. Options are "whole_network", "biggest_component_one_color", and "biggest_component_community_color". Default is "whole_network". |
threshold | An integer specifying the minimum frequency of edges to be included in the network. |
node_size | A numeric value specifying the base size of the nodes. Default is 20. |
edges_width | A numeric value specifying the base width of the edges. Default is 10. |
opacity | A numeric value specifying the opacity of the graph elements. Default is 1. |
font_size | A numeric value specifying the font size of the node labels. Default is 15. |
... | Additional arguments passed to |
An interactive network plot created using networkD3.
library(janeaustenr)

data <- austen_books() %>%
  ngrams_filter(group_column = "book",
                group_name = "Pride & Prejudice",
                text_column = "text",
                ngrams = 2)

# The whole network plot
plot_networkD3(data = data, threshold = 10)

# The biggest component plot with one color
plot_networkD3(data = data,
               plot_type = "biggest_component_one_color",
               threshold = 10)

# The biggest component plot with community-based color
plot_networkD3(data = data,
               plot_type = "biggest_component_community_color",
               threshold = 10)
Plot fitted versus observed values, including confidence interval (gray area) around best fit line (linear regression line) and prediction interval (dashed line).
ppc_brms(
  object,
  xtitle = "Observed value",
  ytitle = "Fitted value",
  dy = c(0.1, 0.1),
  dx = c(0.1, 0.1),
  cor = FALSE,
  equation = FALSE,
  xcor = NULL,
  ycor = NULL,
  xequ = NULL,
  yequ = NULL,
  ...
)
object | An object of class brmsfit |
xtitle | The text for the x-axis title |
ytitle | The text for the y-axis title |
dy | Distance from plot to y-axis |
dx | Distance from plot to x-axis |
cor | If TRUE, add correlation coefficients with p-values and R |
equation | If TRUE, add regression line equation |
xcor, ycor | |
xequ, yequ | |
... | Passed to |
A ggplot object
## Not run:
library(brms)

mod <- brm(count ~ zAge + zBase * Trt + (1|patient) + (1|obs),
           data = epilepsy, family = poisson())

ppc_brms(mod)
ppc_brms(mod, dy = c(0.02, 0.1), dx = c(0.005, 0.1))
ppc_brms(mod, cor = TRUE, equation = TRUE, yequ = 100)
## End(Not run)
This function sets up a new folder for #tidytuesday challenge within an active project. It creates a directory for the specified year and week, along with sub-directories for data, code, and plots. Template files are also added.
tidytuesday(year, week)
year | An integer representing the year of the #tidytuesday challenge |
week | An integer representing the week of interest, from 1 to 52 |
The folder includes:
README.md: plots for #tidytuesday challenge
different folders to hold all data, code, plots of data analysis
templates
A folder containing folders and files for #tidytuesday challenge.
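The layout described above (a directory per year and week with sub-directories for data, code, and plots, plus a README.md) could be created along these lines. The directory naming scheme, the `root` argument, and the README text are assumptions made for illustration; `tidytuesday_sketch()` is a hypothetical helper, and it writes to a temporary directory rather than into an active project.

```r
# Hypothetical layout: <root>/<year>/week_<ww>/{data,code,plots} + README.md
tidytuesday_sketch <- function(year, week, root = tempdir()) {
  year <- as.integer(year)
  week <- as.integer(week)
  stopifnot(week >= 1, week <= 52)   # range stated in the argument description

  week_dir <- file.path(root, as.character(year), sprintf("week_%02d", week))

  # Create the three sub-directories (and any missing parents)
  for (sub in c("data", "code", "plots")) {
    dir.create(file.path(week_dir, sub), recursive = TRUE, showWarnings = FALSE)
  }

  # Add a README only if one does not exist yet
  readme <- file.path(week_dir, "README.md")
  if (!file.exists(readme)) {
    writeLines(sprintf("# TidyTuesday %d, week %d", year, week), readme)
  }

  invisible(week_dir)
}

week_dir <- tidytuesday_sketch(year = 2021, week = 25)
list.files(week_dir)
```

Checking for an existing README before writing keeps a repeated call from overwriting notes that were added by hand.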
Ensure that this function is called within an active project.
if (interactive()) {
  tidytuesday(year = 2021, week = 25)
}
Calculate the trophic state index (TSI). TSI ranges from 0 to 100.
tsi(x, type = c("chla", "TP", "TN", "SD"))
x | numeric object |
type | type of variable used to calculate TSI. See Details for more information. |
Trophic state classification (Carlson & Simpson, 1996):
< 30-40, oligotrophy
40-50, mesotrophy
50-70, eutrophy
70-100, hypereutrophy
Type of variable used to calculate TSI:
SD: Secchi depth, meter
chla: chlorophyll, ug/L or mg/m3
TP: total Phosphorus, ug/L or mg/m3
TN: total Nitrogen, mg/L
Carlson (1977): TSI-SD, TSI-Chla, TSI-TP
USEPA (2000): TSI-TN
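For orientation, the three indices attributed to Carlson (1977) can be written out as below. This is a sketch of the equations as published in that paper; whether tsi() uses exactly these forms or the rounded coefficients found in later guides is an assumption, and the TN index is left out. `tsi_carlson_sketch()` is a hypothetical name.

```r
# Carlson (1977) trophic state index equations (natural logarithms).
# SD in metres; chla and TP in ug/L (equivalently mg/m3).
tsi_carlson_sketch <- function(x, type = c("chla", "TP", "SD")) {
  type <- match.arg(type)
  switch(type,
    SD   = 10 * (6 - log(x) / log(2)),
    chla = 10 * (6 - (2.04 - 0.68 * log(x)) / log(2)),
    TP   = 10 * (6 - log(48 / x) / log(2))
  )
}

tsi_sd <- tsi_carlson_sketch(c(1, 2, 4), type = "SD")   # 60, 50, 40
tsi_tp <- tsi_carlson_sketch(c(24, 48), type = "TP")    # 50, 60
tsi_chl <- tsi_carlson_sketch(c(0.12, 0.34, 0.94, 6.4), type = "chla")
```

Each halving of Secchi depth, or doubling of total phosphorus, raises the index by 10 units, which is what the worked values show.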
a numeric value.
Carlson, R. E. (1977). A trophic state index for lakes. Limnology and Oceanography, 22(2), 361-369.
Carlson, R. E., & Simpson, J. (1996). A Coordinator's Guide to Volunteer Lake Monitoring Methods. North American Lake Management Society, 73-92.
USEPA. (2000). Nutrient Criteria Technical Guidance Manual: Lakes and Reservoirs, 42-44.
chla <- c(0.12, 0.34, 0.94, 6.4)

tsi(chla, type = "chla")
tsi(chla, type = "TP")