Shanshan Chen

Data Import and Cleaning

Before importing a data file (e.g. a plain-text .csv or a spreadsheet such as .xls), I recommend opening the file and checking the variable names. Make sure the variable names accurately and succinctly describe the variables. If they do not, rename the variables in the script. Avoid the following characters in variable names: space, apostrophe, &, *, etc. Prefer an underscore (“xx_xx”) to join words in a variable name.
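As a rough sketch of the check described above, the snippet below prints the variable names of a small made-up data frame and then replaces spaces and symbols with underscores using base R only. The data frame `DF_example` and the helper `clean_names_basic` are hypothetical names introduced for illustration; adapt the patterns to your own data.

```r
# Hypothetical example data with problematic variable names
DF_example = data.frame(
  `Patient ID` = 1:3,
  `Mother's Age` = c(30, 25, 41),
  `Height&Weight*` = c(1.2, 3.4, 5.6),
  check.names = FALSE  # keep the names exactly as typed
)

# Look at the variable names first
names(DF_example)

# Hypothetical helper (an assumption, not part of any package)
clean_names_basic = function(x) {
  x = gsub("'", "", x)                # drop apostrophes
  x = gsub("[^A-Za-z0-9_]+", "_", x)  # turn spaces and other symbols into "_"
  x = gsub("^_+|_+$", "", x)          # trim leading/trailing underscores
  x
}

names(DF_example) = clean_names_basic(names(DF_example))
names(DF_example)  # "Patient_ID" "Mothers_Age" "Height_Weight"
```

Doing this in the script, rather than editing the file by hand, keeps the raw data untouched and the renaming reproducible.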

Also read my previous post, R Style, for more detail on naming conventions.

Data import

Variable Renaming, Selection and Deletion

library(tidyverse)
DF = read.csv("data.csv")
## Rename any variables (new name on the left, old name on the right)
DF = DF %>% rename(Name_new = Name_old, Name_new2 = Name_old2)

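## Rename variables whose names follow a certain pattern (e.g. removing a suffix, prefix, symbols, etc.)
library(stringr)  # already attached by library(tidyverse); kept here for clarity
DF = DF %>% rename_with(~str_replace(., "^prefix", ""))  # remove a leading "prefix"
DF = DF %>% rename_with(~str_remove(., "\\.suffix$"))    # remove a trailing ".suffix"; the dot is escaped so it matches literally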
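To see what a pattern-based rename does, here is a minimal sketch on a made-up data frame. The names `DF_toy`, `raw_` and `.old` are assumptions chosen for illustration only.

```r
library(dplyr)
library(stringr)

# Hypothetical data: two names carry a "raw_" prefix, one a ".old" suffix
DF_toy = data.frame(
  raw_age    = c(30, 25),
  raw_height = c(160, 172),
  weight.old = c(55, 70)
)

DF_toy = DF_toy %>%
  rename_with(~ str_remove(.x, "^raw_")) %>%   # "^" anchors the match at the start of the name
  rename_with(~ str_remove(.x, "\\.old$"))     # "\\." is a literal dot, "$" anchors at the end

names(DF_toy)  # "age" "height" "weight"
```

Anchoring the pattern with `^` or `$` avoids accidentally removing the same text from the middle of another variable name.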

## Deleting variables with certain patterns in the names
DF = DF %>% select(-contains(c("prefix_","_suffix","_suffix_2","_suffix_3")))

## Selecting (keeping only) variables with certain patterns in the names
DF = DF %>% select(contains(c("prefix_","_suffix","_suffix_2","_suffix_3")))
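The two `select()` calls above can be tried on a small made-up data frame, sketched below. `DF_toy2` and its column names are hypothetical. The last line uses `starts_with()`, a stricter tidyselect helper than `contains()`, as one possible alternative when the pattern should only match at the beginning of a name.

```r
library(dplyr)

# Hypothetical data for illustration
DF_toy2 = data.frame(
  id             = 1:2,
  lab_glucose    = c(5.1, 6.3),
  lab_hdl        = c(1.2, 1.0),
  visit_date_old = c("2020-01-01", "2020-02-01")
)

# Drop every column whose name contains "lab_" or "_old"
DF_drop = DF_toy2 %>% select(-contains(c("lab_", "_old")))
names(DF_drop)  # "id"

# Keep only the columns whose names contain "lab_" or "_old"
DF_keep = DF_toy2 %>% select(contains(c("lab_", "_old")))
names(DF_keep)  # "lab_glucose" "lab_hdl" "visit_date_old"

# Stricter alternative: match only at the start of the name
DF_lab = DF_toy2 %>% select(starts_with("lab_"))
names(DF_lab)   # "lab_glucose" "lab_hdl"
```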