Work with strings with stringr :: CHEAT SHEET
The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.
Detect Matches
str_detect(string, pattern) Detect the presence of a pattern match in a string. str_detect(fruit, "a")
str_which(string, pattern) Find the indexes of strings that contain a pattern match. str_which(fruit, "a")
str_count(string, pattern) Count the number of matches in a string. str_count(fruit, "a")
str_locate(string, pattern) Locate the positions of pattern matches in a string. Also str_locate_all. str_locate(fruit, "a")

Subset Strings
str_sub(string, start = 1L, end = -1L) Extract substrings from a character vector. str_sub(fruit, 1, 3); str_sub(fruit, -2)
str_subset(string, pattern) Return only the strings that contain a pattern match. str_subset(fruit, "b")
str_extract(string, pattern) Return the first pattern match found in each string, as a vector. Also str_extract_all to return every pattern match. str_extract(fruit, "[aeiou]")
str_match(string, pattern) Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern. Also str_match_all. str_match(sentences, "(a|the) ([^ ]+)")

Manage Lengths
str_length(string) The width of strings (i.e. number of code points, which generally equals the number of characters). str_length(fruit)
str_pad(string, width, side = c("left", "right", "both"), pad = " ") Pad strings to constant width. str_pad(fruit, 17)
str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...") Truncate the width of strings, replacing content with ellipsis. str_trunc(fruit, 3)
str_trim(string, side = c("both", "left", "right")) Trim whitespace from the start and/or end of a string. str_trim(fruit)
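
The detect, subset, and length functions can be tried together on a small vector. The sketch below is illustrative only: the vector x is a hypothetical stand-in for the built-in fruit data used in the examples above, and it assumes the stringr package is installed.

```r
library(stringr)

# Hypothetical example vector (stands in for the built-in `fruit` data)
x <- c("apple", "banana", "pear", "kiwi")

# Detect matches
has_a   <- str_detect(x, "a")        # TRUE TRUE TRUE FALSE
which_a <- str_which(x, "a")         # 1 2 3
count_a <- str_count(x, "a")         # 1 3 1 0
loc_a   <- str_locate(x, "a")        # start/end matrix; NA row for "kiwi"

# Subset strings
first3  <- str_sub(x, 1, 3)          # "app" "ban" "pea" "kiw"
last2   <- str_sub(x, -2)            # "le" "na" "ar" "wi"
with_b  <- str_subset(x, "b")        # "banana"
vowel   <- str_extract(x, "[aeiou]") # "a" "a" "e" "i"

# Manage lengths
len     <- str_length(x)             # 5 6 4 4
padded  <- str_pad(x, 8)             # left-padded with spaces to width 8
short   <- str_trunc(x, 5)           # "apple" "ba..." "pear" "kiwi"
trimmed <- str_trim("  pear  ")      # "pear"
```

Note that the width passed to str_trunc() includes the ellipsis, which is why "banana" keeps only two of its own characters.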
Mutate Strings
str_sub() <- value Replace substrings by identifying the substrings with str_sub() and assigning into the results. str_sub(fruit, 1, 3) <- "str"
str_replace(string, pattern, replacement) Replace the first matched pattern in each string. str_replace(fruit, "a", "-")
str_replace_all(string, pattern, replacement) Replace all matched patterns in each string. str_replace_all(fruit, "a", "-")
str_to_lower(string, locale = "en") [1] Convert strings to lower case. str_to_lower(sentences)
str_to_upper(string, locale = "en") [1] Convert strings to upper case. str_to_upper(sentences)
str_to_title(string, locale = "en") [1] Convert strings to title case. str_to_title(sentences)

Join and Split
str_c(..., sep = "", collapse = NULL) Join multiple strings into a single string. str_c(letters, LETTERS)
str_c(..., sep = "", collapse = NULL) Collapse a vector of strings into a single string. str_c(letters, collapse = "")
str_dup(string, times) Repeat strings times times. str_dup(fruit, times = 2)
str_split_fixed(string, pattern, n) Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split to return a list of substrings. str_split_fixed(fruit, " ", n=2)
glue::glue(..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}") Create a string from strings and {expressions} to evaluate. glue::glue("Pi is {pi}")
glue::glue_data(.x, ..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}") Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate. glue::glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")

Order Strings
str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) [1] Return the vector of indexes that sorts a character vector. x[str_order(x)]
str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) [1] Sort a character vector. str_sort(x)

Helpers
str_conv(string, encoding) Override the encoding of a string. str_conv(fruit,"ISO-8859-1")
str_view(string, pattern, match = NA) View HTML rendering of first regex match in each string. str_view(fruit, "[aeiou]")
str_view_all(string, pattern, match = NA) View HTML rendering of all regex matches. str_view_all(fruit, "[aeiou]")
str_wrap(string, width = 80, indent = 0, exdent = 0) Wrap strings into nicely formatted paragraphs. str_wrap(sentences, 20)

[1] See bit.ly/ISO639-1 for a complete list of locales.
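
A short sketch of the mutate, join/split, and order functions follows. The vectors x and y are hypothetical examples rather than data from the cheat sheet, and the code assumes stringr is installed; the glue functions are left out to avoid a second package dependency.

```r
library(stringr)

# Hypothetical example vector
x <- c("apple", "banana", "pear")

# Mutate strings
first_a <- str_replace(x, "a", "-")      # "-pple" "b-nana" "pe-r"
all_a   <- str_replace_all(x, "a", "-")  # "-pple" "b-n-n-" "pe-r"
upper   <- str_to_upper(x)               # "APPLE" "BANANA" "PEAR"
title   <- str_to_title("a string")      # "A String"

# Assigning into str_sub() overwrites characters 1 to 3 of each element
y <- x
str_sub(y, 1, 3) <- "str"                # "strle" "strana" "strr"

# Join and split
joined    <- str_c("a", "b", sep = "-")            # "a-b"
collapsed <- str_c(x, collapse = ", ")             # "apple, banana, pear"
doubled   <- str_dup("ab", times = 2)              # "abab"
pieces    <- str_split_fixed("a b c", " ", n = 2)  # 1 x 2 matrix: "a", "b c"

# Order strings
z       <- c("pear", "apple", "banana")
sorted  <- str_sort(z)                   # "apple" "banana" "pear"
ordered <- str_order(z)                  # 2 3 1
```

str_order() returns positions rather than values, so z[str_order(z)] gives the same result as str_sort(z).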
RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor ! • stringr 1.2.0 • Updated: 2017-10
Regular Expressions - Regular expressions, or regexps, are a concise language for describing patterns in strings.

Need to Know
Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.
In R, you write regular expressions as strings, sequences of characters surrounded by double quotes ("") or single quotes ('').
Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.
Special Character   Represents
\\                  \
\"                  "
\n                  new line
Run ?"'" to see a complete list.
Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.
Use writeLines() to see how R views your string after all special characters have been parsed.
writeLines("\\.")
# \.
writeLines("\\ is a backslash")
# \ is a backslash

MATCH CHARACTERS   see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)
string        regexp          matches                                      example
(type this)   (to mean this)  (which matches this)
              a (etc.)        a (etc.)                                     see("a")
\\.           \.              .                                            see("\\.")
\\!           \!              !                                            see("\\!")
\\?           \?              ?                                            see("\\?")
\\\\          \\              \                                            see("\\\\")
\\(           \(              (                                            see("\\(")
\\)           \)              )                                            see("\\)")
\\{           \{              {                                            see("\\{")
\\}           \}              }                                            see("\\}")
\\n           \n              new line (return)                            see("\\n")
\\t           \t              tab                                          see("\\t")
\\s           \s              any whitespace (\S for non-whitespaces)      see("\\s")
\\d           \d              any digit (\D for non-digits)                see("\\d")
\\w           \w              any word character (\W for non-word chars)   see("\\w")
\\b           \b              word boundaries                              see("\\b")
              [:digit:] [1]   digits                                       see("[:digit:]")
              [:alpha:] [1]   letters                                      see("[:alpha:]")
              [:lower:] [1]   lowercase letters                            see("[:lower:]")
              [:upper:] [1]   uppercase letters                            see("[:upper:]")
              [:alnum:] [1]   letters and numbers                          see("[:alnum:]")
              [:punct:] [1]   punctuation                                  see("[:punct:]")
              [:graph:] [1]   letters, numbers, and punctuation            see("[:graph:]")
              [:space:] [1]   space characters (i.e. \s)                   see("[:space:]")
              [:blank:] [1]   space and tab (but not new line)             see("[:blank:]")
              .               every character except a new line            see(".")
[1] Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]

[Diagram: nesting of character classes. [:space:] contains new line and [:blank:] (space, tab). [:graph:] contains [:punct:] (. , : ; ? ! \ | / ` = * + - ^ _ ~ " ' [ ] { } ( ) < > @ # $) and [:alnum:], which contains [:digit:] (0-9) and [:alpha:], which in turn contains [:lower:] (a-z) and [:upper:] (A-Z).]

INTERPRETATION
Patterns in stringr are interpreted as regexes. To change this default, wrap the pattern in one of:
regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...) Modifies a regex to ignore cases, match end of lines as well as end of strings, allow R comments within regexes, and/or to have . match everything including \n. str_detect("I", regex("i", TRUE))
fixed() Matches raw bytes but will miss some characters that can be represented in multiple ways (fast). str_detect("\u0130", fixed("i"))
coll() Matches raw bytes and will use locale specific collation rules to recognize characters that can be represented in multiple ways (slow). str_detect("\u0130", coll("i", TRUE, locale = "tr"))
boundary() Matches boundaries between characters, line_breaks, sentences, or words. str_split(sentences, boundary("word"))

ALTERNATES   alt <- function(rx) str_view_all("abcde", rx)
regexp     matches        example
ab|d       or             alt("ab|d")
[abe]      one of         alt("[abe]")
[^abe]     anything but   alt("[^abe]")
[a-c]      range          alt("[a-c]")

QUANTIFIERS   quant <- function(rx) str_view_all(".a.aa.aaa", rx)
regexp     matches             example
a?         zero or one         quant("a?")
a*         zero or more        quant("a*")
a+         one or more         quant("a+")
a{n}       exactly n           quant("a{2}")
a{n, }     n or more           quant("a{2,}")
a{n, m}    between n and m     quant("a{2,4}")

ANCHORS   anchor <- function(rx) str_view_all("aaa", rx)
regexp     matches             example
^a         start of string     anchor("^a")
a$         end of string       anchor("a$")

GROUPS   ref <- function(rx) str_view_all("abbaab", rx)
Use parentheses to set precedence (order of evaluation) and create groups.
regexp     matches             example
(ab|d)e    sets precedence     alt("(ab|d)e")
Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance.
string        regexp          matches                example
(type this)   (to mean this)  (which matches this)   (the result is the same as ref("abba"))

LOOK AROUNDS   look <- function(rx) str_view_all("bacad", rx)
regexp     matches             example
a(?=c)     followed by         look("a(?=c)")
a(?!c)     not followed by     look("a(?!c)")
(?<=b)a    preceded by         look("(?<=b)a")
(?
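
The escaping rules and pattern constructs above can be checked without the HTML viewer by extracting or locating matches instead. This is only a sketch: it uses str_extract(), str_count(), and str_locate() in place of the str_view_all() helpers, the variable names are hypothetical, the test sentence passed to boundary("word") is made up, and the back-reference pattern is one possible way to match "abba".

```r
library(stringr)

# Escaping: the R string "\\." holds the two-character regexp \.
s <- "abc ABC 123\t.!?\\(){}\n"
dot       <- str_extract(s, "\\.")            # "."
first_any <- str_extract(s, ".")              # "a" (unescaped . matches any character)
digits    <- str_extract_all(s, "\\d")[[1]]   # "1" "2" "3"
n_space   <- str_count(s, "\\s")              # 4: two spaces, a tab, a new line
n_upper   <- str_count(s, "[[:upper:]]")      # 3

# Interpretation: regex() options, fixed() literals, boundary() splitting
ci      <- str_detect("I", regex("i", ignore_case = TRUE))  # TRUE
literal <- str_detect(c("a.b", "ab"), fixed("."))           # TRUE FALSE
words   <- str_split("This is a sentence.", boundary("word"))[[1]]

# Alternates, quantifiers, anchors
alt1   <- str_extract_all("abcde", "ab|d")[[1]]       # "ab" "d"
alt2   <- str_extract_all("abcde", "[^abe]")[[1]]     # "c" "d"
quant1 <- str_extract_all(".a.aa.aaa", "a{2,}")[[1]]  # "aa" "aaa"
anch   <- str_locate("aaa", "a$")                     # start 3, end 3

# Look arounds on "bacad": only the "a" itself is part of the match
la1 <- str_locate("bacad", "a(?=c)")    # position 2
la2 <- str_locate("bacad", "a(?!c)")    # position 4
la3 <- str_locate("bacad", "(?<=b)a")   # position 2

# Groups: \\1 and \\2 refer back to the first and second ( ) group
ref1 <- str_extract("abbaab", "(a)(b)\\2\\1")  # "abba"
```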