library(stringr)
x <- c("apple", "banana", "pear")
str_view(x, "an")
str_view(x, ".a.") # "." matches any character (except a newline)
# To match a literal ".", escape it with "\", the regex escape character.
# "\" is also the escape character in R strings, so to create the regular expression \. we need the string "\\."
dot <- "\\."

# But the expression itself only contains one:
writeLines(dot)
## \.
# This tells R to look for a literal "."
x <- c("abc", "a.c", "bef")
str_view(x, "a\\.c") # matches an actual dot instead of any character
x <- "a\\b" # a string containing one literal backslash
writeLines(x)
## a\b
str_view(x, "\\\\") # the regex \\ (the string "\\\\") matches a single backslash

14.3.1.1 Exercises

  1. Explain why each of these strings doesn’t match a \: "\", "\\", "\\\".

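A backslash is the escape character in both R strings and regular expressions, so a literal backslash has to be escaped twice.

"\" : The backslash escapes the next character of the R string, here the closing quote, so the string is never terminated.

"\\" : This resolves to a single \ in the regular expression, which then tries to escape the next character of the regex. There is no next character, so the pattern is invalid.

"\\\" : The first two backslashes resolve to a literal \ in the regular expression, and the third escapes the next character of the R string (again the closing quote). In the regex, this would escape an already escaped character.

To match one literal backslash you need the regex \\, written as the R string "\\\\".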
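A quick check of this reasoning, assuming the stringr package is installed: the four-backslash string matches a single backslash, while the two-backslash string leaves the regex with nothing to escape and signals an error.

```r
library(stringr)

x <- "a\\b"  # the string a\b, containing one literal backslash

# "\\\\" -> regex \\ -> matches one literal backslash
str_detect(x, "\\\\")

# "\\" -> regex \ -> nothing left to escape, so the pattern is invalid
bad_pattern <- tryCatch(str_detect(x, "\\"), error = function(e) e)
inherits(bad_pattern, "error")
```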

14.3.2 Anchors

14.3.2.1 Exercises

  1. How would you match the literal string “$^$”?
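One possible answer, sketched with stringr: escape both $ and ^ so they lose their anchor meaning, then wrap the pattern in real anchors so only the exact string matches. The test strings are made up for illustration.

```r
library(stringr)

# \$ and \^ are literal characters; the outer ^ and $ are anchors
pattern <- "^\\$\\^\\$$"

x <- c("$^$", "ab$^$sfas", "$^")
str_detect(x, pattern)  # TRUE FALSE FALSE
```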

14.3.3 Character classes and alternatives

14.3.3.1 Exercises

  1. Create regular expressions to find all words that:
  • Start with a vowel.
  • That only contain consonants. (Hint: thinking about matching “not”-vowels.)
  • End with ed, but not with eed.
  • End with ing or ise.
##   [1] "a"           "able"        "about"       "absolute"    "accept"     
##   [6] "account"     "achieve"     "across"      "act"         "active"     
##  [11] "actual"      "add"         "address"     "admit"       "advertise"  
##  [16] "affect"      "afford"      "after"       "afternoon"   "again"      
##  [21] "against"     "age"         "agent"       "ago"         "agree"      
##  [26] "air"         "all"         "allow"       "almost"      "along"      
##  [31] "already"     "alright"     "also"        "although"    "always"     
##  [36] "america"     "amount"      "and"         "another"     "answer"     
##  [41] "any"         "apart"       "apparent"    "appear"      "apply"      
##  [46] "appoint"     "approach"    "appropriate" "area"        "argue"      
##  [51] "arm"         "around"      "arrange"     "art"         "as"         
##  [56] "ask"         "associate"   "assume"      "at"          "attend"     
##  [61] "authority"   "available"   "aware"       "away"        "awful"      
##  [66] "each"        "early"       "east"        "easy"        "eat"        
##  [71] "economy"     "educate"     "effect"      "egg"         "eight"      
##  [76] "either"      "elect"       "electric"    "eleven"      "else"       
##  [81] "employ"      "encourage"   "end"         "engine"      "english"    
##  [86] "enjoy"       "enough"      "enter"       "environment" "equal"      
##  [91] "especial"    "europe"      "even"        "evening"     "ever"       
##  [96] "every"       "evidence"    "exact"       "example"     "except"     
## [101] "excuse"      "exercise"    "exist"       "expect"      "expense"    
## [106] "experience"  "explain"     "express"     "extra"       "eye"        
## [111] "idea"        "identify"    "if"          "imagine"     "important"  
## [116] "improve"     "in"          "include"     "income"      "increase"   
## [121] "indeed"      "individual"  "industry"    "inform"      "inside"     
## [126] "instead"     "insure"      "interest"    "into"        "introduce"  
## [131] "invest"      "involve"     "issue"       "it"          "item"       
## [136] "obvious"     "occasion"    "odd"         "of"          "off"        
## [141] "offer"       "office"      "often"       "okay"        "old"        
## [146] "on"          "once"        "one"         "only"        "open"       
## [151] "operate"     "opportunity" "oppose"      "or"          "order"      
## [156] "organize"    "original"    "other"       "otherwise"   "ought"      
## [161] "out"         "over"        "own"         "under"       "understand" 
## [166] "union"       "unit"        "unite"       "university"  "unless"     
## [171] "until"       "up"          "upon"        "use"         "usual"
## [1] "by"  "dry" "fly" "mrs" "try" "why"
## [1] "bed"     "hundred" "red"
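The code behind these listings is not shown. Assuming they were produced from stringr's words vector, patterns along these lines should reproduce them; the last pattern covers the ing/ise item, whose output is not shown above.

```r
library(stringr)

starts_vowel    <- str_subset(words, "^[aeiou]")      # first character is a vowel
only_consonants <- str_subset(words, "^[^aeiou]+$")   # no vowel anywhere
ends_ed         <- str_subset(words, "(^|[^e])ed$")   # "ed" not preceded by another "e"
ends_ing_ise    <- str_subset(words, "i(ng|se)$")     # "ing" or "ise" at the end

only_consonants  # "by" "dry" "fly" "mrs" "try" "why"
ends_ed          # "bed" "hundred" "red"
```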

14.3.4 Repetition

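You can also specify the number of matches precisely: {n} matches exactly n times, {n,} matches n or more times, and {n,m} matches between n and m times.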

By default these matches are “greedy”: they will match the longest string possible. You can make them “lazy”, matching the shortest string possible, by putting a ? after them. This is an advanced feature of regular expressions, but it’s useful to know that it exists.
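A short sketch of counted and lazy repetition, using 1888 written in Roman numerals as an assumed example string:

```r
library(stringr)

x <- "MDCCCLXXXVIII"  # 1888 in Roman numerals

str_extract(x, "C{2}")     # exactly two: "CC"
str_extract(x, "C{2,}")    # two or more: "CCC"
str_extract(x, "C{2,3}")   # greedy, two to three: "CCC"
str_extract(x, "C{2,3}?")  # lazy, as few as possible: "CC"
str_extract(x, "C[LX]+")   # greedy: "CLXXX"
str_extract(x, "C[LX]+?")  # lazy: "CL"
```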

14.3.4.1 Exercises

  1. Describe the equivalents of ?, +, * in {m,n} form.

? = {0,1}

+ = {1,}

* = {0,}
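These equivalences can be checked directly. A sketch comparing each shorthand with its counted form on a few made-up strings:

```r
library(stringr)

x <- c("color", "colour", "colouur", "colr")

# each shorthand and its {m,n} form should give identical results
identical(str_detect(x, "colou?r"), str_detect(x, "colou{0,1}r"))
identical(str_detect(x, "colou+r"), str_detect(x, "colou{1,}r"))
identical(str_detect(x, "colou*r"), str_detect(x, "colou{0,}r"))
```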

14.3.5 Grouping and backreferences

14.3.5.1 Exercises

  1. Describe, in words, what these expressions will match:

(.)\1\1 = The same character appearing three times in a row. E.g. “aaa”.

(.)(.)\2\1 = A pair of characters followed by the same pair of characters in reversed order. E.g. “abba”.

(..)\1 = Any two characters repeated. E.g. “a1a1”.

(.).\1.\1 = A character followed by any character, the original character, any other character, and the original character again. E.g. “abaca”, “b8b.b”.

(.)(.)(.).*\3\2\1 = Three characters followed by zero or more characters of any kind, followed by the same three characters in reverse order. E.g. “abcsgasgddsadgsdgcba”, “abccba” or “abc1cba”.

These are written as regular expressions; inside an R string every backslash must be doubled, e.g. "(.)\\1\\1".
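Each description can be checked against its example string. In this sketch the backslashes are doubled because the patterns are written as R strings:

```r
library(stringr)

patterns <- c("(.)\\1\\1", "(.)(.)\\2\\1", "(..)\\1",
              "(.).\\1.\\1", "(.)(.)(.).*\\3\\2\\1")
examples <- c("aaa", "abba", "a1a1", "abaca", "abc1cba")

# str_detect() is vectorised over both arguments, so each example
# is tested against the pattern in the same position
str_detect(examples, patterns)  # TRUE TRUE TRUE TRUE TRUE

# a near miss: "abab" is not a pair followed by its reverse
str_detect("abab", "(.)(.)\\2\\1")  # FALSE
```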

14.4 Tools

14.4.1 Detect matches

## [1]  TRUE FALSE  TRUE
## [1] 0.2765306
## [1] TRUE
## [1] "box" "sex" "six" "tax"
## [1] "box" "sex" "six" "tax"
## # A tibble: 4 x 2
##   word      i
##   <chr> <int>
## 1 box     108
## 2 sex     747
## 3 six     772
## 4 tax     841
## [1] 1 3 1
## [1] 1.991837
## # A tibble: 980 x 4
##    word         i vowels consonants
##    <chr>    <int>  <int>      <int>
##  1 a            1      1          0
##  2 able         2      2          2
##  3 about        3      3          2
##  4 absolute     4      4          4
##  5 accept       5      2          4
##  6 account      6      3          4
##  7 achieve      7      4          3
##  8 across       8      2          4
##  9 act          9      1          2
## 10 active      10      3          3
## # ... with 970 more rows
## [1] 2
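The code behind these outputs is not shown. A sketch of the main calls that should reproduce several of them, assuming stringr and its words vector:

```r
library(stringr)

x <- c("apple", "banana", "pear")

str_detect(x, "e")       # TRUE FALSE TRUE
str_count(x, "a")        # 1 3 1
str_subset(words, "x$")  # "box" "sex" "six" "tax"

# proportion of words ending in a vowel (TRUE counts as 1)
mean(str_detect(words, "[aeiou]$"))

# matches never overlap, so "aba" is found only twice in "abababa"
str_count("abababa", "aba")  # 2
```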

14.4.1.1 Exercises

  1. For each of the following challenges, try solving it by using both a single regular expression, and a combination of multiple str_detect() calls.

Find all words that start or end with x.

## [1] "box" "sex" "six" "tax"
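Both approaches asked for in the exercise, sketched side by side: the single regex uses alternation, the other combines two str_detect() calls with |.

```r
library(stringr)

# single regular expression
one_regex <- str_subset(words, "^x|x$")

# two str_detect() calls combined with a logical OR
start_x <- str_detect(words, "^x")
end_x <- str_detect(words, "x$")
two_calls <- words[start_x | end_x]

identical(one_regex, two_calls)  # TRUE
```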

Find all words that start with a vowel and end with a consonant.

##   [1] "a"           "able"        "about"       "absolute"    "accept"     
##   [6] "account"     "achieve"     "across"      "act"         "active"     
##  [11] "actual"      "add"         "address"     "admit"       "advertise"  
##  [16] "affect"      "afford"      "after"       "afternoon"   "again"      
##  [21] "against"     "age"         "agent"       "ago"         "agree"      
##  [26] "air"         "all"         "allow"       "almost"      "along"      
##  [31] "already"     "alright"     "also"        "although"    "always"     
##  [36] "america"     "amount"      "and"         "another"     "answer"     
##  [41] "any"         "apart"       "apparent"    "appear"      "apply"      
##  [46] "appoint"     "approach"    "appropriate" "area"        "argue"      
##  [51] "arm"         "around"      "arrange"     "art"         "as"         
##  [56] "ask"         "associate"   "assume"      "at"          "attend"     
##  [61] "authority"   "available"   "aware"       "away"        "awful"      
##  [66] "baby"        "back"        "bad"         "bag"         "ball"       
##  [71] "bank"        "bar"         "basis"       "bear"        "beat"       
##  [76] "beauty"      "bed"         "begin"       "behind"      "benefit"    
##  [81] "best"        "bet"         "between"     "big"         "bill"       
##  [86] "birth"       "bit"         "black"       "blood"       "blow"       
##  [91] "board"       "boat"        "body"        "book"        "both"       
##  [96] "bother"      "bottom"      "box"         "boy"         "break"      
## [101] "brief"       "brilliant"   "bring"       "britain"     "brother"    
## [106] "budget"      "build"       "bus"         "business"    "busy"       
## [111] "but"         "buy"         "by"          "call"        "can"        
## [116] "car"         "card"        "carry"       "cat"         "catch"      
## [121] "cent"        "certain"     "chair"       "chairman"    "chap"       
## [126] "character"   "cheap"       "check"       "child"       "Christ"     
## [131] "Christmas"   "church"      "city"        "claim"       "class"      
## [136] "clean"       "clear"       "client"      "clock"       "closes"     
## [141] "club"        "cold"        "collect"     "colour"      "comment"    
## [146] "commit"      "common"      "community"   "company"     "concern"    
## [151] "condition"   "confer"      "consider"    "consult"     "contact"    
## [156] "contract"    "control"     "cook"        "copy"        "corner"     
## [161] "correct"     "cost"        "could"       "council"     "count"      
## [166] "country"     "county"      "court"       "cover"       "cross"      
## [171] "cup"         "current"     "cut"         "dad"         "danger"     
## [176] "day"         "dead"        "deal"        "dear"        "decision"   
## [181] "deep"        "department"  "depend"      "design"      "detail"     
## [186] "develop"     "difficult"   "dinner"      "direct"      "discuss"    
## [191] "district"    "doctor"      "document"    "dog"         "door"       
## [196] "doubt"       "down"        "draw"        "dress"       "drink"      
## [201] "drop"        "dry"         "during"      "each"        "early"      
## [206] "east"        "easy"        "eat"         "economy"     "educate"    
## [211] "effect"      "egg"         "eight"       "either"      "elect"      
## [216] "electric"    "eleven"      "else"        "employ"      "encourage"  
## [221] "end"         "engine"      "english"     "enjoy"       "enough"     
## [226] "enter"       "environment" "equal"       "especial"    "europe"     
## [231] "even"        "evening"     "ever"        "every"       "evidence"   
## [236] "exact"       "example"     "except"      "excuse"      "exercise"   
## [241] "exist"       "expect"      "expense"     "experience"  "explain"    
## [246] "express"     "extra"       "eye"         "fact"        "fair"       
## [251] "fall"        "family"      "far"         "farm"        "fast"       
## [256] "father"      "favour"      "feed"        "feel"        "few"        
## [261] "field"       "fight"       "fill"        "film"        "final"      
## [266] "find"        "finish"      "first"       "fish"        "fit"        
## [271] "flat"        "floor"       "fly"         "follow"      "food"       
## [276] "foot"        "for"         "forget"      "form"        "forward"    
## [281] "four"        "friday"      "friend"      "from"        "front"      
## [286] "full"        "fun"         "function"    "fund"        "further"    
## [291] "garden"      "gas"         "general"     "germany"     "get"        
## [296] "girl"        "glass"       "god"         "good"        "govern"     
## [301] "grand"       "grant"       "great"       "green"       "ground"     
## [306] "group"       "grow"        "guess"       "guy"         "hair"       
## [311] "half"        "hall"        "hand"        "hang"        "happen"     
## [316] "happy"       "hard"        "head"        "health"      "hear"       
## [321] "heart"       "heat"        "heavy"       "hell"        "help"       
## [326] "high"        "history"     "hit"         "hold"        "holiday"    
## [331] "honest"      "hospital"    "hot"         "hour"        "how"        
## [336] "however"     "hundred"     "husband"     "idea"        "identify"   
## [341] "if"          "imagine"     "important"   "improve"     "in"         
## [346] "include"     "income"      "increase"    "indeed"      "individual" 
## [351] "industry"    "inform"      "inside"      "instead"     "insure"     
## [356] "interest"    "into"        "introduce"   "invest"      "involve"    
## [361] "issue"       "it"          "item"        "jesus"       "job"        
## [366] "join"        "jump"        "just"        "keep"        "key"        
## [371] "kid"         "kill"        "kind"        "king"        "kitchen"    
## [376] "knock"       "know"        "labour"      "lad"         "lady"       
## [381] "land"        "last"        "laugh"       "law"         "lay"        
## [386] "lead"        "learn"       "left"        "leg"         "less"       
## [391] "let"         "letter"      "level"       "light"       "likely"     
## [396] "limit"       "link"        "list"        "listen"      "load"       
## [401] "local"       "lock"        "london"      "long"        "look"       
## [406] "lord"        "lot"         "low"         "luck"        "lunch"      
## [411] "main"        "major"       "man"         "many"        "mark"       
## [416] "market"      "marry"       "match"       "matter"      "may"        
## [421] "mean"        "meaning"     "meet"        "member"      "mention"    
## [426] "might"       "milk"        "million"     "mind"        "minister"   
## [431] "minus"       "miss"        "mister"      "moment"      "monday"     
## [436] "money"       "month"       "morning"     "most"        "mother"     
## [441] "motion"      "mrs"         "much"        "music"       "must"       
## [446] "nation"      "near"        "necessary"   "need"        "never"      
## [451] "new"         "news"        "next"        "night"       "non"        
## [456] "normal"      "north"       "not"         "now"         "number"     
## [461] "obvious"     "occasion"    "odd"         "of"          "off"        
## [466] "offer"       "office"      "often"       "okay"        "old"        
## [471] "on"          "once"        "one"         "only"        "open"       
## [476] "operate"     "opportunity" "oppose"      "or"          "order"      
## [481] "organize"    "original"    "other"       "otherwise"   "ought"      
## [486] "out"         "over"        "own"         "pack"        "paint"      
## [491] "pair"        "paper"       "paragraph"   "pardon"      "parent"     
## [496] "park"        "part"        "particular"  "party"       "pass"       
## [501] "past"        "pay"         "pension"     "per"         "percent"    
## [506] "perfect"     "perhaps"     "period"      "person"      "photograph" 
## [511] "pick"        "plan"        "play"        "plus"        "point"      
## [516] "policy"      "politic"     "poor"        "position"    "post"       
## [521] "pound"       "power"       "present"     "press"       "pretty"     
## [526] "previous"    "print"       "problem"     "proceed"     "process"    
## [531] "product"     "project"     "proper"      "protect"     "public"     
## [536] "pull"        "push"        "put"         "quality"     "quarter"    
## [541] "question"    "quick"       "quid"        "quiet"       "rail"       
## [546] "rather"      "read"        "ready"       "real"        "really"     
## [551] "reason"      "recent"      "reckon"      "recommend"   "record"     
## [556] "red"         "refer"       "regard"      "region"      "relation"   
## [561] "remember"    "report"      "represent"   "research"    "respect"    
## [566] "rest"        "result"      "return"      "rid"         "right"      
## [571] "ring"        "road"        "roll"        "room"        "round"      
## [576] "run"         "saturday"    "say"         "school"      "scotland"   
## [581] "seat"        "second"      "secretary"   "section"     "seem"       
## [586] "self"        "sell"        "send"        "serious"     "set"        
## [591] "seven"       "sex"         "shall"       "sheet"       "shoot"      
## [596] "shop"        "short"       "should"      "show"        "shut"       
## [601] "sick"        "sign"        "similar"     "sing"        "sir"        
## [606] "sister"      "sit"         "six"         "sleep"       "slight"     
## [611] "slow"        "small"       "social"      "society"     "son"        
## [616] "soon"        "sorry"       "sort"        "sound"       "south"      
## [621] "speak"       "special"     "specific"    "speed"       "spell"      
## [626] "spend"       "staff"       "stairs"      "stand"       "standard"   
## [631] "start"       "station"     "stay"        "step"        "stick"      
## [636] "still"       "stop"        "story"       "straight"    "strategy"   
## [641] "street"      "strong"      "student"     "study"       "stuff"      
## [646] "stupid"      "subject"     "succeed"     "such"        "sudden"     
## [651] "suggest"     "suit"        "summer"      "sun"         "sunday"     
## [656] "supply"      "support"     "switch"      "system"      "talk"       
## [661] "tax"         "teach"       "team"        "television"  "tell"       
## [666] "ten"         "tend"        "term"        "test"        "than"       
## [671] "thank"       "then"        "they"        "thing"       "think"      
## [676] "thirteen"    "thirty"      "this"        "though"      "thousand"   
## [681] "through"     "throw"       "thursday"    "today"       "together"   
## [686] "tomorrow"    "tonight"     "top"         "total"       "touch"      
## [691] "toward"      "town"        "traffic"     "train"       "transport"  
## [696] "travel"      "treat"       "trust"       "try"         "tuesday"    
## [701] "turn"        "twenty"      "under"       "understand"  "union"      
## [706] "unit"        "unite"       "university"  "unless"      "until"      
## [711] "up"          "upon"        "use"         "usual"       "various"    
## [716] "very"        "view"        "visit"       "wait"        "walk"       
## [721] "wall"        "want"        "war"         "warm"        "wash"       
## [726] "watch"       "water"       "way"         "wear"        "wednesday"  
## [731] "week"        "weigh"       "well"        "west"        "what"       
## [736] "when"        "whether"     "which"       "why"         "will"       
## [741] "win"         "wind"        "window"      "wish"        "with"       
## [746] "within"      "without"     "woman"       "wonder"      "wood"       
## [751] "word"        "work"        "world"       "worry"       "worth"      
## [756] "would"       "wrong"       "year"        "yes"         "yesterday"  
## [761] "yet"         "young"
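The listing above includes words such as "a", "able" and "baby", which do not both start with a vowel and end with a consonant; it appears the two conditions were combined with OR rather than AND. A sketch that requires both conditions, with one regex and with two str_detect() calls (here "y" counts as a consonant):

```r
library(stringr)

# single regex: a vowel first, a non-vowel last
one_regex <- str_subset(words, "^[aeiou].*[^aeiou]$")

# two conditions combined with AND (&), not OR (|)
two_calls <- words[str_detect(words, "^[aeiou]") & str_detect(words, "[^aeiou]$")]

identical(one_regex, two_calls)  # TRUE
```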

Are there any words that contain at least one of each different vowel?

## character(0)
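A sketch of the multiple-str_detect() approach; the helper name has_all_vowels is hypothetical. A single regex without lookahead would have to cover every ordering of the five vowels, so combining five detections with & is much simpler:

```r
library(stringr)

# hypothetical helper: TRUE where a string contains every vowel at least once
has_all_vowels <- function(x) {
  str_detect(x, "a") & str_detect(x, "e") & str_detect(x, "i") &
    str_detect(x, "o") & str_detect(x, "u")
}

words[has_all_vowels(words)]
has_all_vowels(c("sequoia", "facetious", "apple"))  # TRUE TRUE FALSE
```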

14.4.2 Extract matches

## [1] "red|orange|yellow|green|blue|purple"
## [1] "blue" "blue" "red"  "red"  "red"  "blue"
## [1] "blue"   "green"  "orange"
## [[1]]
## [1] "blue" "red" 
## 
## [[2]]
## [1] "green" "red"  
## 
## [[3]]
## [1] "orange" "red"
##      [,1]     [,2] 
## [1,] "blue"   "red"
## [2,] "green"  "red"
## [3,] "orange" "red"
##      [,1] [,2] [,3]
## [1,] "a"  ""   ""  
## [2,] "a"  "b"  ""  
## [3,] "a"  "b"  "c"
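A sketch of calls that likely produced the first and last of these outputs, assuming stringr:

```r
library(stringr)

colours <- c("red", "orange", "yellow", "green", "blue", "purple")
colour_match <- str_c(colours, collapse = "|")  # one alternation pattern
colour_match  # "red|orange|yellow|green|blue|purple"

# str_extract() returns only the first match in each string;
# str_extract_all() returns all of them, and simplify = TRUE
# pads the shorter results with "" to build a matrix
x <- c("a", "a b", "a b c")
m <- str_extract_all(x, "[a-z]", simplify = TRUE)
m
```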

14.4.2.1 Exercises

  1. In the previous example, you might have noticed that the regular expression matched “flickered”, which is not a colour. Modify the regex to fix the problem.
## [1] "\\b(red|orange|yellow|green|blue|purple)\\b"
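A sketch of how that pattern can be built and checked; \b marks a word boundary, so "red" inside "flickered" no longer counts. The test strings are made up.

```r
library(stringr)

colours <- c("red", "orange", "yellow", "green", "blue", "purple")
colour_match2 <- str_c("\\b(", str_c(colours, collapse = "|"), ")\\b")

str_detect(c("flickered", "a red light"), colour_match2)  # FALSE TRUE
```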

14.4.3 Grouped matches

##  [1] "the smooth" "the sheet"  "the depth"  "a chicken"  "the parked"
##  [6] "the sun"    "the huge"   "the ball"   "the woman"  "a helps"
##       [,1]         [,2]  [,3]     
##  [1,] "the smooth" "the" "smooth" 
##  [2,] "the sheet"  "the" "sheet"  
##  [3,] "the depth"  "the" "depth"  
##  [4,] "a chicken"  "a"   "chicken"
##  [5,] "the parked" "the" "parked" 
##  [6,] "the sun"    "the" "sun"    
##  [7,] "the huge"   "the" "huge"   
##  [8,] "the ball"   "the" "ball"   
##  [9,] "the woman"  "the" "woman"  
## [10,] "a helps"    "a"   "helps"
## # A tibble: 720 x 3
##    sentence                                    article noun   
##    <chr>                                       <chr>   <chr>  
##  1 The birch canoe slid on the smooth planks.  the     smooth 
##  2 Glue the sheet to the dark blue background. the     sheet  
##  3 It's easy to tell the depth of a well.      the     depth  
##  4 These days a chicken leg is a rare dish.    a       chicken
##  5 Rice is often served in round bowls.        <NA>    <NA>   
##  6 The juice of lemons makes fine punch.       <NA>    <NA>   
##  7 The box was thrown beside the parked truck. the     parked 
##  8 The hogs were fed chopped corn and garbage. <NA>    <NA>   
##  9 Four hours of steady work faced us.         <NA>    <NA>   
## 10 Large size in stockings is hard to sell.    <NA>    <NA>   
## # ... with 710 more rows
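A sketch of the underlying calls on the first sentence. The pattern is an assumption inferred from the matches shown: an article followed by the next run of non-space characters. Matching is case-sensitive, which is why the capitalised "The" at the start is skipped.

```r
library(stringr)

noun <- "(a|the) ([^ ]+)"  # group 1: the article, group 2: the following word

s <- "The birch canoe slid on the smooth planks."
str_extract(s, noun)  # the complete match only: "the smooth"
str_match(s, noun)    # complete match plus one column per group
```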

14.4.3.1 Exercises

  1. Find all words that come after a “number” like “one”, “two”, “three” etc. Pull out both the number and the word.
##  [1] "seven books"   "two met"       "two factors"   "three lists"  
##  [5] "seven is"      "two when"      "ten inches"    "one war"      
##  [9] "one button"    "six minutes"   "ten years"     "two shares"   
## [13] "two distinct"  "five cents"    "two pins"      "five robins"  
## [17] "four kinds"    "three story"   "three inches"  "six comes"    
## [21] "three batches" "two leaves"
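One way to get these pairs, sketched with a hypothetical pattern on made-up sentences: \b keeps "one" from matching inside words like "Someone", and the two groups separate the number from the word after it.

```r
library(stringr)

numbers <- c("one", "two", "three", "four", "five",
             "six", "seven", "eight", "nine", "ten")
number_word <- str_c("\\b(", str_c(numbers, collapse = "|"), ") +(\\w+)")

s <- c("We waited two days.", "Someone else came.", "It had seven books.")
str_extract(s, number_word)       # "two days" NA "seven books"
str_match(s, number_word)[, 2:3]  # number and word in separate columns
```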

14.4.4 Replacing matches

## [1] "-pple"  "p-ar"   "b-nana"
## [1] "-ppl-"  "p--r"   "b-n-n-"
## [1] "one house"    "two cars"     "three people"

Instead of replacing with a fixed string, you can use backreferences to insert components of the match. Here the order of the second and third words of each sentence is flipped:

## [1] "The canoe birch slid on the smooth planks." 
## [2] "Glue sheet the to the dark blue background."
## [3] "It's to easy tell the depth of a well."     
## [4] "These a days chicken leg is a rare dish."   
## [5] "Rice often is served in round bowls."
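A sketch of the replacement calls, assuming the x behind the first two outputs was c("apple", "pear", "banana"); in the replacement, \\1, \\2 and \\3 refer back to the three groups:

```r
library(stringr)

x <- c("apple", "pear", "banana")
str_replace(x, "[aeiou]", "-")      # first vowel only
str_replace_all(x, "[aeiou]", "-")  # every vowel

# swap the second and third words with backreferences
s <- "The birch canoe slid on the smooth planks."
str_replace(s, "([^ ]+) ([^ ]+) ([^ ]+)", "\\1 \\3 \\2")
```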

14.4.4.1 Exercises

  1. Replace all forward slashes in a string with backslashes.
## [1] "\\\\\\\\]\\]\\]\\]\\"
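A sketch on a simple made-up string. In the replacement, a backslash must itself be escaped, so "\\\\" inserts one literal backslash; writeLines() shows the string without R's print escaping.

```r
library(stringr)

x <- "a/b/c"
y <- str_replace_all(x, "/", "\\\\")
y              # printed with escaping: "a\\b\\c"
writeLines(y)  # a\b\c
```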

14.4.5 Splitting

## [[1]]
## [1] "The"     "birch"   "canoe"   "slid"    "on"      "the"     "smooth" 
## [8] "planks."
## 
## [[2]]
## [1] "Glue"        "the"         "sheet"       "to"          "the"        
## [6] "dark"        "blue"        "background."
## 
## [[3]]
## [1] "It's"  "easy"  "to"    "tell"  "the"   "depth" "of"    "a"     "well."
## 
## [[4]]
## [1] "These"   "days"    "a"       "chicken" "leg"     "is"      "a"      
## [8] "rare"    "dish."  
## 
## [[5]]
## [1] "Rice"   "is"     "often"  "served" "in"     "round"  "bowls."
## [1] "a" "b" "c" "d"
##      [,1]    [,2]    [,3]    [,4]      [,5]  [,6]    [,7]     [,8]         
## [1,] "The"   "birch" "canoe" "slid"    "on"  "the"   "smooth" "planks."    
## [2,] "Glue"  "the"   "sheet" "to"      "the" "dark"  "blue"   "background."
## [3,] "It's"  "easy"  "to"    "tell"    "the" "depth" "of"     "a"          
## [4,] "These" "days"  "a"     "chicken" "leg" "is"    "a"      "rare"       
## [5,] "Rice"  "is"    "often" "served"  "in"  "round" "bowls." ""           
##      [,9]   
## [1,] ""     
## [2,] ""     
## [3,] "well."
## [4,] "dish."
## [5,] ""
##      [,1]      [,2]    
## [1,] "Name"    "Hadley"
## [2,] "Country" "NZ"    
## [3,] "Age"     "35"
## [1] "This"      "is"        "a"         "sentence." ""          "This"     
## [7] "is"        "another"   "sentence."
## [1] "This"     "is"       "a"        "sentence" "This"     "is"       "another" 
## [8] "sentence"
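A sketch of the splitting calls, assuming stringr; n = 2 stops each split after the first separator and simplify = TRUE returns a matrix:

```r
library(stringr)

# "|" is a regex metacharacter, so it has to be escaped
str_split("a|b|c|d", "\\|")[[1]]

fields <- c("Name: Hadley", "Country: NZ", "Age: 35")
str_split(fields, ": ", n = 2, simplify = TRUE)

# splitting on " " keeps punctuation and yields "" at a double space;
# boundary("word") splits on word boundaries instead
x <- "This is a sentence.  This is another sentence."
str_split(x, " ")[[1]]
str_split(x, boundary("word"))[[1]]
```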

14.4.5.1 Exercises

  1. Split up a string like “apples, pears, and bananas” into individual components.
## [1] "apples"  "pears"   "and"     "bananas"
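Two sketches: boundary("word") reproduces the output above, including "and", while a regex separator that also swallows the conjunction drops it.

```r
library(stringr)

x <- "apples, pears, and bananas"

str_split(x, boundary("word"))[[1]]  # "apples" "pears" "and" "bananas"
str_split(x, ", +(and +)?")[[1]]     # "apples" "pears" "bananas"
```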