5.8 Comments in Regular Expressions
The sequence (?# marks the start of a comment that continues up to the next closing parenthesis. Nested parentheses are not permitted. The characters that make up a comment play no part at all in the pattern matching.
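A minimal sketch in R, assuming the PCRE engine selected with perl = TRUE (no assumption is made here about whether the default TRE engine accepts (?#...)). The date pattern and test strings are made up for illustration. Removing the (?#...) groups leaves a pattern that matches exactly the same strings.

```r
# Made-up inputs: one year-month-day date and one day/month/year date
x <- c("2023-01-15", "15/01/2023")

# The (?#...) groups are comments: PCRE skips them entirely
commented <- "\\d{4}(?#year)-\\d{2}(?#month)-\\d{2}(?#day)"
plain     <- "\\d{4}-\\d{2}-\\d{2}"

print(grepl(commented, x, perl = TRUE))
print(grepl(plain, x, perl = TRUE))
```

Each comment is placed after a complete quantified item (\\d{4}, \\d{2}) rather than between an item and its quantifier, which keeps the pattern easy to read.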
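If the extended option is set, an unescaped # character outside a character class introduces a comment that continues up to the next newline character in the pattern, and unescaped whitespace outside a character class is ignored. In R, this option can be switched on for a perl = TRUE pattern by starting the pattern with (?x).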
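A sketch of extended mode in R, again assuming perl = TRUE. The inline option (?x) at the start of the (made-up) pattern lets it span several lines with a comment on each part. The replacement then reorders the three captured groups.

```r
# Extended mode, switched on by the inline option (?x) at the start.
# Unescaped whitespace (including newlines) is ignored, and # starts a
# comment that runs to the end of the line.
date_pat <- "(?x)
  (\\d{4})   # year
  -          # literal hyphen
  (\\d{2})   # month
  -
  (\\d{2})   # day
"

# Reorder the captured groups into day/month/year
reordered <- sub(date_pat, "\\3/\\2/\\1", "2023-01-15", perl = TRUE)
print(reordered)
```

Because whitespace and # become syntax once (?x) is on, a literal space or # has to be escaped (\\ or \\#) or written inside a character class ([ ] or [#]).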
Batch-converting camel-case file names
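old_name <- list.files(".", pattern = "^[A-Z].*\\.Rmd$")  # .Rmd files whose names start with an upper-case letter
new_name <- sub("\\.rmd$", ".Rmd", tolower(old_name))     # lower-case the name, then restore the .Rmd extension
file.rename(from = old_name, to = new_name)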
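Because file.rename() changes files on disk, it can help to try the same steps in a scratch directory first. The sketch below wraps them in a hypothetical helper, lowercase_rmd_names(), and runs it on made-up file names in a directory created with tempfile(). It builds full paths with file.path(), so it does not depend on the working directory. It assumes the file system allows renames that only change letter case.

```r
# Hypothetical helper: lower-case every .Rmd file in `dir` whose name starts
# with an upper-case letter, keeping the ".Rmd" extension as is.
lowercase_rmd_names <- function(dir) {
  old_name <- list.files(dir, pattern = "^[A-Z].*\\.Rmd$")
  new_name <- sub("\\.rmd$", ".Rmd", tolower(old_name))
  ok <- file.rename(file.path(dir, old_name), file.path(dir, new_name))
  # Named vector of successful renames: names are old names, values new names
  setNames(new_name, old_name)[ok]
}

# Scratch directory with made-up file names
tmp <- tempfile("rename-demo")
dir.create(tmp)
invisible(file.create(file.path(tmp, c("Intro.Rmd", "DataCleaning.Rmd", "notes.Rmd"))))

renamed <- lowercase_rmd_names(tmp)
print(renamed)
print(list.files(tmp))
```

"notes.Rmd" is left alone because the pattern only selects names that begin with an upper-case letter.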
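html_lines <- readLines("https://movie.douban.com/top250", encoding = "UTF-8")  # one element per line of HTML
doc <- paste0(html_lines, collapse = "")  # the whole page as a single string
title_lines <- grep('class="title"', html_lines, value = TRUE)  # lines that contain a title
titles <- gsub(".*>(.*?)<.*", "\\1", title_lines, perl = TRUE)  # extract the text between > and <
gsub(".*>(.*?)<.*", "\\1", '<span class="title">肖申克的救赎</span>', perl = TRUE)  # returns "肖申克的救赎"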
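The gsub() call above keeps one match per line, which works when every title sits on its own line of HTML. The collapsed string doc holds the whole page in one element, so it can instead be searched for all titles at once. The sketch below swaps in a different technique, gregexpr() plus regmatches() with a PCRE look-behind, and runs on a short made-up fragment (demo_doc) rather than the live page, so the real doc is not overwritten. The fragment only imitates the class="title" markup.

```r
# Made-up HTML fragment standing in for the collapsed page
demo_doc <- paste0(
  '<li><span class="title">Title A</span>',
  '<span class="title">&nbsp;/&nbsp;Title A (original)</span>',
  '<span class="other">&nbsp;/&nbsp;Alias A</span></li>',
  '<li><span class="title">Title B</span></li>'
)

# (?<=class="title">) is a look-behind: the match must be preceded by the
# opening tag, but the tag itself is not part of the match.
# [^<]* then takes everything up to the next "<".
m <- gregexpr('(?<=class="title">)[^<]*', demo_doc, perl = TRUE)
demo_titles <- regmatches(demo_doc, m)[[1]]

# Drop the "&nbsp;/&nbsp;" separator in front of alternative titles
demo_titles <- trimws(gsub("&nbsp;/&nbsp;", "", demo_titles, fixed = TRUE))
print(demo_titles)
```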
Parsing with XPath
library(xml2)
dom <- read_html(doc)
title_nodes <- xml_find_all(dom, './/span[@class="title"]')  # every <span class="title"> in the page
xml_text(title_nodes)
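To try the XPath query without fetching the page, here is a sketch on a small made-up HTML string (demo_html; the markup only imitates the class="title" spans). It also shows a positional predicate, [1], which keeps only the first matching span inside each li. Note that @class="title" compares the whole attribute value, so an element with class="title hd" would not match.

```r
library(xml2)

# Made-up HTML standing in for the real page
demo_html <- '<html><body><ol>
<li><span class="title">Title A</span><span class="title">Title A (original)</span><span class="other">Alias A</span></li>
<li><span class="title">Title B</span></li>
</ol></body></html>'

demo_dom <- read_html(demo_html)

# Every <span> whose class attribute is exactly "title"
all_titles <- xml_text(xml_find_all(demo_dom, './/span[@class="title"]'))

# [1] applies per <li>: the first class="title" span child of each item
first_titles <- xml_text(xml_find_all(demo_dom, './/li/span[@class="title"][1]'))

print(all_titles)
print(first_titles)
```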
Parsing with CSS Selectors
library(rvest)
read_html(doc) %>%
  html_nodes('.title') %>%  # elements with class="title"
  html_text()
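The same made-up fragment can be queried with CSS selectors. In this sketch, span.title matches any span whose class list contains title (slightly looser than the exact @class="title" test in XPath), and li > span.title:first-child keeps a title only when it is the first child element of its li. html_nodes() also accepts an xpath argument, so both query languages work on the same parsed page. The HTML string is an illustration, not the real page. Recent rvest versions call this function html_elements(); html_nodes() still works.

```r
library(rvest)

# Made-up HTML standing in for the real page
demo_html <- '<html><body><ol>
<li><span class="title">Title A</span><span class="title">Title A (original)</span><span class="other">Alias A</span></li>
<li><span class="title">Title B</span></li>
</ol></body></html>'

demo_page <- read_html(demo_html)

# All <span> elements carrying the class "title"
all_titles <- demo_page %>% html_nodes("span.title") %>% html_text()

# Only spans that are the first child element of an <li>
first_titles <- demo_page %>% html_nodes("li > span.title:first-child") %>% html_text()

# The XPath query from the previous section, passed through the xpath argument
xpath_titles <- demo_page %>% html_nodes(xpath = './/span[@class="title"]') %>% html_text()

print(all_titles)
print(first_titles)
print(identical(all_titles, xpath_titles))
```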