Sorry, weird question, but I can't seem to figure it out myself. The good news is: I think it's totally reproducible.
I'm trying to build a simple R function that uses {rvest} to scrape Wikipedia for the hometown of musicians. Basically, the function I wrote works for most artists, but for some it doesn't work (it returns NULL). (Randy Newman is one such artist, so I'll use him as an example.)
When I run the whole thing (below) and then call findhome("Randy Newman"), I get NULL.
When I attempt to debug it, I define the tablemusic() function, set artist <- "Randy Newman", and then run the guts of the artistdata() function line by line, and it works!
And then, once I've done that, I can run findhome("Randy Newman") and it works all right. What gives?! Do I have things in the wrong order or something? I can't seem to figure it out.
Any help is appreciated. Here is the code:
    library(rvest)

    findhome <- function(artist) {

      ## function to find the table with the right info
      tablemusic <- function(data) {
        if (!any(grepl("years active|labels|instruments", data[,1], ignore.case=TRUE))) {
          for (i in 2:5) {
            data <- try(url %>% html %>%
                          html_nodes(xpath=paste('//*[@id="mw-content-text"]/table[', i, ']', sep="")) %>%
                          html_table(fill=TRUE), silent=TRUE)
            if (!class(data)=="try-error" & length(data)>0) {
              if (class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=FALSE)}
              if (any(grepl("years active|labels|instruments", data[,1], ignore.case=TRUE))) {
                break
              }
            }
          }
        }
        if (class(data)=="try-error" | length(data)<1) {
          data <- NULL
        } else if (!any(grepl("years active|labels|instruments", data[,1], ignore.case=TRUE))) {
          data <- NULL
        }
        data
      }

      ## function to pull the data, and try different pages if the first is wrong
      artistdata <- function(artist) {
        artist <- gsub(" ", "_", artist)
        artist <- gsub("'", "%27", artist)

        ## first try getting the data
        url <- paste("https://en.wikipedia.org/wiki/", artist, sep="")
        data <- try(url %>% html %>%
                      html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>%
                      html_table(fill=TRUE), silent=TRUE)

        ## check if it's the right page (to deal with disambiguation issues)
        if (!class(data)=="try-error" & length(data)>0) {
          if (class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=FALSE)}
          data <- tablemusic(data)
        }

        ## if try-error or musictable==NULL, try _(band)
        if (class(data)=="try-error" | is.null(data) | length(data)<1) {
          url <- paste("https://en.wikipedia.org/wiki/", artist, "_(band)", sep="")
          data <- try(url %>% html %>%
                        html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>%
                        html_table(fill=TRUE), silent=TRUE)
          if (class(data)=="try-error") {
            data <- NULL
          } else {
            if (class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=FALSE)}
            data <- tablemusic(data)
          }
        } else {
          if (class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=FALSE)}
        }

        ## if try-error or musictable==NULL, try _(musician)
        if (class(data)=="try-error" | is.null(data) | length(data)<1) {
          url <- paste("https://en.wikipedia.org/wiki/", artist, "_(musician)", sep="")
          data <- try(url %>% html %>%
                        html_nodes(xpath='//*[@id="mw-content-text"]/table[1]') %>%
                        html_table(fill=TRUE), silent=TRUE)
          if (class(data)=="try-error") {
            data <- NULL
          } else {
            if (class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=FALSE)}
            data <- tablemusic(data)
          }
        } else {
          if (class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=FALSE)}
        }
        data
      }

      ## first try finding the data
      data <- artistdata(artist)

      ## then try again with only the part of the name before "and" / "&"
      if (is.null(data)) {data <- artistdata(unlist(strsplit(artist, " and| &"))[1])}

      ## if no matches return ""
      if (class(data)=="try-error" | is.null(data)) {
        data <- ""
        return()
      } else {
        if (class(data)!="data.frame") {data <- data.frame(data, stringsAsFactors=FALSE)}
      }

      ## if we have a matching page, pull the relevant data
      origin <- data[data[,1]=="Origin",2]
      if (length(origin)>0) {
        home <- origin
      } else {
        born <- data[data[,1]=="Born",2]
        if (length(born)>0) {
          home <- unlist(strsplit(born, "age.[0-9]+\\)"))[2]
        } else {
          home <- ""
        }
      }
      home
    }

    findhome("Randy Newman")
I figured it out. I had to add url as a parameter to the tablemusic() function. As it was, tablemusic() was recycling the url from past searches. Thanks for the suggestion.
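For anyone who lands here with the same symptom, here is a minimal sketch of what seems to have been going on, with no scraping involved. The names fetch_page(), find_buggy(), find_fixed(), check_table() and get_data() are made up for illustration and are not part of the code in the question. The point is that a helper defined next to the function that sets url cannot see that local url; R looks the name up in the helper's enclosing environments instead, which ends at the global environment (a leftover url from interactive debugging) or, in a fresh session, probably at base R's url() connection function, where the fetch fails and try() swallows the error.

```r
# Hypothetical stand-in for the real page fetch: it only echoes the URL it is given.
fetch_page <- function(u) {
  stopifnot(is.character(u))
  u
}

# Shape of the code in the question: the helper reads `url` as a free variable.
find_buggy <- function(artist) {
  check_table <- function() fetch_page(url)  # `url` is not defined in this function
  get_data <- function(artist) {
    # This `url` is local to get_data() and invisible to check_table().
    url <- paste0("https://en.wikipedia.org/wiki/", gsub(" ", "_", artist))
    check_table()
  }
  get_data(artist)
}

# Shape of the fix: the URL is passed to the helper explicitly.
find_fixed <- function(artist) {
  check_table <- function(url) fetch_page(url)
  get_data <- function(artist) {
    url <- paste0("https://en.wikipedia.org/wiki/", gsub(" ", "_", artist))
    check_table(url)
  }
  get_data(artist)
}

# Simulate a global left over from running an earlier search line by line.
url <- "https://en.wikipedia.org/wiki/Some_Earlier_Page"

buggy_result <- find_buggy("Randy Newman")  # picks up the stale global url
fixed_result <- find_fixed("Randy Newman")  # uses the url built for this artist
print(buggy_result)
print(fixed_result)
```

Passing the URL as an argument keeps the helper independent of whatever happens to be sitting in the workspace, which is why the fixed version behaves the same in a fresh session and after debugging.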