This is a very simple data analysis project. The question: what does Google suggest when I look up the names of actors or actresses? Google Autocomplete takes recent search terms into account, among other variables. If there's the slightest hint of ageism in Hollywood, especially toward women, perhaps we can see it there. Let's try to answer the question using R.

Looking up Google Autocomplete suggestions in order to work with them is fairly easy. Let's take a look at this R function:

library(RCurl)
library(jsonlite)
get_suggestions <- function(text, verbose = TRUE) {
    # Returns suggestions for `text`
    text <- tolower(text)
    sug_url <- sprintf("https://suggestqueries.google.com/complete/search?client=firefox&q=%s",
                       URLencode(text))
    res <- fromJSON(getURL(sug_url))
    # Remove `text` from all elements of the second position in the list
    subst <- as.character(sapply(res[[2]], 
                                 # fixed = TRUE matches `text` literally, not as a regex
                                 function(x) trimws(gsub(text, "", x, fixed = TRUE))
                                 )
                          )
    # Delete empty strings (it will normally be the first item of the array)
    subst <- Filter(function(x) nchar(x) > 0, subst)
    if (verbose) {
        print(text)
        print(subst)
    }
    return(subst)
}

As you can see, all we are doing is querying the API available at suggestqueries.google.com with the right parameters. Let's check the function:

> get_suggestions("obama is ", FALSE)
[1] "osama"            "a weak president" "gog"              "a class act"      "out"
[6] "back"             "a muslim video"   "still president"  "bin laden"        "gone"

Thank you, Internet, that's enough. Let's move on.
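
One aside before building the lists: the cleaning step inside get_suggestions can be tried offline. The sketch below is only an illustration. The suggestion vector is made up and stands in for the second element of the parsed response, and it passes fixed = TRUE to gsub so that the query is matched literally rather than as a regular expression.

```r
# Hypothetical stand-in for res[[2]], hard-coded so this runs without a network call
text <- "obama is "
raw_suggestions <- c("obama is ", "obama is back", "obama is gone")

# Strip the query from each suggestion; fixed = TRUE treats `text` as a literal string
stripped <- trimws(gsub(text, "", raw_suggestions, fixed = TRUE))

# Drop the empty string left behind by the suggestion that merely echoes the query
cleaned <- stripped[nchar(stripped) > 0]
print(cleaned)
```

Because gsub and trimws are vectorized, this version does not need the sapply wrapper, although the result should be the same for inputs like these.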

Now, we want to run that function using a list of names of actors and actresses as inputs. While we could come up with such lists manually, it is better to see if we can simplify our lives by extracting that information from somewhere else. There are two very relevant Wikipedia pages: Academy Award for Best Actor and Academy Award for Best Actress. If you open these pages, you will see that the main content is presented in a table that lists the year of the award, the name of the actor or actress, the movie he or she starred in, and the character that was played. The first two columns are relevant for us, if we manage to extract them. This way, we will get a list of people who are relatively well known, so we can expect Google to have some suggestions in store for us.

At first, I tried to solve this by using the XML library, but the table was not being read correctly. After some minutes of looking for an alternative, I found the wonderful rvest package, which did the trick. This is how I built my list of actors (same for actresses):

library(rvest)
tables <- html_nodes(read_html("https://en.wikipedia.org/wiki/Academy_Award_for_Best_Actor"),
                     "table")
actors <- html_table(tables[[3]])[, 1:2]
actors$Year <- as.numeric(gsub("\\(.*\\)", "", actors$Year)) # may produce NAs, not concerned
actors$Actor <- gsub("(.*)\\!(.*)", "\\2", actors$Actor)
actors <- unique(actors[actors$Year > 1990, "Actor"])
actors <- actors[!is.na(actors)]

The calls to gsub are necessary because the year column contains the award edition in parentheses, which I don't need, and the actor name contains a hyperlink to the actor's page, which I also want to strip. In the end, I get a unique list of actors who were nominated for an Academy Award after 1990.
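
To see what the two gsub calls do in isolation, here is a minimal sketch on made-up cell values. The sample strings are assumptions chosen to mimic the patterns described above; they are not copied from the Wikipedia table, whose actual contents may differ.

```r
# Hypothetical cell values, made up to mimic the two patterns described above
year_cells  <- c("1990 (63rd)", "1991 (64th)")
actor_cells <- c("Irons, Jeremy!Jeremy Irons", "Anthony Hopkins")

# Drop the parenthesised edition, then convert what is left to a number
years <- as.numeric(gsub("\\(.*\\)", "", year_cells))

# Keep only what follows the last "!" (strings without "!" are left unchanged)
names_clean <- gsub("(.*)\\!(.*)", "\\2", actor_cells)

print(years)
print(names_clean)
```

Cells whose year does not reduce to a plain number would become NA here, which is why the original code filters out NAs afterwards.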

After this, I called the get_suggestions function on each name and stored the results. Then I counted the top 5 words that appear as the first suggestion, the top 5 words that appear as the second suggestion, and so on, up to the fifth suggestion. Code sample:

# Get the top 5 words in each position (from positions 1 to 5)
actors_top <- lapply(1:5, function(i) {
      dd <- sort(table(sapply(actors_suggestions, function(x) x[i])), decreasing = TRUE)[1:5]
      data.frame(keywords = names(dd),
                 values = as.numeric(dd),
                 position = sprintf("Position #%d", i),
                 idx = 1:5)
})
actors_top <- do.call(rbind, actors_top)
actors_top$type <- "Actors"
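
The counting logic is easier to follow on a toy input. The sketch below assumes that the stored suggestions form a list with one character vector per name; the keywords are invented for illustration and are not real results.

```r
# Hypothetical suggestions for three made-up names (not real results)
toy_suggestions <- list(
    c("age", "movies", "wife"),
    c("age", "net worth", "movies"),
    c("net worth", "age", "height")
)

# Pick the keyword in first position for every name, then count and rank them
first_pos <- sapply(toy_suggestions, function(x) x[1])
counts <- sort(table(first_pos), decreasing = TRUE)
print(counts)
```

Indexing past the end of a vector yields NA, and table drops NAs by default, so names with fewer suggestions than the requested position simply do not contribute to that position's counts.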

Then I combined the results for actors and the results for actresses in a single data frame called alldata and plotted the following:

library(ggplot2)
plt1 <- ggplot(alldata) + geom_bar(aes(x = idx, y = values), 
                                   stat = "identity", fill = "lightgray",
                                   color = "black") + 
    facet_grid(factor(position)~ type) + 
    geom_text(aes(x = idx, y = values, label = keywords), nudge_y = 4) +
    xlab("") + ylab("Number of occurrences")
plot(plt1)
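
The combining step mentioned before the plot is not shown in this post. A minimal sketch, assuming an actresses_top data frame built the same way as actors_top, might look like this (the one-row data frames and their values are hypothetical stand-ins):

```r
# Hypothetical one-row stand-ins for the per-group summaries (values are made up)
actors_top <- data.frame(keywords = "age", values = 10,
                         position = "Position #1", idx = 1, type = "Actors")
actresses_top <- data.frame(keywords = "age", values = 12,
                            position = "Position #1", idx = 1, type = "Actresses")

# Stack both summaries; the `type` column is what the plot facets on
alldata <- rbind(actors_top, actresses_top)
print(alldata)
```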

Final result:

[Figure: Google suggestions for actors' and actresses' names]

(The complete R file for this project can be found in this GitHub repository.)