A lot of clustering / data exploration tutorials out there use the famous `iris` dataset to show how PCA, t-SNE, MDS and other techniques work. Something like this:

```r
pca1 <- prcomp(iris[, -5])
plot(pca1$x[, 1], pca1$x[, 2], col = iris$Species, pch = 19)
```

This is fine, but it starts from a premise that is not always true: that we know which class each data point belongs to. In many real-life problems, the dimensionality reduction step is done precisely because we don't know how the data cluster together (if at all), and we want to find out what is really going on.

Consider the following dataset and transformations:

```r
# Simulated data: some features separate groups of rows, others are pure noise
set.seed(1)  # rnorm() and tsne() are random; fix the seed to make runs repeatable
dataset1 <- data.frame(feature1 = c(rnorm(40, 0, 1), rnorm(60, 4, 1)),
                       feature2 = c(rnorm(20, 10, 2), rnorm(20, 0, 2), rnorm(60, 5, 2)),
                       feature3 = rnorm(100, 0, 1),
                       feature4 = c(rnorm(50, 0, 1), rnorm(50, 5, 1)),
                       feature5 = c(rnorm(40, 2, 1), rnorm(60, 6, 2)))

library(tsne)
# Standardize the features, then embed them in two dimensions
tsne1 <- tsne(scale(dataset1), perplexity = 10)
plot(tsne1[, 1], tsne1[, 2])
```

We don't know (though we can make an educated guess) how many classes are in this dataset or which features are useful for separating them. So, although we can see some very clear clusters, all we know about them is their t-SNE x and y coordinates. We could go back to the original dataset, take a subset of the data (say, the points for which the transformation yielded `x < -150`), and then check what is going on in the original `featureN` space.
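That manual subsetting route can be sketched in base R. The helper `in_region()` below is a hypothetical name, not part of any package: it takes any two-column embedding and returns a logical index of the points that fall inside a rectangle, which can then be applied to the original data. The toy coordinates are a stand-in for real t-SNE output, so the sketch runs without the `tsne` package.

```r
# Hypothetical helper: TRUE for each point whose 2-D embedding coordinates
# lie inside the rectangle xlim x ylim (bounds are inclusive).
in_region <- function(coords, xlim = c(-Inf, Inf), ylim = c(-Inf, Inf)) {
  coords[, 1] >= xlim[1] & coords[, 1] <= xlim[2] &
    coords[, 2] >= ylim[1] & coords[, 2] <= ylim[2]
}

set.seed(42)
# Toy "original" data: two groups of 10 rows that differ in both features
toy_data <- data.frame(feature1 = c(rnorm(10, 0, 1), rnorm(10, 4, 1)),
                       feature2 = c(rnorm(10, 10, 2), rnorm(10, 0, 2)))
# Toy stand-in for a t-SNE result: the first group sits around x = -200,
# the second around x = 100
toy_coords <- cbind(c(rep(-200, 10), rep(100, 10)) + rnorm(20), rnorm(20))

# Roughly the example above: the points with x below -150
keep <- in_region(toy_coords, xlim = c(-Inf, -150))

# Compare the original features inside the region with the rest of the data
colMeans(toy_data[keep, ])
colMeans(toy_data[!keep, ])
```

With the post's objects, something like `dataset1[in_region(tsne1, xlim = c(-Inf, -150)), ]` would give that subset, although the threshold has to be read off your own plot, since t-SNE coordinates change from run to run.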

But the other day I found a graphical way of doing this. It uses plotly (locally) and ggplot2. Let's first build the ggplot object:

```r
library(ggplot2)
plotdata <- data.frame(tsne_x = tsne1[, 1], tsne_y = tsne1[, 2])
plt1 <- ggplot(plotdata) + geom_point(aes(x = tsne_x, y = tsne_y))
plot(plt1)
```

Nothing special here, just a normal, static ggplot graph (I won't even include it here). What would be cool, though, is a system that lets us explore the data visually: when the mouse pointer is placed over a given point, the `featureN` values for that point are shown on screen. We can do that with plotly and a couple of extra lines:

```r
library(plotly)
# Build one hover string per row: "feature1: value<br>feature2: value<br>..."
hover_text <- apply(dataset1, 1, function(x) {
  paste(names(x), x, sep = ": ", collapse = "<br>")
})
plotdata <- data.frame(tsne_x = tsne1[, 1], tsne_y = tsne1[, 2],
                       hover_text = hover_text)
plt2 <- ggplot(plotdata) +
  geom_point(aes(x = tsne_x, y = tsne_y, text = hover_text))
ggplotly(plt2)
```

So, what are we doing here? We use `apply` to build a `hover_text` string for each row of the dataset. Each string lists the name of every column followed by that row's value, with the entries separated by HTML's `<br>` tag so that plotly shows each variable on its own line. We then build the ggplot object as before, adding the `text` aesthetic (ggplot2 will warn about an unknown aesthetic, but we can ignore that, since the aesthetic is only there for plotly), and finally, instead of `plot`ting it, we pass it to plotly's `ggplotly` function.
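The `apply`/`paste` step can also be wrapped in a small function. The sketch below is a variant, not the code used above: `make_hover_text()` and its `digits` argument are illustrative names, not part of any package, and the function rounds the values before pasting them, because `paste` otherwise prints doubles with up to 15 significant digits, which makes the tooltips hard to read.

```r
# Hypothetical helper: one HTML hover string per row of a numeric data frame,
# with values rounded so the tooltip stays readable.
make_hover_text <- function(df, digits = 2) {
  rounded <- round(as.matrix(df), digits)
  apply(rounded, 1, function(row) {
    # "name: value" pairs, one per line once plotly renders the <br> tags
    paste(colnames(rounded), row, sep = ": ", collapse = "<br>")
  })
}

toy <- data.frame(feature1 = c(0.123, -1.5), feature2 = c(10, 2.3456))
hover <- make_hover_text(toy)
hover[1]  # "feature1: 0.12<br>feature2: 10"
hover[2]  # "feature1: -1.5<br>feature2: 2.35"
```

With the post's data, `hover_text <- make_hover_text(dataset1)` would replace the `apply` call. Note that by default `ggplotly()` shows every aesthetic in the tooltip, so the t-SNE coordinates appear alongside the text; passing `tooltip = "text"` to `ggplotly()` should limit it to the feature values.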

The result is an interactive version of the scatter plot, in which hovering over a point shows its original feature values. That's it. We can now explore our data (including the original values before the t-SNE transformation) visually, much faster than going back and forth to the initial data.frame.