I've been working with Cassandra quite a lot lately, and for some analysis I need to do I've been looking for a way to connect to it from R. My searches pointed me to articles such as this one, which work for older versions of Cassandra. However, when I ran my script with the JDBC driver they use, I got this error message:

Unable to retrieve JDBC result set for SELECT * FROM test_table LIMIT 1 (CQL2 has been removed in Cassandra 2.2. Please use CQL3 instead

The solution is to use a JDBC driver that can talk to newer Cassandra installations, i.e. one that speaks CQL3. In the end, this one from DataStax (the Simba JDBC Driver for Apache Cassandra) did the trick.

Briefly:

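library(RJDBC)

# Load the Simba driver class from the downloaded JAR
drv <- JDBC("com.simba.cassandra.jdbc42.Driver",
            "/home/user/libs/datastax/CassandraJDBC42.jar")
# Connect with user name/password authentication and a default keyspace
conn <- dbConnect(drv,
                  "jdbc:cassandra://127.0.0.1;AuthMech=1;UID=user;PWD=pass;DefaultKeySpace=mykeyspace")
# Run a CQL3 query; the result comes back as a data.frame
data <- dbGetQuery(conn, "SELECT * FROM test_table LIMIT 10")
print(data)
dbDisconnect(conn)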
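Hardcoding the user name and password in the connection string is fine for a quick test, but it gets awkward once the script is shared. A minimal sketch of one alternative, assuming you store the credentials in environment variables: `cassandra_url()`, `CASSANDRA_USER` and `CASSANDRA_PASSWORD` are hypothetical names I made up for illustration, not part of RJDBC or the Simba driver, and the helper only reuses the properties already present in the URL above.

```r
# Hypothetical helper (not part of RJDBC or the Simba driver): builds the
# Simba Cassandra JDBC URL from its parts. Credentials default to the
# hypothetical environment variables CASSANDRA_USER / CASSANDRA_PASSWORD.
cassandra_url <- function(host, keyspace,
                          user = Sys.getenv("CASSANDRA_USER"),
                          password = Sys.getenv("CASSANDRA_PASSWORD")) {
  # Same properties as in the hand-written URL above
  props <- c(AuthMech = "1",
             UID = user,
             PWD = password,
             DefaultKeySpace = keyspace)
  # Join as key=value pairs separated by semicolons.
  # Note: no escaping is done, so values containing ';' would break the URL.
  paste0("jdbc:cassandra://", host, ";",
         paste(names(props), props, sep = "=", collapse = ";"))
}

url <- cassandra_url("127.0.0.1", "mykeyspace", user = "user", password = "pass")
print(url)
```

The result can then be passed straight to `dbConnect(drv, cassandra_url("127.0.0.1", "mykeyspace"))`, so the password never appears in the script itself.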

Yes, I know there's also RCassandra, but it hasn't been updated in a while and I've seen a lot of people complaining that it doesn't quite work, so I decided to go the JDBC way.
