How can I install Apache Cassandra 4.0 on a CentOS 8 | Rocky Linux 8 machine? Apache Cassandra is a free and open-source NoSQL database management system designed to be distributed and highly available. Cassandra can handle large amounts of data across many commodity servers without any single point of failure.
This guide walks you through the installation of Cassandra on CentOS 8 | Rocky Linux 8. After installation, we'll configure and tune Cassandra to run on machines with minimal resources.
Features of Cassandra
Cassandra provides the Cassandra Query Language (CQL), an SQL-like language, to create and update database schema and access data. CQL allows users to organize data within a cluster of Cassandra nodes using:
- Keyspace: defines how a dataset is replicated, for example in which datacenters and how many copies. Keyspaces contain tables.
- Table: defines the typed schema for a collection of partitions. New columns can be added to Cassandra tables with zero downtime. Tables contain partitions, which contain rows, which contain columns.
- Partition: defines the mandatory part of the primary key all rows in Cassandra must have. All performant queries supply the partition key in the query.
- Row: contains a collection of columns identified by a unique primary key made up of the partition key and optionally additional clustering keys.
- Column: a single datum with a type, which belongs to a row.
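To make the hierarchy above concrete, here is a minimal sketch that writes a small CQL script from the shell. The keyspace `demo`, the table `sensor_readings`, and its columns are hypothetical names chosen for illustration, and `SimpleStrategy` with a replication factor of 1 is an assumption that only suits a single-node test setup.

```shell
# Write an illustrative CQL script; all object names here are made up for the example.
cql_file="${TMPDIR:-/tmp}/demo_schema.cql"
cat > "$cql_file" <<'EOF'
-- Keyspace: replication settings for everything it contains
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Table: typed schema; sensor_id is the partition key, reading_time a clustering key
CREATE TABLE IF NOT EXISTS demo.sensor_readings (
  sensor_id    text,
  reading_time timestamp,
  temperature  double,
  PRIMARY KEY ((sensor_id), reading_time)
);

-- Row: one reading, identified by partition key plus clustering key
INSERT INTO demo.sensor_readings (sensor_id, reading_time, temperature)
  VALUES ('sensor-1', '2021-10-01 12:00:00+0000', 21.5);

-- Query that supplies the partition key
SELECT reading_time, temperature FROM demo.sensor_readings WHERE sensor_id = 'sensor-1';
EOF
echo "Wrote $cql_file"
# Once Cassandra is running, the script could be executed with: cqlsh -f "$cql_file"
```

The final SELECT filters on `sensor_id`, the partition key, which is the access pattern the Partition entry describes.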
Cassandra has support for the following client drivers:
- Java
- Python
- Ruby
- C# / .NET
- Node.js
- PHP
- C++
- Scala
- Clojure
- Erlang
- Go
- Haskell
- Rust
- Perl
- Elixir
- Dart
Install Apache Cassandra 4.0 on CentOS 8 | Rocky Linux 8
Java is required for running Cassandra on CentOS 8 | Rocky Linux 8. Cassandra 4.0 runs on Java 8, which this guide uses. To use cqlsh you also need Python: the cqlsh shipped with Cassandra 4.0 supports Python 3.6+ as well as Python 2.7, and this guide installs Python 3.
Step 1: Install Java 8, Python, and cqlsh
Install Python 3, pip, and OpenJDK 8 on your CentOS / Rocky Linux 8 machine:
sudo yum install python3 python3-pip java-1.8.0-openjdk java-1.8.0-openjdk-devel
Install cqlsh using the pip3 Python package manager:
sudo pip3 install cqlsh tox
Ensure the install is successful:
....
Collecting importlib-metadata; python_version < "3.8" (from click->geomet<0.3,>=0.1->cassandra-driver->cqlsh)
Downloading https://files.pythonhosted.org/packages/71/c2/cb1855f0b2a0ae9ccc9b69f150a7aebd4a8d815bd951e74621c4154c52a8/importlib_metadata-4.8.1-py3-none-any.whl
Collecting zipp>=0.5 (from importlib-metadata; python_version < "3.8"->click->geomet<0.3,>=0.1->cassandra-driver->cqlsh)
Downloading https://files.pythonhosted.org/packages/bd/df/d4a4974a3e3957fd1c1fa3082366d7fff6e428ddb55f074bf64876f8e8ad/zipp-3.6.0-py3-none-any.whl
Collecting typing-extensions>=3.6.4; python_version < "3.8" (from importlib-metadata; python_version < "3.8"->click->geomet<0.3,>=0.1->cassandra-driver->cqlsh)
Downloading https://files.pythonhosted.org/packages/74/60/18783336cc7fcdd95dae91d73477830aa53f5d3181ae4fe20491d7fc3199/typing_extensions-3.10.0.2-py3-none-any.whl
Installing collected packages: thrift, cql, zipp, typing-extensions, importlib-metadata, click, geomet, cassandra-driver, cqlsh
Running setup.py install for thrift ... done
Running setup.py install for cql ... done
Successfully installed cassandra-driver-3.25.0 click-8.0.3 cql-1.4.0 cqlsh-6.0.0 geomet-0.2.1.post1 importlib-metadata-4.8.1 thrift-0.15.0 typing-extensions-3.10.0.2 zipp-3.6.0
Confirm the installation of Java and cqlsh.
$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
$ cqlsh --version
cqlsh 6.0.0
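The Java check above can also be scripted. The following is a minimal sketch, assuming the version string appears in double quotes in the `java -version` output as shown above; the helper name `java_major` is hypothetical.

```shell
# Print the major Java version for a version string such as 1.8.0_312 or 11.0.13.
java_major() {
  v=$1
  case "$v" in
    1.*) v=${v#1.} ;;   # legacy scheme: 1.8.0_312 -> 8.0_312
  esac
  echo "${v%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

# `java -version` prints to stderr; the version is the first quoted field.
if command -v java >/dev/null 2>&1; then
  installed=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  if [ "$(java_major "$installed")" = "8" ]; then
    echo "Java 8 detected ($installed)"
  else
    echo "Java 8 not detected (found: $installed)" >&2
  fi
fi
```

The function works on a plain string, so it can be tried without Java installed, for example `java_major 1.8.0_312`.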
Step 2: Install Apache Cassandra 4.0 on CentOS 8 | Rocky Linux 8
Now that Java and Python are installed, add the Cassandra repository to the CentOS / Rocky Linux system.
sudo tee /etc/yum.repos.d/cassandra.repo <<EOF
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
EOF
Install Apache Cassandra with the command below.
sudo yum -y install cassandra
Create a systemd service unit for Cassandra.
sudo tee /etc/systemd/system/cassandra.service<<EOF
[Unit]
Description=Apache Cassandra
After=network.target
[Service]
PIDFile=/var/run/cassandra/cassandra.pid
User=cassandra
Group=cassandra
ExecStart=/usr/sbin/cassandra -f -p /var/run/cassandra/cassandra.pid
Restart=always
[Install]
WantedBy=multi-user.target
EOF
Start the service and enable it to start at boot.
sudo systemctl daemon-reload
sudo systemctl start cassandra.service
sudo systemctl enable cassandra
Check service status:
$ systemctl status cassandra.service
● cassandra.service - Apache Cassandra
Loaded: loaded (/etc/systemd/system/cassandra.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2020-03-04 22:24:31 EAT; 2s ago
Main PID: 8758 (java)
Tasks: 10 (limit: 26213)
Memory: 3.9G
CGroup: /system.slice/cassandra.service
└─8758 java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-Us>
Mar 04 22:24:31 cent8.localdomain systemd[1]: Started Apache Cassandra.
You can also verify that Cassandra is running with the command below.
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 70 KiB 256 100.0% 0daf41fa-22e5-4471-bc00-9aed6f566235 rack1
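For use in scripts, the status check can be reduced to an exit code. This is a sketch that assumes node lines in the `nodetool status` output begin with the two-letter status/state code, as in the output above; the function name `all_nodes_up_normal` is hypothetical.

```shell
# Reads `nodetool status` output on stdin.
# Succeeds only if at least one node is listed and every node is UN (Up/Normal).
all_nodes_up_normal() {
  awk '
    /^[UD][NLJM] / { nodes++; if ($1 != "UN") bad++ }
    END { exit ((nodes > 0 && bad + 0 == 0) ? 0 : 1) }
  '
}

if command -v nodetool >/dev/null 2>&1; then
  if nodetool status 2>/dev/null | all_nodes_up_normal; then
    echo "all nodes are Up/Normal"
  else
    echo "cluster not healthy or not reachable" >&2
  fi
fi
```

Header lines such as `Datacenter: datacenter1` do not match the pattern, so only node rows are counted.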
To run a query against Cassandra, invoke the CQL shell with the command below.
$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.1 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cqlsh>
- The default location of configuration files is /etc/cassandra.
- The default locations of the log and data directories are /var/log/cassandra/ and /var/lib/cassandra.
Step 3: Configuring Cassandra on CentOS 8 | Rocky Linux 8
For a single-node setup, the default configuration file at /etc/cassandra/conf/cassandra.yaml is sufficient. For a multi-node cluster, you may need to modify this file to ensure your cluster is tuned properly.
At a minimum you should consider setting the following properties:
- cluster_name: the name of your cluster.
- seeds: a comma-separated list of the IP addresses of your cluster seeds.
- storage_port: you don’t necessarily need to change this but make sure that there are no firewalls blocking this port.
- listen_address: the IP address of your node. Other nodes use it to communicate with this node, so it is important that you change it.
- native_transport_port: as with storage_port, make sure this port is not blocked by firewalls, as clients will communicate with Cassandra on this port.
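As a rough illustration of editing these properties from the shell, the sketch below rewrites top-level `key: value` lines with sed. The helper `set_yaml_scalar` is hypothetical, the cluster name and IP address are placeholders, and the script works on a scratch file rather than the real cassandra.yaml. It does not handle `seeds`, which is nested under `seed_provider` rather than being a top-level key.

```shell
# Replace a top-level "key: value" line in a YAML file (hypothetical helper).
# Limitation: the value must not contain '|' or '&', which are special to this sed command.
set_yaml_scalar() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp)
  # Writing back with cat keeps the original file's owner and permissions.
  sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Scratch file standing in for /etc/cassandra/conf/cassandra.yaml
conf=$(mktemp)
printf '%s\n' "cluster_name: 'Test Cluster'" "listen_address: localhost" \
  "native_transport_port: 9042" > "$conf"

set_yaml_scalar "$conf" cluster_name "'Demo Cluster'"
set_yaml_scalar "$conf" listen_address 192.0.2.10   # placeholder address
cat "$conf"
```

On a real node you would point the helper at a backed-up copy of the configuration file and restart the service afterwards.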
Changing the location of directories
The configuration yaml file controls the following data directories.
- data_file_directories: one or more directories where data files are located.
- commitlog_directory: the directory where commitlog files are located.
- saved_caches_directory: the directory where saved caches are located.
- hints_directory: the directory where hints are located.
For performance reasons, if you have multiple disks, consider putting commitlog and data files on different disks.
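A minimal sketch of preparing separate data and commitlog directories follows. The helper `prepare_cassandra_dir` and the `disk1`/`disk2` paths are hypothetical; the demo uses a scratch directory in place of real mount points and only changes ownership when run as root on a host that has the cassandra user.

```shell
# Create a directory for Cassandra and hand it to the cassandra user when possible.
prepare_cassandra_dir() {
  dir=$1
  mkdir -p "$dir" || return 1
  if [ "$(id -u)" = "0" ] && id cassandra >/dev/null 2>&1; then
    chown -R cassandra:cassandra "$dir"
  fi
}

# Scratch directory standing in for two separate mount points.
root=$(mktemp -d)
prepare_cassandra_dir "$root/disk1/cassandra/data"
prepare_cassandra_dir "$root/disk2/cassandra/commitlog"

# The matching cassandra.yaml entries would then look like this:
cat <<EOF
data_file_directories:
    - $root/disk1/cassandra/data
commitlog_directory: $root/disk2/cassandra/commitlog
EOF
```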
Setting Environment variables
JVM-level settings such as heap size are set in cassandra-env.sh. Consider adding any additional JVM command-line arguments to the JVM_OPTS environment variable. These arguments are passed to the Cassandra service when it starts.
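For a machine with little memory, the settings might look like the sketch below. It assumes your cassandra-env.sh reads the MAX_HEAP_SIZE and HEAP_NEWSIZE variables, as the stock script does; the sizes are illustrative values rather than recommendations, and the IPv4 flag is just an example of an extra JVM argument.

```shell
# Heap settings: in cassandra-env.sh these belong where the stock file has the
# commented-out MAX_HEAP_SIZE / HEAP_NEWSIZE lines, before the heap is calculated.
# The script expects the two variables to be set together.
MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="256M"

# Extra JVM argument appended to JVM_OPTS.
# ${JVM_OPTS:-} only keeps this snippet safe to run when the variable is unset.
JVM_OPTS="${JVM_OPTS:-} -Djava.net.preferIPv4Stack=true"
echo "$JVM_OPTS"
```

After editing the file, restart the service so the new options are picked up.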
Cassandra Logging
The logger in use is logback. You can change logging properties by editing logback.xml. By default it logs at INFO level to a file called system.log and at DEBUG level to a file called debug.log. When running in the foreground, it also logs at INFO level to the console.
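A quick way to scan the log for problems is sketched below. It assumes the default log pattern, in which each entry starts with its level (for example `WARN` or `ERROR`), and the log directory given earlier; the helper name `count_log_problems` is hypothetical.

```shell
# Count lines that start with WARN or ERROR in the given log file.
count_log_problems() {
  # grep -c prints the count but exits non-zero when it is 0, hence "|| true".
  grep -cE '^(WARN|ERROR)[[:space:]]' "$1" || true
}

log=/var/log/cassandra/system.log
if [ -r "$log" ]; then
  echo "WARN/ERROR lines in $log: $(count_log_problems "$log")"
fi
```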
Refer to the official guide for client configuration.