Automating a Big Data (CDH) cluster installation

It has been a while since my last post, but this one should be useful for anyone facing the task of installing a CDH cluster from scratch.

Several prerequisites must be configured before starting the installation; otherwise it can, and probably will, crash. To address some of them I wrote a few small Ansible scripts that automate adjusting these parameters and copying the required files. The scripts are not finished: I plan to add the parts I have not automated yet, such as creating the yum repositories for installing Cloudera Manager, plus a few other small tasks.

This guide assumes that Cloudera Manager is already installed, along with the database that holds the Cloudera Manager data repository and the data of some of the other tools.

The automation is written in Ansible and consists of three playbooks and an inventory file. The inventory file is in YAML format and groups the nodes into master(s), workers (datanodes), the Cloudera Manager server and the gateway. It is defined as follows, and you can save it as inventory.yaml:

all:
  children:
    cm:
      hosts:
        cm_host:
    master:
      hosts:
        master_host:
    workers:
      hosts:
        worker1_host:
        worker2_host:
        worker3_host:
    gateway:
      hosts:
        gateway_host:

Replace each xxx_host entry with the fully qualified domain name of the corresponding server.
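
Before running anything heavier, it can help to confirm that Ansible reaches every host in the inventory. The following is a minimal sketch, not part of the original scripts; the file name check_inventory.yaml is hypothetical, and it assumes the same SSH user as the other playbooks:

```yaml
---
# check_inventory.yaml (hypothetical file name, not one of the three playbooks)
- hosts: all
  connection: ssh
  remote_user: youruser
  gather_facts: no
  tasks:
    # The ping module only verifies SSH login and a usable Python on the host.
    - name: Confirm that Ansible can reach and log in to every host
      ping:
    # group_names is a built-in variable listing the groups of the current host.
    - name: Show which inventory groups each host belongs to
      debug:
        msg: "{{ inventory_hostname }} is in groups {{ group_names }}"
```

It could be run with ansible-playbook -i inventory.yaml check_inventory.yaml --ask-pass, and a host that fails here would also fail in the playbooks that follow.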

Then comes the prerequisites playbook, which has more substance. Apart from the prerequisites listed on Cloudera’s website, I added some tweaks and actions, such as replacing the MySQL JDBC driver: the one in yum is outdated and makes the database creation step crash in the wizard. You can save this one as cloudera_prerequisites.yaml:

---
- hosts: all
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
   - service: name=firewalld state=stopped enabled=False
   - selinux: state=disabled
   - sysctl: name=net.ipv6.conf.all.disable_ipv6 value=1 state=present
   - sysctl: name=net.ipv6.conf.default.disable_ipv6 value=1 state=present
   - sysctl: name=vm.swappiness value=1 state=present
   - shell: sysctl -w vm.swappiness=1
   - copy: src=/etc/hosts dest=/etc/hosts owner=root group=root mode=0644
   - yum: name=java-1.8.0-openjdk-devel state=latest
   - systemd: name=tuned state=started
   - shell: tuned-adm off
   - systemd: name=tuned state=stopped enabled=False
   # One lineinfile task per line keeps the playbook idempotent on re-runs.
   - name: Disable THP (enabled) at boot via rc.local
     lineinfile:
       path: /etc/rc.d/rc.local
       line: 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
   - name: Disable THP (defrag) at boot via rc.local
     lineinfile:
       path: /etc/rc.d/rc.local
       line: 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'
   - name: Make /etc/rc.d/rc.local executable so it runs on boot
     file:
       path: /etc/rc.d/rc.local
       mode: a+x
   - service: name=ntpd state=started
   - name: Allow 'youruser' user to have passwordless sudo
     lineinfile:
       path: /etc/sudoers
       state: present
       regexp: '^youruser'
       line: 'youruser ALL =(ALL) NOPASSWD: ALL'
       # Refuse to write a sudoers file that does not parse
       validate: /usr/sbin/visudo -cf %s
#   - name: Install mysql jdbc driver
#     yum:
#       name: mysql-connector-java
#       state: latest
   - name: download newer jdbc for mysql to avoid crash
     get_url: url=https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz dest=/tmp/mysql-connector-java-5.1.46.tar.gz
   - name: Check if /tmp/mysql-connector-java-5.1.46.tar.gz exists
     stat:
       path: /tmp/mysql-connector-java-5.1.46.tar.gz
     register: stat_result
   - block:
     - name: Extract downloaded jdbc
       unarchive:
         src: /tmp/mysql-connector-java-5.1.46.tar.gz
         dest: /tmp/
         # The archive was downloaded on the managed host, not on the controller
         remote_src: yes
     - name: Creates directory for the java driver if it does not exist
       file:
         path: /usr/share/java
         state: directory
         mode: 0755
         recurse: yes
     - name: Copies the file
       copy:
         src: /tmp/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar
         dest: /usr/share/java/mysql-connector-java.jar
         # The jar was extracted on the managed host, so copy it from there
         remote_src: yes
         owner: root
         group: root
         mode: 0644
     - name: Copies the file to sqoop as well
       copy:
         src: /tmp/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar
         dest: /var/lib/sqoop/mysql-connector-java.jar
         remote_src: yes
         owner: sqoop
         group: sqoop
         mode: 0644
     when: stat_result.stat.exists == True

The script can be called with the following:

ansible-playbook -i inventory.yaml cloudera_prerequisites.yaml --ask-pass --ask-become-pass
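
Once the prerequisites playbook has run, a short read-only playbook can check that the settings took effect. This is a sketch under assumptions: the file name verify_prerequisites.yaml is hypothetical, and it only checks swappiness and reports the THP state. THP and SELinux changes normally show up only after a reboot, so the THP value is printed rather than asserted:

```yaml
---
# verify_prerequisites.yaml (hypothetical file name)
- hosts: all
  connection: ssh
  remote_user: youruser
  become: yes
  tasks:
    - name: Read the running swappiness value
      command: sysctl -n vm.swappiness
      register: swappiness
      changed_when: false   # read-only command, never report a change

    - name: Fail if swappiness is not 1
      assert:
        that:
          - swappiness.stdout | int == 1

    - name: Read the current transparent hugepage setting
      command: cat /sys/kernel/mm/transparent_hugepage/enabled
      register: thp
      changed_when: false

    # The active value is the one shown in square brackets, e.g. [never]
    - name: Report the THP setting for each host
      debug:
        msg: "THP on {{ inventory_hostname }}: {{ thp.stdout }}"
```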

I’ve also created two additional playbooks. The first creates folders in the mount points to store HDFS, YARN and Impala data:

---
- hosts: workers
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
  - name: Creates directory datanodes
    file:
      path: /home/data/dfs/dn
      state: directory
      owner: hdfs
      group: hdfs
      mode: 0700
      recurse: yes
  - name: Creates directory nodemanager
    file:
      path: /home/data/yarn/nm
      state: directory
      owner: yarn
      group: yarn
      mode: 0700
      recurse: yes
  - name: Creates directory impalad
    file:
      path: /home/data/impala/impalad
      state: directory
      owner: impala
      group: impala
      mode: 0700
      recurse: yes

- hosts: master
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
  - name: Creates directory namenode
    file:
      path: /home/data/dfs/nn
      state: directory
      owner: hdfs
      group: hdfs
      mode: 0700
      recurse: yes
  - name: Creates directory secondary namenode
    file:
      path: /home/data/dfs/snn
      state: directory
      owner: hdfs
      group: hdfs
      mode: 0700
      recurse: yes
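
Since the worker tasks differ only in path and owner, they could also be written as a single task with a loop. The following is a sketch of that alternative, not the playbook I actually ran; it assumes Ansible 2.5 or later (for the loop keyword) and that the hdfs, yarn and impala users already exist on the workers:

```yaml
---
- hosts: workers
  connection: ssh
  remote_user: youruser
  become: yes
  tasks:
    - name: Create the data directories on each worker
      file:
        path: "{{ item.path }}"
        state: directory
        owner: "{{ item.owner }}"
        group: "{{ item.owner }}"   # group has the same name as the owner here
        mode: "0700"
      loop:
        - { path: /home/data/dfs/dn, owner: hdfs }
        - { path: /home/data/yarn/nm, owner: yarn }
        - { path: /home/data/impala/impalad, owner: impala }
```

Adding a new data directory then means adding one line to the loop instead of a whole task.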

The second downloads a JDBC driver (SQL Server in this case) and distributes it to all machines of the cluster; it can be adapted to any downloadable JDBC driver:

- hosts: all
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
   - name: download sqlserver jdbc driver
     get_url: url=https://download.microsoft.com/download/4/D/C/4DCD85FA-0041-4D2E-8DD9-833C1873978C/sqljdbc_7.2.2.0_enu.tar.gz dest=/tmp/sqljdbc_7.2.2.0_enu.tar.gz
   - name: Check if /tmp/sqljdbc_7.2.2.0_enu.tar.gz exists
     stat:
       path: /tmp/sqljdbc_7.2.2.0_enu.tar.gz
     register: stat_result
   - block:
     - name: Extract downloaded jdbc
       unarchive:
         src: /tmp/sqljdbc_7.2.2.0_enu.tar.gz
         dest: /tmp/
         # The archive was downloaded on the managed host, not on the controller
         remote_src: yes
     - name: Copies the file into the sqoop folder
       copy:
         src: /tmp/sqljdbc_7.2/enu/mssql-jdbc-7.2.2.jre8.jar
         dest: /var/lib/sqoop/mssql-jdbc-7.2.2.jre8.jar
         remote_src: yes
         owner: sqoop
         group: sqoop
         mode: 0644
     when: stat_result.stat.exists == True
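
To adapt this playbook to another driver, the driver-specific values can be moved into variables. Below is a minimal sketch of that idea; every variable name and value under vars is a hypothetical placeholder (the example.com URL is not a real download), so replace them with the details of the driver you need:

```yaml
---
- hosts: all
  connection: ssh
  remote_user: youruser
  become: yes
  vars:
    # All values below are placeholders for illustration only
    jdbc_url: https://example.com/path/to/driver.tar.gz
    jdbc_archive: /tmp/driver.tar.gz
    jdbc_jar_in_archive: driver-dir/driver.jar
    jdbc_dest: /var/lib/sqoop/driver.jar
  tasks:
    - name: Download the driver archive to the managed host
      get_url:
        url: "{{ jdbc_url }}"
        dest: "{{ jdbc_archive }}"

    - name: Extract the archive where it was downloaded
      unarchive:
        src: "{{ jdbc_archive }}"
        dest: /tmp/
        remote_src: yes   # the archive lives on the managed host

    - name: Copy the jar into the sqoop folder
      copy:
        src: "/tmp/{{ jdbc_jar_in_archive }}"
        dest: "{{ jdbc_dest }}"
        remote_src: yes
        owner: sqoop
        group: sqoop
        mode: "0644"
```

The stat check is left out here because get_url already fails the play for a host when the download does not succeed.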

Happy Installation 🙂
