====== Cluster Big Data 1 ======

==== Service description ====
A cluster for big data experimentation and research. Its goal is to provide a suitable hardware platform for Big Data workloads, with a highly flexible configuration that makes research and experimentation easy.

==== Activation ====
To request access, fill in the [[https://citius.usc.es/dashboard/enviar-incidencia|requests and problem reporting form]]. This form is available only to CITIUS users.

==== Access ====

The cluster is accessible only through the master node, which acts as its frontend:
<code bash>
ssh -X <citius_user>@master-bd1.inv.usc.es
</code>

To access the Ambari web interface, you need to forward port 8080 of ''nodo1'' through an SSH tunnel. If you want to use this interface, connect with the following command instead:

<code bash>
ssh -L 8080:nodo1:8080 <citius_user>@master-bd1.inv.usc.es
</code>
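To avoid retyping these options, you can store them in your OpenSSH client configuration. The following is a minimal sketch, not an official setup: the helper name ''bd1_ssh_config'' and the host alias ''bd1'' are hypothetical. It prints a ''~/.ssh/config'' entry that combines X11 forwarding (''ForwardX11 yes'', equivalent to ''-X'') with the Ambari tunnel (''LocalForward'', equivalent to ''-L 8080:nodo1:8080'').

<code bash>
# Hypothetical helper: print an ~/.ssh/config entry for the cluster frontend.
# ForwardX11 matches "ssh -X"; LocalForward matches "-L 8080:nodo1:8080".
bd1_ssh_config() {
  local user="${1:?usage: bd1_ssh_config <citius_user>}"
  cat <<EOF
Host bd1
    HostName master-bd1.inv.usc.es
    User ${user}
    ForwardX11 yes
    LocalForward 8080 nodo1:8080
EOF
}

# Usage (review the output before appending it):
#   bd1_ssh_config <citius_user> >> ~/.ssh/config
#   ssh bd1
</code>

With this entry in place, ''ssh bd1'' opens a session with both X11 forwarding and the Ambari tunnel.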

==== Use ====

To manage the cluster from the Apache Ambari web interface, first connect through SSH with the ''-L 8080:nodo1:8080'' option and then open http://localhost:8080/ in your browser.

The username and password are ''admin''/''admin''.
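The same tunnel also exposes Ambari's REST API under ''/api/v1/''. The sketch below assumes the tunnel is open on local port 8080 and that the ''admin''/''admin'' credentials above are unchanged. The helper names ''ambari_url'' and ''ambari_get'' and the ''AMBARI_*'' variables are hypothetical.

<code bash>
# Minimal sketch: read-only queries to the Ambari REST API through the SSH tunnel.
ambari_url() {
  # Build the API URL for a resource path such as "clusters".
  printf 'http://localhost:%s/api/v1/%s\n' "${AMBARI_PORT:-8080}" "$1"
}

ambari_get() {
  # -s hides curl's progress output; credentials default to admin/admin.
  curl -s -u "${AMBARI_USER:-admin}:${AMBARI_PASS:-admin}" "$(ambari_url "$1")"
}

# Usage (<cluster_name> is a placeholder; list the clusters first to find it):
#   ambari_get clusters
#   ambari_get "clusters/<cluster_name>/services"
</code>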

==== Software ====
The following software can be managed from the Ambari web interface:
  * HDFS 2.7.3
  * YARN 2.7.3
  * Tez 0.7.0
  * Hive 1.2.1000
  * Pig 0.16.0
  * ZooKeeper 3.4.6
  * Storm 1.1.0
  * Spark 1.6.3
  * Spark2 2.2.0
  * Zeppelin Notebook 0.7.3
  * Slider 0.92.0
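Before running jobs from the master node, it can help to check which client commands are actually on your ''PATH''. The sketch below makes no claim about which clients are installed on the master node; the helper name ''check_clients'' is hypothetical, and it only reports whether each command is present.

<code bash>
# Hypothetical helper: report whether each given command is available on PATH.
check_clients() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s: found\n' "$cmd"
    else
      printf '%s: missing\n' "$cmd"
    fi
  done
}

# Usage:
#   check_clients hdfs yarn hive pig spark-submit
</code>

For clients that are found, ''hdfs version'' and ''spark-submit %%--%%version'' print the installed versions, which you can compare against the list above.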

The following table lists other software deployed and managed by individual users:
^   User    ^    Software   ^   Nodes   ^   Notes   ^
|   David Luaces Cachaza  |   MongoDB Sharding   |  All  |  Until 12/19  |
|   Cesar Piñeiro Pomar   |   GlusterFS   |   All   |      |
|   Rodrigo Martinez Castaño  |  Docker  |  3 and 4  |    |

==== Hardware ====
16 Dell EMC PowerEdge R730 servers, each with the following configuration:
  * 2 x Intel Xeon E5-2630 v4 (2.2 GHz, 10 cores)
  * 384 GB RAM: 12 x 32 GB RDIMM 2400 MT/s
  * 32 TB HDD: 8 x 4 TB 7.2k RPM SATA 6 Gbps disks in JBOD
  * 2 x 10 Gb BASE-T and 2 x 1 Gb BASE-T network ports
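The aggregate capacity of the 16 servers follows directly from the per-node figures above. The sketch below does the arithmetic. The usable HDFS estimate assumes the default HDFS replication factor of 3, which may differ from this cluster's actual configuration, and it ignores space reserved for the operating system and non-HDFS data.

<code bash>
# Aggregate hardware capacity computed from the per-node specification.
nodes=16
sockets_per_node=2
cores_per_socket=10      # Intel Xeon E5-2630 v4
ram_gb_per_node=384      # 12 x 32 GB
disks_per_node=8
disk_tb=4
replication=3            # assumption: HDFS default dfs.replication

total_cores=$(( nodes * sockets_per_node * cores_per_socket ))
total_ram_gb=$(( nodes * ram_gb_per_node ))
total_raw_tb=$(( nodes * disks_per_node * disk_tb ))
usable_hdfs_tb=$(( total_raw_tb / replication ))   # integer division

printf 'Physical cores: %d\n' "$total_cores"            # 320
printf 'RAM: %d GB\n' "$total_ram_gb"                   # 6144
printf 'Raw disk: %d TB\n' "$total_raw_tb"              # 512
printf 'Usable HDFS (approx.): %d TB\n' "$usable_hdfs_tb"  # 170
</code>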