====== Cluster Big Data 1 ======
==== Service description ====
Cluster for experimentation and research in Big Data. Its goal is to provide a suitable hardware platform for Big Data jobs, with a highly flexible configuration that makes research and experimentation easy.
==== Activation ====
To request access, fill in the [[https://citius.usc.es/dashboard/enviar-incidencia|requests and problem reporting form]]. This form is only available to CITIUS users.
==== Access ====
The cluster can only be accessed through the master node, which acts as the cluster frontend:
<code>ssh -X <username>@master-bd1.inv.usc.es</code>
Replace ''<username>'' with your user name.
To access the Ambari web interface, you need to forward port 8080 of ''nodo1'' through an SSH tunnel. If you want to use this interface, connect with the following command instead:
<code>ssh -L 8080:nodo1:8080 <username>@master-bd1.inv.usc.es</code>
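If you connect often, the two SSH commands on this page can be wrapped in a small helper. The following is a minimal Python sketch using only the standard library; the function name ''ssh_command'', the ''--ambari'' flag and the script name are hypothetical choices for illustration, not tools provided on the cluster.

```python
import shlex
import subprocess
import sys

MASTER_HOST = "master-bd1.inv.usc.es"  # cluster frontend, as given on this page


def ssh_command(user: str, ambari_tunnel: bool = False) -> list[str]:
    """Return the ssh argument list for connecting to the master node.

    With ambari_tunnel=True, local port 8080 is forwarded to port 8080 on
    nodo1 (the Ambari web interface); otherwise X11 forwarding (-X) is
    requested for a regular interactive session.
    """
    args = ["ssh"]
    if ambari_tunnel:
        args += ["-L", "8080:nodo1:8080"]
    else:
        args.append("-X")
    args.append(f"{user}@{MASTER_HOST}")
    return args


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python connect_bd1.py <username> [--ambari]
    cmd = ssh_command(sys.argv[1], ambari_tunnel="--ambari" in sys.argv[2:])
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=False)
```

For example, ''python connect_bd1.py alice --ambari'' would open the Ambari tunnel for a hypothetical user ''alice''; without ''--ambari'' it opens a normal session with X11 forwarding.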
==== Use ====
To manage the cluster from the Apache Ambari web interface, connect through SSH with the option ''-L 8080:nodo1:8080'' and then open http://localhost:8080/ in your browser.
The user name and password are ''admin'' and ''admin''.
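Besides the web interface, Ambari serves a REST API under ''/api/v1'' on the same port, so it can also be queried through the tunnel opened with ''-L 8080:nodo1:8080''. The following is a minimal Python sketch, assuming that tunnel is open and the ''admin''/''admin'' credentials above are still valid; the helper names are hypothetical. It lists the clusters registered in Ambari and the services installed in each, which can be compared with the software list below.

```python
import base64
import json
import urllib.request

# Local end of the SSH tunnel (-L 8080:nodo1:8080); only reachable while it is open.
AMBARI_URL = "http://localhost:8080"


def ambari_request(path: str, user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build a GET request for the Ambari REST API v1 with HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    req = urllib.request.Request(f"{AMBARI_URL}/api/v1/{path.lstrip('/')}")
    req.add_header("Authorization", f"Basic {token}")
    # Ambari requires this header on modifying requests; it is harmless on GETs.
    req.add_header("X-Requested-By", "ambari")
    return req


def get_json(path: str) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(ambari_request(path), timeout=10) as resp:
        return json.load(resp)


def list_clusters() -> list[str]:
    """Names of the clusters managed by this Ambari server."""
    items = get_json("clusters").get("items", [])
    return [item["Clusters"]["cluster_name"] for item in items]


def list_services(cluster: str) -> list[str]:
    """Names of the services (e.g. HDFS, YARN, SPARK2) installed in a cluster."""
    items = get_json(f"clusters/{cluster}/services").get("items", [])
    return [item["ServiceInfo"]["service_name"] for item in items]


if __name__ == "__main__":
    try:
        for name in list_clusters():
            print(name, list_services(name))
    except (OSError, ValueError) as exc:  # URLError is an OSError; bad JSON is a ValueError
        print(f"Could not query Ambari at {AMBARI_URL} (is the SSH tunnel open?): {exc}")
```

Read-only GET requests like these are a safe way to inspect the cluster; changing the configuration is better left to the web interface unless you know the API well.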
==== Software ====
The following software can be managed from the Ambari console:
* HDFS 2.7.3
* YARN 2.7.3
* Tez 0.7.0
* Hive 1.2.1000
* Pig 0.16.0
* ZooKeeper 3.4.6
* Storm 1.1.0
* Spark 1.6.3
* Spark2 2.2.0
* Zeppelin Notebook 0.7.3
* Slider 0.92.0
The following table lists other software deployed and managed by individual users:
^ User ^ Software ^ Nodes ^ Notes ^
| David Luaces Cachaza | MongoDB Sharding | All | Until 12/19 |
| Cesar Piñeiro Pomar | GlusterFS | All | |
| Rodrigo Martinez Castaño | Docker | 3 and 4 | |
==== Hardware ====
16 Dell EMC PowerEdge R730 servers, each with the following configuration:
* 2 x Intel Xeon E5-2630 v4 (2.2 GHz, 10 cores)
* 384 GB RAM: 12 x 32 GB RDIMM, 2400 MT/s
* 32 TB HDD: 8 x 4 TB 7.2k RPM SATA 6 Gbps, in JBOD
* Network: 2 x 10 Gb BASE-T and 2 x 1 Gb BASE-T
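The per-node figures above can be multiplied out to estimate the aggregate resources of the cluster. The following is a minimal Python sketch; the replication factor of 3 is an assumption (it is the HDFS default for ''dfs.replication'', but this page does not state the configured value), and treating all eight disks of every node as HDFS storage makes the usable figure an upper bound, since other software in the table above also runs on the nodes.

```python
NODES = 16
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 10          # Xeon E5-2630 v4: 10 cores per CPU
THREADS_PER_CORE = 2           # Hyper-Threading
DIMMS_PER_NODE, DIMM_GB = 12, 32
DISKS_PER_NODE, DISK_TB = 8, 4
HDFS_REPLICATION = 3           # assumption: HDFS default dfs.replication


def cluster_totals(replication: int = HDFS_REPLICATION) -> dict:
    """Aggregate compute, memory and storage over all nodes."""
    cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    raw_disk_tb = NODES * DISKS_PER_NODE * DISK_TB       # 8 x 4 TB = 32 TB per node
    return {
        "physical_cores": cores,
        "hardware_threads": cores * THREADS_PER_CORE,
        "ram_gb": NODES * DIMMS_PER_NODE * DIMM_GB,      # 12 x 32 GB = 384 GB per node
        "raw_disk_tb": raw_disk_tb,
        # Upper bound: assumes every disk holds only HDFS blocks.
        "hdfs_usable_tb_upper_bound": raw_disk_tb / replication,
    }


if __name__ == "__main__":
    for key, value in cluster_totals().items():
        print(f"{key}: {value:g}")
```

Run as a script, it reports 320 physical cores (640 hardware threads), 6144 GB of RAM, 512 TB of raw disk and at most about 170.7 TB of HDFS capacity with three replicas.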