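HDFS Persistence Management on DC/OS
Beyond efficient, low-cost management of clusters and data centers, DC/OS has clear strengths in big data analytics and stateful services.
Stateful services are provided mainly through dcos-commons.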
For example: http://192.168.0.250/metadata
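dcos-commons: Simplifying stateful services for Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS. See the dcos-commons documentation.
Releases of DC/OS HDFS are also developed and versioned in the dcos-commons repository.
Because HDFS is a persistent storage option officially supported by DC/OS, it is used here to provide persistence for containers and stateful services.
Implementing this deployment
DC/OS HDFS provides the following features: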
- Single-command installation for rapid provisioning
- Persistent storage volumes for enhanced data durability
- Runtime configuration and software updates for high availability
- Health checks and metrics for monitoring
- Distributed storage scale out
- HA name service with Quorum Journaling and ZooKeeper failure detection
HDFS node configuration
"journal_node": {
  "cpus": 0.5,
  "mem": 4096,
  "disk": 10240,
  "disk_type": "ROOT",
  "strategy": "parallel"
},
"name_node": {
  "cpus": 0.5,
  "mem": 4096,
  "disk": 10240,
  "disk_type": "ROOT"
},
"zkfc_node": {
  "cpus": 0.5,
  "mem": 4096
},
"data_node": {
  "count": 3,
  "cpus": 0.5,
  "mem": 4096,
  "disk": 10240,
  "disk_type": "ROOT",
  "strategy": "parallel"
}
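To size the cluster before installing, it can help to total the resources this configuration requests. The following is a minimal Python sketch, not part of DC/OS itself. The per-node values are copied from the configuration above. The instance counts for journal nodes, name nodes, and ZKFC nodes are assumptions (3, 2, and 2, a typical HA layout), since only data_node carries an explicit count here, and the function name total_resources is hypothetical. Memory and disk are assumed to be in MB, and scheduler overhead is not included.

```python
import json

# Per-node resources copied from the node configuration above.
NODE_CONFIG = json.loads("""
{
  "journal_node": {"cpus": 0.5, "mem": 4096, "disk": 10240},
  "name_node": {"cpus": 0.5, "mem": 4096, "disk": 10240},
  "zkfc_node": {"cpus": 0.5, "mem": 4096},
  "data_node": {"count": 3, "cpus": 0.5, "mem": 4096, "disk": 10240}
}
""")

# ASSUMPTION: instance counts for node types that have no explicit "count".
# Adjust these to match the actual deployment.
ASSUMED_COUNTS = {"journal_node": 3, "name_node": 2, "zkfc_node": 2}


def total_resources(config, assumed_counts):
    """Sum cpus, mem, and disk over all node types, weighted by instance count."""
    totals = {"cpus": 0.0, "mem": 0, "disk": 0}
    for node_type, spec in config.items():
        # Prefer the explicit count, then the assumed one, then a single instance.
        count = spec.get("count", assumed_counts.get(node_type, 1))
        for key in totals:
            # Node types without a disk entry (zkfc_node) contribute 0 disk.
            totals[key] += count * spec.get(key, 0)
    return totals


totals = total_resources(NODE_CONFIG, ASSUMED_COUNTS)
print(totals)
```

Under these assumed counts the request comes to 5 CPUs, 40960 MB of memory, and 81920 MB of disk across the agents.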
HDFS port settings fall into three groups: name node, journal node, and data node.
{
  "hdfs": {
    "name_node_rpc_port": 9001,
    "name_node_http_port": 9002,
    "journal_node_rpc_port": 8485,
    "journal_node_http_port": 8480,
    "data_node_rpc_port": 9005,
    "data_node_http_port": 9006,
    "data_node_ipc_port": 9007,
    "permissions_enabled": false,
    "name_node_heartbeat_recheck_interval": 60000,
    "compress_image": true,
    "image_compression_codec": "org.apache.hadoop.io.compress.SnappyCodec"
  }
}
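When these ports are customized, a quick sanity check can catch duplicates or out-of-range values before deployment. The sketch below is an illustration in Python, not a DC/OS tool. The port values are copied from the configuration above, and the helper name find_port_problems is hypothetical.

```python
# Port settings copied from the "hdfs" section above.
HDFS_PORTS = {
    "name_node_rpc_port": 9001,
    "name_node_http_port": 9002,
    "journal_node_rpc_port": 8485,
    "journal_node_http_port": 8480,
    "data_node_rpc_port": 9005,
    "data_node_http_port": 9006,
    "data_node_ipc_port": 9007,
}


def find_port_problems(ports):
    """Return a list of messages for out-of-range or duplicated ports."""
    problems = []
    seen = {}  # port number -> first setting name that claimed it
    for name, port in ports.items():
        if not 1 <= port <= 65535:
            problems.append(f"{name}: {port} is outside 1-65535")
        elif port in seen:
            problems.append(f"{name}: {port} is already used by {seen[port]}")
        else:
            seen[port] = name
    return problems


problems = find_port_problems(HDFS_PORTS)
print(problems or "no port conflicts found")
```

This only checks the settings against each other. It does not verify that the ports are free on the agents or fall inside the port ranges the cluster offers.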
Errors that may occur when deploying HDFS are mainly caused by SSL configuration problems:
2018/04/23 14:21:26 No $MESOS_SANDBOX/.ssl directory found. Cannot install certificate. Error: stat /var/lib/mesos/slave/slaves/e1d2e6c5-6a6e-455d-96cc-f2b17213c33f-S2/frameworks/e1d2e6c5-6a6e-455d-96cc-f2b17213c33f-0000/executors/hdfs.6e37052e-46be-11e8-83f7-16e3007733d8/runs/ac880e14-2b5e-4dda-b6ab-f1b784264f8b/.ssl: no such file or directory
2018/04/23 14:21:26 SDK Bootstrap successful.
Exception in thread "main" java.lang.NullPointerException
at java.util.Base64$Decoder.decode(Base64.java:549)
at com.mesosphere.sdk.hdfs.scheduler.Main.getHDFSUserAuthMappings(Main.java:122)
at com.mesosphere.sdk.hdfs.scheduler.Main.createSchedulerBuilder(Main.java:61)
at com.mesosphere.sdk.hdfs.scheduler.Main.main(Main.java:49)
I0423 14:21:30.079073 13 executor.cpp:933] Command exited with status 1 (pid: 15)
I0423 14:21:31.081287 10 checker_process.cpp:244] Stopped HTTP health check for task 'hdfs.6e37052e-46be-11e8-83f7-16e3007733d8'
I0423 14:21:31.082377 14 process.cpp:1068] Failed to accept socket: future discarded
For reference, see SSL in Mesos.
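When triaging a scheduler log like the one above, it can help to pull out the exception type and the first stack frame that belongs to the scheduler's own code. The following Python sketch is one way to do that. The helper name summarize and the package prefix used as a filter are assumptions chosen for illustration, and the log excerpt is copied from the output above.

```python
import re

# Stack trace copied from the deployment log above.
LOG_EXCERPT = """\
Exception in thread "main" java.lang.NullPointerException
at java.util.Base64$Decoder.decode(Base64.java:549)
at com.mesosphere.sdk.hdfs.scheduler.Main.getHDFSUserAuthMappings(Main.java:122)
at com.mesosphere.sdk.hdfs.scheduler.Main.createSchedulerBuilder(Main.java:61)
at com.mesosphere.sdk.hdfs.scheduler.Main.main(Main.java:49)
"""

EXCEPTION_RE = re.compile(r'Exception in thread "[^"]+" (\S+)')
FRAME_RE = re.compile(r"^\s*at (\S+)\((\S+):(\d+)\)$")


def summarize(log_text, app_prefix="com.mesosphere."):
    """Return (exception, method, line) for the first frame under app_prefix."""
    exception = None
    for line in log_text.splitlines():
        header = EXCEPTION_RE.search(line)
        if header:
            exception = header.group(1)
            continue
        frame = FRAME_RE.match(line)
        # Skip JDK frames; report the first frame from the application package.
        if exception and frame and frame.group(1).startswith(app_prefix):
            return exception, frame.group(1), int(frame.group(3))
    return exception, None, None


summary = summarize(LOG_EXCERPT)
print(summary)
```

For this log the first application frame is getHDFSUserAuthMappings, which called the Base64 decoder. That points at the value being decoded there as the place to start looking, though the trace alone does not show why it was null.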