Alex's Hadoop Tutorial for Beginners: Lesson 18, Accessing HDFS over HTTP


Disclaimer

  • This article is based on CentOS 6.x + CDH 5.x

What is HttpFs for?

HttpFs can do these two things:
  • It lets you manage files on HDFS from a browser
  • It provides a set of REST-style APIs for managing HDFS
A pretty simple thing, really, but very practical.

Installing HttpFs

Pick a machine in the cluster that can access HDFS and install HttpFs there:
$ sudo yum install hadoop-httpfs
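
If you want to sanity-check what the package installed, the following works on a typical CDH 5 package layout (the config path is an assumption; adjust for your setup):
$ rpm -q hadoop-httpfs            # confirm the package is installed
$ ls /etc/hadoop-httpfs/conf      # HttpFs config directory in a typical CDH package layout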

Configuration

Edit /etc/hadoop/conf/core-site.xml and add:
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
These properties define the hosts and user groups that are allowed to go through httpfs (a Hadoop proxy-user setting); * means no restriction. Restart Hadoop once the configuration is in place.
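
On a package-based CDH 5 install, restarting the HDFS daemons looks roughly like this; the service names below are typical for CDH packages but may differ on your cluster, and with Cloudera Manager you would restart from the UI instead:
$ sudo service hadoop-hdfs-namenode restart   # on the NameNode host
$ sudo service hadoop-hdfs-datanode restart   # on every DataNode host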

Starting HttpFs

$ sudo service hadoop-httpfs start
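
HttpFs listens on port 14000 by default. A quick sanity check is to ask for the home directory of the httpfs user; the response below is the usual WebHDFS shape, though the exact path depends on your cluster:
$ curl -s "http://localhost:14000/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=httpfs"
{"Path":"/user/httpfs"}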

Using HttpFs


Open a browser and visit http://host2:14000/webhdfs/v1?op=LISTSTATUS&user.name=httpfs and you will see:
{
	"FileStatuses": {
		"FileStatus": [{
			"pathSuffix": "hbase",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "hbase",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423446940595,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "tmp",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "hdfs",
			"group": "hadoop",
			"permission": "1777",
			"accessTime": 0,
			"modificationTime": 1423122488037,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "user",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "hdfs",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423529997937,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "var",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "hdfs",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1422945036465,
			"blockSize": 0,
			"replication": 0
		}]
	}
}
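
The same request can be replayed from the command line with curl; it is exactly the browser URL from above:
$ curl -s "http://host2:14000/webhdfs/v1?op=LISTSTATUS&user.name=httpfs"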

The &user.name=httpfs parameter means the request runs as the default user httpfs; this default user has no password.
/webhdfs/v1 is the root path of HttpFs.
Visit http://host2:14000/webhdfs/v1/user?op=LISTSTATUS&user.name=httpfs and you will see:
{
	"FileStatuses": {
		"FileStatus": [{
			"pathSuffix": "cloudera",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "root",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423472508868,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "hdfs",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "hdfs",
			"group": "hadoop",
			"permission": "700",
			"accessTime": 0,
			"modificationTime": 1422947019504,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "history",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "mapred",
			"group": "hadoop",
			"permission": "1777",
			"accessTime": 0,
			"modificationTime": 1422945692887,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "hive",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "hive",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423123187569,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "hive_people",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "root",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423216966453,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "hive_people2",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "root",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423222237254,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "impala",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "root",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423475272189,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "root",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "root",
			"group": "hadoop",
			"permission": "700",
			"accessTime": 0,
			"modificationTime": 1423221719835,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "spark",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "spark",
			"group": "spark",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423530243396,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "sqoop",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "hdfs",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423127462911,
			"blockSize": 0,
			"replication": 0
		},
		{
			"pathSuffix": "test_hive",
			"type": "DIRECTORY",
			"length": 0,
			"owner": "root",
			"group": "hadoop",
			"permission": "755",
			"accessTime": 0,
			"modificationTime": 1423215687891,
			"blockSize": 0,
			"replication": 0
		}]
	}
}



Strangely enough, HttpFs itself has very little documentation; for the concrete commands you have to look at the WebHDFS documentation (WebHDFS REST API). The supported operations are listed below, with an example request after the list.

Operations

  • HTTP GET
    • OPEN (see FileSystem.open)
    • GETFILESTATUS (see FileSystem.getFileStatus)
    • LISTSTATUS (see FileSystem.listStatus)
    • GETCONTENTSUMMARY (see FileSystem.getContentSummary)
    • GETFILECHECKSUM (see FileSystem.getFileChecksum)
    • GETHOMEDIRECTORY (see FileSystem.getHomeDirectory)
    • GETDELEGATIONTOKEN (see FileSystem.getDelegationToken)
  • HTTP PUT
    • CREATE (see FileSystem.create)
    • MKDIRS (see FileSystem.mkdirs)
    • RENAME (see FileSystem.rename)
    • SETREPLICATION (see FileSystem.setReplication)
    • SETOWNER (see FileSystem.setOwner)
    • SETPERMISSION (see FileSystem.setPermission)
    • SETTIMES (see FileSystem.setTimes)
    • RENEWDELEGATIONTOKEN (see DistributedFileSystem.renewDelegationToken)
    • CANCELDELEGATIONTOKEN (see DistributedFileSystem.cancelDelegationToken)
  • HTTP POST
    • APPEND (see FileSystem.append)
  • HTTP DELETE
    • DELETE (see FileSystem.delete)
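
Every operation follows the same URL pattern we have been using: the listed HTTP method plus an op= query parameter. For example, GETFILESTATUS is a plain GET and returns a single FileStatus object rather than a list (hostname reused from the examples above):
$ curl -s "http://host2:14000/webhdfs/v1/user?op=GETFILESTATUS&user.name=httpfs"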

Creating a directory

Try creating a directory called abc:
[root@host2 hadoop-httpfs]# curl -i -X PUT "http://xmseapp03:14000/webhdfs/v1/user/abc?op=MKDIRS&user.name=httpfs"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=httpfs&p=httpfs&t=simple&e=1423573951025&s=Ab44ha1Slg1f4xCrK+x4R/s1eMY="; Path=/; Expires=Tue, 10-Feb-2015 13:12:31 GMT; HttpOnly
Content-Type: application/json
Transfer-Encoding: chunked
Date: Tue, 10 Feb 2015 03:12:36 GMT

{"boolean":true}
Then check the result with the hdfs dfs -ls command on the server:
[root@host2 conf]# hdfs dfs -ls /user
Found 12 items
drwxr-xr-x   - httpfs hadoop          0 2015-02-10 11:12 /user/abc
drwxr-xr-x   - root   hadoop          0 2015-02-09 17:01 /user/cloudera
drwx------   - hdfs   hadoop          0 2015-02-03 15:03 /user/hdfs
drwxrwxrwt   - mapred hadoop          0 2015-02-03 14:41 /user/history
drwxr-xr-x   - hive   hadoop          0 2015-02-05 15:59 /user/hive
drwxr-xr-x   - root   hadoop          0 2015-02-06 18:02 /user/hive_people
drwxr-xr-x   - root   hadoop          0 2015-02-06 19:30 /user/hive_people2
drwxr-xr-x   - root   hadoop          0 2015-02-09 17:47 /user/impala
drwx------   - root   hadoop          0 2015-02-06 19:21 /user/root
drwxr-xr-x   - spark  spark           0 2015-02-10 09:04 /user/spark
drwxr-xr-x   - hdfs   hadoop          0 2015-02-05 17:11 /user/sqoop
drwxr-xr-x   - root   hadoop          0 2015-02-06 17:41 /user/test_hive
You can see that a directory abc owned by httpfs has been created.
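
The other PUT operations work the same way; for instance, renaming the new directory would look like this (a sketch, not actually run here; destination must be an absolute path):
$ curl -i -X PUT "http://xmseapp03:14000/webhdfs/v1/user/abc?op=RENAME&destination=/user/abc2&user.name=httpfs"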

Opening a file

From the server side, upload a text file test.txt to the /user/abc directory, with the content
Hello World!
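
Incidentally, the upload itself could also go through HttpFs using the CREATE operation. A sketch, assuming test.txt sits in the current directory; HttpFs normally expects the data=true parameter and an octet-stream Content-Type, otherwise it rejects the request body:
$ curl -i -X PUT -T test.txt -H "Content-Type: application/octet-stream" \
    "http://xmseapp03:14000/webhdfs/v1/user/abc/test.txt?op=CREATE&data=true&user.name=httpfs"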

Access it via HttpFs:
[root@host2 hadoop-httpfs]# curl -i -X GET "http://xmseapp03:14000/webhdfs/v1/user/abc/test.txt?op=OPEN&user.name=httpfs"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=httpfs&p=httpfs&t=simple&e=1423574166943&s=JTxqIJUsblVBeHVuTs6JCV2UbBs="; Path=/; Expires=Tue, 10-Feb-2015 13:16:06 GMT; HttpOnly
Content-Type: application/octet-stream
Content-Length: 13
Date: Tue, 10 Feb 2015 03:16:07 GMT

Hello World!
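
To clean up afterwards there is the DELETE operation; on success it returns the same {"boolean":true} body as MKDIRS (add recursive=true to delete a non-empty directory):
$ curl -i -X DELETE "http://xmseapp03:14000/webhdfs/v1/user/abc/test.txt?op=DELETE&user.name=httpfs"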




