webhdfspy
A Python 2/3 wrapper library to access Hadoop WebHDFS REST API
Python versions
webhdfspy supports Python 2.7 and 3.4.
Usage
>>> import webhdfspy
>>> webHDFS = webhdfspy.WebHDFSClient("localhost", 50070, "username")
>>> print(webHDFS.listdir('/'))
[]
>>> webHDFS.mkdir('/foo')
True
>>> print(webHDFS.listdir('/'))
[{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0, u'accessTime': 0, u'pathSuffix': u'foo', u'modificationTime': 1429805040695, u'replication': 0, u'length': 0, u'childrenNum': 0, u'owner': u'username', u'storagePolicy': 0, u'type': u'DIRECTORY', u'fileId': 16387}]
>>> print(webHDFS.create('/foo/foo.txt', "just put some text here", True))
True
>>> print(webHDFS.open('/foo/foo.txt'))
just put some text here
>>> webHDFS.remove('/foo', recursive=True)
True
>>> print(webHDFS.listdir('/'))
[]
API Documentation
class webhdfspy.webhdfspy.WebHDFSClient(host, port, username=None, logger=None)
__init__(host, port, username=None, logger=None)
Create a new WebHDFS client. When security is on, a username must be specified.
Parameters: - host – hostname of the HDFS namenode
- port – port of the namenode
- username – username used for authentication
append(path, file_data, buffersize=None)
Append file_data to a file
Parameters: - path – path of the file
- file_data – data to append to the file
- buffersize – the size of the buffer used to transfer the data
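For example, appending to the file created in the Usage section above (assuming the same client and that /foo/foo.txt still exists):

>>> webHDFS.append('/foo/foo.txt', " and some more text")
>>> print(webHDFS.open('/foo/foo.txt'))
just put some text here and some more text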
chmod(path, permission)
Set the permissions of a file or directory
Parameters: - path – path of the file/dir
- permission – dir permissions in octal (0-777)
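For example, to restrict the /foo directory from the Usage section to its owner only:

>>> webHDFS.chmod('/foo', 700)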
copyfromlocal(local_path, hdfs_path, overwrite=None)
Copy a file from the local filesystem to HDFS
Parameters: - local_path – path of the local file to copy
- hdfs_path – destination path in HDFS
- overwrite – whether to overwrite an existing file
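For example, uploading a local file to the /foo directory from the Usage section (the local path here is hypothetical):

>>> webHDFS.copyfromlocal('/tmp/report.csv', '/foo/report.csv', overwrite=True)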
create(path, file_data, overwrite=None)
Create a new file in HDFS with the content of file_data
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE
Parameters: - path – the path of the file to create
- file_data – the data to write to the file
- overwrite – whether to overwrite an existing file
environ_home()
Returns: the home directory of the current user
get_checksum(path)
Returns the checksum of a file
Parameters: path – path of the file Returns: a FileChecksum JSON object
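For example, fetching the checksum of the file created in the Usage section (the returned dictionary follows the WebHDFS FileChecksum schema, with algorithm, bytes and length fields):

>>> checksum = webHDFS.get_checksum('/foo/foo.txt')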
listdir(path='/')
List all the contents of a directory
Parameters: path – path of the directory Returns: a list of fileStatusProperties (http://hadoop.apache.org/common/docs/r1.0.0/webhdfs.html#fileStatusProperties), or False on error
mkdir(path, permission=None)
Create a directory hierarchy, like the unix command mkdir -p
Parameters: - path – the path of the directory
- permission – dir permissions in octal (0-777)
open(path, offset=None, length=None, buffersize=None)
Open a file to read
Parameters: - path – path of the file
- offset – starting byte position
- length – number of bytes to read
- buffersize – the size of the buffer used to transfer the data
Returns: the file data
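For example, reading a slice of the file created in the Usage section (offset and length are byte-based, as in the WebHDFS OPEN operation):

>>> print(webHDFS.open('/foo/foo.txt', offset=5, length=8))
put some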
remove(path, recursive=False)
Delete a file or directory
Parameters: - path – path of the file or dir to delete
- recursive – set to True to also delete subdirectories and their contents
rename(src, dst)
Rename a file or directory
Parameters: - src – path of the file or dir to rename
- dst – path of the final file/dir
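For example, renaming the file created in the Usage section:

>>> webHDFS.rename('/foo/foo.txt', '/foo/bar.txt')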
set_replication(path, replication_factor)
Set the replication factor of a file
Parameters: - path – path of the file
- replication_factor – number of replications, should be > 0
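For example, keeping two replicas of the file created in the Usage section (this only has a visible effect on a cluster with multiple datanodes):

>>> webHDFS.set_replication('/foo/foo.txt', 2)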
status(path)
Returns the status of a file/dir
Parameters: path – path of the file/dir Returns: a FileStatus dictionary on success, False otherwise
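For example, inspecting the /foo directory created in the Usage section (the type field comes from the same FileStatus properties shown in the listdir output above):

>>> status = webHDFS.status('/foo')
>>> print(status['type'])
DIRECTORY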