webhdfspy

A Python 2/3 wrapper library to access the Hadoop WebHDFS REST API

Installation

To install webhdfspy from PyPI:

$ pip install webhdfspy

Python versions

webhdfspy supports Python 2.7 and 3.4

Usage

>>> import webhdfspy
>>> webHDFS = webhdfspy.WebHDFSClient("localhost", 50070, "username")
>>> print(webHDFS.listdir('/'))
[]
>>> webHDFS.mkdir('/foo')
True
>>> print(webHDFS.listdir('/'))
[{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0, u'accessTime': 0, u'pathSuffix': u'foo', u'modificationTime': 1429805040695, u'replication': 0, u'length': 0, u'childrenNum': 0, u'owner': u'username', u'storagePolicy': 0, u'type': u'DIRECTORY', u'fileId': 16387}]
>>> print(webHDFS.create('/foo/foo.txt', "just put some text here", True))
True
>>> print(webHDFS.open('/foo/foo.txt'))
just put some text here
>>> webHDFS.remove('/foo', recursive=True)
True
>>> print(webHDFS.listdir('/'))
[]

API Documentation

class webhdfspy.webhdfspy.WebHDFSClient(host, port, username=None, logger=None)
__init__(host, port, username=None, logger=None)

Create a new WebHDFS client.

When security is on, a username must be specified.

Parameters:
  • host – hostname of the HDFS namenode
  • port – port of the namenode
  • username – username used for authentication
  • logger – optional logger instance

append(path, file_data, buffersize=None)

Append file_data to a file

Parameters:
  • path – path of the file
  • file_data – data to append to the file
  • buffersize – the size of the buffer used to transfer the data
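
For example, continuing the session from the Usage section (append's return value is not documented, so it is not shown here):

>>> webHDFS.create('/foo/foo.txt', "first line\n", True)
True
>>> webHDFS.append('/foo/foo.txt', "second line")
>>> print(webHDFS.open('/foo/foo.txt'))
first line
second line
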
chmod(path, permission)

Set the permissions of a file or directory

Parameters:
  • path – path of the file/dir
  • permission – file/dir permissions in octal (0-777)
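
For example, restricting a directory to its owner; this sketch assumes the permission is passed as the octal digits, e.g. 700:

>>> webHDFS.mkdir('/private')
True
>>> webHDFS.chmod('/private', 700)
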
copyfromlocal(local_path, hdfs_path, overwrite=None)

Copy a file from the local filesystem to HDFS

Parameters:
  • local_path – path of the local file to copy
  • hdfs_path – destination path in HDFS
  • overwrite – whether to overwrite the destination file if it already exists
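
A sketch, using a hypothetical local file /tmp/data.csv:

>>> webHDFS.copyfromlocal('/tmp/data.csv', '/data/data.csv', overwrite=True)
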
create(path, file_data, overwrite=None)

Create a new file in HDFS with the content of file_data

https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE

Parameters:
  • path – path of the file to create
  • file_data – the data to write to the file
  • overwrite – whether to overwrite an existing file
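
As in the Usage section, passing True as the overwrite argument replaces an existing file:

>>> webHDFS.create('/foo/bar.txt', "version 1", True)
True
>>> webHDFS.create('/foo/bar.txt', "version 2", True)
True
>>> print(webHDFS.open('/foo/bar.txt'))
version 2
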
environ_home()
Returns: the home directory of the user
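
A sketch; by HDFS convention home directories live under /user/<username>, so the result typically looks like this:

>>> webHDFS.environ_home()
u'/user/username'
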
get_checksum(path)

Returns the checksum of a file

Parameters: path – path of the file
Returns: a FileChecksum JSON object
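
The FileChecksum object follows the WebHDFS JSON schema (algorithm, bytes, length); a sketch of its shape, with the checksum bytes elided:

>>> webHDFS.get_checksum('/foo/foo.txt')
{u'algorithm': u'MD5-of-0MD5-of-512CRC32C', u'bytes': u'...', u'length': 28}
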
listdir(path='/')

List all the contents of a directory

Parameters: path – path of the directory
Returns: a list of fileStatusProperties (see http://hadoop.apache.org/common/docs/r1.0.0/webhdfs.html#fileStatusProperties), or False on error

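The fileStatusProperties entries are plain dictionaries, so extracting, say, the entry names is straightforward (continuing the Usage session, where only /foo exists):

>>> [f['pathSuffix'] for f in webHDFS.listdir('/')]
[u'foo']
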
mkdir(path, permission=None)

Create a directory hierarchy, like the unix command mkdir -p

Parameters:
  • path – the path of the directory
  • permission – dir permissions in octal (0-777)
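
Like mkdir -p, intermediate directories are created as needed:

>>> webHDFS.mkdir('/a/b/c')
True
>>> [f['pathSuffix'] for f in webHDFS.listdir('/a/b')]
[u'c']
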
open(path, offset=None, length=None, buffersize=None)

Open a file to read

Parameters:
  • path – path of the file
  • offset – starting byte position
  • length – number of bytes to read
  • buffersize – the size of the buffer used to transfer the data
Returns: the file data

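offset and length select a byte range of the file; continuing with the file created in the Usage section:

>>> webHDFS.create('/foo/foo.txt', "just put some text here", True)
True
>>> webHDFS.open('/foo/foo.txt', offset=5, length=3)
u'put'
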
remove(path, recursive=False)

Delete a file or directory

Parameters:
  • path – path of the file or dir to delete
  • recursive – set to True to delete a non-empty directory and all of its contents
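
A sketch; a single file can be removed directly, while a non-empty directory needs recursive=True:

>>> webHDFS.remove('/foo/foo.txt')
True
>>> webHDFS.remove('/a', recursive=True)
True
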
rename(src, dst)

Rename a file or directory

Parameters:
  • src – path of the file or dir to rename
  • dst – path of the final file/dir
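
A sketch, assuming rename returns the boolean result of the underlying RENAME operation:

>>> webHDFS.rename('/foo/foo.txt', '/foo/renamed.txt')
True
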
set_replication(path, replication_factor)

Set the replication factor of a file

Parameters:
  • path – path of the file
  • replication_factor – number of replicas, should be > 0
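
A sketch, assuming the boolean result of the SETREPLICATION operation is returned:

>>> webHDFS.set_replication('/foo/foo.txt', 2)
True
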
status(path)

Returns the status of a file/dir

Parameters: path – path of the file/dir
Returns: a FileStatus dictionary on success, False otherwise
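
The FileStatus dictionary has the same keys as the listdir entries shown in the Usage section, e.g.:

>>> webHDFS.status('/foo')['type']
u'DIRECTORY'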
