webhdfspy
A Python wrapper library for accessing the Hadoop WebHDFS REST API.
Installation
To install webhdfspy from PyPI:
$ pip install webhdfspy
Python versions
webhdfspy requires Python 3.9+
Usage
>>> import webhdfspy
>>> client = webhdfspy.WebHDFSClient("localhost", 50070, "username")
>>> print(client.listdir('/'))
[]
>>> client.mkdir('/foo')
True
>>> print(client.listdir('/'))
[{'group': 'supergroup', 'permission': '755', ...}]
>>> client.create('/foo/foo.txt', "just put some text here", overwrite=True)
True
>>> print(client.open('/foo/foo.txt'))
just put some text here
>>> client.remove('/foo')
True
Using a context manager:
>>> with webhdfspy.WebHDFSClient("localhost", 50070, "username") as client:
...     client.listdir('/')
...
[]
HTTPS support:
>>> client = webhdfspy.WebHDFSClient("host", 9871, "user", scheme="https")
API Documentation
- class webhdfspy.WebHDFSClient(host: str, port: int, username: str | None = None, logger: Logger | None = None, *, timeout: float = 60.0, scheme: str = 'http')
Client for Hadoop WebHDFS REST API.
Supports context manager protocol for automatic resource cleanup:
with WebHDFSClient("host", 50070, username="user") as client:
    client.listdir("/")
- append(path: str, file_data: Any, buffersize: int | None = None) -> bool
Append data to a file.
- Parameters:
path – path of the file
file_data – data to append
buffersize – size of the buffer used to transfer the data
- cancel_delegation_token(token: str) -> bool
Cancel a delegation token.
- Parameters:
token – the delegation token
- chmod(path: str, permission: str) -> bool
Set the permissions of a file or directory.
- Parameters:
path – path of the file/dir
permission – permissions in octal (e.g. "755")
- close() -> None
Close the underlying HTTP session.
- copyfromlocal(local_path: str, hdfs_path: str, overwrite: bool | None = None) -> bool
Copy a file from the local filesystem to HDFS.
- Parameters:
local_path – path of the local file
hdfs_path – HDFS destination path
overwrite – whether to overwrite an existing file
- create(path: str, file_data: Any, overwrite: bool | None = None) -> bool
Create a new file in HDFS.
Uses the two-step WebHDFS create protocol (NameNode redirect then DataNode upload).
- Parameters:
path – the file path to create
file_data – the data to write
overwrite – whether to overwrite an existing file
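The first step of the two-step protocol can be illustrated by building the NameNode request URL by hand. The sketch below follows the URL layout described in the WebHDFS REST documentation (`/webhdfs/v1/<path>?op=CREATE`). The helper name `build_create_url` is hypothetical, and this is not necessarily how webhdfspy assembles its requests internally.

```python
from urllib.parse import quote, urlencode


def build_create_url(host, port, path, username=None, overwrite=None, scheme="http"):
    """Build the NameNode URL for step 1 of a WebHDFS CREATE (hypothetical helper)."""
    params = {"op": "CREATE"}
    if username is not None:
        params["user.name"] = username
    if overwrite is not None:
        # WebHDFS expects lowercase "true"/"false" in the query string
        params["overwrite"] = "true" if overwrite else "false"
    # quote() keeps "/" intact and percent-encodes characters such as spaces
    return f"{scheme}://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


url = build_create_url("localhost", 50070, "/foo/foo.txt", "username", overwrite=True)
print(url)
```

A PUT to this URL is answered by the NameNode with a redirect whose Location header points at a DataNode; the file data is then sent with a second PUT to that location.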
- environ_home() -> str
Return the home directory of the user.
- get_checksum(path: str) -> dict[str, Any]
Return the checksum of a file.
- Parameters:
path – path of the file
- Returns:
FileChecksum dict
- get_content_summary(path: str) -> dict[str, Any]
Return the content summary of a directory.
- Parameters:
path – path of the directory
- Returns:
ContentSummary dict
- get_delegation_token(renewer: str) -> dict[str, Any]
Get a delegation token.
- Parameters:
renewer – the user who can renew the token
- Returns:
Token dict
- listdir(path: str = '/') -> list[dict[str, Any]]
List all the contents of a directory.
- Parameters:
path – path of the directory
- Returns:
a list of FileStatus dicts
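Because listdir returns one dict per entry, a recursive listing can be layered on top of it. The following is a minimal sketch that assumes each dict carries the `pathSuffix` and `type` fields defined by the WebHDFS FileStatus schema; `walk_files` is a hypothetical helper, not part of webhdfspy.

```python
import posixpath


def walk_files(client, root="/"):
    """Yield the full HDFS path of every non-directory entry below root."""
    for entry in client.listdir(root):
        # "pathSuffix" and "type" are assumed to follow the WebHDFS FileStatus schema
        full_path = posixpath.join(root, entry["pathSuffix"])
        if entry["type"] == "DIRECTORY":
            # Descend into subdirectories depth-first
            yield from walk_files(client, full_path)
        else:
            yield full_path
```

With a connected client, `list(walk_files(client, "/foo"))` would collect every file path under /foo.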
- mkdir(path: str, permission: str | None = None) -> bool
Create a directory hierarchy, like mkdir -p.
- Parameters:
path – the path of the directory
permission – directory permissions in octal (e.g. "755")
- open(path: str, offset: int | None = None, length: int | None = None, buffersize: int | None = None) -> str
Open a file to read.
- Parameters:
path – path of the file
offset – starting byte position
length – number of bytes to read
buffersize – size of the buffer used to transfer the data
- Returns:
the file data as text
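The offset and length parameters make it possible to read a large file piece by piece instead of in one call. The sketch below is one way to do that, assuming status() returns a FileStatus dict with a `length` field holding the size in bytes; `read_in_chunks` is a hypothetical helper. Since open returns text, a chunk boundary that falls inside a multi-byte character may not decode cleanly, so this is best suited to single-byte encodings.

```python
def read_in_chunks(client, path, chunk_size=1024 * 1024):
    """Yield the content of path piece by piece, at most chunk_size bytes per call."""
    # "length" is assumed to be the file size in bytes, as in the WebHDFS FileStatus schema
    total = client.status(path)["length"]
    offset = 0
    while offset < total:
        length = min(chunk_size, total - offset)
        yield client.open(path, offset=offset, length=length)
        offset += length
```

For example, `"".join(read_in_chunks(client, "/foo/foo.txt", chunk_size=4096))` would reassemble the file from 4 KiB reads.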
- remove(path: str, recursive: bool = False) -> bool
Delete a file or directory.
- Parameters:
path – path of the file or dir to delete
recursive – whether to also delete the contents of subdirectories
- rename(src: str, dst: str) -> bool
Rename a file or directory.
- Parameters:
src – path of the file or dir to rename
dst – destination path
- renew_delegation_token(token: str) -> int
Renew a delegation token.
- Parameters:
token – the delegation token
- Returns:
new expiration time in ms since epoch
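The returned expiration time is easier to log or compare once converted to a datetime. A minimal sketch of that conversion, using only the standard library; `expiry_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def expiry_to_datetime(expiry_ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(expiry_ms / 1000, tz=timezone.utc)


print(expiry_to_datetime(1700000000000).isoformat())  # 2023-11-14T22:13:20+00:00
```

The value returned by renew_delegation_token could be passed straight to this function.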
- set_owner(path: str, owner: str | None = None, group: str | None = None) -> bool
Set the owner and/or group of a file or directory.
- Parameters:
path – path of the file/dir
owner – new owner name
group – new group name
- set_replication(path: str, replication_factor: int) -> bool
Set the replication factor of a file.
- Parameters:
path – path of the file
replication_factor – number of replicas (> 0)
- set_times(path: str, modificationtime: int | None = None, accesstime: int | None = None) -> bool
Set modification and/or access time of a file.
- Parameters:
path – path of the file
modificationtime – modification time in ms since epoch
accesstime – access time in ms since epoch
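Both time parameters take integer milliseconds since the epoch, so a datetime has to be converted first. The sketch below shows one way to do this with exact integer arithmetic; `to_epoch_ms` and `EPOCH` are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to integer milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        # Naive datetimes are ambiguous, so reject them instead of guessing a zone
        raise ValueError("expected a timezone-aware datetime")
    # Floor division by a timedelta avoids floating-point rounding
    return (dt - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_ms(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200000
```

The result could then be passed as modificationtime or accesstime, for example `client.set_times(path, modificationtime=to_epoch_ms(dt))`.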
- status(path: str) -> dict[str, Any]
Return the FileStatus of a file or directory.
- Parameters:
path – path of the file/dir
- Returns:
a FileStatus dictionary
Exceptions
- class webhdfspy.WebHDFSException(msg: str)
Base exception for WebHDFS errors.
- class webhdfspy.WebHDFSRemoteException(message: str, status_code: int, exception: str = '', java_class_name: str = '')
Exception raised when WebHDFS returns a RemoteException.
- class webhdfspy.WebHDFSConnectionError(msg: str, cause: Exception | None = None)
Exception raised when a connection to WebHDFS fails.
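A connection failure is often transient, so callers may want to retry before giving up. The following is a generic sketch of such a retry loop; `call_with_retry` is a hypothetical helper that takes the exception type to retry on as an argument and is not part of webhdfspy.

```python
import time


def call_with_retry(operation, retry_on, attempts=3, delay=1.0):
    """Call operation(); retry when it raises one of the retry_on exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            time.sleep(delay)  # fixed pause between attempts
```

With webhdfspy this might be called as `call_with_retry(lambda: client.listdir('/'), webhdfspy.WebHDFSConnectionError)`. Errors of other types, such as a WebHDFSRemoteException reporting a missing path, are not caught and propagate immediately.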
WebHDFS documentation
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html