Welcome to intake_accumulo’s documentation!
This package enables the Intake data access and cataloging system to access data stored in Apache Accumulo.
Quickstart
This guide will show you how to get started using Intake to read an Accumulo table.
Installation
For conda users, install the Intake Accumulo plugin with the following command:
conda install -c intake intake-accumulo
Example: Reading an Accumulo table without a catalog
The simplest use case for this plugin is to read an existing Accumulo table.
Assuming the Accumulo instance is located at localhost:42424 and the table name
is stored in the variable table, the following reads the entire table into a
dataframe:
>>> import intake
>>> ds = intake.open_accumulo(table)
>>> df = ds.read()
>>> df
     row  column_family  column_qualifier  column_visibility                     time  value
0  row_0            cf1               cq1                     2018-05-15 22:53:37.990      0
1  row_0            cf2               cq2                     2018-05-15 22:53:38.009      0
2  row_1            cf1               cq1                     2018-05-15 22:53:38.018      1
3  row_1            cf2               cq2                     2018-05-15 22:53:38.026      1
4  row_2            cf1               cq1                     2018-05-15 22:53:38.034      2
5  row_2            cf2               cq2                     2018-05-15 22:53:38.042      2
6  row_3            cf1               cq1                     2018-05-15 22:53:38.049      3
7  row_3            cf2               cq2                     2018-05-15 22:53:38.057      3
8  row_4            cf1               cq1                     2018-05-15 22:53:38.065      4
9  row_4            cf2               cq2                     2018-05-15 22:53:38.072      4
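When the Accumulo instance is not at the default localhost:42424, the connection arguments documented for AccumuloSource (host, port, username, password) can presumably be supplied as keyword arguments. The sketch below assumes that intake.open_accumulo forwards keyword arguments to AccumuloSource. connection_kwargs and open_table are hypothetical helpers, not part of the plugin, and accumulo.example.com is a placeholder hostname.

```python
# Defaults copied from the AccumuloSource signature in the API reference.
DEFAULTS = {
    "host": "localhost",
    "port": 42424,
    "username": "root",
    "password": "secret",
}


def connection_kwargs(**overrides):
    """Merge user overrides into the documented connection defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected connection arguments: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def open_table(table, **overrides):
    """Open an Accumulo table (hypothetical helper, not part of the plugin)."""
    import intake  # imported lazily; needs intake and intake-accumulo installed

    # Assumption: open_accumulo forwards keyword arguments to AccumuloSource.
    return intake.open_accumulo(table, **connection_kwargs(**overrides))


if __name__ == "__main__":
    # Placeholder hostname; building the arguments makes no connection.
    print(connection_kwargs(host="accumulo.example.com"))
```

Rejecting unknown keys early means a misspelled argument such as hostname fails before any connection attempt rather than silently falling back to a default.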
Example: Reading an Accumulo table with a catalog
This example is equivalent to the one above, except that the table is now
accessed through the entry basic in an existing catalog, catalog.yml:
>>> import intake
>>> c = intake.open_catalog("catalog.yml")
>>> df = c.basic.read()
>>> df
     row  column_family  column_qualifier  column_visibility                     time  value
0  row_0            cf1               cq1                     2018-05-15 22:53:37.990      0
1  row_0            cf2               cq2                     2018-05-15 22:53:38.009      0
2  row_1            cf1               cq1                     2018-05-15 22:53:38.018      1
3  row_1            cf2               cq2                     2018-05-15 22:53:38.026      1
4  row_2            cf1               cq1                     2018-05-15 22:53:38.034      2
5  row_2            cf2               cq2                     2018-05-15 22:53:38.042      2
6  row_3            cf1               cq1                     2018-05-15 22:53:38.049      3
7  row_3            cf2               cq2                     2018-05-15 22:53:38.057      3
8  row_4            cf1               cq1                     2018-05-15 22:53:38.065      4
9  row_4            cf2               cq2                     2018-05-15 22:53:38.072      4
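The contents of catalog.yml are not shown in this guide. As a rough sketch only, a catalog exposing an entry named basic might look like the text written by the snippet below. The driver name accumulo (inferred from intake.open_accumulo), the table name my_table, and the description are assumptions; write_catalog is a hypothetical helper.

```python
from pathlib import Path

# Assumed catalog layout: the "accumulo" driver name and "my_table" are
# illustrative guesses, not values taken from the plugin's documentation.
CATALOG_TEXT = """\
sources:
  basic:
    description: Example Accumulo table
    driver: accumulo
    args:
      table: my_table
      host: localhost
      port: 42424
"""


def write_catalog(path="catalog.yml"):
    """Write the sketch catalog to disk and return its path."""
    target = Path(path)
    target.write_text(CATALOG_TEXT)
    return target
```

If the driver name is correct for the installed plugin, calling write_catalog() and then intake.open_catalog("catalog.yml") should expose the entry as c.basic, matching the example above.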
API Reference
intake_accumulo.source.AccumuloSource(table): Read data from Accumulo table.

class intake_accumulo.source.AccumuloSource(table, host='localhost', port=42424, username='root', password='secret', metadata=None)
Read data from Accumulo table.
Parameters:
- table : str
  The database table that will act as source
- host : str
  The server hostname for the given table
- port : int
  The server port for the given table
- username : str
  The username used to connect to the Accumulo cluster
- password : str
  The password used to connect to the Accumulo cluster
Attributes:
- cache_dirs
- datashape
- description
- hvplot
  Returns a hvPlot object to provide a high-level plotting API.
- plot
  Returns a hvPlot object to provide a high-level plotting API.
- plots
  List custom associated quick-plots
Methods
- close()
  Close open resources corresponding to this data source.
- discover()
  Open resource and populate the source attributes.
- read()
  Load entire dataset into a container and return it
- read_chunked()
  Return iterator over container fragments of data source
- read_partition(i)
  Return a part of the data corresponding to i-th partition.
- to_dask()
  Return a dask container for this data source
- to_spark()
  Provide an equivalent data object in Apache Spark
- yaml([with_plugin])
  Return YAML representation of this data-source
- set_cache_dir