Driver for Oracle Database v0.4.4¶

Driver Version v0.4.4 Tested With Oracle Database 23ai

This driver provides access to Oracle Database (commonly referred to as just “Oracle”).

Note

This project is not affiliated with Oracle.

Installation & Quickstart¶

Important

This driver is available from Columnar’s private driver registry. Before installing it, create an account at cloud.columnar.tech and click to activate a 14-day free trial. Then authenticate to the registry:

dbc auth login

The driver can be installed with dbc :

dbc install oracle

To use the driver:

Install the Oracle Instant Client libraries.
Set LD_LIBRARY_PATH, DYLD_LIBRARY_PATH, or otherwise make sure the Oracle Instant Client libraries are discoverable by your application.
Provide the connection string as the uri parameter.

Note

It is highly recommended to explicitly set timezone="UTC" to avoid ambiguity in local timestamp values (if used).

Note

It is highly recommended to use the latest Oracle Instant Client release.

Connecting¶

To use the driver, provide an Oracle connection string as the uri option. The driver supports URI format and DSN-style connection strings, but URIs are recommended.

from adbc_driver_manager import dbapi

dbapi.connect(
  driver="oracle",
  db_kwargs={
      "uri": "oracle://user:password@localhost:1521/FREEPDB1"
  }
)

Note: The example above is for Python using the adbc-driver-manager package but the process will be similar for other driver managers. See adbc-quickstarts .

Connection String Format¶

Oracle URI syntax:

oracle://[user[:password]@]host[:port][/serviceName][?param1=value1&param2=value2]

This follows the Go Oracle Driver Connection String format.

Components:

scheme: oracle:// (required)
user/password: (optional) For standard database authentication.
host: (required) The hostname or IP address of the Oracle database server.
port: (optional) Defaults to 1521.
serviceName: (optional) The Service Name or SID, used by the Oracle Client to identify the target database instance. Also can use full Oracle Easy Connect Plus identifier format, such as /serviceName:serverType/instanceName. See Oracle Database Easy Connect Plus
Query Parameters: Additional configuration options. For a complete list of parameters, see the Go Oracle Driver Connection Parameters

Note

Reserved characters in URI elements must be URI-encoded. For example, @ becomes %40.

Examples:

oracle://scott:tiger@salesserver1/sales.us.example.com?poolSessionTimeout=42s
oracle://dbhost.local/PROD_PDB
oracle://user@10.0.0.5:1521/svc?configDir=/tmp/admin
oracle://user:p%40ssword@host/service

This driver also supports other types of connection strings that are supported by the Go Oracle Driver .

Feature & Type Support¶

Feature	Support
Bulk Ingestion	Create	✅
	Append	✅
	Create/Append	✅
	Replace	✅
	Temporary Table	❌
	Specify target catalog	❌
	Specify target schema	✅
	Non-nullable fields are marked NOT NULL	✅
Catalog (GetObjects)	depth=catalogs	✅
	depth=db_schemas	✅
	depth=tables	✅
	depth=columns (all)	✅
Get Parameter Schema	❌
Get Table Schema	✅
Prepared Statements	✅
Transactions	✅

Feature	Name	Support	Description
Configuration	Connect with URI	✅	Test end-to-end connection with real Oracle database.

Types¶

Oracle Database to Arrow¶

Oracle Database Type	Arrow Type
BINARY_DOUBLE	double
BINARY_FLOAT	float
BLOB	large_binary [1]
BOOLEAN	bool [2]
CHAR	string
CLOB	large_string
DATE	date32[day]
INT	decimal128(38, 0)
JSON	extension<arrow.json> [3]
LONG	large_string
LONG RAW	large_binary
NCLOB	large_string
NUMBER	double
NUMBER(p, 0) (p <= 18)	int64
NUMBER(p, s) (s != 0, p > 18, redo.use_high_precision = false)	double [4]
NUMBER(p, s) (s != 0, p > 18, redo.use_high_precision = true)	decimal128 [4]
NUMBER(p, s), NUMBER(p), NUMBER(*, s)	decimal128
OBJECT	extension<arrow.json> [3]
RAW	binary
ROWID, UROWID	string
SMALLINT	decimal128(38, 0)
TIMESTAMP	timestamp[ns], timestamp[us] ⚠️ [5]
TIMESTAMP WITH LOCAL TIME ZONE	timestamp[us] (with time zone) [6]
TIMESTAMP WITH TIME ZONE	timestamp[ns] (with time zone), timestamp[us] (with time zone) ⚠️ [5]
VARCHAR2	string [7]
VARRAY	extension<arrow.json> [3]
XMLType	extension<arrow.opaque[storage_type=string, type_name=XMLType, vendor_name=Oracle Database]> [3]

Arrow to Oracle Database¶

Arrrow Type	Oracle Database Type
Arrrow Type	Bind	Ingest
binary	RAW	BLOB
binary_view	RAW	BLOB
bool	BOOLEAN
date32[day]	DATE
decimal128	NUMBER(p, s)
double	BINARY_DOUBLE
extension<arrow.json>	JSON	(not tested)
fixed_size_binary	RAW	BLOB
float	BINARY_FLOAT
halffloat	BINARY_FLOAT	(not tested)
int16	NUMBER(38, 0)	NUMBER(5, 0)
int32	NUMBER(38, 0)	NUMBER(10, 0)
int64	(not tested)	NUMBER(19, 0)
large_binary	RAW	BLOB
large_string	NVARCHAR2	NVARCHAR2(2000)
list	(not tested)	BLOB, JSON, NVARCHAR2
null	(not tested)	NUMBER
string	NVARCHAR2	NVARCHAR2(2000)
string_view	RAW	NVARCHAR2(2000)
struct	(not tested)	BLOB, JSON, NVARCHAR2
timestamp[ms]	TIMESTAMP(3)
timestamp[ms] (with time zone)	TIMESTAMP(3) WITH TIME ZONE
timestamp[ns]	TIMESTAMP(9)
timestamp[ns] (with time zone)	TIMESTAMP(9) WITH TIME ZONE
timestamp[s]	TIMESTAMP(0)
timestamp[s] (with time zone)	TIMESTAMP(0) WITH TIME ZONE
timestamp[us]	TIMESTAMP(6)
timestamp[us] (with time zone)	TIMESTAMP(6) WITH TIME ZONE

Timezones¶

The Oracle Instant Client depends on the client and server having the exact same “Time Zone File” to resolve timezone names. If these are not matched, you may get errors like the following for queries that reference timezones:

Could not read row: ORA-01805: possible error in date/time operation

You can avoid this by using zone offsets (like -05:00) instead of timezone names. You can also configure the ORA_TZFILE environment variable so that this file is the same between client and server. Unfortunately, this is a global setting and cannot be set per connection[8]. To learn more, including how to configure this environment variable, read the Datetime Data Types and Time Zone Support section of the “Database Globalization Support Guide” in the Oracle Database documentation, particularly section 4.6, “Choosing a Time Zone File”.

Memory and Performance Tuning¶

When reading data, we recommend the following for best performance:

Set redo.statement.execution_mode to batch. (This will be the default in a future version.)
Set redo.statement.fetch_array_size. If there is little or no latency between your client and the database, then a low value (in the tens to hundreds of rows) is best. If there is high latency, then a higher value (in the thousands of rows) is best. We recommend benchmarking your specific setup and queries to learn the best value for this parameter.

Memory usage, however, is directly proportional to fetch_array_size. If memory usage is a concern, then a lower value is recommended. (The Oracle SDK pre-allocates memory to hold fetch_array_size rows, hence increasing fetch size also increases allocation size.)

Options¶

Options Affecting Queries¶

redo.use_extension_type

Type: boolean. Default: false

Enable Arrow extension types to annotate result set column types. For example, when enabled, JSON columns will be returned as Arrow string columns with the JSON canonical extension type metadata appended. When not enabled (the default), JSON columns will simply be Arrow string columns.

This is disabled by default because many Arrow implementations will completely change the representation of such columns/types when extension type metadata is seen, requiring special handling for those types.

redo.use_high_precision

Type: boolean. Default: true

Controls how to handle NUMBER types in Oracle. When enabled (the default), NUMBER types will be returned as Arrow decimal128 (precision <= 38) or decimal256 (precision > 38). Otherwise, they will be returned as Arrow float64.

Note that NUMBER types may not have a precision specified, in which case they will always be returned as Arrow float64, regardless of this setting. Additionally, NUMBER types where the scale is 0 and the precision is <= 18 will always be returned as Arrow int64, again regardless of this setting.

redo.statement.execution_mode

Values: row or batch. Default: batch.

Controls how to parse the query result internally. row (the current default) parses values row-by-row. batch parses batches of values.

We recommend batch and expect it to always be on-par or faster. It will become the default in a future version of the driver. Please report any bugs you may encounter.

redo.statement.prefetch_bytes and redo.statement.prefetch_rows

Type: positive integer. Default: unspecified (prefetch_bytes); 1 (prefetch_rows).

How many rows or bytes of data to fetch on initial query execution. If you know approximately how much data a query will return, setting this can improve performance by avoiding subsequent round-trips to the database server. For prefetch_bytes, the SDK will fetch as many complete rows as can fit in the specified memory.

If both are set, then whichever limit is reached first by the SDK will take effect.

redo.statement.fetch_array_size

Type: positive integer. Default: unspecified.

How many rows of data to fetch on each server round-trip. If the connection is high-latency, then increasing this is highly recommended to reduce the number of round-trips.

redo.statement.read_batch_approx_bytes

Type: nonnegative integer. Default: 0 (no limit)

An approximate limit on the memory size of each returned Arrow batch. This limit is best-effort and individual batches may exceed (or undershoot) the limit.

redo.statement.read_batch_rows

Type: positive integer. Default: 65536

A hard limit on the number of rows in each returned Arrow batch.

Options Affecting Bulk Ingest¶

redo.statement.insert_batch_rows

Type: positive integer. Default: 65536

How many rows to insert on each call to the database.

redo.ingest.struct_type

Values: blob, json, nvarchar2, object. Default: object

What database type to ingest Arrow struct, list, fixed size list, and list view types as.

blob: create BLOB CHECK IS JSON columns containing JSON
nvarchar2: create NVARCHAR2 CHECK IS JSON columns containing JSON
json: create JSON columns (only on Oracle Database 21c and newer)
object: create OBJECT columns (not currently implemented)

redo.ingest.tablespace

Type: string. Default: not set

Create the table to ingest into in the specified tablespace .