
3.4.3 JDK API Guide (with Python Bindings)

Last updated: 2025/09/25

Introduction

This section explains the core design and usage of the JDK API and its Python bindings. It covers:

  • Data type definitions
  • C++ classes for multimedia (capture, encode/decode, image processing, display output)
  • Python bindings and End-to-End examples

The goal is to help developers quickly build and integrate multimedia applications.

Data Type Definitions (data_type)

These are the basic enums and structs used by the API.

Enum: media_type

Device media type enumeration:

Value                        Description
MEDIA_TYPE_CANT_STAT         Cannot get device status
MEDIA_TYPE_UNKNOWN           Unknown
MEDIA_TYPE_VIDEO             Video
MEDIA_TYPE_VBI               VBI (Vertical Blanking Interval)
MEDIA_TYPE_RADIO             Radio
MEDIA_TYPE_SDR               SDR (Software Defined Radio)
MEDIA_TYPE_TOUCH             Touch input
MEDIA_TYPE_SUBDEV            Sub-device
MEDIA_TYPE_DVB_FRONTEND      Digital TV frontend
MEDIA_TYPE_DVB_DEMUX         Digital TV demultiplexer
MEDIA_TYPE_DVB_DVR           Digital TV recorder
MEDIA_TYPE_DVB_NET           Digital TV network
MEDIA_TYPE_DTV_CA            Digital TV conditional access
MEDIA_TYPE_MEDIA             Media device

Enum: codec_type

Indicates whether a device/context is used for encoding or decoding:

Value        Description
NOT_CODEC    Not a codec
CODEC_DEC    Decoder
CODEC_ENC    Encoder

Struct: v4l2_ctx

V4L2 capture and encoding context structure:

struct v4l2_ctx {
    int fd;                                    // Device file handle
    unsigned int width;                        // Video width
    unsigned int height;                       // Video height
    unsigned int pixelformat;                  // Input pixel format
    unsigned int out_pixelformat;              // Output pixel format
    int nplanes;                               // Input plane count
    int out_nplanes;                           // Output plane count
    struct buffer* cap_buffers;                // Capture buffer array
    struct buffer* out_buffers;                // Output buffer array
    __u32 bytesperline[VIDEO_MAX_PLANES];      // Bytes per line (input)
    __u32 out_bytesperline[VIDEO_MAX_PLANES];  // Bytes per line (output)
    FILE* file[2];                             // Input/output file pointers
    int verbose;                               // Log level
    enum codec_type ctype;                     // Codec type (encode/decode)
};
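The plane counts and bytesperline fields above drive buffer sizing. As a worked example (plain Python, independent of this API; the helper name is ours), NV12 stores a full-resolution Y plane followed by a half-height interleaved UV plane:

```python
def nv12_plane_sizes(width: int, height: int, stride: int = 0) -> tuple[int, int]:
    """Return (y_size, uv_size) in bytes for an NV12 image.

    stride is bytes per line; it defaults to the width, but hardware
    may pad each line, which is what bytesperline reports.
    """
    if stride == 0:
        stride = width
    y_size = stride * height           # full-resolution luma plane
    uv_size = stride * (height // 2)   # interleaved CbCr plane at half vertical resolution
    return y_size, uv_size

y, uv = nv12_plane_sizes(1920, 1080)
total = y + uv   # equals 1920 * 1080 * 3 // 2 when stride == width
```

This is why the later examples allocate frames of size `w*h*3//2`.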

Core C++ API

Main classes for multimedia processing:

  • Frames (JdkFrame)
  • Camera input (JdkCamera)
  • Decoder/Encoder (JdkDecoder, JdkEncoder)
  • Display output (JdkVo, JdkDrm)
  • Image processing (JdkV2D)

JdkFrame: Image Frame Wrapper

class JdkFrame {
public:
    JdkFrame(int dma_fd_, size_t size_, int w, int h);
    ~JdkFrame();

    // Map DMA buffer to CPU memory and return pointer
    unsigned char* toHost() const;
    // Return a copy of the data
    std::vector<unsigned char> Clone() const;
    // Save as NV12-format .yuv file
    bool saveToFile(const std::string& filename) const;
    // Load data from file (paired with saveToFile)
    bool loadFromFile(const std::string& filename, size_t expected_size);

    // Get underlying DMA FD
    int getDMAFd() const;
    // Get buffer size
    size_t getSize() const { return size_; }
    // Get resolution
    int getWidth() const { return width_; }
    int getHeight() const { return height_; }

    // Copy raw NALU data into the internal buffer (e.g., after encoding)
    // offset: target buffer offset
    int MemCopy(const uint8_t* nalu, int nalu_size, int offset = 0);

private:
    size_t size_;                        // Total buffer size
    int width_;
    int height_;
    JdkDma dma_;                         // DMA sync helper
    std::shared_ptr<JdkDmaBuffer> data;  // Underlying DMA buffer
};

using JdkFramePtr = std::shared_ptr<JdkFrame>;

JdkDma and JdkDmaBuffer

class JdkDmaBuffer {
public:
    // Constructor that allocates the DMA buffer
    explicit JdkDmaBuffer(size_t size);
    ~JdkDmaBuffer();

    // Return mapped userspace address
    void* data() const;
    // Fill entire buffer with value
    void fill(uint8_t val);

    // Resolve the physical address (call this before reading m_phys)
    void map_phys_addr();

    // Public fields (read-only)
    size_t m_size;
    uint64_t m_phys;
};

class JdkDma {
public:
    // Asynchronous DMA data copy
    int Asyn(const JdkDmaBuffer& dst, const JdkDmaBuffer& src, size_t size);
    // DMA copy between FDs
    int Asyn(const int& dst_fd, const int& src_fd, size_t size);
};

JdkCamera

class JdkCamera {
public:
    /**
     * Create and open V4L2 device
     * @param device Device path (e.g., "/dev/video0")
     * @param width Desired capture width
     * @param height Desired capture height
     * @param pixfmt V4L2 pixel format (e.g., V4L2_PIX_FMT_NV12)
     * @param req_count Requested buffer count (default: 4)
     * @return JdkCameraPtr on success, nullptr otherwise
     */
    static std::shared_ptr<JdkCamera> create(const std::string& device,
                                             int width,
                                             int height,
                                             __u32 pixfmt,
                                             int req_count = 4);
    /** Get one frame (blocking) */
    JdkFramePtr getFrame();

    ~JdkCamera();

private:
    explicit JdkCamera(const std::string& device);
    class Impl;
    std::unique_ptr<Impl> impl_;
};
using JdkCameraPtr = std::shared_ptr<JdkCamera>;

JdkDecoder

class JdkDecoder {
public:
    /**
     * Initialize hardware decoder
     * @param width Output resolution width
     * @param height Output resolution height
     * @param payload Input stream type (see MppCodingType)
     * @param Format Output pixel format (default: NV12)
     */
    JdkDecoder(int width, int height,
               MppCodingType payload,
               MppPixelFormat Format = PIXEL_FORMAT_NV12);
    ~JdkDecoder();

    /** Decode from wrapped frame */
    std::shared_ptr<JdkFrame> Decode(std::shared_ptr<JdkFrame> frame);
    /** Decode from raw NALU data */
    std::shared_ptr<JdkFrame> Decode(const uint8_t* nalu, int nalu_size);

private:
    int width_;
    int height_;
    MppCodingType payload_;
    int format_;
    int channel_id_;
    MppVdecCtx* pVdecCtx = nullptr;
};

JdkEncoder

class JdkEncoder {
public:
    /**
     * Initialize hardware encoder
     * @param width Input resolution width
     * @param height Input resolution height
     * @param payload Output stream type (see MppCodingType)
     * @param Format Input pixel format (default: NV12)
     */
    JdkEncoder(int width, int height,
               MppCodingType payload,
               MppPixelFormat Format = PIXEL_FORMAT_NV12);
    ~JdkEncoder();

    /** Encode raw frame to compressed stream */
    std::shared_ptr<JdkFrame> Encode(std::shared_ptr<JdkFrame> frame);

private:
    int width_;
    int height_;
    MppCodingType payload_;
    int format_;
    int encoder_id_ = 0;
    MppVencCtx* pVencCtx = nullptr;
};

JdkDrm

/** Supported pixel formats */
enum class PixelFmt : uint32_t {
    NV12 = DRM_FORMAT_NV12
};

class JdkDrm {
public:
    /**
     * Open DRM device and initialize
     * @param width Display width
     * @param height Display height
     * @param stride Line stride (bytes)
     * @param fmt Pixel format
     * @param device DRM device path (default: "/dev/dri/card0")
     */
    JdkDrm(int width, int height, int stride,
           PixelFmt fmt = PixelFmt::NV12,
           const char* device = "/dev/dri/card0");
    ~JdkDrm();

    /** Send frame to DRM display */
    int sendFrame(std::shared_ptr<JdkFrame> frame);
    /** Destroy specified framebuffer */
    void destroyFb(uint32_t fb, uint32_t handle);
    /** Open DRM device */
    int openCard(const char* dev);
    /** Automatically select a suitable connector/CRTC/plane */
    int pickConnectorCrtcPlane();
    /** Import DMA FD as DRM framebuffer */
    int importFb(int dma_fd, uint32_t& fb_id, uint32_t& handle);

private:
    struct LastFB {
        uint32_t fb_id;
        uint32_t handle;
        int dma_fd;
    } last_;
};

JdkV2D

/** Supported output pixel formats (see the full header file for all options) */
enum V2DFormat {
    // Example: V2D_NV12, V2D_RGB888, ...
};

/** Rectangle definition */
struct V2DRect {
    int x, y, width, height;
};

class JdkV2D {
public:
    JdkV2D() = default;
    ~JdkV2D() = default;

    /** Convert image format */
    JdkFramePtr convert_format(const JdkFramePtr& input,
                               V2DFormat out_format);
    /** Resize image */
    JdkFramePtr resize(const JdkFramePtr& input,
                       int out_width, int out_height);
    /** Resize and convert format in one step */
    JdkFramePtr resize_and_convert(const JdkFramePtr& input,
                                   int out_width, int out_height,
                                   V2DFormat out_format);
    /** Fill a rectangle area */
    bool fill_rect(const JdkFramePtr& image,
                   const V2DRect& rect,
                   uint32_t rgba_color);
    /** Draw a rectangle border */
    bool draw_rect(const JdkFramePtr& image,
                   const V2DRect& rect,
                   uint32_t rgba_color,
                   int thickness = 2);
    /** Draw multiple rectangles */
    bool draw_rects(const JdkFramePtr& image,
                    const std::vector<V2DRect>& rects,
                    uint32_t rgba_color,
                    int thickness = 2);
    /** Blend two images (overlay `top` onto `bottom`) */
    JdkFramePtr blend(const JdkFramePtr& bottom,
                      const JdkFramePtr& top);
};

JdkVo

class JdkVo {
public:
    /**
     * Initialize Video Output (Vo)
     * @param width Output frame width
     * @param height Output frame height
     * @param Format Pixel format (default: NV12)
     */
    JdkVo(int width, int height,
          MppPixelFormat Format = PIXEL_FORMAT_NV12);
    ~JdkVo();

    /** Send frame to video output hardware */
    int sendFrame(std::shared_ptr<JdkFrame> frame);

private:
    int width_;
    int height_;
    MppPixelFormat format_;
    int channel_id_;
    MppVoCtx* pVoCtx = nullptr;
};

Python Bindings (pyjdk)

Install and Import

# Download and install
wget https://gitlab.dc.com:8443/bianbu/bianbu-linux/jdk/-/blob/main/pyjdk/pyjdk-0.1.0-cp312-cp312-linux_riscv64.whl

pip install pyjdk-0.1.0-cp312-cp312-linux_riscv64.whl

# Import
import pyjdk as jdk

Available Enums

  • jdk.PixelFormat: NV12, MJPEG, JPEG (corresponds to V4L2 FourCC)
  • jdk.CodingType: H264, H265, JPEG, MJPEG
  • jdk.MppPixelFormat: NV12, NV21
  • jdk.V2DFormat: Common formats like RGB888 (other values depend on the actual build output)
  • jdk.DrmPixelFormat: NV12

jdk.V2DRect

r = jdk.V2DRect(x, y, w, h)
# Also accepts (x,y,w,h) tuple/list/dict in drawing functions
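Since the drawing functions also accept tuples, lists, and dicts, a binding typically normalizes such inputs to (x, y, w, h). A plain-Python sketch of that normalization (the `as_xywh` helper and the dict key names are illustrative assumptions, not part of pyjdk):

```python
def as_xywh(rect) -> tuple[int, int, int, int]:
    """Normalize a rectangle given as a 4-item sequence or a dict.

    Assumed dict keys: "x", "y", "width", "height" (hypothetical).
    """
    if isinstance(rect, dict):
        return (int(rect["x"]), int(rect["y"]),
                int(rect["width"]), int(rect["height"]))
    x, y, w, h = rect   # tuple or list of four numbers
    return (int(x), int(y), int(w), int(h))
```

Either form then maps onto the C++ `V2DRect {x, y, width, height}` struct.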

jdk.Dma

dma = jdk.Dma()
dma.asyn(dst_fd: int, src_fd: int, size: int) -> int # Asynchronous DMA copy (wraps JdkDma::Asyn)

jdk.Frame (equivalent to JdkFrame)

f = jdk.Frame(dma_fd: int, size: int, width: int, height: int)

# Read-only
f.dma_fd: int
f.size: int
f.width: int
f.height: int

# I/O and Views
f.save(path: str) -> bool # Save underlying buffer (NV12/raw/bitstream)
f.load_from_file(path: str, expected_size: int) -> bool
f.to_numpy_nv12(copy: bool = False) -> (y, uv) # Zero-copy or deep-copy to numpy (NV12 two-plane format)
f.to_bytes() -> bytes # Directly export underlying buffer
f.mem_copy(src: bytes|bytearray|memoryview, offset: int = 0) -> int

# Resource management
f.release() # Immediately release the underlying buffer (QBUF)
# Supports 'with' syntax — automatically releases when exiting scope
with f:
    y, uv = f.to_numpy_nv12()

Note:

  • to_numpy_nv12(copy=False) returns a zero-copy view of y/uv bound to the underlying buffer.
  • The lifetime of this view is tied to the numpy object.
  • To get an independent copy, set copy=True.
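This view-versus-copy behaviour follows standard numpy semantics, which can be demonstrated with a plain numpy array (no pyjdk or camera hardware needed):

```python
import numpy as np

buf = np.zeros(8, dtype=np.uint8)   # stands in for the frame's underlying buffer
view = buf[:4]                      # zero-copy: shares memory with buf
copy = buf[:4].copy()               # independent deep copy

buf[0] = 7
# The view reflects the write to the underlying buffer; the copy does not.
assert view[0] == 7
assert copy[0] == 0
```

The same rule applies to the y/uv arrays from `to_numpy_nv12(copy=False)`: writes to the underlying DMA buffer show through the view, and the view must not outlive the frame.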

Camera Capture (MIPI / USB)

jdk.MipiCam

cam = jdk.MipiCam.create(device: str, width: int, height: int,
                         fourcc: jdk.PixelFormat = jdk.PixelFormat.NV12,
                         req_count: int = 4) -> jdk.MipiCam
frame = cam.get_frame() # Blocking frame capture, returns jdk.Frame

# Also supports iteration: for f in cam — to capture frames continuously
# Supports 'with cam:' syntax for automatic resource management

jdk.UsbCam

uc = jdk.UsbCam.create("/dev/video20", 1280, 720, jdk.PixelFormat.MJPEG)
f = uc.get_frame() # Returns MJPEG bitstream frame (can be decoded with Decoder)

See examples: mipi_cam.py, usb_cam.py.

Encoder jdk.Encoder

enc = jdk.Encoder(width: int, height: int,
                  coding: jdk.CodingType = jdk.CodingType.H264,
                  pixfmt: jdk.MppPixelFormat = jdk.MppPixelFormat.NV12)

pkt = enc.encode(frame: jdk.Frame) -> jdk.Frame # return bitstream in Frame

Note:

  • Only the encode(...) method is exposed in the current version.
  • If your local script uses encode_frame(...), please change it to encode(...).
  • The encode_frame name in encode_h264.py is outdated and should be updated.

Decoder jdk.Decoder

dec = jdk.Decoder(width: int, height: int,
                  coding: jdk.CodingType = jdk.CodingType.JPEG,
                  pixfmt: jdk.MppPixelFormat = jdk.MppPixelFormat.NV12)

# 1) Decode from "bitstream frame"
yuv = dec.decode(bitstream_frame: jdk.Frame) -> jdk.Frame

# 2) Decode directly from bytes-like object (bytes/bytearray/memoryview)
yuv = dec.decode(bitstream: bytes|bytearray|memoryview) -> jdk.Frame

See examples: decode_jpeg.py, encode_decode.py.

Image Processing jdk.V2D

v2d = jdk.V2D()
out1 = v2d.convert_format(input: jdk.Frame, out_format: jdk.V2DFormat) -> jdk.Frame
out2 = v2d.resize(input: jdk.Frame, out_width: int, out_height: int) -> jdk.Frame
out3 = v2d.resize_and_convert(input: jdk.Frame, out_width: int, out_height: int,
                              out_format: jdk.V2DFormat) -> jdk.Frame

v2d.fill_rect(image: jdk.Frame, rect: jdk.V2DRect, rgba_color: int) -> bool
v2d.draw_rect(image: jdk.Frame, rect: jdk.V2DRect, rgba_color: int, thickness: int = 2) -> bool
v2d.draw_rects(image: jdk.Frame, rects: list[jdk.V2DRect], rgba_color: int, thickness: int = 2) -> bool

mixed = v2d.blend(bottom: jdk.Frame, top: jdk.Frame) -> jdk.Frame

See example: v2d_demo.py. Use rgba_color as 0xAARRGGBB.
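Colors are packed as 0xAARRGGBB, i.e. alpha in the top byte, then red, green, blue. A small helper (ours, for illustration) makes the packing explicit:

```python
def pack_argb(a: int, r: int, g: int, b: int) -> int:
    """Pack four 8-bit channels into a 0xAARRGGBB integer."""
    return ((a & 0xFF) << 24) | ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)

# Opaque yellow, the value used in the v2d_demo.py draw_rects call below:
yellow = pack_argb(0xFF, 0xFF, 0xFF, 0x00)
assert yellow == 0xFFFFFF00
```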

Display Output

jdk.Vo

vo = jdk.Vo(width: int, height: int, pixfmt: jdk.MppPixelFormat = jdk.MppPixelFormat.NV12)
vo.send_frame(frame: jdk.Frame) -> int

jdk.Drm

drm = jdk.Drm(width: int, height: int,
              stride: int = 0,
              pixfmt: jdk.DrmPixelFormat = jdk.DrmPixelFormat.NV12,
              card: str = "/dev/dri/card0")

drm.send_frame(frame: jdk.Frame) -> int

See Examples: jdk_vo.py, jdk_drm.py.

End-to-End Examples (same as the example source code)

MIPI Capture → Encode → Decode (from encode_decode.py)

cam = jdk.MipiCam.create("/dev/video50", 1920, 1080, jdk.PixelFormat.NV12)
enc = jdk.Encoder(1920, 1080, jdk.CodingType.H264, jdk.MppPixelFormat.NV12)
dec = jdk.Decoder(1920, 1080, jdk.CodingType.H264, jdk.MppPixelFormat.NV12)

for _ in range(60):
    f = cam.get_frame()
    pkt = enc.encode(f)
    yuv = dec.decode(pkt)

Decode JPEG/MJPEG from bytes (from decode_jpeg.py)

bs_frame = jdk.Frame(-1, size, 1920, 1080)
bs_frame.load_from_file("examples/data/1920x1080.jpg", expected_size=size)
dec = jdk.Decoder(1920, 1080, jdk.CodingType.MJPEG, jdk.MppPixelFormat.NV12)
yuv = dec.decode(bs_frame)

NV12 → RGB888 + Image frame (from v2d_demo.py)

f = jdk.Frame(-1, w*h*3//2, w, h)
f.load_from_file("frame_1920x1080_nv12.yuv", expected_size=w*h*3//2)

v2d = jdk.V2D()
rgb = v2d.convert_format(f, jdk.V2DFormat.RGB888)
v2d.draw_rects(f, [jdk.V2DRect(30,20,100,80)], 0xFFFFFF00, 4)

Error Handling and Performance Notes

  • All blocking or compute-intensive operations (frame capture, encoding/decoding, and V2D processing) release the Global Interpreter Lock (GIL) at the C++ level, improving multi-threaded performance in Python.
  • Frame.to_numpy_nv12(copy=False) returns zero-copy views of the underlying buffer. Manage lifecycle carefully:
    • Call f.release() when done, OR
    • Use the frame within a with f: block for automatic cleanup.
  • The decoder accepts two input types: Frame and bytes-like objects, making it easy to adapt decoding for both network streams and file streams.
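Because capture and encode release the GIL, running them in separate Python threads can overlap the two stages. A minimal skeleton with the standard library (the `get_frame`/`encode` callables here are placeholders standing in for `cam.get_frame` and `enc.encode`; this is a sketch, not part of pyjdk):

```python
import queue
import threading

def run_pipeline(get_frame, encode, n_frames: int):
    """Overlap capture and encode in two threads via a bounded queue."""
    frames: queue.Queue = queue.Queue(maxsize=4)  # bounded: applies backpressure
    packets = []

    def capture():
        for _ in range(n_frames):
            frames.put(get_frame())   # blocking capture runs with the GIL released
        frames.put(None)              # sentinel: end of stream

    def encode_loop():
        while (f := frames.get()) is not None:
            packets.append(encode(f))

    t_cap = threading.Thread(target=capture)
    t_enc = threading.Thread(target=encode_loop)
    t_cap.start(); t_enc.start()
    t_cap.join(); t_enc.join()
    return packets

# With stand-in stages the skeleton runs anywhere:
pkts = run_pipeline(get_frame=lambda: b"frame", encode=lambda f: f + b"!", n_frames=3)
```

The bounded queue keeps memory use flat when the encoder is slower than the camera; remember that any frame passed across threads must stay alive (not released) until the consumer is done with it.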