path: root/Documentation/powerpc/papr_hcalls.rst
diff options
authorVaibhav Jain <>2019-08-28 13:57:29 +0530
committerMichael Ellerman <>2020-01-29 21:30:50 +1100
commit58b278f568f0509497e2df7310bfd719156a60d1 (patch)
treec8750848d83a468fc0c81441991245412a25fc02 /Documentation/powerpc/papr_hcalls.rst
parent9933819099c4600b41a042f27a074470a43cf6b9 (diff)
powerpc: Provide initial documentation for PAPR hcalls
This doc patch provides an initial description of the hcall op-codes that are used by Linux kernel running as a guest (LPAR) on top of PowerVM or any other sPAPR compliant hyper-visor (e.g qemu). Apart from documenting the hcalls the doc-patch also provides a rudimentary overview of how hcall ABI, how they are issued with the Linux kernel and how information/control flows between the guest and hypervisor. Signed-off-by: Vaibhav Jain <> Reviewed-by: Laurent Dufour <> Acked-by: Nicholas Piggin <> [mpe: Add SPDX tag, add it to index.rst] Signed-off-by: Michael Ellerman <> Link:
Diffstat (limited to 'Documentation/powerpc/papr_hcalls.rst')
1 files changed, 250 insertions, 0 deletions
diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
new file mode 100644
index 000000000000..3493631a60f8
--- /dev/null
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -0,0 +1,250 @@
+.. SPDX-License-Identifier: GPL-2.0
+Hypercall Op-codes (hcalls)
+Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
+specification [1]_ which describes the run-time environment for a guest
+operating system and how it should interact with the hypervisor for
+privileged operations. Currently there are two PAPR compliant hypervisors:
+- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
+ IBM-i and Linux as supported guests (termed as Logical Partitions
+ or LPARS). It supports the full PAPR specification.
+- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
+ Though it only implements a subset of PAPR specification called LoPAPR [2]_.
+On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
+a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
+issue hypercalls to the hypervisor whenever it needs to perform an action
+that is hypervisor priviledged [3]_ or for other services managed by the
+Hence a Hypercall (hcall) is essentially a request by the pseries guest
+asking hypervisor to perform a privileged operation on behalf of the guest. The
+guest issues a with necessary input operands. The hypervisor after performing
+the privilege operation returns a status code and output operands back to the
+The ABI specification for a hcall between a pseries guest and PAPR hypervisor
+is covered in section 14.5.3 of ref [2]_. Switch to the Hypervisor context is
+done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
+and any in-arguments for the hcall are provided in registers *r4-r12*. If values
+have to be passed through a memory buffer, the data stored in that buffer should be
+in Big-endian byte order.
+Once control is returns back to the guest after hypervisor has serviced the
+'HVCS' instruction the return value of the hcall is available in *r3* and any
+out values are returned in registers *r4-r12*. Again like in case of in-arguments,
+any out values stored in a memory buffer will be in Big-endian byte order.
+Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx** defined
+in a arch specific header [4]_ to issue hcalls from the linux kernel
+running as pseries guest.
+Register Conventions
+Any hcall should follow same register convention as described in section
+of "64-Bit ELF V2 ABI Specification: Power Architecture"[5]_. Table below
+summarizes these conventions:
+| Register |Volatile | Purpose |
+| Range |(Y/N) | |
+| r0 | Y | Optional-usage |
+| r1 | N | Stack Pointer |
+| r2 | N | TOC |
+| r3 | Y | hcall opcode/return value |
+| r4-r10 | Y | in and out values |
+| r11 | Y | Optional-usage/Environmental pointer |
+| r12 | Y | Optional-usage/Function entry address at |
+| | | global entry point |
+| r13 | N | Thread-Pointer |
+| r14-r31 | N | Local Variables |
+| LR | Y | Link Register |
+| CTR | Y | Loop Counter |
+| XER | Y | Fixed-point exception register. |
+| CR0-1 | Y | Condition register fields. |
+| CR2-4 | N | Condition register fields. |
+| CR5-7 | Y | Condition register fields. |
+| Others | N | |
+DRC & DRC Indexes
+ DR1 Guest
+ +--+ +------------+ +---------+
+ | | <----> | | | User |
+ +--+ DRC1 | | DRC | Space |
+ | PAPR | Index +---------+
+ DR2 | Hypervisor | | |
+ +--+ | | <-----> | Kernel |
+ | | <----> | | Hcall | |
+ +--+ DRC2 +------------+ +---------+
+PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs etc
+available for use by LPARs as Dynamic Resource (DR). When a DR is allocated to
+an LPAR, PHYP creates a data-structure called Dynamic Resource Connector (DRC)
+to manage LPAR access. An LPAR refers to a DRC via an opaque 32-bit number
+called DRC-Index. The DRC-index value is provided to the LPAR via device-tree
+where its present as an attribute in the device tree node associated with the
+HCALL Return-values
+After servicing the hcall, hypervisor sets the return-value in *r3* indicating
+success or failure of the hcall. In case of a failure an error code indicates
+the cause for error. These codes are defined and documented in arch specific
+header [4]_.
+In some cases a hcall can potentially take a long time and need to be issued
+multiple times in order to be completely serviced. These hcalls will usually
+accept an opaque value *continue-token* within there argument list and a
+return value of *H_CONTINUE* indicates that hypervisor hasn't still finished
+servicing the hcall yet.
+To make such hcalls the guest need to set *continue-token == 0* for the
+initial call and use the hypervisor returned value of *continue-token*
+for each subsequent hcall until hypervisor returns a non *H_CONTINUE*
+return value.
+HCALL Op-codes
+Below is a partial list of HCALLs that are supported by PHYP. For the
+corresponding opcode values please look into the arch specific header [4]_:
+| Input: *drcIndex, offset, buffer-address, numBytesToRead*
+| Out: *numBytesRead*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*
+Given a DRC Index of an NVDIMM, read N-bytes from the the metadata area
+associated with it, at a specified offset and copy it to provided buffer.
+The metadata area stores configuration information such as label information,
+bad-blocks etc. The metadata area is located out-of-band of NVDIMM storage
+area hence a separate access semantics is provided.
+| Input: *drcIndex, offset, data, numBytesToWrite*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*
+Given a DRC Index of an NVDIMM, write N-bytes to the metadata area
+associated with it, at the specified offset and from the provided buffer.
+| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
+| *targetLogicalMemoryAddress, continue-token*
+| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
+| *H_Too_Big, H_P5, H_Busy*
+Given a DRC-Index of an NVDIMM, map a continuous SCM blocks range
+*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
+at *targetLogicalMemoryAddress* within guest physical address space. In
+case *targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF* then hypervisor
+assigns a target address to the guest. The HCALL can fail if the Guest has
+an active PTE entry to the SCM block being bound.
+| Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
+| Out: numScmBlocksUnbound
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
+| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
+Given a DRC-Index of an NVDimm, unmap *numScmBlocksToUnbind* SCM blocks starting
+at *startingScmLogicalMemoryAddress* from guest physical address space. The
+HCALL can fail if the Guest has an active PTE entry to the SCM block being
+| Input: *drcIndex, scmBlockIndex*
+| Out: *Guest-Physical-Address*
+| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
+Given a DRC-Index and an SCM Block index return the guest physical address to
+which the SCM block is mapped to.
+| Input: *Guest-Physical-Address*
+| Out: *drcIndex, scmBlockIndex*
+| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
+Given a guest physical address return which DRC Index and SCM block is mapped
+to that address.
+| Input: *scmTargetScope, drcIndex*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
+| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
+Depending on the Target scope unmap all SCM blocks belonging to all NVDIMMs
+or all SCM blocks belonging to a single NVDIMM identified by its drcIndex
+from the LPAR memory.
+| Input: drcIndex
+| Out: *health-bitmap, health-bit-valid-bitmap*
+| Return Value: *H_Success, H_Parameter, H_Hardware*
+Given a DRC Index return the info on predictive failure and overall health of
+the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
+failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
+| Input: drcIndex, resultBuffer Addr
+| Out: None
+| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*
+Given a DRC Index collect the performance statistics for NVDIMM and copy them
+to the resultBuffer.
+.. [1] "Power Architecture Platform Reference"
+.. [2] "Linux on Power Architecture Platform Reference"
+.. [3] "Definitions and Notation" Book III-Section 14.5.3
+.. [4] arch/powerpc/include/asm/hvcall.h
+.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"