RP1 clock debugging #6363

Draft
wants to merge 10,000 commits into
base: rpi-6.6.y

pelwell
Contributor

@pelwell pelwell commented Sep 16, 2024

Add some logging output to the RP1 clock registration, to debug #6321.

popcornmix and others added 30 commits July 29, 2024 14:46
Fixes: d869ef6a96a3 ("drm/vc4: Add debugfs node that dumps the vc5 gamma PWL entries")
Signed-off-by: Dave Stevenson <[email protected]>
CSI2 devices are meant to use the 1Xnn formats rather than 2Xnn
such as MEDIA_BUS_FMT_UYVY8_2X8.

For devices with ADV7180_FLAG_MIPI_CSI2 set, use
MEDIA_BUS_FMT_UYVY8_1X16.

Signed-off-by: Dave Stevenson <[email protected]>
For CSI2 receivers that need to know the link frequency,
add it as a control to the driver.
Interlaced modes are 216Mb/s or 108MHz, whilst going through
the I2P to deinterlace gives 432Mb/s or 216MHz.

Signed-off-by: Dave Stevenson <[email protected]>
The forced conversion of native CS lines into software CS lines is done
whether or not the controller has been given any CS lines to use. This
breaks the use of the spi0-0cs overlay to prevent SPI from claiming any
CS lines, particularly with spidev which doesn't pass in the SPI_NO_CS
flag at creation.

Use the presence of an empty cs-gpios property as an indication that no
CS lines should be used, bypassing the native CS conversion code.

See: raspberrypi#5835

Signed-off-by: Phil Elwell <[email protected]>
2712D0 has a simpler colourspace conversion matrix block
so set that up.

Signed-off-by: Dom Cobley <[email protected]>
2712D0 has increased the fifo sizes of MAI_THR blocks,
resulting in adjusted bit offsets. Handle that.

Signed-off-by: Dom Cobley <[email protected]>
…ve once

vc4_hvs_dlist_free_work was iterating through the list of stale
dlist entries and reading the frame count and active flags from
the hardware for each one.

Read the frame count and active flags once, and then use the
cached value in the loop.

Signed-off-by: Dave Stevenson <[email protected]>
This is largely for debug at present.
For reasons unknown we are not getting the end of frame interrupts
that should trigger a sweep of stale dlist entries.

On allocation failure clear out ALL stale entries, and retry the
allocation. Log the interrupt status so we have debug regarding
whether the HVS believes the interrupt is enabled.

Signed-off-by: Dave Stevenson <[email protected]>
See commit 511ce37 ("mmc: Add MMC host software queue support")

Introduced in 5.8, this feature lets the block layer issue up to 2
pending requests to the MMC layer which in certain cases can improve
throughput, but in the case of this driver can significantly reduce
context switching when performing random reads.

On bcm2837 with a performant class A1 card, context switches under FIO
random 4k reads go from ~8800 per second to ~5800, with a reduction in
hardIRQs per second from ~5800 to ~4000. There is no appreciable
difference in throughput.

For bcm2835, and for workloads other than random read, HSQ is a wash in
terms of throughput and CPU load.

So, use it by default.

Signed-off-by: Jonathan Bell <[email protected]>
There are three disable bits, one for each bus-instance type. Add a
quirk to cover the FS/LS type, and update the slightly mangled quirk
descriptions in the process.

Signed-off-by: Jonathan Bell <[email protected]>
There are three parkmode disable bits, one for each bus instance type.
Add FS/LS and parse the quirk out of DT. Also update the slightly
mangled quirk descriptions.

Signed-off-by: Jonathan Bell <[email protected]>
With the command line parser now providing the information about
the tv mode, use that as the preferred choice for initialising the
default of the tv_mode property.

Signed-off-by: Dave Stevenson <[email protected]>
Calculate the HCNT and LCNT values for all modes using the rise and
fall times of SCL, the aim being a 50/50 mark/space ratio.

Signed-off-by: Phil Elwell <[email protected]>
Add support for non-standard bus speeds by treating them as detuned
versions of the slowest standard speed not less than the requested
speed.

Signed-off-by: Phil Elwell <[email protected]>
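
The selection rule described above ("the slowest standard speed not less than the requested speed") can be sketched as follows. This is an illustration only, not the driver's code: the function name `i2c_base_speed_hz` and the table are hypothetical, and the four entries are the usual I2C standard, fast, fast-plus and high-speed rates.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard I2C bus speeds in Hz, in ascending order (assumed table). */
static const uint32_t std_speeds_hz[] = { 100000, 400000, 1000000, 3400000 };

/*
 * Hypothetical helper: return the slowest standard speed that is not
 * less than the requested speed, or 0 if the request is faster than
 * every standard speed. A non-standard request would then be handled
 * as a detuned version of the returned mode.
 */
uint32_t i2c_base_speed_hz(uint32_t requested_hz)
{
    size_t n = sizeof(std_speeds_hz) / sizeof(std_speeds_hz[0]);

    for (size_t i = 0; i < n; i++) {
        if (std_speeds_hz[i] >= requested_hz)
            return std_speeds_hz[i];
    }
    return 0;
}
```

Under this sketch a 250 kHz request would be treated as a slowed-down 400 kHz (fast mode) bus, and a 50 kHz request as a slowed-down 100 kHz bus.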
There are multiple causes of interrupts, errors being one, and only the
receipt of data warrants continued polling.

See: raspberrypi#2676

Signed-off-by: Phil Elwell <[email protected]>
The dedicated dma40 memcpy code is no longer used, and without a
prototype the kernel build fails. Delete it.

Signed-off-by: Phil Elwell <[email protected]>
Drops downstream patch to v4l2_mem2mem, and uses the new mainline
flag to achieve the same functionality.

Signed-off-by: Dave Stevenson <[email protected]>
There is no point in trying to create a dlist entry for planes
that have a 0 crtc size, and it can also cause grief in the vc6
dlist generation as it takes width-1 and height-1, causing wrap
around.
Drop these planes.

Signed-off-by: Dave Stevenson <[email protected]>
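
The wrap-around mentioned above is ordinary unsigned arithmetic: a size of 0 minus 1 becomes the largest representable value. A minimal sketch, assuming 32-bit unsigned sizes; both function names are hypothetical and are not taken from the vc4 driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a dlist field that stores "size - 1". */
uint32_t dlist_size_field(uint32_t size)
{
    /* For size == 0 this wraps to UINT32_MAX. */
    return size - 1;
}

/* Hypothetical guard: planes with a zero CRTC width or height are dropped. */
bool plane_needs_dlist(uint32_t crtc_w, uint32_t crtc_h)
{
    return crtc_w != 0 && crtc_h != 0;
}
```

Checking `plane_needs_dlist()` before generating the entry means `dlist_size_field()` is never called with 0.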
2712D0 removed alpha_mode from control word 2 for choosing fixed alpha
and replaced it with the previously reserved value of 3 in alpha_mask.

Handle this to fix corrupt desktop when using X on 2712D0.

Signed-off-by: Dom Cobley <[email protected]>
We have a read-modify-write race when updating SCALER_DISPCTRL for
underrun and end-of-frame interrupts.
Ideally it would be fixed via a spinlock or similar, but that will
require a reasonable amount of study to ensure we don't get deadlocks.

The underrun reporting is only for debug, so disable it for now.

Signed-off-by: Dave Stevenson <[email protected]>
The SDIO_CFG register SD_PIN_SEL conflates two settings - whether eMMC
HS or SD UHS timings are applied to the interface, and whether or not
the card-detect line is functional. SD_PIN_SEL can only be changed when
the SD clock isn't running, so add a bcm2712-specific clock setup.

Toggling SD_PIN_SEL at runtime means the integrated card-detect feature
can't be used, so this controller needs a cd-gpios property.

Also fix conditionals for usage of the delay-line PHY - no-1-8-v will
imply no bits set in hsemmc_mask or uhs_mask, so remove it.

Signed-off-by: Jonathan Bell <[email protected]>
The code was reducing the number of components by one when we were not
blending with alpha. But that only makes sense if the components include
alpha.

For YUV, we were reducing the number of components for Y from one to zero
which resulted in no lbm space being allocated.

Fixes: raspberrypi#5912
Signed-off-by: Dom Cobley <[email protected]>
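
The corrected rule above can be sketched as below. This is a simplified model with hypothetical names, not the driver's function: the component count is only reduced when the format actually carries an alpha component that is not being blended.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: number of components that need LBM space.
 * Dropping one component is only valid when the count includes alpha.
 */
unsigned int lbm_components(unsigned int components,
                            bool format_has_alpha,
                            bool blending_with_alpha)
{
    if (format_has_alpha && !blending_with_alpha)
        components--;
    return components;
}
```

With this guard a single-component Y plane stays at one component instead of dropping to zero.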
If enabled, DMA_BOUNCE_UNALIGNED_KMALLOC causes the swiotlb buffers
(64MB, by default) to be allocated, even on systems where the DMA
controller can reach all of RAM. This is a huge amount of RAM to
waste on a device with only 512MB to start with, such as the Zero 2 W.

See: raspberrypi#5975

Signed-off-by: Phil Elwell <[email protected]>
This patch adds the device ID for the BCM4343A2 module, found e.g. in
the Infineon (Cypress) CYW43439 chip. The required firmware file is
named 'BCM4343A2.hcd'.

Signed-off-by: Phil Elwell <[email protected]>
Add this as a value for enum_drm_connector_tv_mode, represented
by the string "Mono", to generate video with standard timings
but no colour encoding or bursts. Define it to have no pedestal
(since only NTSC-M calls for a pedestal).

Change default mode creation to accommodate the new tv_mode value
which comprises both 525-line and 625-line formats.

Signed-off-by: Nick Hollinghurst <[email protected]>
The VEC supports not producing colour bursts for monochrome output.
It also has an option for disabling the chroma input to remove
chroma from the signal.

Now that there is a DRM_MODE_TV_MODE_xx defined for monochrome,
plumb this in.

Signed-off-by: Dave Stevenson <[email protected]>
PLL_AUDIO has a ternary divider (a copy of the secondary divider) and
PLL_VIDEO has a primary phased output.

Signed-off-by: Jonathan Bell <[email protected]>
Add ALSA jack detection to the vc4-hdmi audio driver so userspace knows
when to add/remove HDMI audio devices.

Signed-off-by: David Turner <[email protected]>
The newer Raspberry Pi 5" and 7" panels have a slightly different
register map to the original one.
Add a new driver for this regulator.

Signed-off-by: Dave Stevenson <[email protected]>
pelwell and others added 27 commits July 29, 2024 14:46
TMOD_RO is the receive-only mode that doesn't require data in the
transmit FIFO in order to generate clock cycles. Using TMOD_RO when the
device doesn't care about the data sent to it saves CPU time and memory
bandwidth.

Signed-off-by: Phil Elwell <[email protected]>
Disabling the peripheral resets controller state which has a dangerous
side-effect of disabling the DMA handshake interface while it is active.
This can cause DMA channels to hang.

The error recovery pathway will wait for DMA to stop and reset the chip
anyway, so mask further FIFO interrupts and let the transfer finish
gracefully.

Signed-off-by: Jonathan Bell <[email protected]>
Channels 1-2 have a statically configured maximum MSIZE of 8, and
channels 3-8 have MSIZE set to 4. The DMAC "helpfully" silently
truncates bursts to the hardware supported maximum, so any FIFO read
operation with an oversized burst threshold will leave a residue of
threshold minus MSIZE rows.

As channel allocation is dynamic, this means every client needs to use a
maximum of 4 for burst length.

AXI AWLEN/ARLEN constraints aren't strictly related to MSIZE, except
that bursts won't be issued that are longer than MSIZE beats. Therefore,
it's a useful proxy to tell clients of the DMAC the hardware
limitations.

Signed-off-by: Jonathan Bell <[email protected]>
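
The arithmetic behind the residue described above, as a small sketch. The helper names are hypothetical; the only facts taken from the commit message are that the DMAC silently truncates bursts to MSIZE and that clients should therefore use at most 4.

```c
/*
 * Hypothetical helper: FIFO entries left behind when a burst request of
 * `threshold` entries is silently truncated to `msize`.
 */
unsigned int dma_burst_residue(unsigned int threshold, unsigned int msize)
{
    return threshold > msize ? threshold - msize : 0;
}

/* Hypothetical helper: clamp a client's requested burst to the advertised maximum. */
unsigned int dma_clamp_burst(unsigned int requested, unsigned int max_burst)
{
    return requested < max_burst ? requested : max_burst;
}
```

For example, a threshold of 8 on a channel with MSIZE 4 leaves 4 entries in the FIFO, while a threshold clamped to 4 leaves none.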
There's no real need to constrain MEM access widths to 32-bit (or
narrower), as the DMAC is intelligent enough to size memory accesses
appropriately. Wider accesses are more efficient.

Similarly, MEM burst lengths don't need to be a function of DEV burst
lengths - the DMAC packs/unpacks data into/from its internal channel
FIFOs appropriately. Longer accesses are more efficient.

However, the DMAC doesn't have complete support for unaligned accesses,
and blocks are always defined in integer multiples of SRC_WIDTH, so odd
source lengths or buffer alignments will prevent wide accesses being
used, as before.

There is an implicit requirement to limit requested DEV read burst
lengths to less than the hardware's maximum configured MSIZE - otherwise
RX data will be left over at the end of a block. There is no config
register that reports this value, so the AXI burst length parameter is
used to produce a facsimile of it. Warn if such a request arrives that
doesn't respect this.

Signed-off-by: Jonathan Bell <[email protected]>
This issue was due to a misconfiguration of the RP1 DMAC due to hardware
limitations, not the SPI driver (which was using the incorrect reported
maximum burst size to set the FIFO threshold).

This reverts commit 6aab06f.

Signed-off-by: Jonathan Bell <[email protected]>
If the associated DMA controller has lower burst length support than the
level the FIFO is set to, then bytes will be left in the RX FIFO at the
end of a DMA block - requiring a round-trip through the timeout interrupt
handler rather than an end-of-block DMA interrupt.

Signed-off-by: Jonathan Bell <[email protected]>
Do an end-run around ASoC, as the associated DMA controller's
capabilities cannot easily be found through it.

Signed-off-by: Jonathan Bell <[email protected]>
…ints

Valid ranges for the I2S peripheral's FIFO configuration include a depth
of 16 - unconditionally setting the burst length to 16 with a fifo
threshold of size/2 will cause under/overflows.

For DMA engines with restricted capabilities the requested burst length
and FIFO thresholds need to be adjusted downward accordingly.

Both the RX and TX FIFOs operate on "less-than" thresholds. Setting the
TX threshold to fifo_size minus burst means the FIFO is kept nearly-full.

Signed-off-by: Jonathan Bell <[email protected]>
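
One way to read the adjustment described above, as a sketch under stated assumptions: the burst starts at half the FIFO depth (an assumption, not stated in the commit message), is clamped to the DMA engine's maximum burst, and the TX threshold is then set to fifo_size minus burst as the message describes. The struct and function names are hypothetical.

```c
/* Hypothetical result of the FIFO/DMA setup. */
struct i2s_fifo_cfg {
    unsigned int burst;        /* DMA burst length, in FIFO entries */
    unsigned int tx_threshold; /* TX "less-than" threshold */
};

struct i2s_fifo_cfg i2s_fifo_setup(unsigned int fifo_size,
                                   unsigned int dma_max_burst)
{
    struct i2s_fifo_cfg cfg;

    /* Assumed starting point: half the FIFO depth. */
    cfg.burst = fifo_size / 2;

    /* Adjust downward for DMA engines with restricted capabilities. */
    if (cfg.burst > dma_max_burst)
        cfg.burst = dma_max_burst;

    /* Keeps the TX FIFO nearly full: there is always room for one burst. */
    cfg.tx_threshold = fifo_size - cfg.burst;

    return cfg;
}
```

With a FIFO depth of 16 and a DMA maximum burst of 4, this gives a burst of 4 and a TX threshold of 12.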
The associated DMAC has channels that do not support longer bursts.

Signed-off-by: Jonathan Bell <[email protected]>
Whilst BCM2712 does fix using odd horizontal timings, it doesn't
work with interlaced modes.

Drop the workaround for interlaced modes and revert to the same
behaviour as BCM2711.

raspberrypi#6281

Signed-off-by: Dave Stevenson <[email protected]>
This produces a hard fail on later (6.11) kernels.

See: https://lore.kernel.org/all/[email protected]/

Signed-off-by: Dom Cobley <[email protected]>
Ensure the transmit FIFO has emptied before ending the transfer by
dropping the TX threshold to 0 when the last byte has been pushed into
the FIFO. Include a similar fix for the non-IRQ paths.

See: raspberrypi#6285
Fixes: 6014649 ("spi: dw: Save bandwidth with the TMOD_TO feature")
Signed-off-by: Phil Elwell <[email protected]>
The DW SPI interface has a 16-bit clock divider, where the bottom bit
of the divisor must be 0. Limit how low the clock speed can go to
prevent the clock divider from being truncated, as that could lead to
a much higher clock rate than requested.

Signed-off-by: Phil Elwell <[email protected]>
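
A simplified model of the truncation described above. The function names and the 200 MHz input clock used in the example are assumptions for illustration; the rounding (round up, then force the result even and mask to 16 bits) is intended to mirror how a 16-bit even divisor would typically be computed, not to reproduce the driver exactly.

```c
#include <stdint.h>

/* 16-bit divisor whose bottom bit must be 0. */
#define DW_SPI_DIV_MASK 0xfffeu

/*
 * Hypothetical model of the divider calculation. speed_hz must be
 * non-zero. If the ideal divisor exceeds 16 bits, the mask silently
 * discards the high bits, giving a much smaller divisor.
 */
uint32_t dw_spi_clk_div(uint32_t max_freq_hz, uint32_t speed_hz)
{
    uint32_t div = (uint32_t)(((uint64_t)max_freq_hz + speed_hz - 1) / speed_hz);

    return (div + 1) & DW_SPI_DIV_MASK;
}

/* Lowest speed for which the divisor still fits in 16 bits. */
uint32_t dw_spi_min_speed_hz(uint32_t max_freq_hz)
{
    return (max_freq_hz + DW_SPI_DIV_MASK - 1) / DW_SPI_DIV_MASK;
}
```

With an assumed 200 MHz input clock, a 1 kHz request needs a divisor of 200000, which truncates to 3392 and would clock the bus at roughly 59 kHz. Rejecting speeds below `dw_spi_min_speed_hz()` (3052 Hz in this example) avoids that.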
There is now an ssd1327-spi overlay, but it's of little use without
the corresponding display drivers. Add them as modules to the usual
defconfig files.

Signed-off-by: Phil Elwell <[email protected]>
Using the "cores * 1.5" heuristic, configure the kernel builds for the
4-core GitHub-hosted runners.

Signed-off-by: Phil Elwell <[email protected]>
The DT property for the BQ32000 controlled by the trickle-resistor-ohms
parameter should be "trickle-resistor-ohms", not "abracon,tc-resistor".

See: raspberrypi#6291

Signed-off-by: Phil Elwell <[email protected]>
Fix cut-and-paste error spotted during upstreaming process.

Signed-off-by: Phil Elwell <[email protected]>
Many HD44780 LCD displays are connected via a very common I2C
GPIO expander.
We have an overlay for connecting the displays directly to GPIOs,
but not one for displays connected via a backpack. Add such an overlay.

Signed-off-by: Dave Stevenson <[email protected]>
The default values defining a 16x2 display weren't documented,
so add them.

Signed-off-by: Dave Stevenson <[email protected]>
The corresponding driver implementation has seen sufficient testing,
so enable by default. Retain the dtparam so it can be turned off for test.

Signed-off-by: Jonathan Bell <[email protected]>
In the same way that other subsystems support the setting of device
id numbers from Device Tree aliases, allow gpiochip numbers to be
derived from "gpiochip<n>" aliases.

Signed-off-by: Phil Elwell <[email protected]>
Add a gpiochip0 alias pointing to the rp1 GPIO node, making it appear
as gpiochip0.

Signed-off-by: Phil Elwell <[email protected]>
Make the BCM2712's onboard GPIOs start at gpiochip10, marking them out
as system resources and preventing accidental use by existing Pi 5
code.

Signed-off-by: Phil Elwell <[email protected]>
Allow block devices to be used as caches for other devices. The primary
use is to allow small, low latency media to act as caches for spinning
rust drives.

See: raspberrypi#6303
     raspberrypi#455

Signed-off-by: Phil Elwell <[email protected]>
Add CONFIG_ZRAM_WRITEBACK=y and CONFIG_ZRAM_MULTI_COMP=y.

See: raspberrypi#2939

Signed-off-by: Phil Elwell <[email protected]>