summaryrefslogtreecommitdiff
path: root/atheros-disallow-retrain-nongen1-pcie.patch
blob: c295bf23bdfcd1bbe5683c974f6ba03632a0d20d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-pci-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-19.7 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 5293AC433ED
	for <linux-pci@archiver.kernel.org>; Wed,  5 May 2021 16:46:38 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 336FB613BE
	for <linux-pci@archiver.kernel.org>; Wed,  5 May 2021 16:46:38 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S236089AbhEEQrX (ORCPT <rfc822;linux-pci@archiver.kernel.org>);
        Wed, 5 May 2021 12:47:23 -0400
Received: from mail.kernel.org ([198.145.29.99]:39944 "EHLO mail.kernel.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S235175AbhEEQn0 (ORCPT <rfc822;linux-pci@vger.kernel.org>);
        Wed, 5 May 2021 12:43:26 -0400
Received: by mail.kernel.org (Postfix) with ESMTPSA id 7196361879;
        Wed,  5 May 2021 16:35:00 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
        s=k20201202; t=1620232500;
        bh=fPfmCTyAQD8PT6TZkO8THxra9rEY81AdAkfBo+IZ5tU=;
        h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
        b=GGQEKTJt6sQkhGf+9veu8zkPNAFhEHpim85V7tVqJPqB4sraqNoIP0BWX1keInmv1
         ZKmmiJG5OUg5J9Es8W0iw0yQNPVmz34gbFy4b/BRUc7EKu46CVXeOvY7+dX1ivCaHW
         ikKmjPxAXiMqLH3vRNqoFgDWvRRAhatZ2B9XuscrV7BPfXufja4ykhb29irvNL2akj
         Hc4a6H7NHXnuVMvnnogfrZDleFUOYf/BVnamTiRmKRbnoBOWJPt6XCS+yoB5gVy3H7
         /pzDotZdZ51NWdc/HzZfBT+40TkFWyD6fV5hW9V0Yi5BVVD/2LZDvbLecJkOQJfTqD
         +rF2SH3dN0A9Q==
Received: by pali.im (Postfix)
        id 09BEF79D; Wed,  5 May 2021 18:34:57 +0200 (CEST)
From:   =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali@kernel.org>
To:     Bjorn Helgaas <bhelgaas@google.com>,
        Kalle Valo <kvalo@codeaurora.org>,
        =?UTF-8?q?Toke=20H=C3=B8iland-J=C3=B8rgensen?= <toke@redhat.com>,
        =?UTF-8?q?Marek=20Beh=C3=BAn?= <kabel@kernel.org>,
        =?UTF-8?q?Krzysztof=20Wilczy=C5=84ski?= <kw@linux.com>
Cc:     vtolkm@gmail.com, Rob Herring <robh@kernel.org>,
        Ilias Apalodimas <ilias.apalodimas@linaro.org>,
        Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
        linux-pci@vger.kernel.org, ath10k@lists.infradead.org,
        linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3] PCI: Disallow retraining link for Atheros chips on non-Gen1 PCIe bridges
Date:   Wed,  5 May 2021 18:33:57 +0200
Message-Id: <20210505163357.16012-1-pali@kernel.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210326124326.21163-1-pali@kernel.org>
References: <20210326124326.21163-1-pali@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-pci.vger.kernel.org>
X-Mailing-List: linux-pci@vger.kernel.org

Atheros AR9xxx and QCA9xxx chips have behaviour issues not only after a
bus reset, but also after doing retrain link, if PCIe bridge is not in
GEN1 mode (at 2.5 GT/s speed):

- QCA9880 and QCA9890 chips throw a Link Down event and completely
  disappear from the bus and their config space is not accessible
  afterwards.

- QCA9377 chip throws a Link Down event followed by Link Up event, the
  config space is accessible and PCI device ID is correct. But trying to
  access chip's I/O space causes Uncorrected (Non-Fatal) AER error,
  followed by Synchronous external abort 96000210 and Segmentation fault
  of insmod while loading ath10k_pci.ko module.

- AR9390 chip throws a Link Down event followed by Link Up event, config
  space is accessible, but contains nonsense values. PCI device ID is
  0xABCD which indicates HW bug that chip itself was not able to read
  values from internal EEPROM/OTP.

- AR9287 chip throws also Link Down and Link Up events, also has
  accessible config space containing correct values. But ath9k driver
  fails to initialize card from this state as it is unable to access HW
  registers. This also indicates that the chip iself is not able to read
  values from internal EEPROM/OTP.

These issues related to PCI device ID 0xABCD and to reading internal
EEPROM/OTP were previously discussed at ath9k-devel mailing list in
following thread:

  https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html

After experiments we've come up with a solution: it seems that Retrain
link can be called only when using GEN1 PCIe bridge or when PCIe bridge
link speed is forced to 2.5 GT/s. Applying this workaround fixes all
mentioned cards.

This issue was reproduced with more cards:
- Compex WLE900VX (QCA9880 based / device ID 0x003c)
- QCNFA435 (QCA9377 based / device ID 0x0042)
- Compex WLE200NX (AR9287 based / device ID 0x002e)
- "noname" card (QCA9890 based / device ID 0x003c)
- Wistron NKR-DNXAH1 (AR9390 based / device ID 0x0030)
on Armada 385 with pci-mvebu.c driver and also on Armada 3720 with
pci-aardvark.c driver.

To workaround this issue, this change introduces a new PCI quirk called
PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1, which is enabled for all
Atheros chips with PCI_DEV_FLAGS_NO_BUS_RESET quirk, and also for Atheros
chip AR9287.

When this quirk is set, kernel disallows triggering PCI_EXP_LNKCTL_RL
bit in config space of PCIe Bridge in the case when PCIe Bridge is
capable of higher speed than 2.5 GT/s and this higher speed is already
allowed. When PCIe Bridge has accessible LNKCTL2 register, we try to
force target link speed to 2.5 GT/s. After this change it is possible
to trigger PCI_EXP_LNKCTL_RL bit without issues.

Currently only PCIe ASPM kernel code triggers this PCI_EXP_LNKCTL_RL bit,
so quirk check is added only into pcie/aspm.c file.

Signed-off-by: Pali Rohár <pali@kernel.org>
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Marek Behún <kabel@kernel.org>
BugLink: https://lore.kernel.org/linux-pci/87h7l8axqp.fsf@toke.dk/
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=84821
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=192441
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209833
Cc: stable@vger.kernel.org # c80851f6ce63a ("PCI: Add PCI_EXP_LNKCTL2_TLS* macros")

---
Changes since v1:
* Move whole quirk code into pcie_downgrade_link_to_gen1() function
* Reformat to 80 chars per line where possible
* Add quirk also for cards with AR9287 chip (PCI ID 0x002e)
* Extend commit message description and add information about 0xABCD

Changes since v2:
* Add quirk also for Atheros QCA9377 chip
---
 drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
 drivers/pci/quirks.c    | 39 ++++++++++++++++++++++++++++--------
 include/linux/pci.h     |  2 ++
 3 files changed, 77 insertions(+), 8 deletions(-)

--- linux-6.5/drivers/pci/pcie/aspm.c.orig	2023-08-27 23:49:51.000000000 +0200
+++ linux-6.5/drivers/pci/pcie/aspm.c	2023-08-29 19:04:58.069254559 +0200
@@ -191,6 +191,44 @@
 	link->clkpm_disable = blacklist ? 1 : 0;
 }
 
+static int pcie_downgrade_link_to_gen1(struct pci_dev *parent)
+{
+	u16 reg16;
+	u32 reg32;
+	int ret;
+
+	/* Check if link is capable of higher speed than 2.5 GT/s */
+	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP, &reg32);
+	if ((reg32 & PCI_EXP_LNKCAP_SLS) <= PCI_EXP_LNKCAP_SLS_2_5GB)
+		return 0;
+
+	/* Check if link speed can be downgraded to 2.5 GT/s */
+	pcie_capability_read_dword(parent, PCI_EXP_LNKCAP2, &reg32);
+	if (!(reg32 & PCI_EXP_LNKCAP2_SLS_2_5GB)) {
+		pci_err(parent, "ASPM: Bridge does not support changing Link Speed to 2.5 GT/s\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* Force link speed to 2.5 GT/s */
+	ret = pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
+						 PCI_EXP_LNKCTL2_TLS,
+						 PCI_EXP_LNKCTL2_TLS_2_5GT);
+	if (!ret) {
+		/* Verify that new value was really set */
+		pcie_capability_read_word(parent, PCI_EXP_LNKCTL2, &reg16);
+		if ((reg16 & PCI_EXP_LNKCTL2_TLS) != PCI_EXP_LNKCTL2_TLS_2_5GT)
+			ret = -EINVAL;
+	}
+
+	if (ret) {
+		pci_err(parent, "ASPM: Changing Target Link Speed to 2.5 GT/s failed: %d\n", ret);
+		return ret;
+	}
+
+	pci_info(parent, "ASPM: Target Link Speed changed to 2.5 GT/s due to quirk\n");
+	return 0;
+}
+
 /*
  * pcie_aspm_configure_common_clock: check if the 2 ends of a link
  *   could use common clock. If they are, configure them to use the
@@ -257,7 +295,14 @@
 	pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL,
 					   PCI_EXP_LNKCTL_CCC, ccc);
 
-	if (pcie_retrain_link(link->pdev, true)) {
+	bool skip_retrain = false;
+	if (link->downstream->dev_flags & PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1) {
+		if (pcie_downgrade_link_to_gen1(parent)) {
+			pci_err(parent, "ASPM: Retrain Link at higher speed is disallowed by quirk\n");
+			skip_retrain = true;
+		}
+	}
+	if (!skip_retrain && pcie_retrain_link(link->pdev, true)) {
 
 		/* Training failed. Restore common clock configurations */
 		pci_err(parent, "ASPM: Could not configure common clock\n");
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3ba9e..4999ad9d08b8 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3553,31 +3553,55 @@ static void mellanox_check_broken_intx_masking(struct pci_dev *pdev)
 	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
 }
 
+static void quirk_no_bus_reset_and_no_retrain_link(struct pci_dev *dev)
+{
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET |
+			  PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1;
+}
+
 /*
  * Some NVIDIA GPU devices do not work with bus reset, SBR needs to be
  * prevented for those affected devices.
  */
 static void quirk_nvidia_no_bus_reset(struct pci_dev *dev)
 {
 	if ((dev->device & 0xffc0) == 0x2340)
 		quirk_no_bus_reset(dev);
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
 			 quirk_nvidia_no_bus_reset);
 
 /*
- * Some Atheros AR9xxx and QCA988x chips do not behave after a bus reset.
+ * Atheros AR9xxx and QCA9xxx chips do not behave after a bus reset and also
+ * after retrain link when PCIe bridge is not in GEN1 mode at 2.5 GT/s speed.
  * The device will throw a Link Down error on AER-capable systems and
  * regardless of AER, config space of the device is never accessible again
  * and typically causes the system to hang or reset when access is attempted.
+ * Or if config space is accessible again then it contains only dummy values
+ * like fixed PCI device ID 0xABCD or values not initialized at all.
+ * Retrain link can be called only when using GEN1 PCIe bridge or when
+ * PCIe bridge has forced link speed to 2.5 GT/s via PCI_EXP_LNKCTL2 register.
+ * To reset these cards it is required to do PCIe Warm Reset via PERST# pin.
  * https://lore.kernel.org/r/20140923210318.498dacbd@dualc.maya.org/
+ * https://lore.kernel.org/r/87h7l8axqp.fsf@toke.dk/
+ * https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html
  */
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032, quirk_no_bus_reset);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033, quirk_no_bus_reset);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034, quirk_no_bus_reset);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e, quirk_no_bus_reset);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x002e,
+			 quirk_no_bus_reset_and_no_retrain_link);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030,
+			 quirk_no_bus_reset_and_no_retrain_link);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0032,
+			 quirk_no_bus_reset_and_no_retrain_link);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0033,
+			 quirk_no_bus_reset_and_no_retrain_link);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0034,
+			 quirk_no_bus_reset_and_no_retrain_link);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c,
+			 quirk_no_bus_reset_and_no_retrain_link);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003e,
+			 quirk_no_bus_reset_and_no_retrain_link);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0042,
+			 quirk_no_bus_reset_and_no_retrain_link);
 
 /*
  * Root port on some Cavium CN8xxx chips do not successfully complete a bus
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 86c799c97b77..fdbf7254e4ab 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -227,6 +227,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
 	/* Device does honor MSI masking despite saying otherwise */
 	PCI_DEV_FLAGS_HAS_MSI_MASKING = (__force pci_dev_flags_t) (1 << 12),
+	/* Don't Retrain Link for device when bridge is not in GEN1 mode */
+	PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1 = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
-- 
2.20.1