Note: this page shows the Feature-Based Change Log for a release
These features were completed when this image was assembled
Customers typically run more than one cluster and/or applications deployed across different regions. In such a hybrid cloud environment, aggregating metrics is a key requirement so that admins and/or application owners do not have to drop into individual clusters to troubleshoot specific problems. Since Red Hat does not offer a standalone metrics aggregation service, customers have started to use existing, home-grown technologies based on, for example, InfluxDB or Kafka to achieve that.
In summary:
Expose Prometheus remote-write configuration via our OpenShift Monitoring (Cluster and User Workload) ConfigMap to allow customers to push time-series data to a remote location.
Please note that we do not plan to support specific third-party “receivers” as part of this solution. Customers will be responsible for ensuring that an appropriate receiving component implementing the “remote-write” API is up and running. Here is a list of possible “receiver” plugins.
User configures one of the available ConfigMaps to allow node_cpu_seconds_total to be written into a remote Thanos system.
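A minimal sketch of what that could look like through the cluster monitoring ConfigMap (the Thanos receive URL is a placeholder; the UWM ConfigMap would gain an equivalent field):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      # Placeholder endpoint; any receiver implementing the remote-write API works.
      - url: "https://thanos-receive.example.com/api/v1/receive"
        writeRelabelConfigs:
        # Only ship node_cpu_seconds_total, drop everything else.
        - sourceLabels: [__name__]
          regex: node_cpu_seconds_total
          action: keep
```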
Remote write allows replicating time-series data to a remote location. This is important for several scenarios, for example when you want to use "remote-write enabled" systems (e.g. InfluxDB) for long-term storage and historical analysis, as well as for aggregating metrics across multiple clusters.
Currently, remote-write is in an experimental stage in Prometheus[1], but the chances are high that it will become stable some time this year. Furthermore, we already use remote-write pretty extensively for Telemetry, and ACM will rely on it in the near future. With that in mind, we think that we are in a perfect spot to move what we already have[2] from dev preview to at least tech preview.
[1] https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#prometheusspec - "If specified, the remote_write spec. This is an experimental feature, it may change in any upcoming release in a breaking way." The experimental flag was removed.
We'll want to give users the option to add remote_write configs to both the cluster monitoring stack and UWM.
AC:
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
This will allow cipher customization work to be completed.
When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Story: As an OpenShift admin, I want the internal registry of the cluster to use storage from Azure Stack Hub so that I can run a fully supported OpenShift environment on that infrastructure provider.
As a cluster administrator,
I want OpenShift to include a recent CoreDNS version,
so that I have the latest available performance and security fixes.
We should strive to follow upstream CoreDNS releases by bumping openshift/coredns with every OpenShift 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgently needed change necessitates bumping CoreDNS to the latest upstream release. This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.
For OpenShift 4.9, this means bumping from CoreDNS 1.8.1 to 1.8.3, or possibly a later release should one ship before we do the bump.
Note that CoreDNS upstream does not maintain release branches—that is, once CoreDNS 1.9 is released, there will be no further 1.8.z releases—so we may be better off updating to 1.9 as soon as it is released, rather than staying on the 1.8 series, which would then be unmaintained.
We may consider bumping CoreDNS again during the OpenShift 4.9 release cycle if upstream ships additional releases during the 4.9 development cycle. However, we will need to weigh the risks and available remaining soak time in the release schedule before doing so, should that contingency arise.
As an OpenShift administrator, I would like a solution that allows me to upgrade from one EUS version to another with very few steps and only minimal disruption to application workloads, while still allowing new application services to be deployed.
Functional requirements break down into the following prioritized list:
Non-Functional Requirements
Requirement | Notes | isMvp? |
---|---|---|
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Documentation | This is a requirement for ALL end user facing features | YES |
Questions to be addressed:
EUS to EUS Focus Area Discussion: https://docs.google.com/document/d/17I1Wd7-R1wRxmboyv1jUFHFkqQcBTorJccdGi1ZqjQE/edit?usp=sharing
EUS Feature: https://issues.redhat.com/browse/OCPPLAN-5484
Use case
As an Admin, one of my operators says it can't be upgraded. An action is required, as I will be unable to upgrade to a .y minor release until I fix the problem.
Possible Design Solution
Create a message saying you can upgrade to .z patch releases even when one of your cluster operators says it's not upgradeable.
Ideally, the message string on the condition explains what the admin needs to resolve, and until they resolve the issue they can only update within their current z-stream.
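For reference, the signal the console has to work with is the Upgradeable condition on a ClusterOperator; a hedged sketch of what such a condition might look like (operator name, reason, and message are illustrative):

```yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: authentication                 # example operator
status:
  conditions:
  - type: Upgradeable
    status: "False"
    reason: UnsupportedConfiguration   # illustrative reason
    message: >-
      Minor-version updates are blocked until the unsupported configuration
      is removed; z-stream updates remain available.
```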
Questions
Need to do a little R&D to find out when this happens and what happens when you're in this state.
Designs (WIP)
Doc: https://docs.google.com/document/d/1iUZlHbv5nTYtb7Cq4rn_bYPqD4Jtie59xIogxN-2Eyc/edit#heading=h.5eoflxvaj1m4
We drive OpenShift cross-market customer success and new customer adoption with constant improvements and feature additions to the existing capabilities of our OpenShift Core Networking (SDN and Network Edge). This feature captures that natural progression of the product.
There are definitely grey areas, but in general:
Questions to be addressed:
Plugin teams need a mechanism to extend the OCP console that is decoupled enough so they can deliver at the cadence of their projects and not be forced into the OCP Console release timelines.
The OCP Console Dynamic Plugin Framework will enable all our plugin teams to do the following:
Requirement | Notes | isMvp? |
---|---|---|
UI to enable and disable plugins | | YES |
Dynamic Plugin Framework in place | | YES |
Testing Infra up and running | | YES |
Docs and read me for creating and testing Plugins | | YES |
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Documentation Considerations
Questions to be addressed:
The dynamic plugins enhancement describes a `disable-plugins` query parameter for disabling specific console plugins.
This has no effect on static plugins, which are built into the Console application.
We need to support localization of dynamic plugins. The current proposal is to have one i18n namespace per dynamic plugin with a fixed name: `${plugin-name}-plugin`. Since console will know the list of plugins on startup, it can add these namespaces to the i18next config.
The console backend will need to implement an endpoint at the i18next load path. The endpoint will see if the namespace matches the known plugin namespaces. If so, it will proxy to the plugin. Otherwise it will serve the static file from the local filesystem.
We need a UI for enabling and disabling dynamic plugins. The plugins will be discovered either through a custom resource or through an annotation on the operator CSV. The enabled plugins will be persisted through the console operator config (consoles.operator.openshift.io).
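For context, enabling a plugin on the operator config is expected to boil down to appending its name to the `spec.plugins` list (the plugin name below is a placeholder):

```yaml
apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
spec:
  plugins:
  # Names of console plugins to enable; placeholder value.
  - my-operator-plugin
```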
This story tracks enabling and disabling the plugin during operator install through Cluster Settings. This is needed in the future if a plugin is installed outside of an OLM operator.
UX design: https://github.com/openshift/openshift-origin-design/pull/536
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Currently we are showing system projects within the list view of the Projects page. As stated in https://issues.redhat.com/browse/RFE-185, there are many projects that are considered system projects and are not important to the user. The value should be remembered across sessions, but it is something we should be able to toggle directly from the list.
In OpenShift, reserved namespaces are `default`, `openshift`, and those that start with `openshift-`, `kubernetes-`, or `kube-`.
Goal
By default the Cluster Utilization card should not include metrics from `master` nodes in its queries for CPU, Memory, Filesystem, Network, and Pod count.
A new filter option should allow users to toggle back to the combined view that is shown on the Cluster Utilization card today, which is mostly useful on small clusters where masters are schedulable for user workloads.
Assets
Background
As discussed in this thread, the `kube_node_role` metric available since 4.3 should allow us to filter the card's PromQL queries to not include master node metrics.
This filtered view would likely make the card's data more useful for users who aren't running their workloads on masters, like OpenShift Dedicated users.
As noted by some folks during design discussions, this filter isn't perfect, and wouldn't filter out the data from "Infra" nodes that users may have set up using labels/taints. Until we determine a good way to provide more advanced filtering, this basic "Include masters" checkbox is still more flexible than what the card offers today.
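The filtering itself comes down to joining the node-level series against `kube_node_role`. A hedged sketch, written here as a recording rule purely for illustration (the console would embed an equivalent expression in the card's queries); it assumes the node-exporter `instance` label carries the node name, and the rule/record names are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cluster-utilization-workers     # hypothetical
  namespace: openshift-monitoring
spec:
  groups:
  - name: utilization-workers.rules
    rules:
    - record: workers:cpu_usage:sum     # hypothetical record name
      expr: |
        # Keep only series whose instance matches a node carrying the worker role.
        sum(
          rate(node_cpu_seconds_total{mode!="idle"}[5m])
          * on(instance) group_left()
          max by (instance) (
            label_replace(kube_node_role{role="worker"}, "instance", "$1", "node", "(.+)")
          )
        )
```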
Requirements
As an admin, I want to be able to access the node logs from the node details page in order to troubleshoot what is going on with the node.
We should support getting node journal logs for different units, and evaluate the other CLI flags.
We currently have a gap with the CLI:
We need to investigate whether the k8s API supports WebSockets for streaming node logs.
OpenShift console supports new features and an elevated experience for Operator Lifecycle Manager (OLM) Operators and Cluster Operators.
OCP Console improves the controls and visibility for managing vendor-provided software in customers’ infrastructure and for making these solutions available to customers' internal users.
To achieve this, we want to make sure OLM’s and Cluster Operators' new features are exposed in the console so admin console users can benefit from them.
Requirement | Notes | isMvp? |
---|---|---|
OCP console supports the latest OLM APIs and features | This is a requirement for ALL features. | YES |
OCP console improves visibility to Cluster Operators related resources and features. | This is a requirement for ALL features. | YES |
(Optional) Use Cases
* Main success scenarios - high-level user stories
* Alternate flow/scenarios - high-level user stories
* ...
Questions to answer...
How will the user interact with this feature?
Which users will use this and when will they use it?
Is this feature used as part of the current user interface?
Out of Scope
# List of non-requirements or things not included in this feature
# ...
Background, and strategic fit
What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Assumptions
* Are there assumptions being made regarding prerequisites and dependencies?
* Are there assumptions about hardware, software or people resources?
* ...
Customer Considerations
* Are there specific customer environments that need to be considered (such as working with existing h/w and software)?
...
Documentation Considerations
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?
As a user of OperatorHub, I'd like to have an improved "status display" for Operators I have installed before, so I can better understand whether those Operators were actually installed successfully or require additional actions from me to complete the installation.
Improve visibility of Operator installation status on OperatorHub page
OperatorHub page currently shows an Operator as Installed as long as a Subscription object exists for that operator in the current namespace.
This can be misleading because the installation could be stalled or require additional interactions from the user (e.g. "manual upgrade approval") in order to complete the installation.
The console could potentially have some indication of an "in-between" or "requires attention" state for Operators that are in these states, plus links to the actual "Installed Operators" page for more details.
1. BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1899359
2. RFE: https://issues.redhat.com/browse/RFE-1691
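A hedged sketch of the kind of Subscription state the console could key off to distinguish "actually installed" from "requires attention" (names and values below are examples):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator              # placeholder
  namespace: openshift-operators
spec:
  name: example-operator
  channel: stable
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: UpgradePending               # not AtLatestKnown -> needs attention
  conditions:
  - type: InstallPlanPending
    status: "True"
    reason: RequiresApproval          # the "manual upgrade approval" case
```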
OLM is adding a property to the CSV to signal that the operator should clean up the operand on operator uninstall. See https://github.com/operator-framework/enhancements/pull/46
Console will need to add a checkbox to the UI to ask the user if the operand should be cleaned up (with strong warnings about what this means). On delete, console should set the `spec.cleanup` property on the CSV to indicate whether cleanup should happen.
Additionally, console needs to be able to show proper status for CSVs that are terminating in the UI so it's clear the operator is being deleted and cleanup is in progress. If there are errors with cleanup, those should be surfaced back through the UI.
Depends on OLM-1733
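A hedged sketch of what the console would set on delete; the exact field shape (shown here as a nested `enabled` flag) is defined by the linked enhancement and is an assumption:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: example-operator.v1.0.0       # placeholder
  namespace: openshift-operators
spec:
  # Set by console when the user checks the operand-cleanup box in the
  # uninstall dialog; field shape assumed from the enhancement proposal.
  cleanup:
    enabled: true
```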
The current upstream version is 2.5.1, and we are still on 1.x. Updating the package is required to support the additional validation keywords in CONSOLE-2807.
https://github.com/rjsf-team/react-jsonschema-form/releases
Breaking changes are listed in the v2 notes:
https://github.com/rjsf-team/react-jsonschema-form/releases/tag/v2.0.0
Key Objective
Providing our customers with a single, simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why do customers want this?
Why do we want this?
Phase 1 Goal: Get something to market (OCP 4.8, ACM 2.3)
Phase 1 —> OCP deploys ACM Hub Operator —> ACM Perspective becomes available —> User can switch between ACM multi-cluster view and local OCP Console —> No SSO: the user has to log in twice
Phase 2 Goal: Productization of the united Console (OCP 4.9, ACM 2.4)
Phase 2 Use Cases:
We need to coordinate with the ACM team so that the masthead looks the same when switching between contexts. This might require us to consume a common masthead component in OCP console.
The ACM team will need to honor our custom branding configuration so that the logo does not change when switching contexts.
Known differences:
Open questions:
OCP/Telco Definition of Done
Feature Template descriptions and documentation.
Feature Overview
Enable customers to access Google services from workloads on OpenShift clusters using Google Workload Identity (aka WIF)
https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Dependencies (internal and external)
Investigate whether this will work for OpenShift components, similar to how we implemented STS.
Can we distribute credentials in a fashion that is transparent to the callers (as to whether it is a normal service account or a short-lived token), like we did for AWS?
What changes would be required for operators?
Can ccoctl do the heavy lifting as we did for AWS?
Enable sharing ConfigMap and Secret across namespaces
Requirement | Notes | isMvp? |
---|---|---|
Secrets and ConfigMaps can get shared across namespaces | | YES |
NA
NA
Consumption of RHEL entitlements has been a challenge on OCP 4 since it moved to a cluster-based entitlement model, compared to the node-based (RHEL subscription manager) entitlement model. In order to provide a sufficiently similar experience to OCP 3, the entitlement certificates that are made available on the cluster (OCPBU-93) should be shared across namespaces in order to prevent the need for cluster admins to copy these entitlements into each namespace, which leads to additional operational challenges for updating and refreshing them.
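A sketch of how an entitlement Secret could be exposed for cross-namespace consumption via the Shared Resource CSI driver; the API version, resource names, and source namespace are assumptions for illustration:

```yaml
apiVersion: sharedresource.openshift.io/v1alpha1
kind: SharedSecret
metadata:
  name: etc-pki-entitlement            # referenced from consuming namespaces
spec:
  secretRef:
    name: etc-pki-entitlement          # assumed source Secret
    namespace: openshift-config-managed
```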
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Reduce the OpenShift platform and associated RH provided components to a single physical core on Intel Sapphire Rapids platform for vDU deployments on SingleNode OpenShift.
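A hedged PerformanceProfile sketch of the kind of tuning involved: reserve one physical core (assumed here to be hyperthreads 0-1) for the platform and isolate the rest for workloads; names and CPU IDs are illustrative and depend on the hardware topology:

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: sno-du-single-core             # hypothetical
spec:
  cpu:
    reserved: "0-1"                    # one physical core (two hyperthreads) for platform pods
    isolated: "2-63"                   # everything else for DU workloads
  nodeSelector:
    node-role.kubernetes.io/master: "" # SNO node carries the master role
```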
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Provide a mechanism to tune the platform to use only one physical core. | Users need to be able to tune different platforms. | YES |
Allow for full zero touch provisioning of a node with the minimal core budget configuration. | Node provisioned with SNO Far Edge provisioning method - i.e. ZTP via RHACM, using DU Profile. | YES |
Platform meets all MVP KPIs | | YES |
Questions to be addressed:
Currently, OpenShift Monitoring is a full E2E solution for monitoring infrastructure and workloads locally inside a single cluster. It comes with everything an SRE needs, from configuring the scraping of metrics to configuring where alerts go.
With deployment models like Single Node OpenShift and/or resource-restricted environments, we now face the challenge that a lot of these functions are either already available centrally or are not necessary due to the nature of a specific cluster (e.g. Far Edge). Therefore, you don't need to deploy components that expose these functions.
Also, Grafana is not FIPS compliant, because it uses PBKDF2 from x/crypto to derive a 32 byte key from a secret and salt, which is then used as the encryption key. Quoting https://bugzilla.redhat.com/show_bug.cgi?id=1931408#c10: "it may be a problem to sell Openshift into govt agencies if grafana is a required component."
We could make the Monitoring stack as-is completely optional and provide a more "agent-like" component. Unfortunately, that would probably take much more time and in the end just reproduce what we already have, just with fewer components. It would also not reduce the number of samples scraped, which has the most impact on CPU usage.
Use cases like single-node deployments (e.g. far edge) don't need to deploy a local Alertmanager cluster because alerts are centralized at the core (e.g. a hub cluster); running Alertmanager locally takes resources away from user workloads and adds management overhead. Cluster admins should be able to opt out of deploying Alertmanager as a day-2 operation.
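A hedged sketch of what such a day-2 opt-out could look like in the cluster monitoring ConfigMap; whether and how an `enabled` knob is exposed is exactly what this epic has to define, so the field name is an assumption:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      enabled: false   # assumed knob: skip deploying the local Alertmanager
```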
DoD
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled
Monitoring needs to be reliable and is especially useful when trying to debug clusters that are already in a degraded state. We want to ensure that metrics scraping always works if the scraper can reach the target, even if the kube-apiserver is unavailable or unreachable. To do this, we will combine a local authorizer (already merged in many binaries and in the rbac-proxy) and client-cert-based authentication to have a fully local authentication and authorization path for scrape targets.
If networking (or part of networking) is down and a scrape target cannot reach the kube-apiserver to verify a token and perform a subjectaccessreview, then the metrics scraper can be rejected. The subjectaccessreview (authorization) side is already largely addressed, but service account tokens are still used for scraping targets. Tokens require an external network call that we can avoid by using client certificates. Gathering metrics, especially client metrics, from partially functional clusters helps narrow the search area between kube-apiserver, etcd, kubelet, and SDN considerably.
In addition, this will significantly reduce the load on the kube-apiserver. We have observed in the CI cluster that token and subject access reviews are a significant percentage of all kube-apiserver traffic.
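A hedged sketch of the scrape side of this: a ServiceMonitor endpoint authenticating with a client certificate instead of a ServiceAccount token, so the target can verify the scraper locally against a CA without calling the kube-apiserver; secret and serverName values are placeholders:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-metrics                 # placeholder
  namespace: openshift-monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: example
  endpoints:
  - port: metrics
    scheme: https
    tlsConfig:
      serverName: example.openshift-monitoring.svc   # placeholder
      ca:
        configMap:
          name: serving-certs-ca-bundle
          key: service-ca.crt
      # Client certificate presented to the target instead of a bearer token.
      cert:
        secret:
          name: metrics-client-certs
          key: tls.crt
      keySecret:
        name: metrics-client-certs
        key: tls.key
```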
User story:
As cluster-policy-controller I automatically approve cert signing requests issued by monitoring.
DoD:
Implementation hints: leverage approving logic implemented in https://github.com/openshift/library-go/pull/1083.
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.
ListPage is still JSX; we should convert it to TSX, add proper types, and make sure the rest of the code is passing correct props.
resources.js contains functions to work with the k8s API (CRUD). It would be good to convert it to TS and add proper types. We will want to expose these functions in some form to dynamic plugins too, so proper types are a must.
The Table component is currently a class component; we want to update it to a function component.
There are also many properties with the `any` type; we will want to reduce those and be more strict.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
This epic is mainly focused on tracking the dev console QE activities for the 4.9 release.
1. Identify the scenarios for automation
2. Segregate the test Scenarios into smoke, Regression and other user stories
a. Update the https://docs.jboss.org/display/ODC/Automation+Status+Report
3. Align with layered operator teams for updating scripts
4. Work closely with dev team for epic automation
5. Create the automation scripts using cypress
6. Implement CI for nightly builds
7. Execute scripts on sprint basis
To track the QE progress in one place
Update the OWNERS file in all plugin folders, as Gaja and Praveen have left the org.
Deleting the Service mesh folder and adding the OWNERS file to the service-mesh folder
While executing the "yarn run gherkin-lint" script, an error is displayed due to a ",".
Scenario length increased to 20, to avoid errors in a couple of quick start scenarios.
Max scenarios increased to 20 in feature files, because this is needed for a few features.
This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled
This epic is mainly focused on the 4.10 Release QE activities
1. Identify the scenarios for automation
2. Segregate the test Scenarios into smoke, Regression and other user stories
a. Update the https://docs.jboss.org/display/ODC/Automation+Status+Report
3. Align with layered operator teams for updating scripts
4. Work closely with dev team for epic automation
5. Create the automation scripts using cypress
6. Implement CI for nightly builds
7. Execute scripts on sprint basis
To track the QE progress in one place on the 4.10 release Confluence page
Acceptance criteria:
Creating this task to fix the CI-related Prow issues.
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled
When the user opens the topology sidebar for a Deployment with a BuildConfig, the string "Build #x was complete" (and others) is not translated. The browser log also shows an error:
Missing i18n key "Build <1></1> was complete" in namespace "public" and language "en."
The strings are translated but are saved as:
Build {link} was complete
None
Build #x ... string is not translated. An error is logged in the browser console.
String is shown in the selected language and no error is logged in the browser console.
Always
Happens on a cluster (4.9.0-0.nightly-2021-07-07-021823)
and local development (4.9 master, tested with 0588bc0f0b838ae448a68f35c5424f9bbfc65bc9)
None
.NET builder image is not getting detected
Install Openshift Pipelines Operator
When the git URL of a .NET project is provided, the builder image is not getting detected
When the git URL of a .NET project is provided, the builder image should automatically get detected
Always
Create README based on shared Google doc and add it to OpenShift so we have documentation for i18n work.
The topology uses a PatternFly toolbar component to render its toolbar. It looks like the latest version uses a grid and flex layout, which adds additional spacing below the toolbar.
None
There is a spacing below the toolbar (Display options, Filter by resource, Filter by name), see attached screenshot.
No additional spacing below the toolbar. (See screenshot of 4.8)
Always
4.9 master, tested with commit 3c6537eec4f5c165cf214c4100bddeccc104ed44
Maybe this is a PatternFly issue, or we should check whether the second div in the toolbar should not be rendered. See attached screenshot.
User expects a shadow on the form footer when the add page content is longer than the viewport shows. This was shown in 4.8.
None
No shadow at the top of the form footer when the content view is scrollable.
A shadow at the top of the form footer when the content view is scrollable.
Always
Happens on a cluster (4.9.0-0.nightly-2021-07-07-021823)
and local development (4.9 master, tested with 0588bc0f0b838ae448a68f35c5424f9bbfc65bc9)
None
The QuickStart content shows a shadow above and/or below the content when the user can scroll in that direction. This feature is missing now.
None
No shadow when the user can scroll the content in that direction.
A shadow when the user can scroll the content in that direction.
Always
4.9 (tested on master 2860e58114b4f811ac2ebf6ce34dd99263920e17)
Quick start is now extracted into PatternFly
This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled
Currently we see this issue:
Aug 28 00:02:20.755103 ip-10-0-131-145 hyperkube[1366]: I0828 00:02:20.755067 1366 prober.go:116] "Probe failed" probeType="Readiness" pod="openshift-etcd/etcd-quorum-guard-588ff9b55d-8lhb7" podUID=5b79def2-9e56-4c93-b8ab-1d04db0f552f containerName="guard" probeResult=failure output=""
Then, a few seconds later:
Aug 28 00:02:25.797258 ip-10-0-131-145 hyperkube[1366]: I0828 00:02:25.797231 1366 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-etcd/etcd-quorum-guard-588ff9b55d-8lhb7"
Try to improve the clustermembercontroller sync loop for health status, or at least improve it to not fail on the probe guard during install or scale. Instead of operator status, maybe use metrics to track this.
Slack for more context https://coreos.slack.com/archives/C027U68LP/p1630506922034600
AC: