By Dr. Cara Touretzky, Dr. Robert Luce, and Dr. David Torres Sanchez
For those of us who like to focus on the art of modeling, it’s easy to take for granted the influence of different hardware options on optimization performance. And with a reliable solver like Gurobi, modelers don’t have to spend time contemplating the algorithms being deployed under the hood.
But hardware options have been evolving, and so have the algorithms used in the optimization space. While the simplex and barrier methods for solving LPs are well established (and continuing to improve each year), the Primal-Dual Hybrid Gradient method (PDHG, also known as PDLP when enhanced and adapted specifically to Linear Programming) has recently received significant attention because it is amenable to a GPU-based implementation.
With new GPU-enabled algorithms for barrier and PDHG, the average optimization user benefits from understanding the impact of hardware and algorithm choice when deploying optimization models—especially if they are interested in finding the combination that results in the fastest possible solution times.
Spoiler alert: there is no silver bullet when it comes to hardware and algorithm selection. The classic advice of “it depends on your model” still applies. At Gurobi, we have done a thorough comparison of these algorithms and hardware options. We want to share our methodology and industry-proven benchmark results with the optimization community to help navigate these choices.
This article will demystify the applicable algorithms and reveal new opportunities for the field of optimization. Specifically, we have observed that giant LPs (and possibly MIPs that rely on giant LP relaxations) show the most immediate promise for experiencing a significant speed boost from GPU-enabled LP algorithms.
Gurobi has a proven track record of taking new innovations in the optimization space and turning them into practical tools for industry applications. Historically, this has been achieved through close collaboration with Gurobi customers, who are pushing the frontier of our solver’s capabilities. Now that Gurobi has established a benchmark methodology for comparing the various hardware and algorithm choices, we are ready and waiting to test your real-world models. Please reach out through How do I get support for using Gurobi? if you are interested in testing your models’ performance with a GPU-accelerated solver.
We have investigated multiple setups for solving giant LPs, with a combination of new hardware and algorithm options.
Hardware options we’ve investigated:
For the CPU options, all benchmarks are run using 64 cores, rather than the total available. The additional benefit of using more than 64 cores on either machine was diminishing, and using the same number of cores for all tests enabled a simple comparison.
Gurobi’s algorithmic options for giant LPs are (1) the barrier algorithm, and (2) PDHG, a first-order method. These two algorithms differ significantly in their computational footprint. We’ll summarize the most important properties:
If you’re wondering why the primal and dual simplex methods are not part of this discussion, it is because these algorithms are already difficult to parallelize on CPU hardware, and even more so on the GPU.
Convergence behavior is different for the main LP algorithms. Barrier methods have the benefit of displaying locally quadratic convergence, so once the method gets close to a solution, it is very quick to shrink the residual to a small value. On the other hand, PDHG displays linear behavior at best, so it can take longer to reach a very low tolerance threshold.
The tolerance that we use to define convergence can vary, but industry practices seem to generally agree that residuals less than 1e-6 are acceptable for a quality solution. What’s the risk of increasing the convergence tolerance when you are trying to get a solution as fast as possible? To answer that question, we have to talk about crossover and how it can add extra time to improve the result of the barrier and PDHG algorithms.
Crossover is an essential but often forgotten step needed when using a non-simplex method for LPs. To summarize, crossover helps translate the LP solution into a “clean basic solution.” Solutions from an interior point algorithm (or first-order method) normally have noisy values close to zero (we can call this dense) and, more importantly, due to these, they may have violations in various metrics (such as primal feasibility, dual feasibility, and complementary slackness). Crossover is very effective at cleaning up these violations and providing a clean and sparse basic solution.
Most of our customers regard a basic solution as essential to interpretability. Therefore, when evaluating these new algorithms, we feel it is necessary to also account for the time needed to complete crossover. The importance of this will be made even clearer in the results below, because for some algorithms, there is a clear tradeoff between accuracy and the time needed to complete crossover.
Of course, one would hope that the “best” or “fastest” machine would solve all optimization problems equally fast. In practice, however, one can observe significant variation across the different hardware options and the two algorithms mentioned above.
We give two insightful examples from the MIPLIB public library of models, and will conclude with a comment on the results from our internal set of benchmarks. The MIPLIB examples were selected because they show the possible variation in solution times due to the combination of model, algorithm, and hardware selection. For a more in-depth analysis of these results and additional examples, see New Options for Solving Giant LPs – Gurobi Optimization.
Key takeaways from our experiments:
Example Model 1:
The first model (“rwth-timetable”) is an LP relaxation of a time tabling problem with a lot of rows and columns that result in high computational effort in the barrier algorithm. The statistics are as follows:
rwth-timetable: 440,134 rows, 923,564 columns and 4,510,786 nonzeros
We visualize the comparison of the different hardware and algorithmic options explained above by means of a bar plot:
Each bar of the plot corresponds to a different hardware and algorithm combination, noting that PDHG has several columns, each with different relative tolerance termination criteria. Each bar is split into two colors: time spent performing iterations in the respective algorithm (red), and the time spent in crossover (green). From the hardware listed above, the first four bars correspond to CPU 1, the following bar corresponds to CPU 2, and the last two bars correspond to GPU.
In this first example, we can see that PDHG with a low convergence tolerance (1e-6) is the clear winner, regardless of the hardware, CPU 1 or GPU. As shown by the last two bars, both algorithms running on the GPU bring some improvement to their CPU counterparts. Specifically, for PDHG on the GPU, we see an improvement of 62.71% over CPU 1. For barrier on the GPU, we see a 50% improvement over CPU 1, and only a modest 7% improvement over CPU 2.
Another interesting takeaway is that PDHG with higher tolerances, as expected, spends less time performing iterations—but unfortunately, this doesn’t help the total solution time, as crossover takes a large proportion of this. This is because crossover largely benefits from higher-quality solutions.
Example Model 2:
The second problem is the LP relaxation of an open-pit mining problem (“rmine25”) that has a huge number of rows:
rmine25: 2,953,849 rows, 326,599 columns, 7,182,744 nonzeros
In the rmine25 results, we can see that barrier is the algorithm of choice across all hardware options, with an even further advantage when using the fastest CPU 2. Both CPU 2 and the GPU version for barrier show a speedup nearly 40% over CPU 1. Interestingly, the time spent in crossover dominates the PDHG runtime, even for a convergence tolerance as small as 1e-6.
With results like this, it is hard to predict which algorithm and hardware choice is the best for a given model, without even mentioning the effect of termination tolerance on crossover. Put differently, it is safe to say there is no one-size-fits-all combination that is set to win! And we haven’t even factored into this discussion a range of other interesting phenomena: For example, although on these two models the barrier runtime of CPU 2 and GPU are largely comparable, we’ve seen that the actual throughput for the factorization that cuDSS is doing can be much larger than the throughput we achieve on the CPU.
Whether you identify as an “optimization person” who wanted to learn more about your hardware options, or a “high-performance computing person” who wanted to learn more about optimization, we hope this article guided you to the same place of appreciation for the latest initiatives in the field of GPU-enabled optimization algorithms.
As was stated earlier, there is no silver bullet when it comes to hardware and algorithm selection for optimization. We still recommend treating every model as a unique creation that must be tested to see which combination is the best fit. The two examples shown earlier reflect the impact that these choices can have, given the currently available state-of-the-art hardware options.
Because the hardware and related libraries (recall, we leveraged NVIDIA’s cuDSS and cuSparse libraries for our GPU-enabled algorithms) are continuously evolving, we expect the results of our benchmark to change rapidly—and it’s possible many more use cases will benefit from barrier and PDHG on GPU in the future.
At this point you may be wondering why we are only sharing results for models from the MIPLIB database when Gurobi has many real-world case studies. Our preliminary and internal benchmarks show only 72 out of 2633 models where PDHG is better than our default settings (only 17 of those show more than a factor of 2 improvement). But by nature of our collection, this set is highly biased towards LPs that we can already solve with either Simplex or IPMs. Why does our test set not include giant LPs that we cannot solve? This is more of a philosophical question, because why would someone write a giant LP if they expect that current solver technology cannot handle it?
This is an active research area, and perception of what is “giant” is always shifting. With the new GPU-enabled algorithms, our initial impression is that they will offer the most benefit to models so large that people have not yet dared to build them. Of course, we’re also exploring models that may not be “giant,” but have a level of complexity and heavy compute needs that will warrant using the new algorithms and GPU hardware.
Interested in qualifying your model for GPU-accelerated optimization? Please reach out through How do I get support for using Gurobi?
Latest news and releases
Choose the evaluation license that fits you best, and start working with our Expert Team for technical guidance and support.
Request free trial hours, so you can see how quickly and easily a model can be solved on the cloud.
Cookie | Duration | Description |
---|---|---|
_biz_flagsA | 1 year | A Cloudflare cookie set to record users’ settings as well as for authentication and analytics. |
_biz_pendingA | 1 year | A Cloudflare cookie set to record users’ settings as well as for authentication and analytics. |
_biz_sid | 30 minutes | This cookie is set by Bizible, to store the user's session id. |
ARRAffinity | session | ARRAffinity cookie is set by Azure app service, and allows the service to choose the right instance established by a user to deliver subsequent requests made by that user. |
ARRAffinitySameSite | session | This cookie is set by Windows Azure cloud, and is used for load balancing to make sure the visitor page requests are routed to the same server in any browsing session. |
BIGipServersj02web-nginx-app_https | session | NGINX cookie |
cookielawinfo-checkbox-advertisement | 1 year | Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category . |
cookielawinfo-checkbox-analytics | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics". |
cookielawinfo-checkbox-functional | 11 months | The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional". |
cookielawinfo-checkbox-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-others | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other. |
cookielawinfo-checkbox-performance | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance". |
CookieLawInfoConsent | 1 year | Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie. |
elementor | never | This cookie is used by the website's WordPress theme. It allows the website owner to implement or change the website's content in real-time. |
JSESSIONID | session | New Relic uses this cookie to store a session identifier so that New Relic can monitor session counts for an application. |
viewed_cookie_policy | 11 months | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
Cookie | Duration | Description |
---|---|---|
__cf_bm | 30 minutes | This cookie, set by Cloudflare, is used to support Cloudflare Bot Management. |
_biz_nA | 1 year | Bizible sets this cookie to remember users’ settings as well as for authentication and analytics. |
_biz_uid | 1 year | This cookie is set by Bizible, to store user id on the current domain. |
_hjAbsoluteSessionInProgress | 30 minutes | Hotjar sets this cookie to detect a user's first pageview session, which is a True/False flag set by the cookie. |
_mkto_trk | 2 years | This cookie is set by Marketo. This allows a website to track visitor behavior on the sites on which the cookie is installed and to link a visitor to the recipient of an email marketing campaign, to measure campaign effectiveness. Tracking is performed anonymously until a user self-identifies by submitting a form. |
bcookie | 1 year | LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser ID. |
bscookie | 1 year | LinkedIn sets this cookie to store performed actions on the website. |
doc_langsBB | 1 year | Documentation system cookie |
doc_version | 1 year | Documentation system cookie |
lang | session | LinkedIn sets this cookie to remember a user's language setting. |
lidc | 1 day | LinkedIn sets the lidc cookie to facilitate data center selection. |
UserMatchHistory | 1 month | LinkedIn sets this cookie for LinkedIn Ads ID syncing. |
whova_client_id | 10 years | Event agenda system cookie |
Cookie | Duration | Description |
---|---|---|
_gat_UA-5909815-1 | 1 minute | A variation of the _gat cookie set by Google Analytics and Google Tag Manager to allow website owners to track visitor behaviour and measure site performance. The pattern element in the name contains the unique identity number of the account or website it relates to. |
Cookie | Duration | Description |
---|---|---|
_an_uid | 7 days | 6Sense Cookie |
_BUID | 1 year | This cookie, set by Bizible, is a universal user id to identify the same user across multiple clients’ domains. |
_ga | 2 years | The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors. |
_ga_* | 1 year 1 month 4 days | Google Analytics sets this cookie to store and count page views. |
_gat_UA-* | 1 minute | Google Analytics sets this cookie for user behaviour tracking. |
_gcl_au | 3 months | Provided by Google Tag Manager to experiment advertisement efficiency of websites using their services. |
_gd_session | 4 hours | This cookie is used for collecting information on users visit to the website. It collects data such as total number of visits, average time spent on the website and the pages loaded. |
_gd_visitor | 2 years | This cookie is used for collecting information on the users visit such as number of visits, average time spent on the website and the pages loaded for displaying targeted ads. |
_gid | 1 day | Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously. |
_hjFirstSeen | 30 minutes | Hotjar sets this cookie to identify a new user’s first session. It stores the true/false value, indicating whether it was the first time Hotjar saw this user. |
_hjIncludedInSessionSample_* | 2 minutes | Hotjar cookie that is set to determine if a user is included in the data sampling defined by a site's daily session limit. |
_hjRecordingEnabled | never | Hotjar sets this cookie when a Recording starts and is read when the recording module is initialized, to see if the user is already in a recording in a particular session. |
_hjRecordingLastActivity | never | Hotjar sets this cookie when a user recording starts and when data is sent through the WebSocket. |
_hjSession_* | 30 minutes | Hotjar cookie that is set when a user first lands on a page with the Hotjar script. It is used to persist the Hotjar User ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID. |
_hjSessionUser_* | 1 year | Hotjar cookie that is set when a user first lands on a page with the Hotjar script. It is used to persist the Hotjar User ID, unique to that site on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID. |
_hjTLDTest | session | To determine the most generic cookie path that has to be used instead of the page hostname, Hotjar sets the _hjTLDTest cookie to store different URL substring alternatives until it fails. |
6suuid | 2 years | 6Sense Cookie |
AnalyticsSyncHistory | 1 month | LinkedIn cookie |
BE_CLA3 | 1 year 1 month 4 days | BrightEdge sets this cookie to enable data aggregation, analysis and report creation to assess marketing effectiveness and provide solutions for SEO, SEM and website performance. |
CONSENT | 2 years | YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data. |
dj | 10 years | DemandJump cookie |
djaimid.a28e | 2 years | DemandJump cookiean |
djaimses.a28e | 30 minutes | DemandJump cookie |
li_gc | 5 months 27 days | LinkedIn Cookie |
ln_or | 1 day | LinkedIn Cookie |
vuid | 2 years | Vimeo installs this cookie to collect tracking information by setting a unique ID to embed videos to the website. |
Cookie | Duration | Description |
---|---|---|
__adroll | 1 year 1 month | This cookie is set by AdRoll to identify users across visits and devices. It is used by real-time bidding for advertisers to display relevant advertisements. |
__adroll_fpc | 1 year | AdRoll sets this cookie to target users with advertisements based on their browsing behaviour. |
__adroll_shared | 1 year 1 month | Adroll sets this cookie to collect information on users across different websites for relevant advertising. |
__ar_v4 | 1 year | This cookie is set under the domain DoubleClick, to place ads that point to the website in Google search results and to track conversion rates for these ads. |
_fbp | 3 months | This cookie is set by Facebook to display advertisements when either on Facebook or on a digital platform powered by Facebook advertising, after visiting the website. |
_te_ | session | Adroll cookie |
fr | 3 months | Facebook sets this cookie to show relevant advertisements to users by tracking user behaviour across the web, on sites that have Facebook pixel or Facebook social plugin. |
IDE | 1 year 24 days | Google DoubleClick IDE cookies are used to store information about how the user uses the website to present them with relevant ads and according to the user profile. |
li_sugr | 3 months | LinkedIn sets this cookie to collect user behaviour data to optimise the website and make advertisements on the website more relevant. |
test_cookie | 15 minutes | The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies. |
VISITOR_INFO1_LIVE | 5 months 27 days | A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface. |
YSC | session | YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages. |
yt-remote-connected-devices | never | YouTube sets this cookie to store the video preferences of the user using embedded YouTube video. |
yt-remote-device-id | never | YouTube sets this cookie to store the video preferences of the user using embedded YouTube video. |
yt.innertube::nextId | never | This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen. |
yt.innertube::requests | never | This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen. |