Imagine a time before Facebook and Meta, way back in the middle of the dot-com bubble. In 1999, Tim Berners-Lee, the inventor of the World Wide Web, coined the term Semantic Web to describe a web of data that machines could process, that is, data that is machine-readable (as described in his book, Weaving the Web):
“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A “Semantic Web”, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy, and our daily lives will be handled by machines talking to machines. The “intelligent agents” people have touted for ages will finally materialize.”
His vision was outlined more fully in a 2001 Scientific American article, in which Berners-Lee described an evolution of the then-current Web into a Semantic Web.
The Semantic Web has been known as Web 3.0 and is an extension of the World Wide Web with its own standards set by the W3C. Because the Metaverse has taken the moniker of Web 3.0, the Semantic Web has been relegated to Web 4.0, and even so, critics have questioned the Semantic Web’s feasibility. Proponents argue that several applications have been proven and that the original concept remains valid. We shall discuss the current and future potential of Web 4.0 and the Semantic Web.
What is the Semantic Web?
So beyond being “machine-readable,” what does the Semantic Web mean? Definitions differ significantly from person to person, but for our purposes, the Semantic Web is a virtual environment in which information and data are organized and connected so that they can be processed automatically: machines read the content and are able to interpret the data. Added to this is broader Artificial Intelligence functionality that will allow users to interact with AI in a “human” way. When you need something, you will interact with the Web much as you would with another human. The Web will be able to interpret your meaning from what you say or type, and even from the actions you take with clicks. Saying, “I want to see some TikToks” will not show you a selection of clocks, and the selection provided will automatically incorporate your likes and preferences.
The Semantic Web’s Ongoing Evolution
The evolution of the Semantic Web has focused on finding efficient methods of data sharing, data discovery, data integration, and data reuse.
One of the main components of the Semantic Web is the creation of ontologies. (An ontology is a set of concepts and categories in a subject area or domain that shows their properties and the relations between them.) From this foundation, other building blocks can grow that computers use to draw connections. The linking of shared data, which started around 2006, aimed to structure data as “RDF” (Resource Description Framework) graphs so that data sharing would be standardized.
Graph courtesy of Communications of the ACM
The Linked Open Data Cloud’s RDF graphs are only a small portion of the 650,000 data documents being used in ontology research.
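To make the idea of an RDF graph more concrete, here is a minimal sketch in plain Python. It assumes nothing beyond the standard library: facts are stored as subject-predicate-object triples, and a small helper follows subclass links so that a program can "draw connections" that were never stated directly. The `ex:` identifiers and the helper names `match` and `types_of` are made up for illustration; only `rdf:type` and `rdfs:subClassOf` are borrowed from the real RDF and RDFS vocabularies, and real systems would use a dedicated RDF library and full URIs.

```python
# A toy RDF-style graph: each fact is a (subject, predicate, object) triple.
# The "ex:" names are hypothetical identifiers invented for this sketch.
TRIPLES = {
    ("ex:TimBernersLee", "rdf:type", "ex:Person"),
    ("ex:TimBernersLee", "ex:invented", "ex:WorldWideWeb"),
    ("ex:WorldWideWeb", "rdf:type", "ex:InformationSystem"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:Agent", "rdfs:subClassOf", "ex:Thing"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


def types_of(triples, entity):
    """Collect the entity's classes, following subClassOf links transitively."""
    found = set()
    pending = [o for (_, _, o) in match(triples, entity, "rdf:type")]
    while pending:
        cls = pending.pop()
        if cls not in found:
            found.add(cls)
            pending.extend(
                o for (_, _, o) in match(triples, cls, "rdfs:subClassOf")
            )
    return found


# Who invented what?
print(match(TRIPLES, predicate="ex:invented"))

# The graph never says Tim Berners-Lee is an Agent or a Thing,
# but both follow from the subclass links.
print(sorted(types_of(TRIPLES, "ex:TimBernersLee")))
```

The second query is the interesting one: the answer includes classes that appear in no single fact about the entity, which is a small-scale version of the inference an ontology is meant to enable.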
After ontologies were combined, their limitations became apparent, and linking data was considered the next step. Interest in linked data also waned once it became clear that integrating and using linked data required more data than initially expected. Knowledge graphs were the next evolution, a synthesis of linked data with the organization of data.
Image courtesy of Towards Data Science
This is where things get complicated for the Semantic Web, and where the technology stands today. Unlike other data-related areas such as machine learning, which can be more focused and see incremental improvements, the Semantic Web field is not driven by methods inherent to the field. It is driven by a shared vision, so it borrows from other disciplines as needed but is less organized.
What is Web 4.0?
Alongside the Semantic Web runs a parallel progression of augmented reality combined with distributed tech and Big Data (the components of Web 3.0), and linking them will be the basis of Web 4.0. Users will likely have their own avatars, a form of digital alter ego, interacting with AI and humans without knowing or caring which is which. The digital assistant will not only take requests for actions but will also be proactive.
In a Web 4.0 world, say you are on the way to LaGuardia airport, and your driverless Uber is stuck in traffic. Your digital assistant informs you that, with the current traffic patterns, you are going to miss your booked flight. However, the assistant has already pre-booked a different flight out of JFK airport and can automatically send you there by changing the route of the Uber, while also informing your family that you will be home only 15 minutes later than expected. It just needs your OK to do so.
Some may think this is great, while others see this as a dystopian future where there is too much access and control over your information. Some people may question if they are in a Matrix, and someone is pulling the strings of every aspect of their lives behind the scenes.
Challenges
The Semantic Web is proving very difficult to implement with current technology and research. Having a computer understand the nuance of language is not easy. When language is converted to text, you lose any nuance of pronunciation that would indicate humor or sarcasm. Capturing that nuance and then understanding it is a tall order, and this is just one challenge. Even as humans, we don’t always pick up on these cues.
Some challenges that the Semantic Web must contend with are deceit, inconsistency, uncertainty, vagueness, and vastness. Any system will need to deal effectively with all of these issues simultaneously to deliver on the promises of the Semantic Web.
Deceit: The information’s producer intentionally misleads the information consumer. Cryptography is being used to reduce this threat, and processes to establish information integrity, along with the identity of the information source, will be needed.
Inconsistency: When information from separate sources is combined, there will be logical contradictions, and the deductive reasoning used by computers falls apart when “anything follows from a contradiction.” Two techniques that can be used to deal with inconsistency are defeasible reasoning and paraconsistent reasoning.
Uncertainty: Computers don’t like precise concepts with uncertain values. A patient might present symptoms that could point to a number of different diagnoses, each with a different probability. Uncertainty can be addressed with probabilistic reasoning.
Vagueness: Imprecise questions such as, “how many grains of sand make up a pile?” or even concepts like tall and young are hard for a computer to deal with efficiently, because everyone has their own definition. Matching query terms with different knowledge bases that provide overlapping but subtly different concepts can help. Fuzzy logic is also being used for the vagueness issue.
Vastness: With billions of pages on the Web already, it is difficult to determine which data actually needs to be used. Even the SNOMED CT dictionary, with only 370,000 terms, has not been effectively cleared of semantically duplicate terms. Future automated reasoning systems will have to deal with inputs on the level of billions and trillions.
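As an illustration of how fuzzy logic approaches the vagueness problem described above, here is a minimal Python sketch. Instead of a hard yes/no answer to “is this person tall?”, a membership function returns a degree between 0 and 1. The function names and every threshold below (160 cm, 190 cm, 25 years, 45 years) are arbitrary assumptions chosen for the example, not values from any standard; the minimum operator used for “and” is one common choice among several.

```python
def tall_membership(height_cm, lower=160.0, upper=190.0):
    """Degree (0.0 to 1.0) to which a height counts as 'tall'.

    The 160 cm and 190 cm thresholds are assumptions made for this sketch.
    """
    if height_cm <= lower:
        return 0.0
    if height_cm >= upper:
        return 1.0
    return (height_cm - lower) / (upper - lower)


def young_membership(age_years, lower=25.0, upper=45.0):
    """Degree to which an age counts as 'young' (thresholds are assumptions)."""
    if age_years <= lower:
        return 1.0
    if age_years >= upper:
        return 0.0
    return (upper - age_years) / (upper - lower)


def fuzzy_and(a, b):
    """Fuzzy AND using the minimum of the two membership degrees."""
    return min(a, b)


tall = tall_membership(175)    # halfway between the thresholds: 0.5
young = young_membership(30)   # a quarter of the way from 25 to 45: 0.75
print(fuzzy_and(tall, young))  # degree of "tall and young": 0.5
```

Two knowledge bases with different ideas of “tall” would simply use different thresholds, which is one way to represent the “everyone has their own definition” problem explicitly rather than hiding it.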
While this does not cover all of the issues of creating a Semantic Web, it outlines many of the challenges that must be overcome, and addressed simultaneously, to make the vision possible.
The World Wide Web Consortium’s Uncertainty Reasoning for the World Wide Web Incubator Group (URW3-XG) lumps these problems together in its report under the single heading “uncertainty.” Many of the possible solutions will require an extension to the Web Ontology Language (OWL) to, as an example, annotate conditional probabilities. This is an area of ongoing research that has yet to “solve” any of these problems.
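To show what reasoning with conditional probabilities looks like in practice, here is a small Python sketch of the diagnosis example from the uncertainty challenge, using Bayes’ rule. Everything specific in it is an assumption for illustration: the function name `posterior`, the three diagnoses, and all of the probabilities are invented, and the sketch assumes the patient has exactly one of the listed conditions. It is not how OWL or any W3C proposal represents probabilities.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over mutually exclusive, exhaustive hypotheses.

    priors[d]      = P(d), how common diagnosis d is
    likelihoods[d] = P(symptom | d)
    Returns P(d | symptom) for every diagnosis d.
    """
    unnormalized = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(unnormalized.values())
    return {d: value / total for d, value in unnormalized.items()}


# Hypothetical numbers, invented for this example only.
PRIORS = {"cold": 0.6, "flu": 0.3, "pneumonia": 0.1}
FEVER_GIVEN = {"cold": 0.1, "flu": 0.8, "pneumonia": 0.9}

result = posterior(PRIORS, FEVER_GIVEN)
for diagnosis, probability in sorted(result.items()):
    print(f"{diagnosis}: {probability:.3f}")
# cold: 0.154
# flu: 0.615
# pneumonia: 0.231
```

A cold is the most likely diagnosis before the symptom is observed, but once a fever is taken into account, flu becomes the most probable. The same symptom is compatible with all three diagnoses, each with a different probability, which is exactly the situation plain true/false logic cannot express.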
Feasibility of the Semantic Web
Companies have been investing in the Semantic Web for decades and are still having trouble bringing it to fruition. Recently, IBM sold off much of its Watson Health program. Sadly, many of the same problems with the Semantic Web that were identified 20 years ago remain:
- Scalability
- Multilinguality
- Reducing information overload with visualization
- Semantic Web language stability
The Semantic Web field is seeing mainstream industrial adoption. However, more efficient data management solutions are needed, and this need continues to drive the field.
Reaching the point of human comprehension and interaction is still a long way off.
Summary
The potential of the Semantic Web and Web 4.0 is incredible. Semantics is a slow-moving field, and as new discoveries are made, even more pain points seem to be discovered, yet we are making progress. Companies have spent fortunes on Semantic Web development and will continue to do so in the hope that the results will be as desired. It will eventually happen; there is a light at the end of the tunnel; we just don’t yet know how long the Semantic Web tunnel is.
Disclaimer: The author of this text, Jean Chalopin, is a global business leader with a background encompassing banking, biotech, and entertainment. Mr. Chalopin is Chairman of Deltec International Group, deltecbankstag.wpengine.com.
The co-author of this text, Robin Trehan, has a bachelor’s degree in economics, a master’s in international business and finance, and an MBA in electronic business. Mr. Trehan is a Senior VP at Deltec International Group, deltecbankstag.wpengine.com.
The views, thoughts, and opinions expressed in this text are solely the views of the authors, and do not necessarily reflect those of Deltec International Group, its subsidiaries, and/or its employees.