For complex inputs, use actual PEG parsers: https://docs.rs/peg/latest/peg/
For simpler inputs, express your intent with readable methods using a library: https://github.com/sgreben/regex-builder/ and https://github.com/francisrstokes/super-expressive
There is an excellent HN comment that provides more reading material on regex generation: https://news.ycombinator.com/item?id=32037544
It looks like no one did that here. Even using the sample data provided, if you highlight a few of the addresses, it can't find the rest of them, mainly because it generates a regex with ST/AVE/LN in it, missing all the ones with RD. And if you add an RD sample, it just adds that to the list.
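A minimal sketch of that failure mode, using made-up addresses (the tool's actual sample data and generated pattern are not reproduced here, so both patterns below are assumptions): a pattern that enumerates the suffixes it has seen misses the unseen ones, while a pattern that describes the shape of an address generalizes.

```python
import re

# Hypothetical sample addresses; not the tool's actual data.
addresses = [
    "12 OAK ST",
    "4501 MAPLE AVE",
    "77 BIRCH LN",
    "9 HILL RD",
    "310 RIVER RD",
]

# Over-fitted: lists only the suffixes present in the highlighted examples.
overfitted = re.compile(r"\d+ [A-Z]+ (?:ST|AVE|LN)")

# More general: number, street name, then any short upper-case suffix.
general = re.compile(r"\d+ [A-Z]+ [A-Z]{2,4}")

# Addresses the over-fitted pattern fails to match in full.
missed = [a for a in addresses if not overfitted.fullmatch(a)]

# Whether the general pattern matches every address in full.
matched_all = all(general.fullmatch(a) for a in addresses)

print(missed)
print(matched_all)
```

Checking a generated pattern against examples it was not built from is the quickest way to spot this kind of over-fitting.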
There's lots of great innovation coming with LLMs, but people are forgetting their "AI basics" when it comes to verifying them.
We tell AI what we want. AI produces a hyper-specific, but barely comprehensible result. We look over the result to make sure it’s all good.
"PostgresSql": "Host=localhost;User ID=postgres;Password=xxxx;Database=test;Application Name=Test1234,Port=35432;Pooling=false;"
I selected User ID=, Host=, Application Name=, Password=
The results were pretty useless, using the literal inputs as pattern matches:
(User ID|Host|Application Name|Password)=([^;])
(User ID|Host|Application Name|Password)=([\w])
(User ID=)|(Host=)|(Application Name=)|(Password=)
([Uu]ser [Ii][Dd]=)|([Hh]ost=)|([Aa]pplication [Nn]ame=)|([Pp]assword=)
The following prompt given to ChatGPT 4, however:
> Write a regular expression to extract the property names from this PostgreSQL connection string: "PostgresSql": "Host=localhost;User ID=postgres;Password=xxxx;Database=test;Application Name=Test1234,Port=35432;Pooling=false;"
Yields the response with an explanation:
"This regular expression will match any sequence of alphabetic characters (upper or lower case) that are followed by an equal sign (=). The negative lookbehind (?<=[^\\w]) ensures that the property name is not preceded by another word character."
A quick test on regex101.com shows this works perfectly.
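The regex from the ChatGPT response is not reproduced in this thread, so the following is only a sketch of the difference, not that answer. The `general` pattern is my own assumption about what "extract the property names" should mean, and it treats both `;` and the stray `,` in the sample string as separators.

```python
import re

# Connection string copied from the comment above (including the comma before Port).
conn = ("Host=localhost;User ID=postgres;Password=xxxx;Database=test;"
        "Application Name=Test1234,Port=35432;Pooling=false;")

# Literal alternation: only finds the names that were highlighted.
literal = re.compile(r"(User ID|Host|Application Name|Password)=")

# General (assumed) pattern: a name made of letters and spaces that starts
# the string or follows a separator, and is followed by "=".
general = re.compile(r"(?:^|[;,])\s*([A-Za-z][A-Za-z ]*?)\s*=")

literal_names = literal.findall(conn)
general_names = general.findall(conn)

print(literal_names)
print(general_names)
```

The literal pattern never sees Database, Port, or Pooling, which is the "pretty useless" behaviour described above; the general one picks up every key in the string.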
Sorry, I don't like to be overly critical. Someone has attempted to solve a common problem for developers, but LLMs are going to blow applications like this away. And I think that ChatGPT at version 4 has become a truly useful tool.
I was interested to test this since I'd been writing a regular expression earlier in the day for a similar use case, which I've written up here: https://journals.appsoftware.com/public/76/227/4221/blog/sof...
At least for me, what would make this a killer app would be the ability to read a document, PDF, or big text dump and 1: identify "possible fields" (first name, date of birth), "probable fields" (middle name or other fields that are part of the data set but don't appear in every line) and "probable junk data" (page numbers, page headers, useless PDF padding)
2: allow selection or tuning of these fields to generate regex to catch or remove only the data related to the parsed fields.
I THINK there's something done with pandas (pandoc?) that can help tear a document apart and get fields or basic doc structure, but AI would need to take it from there and present it in a clear, concise and optionally explained way, so a busy office worker could just copy the regex filter into a spreadsheet formula or program function.
Struck me as funny when we have another thread going about people pasting company data into ChatGPT and here we have a regex AI with an example that looks like it's encouraging you to trust it with helping you regex through your PII, just paste it in the box and highlight what you need lol (not saying that's the intent, just that's what less savvy users may do)
Company site does not inspire much confidence: https://libertylabs.ai/
Light on details, heavy on philosophers, trend setters, idea banks, and radicals that make me worried I'm dealing with opportunists taking swings at monetizing a bunch of .ai domains. Especially the weird cinematic banner.
Also, I'm not sure what underlying tech is used, and the only explanation of the tool seems to be a YouTube video, so I didn't look further. I'd like to know more about how it's made, if that's possible and something the author would be ok to share.
I've seen at least 2 projects in the last 6 months using LLMs to generate bash code, which seems like a similar solve. LLMs are super cool, but there's a massive advantage to actually understanding what your code does, and LLM-generated regex, bash, assembly, etc. loses that.
We will be deploying regex.ai v1.1 in the first week of April, with descriptions and 5x improved performance. Stay tuned!
One use case I'm familiar with is matching a datetime interval, when you need to narrow down log rows to a particular time range.
So I built a tool just for it :) https://github.com/ekiauhce/interval-to-regexp
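For anyone wondering what "interval to regex" means in practice, here is a deliberately naive sketch. It is not how the linked tool works (I haven't checked its algorithm); the `minutes_regex` helper, the timestamp format, and the log lines are all assumptions made for illustration. It simply emits one alternative per minute in the range.

```python
import re
from datetime import datetime, timedelta


def minutes_regex(start: datetime, end: datetime) -> str:
    """Build a regex matching 'YYYY-MM-DD HH:MM:SS' timestamps at the start
    of a line, for every minute from start to end (the whole final minute
    is included)."""
    stamps = []
    t = start.replace(second=0, microsecond=0)
    while t <= end:
        stamps.append(re.escape(t.strftime("%Y-%m-%d %H:%M")))
        t += timedelta(minutes=1)
    # One alternative per minute, followed by any two-digit seconds field.
    return "^(?:" + "|".join(stamps) + r"):\d{2}"


# Hypothetical log lines.
log = [
    "2023-03-28 10:57:59 INFO start",
    "2023-03-28 10:58:00 WARN a",
    "2023-03-28 11:01:30 ERROR b",
    "2023-03-28 11:02:00 INFO c",
]

pattern = re.compile(
    minutes_regex(datetime(2023, 3, 28, 10, 58), datetime(2023, 3, 28, 11, 1))
)
selected = [line for line in log if pattern.match(line)]
print(selected)
```

A real tool would compress the alternation into digit ranges so the pattern stays short for long intervals; enumeration is just the easiest version to verify by eye.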
AI is a term that's been heavily co-opted by the expert systems crowd, especially when marketing solutions.
Does this tool actually learn from user input and improve output accordingly? That would be my definition.
Whichever way this was built it's a useful tool, thanks!
Even though I doubt most production code uses the actual, correct, RFC-compliant regex to match emails (it's a monster), this does nothing to improve the situation...
Just like normal software development you have to check the solution right away, but it really works great
P.S. assuming the poster is scanning the comments, typo in the site title: "aritifical"
You have to know regex really well to use this tool safely.
Or, just write regular expressions?
> ... Regex.ai's intuitive interface makes it easy to input sample text and generate complex regular expressions quickly and efficiently.
Inputting the sample text and highlighting the first "baz" produced patterns which all had "[A-Z][a-z]*@libertylabs\\.ai" included, presumably due to the default inclusions.
Removing those and highlighting the second "baz" resulted in "<Agent B>" as the result in one case.
There is no explanation of any patterns generated. If a person is to use one of the generated patterns and Regex.ai is supposed to "save you time and streamline your workflow", no matter "[w]hether you're a novice or an expert", then some form of verification and/or explanation must exist.
Otherwise, a person must know how to formulate regular expressions in order to determine which, if any, of the presented options are applicable. And if a person knows how to formulate regular expressions, then why would they use Regex.ai?
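As a rough sketch of what such verification could look like (this is not a feature of Regex.ai; the `check_pattern` helper and the example strings are hypothetical), each candidate pattern can be run against strings that should and should not match before anyone trusts it:

```python
import re


def check_pattern(pattern, should_match, should_not_match):
    """Return (false_negatives, false_positives) for a candidate pattern,
    using full-string matching."""
    rx = re.compile(pattern)
    false_negatives = [s for s in should_match if not rx.fullmatch(s)]
    false_positives = [s for s in should_not_match if rx.fullmatch(s)]
    return false_negatives, false_positives


# Candidate taken from the generated output quoted above; the intent was to match "baz".
fn, fp = check_pattern(
    r"[A-Z][a-z]*@libertylabs\.ai",
    should_match=["baz"],
    should_not_match=["Bob@libertylabs.ai"],
)
print(fn)  # strings the pattern should have matched but did not
print(fp)  # strings the pattern matched but should not have
```

Even this much would let a novice see that a suggested pattern has nothing to do with what they highlighted, without having to read the regex itself.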