by Emil Wallner

How you can train an AI to convert your design mockups into HTML and CSS

Within three years, deep learning will change front-end development. It will increase prototyping speed and lower the barrier for building software.

The field took off last year when Tony Beltramelli introduced the pix2code paper and Airbnb launched sketch2code.

Currently, the largest barrier to automating front-end development is computing power. However, we can use current deep learning algorithms, along with synthesized training data, to start exploring artificial front-end automation right now.

In this post, we’ll teach a neural network how to code a basic HTML and CSS website based on a picture of a design mockup. Here’s a quick overview of the process:

1) Give a design image to the trained neural network
2) The neural network converts the image into HTML markup
3) Rendered output

We’ll build the neural network in three iterations.

First, we’ll make a bare minimum version to get the hang of the moving parts. The second version, HTML, will focus on automating all the steps and explaining the neural network layers. In the final version, Bootstrap, we’ll create a model that can generalize and explore the LSTM layer.

All the code is prepared on GitHub and FloydHub in Jupyter notebooks. All the FloydHub notebooks are inside the floydhub directory and the local equivalents are under local.

The models are based on Beltramelli’s pix2code paper and Jason Brownlee’s image caption tutorials. The code is written in Python and Keras, a framework on top of TensorFlow.

If you’re new to deep learning, I’d recommend getting a feel for Python, backpropagation, and convolutional neural networks. My three earlier posts on FloydHub’s blog will get you started:

Core Logic

Let’s recap our goal. We want to build a neural network that will generate HTML/CSS markup that corresponds to a screenshot.

When you train the neural network, you give it several screenshots with matching HTML.

It learns by predicting all the matching HTML markup tags one by one. When it predicts the next markup tag, it receives the screenshot as well as all the correct markup tags until that point.

Here is a simple training data example in a Google Sheet.

Creating a model that predicts word by word is the most common approach today. There are other approaches, but that’s the method we’ll use throughout this tutorial.

Notice that for each prediction it gets the same screenshot. So if it has to predict 20 words, it will get the same design mockup twenty times. For now, don’t worry about how the neural network works. Focus on grasping the input and output of the neural network.

Let’s focus on the previous markup. Say we train the network to predict the sentence “I can code.” When it receives “I,” then it predicts “can.” Next time it will receive “I can” and predict “code.” It receives all the previous words and only has to predict the next word.
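
To make this concrete, here is a minimal sketch in plain Python of how one sentence is turned into (previous words → next word) training pairs. The token names are just the example above; the same screenshot would be repeated for every pair.

# A toy sketch: one training sentence becomes several
# (previous words) -> next word examples.
tokens = ["start", "I", "can", "code", "end"]

pairs = []
for i in range(1, len(tokens)):
    previous_words = tokens[:i]   # everything the network has seen so far
    next_word = tokens[i]         # the word it must predict
    pairs.append((previous_words, next_word))

for previous_words, next_word in pairs:
    print(previous_words, "->", next_word)
# ['start'] -> I
# ['start', 'I'] -> can
# ['start', 'I', 'can'] -> code
# ['start', 'I', 'can', 'code'] -> end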

The neural network creates features from the data. The network builds features to link the input data with the output data. It has to create representations to understand what is in each screenshot and the HTML syntax it has predicted. This builds the knowledge to predict the next tag.

When you want to use the trained model for real-world usage, it’s similar to when you train the model. The text is generated one by one with the same screenshot each time. Instead of feeding it with the correct HTML tags, it receives the markup it has generated so far. Then, it predicts the next markup tag. The prediction is initiated with a “start tag” and stops when it predicts an “end tag” or reaches a max limit. Here’s another example in a Google Sheet.

“Hello World” Version

Let’s build a “hello world” version. We’ll feed a neural network a screenshot with a website displaying “Hello World!” and teach it to generate the markup.

First, the neural network maps the design mockup into a list of pixel values. From 0–255 in three channels — red, blue, and green.

To represent the markup in a way that the neural network understands, I use one hot encoding. Thus, the sentence “I can code” could be mapped like the below.
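
A minimal sketch of that one-hot mapping (the exact index order is an assumption for illustration):

# One-hot encoding sketch for the "I can code" vocabulary
vocabulary = ["start", "I", "can", "code", "end"]

def one_hot(word):
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

print(one_hot("can"))  # [0, 0, 1, 0, 0]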

In the above graphic, we include the start and end tag. These tags are cues for when the network starts its predictions and when to stop.

For the input data, we will use sentences, starting with the first word and then adding each word one by one. The output data is always one word.

Sentences follow the same logic as words. They also need the same input length. Instead of being capped by the vocabulary, they are bound by maximum sentence length. If it’s shorter than the maximum length, you fill it up with empty words, a word with just zeros.
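
A sketch of that padding step, assuming a maximum sentence length of three and a zero vector as the “empty word.” The empty words go at the front, so the real words end up at the back, which is what the next paragraph describes.

import numpy as np

max_sentence_length = 3
vocab_size = 5

def pad_sentence(one_hot_words):
    # Fill the front of the sentence with "empty words" (all zeros)
    empty = [np.zeros(vocab_size)] * (max_sentence_length - len(one_hot_words))
    return np.array(empty + one_hot_words)

sentence = [np.eye(vocab_size)[0]]        # just the start token so far
print(pad_sentence(sentence).shape)        # (3, 5)
print(pad_sentence(sentence))              # two zero rows, then the start token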

As you see, words are printed from right to left. This forces each word to change position for each training round. This allows the model to learn the sequence instead of memorizing the position of each word.

In the below graphic there are four predictions. Each row is one prediction. To the left are the images represented in their three color channels: red, green and blue and the previous words. Outside of the brackets are the predictions one by one, ending with a red square to mark the end.

# Imports used in this snippet (Keras 2.x)
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing.image import load_img, img_to_array
from keras.layers import Input, Dense, RepeatVector, LSTM, concatenate
from keras.models import Model

# Length of longest sentence
max_caption_len = 3
# Size of vocabulary
vocab_size = 3

# Load one screenshot for each word and turn them into digits
images = []
for i in range(2):
    images.append(img_to_array(load_img('screenshot.jpg', target_size=(224, 224))))
images = np.array(images, dtype=float)
# Preprocess input for the VGG16 model
images = preprocess_input(images)

# Turn start tokens into one-hot encoding
html_input = np.array(
            [[[0., 0., 0.],  # start
              [0., 0., 0.],
              [1., 0., 0.]],
             [[0., 0., 0.],  # start <HTML>Hello World!</HTML>
              [1., 0., 0.],
              [0., 1., 0.]]])

# Turn next word into one-hot encoding
next_words = np.array(
            [[0., 1., 0.],  # <HTML>Hello World!</HTML>
             [0., 0., 1.]])  # end

# Load the VGG16 model trained on imagenet and output the classification feature
VGG = VGG16(weights='imagenet', include_top=True)
# Extract the features from the image
features = VGG.predict(images)

# Load the feature to the network, apply a dense layer, and repeat the vector
vgg_feature = Input(shape=(1000,))
vgg_feature_dense = Dense(5)(vgg_feature)
vgg_feature_repeat = RepeatVector(max_caption_len)(vgg_feature_dense)
# Extract information from the input sequence
language_input = Input(shape=(vocab_size, vocab_size))
language_model = LSTM(5, return_sequences=True)(language_input)

# Concatenate the information from the image and the input
decoder = concatenate([vgg_feature_repeat, language_model])
# Extract information from the concatenated output
decoder = LSTM(5, return_sequences=False)(decoder)
# Predict which word comes next
decoder_output = Dense(vocab_size, activation='softmax')(decoder)
# Compile and run the neural network
model = Model(inputs=[vgg_feature, language_input], outputs=decoder_output)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Train the neural network
model.fit([features, html_input], next_words, batch_size=2, shuffle=False, epochs=1000)

In the hello world version, we use three tokens: start, <HTML><center><H1>Hello World!</H1></center></HTML>, and end. A token can be anything. It can be a character, word, or sentence. Character versions require a smaller vocabulary but constrain the neural network. Word-level tokens tend to perform best.

Here we make the prediction:

# Create an empty sentence and insert the start token
sentence = np.zeros((1, 3, 3))  # [[0,0,0], [0,0,0], [0,0,0]]
start_token = [1., 0., 0.]  # start
sentence[0][2] = start_token  # place start in empty sentence

# Making the first prediction with the start token
second_word = model.predict([np.array([features[1]]), sentence])

# Put the second word in the sentence and make the final prediction
sentence[0][1] = start_token
sentence[0][2] = np.round(second_word)
third_word = model.predict([np.array([features[1]]), sentence])

# Place the start token and our two predictions in the sentence
sentence[0][0] = start_token
sentence[0][1] = np.round(second_word)
sentence[0][2] = np.round(third_word)

# Transform our one-hot predictions into the final tokens
vocabulary = ["start", "<HTML><center><H1>Hello World!</H1></center></HTML>", "end"]
for i in sentence[0]:
    print(vocabulary[np.argmax(i)], end=' ')

Output

  • 10 epochs: start start start

  • 100 epochs: start <HTML><center><H1>Hello World!</H1></center></HTML> <HTML><center><H1>Hello World!</H1></center></HTML>

  • 300 epochs: start <HTML><center><H1>Hello World!</H1></center></HTML> end

Mistakes I made:

  • Build the first working version before gathering the data. Early on in this project, I managed to get a copy of an old archive of the Geocities hosting website. It had 38 million websites. Blinded by the potential, I ignored the huge workload that would be required to reduce the 100K-sized vocabulary.

  • Dealing with a terabyte worth of data requires good hardware or a lot of patience. After my Mac ran into several problems, I ended up using a powerful remote server. Expect to rent a rig with 8 modern CPU cores and a 1 Gbps internet connection to have a decent workflow.

  • Nothing made sense until I understood the input and output data. The input, X, is one screenshot and the previous markup tags. The output, Y, is the next markup tag. When I got this, it became easier to understand everything between them. It also became easier to experiment with different architectures.

  • Be aware of the rabbit holes. Because this project intersects with a lot of fields in deep learning, I got stuck in plenty of rabbit holes along the way. I spent a week programming RNNs from scratch, got too fascinated by embedding vector spaces, and was seduced by exotic implementations.

  • Picture-to-code networks are image caption models in disguise. Even when I learned this, I still ignored many of the image caption papers, simply because they were less cool. Once I got some perspective, I accelerated my learning of the problem space.

Running the code on FloydHub

FloydHub is a training platform for deep learning. I came across them when I first started learning deep learning and I’ve used them since for training and managing my deep learning experiments. You can run your first model within 30 seconds by clicking this button:

It opens a Workspace on FloydHub where you will find the same environment and dataset used for the Bootstrap version. You can also find the trained models for testing.

Or you can do a manual installation by following these steps: 2-min installation or my 5-minute walkthrough.

Clone the repository
git clone https://github.com/emilwallner/Screenshot-to-code-in-Keras.git
Login and initiate the FloydHub command-line tool
cd Screenshot-to-code-in-Keras
floyd login
floyd init s2c
Run a Jupyter notebook on a FloydHub cloud GPU machine:
floyd run --gpu --env tensorflow-1.4 --data emilwallner/datasets/imagetocode/2:data --mode jupyter

All the notebooks are prepared inside the floydhub directory. The local equivalents are under local. Once it’s running, you can find the first notebook here: floydhub/Helloworld/helloworld.ipynb.

If you want more detailed instructions and an explanation for the flags, check my earlier post.

HTML Version

In this version, we’ll automate many of the steps from the Hello World model. This section will focus on creating a scalable implementation and the moving pieces in the neural network.

This version will not be able to predict HTML from random websites, but it’s still a great setup to explore the dynamics of the problem.

Overview

If we expand the components of the previous graphic it looks like this.

There are two major sections. First, the encoder. This is where we create image features and previous markup features. Features are the building blocks that the network creates to connect the design mockups with the markup. At the end of the encoder, we glue the image features to each word in the previous markup.

The decoder then takes the combined design and markup feature and creates a next tag feature. This feature is run through a fully connected neural network to predict the next tag.

Design mockup features

Since we need to insert one screenshot for each word, this becomes a bottleneck when training the network (example). Instead of using the images, we extract the information we need to generate the markup.

The information is encoded into image features. This is done by using an already pre-trained convolutional neural network (CNN). The model is pre-trained on Imagenet.

We extract the features from the layer before the final classification.

We end up with 1536 eight by eight pixel images known as features. Although they are hard to understand for us, a neural network can extract the objects and position of the elements from these features.
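
A sketch of that extraction step in Keras, assuming the same Inception-ResNet-v2 model that the code further down uses. With include_top=False the final classification layers are dropped, and a 299x299 screenshot comes out as an 8x8x1536 feature volume.

import numpy as np
from keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input
from keras.preprocessing.image import load_img, img_to_array

# Load the pre-trained CNN without its final classification layers
extractor = InceptionResNetV2(weights='imagenet', include_top=False)

image = img_to_array(load_img('screenshot.jpg', target_size=(299, 299)))
image = preprocess_input(np.array([image], dtype=float))

features = extractor.predict(image)
print(features.shape)  # (1, 8, 8, 1536)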

Markup features

In the hello world version, we used a one-hot encoding to represent the markup. In this version, we’ll use a word embedding for the input and keep the one-hot encoding for the output.

The way we structure each sentence stays the same, but how we map each token is changed. One-hot encoding treats each word as an isolated unit. Instead, we convert each word in the input data to lists of digits. These represent the relationship between the markup tags.

The dimension of this word embedding is eight but often varies between 50–500 depending on the size of the vocabulary.
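
A minimal Keras sketch of that embedding; the vocabulary size and sentence length are assumptions for illustration. Each token index is mapped to a trainable vector of eight weights.

import numpy as np
from keras.layers import Input, Embedding
from keras.models import Model

vocab_size = 17       # assumed size of the markup vocabulary
max_length = 48       # assumed number of tokens per input sentence
embedding_dim = 8     # eight weights per word, as described above

tokens_in = Input(shape=(max_length,))
embedded = Embedding(vocab_size, embedding_dim, input_length=max_length)(tokens_in)
model = Model(inputs=tokens_in, outputs=embedded)

dummy_sentence = np.zeros((1, max_length))    # a padded sentence of token indexes
print(model.predict(dummy_sentence).shape)     # (1, 48, 8)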

The eight digits for each word are weights similar to a vanilla neural network. They are tuned to map how the words relate to each other (Mikolov et al., 2013).

This is how we start developing markup features. Features are what the neural network develops to link the input data with the output data. For now, don’t worry about what they are, we’ll dig deeper into this in the next section.

The Encoder

We’ll take the word embeddings and run them through an LSTM and return a sequence of markup features. These are run through a Time distributed dense layer — think of it as a dense layer with multiple inputs and outputs.

In parallel, the image features are first flattened. Regardless of how the digits were structured, they are transformed into one large list of numbers. Then we apply a dense layer on this layer to form a high-level feature. These image features are then concatenated to the markup features.

This can be hard to wrap your mind around — so let’s break it down.

Markup features

Here we run the word embeddings through the LSTM layer. In this graphic, all the sentences are padded to reach the maximum size of three tokens.

To mix signals and find higher-level patterns, we apply a TimeDistributed dense layer to the markup features. TimeDistributed dense is the same as a dense layer, but with multiple inputs and outputs.
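
A minimal Keras sketch of that step; the shapes (three timesteps, 8-dim embeddings, 128 units) are assumptions for illustration. The same Dense weights are applied independently at every timestep of the LSTM output.

from keras.layers import Input, LSTM, TimeDistributed, Dense
from keras.models import Model

sequence_in = Input(shape=(3, 8))                                  # 3 timesteps, 8-dim embeddings
lstm_out = LSTM(128, return_sequences=True)(sequence_in)            # (None, 3, 128)
mixed = TimeDistributed(Dense(128, activation='relu'))(lstm_out)    # still (None, 3, 128)

model = Model(inputs=sequence_in, outputs=mixed)
model.summary()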

Image features

In parallel, we prepare the images. We take all the mini image features and transform them into one long list. The information is not changed, just reorganized.

Again, to mix signals and extract higher level notions, we apply a dense layer. Since we are only dealing with one input value, we can use a normal dense layer. To connect the image features to the markup features, we copy the image features.

In this case, we have three markup features. Thus, we end up with an equal amount of image features and markup features.

Concatenating the image and markup features

All the sentences are padded to create three markup features. Since we have prepared the image features, we can now add one image feature for each markup feature.

After sticking one image feature to each markup feature, we end up with three image-markup features. This is the input we feed into the decoder.
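
A sketch of that gluing step, assuming three markup features and one 128-dim image feature (the sizes are for illustration): RepeatVector copies the image feature once per markup feature, and concatenate pairs them up.

from keras.layers import Input, Dense, Flatten, RepeatVector, concatenate
from keras.models import Model

max_caption_len = 3

# Image branch: flatten the CNN features and compress them into one feature
image_features = Input(shape=(8, 8, 1536))
image_flat = Dense(128, activation='relu')(Flatten()(image_features))
image_repeated = RepeatVector(max_caption_len)(image_flat)       # (None, 3, 128)

# Markup branch: one 128-dim markup feature per timestep
markup_features = Input(shape=(max_caption_len, 128))

# One image feature stuck onto each markup feature
combined = concatenate([image_repeated, markup_features])         # (None, 3, 256)
model = Model(inputs=[image_features, markup_features], outputs=combined)
model.summary()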

The Decoder

Here we use the combined image-markup features to predict the next tag.

In the below example, we use three image-markup feature pairs and output one next tag feature.

Note that the LSTM layer has return_sequences set to false. Instead of returning a sequence as long as the input, it only predicts one feature. In our case, it’s a feature for the next tag. It contains the information for the final prediction.

The final prediction

The dense layer works like a traditional feedforward neural network. It connects the 512 digits in the next tag feature with the 4 final predictions. Say we have 4 words in our vocabulary: start, hello, world, and end.

The vocabulary prediction could be [0.1, 0.1, 0.1, 0.7]. The softmax activation in the dense layer distributes a probability from 0–1, with the sum of all predictions equal to 1. In this case, it predicts that the 4th word is the next tag. Then you translate the one-hot encoding [0, 0, 0, 1] into the mapped value, say “end”.
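
A sketch of that last translation step, using the four-word vocabulary above:

import numpy as np

vocabulary = ["start", "hello", "world", "end"]
prediction = np.array([0.1, 0.1, 0.1, 0.7])   # softmax output, sums to 1

next_word = vocabulary[np.argmax(prediction)]
print(next_word)   # end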

# Load the images and preprocess them for inception-resnet
images = []
all_filenames = listdir('images/')
all_filenames.sort()
for filename in all_filenames:
    images.append(img_to_array(load_img('images/'+filename, target_size=(299, 299))))
images = np.array(images, dtype=float)
images = preprocess_input(images)

# Run the images through inception-resnet and extract the features without the classification layer
IR2 = InceptionResNetV2(weights='imagenet', include_top=False)
features = IR2.predict(images)

# We will cap each input sequence to 100 tokens
max_caption_len = 100
# Initialize the function that will create our vocabulary
tokenizer = Tokenizer(filters='', split=" ", lower=False)

# Read a document and return a string
def load_doc(filename):
    file = open(filename, 'r')
    text = file.read()
    file.close()
    return text

# Load all the HTML files
X = []
all_filenames = listdir('html/')
all_filenames.sort()
for filename in all_filenames:
    X.append(load_doc('html/'+filename))

# Create the vocabulary from the html files
tokenizer.fit_on_texts(X)

# Add +1 to leave space for empty words
vocab_size = len(tokenizer.word_index) + 1
# Translate each word in the text files to the matching vocabulary index
sequences = tokenizer.texts_to_sequences(X)
# The longest HTML file
max_length = max(len(s) for s in sequences)

# Initialize our final input to the model
X, y, image_data = list(), list(), list()
for img_no, seq in enumerate(sequences):
    for i in range(1, len(seq)):
        # Add the entire sequence to the input and only keep the next word for the output
        in_seq, out_seq = seq[:i], seq[i]
        # If the sentence is shorter than max_length, fill it up with empty words
        in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
        # Map the output to one-hot encoding
        out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
        # Add the image corresponding to the HTML file
        image_data.append(features[img_no])
        # Cut the input sentence to 100 tokens, and add it to the input data
        X.append(in_seq[-100:])
        y.append(out_seq)

X, y, image_data = np.array(X), np.array(y), np.array(image_data)

# Create the encoder
image_features = Input(shape=(8, 8, 1536,))
image_flat = Flatten()(image_features)
image_flat = Dense(128, activation='relu')(image_flat)
ir2_out = RepeatVector(max_caption_len)(image_flat)

language_input = Input(shape=(max_caption_len,))
language_model = Embedding(vocab_size, 200, input_length=max_caption_len)(language_input)
language_model = LSTM(256, return_sequences=True)(language_model)
language_model = LSTM(256, return_sequences=True)(language_model)
language_model = TimeDistributed(Dense(128, activation='relu'))(language_model)

# Create the decoder
decoder = concatenate([ir2_out, language_model])
decoder = LSTM(512, return_sequences=False)(decoder)
decoder_output = Dense(vocab_size, activation='softmax')(decoder)

# Compile the model
model = Model(inputs=[image_features, language_input], outputs=decoder_output)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Train the neural network
model.fit([image_data, X], y, batch_size=64, shuffle=False, epochs=2)

# map an integer to a word
def word_for_id(integer, tokenizer):
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None

# generate a description for an image
def generate_desc(model, tokenizer, photo, max_length):
    # seed the generation process
    in_text = 'START'
    # iterate over the whole length of the sequence
    for i in range(900):
        # integer encode input sequence
        sequence = tokenizer.texts_to_sequences([in_text])[0][-100:]
        # pad input
        sequence = pad_sequences([sequence], maxlen=max_length)
        # predict next word
        yhat = model.predict([photo, sequence], verbose=0)
        # convert probability to integer
        yhat = np.argmax(yhat)
        # map integer to word
        word = word_for_id(yhat, tokenizer)
        # stop if we cannot map the word
        if word is None:
            break
        # append as input for generating the next word
        in_text += ' ' + word
        # print the prediction
        print(' ' + word, end='')
        # stop if we predict the end of the sequence
        if word == 'END':
            break
    return

# Load an image, preprocess it for IR2, extract the features, and generate the HTML
test_image = img_to_array(load_img('images/87.jpg', target_size=(299, 299)))
test_image = np.array(test_image, dtype=float)
test_image = preprocess_input(test_image)
test_features = IR2.predict(np.array([test_image]))
generate_desc(model, tokenizer, np.array(test_features), 100)

Output

If you can’t see anything when you click these links, you can right click and click on “View Page Source.” Here is the original website for reference.

Mistakes I made:

  • LSTMs are a lot heavier for my cognition compared to CNNs. When I unrolled all the LSTMs, they became easier to understand. Fast.ai’s video on RNNs was super useful. Also, focus on the input and output features before you try understanding how they work.

  • Building a vocabulary from the ground up is a lot easier than narrowing down a huge vocabulary. This includes everything from fonts, div sizes, and hex colors to variable names and normal words.

  • Most of the libraries are created to parse text documents and not code. In documents, everything is separated by a space, but in code, you need custom parsing.

  • You can extract features with a model that’s trained on Imagenet. This might seem counterintuitive since Imagenet has few web images. However, the loss is 30% higher compared to a pix2code model, which is trained from scratch. It’d be interesting to use a pre-trained inception-resnet type of model based on web screenshots.

Bootstrap version

In our final version, we’ll use a dataset of generated bootstrap websites from the pix2code paper. By using Twitter’s bootstrap, we can combine HTML and CSS and decrease the size of the vocabulary.

We’ll enable it to generate the markup for a screenshot it has not seen before. We’ll also dig into how it builds knowledge about the screenshot and markup.

Instead of training it on the bootstrap markup, we’ll use 17 simplified tokens that we then translate into HTML and CSS. The dataset includes 1500 test screenshots and 250 validation images. For each screenshot there are on average 65 tokens, resulting in 96925 training examples.

By tweaking the model in the pix2code paper, the model can predict the web components with 97% accuracy (BLEU 4-ngram greedy search, more on this later).

An end-to-end approach

Extracting features from pre-trained models works well in image captioning models. But after a few experiments, I realized that pix2code’s end-to-end approach works better for this problem. The pre-trained models have not been trained on web data and are customized for classification.

In this model, we replace the pre-trained image features with a light convolutional neural network. Instead of using max-pooling to increase information density, we increase the strides. This maintains the position and the color of the front-end elements.
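 
A sketch of the difference, not the exact model: both layers halve the spatial resolution, but the strided convolution learns how to downsample instead of just keeping the maximum value in each 2x2 window.

from keras.layers import Conv2D, MaxPooling2D
from keras.models import Sequential

# Downsampling with max-pooling
pooled = Sequential([
    Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=(256, 256, 3)),
    MaxPooling2D((2, 2)),
])

# Downsampling with a strided convolution, as in the bootstrap model below
strided = Sequential([
    Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=(256, 256, 3)),
    Conv2D(16, (3, 3), activation='relu', padding='same', strides=2),
])

print(pooled.output_shape, strided.output_shape)   # both (None, 128, 128, 16)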

There are two core models that enable this: convolutional neural networks (CNN) and recurrent neural networks (RNN). The most common recurrent neural network is long-short term memory (LSTM), so that’s what I’ll refer to.

There are plenty of great CNN tutorials, and I covered them in my previous article. Here, I’ll focus on the LSTMs.

Understanding timesteps in LSTMs

One of the harder things to grasp about LSTMs is timesteps. A vanilla neural network can be thought of as two timesteps. If you give it “Hello,” it predicts “World.” But it would struggle to predict more timesteps. In the below example, the input has four timesteps, one for each word.

LSTMs are made for input with timesteps. It’s a neural network customized for information in order. If you unroll our model it looks like this. For each downward step, you keep the same weights. You apply one set of weights to the previous output and another set to the new input.

The weighted input and output are concatenated and added together with an activation. This is the output for that timestep. Since we reuse the weights, they draw information from several inputs and build knowledge of the sequence.

Here is a simplified version of the process for each timestep in an LSTM.
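
To make the idea concrete, here is a toy numpy sketch of a plain recurrent timestep (a real LSTM adds gates and a cell state on top of this): one set of weights is applied to the previous output, another set to the new input, and the two results are added and squashed by an activation.

import numpy as np

hidden_size, input_size = 4, 3
W_hidden = np.random.randn(hidden_size, hidden_size) * 0.1  # weights for the previous output
W_input = np.random.randn(hidden_size, input_size) * 0.1    # weights for the new input

def rnn_timestep(previous_output, new_input):
    # Weight both sources, add them together, and apply an activation
    return np.tanh(W_hidden @ previous_output + W_input @ new_input)

output = np.zeros(hidden_size)
for timestep_input in np.random.randn(4, input_size):   # four timesteps, one per word
    output = rnn_timestep(output, timestep_input)
print(output)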

To get a feel for this logic, I’d recommend building an RNN from scratch with Andrew Trask’s brilliant tutorial.

Understanding the units in LSTM layers

The number of units in each LSTM layer determines its ability to memorize. This also corresponds to the size of each output feature. Again, a feature is a long list of numbers used to transfer information between layers.

Each unit in the LSTM layer learns to keep track of different aspects of the syntax. Below is a visualization of a unit that keeps track of the information in the row div. This is the simplified markup we are using to train the bootstrap model.

Each LSTM unit maintains a cell state. Think of the cell state as the memory. The weights and activations are used to modify the state in different ways. This enables the LSTM layers to fine-tune which information to keep and discard for each input.

In addition to passing through an output feature for each input, it also forwards the cell states, one value for each unit in the LSTM. To get a feel for how the components within the LSTM interact, I recommend Colah’s tutorial, Jayasiri’s Numpy implementation, and Karpathy’s lecture and write-up.

dir_name = 'resources/eval_light/'

# Read a file and return a string
def load_doc(filename):
    file = open(filename, 'r')
    text = file.read()
    file.close()
    return text

def load_data(data_dir):
    text = []
    images = []
    # Load all the files and order them
    all_filenames = listdir(data_dir)
    all_filenames.sort()
    for filename in (all_filenames):
        if filename[-3:] == "npz":
            # Load the images already prepared in arrays
            image = np.load(data_dir+filename)
            images.append(image['features'])
        else:
            # Load the bootstrap tokens and wrap them in a start and end tag
            syntax = '<START> ' + load_doc(data_dir+filename) + ' <END>'
            # Separate all the words with a single space
            syntax = ' '.join(syntax.split())
            # Add a space after each comma
            syntax = syntax.replace(',', ' ,')
            text.append(syntax)
    images = np.array(images, dtype=float)
    return images, text

train_features, texts = load_data(dir_name)

# Initialize the function to create the vocabulary
tokenizer = Tokenizer(filters='', split=" ", lower=False)
# Create the vocabulary
tokenizer.fit_on_texts([load_doc('bootstrap.vocab')])

# Add one spot for the empty word in the vocabulary
vocab_size = len(tokenizer.word_index) + 1
# Map the input sentences into the vocabulary indexes
train_sequences = tokenizer.texts_to_sequences(texts)
# The longest set of bootstrap tokens
max_sequence = max(len(s) for s in train_sequences)
# Specify how many tokens to have in each input sentence
max_length = 48

def preprocess_data(sequences, features):
    X, y, image_data = list(), list(), list()
    for img_no, seq in enumerate(sequences):
        for i in range(1, len(seq)):
            # Add the sentence until the current count (i) and add the current count to the output
            in_seq, out_seq = seq[:i], seq[i]
            # Pad all the input token sentences to max_sequence
            in_seq = pad_sequences([in_seq], maxlen=max_sequence)[0]
            # Turn the output into one-hot encoding
            out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
            # Add the corresponding image to the bootstrap token file
            image_data.append(features[img_no])
            # Cap the input sentence to 48 tokens and add it
            X.append(in_seq[-48:])
            y.append(out_seq)
    return np.array(X), np.array(y), np.array(image_data)

X, y, image_data = preprocess_data(train_sequences, train_features)

# Create the encoder
image_model = Sequential()
image_model.add(Conv2D(16, (3, 3), padding='valid', activation='relu', input_shape=(256, 256, 3,)))
image_model.add(Conv2D(16, (3, 3), activation='relu', padding='same', strides=2))
image_model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
image_model.add(Conv2D(32, (3, 3), activation='relu', padding='same', strides=2))
image_model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
image_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', strides=2))
image_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))

image_model.add(Flatten())
image_model.add(Dense(1024, activation='relu'))
image_model.add(Dropout(0.3))
image_model.add(Dense(1024, activation='relu'))
image_model.add(Dropout(0.3))

image_model.add(RepeatVector(max_length))

visual_input = Input(shape=(256, 256, 3,))
encoded_image = image_model(visual_input)

language_input = Input(shape=(max_length,))
language_model = Embedding(vocab_size, 50, input_length=max_length, mask_zero=True)(language_input)
language_model = LSTM(128, return_sequences=True)(language_model)
language_model = LSTM(128, return_sequences=True)(language_model)

# Create the decoder
decoder = concatenate([encoded_image, language_model])
decoder = LSTM(512, return_sequences=True)(decoder)
decoder = LSTM(512, return_sequences=False)(decoder)
decoder = Dense(vocab_size, activation='softmax')(decoder)

# Compile the model
model = Model(inputs=[visual_input, language_input], outputs=decoder)
optimizer = RMSprop(lr=0.0001, clipvalue=1.0)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

# Save the model weights every 2nd epoch
filepath = "org-weights-epoch-{epoch:04d}--val_loss-{val_loss:.4f}--loss-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_weights_only=True, period=2)
callbacks_list = [checkpoint]

# Train the model
model.fit([image_data, X], y, batch_size=64, shuffle=False, validation_split=0.1, callbacks=callbacks_list, verbose=1, epochs=50)

Test accuracy

It’s tricky to find a fair way to measure the accuracy. Say you compare word by word. If your prediction is one word out of sync, you might have 0% accuracy. If you remove one word which syncs the prediction, you might end up with 99/100.

I used the BLEU score, best practice in machine translating and image captioning models. It breaks the sentence into four n-grams, from 1–4 word sequences. In the below prediction “cat” is supposed to be “code.”

To get the final score, you multiply each score with 25%, (4/5) * 0.25 + (2/4) * 0.25 + (1/3) * 0.25 + (0/2) * 0.25 = 0.2 + 0.125 + 0.083 + 0 = 0.408 . The sum is then multiplied with a sentence length penalty. Since the length is correct in our example, it becomes our final score.
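
A sketch of that arithmetic in Python, plugging in the four n-gram precisions from the example above. Note that the standard BLEU formula, and the corpus_bleu call used in the evaluation code below, combine the precisions with a geometric mean rather than this weighted sum; the version here follows the simplified description in the text.

precisions = [4/5, 2/4, 1/3, 0/2]      # 1-gram to 4-gram precision from the example
weights = [0.25, 0.25, 0.25, 0.25]

score = sum(w * p for w, p in zip(weights, precisions))
print(round(score, 3))                   # 0.408

length_penalty = 1.0                     # the candidate has the correct length
print(round(score * length_penalty, 3))  # 0.408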

You could increase the number of n-grams to make it harder. A four n-gram model is the model that best corresponds to human translations. I’d recommend running a few examples with the below code and reading the wiki page.

# Create a function to read a file and return its content
def load_doc(filename):
    file = open(filename, 'r')
    text = file.read()
    file.close()
    return text

def load_data(data_dir):
    text = []
    images = []
    files_in_folder = os.listdir(data_dir)
    files_in_folder.sort()
    for filename in tqdm(files_in_folder):
        # Add an image
        if filename[-3:] == "npz":
            image = np.load(data_dir+filename)
            images.append(image['features'])
        else:
            # Add the text and wrap it in a start and end tag
            syntax = '<START> ' + load_doc(data_dir+filename) + ' <END>'
            # Separate each word with a space
            syntax = ' '.join(syntax.split())
            # Add a space between each comma
            syntax = syntax.replace(',', ' ,')
            text.append(syntax)
    images = np.array(images, dtype=float)
    return images, text

# Initialize the function to create the vocabulary
tokenizer = Tokenizer(filters='', split=" ", lower=False)
# Create the vocabulary in a specific order
tokenizer.fit_on_texts([load_doc('bootstrap.vocab')])

dir_name = '../../../../eval/'
train_features, texts = load_data(dir_name)

# Load the model and weights
json_file = open('../../../../model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# Load the weights into the new model
loaded_model.load_weights("../../../../weights.hdf5")
print("Loaded model from disk")

# map an integer to a word
def word_for_id(integer, tokenizer):
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None
print(word_for_id(17, tokenizer))

# generate a description for an image
def generate_desc(model, tokenizer, photo, max_length):
    photo = np.array([photo])
    # seed the generation process
    in_text = '<START> '
    # iterate over the whole length of the sequence
    print('\nPrediction---->\n\n<START> ', end='')
    for i in range(150):
        # integer encode input sequence
        sequence = tokenizer.texts_to_sequences([in_text])[0]
        # pad input
        sequence = pad_sequences([sequence], maxlen=max_length)
        # predict next word
        yhat = loaded_model.predict([photo, sequence], verbose=0)
        # convert probability to integer
        yhat = argmax(yhat)
        # map integer to word
        word = word_for_id(yhat, tokenizer)
        # stop if we cannot map the word
        if word is None:
            break
        # append as input for generating the next word
        in_text += word + ' '
        # stop if we predict the end of the sequence
        print(word + ' ', end='')
        if word == '<END>':
            break
    return in_text

max_length = 48

# evaluate the skill of the model
def evaluate_model(model, descriptions, photos, tokenizer, max_length):
    actual, predicted = list(), list()
    # step over the whole set
    for i in range(len(texts)):
        yhat = generate_desc(model, tokenizer, photos[i], max_length)
        # store actual and predicted
        print('\n\nReal---->\n\n' + texts[i])
        actual.append([texts[i].split()])
        predicted.append(yhat.split())
    # calculate BLEU score
    bleu = corpus_bleu(actual, predicted)
    return bleu, actual, predicted

bleu, actual, predicted = evaluate_model(loaded_model, texts, train_features, tokenizer, max_length)

# Compile the tokens into HTML and CSS
dsl_path = "compiler/assets/web-dsl-mapping.json"
compiler = Compiler(dsl_path)
compiled_website = compiler.compile(predicted[0], 'index.html')

print(compiled_website)
print(bleu)

Output

Links to sample output

Mistakes I made:

  • Understand the weakness of the models instead of testing random models. First, I applied random things such as batch normalization and bidirectional networks and tried implementing attention. After looking at the test data and seeing that it could not predict color and position with high accuracy, I realized there was a weakness in the CNN. This led me to replace max-pooling with increased strides. The validation loss went from 0.12 to 0.02 and the BLEU score increased from 85% to 97%.

  • Only use pre-trained models if they are relevant. Given the small dataset, I thought that a pre-trained image model would improve the performance. From my experiments, an end-to-end model is slower to train and requires more memory, but is 30% more accurate.

  • Plan for slight variance when you run your model on a remote server. On my Mac, it reads the files in alphabetic order. However, on the server, the files were read in a random order. This created a mismatch between the screenshots and the code. It still converged, but the validation data was 50% worse than when I fixed it.

  • Make sure you understand library functions. Include space for the empty token in your vocabulary. When I didn’t add it, it did not include one of the tokens. I only noticed it after looking at the final output several times and noticing that it never predicted a “single” token. After a quick check, I realized it wasn’t even in the vocabulary. Also, use the same order in the vocabulary for training and testing.

  • Use lighter models when experimenting. Using GRUs instead of LSTMs reduced each epoch cycle by 30%, and did not have a large effect on the performance.

Next steps

Front-end development is an ideal space to apply deep learning. It’s easy to generate data, and the current deep learning algorithms can map most of the logic.

One of the most exciting areas is applying attention to LSTMs. This will not just improve the accuracy, but enable us to visualize where the CNN puts its focus as it generates the markup.

Attention is also key for communicating between markup, stylesheets, scripts and eventually the backend. Attention layers can keep track of variables, enabling the network to communicate between programming languages.

But in the near future, the biggest impact will come from building a scalable way to synthesize data. Then you can add fonts, colors, words, and animations step-by-step.

So far, most progress is happening in taking sketches and turning them into template apps. In less than two years, we’ll be able to draw an app on paper and have the corresponding front-end in less than a second. There are already two working prototypes built by Airbnb’s design team and Uizard.

Here are some experiments to get started.

Experiments

Getting started

  • Run all the models

  • Try different hyperparameters

  • Test a different CNN architecture

  • Add Bidirectional LSTM models

  • Implement the model with a different dataset. (You can easily mount this dataset in your FloydHub jobs with this flag --data emilwallner/datasets/100k-html:data)

Further experiments

  • Creating a solid random app/web generator with the corresponding syntax.

  • Data for a sketch to app model. Auto-convert the app/web screenshots into sketches and use a GAN to create variety.

  • Apply an attention layer to visualize the focus on the image for each prediction, similar to this model.

  • Create a framework for a modular approach. Say, having encoder models for fonts, one for color, another for layout and combine them with one decoder. A good start could be solid image features.

  • Feed the network simple HTML components and teach it to generate animations using CSS. It would be fascinating to have an attention approach and visualize the focus on both input sources.

Huge thanks to Tony Beltramelli and Jon Gold for their research and ideas, and for answering questions. Thanks to Jason Brownlee for his stellar Keras tutorials (I included a few snippets from his tutorial in the core Keras implementation), and Beltramelli for providing the data. Also thanks to Qingping Hou, Charlie Harrington, Sai Soundararaj, Jannes Klaas, Claudio Cabral, Alain Demenet and Dylan Djian for reading drafts of this.

About Emil Wallner

This is the fourth part of a multi-part blog series from Emil as he learns deep learning. Emil has spent a decade exploring human learning. He’s worked for Oxford’s business school, invested in education startups, and built an education technology business. Last year, he enrolled at Ecole 42 to apply his knowledge of human learning to machine learning.

If you build something or get stuck, ping me below or on twitter: emilwallner. I’d love to see what you are building.

This was first published as a community post on Floydhub’s blog.

Source: https://www.freecodecamp.org/news/how-you-can-train-an-ai-to-convert-your-design-mockups-into-html-and-css-cc7afd82fed4/
